Putting machine learning to use in natural resource management: improving model performance

Machine learning models have proven to be very successful in many fields of research. Yet, in natural resource management, modeling with algorithms such as gradient boosting or artificial neural networks is virtually nonexistent. The current state of research on existing applications of machine learning in the field of social-ecological systems is outlined in a systematic literature review. For this purpose, a short introduction on fundamental concepts of neural network modeling is provided. The data set used, a prototypical case study collection of social-ecological systems (the common-pool resources database from the Ostrom Workshop), is described. I answer the question of whether neural networks are suitable for the kind of data and problems in this field, and whether they or other machine learning algorithms perform better than standard statistical approaches such as regressions. The results indicate a large performance gain. In addition, I identify obstacles to adopting machine learning and provide suggestions on how to overcome them. By using a freely available data set and open source software, and by providing the full code, I hope to enable the community to add machine learning to the existing tool box of statistical methods.


INTRODUCTION
By now, machine learning algorithms have proven themselves as powerful problem-solving tools in many domains. Examples include complex strategy games like chess and Go (Silver et al. 2017), strategic decisions under uncertainty against human players, e.g., poker (Brown and Sandholm 2019), or complex cooperative games (Mnih et al. 2015). Other areas include autonomous driving, translations in many languages via a universal interlingua (Johnson et al. 2017), and image recognition (multiclass object detection) in the millisecond range, beating human performance (Le et al. 2012, Girshick 2015). For many scientific domains, the question arises as to whether and how these advances in machine learning can be applied to their own research questions. It has become apparent that the application of machine learning algorithms varies greatly between individual disciplines. In particular, in the fields of natural resource management and social-ecological systems, machine learning methods still seem to be used rather infrequently. However, applying machine learning algorithms to natural resource management problems may result in various benefits: improving explanatory power for many models and thus, for example, being better able to distinguish important from irrelevant factors for successful management; generating more robust results by using different algorithms with the same workflow (see Discussion); and, finally, providing an extension to the toolbox of methods for analyzing case studies.
Given that machine learning algorithms have demonstrated their potential for modeling in many fields (LeCun et al. 2015), I aim to estimate the potential of machine learning methods, especially deep neural networks, for modeling natural resource systems. I evaluate the general suitability or unsuitability in both theory (through a literature review) and practice (a systematic search for neural network architectures for a typical data set). Future research may profit from this assessment of the potential of machine learning in natural resource research and from the summary of the state of research, best practices, and future directions. Such evaluations have also been done for other research fields such as biology and medicine (Ching et al. 2018).
The next section of this article outlines the state of research on existing applications of machine learning in the fields of social-ecological systems, (community-based) natural resource management, and common-pool resources in order to assess for which problems other authors have applied machine learning methods.
The Data section describes the data set used, a prototypical case study collection from social-ecological systems: the common-pool resources database from the Ostrom Workshop (n = 122). In the Results section, I discuss whether deep neural networks perform better than other methods in terms of model quality (goodness-of-fit). By comparing different architectures, it will become clear which kinds of networks may serve as a base for improved models in the future. After that, in the Discussion, I review whether neural networks could indeed be a methodological step forward in the area of natural resource management.
The Methods section provides a short introduction on fundamental concepts of neural network modeling in order to facilitate future analyses. A prototypical data set is analyzed by using many different variations (architectures) of neural networks to establish the general suitability of this method for natural resource management data.
To make the agenda more concrete, I strive to answer three research questions, in particular:
1. Can shallow or deep neural networks achieve a decisive improvement in model quality compared to previously employed statistical techniques such as linear regressions?

The keywords that were searched for were "machine learning", "neural network", and "deep learning". For the more general journals, which, in contrast to the topic-specific journals, are not restricted to natural resources, I added to each of the three search terms the keywords "natural resource" or "social ecological". Using no quotation marks on both search terms resulted in thousands of irrelevant hits (e.g., on learning); using them for both search terms simultaneously was too restrictive and resulted in 0 hits. Some journal search engines interpreted phrases in quotation marks as logical "ORs", which resulted in many irrelevant hits (e.g., "machine learning" as machine OR learning).
The exact figures for each search combination are provided in Table A1. Typically, searches produced 20-150 hits. These were screened. If a hit seemed to be about any topic in natural resource management and used any kind of machine learning techniques, it was included in the final data set (Table A2). Of course, many other machine learning classifiers and algorithms exist (Elith et al. 2006, Fernández-Delgado et al. 2014). However, I was concerned with only the most widely used algorithms (neural networks, gradient boosting, and generalized linear models), since even they are rarely used for natural resource management problems.
All in all, very few hits were found. Although the first screening resulted in n = 2616 hits for topic-specific journals and 3287 for more general journals, only 32 papers were about applying machine learning to natural resources in any way. This was rather surprising given the spectacular advances in other fields. This number suggests that machine learning does not yet play a substantial role in natural resource management. I discuss possible reasons for this in the Discussion section.
Before discussing the few relevant papers, I note that in many adjacent research fields such as renewable energies (IPCC 2018) or biodiversity research, machine learning methods, in particular neural networks, are used quite frequently. Typical fields of application include, but are not limited to, wind energy potential assessment, species biodiversity models, expansion models of species, spatial habitat modeling, evaluation of remote sensing data, and prediction of solar radiation.
One of the first applications of neural networks for social-ecological systems was provided by Rusch (2013, 2014). For common-pool resources case studies, shallow neural networks are used to identify success factors. These papers also substantiate the claim often made that neural networks are able to cope better with nonlinearities between features than are regressions (Paruelo and Tomasel 1997). Very similar is the attempt to identify success patterns in fisheries with random forests.
Two further prominent topics for applying machine learning are modeling land use change (Cao et al. 2019, Saputra and for land use change in rivers, see ; for classifying habitats, see Václavik et al. 2013), as well as predicting and classifying fishermen behavior (Jules Dreyfus-León 1999). For further details on these studies, see Table A2 and the Literature Cited section.
All in all, neural networks and random forests were the most popular techniques, while content-wise, predictive tasks for spatial patterns dominated. However, there were no commonly adopted workflows or any other kind of standards across papers.
Given these few existing attempts to make machine learning fruitful for natural resource topics, it is even more important to explore in practice whether neural networks can improve model quality. I therefore implemented many neural network architectures to explore this potential in more detail.

Data
The common-pool resources database was chosen for the test of neural networks and the method comparison described in the State of research section. It is a typical data set consisting of case studies of irrigation systems and fisheries (n = 122), and is available online (https://seslibrary.asu.edu/cpr). The idea is that it can stand for hundreds of other data sets that have a similar structure concerning number of variables, tabular structure, and concepts involved. Reference data sets are well-known from other fields, one famous example being the MNIST data set (http://yann.lecun.com/exdb/mnist/), which serves as a benchmark for comparing performance of machine learning classifiers. In contrast to other data sets outside natural resource management, it is relatively small, but differences between cases are rather large, which means that pattern recognition via supervised learning is particularly suitable.
The structure of the common-pool resources database was developed at the Ostrom Workshop in Political Theory and Policy Analysis at Indiana University Bloomington. The data have been collected for several years and are the basis for perhaps the most influential analysis on social-ecological systems, Governing the Commons (Ostrom 1990). The database comprises about 500 variables that include demographic, geographical, social, cultural, climatic, economic, and technical details of irrigation systems and fisheries worldwide.
Ecology and Society 25(4): 45 https://www.ecologyandsociety.org/vol25/iss4/art45/
There were several reasons for selecting this data set. First, analyses have shown that the data set is typical for social-ecological case studies (Frey 2018). Second, it has a sufficient number of heterogeneous cases.
The 593 variables were aggregated; i.e., assigned to 24 abstract concepts, such as social capital, resource size, or participation opportunities. The details of assigning the variables to these concepts can be found in Frey (2018). One benefit of aggregating is that missing variables are no longer problematic, since existing variables within a concept can stand in for variables that are missing.
The dependent variable was ecological success. The variables it is composed of can be found in Table A4. All variables were normalized with zero mean and unit variance. This is a common step in data preparation for neural networks to avoid the problems of exploding and vanishing gradients.
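The standardization step described above can be sketched in a few lines of Python. This is only an illustration, not the code from Appendix 1: the function name `standardize` and the sample values are hypothetical, and the population standard deviation is assumed (some packages use the sample standard deviation instead).

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale a list of numbers to zero mean and unit variance (z-scores)."""
    mean = fmean(values)
    sd = pstdev(values)  # population standard deviation (an assumption)
    if sd == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(v - mean) / sd for v in values]

# Hypothetical scores of one aggregated concept across five cases
scores = [2.0, 4.0, 4.0, 5.0, 10.0]
z = standardize(scores)
```

After this transformation, all variables are on the same scale regardless of their original units, so no single input dominates the weight updates during training.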

METHODS
Given that neural networks usually work with thousands or even millions of data records, one important question to be answered first is whether neural networks are at all suited to the much smaller data sets that are typical of natural resource management.
It is as yet unclear whether the kind of data that are characteristic of collections of case studies (only a few hundred cases with a few hundred variables that can be aggregated to a few dozen concepts) require neural networks at all. Clarifying this was one goal of this investigation.
Another important question is whether deep neural networks (with multiple hidden layers between input and output) are a suitable method to use for natural resource management. Perhaps non-machine-learning methods or very simple neural network architectures prove to be sufficient. Hence, I first introduce deep neural network architecture and shallow neural networks (only one hidden layer) before briefly characterizing other methods in order to compare their model fits on this data set, which is typical for case studies with many variables.
By now, a large variety of different architectures for neural networks exist (LeCun et al. 2015). Each type of neural network architecture is adapted to a certain kind of problem. For example, the best results on most image recognition tasks have been achieved using deep convolutional neural networks, whereas Long Short-Term Memory networks have proven to be superior to other architectures on time series analysis tasks (Hochreiter and Schmidhuber 1997). However, in principle, finding the right architecture is a matter of trial and error, especially parameter fine-tuning.
For tabular data, like those used in this article, shallow or simple deep neural networks with only a few layers have achieved good fits. Since other architectures are mostly highly specific and designed for other kinds of tasks, I have not tested them further and have constrained my tests to feed-forward and deep feed-forward nets.
Fine-tuning such networks involves mainly adapting their hyperparameters. These are the "nuts and bolts" of a network. It is well-known that parameters like number of layers, number of hidden neurons, learning rate, or number of training epochs make a considerable difference for the final goodness-of-fit of a model (LeCun et al. 2015). In fact, besides feature construction or extraction (providing meaningful input data; e.g., by aggregating variables), hyperparameter tuning is one of the core steps of a typical machine learning pipeline.
Again, finding the best combination of parameters is a matter of trial and error. Traditionally, researchers manually tried out the most promising combinations. However, with increasing computing power and ever more complex models, this task has been outsourced to computers. This is called grid search.
There are three types of grid search: first, Cartesian grid search, where a discrete number of parameter choices (e.g., 10, 20, and 30 neurons, and 50, 100, and 150 epochs, which results in nine combinations) is calculated. The second type is random grid search, where values of parameters are drawn randomly from a range (e.g., number of neurons between 10 and 30; epochs between 50 and 150). Hence, the number of combinations is not fixed. Typically, the maximum number of models to be calculated is provided as a variable by the user. The third type is Bayesian search, where resulting fits of parameter combinations are themselves optimized toward a decreasing error rate. This is not standard and has not yet been implemented in most leading software packages (e.g., in Scikit-learn or SciPy in Python [Virtanen et al. 2019]). It has been shown that random grid search usually yields better results than Cartesian search, which in turn performs better than manual tuning of parameters (Bergstra and Bengio 2012). Hence, I implemented a random grid search for a large parameter sweep. Since both methods are implemented very similarly in most software packages, switching between them often means just changing one parameter. In h2o, for example, the parameter "strategy" of a grid search must simply be switched from "Cartesian" to "RandomDiscrete".
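To illustrate the difference between the first two search types, here is a minimal sketch in plain Python. It does not use h2o and is not the code from Appendix 1; the function `toy_error` is a hypothetical stand-in for training a model and measuring its error, and the parameter values are the ones used as examples in the text.

```python
import itertools
import random

def toy_error(neurons, epochs):
    """Hypothetical stand-in for training a model and returning its error."""
    return abs(neurons - 22) / 22 + abs(epochs - 120) / 120

def cartesian_search(neuron_choices, epoch_choices):
    """Evaluate every combination of the discrete parameter choices."""
    return [(n, e, toy_error(n, e))
            for n, e in itertools.product(neuron_choices, epoch_choices)]

def random_search(neuron_range, epoch_range, max_models, seed=0):
    """Draw parameter values at random from ranges, up to max_models draws."""
    rng = random.Random(seed)
    results = []
    for _ in range(max_models):
        n = rng.randint(*neuron_range)  # bounds are inclusive
        e = rng.randint(*epoch_range)
        results.append((n, e, toy_error(n, e)))
    return results

cartesian = cartesian_search([10, 20, 30], [50, 100, 150])  # 3 x 3 = 9 models
randomized = random_search((10, 30), (50, 150), max_models=9)

best_cartesian = min(cartesian, key=lambda r: r[2])
best_random = min(randomized, key=lambda r: r[2])
```

The Cartesian search can only ever return one of its nine predefined grid points, whereas the random search can land on any value inside the ranges; the user-supplied `max_models` caps the computational budget.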
This systematic variation of more than 20,000 models tested (5000 runs x four methods) is necessary for three reasons:
1. to be sure about the best kind of architecture, in general, for such data sets
2. to provide very sound starting values for further parameter tuning by other researchers when modeling similar data sets
3. to make the state-of-the-art goodness-of-fit for such kinds of models known; this makes it possible to use as a benchmark and a comparison to traditional models

For all models, Table A3 presents an overview of the hyperparameters varied and the actual values of the best model. While more parameters have been tuned, those presented in Table A3 are the most important ones. Thus, for most modelers, it might be sufficient to tune only those; the rest most probably result in only very minor improvements of model quality (< 1-2%).
While my main goal was to explore the untapped possibilities of neural networks for natural resource case study data, it could be that other machine learning algorithms might perform even better. For this reason, I provide a short comparison with another algorithm: gradient boosting, a high-performing variant of decision trees (Breiman 2001a), which are perhaps the most widely used machine learning algorithms, since they have a good performance across a wide range of problems and are very robust against noise (Alpaydin 2010). In fact, in the natural resource management literature, as presented in the literature review in the State of research section, variants of decision trees are the most frequently used algorithm. Furthermore, their results are easily interpretable and feature importance is readily accessible.
In addition, since most case studies use regressions, I also compared the results with generalized linear models so as to better estimate the performance boost that could be gained if neural networks are employed in natural resource research. By using a Gaussian distribution, the generalized linear models are identical to multivariate linear regressions and hence are comparable to existing research. A description of parameters varied during grid search for model optimization is provided in Table A3.
All data were partitioned into two parts: a training set (80%) and a test set (20%). This is standard practice in machine learning and is done to guard against overfitting. Overfitting means that a model may perform very well on the training data but is very weak on new data (the test set) since it does not generalize very well; i.e., it captured too many details present in only the training set but not in the test set.
In addition, a five-fold cross-validation was performed. This means that each of five cross-validation models was trained with a different 20% of the training data held out, while the final model was trained on 100% of the training data, which was important for such a limited number of cases. Thus, metrics like goodness-of-fit are available for the training, the cross-validation, and the test sets.
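The partitioning logic can be sketched as follows in plain Python. This is an assumption-laden illustration rather than the procedure h2o uses internally: the function names, the random seed, and the way cases are assigned to folds are all hypothetical; only n = 122, the 80/20 split, and the five folds come from the text.

```python
import random

def train_test_split(n_cases, train_fraction=0.8, seed=42):
    """Shuffle case indices and split them into training and test indices."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    cut = int(n_cases * train_fraction)
    return indices[:cut], indices[cut:]

def k_fold(indices, k=5):
    """Yield (train, holdout) index lists; each case is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, holdout in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, holdout

train_idx, test_idx = train_test_split(122)  # n = 122 cases, as in the data set
folds = list(k_fold(train_idx, k=5))
```

With these settings, 97 cases go to training and 25 to testing, and each cross-validation model is fitted on roughly 80% of the 97 training cases. The test cases never enter cross-validation, which is why test-set metrics reflect performance on unseen data.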
I report the metrics of the test sets, which are standard, since they best explain how well the model performed on data it has not encountered before. Fig. 1 explains the workflow used. Since one of my goals was to make machine learning more widespread in the community of natural resource management and social-ecological systems, the choice of software was deliberate. I chose h2o, which is open source software (LeDell et al. 2020) and available for several programming languages; i.e., R, Python, and Scala with very similar structure and functions. Hence, adapting the R code in Appendix 1 for any of the major programming languages should be very easy; in fact, a matter of hours at most. It is a standard workflow familiar to any data scientist or machine learning researcher, so further developments should be very easy.

RESULTS
For each method, a random grid search was run for 5000 iterations (500 models per batch). Parameters deliberately covered a wide range so as to avoid missing good model parameter combinations ("casting a wide net"). Thus, each run represented a unique combination of parameters. The best 100 models/results for each method were selected (Fig. 2). Each combination of hyperparameters was considered one model. A first result is that model quality, in general, was very high. The median for each machine learning method was at least 0.58, whereas the multivariate regressions had a median of 0.27 (explained variance). The best generalized linear model had a goodness-of-fit of 0.52. As is known from other fields of research, machine learning algorithms are usually very close together in terms of explanatory value. This is true for the top-performing models of my data set with deep neural networks (0.89), gradient boosting machines (0.87), and shallow neural networks (0.84). However, there was a larger gap between the model quality of the regressions and the machine learning algorithms of about 0.32 (Table 1).
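As a short worked example of these figures, the sketch below defines explained variance and recomputes the gaps to the regression baseline. It assumes that goodness-of-fit is the plain (unadjusted) R²; the function name `r_squared` is hypothetical, and the four values are the best fits per method reported in the paragraph above.

```python
def r_squared(observed, predicted):
    """Share of variance explained: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Best goodness-of-fit per method, as reported in the text above
best_fit = {
    "deep neural network": 0.89,
    "gradient boosting machine": 0.87,
    "shallow neural network": 0.84,
    "generalized linear model": 0.52,
}
baseline = best_fit["generalized linear model"]
gaps = {name: round(fit - baseline, 2)
        for name, fit in best_fit.items()
        if name != "generalized linear model"}
```

The smallest gap, between the best shallow neural network and the best generalized linear model, corresponds to the figure of about 0.32 cited above; for the other two methods the gap is larger.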
A second result is that deep neural networks were a bit better than shallow ones. The more complicated architecture with more hidden layers seems to have been responsible for finding even more general patterns in the data. As can be expected with such a small data set, there was some overfitting. However, the algorithms still generalized well on the test sets.
A third result concerns the optimal architecture (Table A3). The best deep neural network had four layers with 492, 13, 85, and 111 neurons, was trained for approximately 400 epochs, and had a very high learning rate of 0.12. The number of layers and neurons determines the complexity of the problem the network is able to learn: the more layers and neurons, the more complex the problems it can learn. However, there is a trade-off between more layers and neurons and better performance, since training time and computer resources also increase. More problematic than this, however, is that with increasing capacity of the network, overfitting occurs and generalizing abilities decrease. Finally, the learning rate defines the step size with respect to the change of weights. A higher rate means faster progress but may result in nonoptimal weights; a lower rate may result in a long training process, and training may get stuck in local optima.
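The role of the learning rate as a step size can be illustrated with a toy example that is unrelated to the actual network: plain gradient descent on the one-dimensional function f(w) = w², whose optimum is at w = 0. The function name, the starting value, and the number of steps are hypothetical choices; the rate 0.12 is borrowed from the text only for comparison. Since this toy function is convex, it cannot show local optima, only the speed and stability of convergence.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final weight."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w  # step size scales with the learning rate
    return w

slow = gradient_descent(0.01)     # small steps: still far from the optimum at 0
fast = gradient_descent(0.12)     # larger steps: much closer after 20 updates
unstable = gradient_descent(1.1)  # too large: each step overshoots and diverges
```

After the same number of updates, the higher rate is far closer to the optimum than the lower one, while an excessive rate moves the weight away from it, which is one way a high learning rate can end in nonoptimal weights.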
Sometimes, combining the best models (for example, the top five) results in an even better predictive model. This technique is called a stacked ensemble. For each kind of machine learning algorithm, I calculated a stacked ensemble, altogether 40 models. However, their predictive power was not higher than that of the best-performing model. Thus, I do not report these results in further detail.
Hence, the results are clear-cut:
1. All machine learning algorithms improved model quality in comparison to linear regressions.
2. The boost in model performance ranged from 35 to 40%.
3. Deep neural networks (2-4 layers) increased model quality in comparison to shallow neural networks (one hidden layer only). For this particular data set, the adjusted R² increased by about 5%.
4. Gradient boosting machines are similar in performance to deep neural networks.
5. Stacked ensembles that combine multiple models did not perform better than the best model for these kinds of tabular data.
6. Model performance varied widely. A large parameter sweep (grid search) was necessary to identify good parameter combinations.

DISCUSSION
This comparative analysis has shown that machine learning methods, in general, and deep neural networks, in particular, may offer significant advantages for the analysis of larger collections of natural resource case studies. However, one limitation of this study is that it is unclear how well one can generalize from this particular data set to other data sets. A limitation of neural networks has been their black box character; yet, with modern algorithms, the influence of independent variables is no longer unknown. Such algorithms are well capable of estimating the influence of each factor independently.
Machine learning methods offer not only substantial model improvements but also decision-making support (e.g., by visualizing the importance of variables in gradient boosting), and thus may help improve ecological sustainability. Their high performance is not surprising given their ability to deal with noisy data and nonlinearities. With various software solutions being available (e.g., Keras in Python or h2o [LeDell et al. 2020]), which no longer require deeper mathematical knowledge about the functioning of neural networks, implementing machine learning algorithms should pose no issues. Nevertheless, a good understanding of the problem and the respective methods that can be applied is necessary; otherwise, the interpretation of results leads to errors. This also applies to the choice of the architecture and the method itself, even if advanced software packages like keras-automl or h2o-automl offer automated workflows.
However, despite these clear advantages, machine learning methods are very rarely applied in natural resource research. I identify three main reasons why: First, machine learning methods require large amounts of data. Therefore, individual case studies cannot be analyzed; instead, a data collection such as that available in a database is needed. In addition, these data must be fairly complete, since neural networks require complete data as input. Imputation usually leads to poor results. However, most studies deal in detail with one or a few case studies. The lower limit for neural networks, however, is approximately 100 cases, as demonstrated in the State of research section. Since deep neural networks can play out their advantages mostly for large data sets (e.g., images, text corpora), this may be one reason for the slow uptake of these techniques.
Second, data (i.e., case studies) need to be in a standardized format to be comparable (Frey 2017). Comparable, consistently operationalized data sets with unambiguous definitions, concepts, and variables are rare. There is a clear lack of such large, high-quality data sets in natural resource management research (Poteete et al. 2010). Open access data are still rare.
Third, unfamiliarity with machine learning methods and the approach in general (Breiman 2001b) might lead to hesitation among researchers. Until recently, it was not evident to researchers in natural resource management that machine learning could be of help in modeling. With improved and streamlined software packages available and the success stories from other fields getting more attention, this may change.
If these obstacles are overcome, an increasing spread of machine learning methods in the field of natural resource management may also lead to a shift in research interest from individual case studies to larger data sets. This development has already been called for (Poteete et al. 2010). This in turn may lead to a different type of data collection and may change the field if data are uniformly collected, structured on the basis of a framework, mainly longitudinal, and extend across several aspects (e.g., social, economic, technical). An example of this is the International Forestry Resources and Institutions database, which has enabled many scientific findings to be achieved (e.g., Andersson and Agrawal 2011, Salk et al. 2014).
Support could also come from increasing performance of computers, which could speed up computations of complex models considerably. Just to name a few possibilities: computing on graphics processing units, using parallel computing software like MPI (message passing interface) on local laptops, or using server clusters in the researcher's scientific institution. If even more computational power is required, high-performance or cloud computing are readily available.

CONCLUSION
The successes of machine learning in many fields of research suggest that their modeling qualities can also be used for analyses in the field of natural resource management. However, this has hardly happened so far: a literature review resulted in only 32 reviewed papers in both more general and topic-specific journals at the interface of machine learning and natural resource management.
I have identified a number of potential reasons why machine learning is rarely applied in natural resource research and have suggested how obstacles in applying machine learning could be overcome. It is not due to the unavailability of suitable data sets, as collections of case studies in meta-analyses (Brooks et al. 2012) and research using databases (Tang 1992, Lam 1998, Salk et al. 2014) have proven. I also established that machine learning algorithms are probably well suited to deal with the kind of data that exist in natural resource management.
All algorithms tested (deep and shallow neural networks, and gradient boosting) had a superior explanatory power over traditional linear regressions. However, no algorithm emerged as clearly superior to the others-results were also dependent on the data set and its features. It is important to stress again that models vary widely depending on parameter tuning. In order to identify robust patterns, it is necessary to both run many models and use multiple machine learning algorithms. Only if a pattern is stable across many models and at least two algorithms is there an indication for its existence.
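The robustness criterion in the preceding paragraph can be made concrete with a small sketch. Everything in it is hypothetical: the function name `stable_features`, the 80% threshold for "many models", and the toy lists of top-ranked concepts (the concept names are borrowed from the Data section purely for illustration and are not results of this study).

```python
def stable_features(top_features, min_share=0.8, min_algorithms=2):
    """Return features that appear in at least min_share of the models
    of at least min_algorithms different algorithms."""
    support = {}
    for algorithm, models in top_features.items():
        counts = {}
        for features in models:
            for f in set(features):
                counts[f] = counts.get(f, 0) + 1
        for f, c in counts.items():
            if c / len(models) >= min_share:
                support.setdefault(f, set()).add(algorithm)
    return sorted(f for f, algos in support.items()
                  if len(algos) >= min_algorithms)

# Hypothetical top-ranked concepts from three models per algorithm
top_features = {
    "deep neural network": [
        {"social capital", "resource size"},
        {"social capital", "participation opportunities"},
        {"social capital", "resource size"},
    ],
    "gradient boosting": [
        {"social capital", "resource size"},
        {"social capital"},
        {"social capital", "participation opportunities"},
    ],
}
stable = stable_features(top_features)
```

In this toy input, only one concept is ranked highly by all models of both algorithms, so it is the only one that would count as a robust pattern under the assumed thresholds.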
Future research could be based on well-tested architectures. Since all analyses were performed on open access data with open source tools, one such workflow is presented in this article, with the full code provided in Appendix 1. Therefore, adapting such models to one's own data set may consist only of fine-tuning some parameters. For example, this is common practice in image recognition. Furthermore, standard data formats, common definitions of central concepts, and reference data sets and benchmarks for comparing different methods are future central building blocks for advancing natural resource management research.
This brings us to the conclusion that the many different methods of machine learning, not only neural networks, could enrich the methodological toolbox of social-ecological systems analysis. Machine learning methods have proven their worth in many fields; they are both theoretically and practically mature, and there are many easy-to-use software solutions and corresponding introductions and instructions (e.g., the h2o

Note: The first number in a cell indicates the hits for "social ecological"; the second number indicates "natural resource".

Code
The following code section shows the full code in R to produce 500 models for deep neural networks with the h2o software package. Data preparation, loading and saving models and results are part of the workflow, but are not specified in detail here, since these are user specific and not part of the core machine learning code.