Mapping and characterizing social-ecological land systems of South America

Humans place strong pressure on land and have modified around 75% of Earth’s terrestrial surface. In this context, ecoregions and biomes, defined merely on the basis of their biophysical features, are incomplete characterizations of the territory. Land system science requires classification schemes that incorporate both social and biophysical dimensions. In this study, we generated spatially explicit social-ecological land system (SELS) typologies for South America with a hybrid methodology that combined data-driven spatial analysis with a knowledge-based evaluation by an interdisciplinary group of regional specialists. Our approach embraced a holistic consideration of the social-ecological land systems, gathering a dataset of 26 variables spanning 7 dimensions: physical, biological, land cover, economic, demographic, political, and cultural. We identified 13 SELS nested in 5 larger social-ecological regions (SER). Each SELS was discussed and described by specific groups of specialists. Although 4 environmental variables and 1 socioeconomic variable explained most of the distribution of the coarse SER classification, a diversity of 15 other variables were shown to be essential for defining several SELS, highlighting specific features that differentiate them. The SELS spatial classification presented is a systematic and operative characterization of South American social-ecological land systems. We propose its use as a reference framework for a wide range of applications such as analyzing observations within larger contexts, designing system-specific solutions for sustainable development, and structuring hypothesis testing and comparisons across space. Similar efforts could be undertaken elsewhere in the world.


INTRODUCTION
Because natural systems (i.e., not affected by human enterprise) are becoming rare across the world (Allan et al. 2017), there is an increasing need for analyzing and understanding land through the lens of coupled human-nature systems. Humans are not mere inhabitants of ecosystems but strongly influence ecological processes (Ellis and Ramankutty 2008, Maxwell et al. 2016). Ecoregions and biomes are useful geographic units to represent coherent patterns of biophysical features. However, to characterize the current configuration of land systems, which necessarily involves human activity, we need a land classification scheme that integrates both the social and the biophysical dimensions.
With increasing data availability, new opportunities for large-scale and synthesis research arise. Nevertheless, comparing findings from different locations and linking them to global or distant processes is still a challenge (Rocha et al. 2020), partly due to the lack of appropriate spatial frameworks at large scales to place them in context. Land system science, as a research field, is growing fast, and new methodological approaches to address this gap are diversifying and consolidating (GLP 2016). An example is syndrome and archetype analysis, which analyzes social-ecological systems (SES) by identifying recurrent patterns of land use characteristics and processes. Several endeavors applied the archetype logic to generate large-scale land classifications of social-ecological systems. Early global initiatives by Ellis and Ramankutty (2008) combined land cover with irrigation and population data to generate the Anthropogenic Biomes of the world. Subsequent efforts included more detailed data on production activities: one generated a global land use systems map, another (led by van Asselen) produced the global-scale land systems, and a third developed global land system archetypes. At a continental scale, Levers et al. (2018) analyzed archetypical patterns and trajectories of land systems in Europe.
These studies combined data on different aspects of nature (e.g., land cover, land use intensity, biophysical factors) using computerized classification methods (e.g., multi-stage empirical process, hierarchical clustering, self-organizing maps) to produce maps with medium spatial resolution (~10 km to 20 km). However, they tended to center their classifications on land use, particularly identifying different types of agropastoral production. In most cases, information on the characteristics of the social communities was represented only through population density or accessibility, as indicators of land use intensity. Political, environmental, and socioeconomic factors were used in some cases ex-post to describe the classes, but not to generate them. Culture and governance are important to reflect the complex behavior of agents influencing the landscape (Lambin et al. 2001) and are very difficult to include in global models. The most comprehensive effort was the global land systems archetypes, an exhaustive classification that considered several physical variables, photosynthetic activity (NDVI), gross domestic product (GDP), and political stability.
Classifications at the global scale are ideal to present general patterns across the world, but they fall short in understanding land systems at regional or local scales. For example, one global classification assigned roughly half of the South American continent (~12,000 km²) to the same class: "forest systems in the tropics." Working at finer spatial scales would allow for more detail in the class descriptions, the inclusion of variables of regional relevance with particular value ranges, and a higher likelihood of finding complete and coherent sets of specific variables, such as cultural and political variables.
South America has particular characteristics that justify a specific continental classification scheme to enhance understanding within and across local social-ecological systems. These include, for example, low overall human population density with more than 80% of the population concentrated in urban areas; a history of land use strongly influenced by social groups in high-altitude regions, followed by a highly transformative European colonization period, including a massive replacement of wild herbivores by livestock; numerous Indigenous communities with diverse cultural heritage legacies; and an economy and agricultural production oriented toward exports and linked with some of the highest deforestation rates in the world.
In a first attempt to integrate social-ecological knowledge into the characterization of land systems for Latin America, a previous study proposed a "simplified biome-level typology of social-ecological land systems (SELS)." It described seven SELS based on biophysical and economic characteristics, settlements, institutions, technology, historical legacies, and potential future trends. Nevertheless, this typology was based exclusively on expert knowledge and lacks a map giving it a specific spatial representation, thus limiting its use and application.
In this study, we made the concept of SELS operational with a precise and systematic spatial classification for South America. Our overarching goal was to contribute to the development of a geographical reference framework to facilitate contextualizing the discussion of social-ecological findings and studies in land system science and territorial planning. More specifically, we (1) created a map of SELS typologies for South America, (2) analyzed the key variables that differentiated the typologies, and (3) described and discussed the resulting SELS map regarding how well it represents our territorial knowledge and matches the conceptual SELS descriptions of that earlier typology. Additionally, we highlight key data gaps that, if filled, would allow further characterizations of this kind.

METHODS
We generated a classification of South America into general typologies of social-ecological land systems by analyzing spatial patterns of characteristics along a multidimensional continuum and depicting areas with similar profiles. Our research objective may not have a single correct solution; thus, we prioritized applicability by heeding the collective experience of researchers working in the region.
We designed a hybrid methodology combining machine learning techniques for analyzing a set of social and environmental spatial data with a knowledge-based evaluation by an interdisciplinary group of regional specialists (the authors). The computational spatial analysis allowed for replicability and spatial explicitness, whereas the expert-knowledge approach contributed enhanced collective criteria for making decisions on the analysis design as well as on the interpretation of the outputs. We decided not to rely exclusively on automated data analysis, acknowledging data constraints (i.e., usage of proxies due to data gaps), which were also unbalanced across the variable domains, affecting the social aspects more heavily than the biophysical ones. Under this scenario, mathematically optimal solutions might not always be the most thematically meaningful ones. Therefore, expert knowledge was applied to favor coherent territorial clusters, making subjective decisions on top of the evidence provided by the results. The potential bias of these subjective decisions was minimized by diversifying the profiles of the group of regional specialists.
The regional specialists were involved for the 22-month duration of this study. We had three stages of personal surveys on input variables and partial results, one in-person workshop session at the GLP Open Science Meeting in April 2019, a subgroup work instance to thoroughly discuss individual SELS, and overall reviews of the final manuscript. The group of specialists consisted of 21 researchers of different backgrounds, affiliations, disciplines, skill profiles, genders, and nationalities, with extensive local and regional experience covering the geographical and territorial diversity of South America.
To be included in our analysis, all spatial datasets were required to cover the full extent of the continent (dismissing islands) with a consistent methodology and a spatial resolution not coarser than our grid size (exceptions are the national "governance indicators," and "plant diversity" with a pixel size of 110 km), with preference for datasets representative of the year 2010 (or the closest available). The final set of input data for our analyses (Table 1) consisted of 26 variables organized within 7 dimensions (variables per dimension: 3 physical, 2 biological, 6 landscape, 7 economic, 2 demographic, 4 political, and 2 cultural), 11 of which corresponded to the environmental domain and 15 to the socioeconomic domain. Our input data included both quantitative and qualitative data because two of our variables were represented by categorical data, i.e., urbanization type and anthropization century.

Spatial clustering analysis
Our analysis design was largely shaped by two characteristics of our input data: we mixed quantitative and categorical data, and most of our variables do not present a normal distribution (Appendix 1, Fig. A1.3). We used a hierarchical clustering approach to map SELS, which is widely used for spatial identification of social-ecological typologies (FAO 2011, Rocha et al. 2020). For this, we (1) divided South America into a continuous grid of hexagonal units of 40 km side to side (area ~1400 km², n = 13,287), (2) aggregated variables to the hexagon level, which we then used as input to (3) calculate the distances between every pair of hexagons in the multidimensional space, and finally (4) computed a divisive hierarchical clustering (DIANA) to group hexagons into clusters sharing similar characteristics. Distances or (dis)similarities were computed with the Gower distance method because it is the preferred algorithm for clustering mixed data and it is less sensitive to outliers and non-normal distributions than other popular methods such as Euclidean distance (Kassambara 2017, Boehmke and colleagues). Nevertheless, we applied logarithmic transformations to those variables that presented highly exponential distributions (see Table 1) and range-based standardization to all variables (forcing them to range between 0 and 1) to mitigate potential effects of data artifacts.
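The distance step described above can be illustrated with a minimal sketch. The study itself used the "daisy" function in R; the Python code below is only an illustration of the Gower idea for mixed data, and the variable names, the three example hexagons, and the use of log1p as the logarithmic transformation are assumptions made for this example, not values or choices taken from the study.

```python
import math

def range_standardize(values):
    """Rescale numeric values to the 0-1 interval (range-based standardization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def gower_distance(a, b, is_categorical):
    """Gower distance between two records with equal variable weights.

    Numeric variables must already be range-standardized, so their
    contribution is the absolute difference; categorical variables
    contribute 0 when equal and 1 otherwise.
    """
    total = 0.0
    for x, y, categorical in zip(a, b, is_categorical):
        if categorical:
            total += 0.0 if x == y else 1.0
        else:
            total += abs(x - y)
    return total / len(a)

# Hypothetical hexagons (illustrative values only).
forest_cover = [90.0, 10.0, 50.0]          # percent
pop_density = [0.5, 150.0, 12.0]           # skewed, so log-transformed first
urban_type = ["rural", "city", "rural"]    # categorical

forest_std = range_standardize(forest_cover)
pop_std = range_standardize([math.log1p(p) for p in pop_density])

rows = list(zip(forest_std, pop_std, urban_type))
is_categorical = [False, False, True]

# Pairwise dissimilarity matrix between hexagons.
dist = [[gower_distance(a, b, is_categorical) for b in rows] for a in rows]
```

Because every variable contributes a value between 0 and 1, the resulting dissimilarities also fall between 0 and 1, which is what allows quantitative and categorical variables to be combined in a single matrix.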
Divisive hierarchical clustering (DIANA) is an unsupervised method that constructs a hierarchy of clusters starting from the root (all observations in one cluster) and iteratively divides them until each observation constitutes its own cluster. At each iteration, the most heterogeneous of the clusters (the one containing the largest dissimilarity between any two of its observations) is divided into two new clusters, where the "splinter group" is initiated by its most disparate observation (largest average dissimilarity).
Most methods build their clusters starting from their terminal nodes (leaves), randomly selecting the initial point and considering local patterns or proximate neighbors to make decisions. Instead, DIANA starts from the root of the tree, taking into consideration the overall distribution of the data points for the initial splits, gaining in accuracy and favoring the capture of the main structure of the data by prioritizing the coherence of larger groups over the purity of smaller groups.
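The divisive procedure described above can be sketched as follows. This is a simplified illustration of the general DIANA logic (select the cluster with the largest diameter, seed a splinter group with the most disparate observation, then move observations that are on average closer to the splinter group), not the "cluster" package implementation used in the study; the function names and the one-dimensional example points are hypothetical.

```python
def average_dissimilarity(i, group, dist):
    """Mean dissimilarity between observation i and the other members of group."""
    others = [j for j in group if j != i]
    if not others:
        return 0.0
    return sum(dist[i][j] for j in others) / len(others)

def diameter(cluster, dist):
    """Largest dissimilarity between any two observations of a cluster."""
    return max(dist[i][j] for i in cluster for j in cluster)

def split_cluster(cluster, dist):
    """Split one cluster into a splinter group and the remaining observations."""
    remaining = list(cluster)
    # The splinter group starts with the most disparate observation.
    seed = max(remaining, key=lambda i: average_dissimilarity(i, remaining, dist))
    splinter = [seed]
    remaining.remove(seed)
    while len(remaining) > 1:
        best, best_gain = None, 0.0
        for i in remaining:
            to_rest = average_dissimilarity(i, remaining, dist)
            to_splinter = sum(dist[i][j] for j in splinter) / len(splinter)
            gain = to_rest - to_splinter
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # nobody is closer to the splinter group
            break
        splinter.append(best)
        remaining.remove(best)
    return splinter, remaining

def diana(dist, n_clusters):
    """Divide the data top-down until n_clusters clusters are obtained."""
    clusters = [list(range(len(dist)))]
    while len(clusters) < n_clusters:
        target = max(clusters, key=lambda c: diameter(c, dist))
        if len(target) < 2:
            break
        clusters.remove(target)
        clusters.extend(split_cluster(target, dist))
    return clusters

# Hypothetical one-dimensional observations forming two obvious groups.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
dist = [[abs(p - q) for q in points] for p in points]
clusters = diana(dist, n_clusters=2)
```

Cutting the hierarchy at a given number of clusters, as done here with n_clusters, corresponds to choosing a dendrogram cut level.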
We considered the results at two nested spatial levels of detail (the 1st level corresponds to social-ecological regions or SER and the 2nd level to social-ecological land systems or SELS) because findings at different levels can complement each other and improve analysis robustness. The authors analyzed the clustering outputs (spatial layout, cluster statistics, and method performance metrics) at successive dendrogram cuts in relation to their territorial knowledge to agree on the optimal number of clusters. Further details are in Appendix 1.
To analyze which variables were the most informative for the classification, we ran boosted regression trees (BRT) on the cluster classification outputs. Boosted regression trees is a regression-classification technique from machine learning in which a model is trained to relate a response to its predictors by iterative binary splits, where each variable's relative contribution can be measured as the mean number of times it is selected for splitting the tree. To examine case-dependent fluctuations in the relevance of variables, this analysis was repeated several times with different classification targets: two multinomial analyses targeting differentiation of all clusters simultaneously in the SER and in the SELS classifications, and n specific binary analyses, one for each SELS, targeting the differentiation of that particular cluster from the rest as a whole (n = number of clusters in SELS). Further specifications on model parameters are in Appendix 1, Box A1.1.
Far from being homogeneous units, clusters involve some internal heterogeneity. To unravel variations in the clusters' representativity across their territorial extents, we evaluated the clusters' internal heterogeneity (by means of average dissimilarity) and generated a map that depicts core and marginal zones of clusters' representativity. We propose this metric as an indicator of spatial variations in classification uncertainty. The level of uncertainty for each hexagon was calculated by averaging the dissimilarity values between that hexagon and all the others within the same SELS cluster. Greater dissimilarity meant greater deviation of that hexagon from the average characteristics of the SELS cluster it belonged to. All analyses were performed in R version 3.6.1 (R Core Team 2019). For clustering, we used the "daisy" (distance calculation) and DIANA (clustering analysis) functions from the "cluster" package. For the boosted regression trees analyses, we used the "gbm" (multinomial models) function from the "gbm" package and the "gbm.step" (binary models) function from the "dismo" package (Hijmans et al. 2017).
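The per-hexagon uncertainty metric described above (the average dissimilarity between a hexagon and all other hexagons in its cluster) can be sketched in a few lines. This is an illustrative Python version, not the R code used in the study; the function name, the dissimilarity values, and the cluster labels are hypothetical.

```python
def within_cluster_uncertainty(dist, labels):
    """Average dissimilarity of each unit to the other units in its own cluster.

    dist is a symmetric dissimilarity matrix and labels gives the cluster
    of each unit. Units that are alone in their cluster get 0.0.
    """
    n = len(labels)
    uncertainty = []
    for i in range(n):
        peers = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if peers:
            uncertainty.append(sum(dist[i][j] for j in peers) / len(peers))
        else:
            uncertainty.append(0.0)
    return uncertainty

# Hypothetical dissimilarities between four hexagons and their cluster labels.
example_dist = [
    [0.0, 0.1, 0.4, 0.9],
    [0.1, 0.0, 0.3, 0.8],
    [0.4, 0.3, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]
example_labels = ["A", "A", "A", "B"]
uncertainty = within_cluster_uncertainty(example_dist, example_labels)
```

In this example the third hexagon has the highest value within cluster A, so it would be mapped as a marginal zone, whereas the second hexagon would sit closest to the cluster core.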

Social-ecological land systems (SELS) interpretation
To generate a sound interpretation of the resulting SELS, the authors of this publication were arranged into panels of four to seven regional specialists specific to each SELS. The panels thoroughly discussed the consistency between the SELS and their territorial knowledge, described the characteristics of that SELS, named it, and evaluated its alignment with the previously proposed conceptual SELS.

RESULTS
Our classification divided the continent into five larger-sized typologies of social-ecological regions (SER), which reflected main biomes and dominant land uses (Fig. 1A). Nested within these were 13 smaller-sized typologies of social-ecological land systems (SELS), each with distinctive characteristics representing more specific features of their territories (Fig. 1B). The SELS classification uncertainty was lower on the flat inner portion of the continent than on the coastal areas and nearby regions (including the Andes cordillera; Appendix 2). Regions with greater uncertainty included the eastern cordillera of the northern Andes, the eastern coast of Venezuela, the central portion of the Guayanas, and the northernmost and southernmost regions of the Brazilian coast.
Fig. 1. Spatial distribution of the typologies of social-ecological land systems in South America. A high-resolution map is available in Appendix 6.

Influence of variables on the social-ecological land systems (SELS) classification
The most relevant variables for characterizing the classes varied depending on the scale of analysis. The variables relevant for separating the 5 SER were a subset of those relevant for sorting the 13 SELS (Fig. 2), which indicated that the diversity of variables facilitated the specificity of the SELS classification. This was even more evident when looking at the most relevant variables for differentiating each of the individual SELS from the rest (Appendix 3, Table A3.1). Several variables that showed very little influence over the general SER classification were among the most informative variables for defining some of the individual SELS.
The five most relevant variables in defining the classification were shared by both SER and SELS levels: forest cover, percent of flat land, plant diversity, travel time to cities, and temperature, adding up to 70.60% (SER) and 65.58% (SELS) of the explained variance of the clusters' distribution (Fig. 2). Forest cover was dominant, representing approximately one-third of the explained variance, more than double that of the second-ranked variable in both classification levels. Differences arose between the sixth and tenth positions in the contribution of relative information: the SER model relied more on population and cattle density, whereas the SELS model relied more on cropland and the century of anthropization (Fig. 2). Except for the political dimension, all other 6 dimensions were represented within the 10 most relevant variables in both cases (i.e., SER and SELS models). However, there was a domain shift in dominance, with more environmental variables occupying the highest positions and more socioeconomic variables toward the middle range. On the other extreme of the relative importance ranking, 5 variables ranked 6th or lower in all 15 models examined (Appendix 3, Table A3.1): plantation cover, land cover diversity, World Bank indicator rule of law, urbanization type, and language density.

Typologies of social-ecological land systems in South America
We describe the five typologies at the SER level. Due to length concerns, the descriptions of the 13 SELS and the associated diagnostic plots are in Appendices 4 and 5, respectively.

SER A. Sparsely populated, southern cold lands
Includes both forested and non-forested ecosystems, which despite this key ecological difference (driven mainly by differences in moisture) share important social-ecological characteristics: (1) cold climate and associated slow biogeochemical cycles (reflected, for example, in the existence of peatlands, i.e., mallines and bofedales), (2) relatively low levels of biological diversity, but high levels of endemism associated with historical biogeography, (3) little potential for cultivation outside localized irrigated valleys, (4) low human population and very extensive unpopulated areas, (5) extensive minor livestock (i.e., sheep and goats) and cattle, often in decline, (6) widespread (although often underdeveloped) mining activities, most commonly associated with the energy industry (e.g., gas, oil, coal, lithium), (7) growing importance of tourism, and (8) extensive protected areas and ongoing processes of spontaneous rewilding of native fauna (e.g., guanacos in Patagonia, vicuñas in Puna and their associated predators). The temperate forest sectors are characterized by a very distinctive biota derived from Gondwanic lineages with high levels of endemism, partly threatened by the expansion of exotic invasive species (e.g., beaver, deer, pines, many ornamental plant species). This SER comprises four SELS, detailed in Appendix 4.

SER B. Arid and semi-arid highlands and adjacent coast, with a long history of agriculture and mining
Corresponds to the Central Andes of Peru, Bolivia, Chile, and Argentina, the Ecuadorian dry inter-Andean valleys, the dry Pacific coast of Peru and Chile, as well as the Mediterranean Andes. It is characterized by a rough geomorphology, wide altitudinal ranges, high climatic diversity (overall cool and dry), ancient settlement history, and relatively high population density (including some major cities). As a coastal area, it is largely influenced by the economics of overseas trade. Due to climatic conditions, agriculture is limited to irrigated areas in valleys and coasts or seasonally rainfed subsistence cultivation in the highlands. With high biological and cultural diversity, this SER ranks highest in crop diversity, but also in mining density. This SER comprises one SELS, detailed in Appendix 4.

SER C. Consolidated large-scale agropastoral plains
Corresponds to plains and low rolling terrains with mostly fertile soils dominated by productive landscapes mostly in Argentina, Uruguay, Brazil, Paraguay, and a separate block in Venezuela and Colombia, but it also includes smaller patches within the Amazon. This SER includes the largest and most productive areas of grain and meat production and exports of the continent, as well as some of the largest cities and the most developed infrastructure for transportation and export of commodities. Biodiversity fluctuates but it is medium in most of the region and there are few protected areas. The area includes natural ecoregions of open vegetation such as the Pampas and Campos grasslands, but also sectors of tropical and subtropical forests such as the Amazon, Chaco, and Espinal. Those forest-embedded sectors are represented by consolidated agricultural clusters commonly developed around middle-sized urban centers or major roads that facilitate their connection to the main cities and to the exporting outlets. A large fraction of the agricultural commodities exported by the continent originated in the area covered by this SER. This SER comprises two SELS, detailed in Appendix 4.

Ecology and Society 27(2): 27 https://www.ecologyandsociety.org/vol27/iss2/art27/

SER D. Historically populated tropical areas with low potential for mechanized agriculture
Includes the south-eastern region of Brazil, mountain regions of Colombia and Ecuador, and a narrow strip along the eastern slopes of the tropical Andes (both humid and dry). In general, the areas have been the basis of pre-Hispanic and early colonial settlements. Human population density continues to be high; however, these areas have become comparatively marginal agricultural lands because they have low capacity for expansion of modern mechanized agriculture due to steep slopes, limited accessibility, comparatively poor or degraded soils, sometimes suboptimal climatic conditions, and land tenure characterized by high fragmentation and small farm size. The region has high biological diversity and endemism. Mainly in association with steep topography, many areas are experiencing forest recovery. The SELS within this SER include a gradient of accessibility to ports, with SELS D2 (South-East Brazil) being the most connected and, in consequence, the most developed, with the largest cities. This SER comprises three SELS, detailed in Appendix 4.

SER E. Tropical forests with low anthropization
Includes the whole Amazon biome, extended to the south over Bolivia, western Paraguay and the north of Argentina. It corresponds to plains and hilly terrains dominated by natural forests with high biodiversity and a huge stock of biomass. It extends over warm and moist climates, with mostly poor and acidic soils, including a gradient of human transformations that encompasses relatively unmodified forests (SELS E3), transition zones with active deforestation frontiers (SELS E1), and areas with a high fraction of protected areas (SELS E2). A dynamic history of agricultural expansion over the plains and low rolling terrains of the continent suggests that in the future the contact between this SER and SER C will experience displacements and zones with characteristics of SER C may expand over areas currently classified as SER E. This SER comprises three SELS, detailed in Appendix 4.

DISCUSSION
Novel ways to use data and synthesis methods that improve our understanding of land systems are among the featured innovations needed to advance key thematic research areas in land system science (GLP 2016), especially by combining social and natural sciences, as well as quantitative and qualitative data. Our SELS approach improves the understanding of the characteristics, extent, and location of human-nature interactions operating at regional scales in South America, carved through centuries of human intervention on the environment. As such, our approach provides new insights into the Anthropocene as well as a transferable geographical framework that facilitates contextualizing and articulating research on land science.

Relevance of variables in defining the social-ecological land systems (SELS)
Both levels of classification (SER and SELS) relied on the same five key variables according to their explanatory power of SELS' patterns. A handful of variables concentrated most of the relative information for our classification, especially for the coarse SER typologies. However, it is at smaller/detailed scales that we see the real contribution of incorporating larger and more diversified sets of variables that highlight the individual characteristics that differentiate the SELS typologies. For example, the "century of anthropization" was key to differentiate the SELS within the SER D sorting the areas with longer history of use (SELS D1 and D3) from the most recently settled (SELS D2); "density of mine sites" was the second most relevant variable for SER B; "shrub cover" was the most relevant variable for SELS D3 and A4 (Appendix 3, Table A3.1; Appendix 5).
Several of the most relevant variables for the classification (e.g., forest cover, relief, plant diversity, temperature) corresponded to the environmental realm. Hence, our results suggest that biological and physical characteristics, similar to a biome/ecoregion classification scheme, continue to prevail regardless of human impact at that scale. This suggests that they can determine or place limits on the development possibilities of certain socioeconomic activities.
The single most relevant variable in defining the SELS distribution was "forest cover," which accounted for one-third of the explained variance. Such relevance is reasonable considering that forests occupy a large area of the continent with an uneven distribution (FAO and UNEP 2020), and that "forest cover" is a complex and synthetic variable. It summarizes the combination of physical variables such as altitude, precipitation, and temperature, but it also informs indirectly about historic and present anthropic land use. For example, where physical conditions are suitable for forests, their absence in certain areas sorts a physically homogeneous region into deforested vs. not converted forest.
The second most relevant variable was topographic relief, represented here as the "percentage of flat terrain," which not only ranked second in the general multinomial SER and SELS models but also ranked within the top 5 for 9 out of the 13 SELS (Appendix 3, Table A3.1). Topography is a main conceptual differentiator of current and potential land use in South America because it largely dictates the suitability for mechanized agriculture. In our analysis, the differentiation between mountains vs. rolling and flat plains was critical and probably underpins multiple biophysical and socioeconomic properties. The third explanatory variable was "plant diversity," which, like forest cover, summarizes major aspects of climatic conditions and resource availability. The fourth was "travel time to cities," the only socioeconomic variable within the top five in the relevance ranking. The presence of large cities encompasses two interlinked geographical properties. On one hand, they represent access to infrastructure utilities and economic opportunities, generating a sort of gravitational power over human activities (Lambin et al. 2001). On the other hand, most of the cities were strategically settled centuries ago to best serve colonial South America (i.e., warfare against Indigenous people and transporting goods to Europe) and the persistence of their location may have influenced the distribution of human land uses in the present. The fifth was "temperature," which is not a surprise considering the wide range of temperatures on the continent (mean air temperature from 6° to 24°C; Collins et al. 2009), varying mostly with latitude and altitude.
"Precipitation," which is often assumed to be the main organizing variable of biophysical diversity in the continent, showed up in the 14th place instead of standing out among the main physical determinants such as relief and temperature. However, it was in the top five for those SELS particularly related to dry climate (A1, A2, A3, and B).
Fig. 3. Conceptual SELS and spatial social-ecological regions (SER; left) and spatial SELS (right) along two gradients of population and terrain roughness. Circles represent the mean values and bars represent the first to third quantile range of the spatial SELS percent of flat land (y axis) and population density (x axis). Dashed lines depict the hypothetical distribution of the conceptual SELS along those axes. Acronyms refer to the conceptual SELS: SAHA = South American highlands and altiplano; CAL = coastal agricultural lands with long colonization history; DML = dry and mediterranean lands; SAL = South American lowlands: new agropastoral areas; SAPL = South American plateau lowlands: agropastoral historical areas; STFD = southern temperate forests and drylands.
"Cattle density" was an important human-related variable, even more than crop cover. Cattle are the main herbivores in the world, and their significance is disproportionately high in South America (Bar-On et al. 2018). Three of the five countries in the world with the largest ratios between cattle and people occur in the region (Argentina, Brazil, and Uruguay; FAO 2022). Cattle density serves both to characterize intensive production (e.g., intensive systems that compete with croplands in the Pampas or Cerrado) and to discriminate between non-agricultural regions, because extensive cattle production characterizes mesic ecosystems that are neither too dry (where sheep and goats dominate herbivory) nor too humid, such as the Amazon rainforest, where cattle do not occur outside deforested areas.
The political dimension had in general an intermediate to low influence in characterizing SELS, possibly due to the broad spatial resolution (i.e., country level) of the data. However, some political aspects were shown to be relevant for particular locations (e.g., regulatory quality was the 2nd variable to sort SELS A1). The low impact of "language density" on the SELS classification was, however, notable and contrary to expert expectations and literature findings. It is possible that our measurement unit (i.e., number of languages spoken within a 100 km buffer zone) was inadequate, although this is difficult to assess given the lack of guiding references from other publications. We encourage future work to further examine this concern and to look for alternative variables to reflect cultural diversity.
In the last decade, there was a clear evolution in land systems classifications to incorporate the complexity of the human-nature interactions. Compared to previous classifications, we delved into a holistic consideration of the social-ecological land systems. We further diversified the input variables achieving the representation of seven complementary dimensions of social-ecological systems: physical, biological, land cover, demographic, economic, political, and cultural. In addition, we prioritized the inclusion of attributes more pertinent to the continent such as mining and distance to ports. Our effort to explicitly incorporate deeper social aspects of human societies represents a clear step toward a qualitative leap in the field from mapping land use systems to mapping social-ecological systems. However, a series of limitations need to be addressed to fully achieve that goal, especially regarding data gaps and quality.

Alignment with the conceptual social-ecological land systems (SELS)
The SELS definitions produced by this study allowed for the refinement of the expert knowledge-based conceptual SELS described in . Some social-ecological regions had a high correspondence with the conceptual SELS (Fig. 3). These included: (1) the consolidated large-scale agropastoral plains (SER C), which corresponded with the conceptual SELS "South American plateau lowlands/agropastoral historical areas," and (2) the tropical forests with low anthropization (SER E), which corresponded with the conceptual SELS "South American lowlands/new agropastoral areas." In this last category, our study added more remote tropical lands, which were not addressed by  because of their primary focus on land-use change. Such high correspondence showed the importance of the historical occupation in shaping social-ecological characteristics of the South American lowlands.
We found only medium correspondence between spatial and conceptual SELS in the Andean and Patagonian regions. The arid and semi-arid highlands and adjacent coast, with a long history of agriculture and mining (SER B), covered the dry Central Andes and roughly fell within the conceptual SELS "South American highlands and altiplano." It differed, however, in including Mediterranean Chile and excluding the Northern Andes. Instead, the Northern Andes were included in the "historically populated tropical areas with low potential for mechanized agriculture" (SER D), which corresponded with the conceptual SELS "coastal agricultural lands with long colonization history" covering the Brazilian Atlantic forest and Caribbean and Pacific coastlines. Finally, the highest and coldest areas of the Central Andes fell within the "sparsely populated southern cold lands" (SER A), showing more affinity to the Patagonian Andes due to sparse population and cold climate. Apart from this inclusion, SER A highly corresponded to the "southern temperate forests and drylands" conceptual SELS.
Drylands were the most challenging areas in terms of correspondence in our analyses. The conceptual SELS "dry and mediterranean lands" appeared to be split into three different SER, namely (1) the Mediterranean Andes, which had more affinity with the Central Andes within SER B, (2) the Brazilian Caatinga, which corresponded to SER D representing historically used tropical areas, and (3) Western Argentina, which was assigned to the SER A also covering Patagonia. This showed the ambiguity of the category of drylands that had very different social-ecological configurations depending on the geographic location and settlement histories. This suggests that humans may interact very differently with drylands depending on both biophysical and socioeconomic factors at play.
Nevertheless, given the differences in the methodological approach, the similarity of the two classifications is remarkable. This is underlined by the disparity in our input variables. Although we made advancements in quantitative rigor, reproducibility, operability, and spatial explicitness, it is worth noting that the attributes mentioned by  were included in our analysis through approximate renderings and proxies, mainly because of limitations in data availability. The characterization of conceptual SELS by  was unobstructed by such data constraints and thus was more consistent with the authors' understanding of the systems. Furthermore, the role of trends in land-use change was central for the conceptual SELS, whereas in this study, we considered the current state only, leaving to future work the mapping of land changes and transitions.

Methodological considerations
Models are inherently simplifications of reality, and as such our maps do not reproduce precisely all the features of the territory to its full extent. Compromises of mapping complex systems are many and we discuss some of them in the following paragraphs. We highlight the hybrid methodology as a strength of this study. Interdisciplinary researchers' opinions contributed enormously by assessing the performance of the automated process, screening plausible data sources, and discussing the results in light of sound territorial knowledge.

Data constraints
The largest downside of data-driven approaches is that they are limited by the availability of adequate datasets. Often data availability and quality restrict the characterization of important aspects of the systems. In this section, we briefly summarize the main data gaps we faced in this study, which potentially could have enriched it, hoping they can be addressed in the future.
(1) Socio-environmental conflicts: the only dataset we found was by , who are developing a comprehensive spatial database, although currently based on self-reported cases instead of a systematic registry. A potential source worth exploring is data mining through Google searches.
(2) Natural ecosystems degradation: it modifies environmental processes and ecosystem services with varying impacts on sustainability Putz 2009, Garrett et al. 2019). Ecosystem degradation is a complex concept, partly value-driven and with extremely variable situations, moving in a continuum from pristine to fully transformed. The lack of consensus on its definition makes its assessment difficult.
(3) Governance: it influences land systems in a multi-level, partly hierarchical scheme. National level variables are often accessible, but they underestimate the importance of local formal and informal governance rules, which sometimes can be highly influential on land use , Rajão et al. 2020.
(4) Exports: much of South America's land use is aimed at net food exports (UN 2003). Having export data at a subnational resolution would represent a great improvement. Initiatives such as TRASE (SEI and Global Canopy 2022) can help fill this gap but they do not yet provide wall-to-wall datasets for all of South America.
(5) Cultural variables: This is probably the least represented dimension within SELS inputs. Some countries such as Bolivia, Brazil, and Colombia have good spatial records of Indigenous and/or traditional communities, but no unified dataset was found at the continental level. Other aspects of cultural diversity, reflecting community cohesion or preferred land use practices, would be valuable too. This would be a priority to better synthesize societies' land-related decision-making processes into land system science in connection with local governance.
(6) Land tenure (or farm size): it informs about the most likely farm management types, as well as the degree to which smallholders have access to land. The datasets we could find to represent this variable were either partial (not covering the whole continent; Graesser and Ramankutty 2017), had a country-level resolution, or were heterogeneous in their methodology (compendium of national statistics).

Fuzzy borders, spatial detail, and isolated pixels
We emphasize the importance of considering the classification uncertainty map (Appendix 2) to assist in the interpretation and application of the SELS map.
In our SELS map, observations are hexagons of 1385 km², which include a fair amount of heterogeneity summarized to a single value. A map may appear fuzzy due to classification artifacts or to properties of the landscape; this blurs the general appearance but may at the same time convey important information. Some spatially confined features, such as the presence of a city or a humid valley, may differentiate the classification of one hexagon from its surroundings, generating scattered patterns. Mountain regions or heterogeneous landscapes may also show a fuzzy classification. We decided to display our classification output without filtering out the isolated pixels because of the relevant information they often contain. At the other extreme, some regions appearing homogeneous in the map (e.g., Chile, Western Amazon) do not necessarily have uniform landscapes. Apparent homogeneity should rather be interpreted as having unique characteristics that make those hexagons more similar to each other than to the rest of the hexagons in the continent.

Temporal dynamics and social ecological land systems (SELS)
For this study we only considered static variables, prioritizing consistency of the model structure; however, trends and directions of change are very important characteristics of social-ecological land systems and can also be used to differentiate them. We encourage future studies to generate a SELS classification that incorporates land change regimes. In addition, changes could potentially modify the characteristics of regions enough to merit future revision of the typologies assigned in this study, as described for the SELS within the SER A and the SELS within the SER E (Appendix 4).

CONCLUSION
This study presents three major contributions: (1) it provides a comprehensive and reasonable characterization of the social-ecological land systems of South America (SELS), (2) it offers a spatial representation of the SELS in an easily operable and freely available format, and (3) its methodological approach bridges hurdles of social-ecological land classifications such as the combination of qualitative and quantitative data, and the blending of data-driven and expert knowledge-based perspectives.
The hybrid methodology represents a major strength of this study. The inclusion of a group of interdisciplinary experts was crucial to guide the data search and contrast the automated classifications with the territorial knowledge. In addition, it improved the utility of the resulting maps because of the increased coherence and relevance for the researchers' community and territorial planners.
The SELS classification is a reproducible, sound, and operative characterization of social-ecological land systems of South America that facilitates the incorporation of regional contexts for analyzing local realities in the Anthropocene. We envision the SELS map will provide an orientative geographical framework for analyzing observed patterns within a larger context and for designing system-specific solutions for sustainability.
Responses to this article can be read online at: https://www.ecologyandsociety.org/issues/responses.php/13066

Acknowledgments:

This study is part of Lucía Zarbá's PhD thesis supported by a scholarship from Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina. Partial support was provided by the grant PICT 2015-0521 from Fondo para la Investigación Científica y Tecnológica (FONCyT). An ESRI Travel Grant and a GLP Travel Grant supported the attendance of MPR and LZ at the GLP OSM 2019. We thank GLP for holding an in-situ meeting of the project, as well as the external attendees who participated in that meeting and enriched the discussion.

Data Availability:
The data/code that support the findings of this study are openly available in GitHub at https://github.com/luciazarba/SELS-SA

This appendix expands on the details, rationales, and performance evaluation of the methodology followed in this study. The sections are organized following the methodological steps listed in Fig. A1.1; however, the actual work involved numerous feedback loops and reiterations of previous steps, which are omitted for simplicity.

Fig. A1.1. (1) Variables: analyze and systematize the conceptual SELS descriptions in Table 1 of  defining a list of variables to use as inputs for the clustering. (2) Datasets: search and retrieve the spatial data to best represent the selected variables. (3) Clustering analysis: generate automated classifications through hierarchical cluster analysis. (4) Clustering results: analyze the clustering outputs and agree on the SELS representation according to the specialists group's territorial knowledge. (5) SELS descriptions: arrange in subgroups of regional specialists to discuss and describe each particular SELS. Arrows pointing backwards in relation to the numerical steps represent the feedback loops and local iterations of our process.

Variables
We used as a reference the biome-level SELS typologies described in Table 1 of  (hereafter conceptual SELS) to guide the variable selection process. Such descriptions were in a narrative form with no shared standard structure. We first exhaustively analyzed the conceptual SELS descriptions and listed all attributes mentioned for each of them. We then synthesized the list of attributes into general variables that represented the data we needed to acquire in order to capture those properties. The product of this process was our ideal input data list including 25 general variables (Table A1.1), from which we discarded and added variables through a heavily iterative process connected with step (2) Datasets.
On the one hand, we had to discard all general variables lacking a dataset that was adequate (representative proxy) and spatially continuous (covering the whole continental extent) with a coherent methodology. On the other hand, we discarded all variables that referred to trends, because combining measures of state and trajectories raised concerns among the authors about methodological and philosophical inconsistencies.
Finally, to visualize whether our data was balanced across different aspects of the social-ecological systems, we arranged the general variables within broader dimensions following the framework of . Compared to other popular frameworks, such as Ostrom's framework for analyzing sustainability of social-ecological systems (Ostrom 2009), which is ideal for addressing specific issues, Winkler's framework has a more general scope, which better fits the broad, multifaceted, continental-scale typologies of our study. We considered all Level III Winkler categories except Health, due to lack of data. We recognized underrepresented dimensions in our original list of 25 general variables (Table A1.1), such as the Physical dimension, mentioned in the conceptual SELS names yet not in their descriptions; the Political dimension, which was indirectly suggested but not explicitly addressed; or infrastructural aspects of the Economic dimension. To complement and balance the representation of all different dimensions we incorporated the following variables: Flat relief, Temperature, Precipitation, Irrigation, Cities travel time, Ports travel time, and Governance indicators.

Datasets
To be used in this study, all spatial datasets were required to cover the full extent of the South American continent (excluding islands) with a consistent methodology. In addition, we preferred those closer to the year 2010 and with a spatial resolution not coarser than our grid size (exceptions are the governance indicators, which are at the national scale, and plant diversity, at 110 km pixels). Country-level data, as well as biomes and ecoregions, were deliberately discarded because they imply an artificial homogenization of the territory within arbitrary boundaries, which may bias the spatial representation of the SELS toward resembling those boundaries. The group of authors decided to avoid using country-resolution data for all our variables except those representing political aspects.
We tested for correlations, considering a correlation coefficient of |0.75| (absolute value) as the maximum accepted correlation for two variables in the model (Fig. A1.2). We selected Spearman's rank correlation coefficient because it is non-parametric, assesses monotonic relationships, and poses less strict data requirements than Pearson's method (e.g., normal distribution or linear relationships).
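The screening rule above can be illustrated with a minimal sketch. It is written in plain Python for illustration only (the study's own scripts are in R), and the function names and toy values are hypothetical, not taken from the study's data. Spearman's coefficient is computed as Pearson's correlation on average ranks, and any pair whose absolute coefficient exceeds 0.75 is flagged.

```python
from itertools import combinations
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def flag_correlated(table, threshold=0.75):
    """List variable pairs whose |rho| exceeds the accepted maximum."""
    flagged = []
    for a, b in combinations(sorted(table), 2):
        rho = spearman(table[a], table[b])
        if abs(rho) > threshold:
            flagged.append((a, b, rho))
    return flagged

# Hypothetical toy values for five hexagons (not from the study).
toy = {
    "temperature": [5, 12, 18, 24, 27],
    "precipitation": [300, 2200, 600, 1500, 900],
    "temperature_squared": [25, 144, 324, 576, 729],
}
pairs = flag_correlated(toy)
```

In this toy table only the pair of monotonically related variables is flagged, which shows why a rank-based coefficient detects redundancy even when the relationship is not linear.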
The final list of input variables for our analyses consisted of 3 physical, 2 biological, 6 landscape, 7 economic (including infrastructure), 2 demographic, 4 political, and 2 cultural variables; 11 of these correspond to the biophysical domain and 15 to the socio-economic domain (Table 1). Most of the variables are non-normally distributed (Fig. A1.3); the implications of this for the results are addressed in the next section. Below we expand on the details of the calculation of hexagon values for all input variables.
Flat relief: Proportion of the hexagon covered by non-mountain classes in the Karagulle et al. (2017) landforms classification. In this classification there are four mountain classes: high mountains, scattered high mountains, low mountains, and scattered low mountains. We chose this variable because it performs better than others in recognizing mountainous terrains embedded in other terrain types (Sayre et al. 2018).
Temperature: Hexagon median of mean annual temperature based on the climate maps generated by ClimateSA. ClimateSA data averages the climatic conditions between 1981 and 2010.
Precipitation: Hexagon median of mean annual rainfall based on the climate maps generated by ClimateSA. ClimateSA data averages the climatic conditions between 1981 and 2010.
Plant diversity: Vascular plant species richness based on the  global patterns of vascular plant species richness calculated with the ordinary co-kriging method. We consider plant diversity a proxy of overall biodiversity because the diversity of different taxa such as mammals, birds, plants, reptiles, and amphibians was found to be correlated regardless of environmental conditions (Qian and Ricklefs 2008) and vegetation heterogeneity has been shown to be a strong predictor of species richness (Qian and Ricklefs 2008, Stein et al. 2014).
Protected areas: Percent of the hexagon covered by protected areas, considering all categories of protection in the World Database on Protected Areas by UNEP-WCMC and IUCN. The data was downloaded in May 2019 and contains no information to filter out protected areas created after our reference year (2010). Although this is not ideal, we consider the potential error acceptable for the purpose of this study.
Land cover: Percent of the hexagon covered by each of the considered classes (i.e. forest, shrublands, grasslands, crops and plantations) based on Graesser et al. (2015) annual land cover classification for South America. To represent our reference year we used the average land cover between 2009 and 2011.
Cover diversity: The land cover diversity of each hexagon was calculated as the Shannon diversity index of the area covered by each of the nine land cover classes included in Graesser et al. (2015). To represent our reference year we used the average land cover between 2009 and 2011.
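As an illustration of the index used for cover diversity, the following is a minimal Python sketch (the study's scripts are in R). It assumes the natural logarithm, which the text does not specify, and the function name and example areas are hypothetical.

```python
from math import log

def shannon_diversity(areas):
    """Shannon index H = -sum(p * ln p) over classes with non-zero area."""
    total = sum(areas)
    if total <= 0:
        return 0.0
    h = 0.0
    for area in areas:
        if area > 0:
            p = area / total  # share of the hexagon covered by this class
            h -= p * log(p)
    return h

# Hypothetical hexagons: one split evenly between two classes,
# one covered entirely by a single class.
mixed = shannon_diversity([700.0, 700.0, 0.0])
uniform = shannon_diversity([1400.0, 0.0, 0.0])
```

A hexagon dominated by one class scores zero, and the score grows as cover is spread more evenly across classes.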
Centrality: This variable is a proxy of the hexagon's share of the country's economy, indicating the economic relevance of a particular region to the country. It was calculated by distributing the national gross domestic product (GDP) over the country's territory following the relative distribution of nighttime lights (NTL). The value for each hexagon was calculated as national GDP * hexagon sum NTL / national sum NTL. Hexagons that overlap more than one country were assigned to the country covering the largest share of their area. National 2012 GDP data was obtained from the World Bank database, and the 2012 nighttime lights map from the NASA Earth Observatory.
Crop diversity: Shannon diversity of the area covered by all different crops in the hexagon based on the 175 crop types by Monfreda et al. (2008).
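The centrality formula stated above (national GDP * hexagon sum NTL / national sum NTL) can be sketched as follows. This is an illustrative Python fragment, not the study's R code; the function name, hexagon identifiers, and values are hypothetical, and returning zero for a country with no recorded lights is an assumption made here.

```python
def distribute_gdp(national_gdp, hexagon_ntl):
    """Allocate national GDP to hexagons in proportion to nighttime lights.

    hexagon_ntl maps a hexagon id to the sum of NTL inside that hexagon,
    for the hexagons assigned to one country.
    """
    national_ntl = sum(hexagon_ntl.values())
    if national_ntl == 0:
        # Assumption: with no lights recorded, no GDP is allocated.
        return {hex_id: 0.0 for hex_id in hexagon_ntl}
    return {
        hex_id: national_gdp * ntl / national_ntl
        for hex_id, ntl in hexagon_ntl.items()
    }

# Hypothetical country with three hexagons.
shares = distribute_gdp(1000.0, {"h1": 30.0, "h2": 10.0, "h3": 0.0})
```

By construction the hexagon values of a country add up to its national GDP, so the variable expresses relative economic weight within each country.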
Irrigation: Percent of the hexagon equipped for irrigation based on the layer "gmia_v5_aei_pct_cellarea" of the Global Map of Irrigation Areas (GMIA) by FAO AQUASTAT (Siebert et al. 2005).
Cities travel time: Mean of travel time in hours to the nearest city of 50,000 or more people (Nelson et al. 2008).
Ports travel time: Mean of travel time in hours to the nearest port. The map was produced for this study following the methodology of Weiss et al. (2018). The road network data was downloaded from the Global Accessibility Map project repository (https://forobs.jrc.ec.europa.eu/products/gam/). We considered all sea ports and inland ports on rivers included in the Río de la Plata and Amazonas basins. Port locations were obtained from Natural Earth (https://www.naturalearthdata.com/) and Ports.com, accessed in February 2018. The distance to ports map, together with a detailed explanation of its development (including input data and the reproducible script), is available for download at https://github.com/luciazarba/SELS-SA.
Population density: Mean environmental population by hexagon, based on the Landscan environmental population for the year 2012 (Bright et al. 2012).
Urbanization type: Category of the biggest city in a 100 km buffer zone. City categories were: rural (no cities within the buffer zone), small city (less than 100,000 inhabitants), medium city (less than 1,000,000 inhabitants), big city (less than 10,000,000 inhabitants), and metropolis (more than 10,000,000 inhabitants). City data was downloaded from the Global Accessibility Map project repository (https://forobs.jrc.ec.europa.eu/products/gam/).
Languages density: Number of different languages spoken within a 100 km buffer zone around each hexagon. The map of language distributions for South America was kindly provided by Mutur Zikin (Zikin 2007), and it was georeferenced and vectorized by the authors of this publication.

WBI governance indicators:
Anthropization century: The earliest century in which 30% of the hexagon was covered by anthropic land cover classes, based on the Ellis et al. (2010) classification, which consists of anthrome classification maps for each century from 1700 to 2000. We considered as anthropic all classes except for water, remote croplands, remote rangelands, remote woodlands, wild woodlands, and wild treeless and barren lands.
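The rule for the anthropization century can be sketched in a few lines of Python (illustrative only; the study used R). The sketch assumes the threshold is met when the anthropic share is at least 30%, and it returns None for hexagons that never reach it; both choices, like the function name and the example shares, are assumptions not stated in the text.

```python
def anthropization_century(anthropic_share_by_century, threshold=0.30):
    """Return the earliest century whose anthropic share meets the threshold.

    anthropic_share_by_century maps a century (e.g. 1700) to the fraction
    of the hexagon covered by anthropic classes in that century's map.
    """
    for century in sorted(anthropic_share_by_century):
        if anthropic_share_by_century[century] >= threshold:
            return century
    return None  # Assumption: threshold never reached in 1700-2000.

# Hypothetical hexagon that crosses the threshold in the 1900 map.
century = anthropization_century({1700: 0.05, 1800: 0.20, 1900: 0.45, 2000: 0.80})
```

Because the result is one of a few ordered categories, the variable is ordinal rather than continuous, which matters for the choice of distance metric.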

Clustering analysis
We analyzed 26 variables across a grid of 13,287 hexagonal cells (40 km side to side, area ~1,400 km²) covering the entire continent of South America in order to identify general typologies of social-ecological land systems (SELS). The process required calculating the statistical distances between all pairs of hexagons in the multidimensional space and arranging them into groups based on such distances. All calculations were performed in R statistical software (R Core Team 2019) and the scripts are available at https://github.com/luciazarba/SELS-SA.
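The stated cell size can be checked against the geometry of a regular hexagon. The short Python sketch below assumes that "40 km side to side" means the distance between opposite sides (flat to flat); under that assumption the area is (√3 / 2) times the squared width. The function name is hypothetical.

```python
from math import sqrt

def hexagon_area_from_width(flat_to_flat_km):
    """Area of a regular hexagon given the distance between opposite sides."""
    return sqrt(3) / 2 * flat_to_flat_km ** 2

# Width of the grid cells as stated in the text.
area_km2 = hexagon_area_from_width(40.0)
```

Under this reading the area comes out at roughly 1,386 km², which agrees with the approximate value of ~1,400 km² given here.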

Statistical distance
Two of our input variables were ordinal: urbanization type and anthropization century. This represented a major constraint because most distance calculation algorithms only accept continuous data. We followed the Gower distance method  since it is the recommended algorithm for mixed data (Kassambara 2017, Boehmke and Greenwell 2020). As calculated in R with the daisy function (cluster package, Maechler et al. 2019), the dissimilarity between two rows is computed as the weighted mean of the contributions of each variable. Contributions for numeric variables are defined as the absolute difference of both values, divided by the total range of that variable. For ordinal variables, the contribution calculation applies "standard scoring" (replacement of the variable's levels by their integer codes), similar to using their ranks but avoiding ties.
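The dissimilarity described above can be sketched in Python as a weighted mean of range-scaled absolute differences, with ordinal levels first replaced by integer codes. This is a simplified illustration of the description in the text, not a reimplementation of the daisy function; the function name, the equal default weights, and the example values are assumptions.

```python
def gower_distance(row_a, row_b, ranges, weights=None):
    """Weighted mean of per-variable contributions |a - b| / range."""
    n = len(row_a)
    weights = weights if weights is not None else [1.0] * n
    numerator = 0.0
    denominator = 0.0
    for a, b, value_range, w in zip(row_a, row_b, ranges, weights):
        if value_range == 0:
            continue  # a constant variable cannot separate observations
        numerator += w * abs(a - b) / value_range
        denominator += w
    return numerator / denominator if denominator else 0.0

# Ordinal levels replaced by integer codes ("standard scoring").
URBAN_LEVELS = ["rural", "small city", "medium city", "big city", "metropolis"]
codes = {level: i + 1 for i, level in enumerate(URBAN_LEVELS)}

# Hypothetical hexagons: (scaled population density, urbanization type).
hex_a = [0.25, codes["rural"]]
hex_b = [0.75, codes["medium city"]]
ranges = [1.0, float(len(URBAN_LEVELS) - 1)]  # numeric range, ordinal range

d = gower_distance(hex_a, hex_b, ranges)
```

Each variable contributes a value between 0 and 1, so no single variable can dominate the distance merely because of its measurement scale.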
Several of our input variables did not follow a normal distribution (Fig. A1.3). Although many data analysis algorithms require specific data distributions, the reference literature for the Gower  and DIANA (Kaufman and Rousseeuw 1990) algorithms does not mention particular requirements or considerations regarding data distributions. We found in more recent literature that the Gower distance algorithm is the appropriate metric when clustering non-normally distributed data (Kassambara 2017, Boehmke and Greenwell 2020) since it is less sensitive to outliers and non-normal distributions than other popular methods like Euclidean distances. Furthermore, searching through the gray literature we found a relevant statement in an online discussion of the applicability of normality tests for machine learning techniques. One user pointed out that they were not aware of any clustering method that assumes normality, and that cluster-structured data imply a multimodal (and thus non-normal) distribution (Cross Validated entry "How to Cluster with Nonnormal data" https://stats.stackexchange.com/questions/373404/how-to-cluster-with-nonnormal-data).
To account for potential issues with non-normally distributed data we deliberately used the Gower distance metric. Nevertheless, to mitigate the effect of data artifacts on the distance calculations we applied a logarithmic transformation to those variables that presented highly exponential distributions (Table 1), and min-max standardization to all variables (forcing them to range between 0 and 1) to avoid unequal impact of variables on the distance measures due to their different scales of values.
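The two preprocessing steps named above can be sketched as follows in Python (illustrative only; the study used R). The base-10 logarithm and the offset of 1 used to handle zeros are assumptions, since the text does not state them, and the function names and example values are hypothetical.

```python
from math import log10

def min_max(values):
    """Rescale values linearly so they range between 0 and 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant variable: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

def log_then_min_max(values, offset=1.0):
    """Log-transform a skewed variable, then apply min-max scaling.

    The offset (an assumption here) keeps zero values defined.
    """
    return min_max([log10(v + offset) for v in values])

# Hypothetical skewed variable, e.g. population density per hexagon.
raw = [0.0, 9.0, 99.0]
scaled_raw = min_max(raw)
scaled_log = log_then_min_max(raw)
```

Without the transformation the middle value sits close to zero after scaling; with it, the three values are evenly spaced, so differences among low-density hexagons are not swamped by a few extreme ones.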

Clustering Method
We decided a priori, based on conceptual adequacy, that the most appropriate clustering algorithm for the purpose of this study was Divisive Hierarchical Clustering (DIANA).
As defined in the software vignette (sensu stricto Maechler et al. 2019 page 33): "The DIANA algorithm constructs a hierarchy of clusterings, starting with one large cluster containing all n observations. Clusters are divided until each cluster contains only a single observation. At each stage, the cluster with the largest diameter is selected. The diameter of a cluster is the largest dissimilarity between any two of its observations. To divide the selected cluster, the algorithm first looks for its most disparate observation (i.e., which has the largest average dissimilarity to the other observations within the same cluster). This observation initiates the "splinter group". In subsequent steps, the algorithm reassigns observations that are closer to the "splinter group" than to the "old party". The result is a division of the selected cluster into two new clusters."
Most methods build their clusters starting from their terminal nodes (leaves), considering local patterns or proximate neighbors to make decisions. Instead, DIANA starts from the root of the tree, taking into consideration the overall distribution of the data points for the initial splits, gaining in accuracy and favoring the coherence of larger groups rather than the purity of smaller groups (Kassambara 2017, Dey 2019, Boehmke and Greenwell 2020). The first step of the algorithm addresses the division of the whole dataset into two subsets (and so forth in every iteration), which is computationally demanding for large datasets but allows the main structure of the data to be captured.
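The division step in the quoted description can be sketched in Python as follows. This is a simplified illustration of a single split (seeding the splinter group and reassigning observations), not the cluster package's implementation; the function names and the toy one-dimensional data are hypothetical, and the selection of the cluster with the largest diameter is left out.

```python
def mean_dissimilarity(i, group, d):
    """Average dissimilarity between observation i and the others in group."""
    others = [j for j in group if j != i]
    if not others:
        return 0.0
    return sum(d[i][j] for j in others) / len(others)

def diana_split(members, d):
    """Divide one cluster into an 'old party' and a 'splinter group'."""
    old_party = list(members)
    if len(old_party) < 2:
        return old_party, []
    # The most disparate observation initiates the splinter group.
    seed = max(old_party, key=lambda i: mean_dissimilarity(i, old_party, d))
    old_party.remove(seed)
    splinter = [seed]
    # Move observations that are closer to the splinter group than to
    # the remaining old party, one at a time, until none qualifies.
    while len(old_party) > 1:
        gains = {
            i: mean_dissimilarity(i, old_party, d)
            - mean_dissimilarity(i, splinter, d)
            for i in old_party
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        old_party.remove(best)
        splinter.append(best)
    return old_party, splinter

# Hypothetical one-dimensional observations with absolute differences
# as dissimilarities.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
d = [[abs(a - b) for b in points] for a in points]
old_party, splinter = diana_split(range(len(points)), d)
```

In the toy example the two distant observations end up in the splinter group and the three close ones remain in the old party; applying the same step repeatedly to the resulting clusters yields the full hierarchy.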
Since this study is not about sorting elements into distinct natural units that exist in the field but about classifying the landscape into general typologies of similarity along a multidimensional continuum, we consider DIANA to be the most appropriate approach. Nevertheless, for the sake of exploration and following the recommendations of an anonymous reviewer, we tested alternative clustering methodologies (Table A1.2) and compared them through a series of clustering stability and internal validation metrics (Table A1.3). The endeavor was not straightforward because many clustering algorithms were compatible with neither mixed data nor Gower distances; therefore, we had to make adaptations: the two ordinal variables in our dataset were converted to numeric (equidistant fractions of 1) and dissimilarities were calculated with the Manhattan method, one of the most popular methods capable of dealing with outliers and non-normal distributions (similar to Gower). The results do not show any of the methods to be definitely better than the others (Fig. A1.4); therefore, we found no reason not to use DIANA. Note that, due to the mentioned modifications, the results of this experiment are not comparable with the results of the other analyses in our study.

Table A1.2. Alternative clustering methods.
Agglomerative hierarchical clustering: each observation is initially considered as a cluster of its own (leaf). Then, the most similar clusters are successively merged until there is just one single big cluster (root).
K-means¹: partitions the points into k groups such that the sum of squares from points to the assigned cluster centres is minimized. At the minimum, all cluster centres are at the mean of their Voronoi sets (the set of data points which are nearest to the cluster centre).
PAM¹: based on the search for k representative objects or medoids among the observations of the data set, instead of using the mean, for partitioning a data set into k groups or clusters.
SOM²: type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional, discretized representation of the input space of the training samples, called a map, and is therefore a method for dimensionality reduction.
DIANA¹: the inverse of agglomerative clustering. It begins with the root, in which all objects are included in one cluster. Then the most heterogeneous clusters are successively divided until all observations are in their own cluster.

Table A1.3. Clustering stability and internal validation metrics.
Connectivity²: reflects the extent to which items that are placed in the same cluster are also considered their nearest neighbors in the data space, that is, the degree of connectedness of the clusters. It should be minimized.
Dunn index²: represents the ratio of the smallest distance between observations not in the same cluster to the largest intra-cluster distance. The numerator should be maximized and the denominator minimized; therefore, the index should be maximized.
Silhouette width²: defines compactness based on the pairwise distances between all elements in the cluster, and separation based on pairwise distances between all points in the cluster and all points in the closest other cluster. Values as close to (+) 1 as possible are more desirable.
avg. within³: average distance within clusters.

Figure A1.4. Clustering methods (Table A1.2) are compared in terms of stability and internal validation metrics (boxes, Table A1.3) along a gradient of number of clusters (K). Note the distance calculation algorithm for these analyses was Manhattan distance. Disclaimer: due to the mentioned modifications, the results of this experiment are not comparable with the results of the other analyses in our study.
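As an illustration of one of the internal validation metrics in Table A1.3, the Dunn index can be sketched in Python directly from its definition: the smallest between-cluster dissimilarity divided by the largest within-cluster dissimilarity. The function name and toy data are hypothetical, and this is not the implementation used in the study.

```python
def dunn_index(d, labels):
    """Smallest between-cluster distance over largest within-cluster distance."""
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    between = [d[i][j] for i, j in pairs if labels[i] != labels[j]]
    within = [d[i][j] for i, j in pairs if labels[i] == labels[j]]
    if not between or not within or max(within) == 0:
        raise ValueError("Dunn index needs at least two clusters and "
                         "a non-zero within-cluster distance")
    return min(between) / max(within)

# Hypothetical one-dimensional observations in two well-separated clusters.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
d = [[abs(a - b) for b in points] for a in points]
good = dunn_index(d, [0, 0, 0, 1, 1])
poor = dunn_index(d, [0, 0, 1, 1, 1])
```

The partition that follows the visible gap in the data scores higher than the one that cuts through a compact group, which is the behavior the metric is meant to reward.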

Clustering results
In this section we describe how we analyzed the results of the DIANA analysis and agreed on a clustering output as the best SELS representation according to the specialists group's territorial knowledge. This included the decision on the number of clusters and its map layout, examination of the spatial representativity of the SELS across their territory, and evaluation of the relative contribution of each input variable to the classification.

Number of clusters
The output of DIANA is a dendrogram of hierarchical clusters. To decide at which height to cut the dendrogram, we considered quantitative validation metrics (Figure A1.5) and analyzed the resulting spatial layout and cluster statistics at successive dendrogram cuts in relation to our territorial knowledge to agree on the optimal number of clusters. We disregarded clustering outputs with fewer than 5 or more than 16 clusters because we considered them uninformative or too complex, respectively, for the purpose of this study. As shown in Figure A1.5, alternative validation metrics did not converge on one unique "optimal number of clusters"; therefore, the decision was based mostly on expert knowledge. After analyzing the output maps and variable statistics, the authors agreed that the map depicting thirteen clusters was the most adequate representation of smaller-size SELS for the purpose of this study, and we found no evidence in the quantitative validation metrics to contradict that decision.

Figure A1.5. Identification of optimal number of clusters. Representation of three internal validation metrics performance: average silhouette width, average within distance, and Dunn index (y axis) along the gradient of number of clusters (x axis).

Input variables' relative contributions
To measure the relative contribution of the input variables we used Boosted Regression Trees. Regression trees are a regression/classification technique from machine learning in which a model is trained to relate a response to its predictors by recursive binary splits. In boosted regression trees (BRT), model accuracy is improved by repeating the regression tree algorithm and adjusting the parameters in each iteration, similar to the "functional gradient descent" concept. BRTs have very few restrictions, can handle different types of variables with no need for data transformation or outlier elimination, and can fit complex non-linear relationships. Through BRTs we can estimate the relative contribution of each input variable to the classification, measured as the number of times a variable is selected for splitting the tree, weighted by the model improvement resulting from that split, and averaged across all trees.
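As an illustration of the relative contribution measure described in this paragraph, the following Python sketch aggregates split records from a small, hypothetical ensemble of trees. It is not the study's code, which relied on R packages that compute this internally. The variable names, the improvement values, and the function `relative_contribution` are invented for the example, and using the raw improvement as the weight is an assumption of this sketch.

```python
from collections import defaultdict


def relative_contribution(trees, variables):
    """Return each variable's contribution as a percentage (values sum to 100).

    `trees` is a list of trees; each tree is a list of (variable, improvement)
    pairs, one pair per split. A variable gains weight every time it is
    selected for a split, in proportion to the improvement of that split.
    """
    totals = defaultdict(float)
    for splits in trees:
        for variable, improvement in splits:
            totals[variable] += improvement
    # Average the accumulated improvement across all trees.
    mean_per_tree = {v: totals[v] / len(trees) for v in variables}
    grand_total = sum(mean_per_tree.values())
    # Rescale so that contributions are expressed as percentages.
    return {v: 100.0 * mean_per_tree[v] / grand_total for v in variables}


# Hypothetical ensemble of three small trees: (split variable, improvement).
trees = [
    [("temperature", 4.0), ("population", 1.0)],
    [("temperature", 3.0), ("forest_cover", 2.0)],
    [("temperature", 2.0), ("population", 3.0)],
]
variables = ["temperature", "population", "forest_cover", "crop_diversity"]

contrib = relative_contribution(trees, variables)
for name in sorted(contrib, key=contrib.get, reverse=True):
    print(f"{name}: {contrib[name]:.2f} %")
```

A variable that is never selected for a split, such as `crop_diversity` in this toy ensemble, receives a contribution of zero.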
We fitted 15 different BRT models in total, seeking to unravel the relative contribution of each variable in defining different target clusters: one multinomial model for the 13 SELS simultaneously, one multinomial model for the 5 SER simultaneously, and individual binary models for each of the 13 SELS classes. Calculations were performed in R with the gbm function (gbm package) for the multinomial models and the gbm.step function (dismo package, Hijmans et al. 2017) for the binomial models. Model parameters are shown in Box A1.1. To evaluate how well the BRT models fit in each case, we monitored the evolution of the holdout deviance along the iterations (Figure A1.6).

The level of uncertainty for a hexagon belonging to a SELS was calculated as the average dissimilarity between that hexagon and the rest of the hexagons within the same SELS. Greater dissimilarity means greater deviation of that hexagon with respect to the average SELS characteristics. These figures help distinguish areas where the classification was more consistent from those where it was less representative due to the heterogeneity of the territory. We labeled the 20% of hexagons with the lowest uncertainty (blue), the 20% with the highest uncertainty (red), and the remaining 60% as medium uncertainty (green).

Model SER: classification into the five SER; Model SELS: classification into the 13 SELS; Models A1 to E3 each distinguish one particular SELS from all the others taken together. While the SER and SELS models compare several classes simultaneously, the others are binary models focused on one SELS at a time. Hyphens indicate less than 0.01% relative importance.

The region has a very low population density, associated with limitations for human agency due to a harsh climate, which explains why it is separated from the northern Altiplano. Traditional land uses (i.e., extensive sheep raising and marginal agriculture) are experiencing dis-intensification. Tourism, in contrast, is on the rise, with less conventional forms prevailing (e.g., ecotourism, cultural tourism). Mining has strong potential, with both ongoing active expansion (e.g., lithium salt brines in the Puna) and socio-environmental conflicts resulting from advanced planned projects (e.g., gold mines in Patagonia). Protected areas are extensive and widespread, with comparatively few conflicts, but in some areas invasive species are expanding their range. Wildlife is generally in good condition and often recovering. Most of the SELS occurs along the Chilean-Argentine border, which implies some associated social dynamics (e.g., government investments associated with infrastructure, military and bureaucratic jobs, and relatively mild international conflicts during the 20th century).

SELS A2. Remote cold ecotonal extra-tropical Andes
It includes areas bordering SELS A1 to the east, in the ecotone with places at lower elevation, both in the Puna highlands and in the southern temperate forests (area= 32 million hectares). Temperature and human population are low, but higher than in SELS A1, and rainfall is never as high as in SELS A1. Protected areas are common, and there are relatively small but prosperous urban centers, often associated with tourism and small-scale intensive agriculture. With a more mesic environment than SELS A1, vegetation alternates among grasslands, shrublands, and woody forest patches. Fire is a relatively common component of ecological functioning and of human-environment relationships (Veblen et al. 1999). This region has a high travel time to ports, hence qualifying as "remote".

SELS A3. Low-diversity cold and temperate grassy rangelands
Dominated by a shrub-grass steppe, with low plant diversity, low forest cover, and medium shrub cover (area= 73 million hectares). The northern part corresponds to the "Monte-Arid Chaco-Espinal" and the southern part corresponds to the "Patagonian Steppe". The climate of this region is arid and semi-arid, and cold or seasonally cold, reaching freezing temperatures throughout the region. Human population is also low and concentrated in humid valleys with irrigated agriculture. The rural inhabitants depend mainly on grazing livestock, such as sheep and goats. Overgrazing has led these systems to show signs of desertification, which intensifies the low productivity of the region (Jobbágy and Sala 2000), including the decline of low-cover palatable species and an increase in the relative cover of unpalatable grass species (Perelman et al. 1997).

SELS A4. Low-diversity low-populated shrubby rangelands
Located at the interface between the "Monte" and the "Patagonian steppe" (area= 29 million hectares). It includes the well-developed irrigated valleys of northern Patagonia (the Negro and Colorado rivers, with important production of fruits such as apples and grapes) and the surrounding drylands. It is characterized by a high coverage of shrubs, very low population density, and extensive livestock grazing (Pol et al. 2005). There is also oil exploitation and irrigated production in the valleys, favored by good access to ports (e.g., San Antonio Oeste, Madryn). The dominant vegetation is grasses, shrubs, and small scattered trees (Cabrera 1976). In addition to its biogeographical core of Southern Monte shrublands, this SELS seems to capture shrub-encroached areas elsewhere, such as in the Chaco plains. This SELS has higher shrub cover and cattle density than SELS A3. The differences between the two SELS are useful to highlight the dynamism of the system in the Monte-Espinal transition, as overgrazing or frequent fires can transform a portion of SELS A3 into SELS A4.

SELS B1. Arid and semi-arid highlands and adjacent coast, with long history of agriculture and mining
This is the only SELS integrating the homonymous SER B, which spans the Southern and Central Andes (area= 126 million hectares). It has a cool and overall dry climate, which limits agriculture to irrigated areas in valleys and coastal areas and to seasonal rainfed cultivation in higher lands. This SELS ranks highest in crop diversity due to its rough geomorphology and high climatic diversity, but also to its ancient settlement history (before 1700) and relatively high population density (including some large cities such as Lima and Santiago). Therefore, it represents a hotspot of agro-biodiversity linked to both biological and cultural diversity (Mathez-Stiefel et al. 2012, Sietz and Feola 2016). The combination of urbanization, subsistence agriculture, seasonal rainfall, and rough topography makes the highland areas very sensitive to climate change and to land use change (Ochoa-Tocachi et al. 2016, Tito et al. 2018). The narrow semi-arid Pacific coast is characterized by export-oriented, irrigated agriculture and concentrates most of the economic and political power, especially in Peru. The overall short travel time to ports implies a strong influence of overseas trade on regional processes. It ranks highest in mining density, highlighting its social-ecological impact, including the ongoing and potential conflicts between extractive activities, traditional and commercial agriculture, environmental conservation, and tourism (Tovar et al. 2013, Pérez-Rincón et al. 2019).

SER C. Consolidated large scale agropastoral plains

SELS C1. Urbanized large scale agricultural plains
Covers the grasslands, savannas and shrublands of Uruguay, central-eastern Argentina, and eastern Paraguay, in addition to patches in southern Brazil, Bolivia, Colombia and Venezuela (area= 196 million hectares). This SELS is defined by the presence of agriculture in flat sedimentary landscapes. It is dominated by highly productive rainfed agriculture (e.g., wheat, maize, soybean, sunflower) and outstanding cattle production, but also includes places with irrigated crops. There is internal heterogeneity, with gradients of anthropization levels and different histories of agricultural expansion. In general, these are densely populated areas, with the presence of large farmers and a concentration of economic power. Land use and land cover changes have transformed the landscape structure and dramatically altered the original vegetation cover (Baldi and Paruelo 2008, Gasparri and Grau 2009, Vallejos et al. 2015). These changes have a major impact on the provision of ecosystem services, and have also generated asymmetries among stakeholders in the use of and access to natural resources.

(Nanni et al. 2019, Aide et al. 2019). Grass, for cattle grazing, is the dominant cover, followed by trees and crops. Important crops in the region include coffee and cacao (Rueda and Lambin 2013), and irrigation helps to support a high diversity of Andean crops. Legal and illegal coca plantations are an important feature.

SELS D2. Intensive, market-connected hilly agropastoral systems with long colonization history
It is predominantly located in hilly to partly mountainous terrain, but with excellent access to larger markets and economic hubs (area= 105 million hectares). These areas have a long history of early colonial occupation and experienced several periods of political instability (Dean 1997, Joly et al. 2014). Land use is diverse and heterogeneous, yet agricultural systems are characteristic of this SELS, dominated by high-intensity cattle husbandry and croplands. The grassland cover is often composed of planted pasturelands. Croplands are of relatively low diversity (i.e., monocultures) and include annuals (e.g., soybean, maize), perennials (e.g., coffee, orange, eucalyptus), and semi-perennials (e.g., sugar cane). Population density is among the highest on the continent, with many communities living in medium-sized cities, but also in metropolitan regions such as São Paulo and Rio de Janeiro. The main contiguous area is dominated by fragmented tropical rainforest corresponding to the biome "Mata Atlântica" (i.e., Atlantic Forest) in Brazil (Ribeiro et al. 2009). To the northwest of this main area, still within Brazil, the climate is drier and vegetation transitions to the "Cerrado." The northernmost regions encompass parts of the Colombian and Venezuelan Llanos.

SELS D3. Highly populated and biodiverse historical semi-arid areas
Corresponds mostly to the Caatinga and some parts of the Cerrado in Eastern Brazil and includes dry valleys in the eastern slopes of the tropical-Andean valleys of Peru and Bolivia (area= 99 million hectares). It also includes the Santa Catarina area in Southern Brazil and some portions of Central Colombia and NW Ecuador. These latter areas do not seem to fit this description (being rather humid), which is consistent with the high classification uncertainty associated with some of these regions. Their inclusion in this SELS appears to be a result of high deforestation and a long history of landscape transformation. The SELS is characterized by a semi-arid climate with very high temperatures and a rough topography, and is covered by dry forests and shrublands. This historical settlement area (before 1700) still maintains densely populated areas and has high levels of both plant and crop diversity, including irrigated agriculture.

SER E. Tropical forests with low anthropization

SELS E1. South American lowlands: new agropastoral frontiers
This SELS corresponds to agricultural frontier regions located in the flat warm lowlands of South America, and includes biomes such as the Amazon, Cerrado and Chaco (area= 183 million hectares). While this SELS is dominated by forested landscapes, some areas include naturally open ecosystems (e.g., the Bolivian Llanos de Moxos, or the Humid Chaco ecoregions). Although the landscape is mainly dominated by natural vegetation, many areas have been subject to active land use changes throughout the past five decades (e.g., the colonization of the Brazilian states of Pará and Mato Grosso and of the Argentinian East Chaco began around the 1970s along with highway construction, indicating that many of these settlements have been long established and are no longer "active" frontiers). Accessibility levels are intermediate, and while the population is predominantly rural, small and medium cities are growing in importance as the service economy develops, especially in association with agricultural production. Conflicts around land use are common, involving clashes between existing populations, landless people, and new settlers, and between agribusiness and subsistence agriculture (Caldas et al. 2010, Aldrich et al. 2020). These conflicts are related to vast inequities in land distribution and associated production opportunities, and to an overall pressure on natural resources for the production of global commodities (Simmons et al. 2010). Thus, this is a highly dynamic SELS where some regions may currently be in transition, with an unstable equilibrium of natural landscapes affected by different land use practices (e.g., extensive cattle ranching, commodity crop production, fires) and climate change (Silvério et al. 2013, Nobre et al. 2016).

SELS E2. Remote and mountainous tropical lands
This SELS is mainly located in the very humid foothills and lower montane areas of the Amazon and Orinoco basins (area= 117 million hectares). It also includes the Guiana highlands of Venezuela, Guyana, Suriname, and Brazil, the eastern slope of the Andes (upper Amazon in Ecuador, Peru and Bolivia), as well as a few other scattered forested highlands in Argentina, Bolivia, Brazil and Colombia. The SELS is mainly characterized by high levels of forest cover or natural vegetation; however, it also includes some agricultural land uses such as coffee and cacao plantations. It is characterized by rugged, mountainous geomorphology and low levels of accessibility. It has the largest proportion of protected area, which includes high-profile conservation areas reflecting the importance of this SELS for biodiversity and related ecosystem services. The management of many of these lands is tied to national systems of protected areas as well as to widespread and vast indigenous territories (Achtenberg 2013, Rodriguez 2017). This SELS should not be mistaken for intact "wilderness". Instead, it exemplifies a social-ecological system where many traditional communities co-exist with conservation, tourism, forestry, and other extractive activities. These extractive activities are also relevant for SELS E3.

SELS E3. Tropical forests with low anthropogenic conversion
This SELS mainly covers the most isolated regions of the Amazon basin, plus other highly forested regions, such as the deciduous forests of northern Argentina, northern Paraguay and eastern Bolivia (area= 437 million hectares). The spatial extent of this SELS overlaps with old-growth forests or forests minimally disturbed by post-Columbian populations (Tyukavina et al. 2016, Potapov et al. 2017). Environmental characteristics include vast, relatively flat areas that are often flooded, high temperatures, high precipitation, and high forest cover. Human settlements tend to be small and sparsely distributed along rivers, with low levels of accessibility by road. Overall, this SELS has fewer anthropogenic pressures on the environment, but also lower levels of monitoring, enforcement, and governance. While some regions of this SELS do include small-scale subsistence agriculture, other land uses related to extractive activities exist, yet are difficult to detect with current remote sensing technologies, as they do not necessarily coincide with extensive land cover changes. Some of these extractive activities might include forest degradation, forest fires and burned areas, defaunation processes catalyzed by rural and indigenous communities that practice hunting or poaching (
We propose this metric as an indicator of the spatial variation of uncertainty in the classification. The level of uncertainty for each hexagon was calculated as the average of the dissimilarity values between that hexagon and all other hexagons belonging to the same SELS. Greater dissimilarity indicates a greater deviation of that hexagon from the average characteristics of the SELS to which it belongs.
Some regions with higher uncertainty included: the eastern slope of the northern portion of the Andes, the eastern coast of Venezuela, the central portion of the Guianas, and the regions at the northern and southern ends of the Brazilian coast.
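The uncertainty metric can be illustrated with a short Python sketch that computes, for each hexagon, the average dissimilarity to the other hexagons assigned to the same SELS, and then labels the 20% lowest and 20% highest values. This is a minimal sketch under stated assumptions and not the study's implementation: the toy feature values, the function names, and the use of Manhattan distance as the dissimilarity measure are chosen only for the example.

```python
def manhattan(a, b):
    """Manhattan distance, used here as a stand-in dissimilarity measure."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hexagon_uncertainty(features, sels_labels):
    """Average dissimilarity between each hexagon and the others in its SELS."""
    result = []
    for i, lab in enumerate(sels_labels):
        peers = [j for j, other in enumerate(sels_labels) if other == lab and j != i]
        if not peers:  # a hexagon alone in its class has no peers to compare with
            result.append(0.0)
            continue
        total = sum(manhattan(features[i], features[j]) for j in peers)
        result.append(total / len(peers))
    return result


def label_uncertainty(values, low_share=0.2, high_share=0.2):
    """Label the lowest and highest shares of values; the rest are 'medium'."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    n_low, n_high = int(n * low_share), int(n * high_share)
    labels = ["medium"] * n
    for i in order[:n_low]:
        labels[i] = "low"
    for i in order[n - n_high:]:
        labels[i] = "high"
    return labels


# Hypothetical toy data: ten hexagons, one feature each, in two SELS.
features = [(0.0,), (0.1,), (0.2,), (0.3,), (2.0,),
            (5.0,), (5.1,), (5.2,), (5.3,), (9.0,)]
sels_labels = ["A"] * 5 + ["B"] * 5

uncertainty = hexagon_uncertainty(features, sels_labels)
classes = label_uncertainty(uncertainty)
for value, cls in zip(uncertainty, classes):
    print(f"{value:.3f} {cls}")
```

In this toy example the two hexagons that lie far from the rest of their class (feature values 2.0 and 9.0) receive the "high" label, mirroring how heterogeneous territories show up as high-uncertainty areas on the map.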

Influence of the variables on the SELS classification
The most relevant variables for characterizing the classes varied depending on the scale of analysis.
This SER includes the largest and most productive areas of grain and meat production and export on the continent, as well as some of the largest cities and the infrastructure for commodity transport and export. Biodiversity varies but is intermediate across most of the region,

Data limitations
The main disadvantage of data-driven approaches is that they are constrained by the limited availability of suitable datasets. Data availability and quality frequently restrict the characterization of important aspects of the systems. In this section we highlight and briefly discuss the main information gaps we encountered while carrying out this work, gaps whose resolution could potentially have enriched it, in the hope that they can be resolved in the future.