Birds were fitted with transmitters (Hunan Global Messenger Technology Company, China) programmed to record GPS position and speed every 1-3 hours depending on the battery condition. Transmitters were solar powered to enable the global system for mobile communication (GSM) to transmit data via the short message service (SMS). These back-pack design transmitters were 55x36x26 mm in size and weighed 22g (appr. 1.6% of the bird’s body mass; Lei et al . 2019a). As Mobile network coverage is sparse or non-existent in summering sites of North-East Russia, the stored data obtained from that area were downloaded when birds returned to China.
GPS records of locations (accuracy of <1000 m) were used in the analysis of A. erythropus journeys to Russia. For non-breeding A. erythropus (the longest one-way migration recorded was 16,172 km in 60 days, Lei et al . 2019b), it was assumed the spring migration turned to summering activities when the trans-latitudinal movement became mostly trans-longitudinal. Like spring migration, we assumed summering was terminated when a pronounced southbound movement was detected. For breeding birds, the date of arrival at a breeding site was used to indicate the start of summering. The site was classified as staging if the bird stayed at a location for more than four days.

Environmental predictors

To model the potential summering habitat, a range of environmental variables were used including bioclimatic, geomorphological, land production and human disturbance.
Bioclimatic Bioclimatic variables were taken from the 30 second WorldClim (v2.1) climate data, downloaded from http://www.worldclim.org, which were generated through interpolation of monthly mean temperature and rainfall data from weather stations for the period of 1970-2000 (Hijmans et al ., 2005). We selected five variables that are relevant to geese summering including Max Temperature of Warmest Month (Bio5), Mean Temperature of Wettest Quarter (Bio8), Mean Temperature of Warmest Quarter (Bio10), Precipitation of Wettest Month (Bio13) and Precipitation of Warmest Quarter (Bio18).
GeomorphologicalTopographic heterogeneity is important for species distribution (Austin and Van Niel 2011). Three topographic variables were included in the modelling, namely elevation, LDFG (Local Deviation from Global Mean) and TRI (terrain ruggedness index). The global 1 km resolution digital elevation model (DEM) for the study area was downloaded from (http://srtm.csi.cgiar.org/) and cropped with the study. Based on the DEM, LDFG and TRI were calculated as:
\(LDFG=y_{i}-\overset{\overline{}}{y}\) (1)
where \(\overset{\overline{}}{y}\) is mean evaluation of the 3 by 3 window, and \(y_{i}\) is the elevation of the focus grid. Positive LDFG values represent locations that are higher than the average of their surroundings, as defined by the neighborhood (ridges). Negative LDFG values represent locations that are lower than their surroundings (valleys). LDFG values near zero are either flat areas (where the slope is near zero) or areas of constant slope (where the slope of the point is significantly greater than zero).
\(TRI={(\sum{(Z_{c}-Z_{i})}^{2})}^{1/2}\ \) (2)
where Zc is the elevation of the central grid andZi is the elevation of one of the eight neighboring grids. The terrain ruggedness index (TRI) is a topographic measurement developed by Riley, et al . (1999) to quantify topographic irregularities in a region.
As A. erythropus is ecologically dependent on wetlands, and often observed breeding along river valleys (Solovieva et al . 2011), we included a layer of distance to steams in the modelling. We generated the raster using polylines in the Global River Widths from Landsat (GRWL) dataset (George, 2018) as the central lines. The polylines were checked to be a good represent of the rivers in the study area.
Land production To characterize land production, we calculated three variables (EVImax, EVIhom and EVIrange) using EVI (Enhanced Vegetation Index) time series (2000-2009). The 10-day global EVI images with 333 × 333 m resolution were downloaded from Copernicus Global Land Service (https://land.copernicus.eu/global/products/ndvi, data downloaded on 28 August 2019). EVImax is an indicator of peak land productivity and was calculated as the 10-year mean of annual max EVI. EVIrange is the range of land productivity (i.e. EVImax -EVImin). EVIhom is the similarity of EVI between adjacent eight pixels, and was computed as (Tuanmu and Jetz 2015):
\(\text{EVI}_{\hom}=\sum_{i,j=1}^{m}\frac{P_{i,\ j}}{1+{(i-j)}^{2}}\)(3)
where m is the number of all possible scaled EVI values (i.e. 100) and\(P_{i,\ j}\) is the probability that two adjacent pixels have scaled EVI values of i and j , respectively. Both EVIhom and EVIrange can be indicator of habitat diversity.
Human Disturbance Human disturbance can lead to declines and local extinctions of avian species as well as habitat loss (Vollstädt et al . 2017). The inclusion of human disturbance data can increase the performance and accuracy of SDM (species distribution model - Stevens and Conway, 2020). We compiled a database of all human settlements including villages and towns in the study area (i.e. Republic of Sakha, Magadanskaya Oblast and Chukotskiy Autonomous Okrug) and generated a layer of distance to settlements as a proxy of human disturbance. Settlements with zero registered inhabitants (abandoned and closed before 2011) were excluded.
Land Cover Forcey et al . (2011) found that land use has strong effects on waterbird distribution, and the percentage of waterbird abundance is positively related to the area of wetland. In this study, we used the 2015 global land cover map derived from satellite observations by Land Cover Climate Change Initiative (CCI) and available fromhttps://maps.elie.ucl.ac.be/CCI/viewer/download.php . The map classifies the global terrestrial system into 28 major classes using United Nations Food and Agriculture Organization’s land cover classification system (Di Gregorio 2005).
R (R Core Team, 2019) packages “raster” (Hijmans et al . 2015) and spatialEco (Evans and Ram, 2018) were used for raster manipulation and calculation.

Modeling

A total of 96 geo-referenced records were compiled by combining the tracking data and historical surveys (post 1999) (Table S2 in Supplementary). To analyze the potential breeding range, maximum entropy implemented in the Maxent package (version 3.4.1) was used. Maxent is among the most robust and accurate SDM techniques (Elith et al., 2006). In the past two decades, it has gained popularity in conservation studies, partly because the technique is less sensitive to the number of recorded sites and uses presence-only data (Elith et al., 2011). In developing the SDM, the program was set to take 75% of the occurrence records randomly for model training and the remaining 25% for model testing. The mean area under the receiver operating characteristic curve (AUC) was used to evaluate model performance, and AUC values > 0.75 are considered as suitable for conservation planning (Lobo et al., 2008). The modelling process was replicated 100 times and we reported the mean as summering ranges to reduce the sampling bias (Merow et al ., 2013).
Although collinearity is less of a problem for machine learning methods in comparison with statistical methods (Elith et al ., 2011), minimizing correlation among predictors prior to model building is recommended (Merow et al ., 2013). We used VIF (Variance inflation factor) to select predictors (Dupuis and Victoria-Feser, 2013). Nine variables with VIF less than 10, including two bioclimatic variables (Bio10 and Bio18), two topographic variables (DEM and LDFG), two productivity variables (EVIhom and EVIrange), land cover, Distance to stream, and Distance to settlement, were included in model building.
Using the logistic outputs of MaxEnt, we applied the minimum training presence threshold (MTP) to produce binary habitat map. MTP threshold finds the lowest predicted suitability value for an occurrence point and ensures that all occurrence points fall within the area of the resulting binary model (Elith et al ., 2011).