2.1 Statistical methods
The occurrence of the fish and lamprey species in relation to the environmental variables was modelled using binary logistic regression (BLR) analysis. In the preprocessing phase, highly (>0.7) multicollinear predictors (latitude and precipitation) were removed from the BLR analysis. The final number of environmental variables (predictors) accepted for BLR analyses was therefore nine (Table 1). To avoid pseudoreplication, only one randomly selected electrofishing sample per site was included (N=487). Rare species, present in less than 3% of the sites, were excluded from the analysis, resulting in 13 species for the modelling (Table 2). The statistical significance of each predictor was assessed by a chi-square test, with p-value <0.05 indicating a significant impact. To assess the fit of the models to our data, Nagelkerke (pseudo) R2 was calculated for each model. Also Hosmer-Lemeshow goodness-of-fit test (Hosmer & Lemeshow, 1989) was used with p-value >0.05 indicating an acceptable model fit. Accuracy of the BLR model was calculated as the percentage (%) of the studied sites where the presence or absence of a fish species was predicted correctly. BLR analyses were conducted by IBM SPSS Statistics 26.
The interactions between 13 species occurrences and 9 environmental variables were further studied using a self-organizing map (SOM, Kohonen, 1982, 2001). In contrast to BLR, all species were processed in a single model. In general, SOM is an unsupervised dimensionality reduction method that visualizes high-dimensional data in a low-dimensional map. In ecology, SOM has been extensively implemented for information extraction, visualization, and clustering of community data (Chon, 2011). Compared to some conventional statistical methods (e.g. PCA, NMDS) used for community ordination, SOM has performed well, for example, by allowing the visualization of interspecific association even if it differs in different parts of the data space (Giraudel & Lek, 2001). In addition, the network tolerates noise (Vesanto, Himberg, Siponen, & Simula, 1998) by allowing outlying samples to affect only one map unit and its neighborhood. The other areas of the map are not affected by these data (Kaski, 1997). In this study, unsupervised SOM was used to patternize 22 predictors (13 species + 9 environmental variables) and 487 samples with a two-dimensional map which were then grouped, i.e. clustered. This two-stage procedure, first using SOM to produce the prototypes that are then clustered in the second stage, has been found to perform well compared with direct clustering of the data (Vesanto & Alhoniemi, 2000). The two dimensions of SOM were clustered using the k-means algorithm (Kohonen, 2014). The Davies Bouldin validity index (Davies & Bouldin 1979), which measures between- and intra-cluster distances, was used as a performance criterion. In the parameter optimization, SOM net sizes (number of nodes in x and y dimensions) and the number of clusters in parameter k were altered, using a grid search until the minimum of the Davies Bouldin index was found, using the elbow criterion. In parameter optimization, the SOM net size roughly followed the map size rule (of thumb) of Vesanto & Alhoniemi (2000; N(nodes) = 5 x sqrt(Nrows)). Each trial SOM consisted of 10,000 training rounds. In the preprocessing phase, the occurrence of each fish species was dummy (zero or one, absence or presence) coded. All predictors were then normalized with a zeroed mean and variance of one. The learning rate function was inverse of time, which ensures that all samples have an approximately equal influence on the results. The statistical analyses were performed using RapidMiner software (version Studio Large 9.7.000., https://rapidminer.com /, Mierswa, Wurst, Klinkenberg, Scholz, & Euler, 2006).
Results
As anticipated for the small catchment areas of this study, there was high variation among sites in the catchment land-cover variables (Table 1). The average catchment size and altitude at the sites occupied by each of the fish species varied considerably. To illustrate this, the positioning of three species in the catchment size – altitude space suggests that three-spined stickleback occupied small low-altitude brooks, whereas brook trout, Salvelinus fontinalis (Mitchill 1814), dwelled in tributaries, and grayling, Thymallus thymallusL., in larger streams (Figure 1).
The BLR models were statistically significant (ꭓ2 (9)= 35.4 – 154.7, p<0.005) for all fish species, with the exception of brook lamprey, Lampetra planeri (Bloch), (ꭓ2 (9) = 10.1, p=0.340). The highest Nagelkerke R2 values were recorded for the two stickleback species (Table 2). In Hosmer-Lemeshow goodness-of-fit tests, the p-values were >0.08 for all the models, indicating an acceptable model fit for the data. The accuracy of the models was usually high, ranging from 96.9% with ninespine stickleback to 62.8% with brown trout. The absence of fish species was predicted by the models much more correctly than presence, as indicated by specificity (average 94.6%, SD 13.6%) vs. sensitivity (average 28.1%, SD 21.1%) (Table 2).
The statistical performance of the best SOM model including all species was good (explained variance 55.94%, Davies Bouldin index 0.71). The net size (11 x 11 = 121 nodes) of the best-performing SOM model roughly followed the map size rule of thumb (110 nodes). The smallest Davies Bouldin index was attained with 4 clusters (Figure 2). Cluster 2 at the top right of the SOM was occupied by bullhead, burbot, Lota lota(L.), grayling, and minnow, Phoxinus phoxinus (L.), and characterized by a large catchment area, high altitude, and low mean temperature (Figure 2, Table 3). Cluster 1 at the top left of the SOM was occupied by perch, Perca fluviatilis L., roach, Rutilus rutilus (L.), and northern pike, Esox Lucius L., and characterized by a high water temperature at sampling, high annual mean temperature, large catchment area, and low altitude (Figure 2, Table 3). Cluster 0 at the bottom left of the SOM was occupied by the two stickleback species, and characterized by a low altitude, high annual mean temperature, low water temperature at sampling, and high percentage of urban areas in the catchment. Brook trout was present in sites clustered at the bottom right of the SOM, indicating preference for cold high-altitude tributaries. Brown trout and stone loach, Barbatula barbatula (L.), seemed to occupy two clusters simultaneously, whereas the occurrence of brook lamprey could not be linked with any of the studied environmental variables (Table 3, Figure 2).
The ranking of species in the mean air temperature gradient revealed the two stickleback species favoured a warm environment, whereas minnow appeared to be the ultimate cold-water species (Figure 3).
Discussion
Modelling fish species occurrence in small boreal streams with a logistic regression and self-organizing map indicated clear species-environment relationships. The obtained species clusters and their associations with mainly map-based variables appeared ecologically reasonable and largely concordant with species groupings in the current bioassessment developed for larger boreal streams (Vehanen, Sutela, & Korhonen, 2010). The results support the development of fish-based bioassessment for small streams and help in predicting fish assemblage changes in a warming climate.
The effect of small-scale local factors on controlling the occurrence of lotic fish species has been found in numerous studies (e.g. Watson & Hillman, 1997; Lamouroux, Capra, Pouilly, & Souchon, 1999; Wang et al., 2003). However, the dominance of large-scale regional factors affecting riverine fish assemblages has also been documented (e.g. Koel & Peterka, 2003; DeRolph et al., 2015; Mitsuo, 2017). A wide variety of hypotheses or theories has been put forward concerning the balance of local and regional factors affecting riverine fish assemblages. It has been hypothesized that large-scale processes determine the pool of the fish species available to occur, whereas small-scale processes eventually define the subset of fish species inhabiting a given site (Pont, Hugueny, & Oberdorff, 2005). Although local habitat conditions may be important determinants of fish abundance, they may be of limited importance in determining presence and absence (Porter et al., 2000). Sensitivity to local- and regional-scale processes has been found species-specific (Pont et al., 2005). It was suggested that local factors were most important to fish in minimally impaired watersheds, but the effects of landscape-scale factors become increasingly important as watersheds are increasingly modified by human activities (Wang et al., 2003). However, a combination of local and regional variables has often managed to explain a great deal of the variance in riverine fish occurrence or density (e.g. Ripley, Scrimgeour, & Boyce, 2005; Pont et al., 2005; Park, Grenouillet, Esperance, & Lek, 2006). Obviously, both local and regional variables have an effect, and the inclusion of local variables in our models would probably have enhanced the predictive power. However, our results encourage the use of map-based (regional) variables in modeling the species-environment relationships in small streams, especially when confronting limited resources to control site-specific local variables.
The sensitivity of the BLR model was rather poor, at least compared to specificity (Table 2). The relatively small size of the electrofishing area and the use of a single-run electrofishing sampling in this study may have decreased the probability of getting all the fish species in the catch. The information generated by single-visit surveys of fish occurrence cannot account for intra-annual or interannual variation in the upstream extent of fish distribution (Fransen et al., 2006). Small streams are vulnerable to drought events inducing temporal variation in fish assemblages (Grossman, Ratajczak, Crawford, & Freeman, 1998; Keaton, Haney, & Andersen, 2005). Our model’s prediction of species occurrence and absence may be of use in extending the current fish-based bioassessment (Vehanen et al., 2010) to small brooks. For management and inventory purposes, we recommend the application of larger data and cross-validation in BLR.
The SOM clusters of fish species and environmental variables appeared plausible. Cluster 0 was occupied by two stickleback species that seemed to favour warm regions, low altitude, and the high share of urban areas in the upper catchment. Sticklebacks have been considered to indicate degradation in lowland brooks (Fieseler & Wolter, 2006). Freshwater fish communities have been found sensitive to watershed urbanization (Chen & Olden, 2020).
The occurrence of perch, roach, and northern pike (Cluster 1 in SOM) was associated with a high annual mean temperature, a relatively large catchment area, low altitude, and lakes in the upper catchment. These three fish species are common lake species (Maitland & Campbell, 1992) possibly spreading to small streams at warm water periods (Degerman & Sers, 1994; Sutela, Vehanen, Huusko, & Mäki-Petäys, 2017). This trait was supported by the frequent occurrence of these species with high temperature at sampling (Table 3). The occurrence of bullhead, burbot, grayling, and minnow was associated with a relatively large catchment area, a high altitude, a low mean temperature, and open mires in the catchment (Cluster 2). The fish species in this cluster can be characterized as cold-water species (Logez, Bady, & Pont, 2012) living in forested peatland regions. The only fish species centering cluster 3, brook trout, favoured cold and small high-altitude streams. Brook trout is an alien invader species in Europe, having been stocked in many Finnish tributary streams. Brook trout also prefers small tributary streams in its home district in North America (Kanno, Letcher, Rosner, & O´Neil, 2015). Alien brook trout has been found to exclude brown trout in small Finnish brooks (Korsu, Huusko, & Muotka, 2007).
The appearance of the most frequently encountered fish species, brown trout, was centered in clusters 0 and 3 with avoidance of ditched peatland in the upper catchment. The drainage ditching of peatland for forestry causes the erosion and deposition of fine sediments in headwater streams, accompanied by nutrient loading (Marttila & Kløve, 2010; Nieminen et al. 2018). Deposited sediment can diminish salmonid embryo survival by decreasing redd gravel permeability, interstitial water exchange, and therefore oxygen supply (Greig, Sear, & Carling, 2007; Louhi, Ovaska, Mäki-Petäys, Erkinaro, & Muotka, 2011; Michel et al., 2014). These impacts may have suppressed the occurrence of brown trout in catchments with a high coverage of ditched peatland in this study.
Climate change scenarios forecast a high increase in the mean air temperature for the European boreal ecoregion (Schneider, Laize, Acreman, & Flörke, 2013). Fish species have evolved to fit distinct thermal niches where they can optimize physiological, reproductive, and ecological performance (Coutant, 1987; Graham & Harrod, 2009). Temperature is one of the key abiotic factors affecting fish species distribution (Matthews, 1998). Globally, fish species living in small headwater streams are especially vulnerable to climate change (Buisson, Thuillier, Lek, Lim, & Grenouillet, 2008; Buisson & Grenouillet, 2009). The presented ranking of the fish species along the mean air temperature gradient can help in predicting the effects of a warming climate on fish assemblages in the studied region. The breadth of the thermal range largely delineates the ability of fish species to adapt to climate change (Buisson & Grenouillet, 2009; Logez et al., 2012). In this study, minnow expressed a relatively narrow thermal range at the cold end of the gradient (Figure 3), suggesting high vulnerability to the warming climate in this region. The thermal ranges of some fish species (e.g. brown trout, perch, and northern pike) may vary, depending on the size of the catchment area and stream power (Logez et al., 2012). Accordingly, the inclusion of large rivers in the analyses could result in a different outcome for fish species ordination along the mean temperature gradient. These findings suggest that local stream characteristics should be taken into account when predicting the effects of climate change. Besides the increase in the mean air and river water temperature in the European boreal ecoregion, future winter discharges are likely to increase from the natural flow regime, while summer flows will be less impacted (Schneider et al., 2013). The discharge aspect, although probably of minor importance in the boreal region, should also be taken into account when predicting the effects of a warming climate on boreal riverine fish assemblages.
The assessment of the ecological status or integrity of surface waters has been widely established around the world (Karr & Chu, 2000; Xu et al., 2014; Poikane et al., 2020). In Europe, the legislation to achieve a good ecological status in surface waters is guided by the WFD (European Commission 2000). Bioassessment methods in rivers have been developed using three biological groups: periphytic diatoms, benthic invertebrates, and fish fauna. Stream biota is often impaired by multiple pressures interacting in additive, synergistic, or antagonistic ways (Schinegger, Trautwein, Melcher, & Schmutz, 2012). Diagnostic tools for distinguishing the impacts of different pressures have been called for to target the diminishing measures in water pollution control (Lemm, Feld, & Birk, 2019; Poikane et al., 2020). In this study, map-derived pressures of agriculture (fields), urban land cover, and drainage ditching for forestry seemed to affect the occurrence of certain fish species. These results encourage for the development of diagnostic fish-based pressure-specific metrics for small boreal streams.
A simple diagnostic tool (index) for evaluating direct effects of climate change could be calculated as an average of two metrics, the proportion of cold-water species (climate change intolerants, scaled to 0-1), and the proportion of warm-water species (climate change tolerants, scaled to 0-1, inverse values) of an electrofishing sample. Referring to Figure 3, in our case the cold-water species could be minnow, grayling and brook trout, and the warm-water species three-spined stickleback, ninespine stickleback, and stone loach. For a wider use of this index, temperature preferences could be achieved like in this study, or by using existing knowledge and references about temperature preferences of fish species, such as Logez et al. (2012). Possible indirect effects of climate change stemming from flushing of nutrients (Wilby, Orr, Hefger, Forrow, & Blackmore, 2006), for instance, could be integrated to the index following the basics presented in Hering, Feld, Moog, & Ofenböck (2006).
In the fish-based integrity indices developed in bioassessment for boreal and northern temperate zone, cool- or cold-water fish species are often classified as intolerant species (Kanno, Vokoun, & Beauchene, 2010; Vehanen et al. 2010). This feature is also seen in the Figure 3, where seven species from the left indicating favour of cold water can be classified as intolerant (grayling, brook trout, bullhead, brook lamprey, and brown trout) or intermediately tolerant (minnow and burbot) referring to Holzer (2008). Respectively, the six species on the right-hand side indicating favour of warm water can be classified as tolerant (perch, roach, ninespine stickleback, and three-spined stickleback) or intermediately tolerant (northern pike and stone loach). The classification to tolerant and intolerant species by Holzer is highly compatible with those used in fish integrity indices, such as Pont et al. (2006), Hughes, Howlin, & Kaufman (2004), Vile & Henning (2018), and Vehanen et al. (2010). Observed pattern in the sequence of species in the temperature gradient in relation to tolerant–intolerant division of species suggest that the integrity indices developed for regions inhabited by these species should respond to the water temperature rise in streams on itself, without the possible influence via indirect effects such as altered discharge regime and flushing of extra nutrients (Wilby et al., 2006). In other words, the effect of climate change strictly as warming of the streams should be (by chance) at least to some extent inborn in many of the present fish indices. As an example, cool-water vs. warm-water species balance obviously affects the FiFI index values, which can be easily approximated or calculated based on the metrics by Vehanen et al. (2010). At any rate, when aiming to integrate the effect of global warming to fish indices, the effect of warming on the reference sites should be controlled by referring to the earliest reliable electrofishing data or other historical fish data. This somewhat different approach from the more adaptable attitude to the direct effects of climate change in WFD (Nõges, Van de Bund, Cardoso, & Heiskanen, 2007; Kristensen, Whalley, Néry, Zal, & Christiansen, 2018) could be considered also with other biological quality elements.
Acknowledgments
We utilized data collected in the Life IP project FRESHABIT (LIFE Programme of the European Union) in this study. The study reflects the views of the authors, and neither the European Commission nor the EASME is responsible for any use that may be made of the information it contains. We thank Hanna Hentilä, Minna Kuoppala, and Kati Martinmäki-Aulaskari from SYKE and Auli Immonen from Luke for catchment delineation and other GIS work.