Novel Method for Correcting Observer-Specific Bias in Species Distribution Models

Author: Petr Balej

The prediction accuracy of Species Distribution Models (SDMs) is often compromised by sampling bias in species occurrence records. This issue is widespread in collaborative databases such as the worldwide GBIF and iNaturalist, and the local Czech Species Occurrence Database. These platforms are filled with opportunistic species records of varying quality collected by both expert and inexperienced observers. To address this limitation, we have developed a new method called the Presence-Weighted Observer-Oriented Approach (PW-OOA). Our research shows that it mitigates this sampling bias more effectively than existing methods.

How does the PW-OOA method work?

The primary innovation of the PW-OOA method is its ability to correct for species-specific bias by modeling the heterogeneous intensity of individual observers' sampling effort. The process involves three key steps:

1) Observer-Specific KDE: A unique kernel density raster is estimated for each individual observer based on all of their submitted records. This creates a spatial "signature" of each person's sampling patterns.

2) Presence-Weighting: For a given focal species, each observer is assigned a weight proportional to their relative contribution to that species' total presence records. Observers who document the focal species more frequently receive a higher weight.

3) Weighted Summation: The individual observer density rasters are multiplied by their corresponding weights and then summed to create a single, composite bias raster for the model.
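The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the record layout (observer, species, x, y), the function names, the Gaussian kernel, the single shared bandwidth, and the scaling of each observer raster to sum to 1 are all choices made here for clarity.

```python
import math
from collections import Counter, defaultdict


def kde_raster(points, grid_x, grid_y, bandwidth):
    """Gaussian kernel density of `points` on a regular grid, scaled to sum to 1.

    Scaling each raster to sum to 1 is an assumption of this sketch.
    """
    raster = [
        [
            sum(
                math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2 * bandwidth ** 2))
                for px, py in points
            )
            for gx in grid_x
        ]
        for gy in grid_y
    ]
    total = sum(sum(row) for row in raster)
    return [[value / total for value in row] for row in raster]


def pw_ooa_bias_raster(records, focal_species, grid_x, grid_y, bandwidth):
    """Build a composite bias raster from (observer, species, x, y) records."""
    # Step 1 input: all records of each observer, regardless of species.
    by_observer = defaultdict(list)
    for observer, _species, x, y in records:
        by_observer[observer].append((x, y))

    # Step 2: weight = observer's share of the focal species' presence records.
    focal_counts = Counter(obs for obs, sp, _, _ in records if sp == focal_species)
    total_focal = sum(focal_counts.values())
    if total_focal == 0:
        raise ValueError("no presence records for the focal species")
    weights = {obs: focal_counts[obs] / total_focal for obs in by_observer}

    # Step 3: weighted sum of the per-observer density rasters.
    bias = [[0.0 for _ in grid_x] for _ in grid_y]
    for observer, points in by_observer.items():
        weight = weights[observer]
        if weight == 0:
            continue  # observer never recorded the focal species
        density = kde_raster(points, grid_x, grid_y, bandwidth)  # Step 1
        for r, row in enumerate(density):
            for c, value in enumerate(row):
                bias[r][c] += weight * value
    return bias, weights


# Toy records, invented purely for illustration.
records = [
    ("anna", "robin", 1, 1), ("anna", "robin", 1, 2), ("anna", "tit", 2, 1),
    ("ben", "robin", 8, 8), ("ben", "tit", 8, 7),
    ("ben", "tit", 7, 8), ("ben", "tit", 9, 9),
]
grid = list(range(11))
bias, weights = pw_ooa_bias_raster(records, "robin", grid, grid, bandwidth=1.5)
```

In this toy case the hypothetical observer "anna" contributes two of the three robin records, so her sampling footprint counts twice as much as "ben's" in the composite raster, even though "ben" submitted more records overall.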

This figure illustrates the principle of weight assignment using kernel density estimation (KDE) for a sample of two observers. Note the difference (highlighted in yellow) that results from using a different grid size. This change is reflected in both the final sum (Σ) and the lower intensity in the area circled in red. The lower intensity arises because observer 1 (x) occupies fewer squares at this grid size.

What data and methods did the study build on?

Our study used a dataset of 108 bird species from the Czech Republic. To characterize the environmental niche, we used key bioclimatic variables and vegetation indices. Specifically, the study incorporated NDVI (Normalized Difference Vegetation Index) and EVI (Enhanced Vegetation Index) derived from MODIS satellite imagery for April, May, and June to capture critical seasonal variations in vegetation relevant to avian breeding habitats.
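For readers unfamiliar with the two vegetation indices, the sketch below shows their standard definitions from surface reflectance. The article does not say whether the indices were computed from reflectance bands or taken from a ready-made MODIS product, so this is only an illustration of what the indices measure. The function names and the example reflectance values are made up; the EVI coefficients are the ones commonly used for MODIS.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced Vegetation Index with the coefficients commonly used for MODIS."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy_l)


# Illustrative reflectances for a densely vegetated pixel (invented values).
example_ndvi = ndvi(nir=0.5, red=0.1)
example_evi = evi(nir=0.5, red=0.1, blue=0.05)
```

Both indices rise with green biomass. EVI adds a blue-band correction and a soil adjustment term, which reduces saturation over dense canopies.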

Our work builds upon established methods for addressing observation bias. The first is the Target-Group Occurrences Background (TGOB), which generates background points from the locations of all species in the target group (e.g., all birds). The more advanced TGOB+ approach enhances this by applying kernel density estimation (KDE), which generates a continuous "bias raster" representing the intensity of sampling effort. This smoothing is controlled by a bandwidth parameter. Our PW-OOA method is the next logical step: it applies the KDE approach to each observer individually and incorporates the contribution of each specific observer to the total number of records of each species.
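The role of the bandwidth parameter can be seen in a small Python sketch of a TGOB+-style bias raster. It rests on several assumptions that the text above does not specify: a Gaussian kernel, a raster scaled to sum to 1, and background points drawn with probability proportional to the raster value. All function names and coordinates are hypothetical.

```python
import math
import random


def bias_raster(points, grid_x, grid_y, bandwidth):
    """Gaussian KDE of all target-group records on a grid, scaled to sum to 1."""
    raster = [
        [
            sum(
                math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2 * bandwidth ** 2))
                for px, py in points
            )
            for gx in grid_x
        ]
        for gy in grid_y
    ]
    total = sum(sum(row) for row in raster)
    return [[value / total for value in row] for row in raster]


def sample_background(raster, grid_x, grid_y, n, seed=0):
    """Draw n background cells with probability proportional to the raster."""
    rng = random.Random(seed)
    cells = [(gx, gy) for gy in grid_y for gx in grid_x]  # row-major order
    cell_weights = [value for row in raster for value in row]  # same order
    return rng.choices(cells, weights=cell_weights, k=n)


# Toy target-group records (all species pooled), invented for illustration.
target_group = [(2, 2), (2, 3), (3, 2), (7, 7)]
grid = list(range(10))

narrow = bias_raster(target_group, grid, grid, bandwidth=0.5)  # sharp peaks
wide = bias_raster(target_group, grid, grid, bandwidth=3.0)    # heavy smoothing
background = sample_background(narrow, grid, grid, n=200, seed=42)
```

A small bandwidth keeps the estimated sampling effort tightly around the recorded locations, while a large one spreads it across the study area. Tuning this value is what distinguishes TGOB+ from plain TGOB in the description above.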

Comparison of SDM prediction performance for 108 bird species differing in prevalence, measured as AUC improvement over the random method (AUC (diff) = AUC of STSP, TGOB, TGOB+, or PW-OOA minus AUC of the random method). Abbreviations: STSP, spatial thinning of species presences; TGOB, Target-Group Occurrences Background; TGOB+, TGOB tuned by adjusting kernel smoothing bandwidths; PW-OOA, Presence-Weighted Observer-Oriented Approach.
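The AUC (diff) measure used in the comparison can be illustrated with a short Python sketch. It computes AUC as the probability that a presence site receives a higher predicted suitability than an absence site, with ties counted as one half, and then subtracts the AUC of the random method. The function name and all scores are invented for illustration and are not results from the study.

```python
def auc(presence_scores, absence_scores):
    """AUC as the share of presence/absence pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Made-up predictions at independent validation sites for one species.
corrected_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])  # bias-corrected model
random_auc = auc([0.6, 0.5, 0.4], [0.7, 0.3, 0.5])     # random method

auc_diff = corrected_auc - random_auc  # positive means the correction helped
```

A positive difference means the bias-corrected model ranks validation sites better than the random method for that species, and a negative one means the correction made predictions worse.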

Conclusions

When independent validation data is available, the PW-OOA method is the optimal choice for bias correction. In its absence, the tuned TGOB+ method provides the most reliable and robust alternative. This research underscores the importance of considering individual observer behavior in SDMs and provides a powerful new tool to enhance the accuracy of biodiversity models that rely on unstructured opportunistic data.

For more details, please visit: https://doi.org/10.1002/ecog.08202