Improving Canopy Height Models in Temperate Regions: The Key to Accurate Forest Monitoring

Author: Vojtěch Barták

Canopy height is one of the most important characteristics of forest structure. It is closely linked to stored biomass, forest productivity, and biodiversity. As we face increasing pressure from climate change and environmental degradation, consistent global monitoring of forest canopy height is crucial for understanding the state of our ecosystems and making informed decisions about their management and conservation.

In this blog post, we discuss our recent preprint submitted to Science of Remote Sensing, which examines methods for improving canopy height models in temperate regions using satellite laser altimetry combined with machine learning.

The Challenge of Global Forest Monitoring

Currently, satellite laser altimeters are the primary technology for mapping canopy height at large scales. These instruments, mounted on satellites orbiting Earth, send laser pulses down to the surface and measure the time it takes for the light to return, allowing scientists to calculate the height of vegetation. However, their use has significant limitations – they provide only sampled data that are sparse and discontinuous, covering just a small fraction of the Earth’s surface at any given time.
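To make the time-of-flight idea concrete, here is a minimal Python sketch. It is a deliberate simplification: real instruments record a full returned signal rather than two discrete times, and the function names and the 400 km sensor altitude are illustrative assumptions, not part of any mission software.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact by definition of the metre


def range_from_round_trip(seconds: float) -> float:
    """One-way distance in metres travelled by a pulse with this round-trip time."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / 2.0


def canopy_height(t_canopy_s: float, t_ground_s: float) -> float:
    """Height of the canopy top above the ground, from the two return times."""
    return range_from_round_trip(t_ground_s) - range_from_round_trip(t_canopy_s)


# Assumed example: sensor 400 km above the ground, canopy top 30 m above the ground.
t_ground = 2 * 400_000.0 / SPEED_OF_LIGHT_M_PER_S
t_canopy = 2 * (400_000.0 - 30.0) / SPEED_OF_LIGHT_M_PER_S

print(canopy_height(t_canopy, t_ground))   # canopy height in metres
print((t_ground - t_canopy) * 1e9)         # delay between the two returns in ns
```

In this toy case the two returns are only about 200 nanoseconds apart, which gives a sense of the timing precision such instruments need.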

The Global Ecosystem Dynamics Investigation (GEDI) mission, launched to the International Space Station in 2018, has been collecting unprecedented amounts of high-quality vegetation height data. GEDI uses a full-waveform lidar system that can penetrate forest canopies and provide detailed information about vertical structure. However, like all spaceborne lidar missions, GEDI samples only along narrow tracks, leaving vast gaps between measurements.

To address this limitation, researchers have been developing methods to combine GEDI measurements with other satellite data sources, using machine learning to create continuous maps. This approach has shown promising results globally, with various studies demonstrating the potential of integrating optical, radar, and terrain data to predict canopy height between GEDI tracks. Our study contributes to this growing body of research by specifically examining temperate forests across diverse regions and investigating key methodological questions that affect operational implementation.

In our work, we combined GEDI L2A data with auxiliary remote sensing datasets and applied machine learning algorithms to create continuous, wall-to-wall canopy height maps (CHMs) at 10-meter resolution. What sets our study apart is the systematic evaluation across multiple temperate regions with different characteristics, allowing us to assess model robustness and identify which factors most strongly influence prediction accuracy.

Figure 1. Four study areas with temperate forests whose height has been estimated by combining GEDI satellite altimetry with Sentinel-based remotely sensed predictors. The middle row shows the spatial density of GEDI measurements. The bottom row shows the reference canopy heights measured by Airborne Laser Scanning (ALS).

A Global Perspective on Temperate Forests

Our research focused on four diverse test areas in temperate regions across three continents. We selected the Swiss Alps in Europe, representing mountainous terrain with mixed forests; Marlborough in New Zealand, featuring complex topography and diverse forest types; Trinity County in California, USA, showcasing tall coniferous forests; and the Giant Mountains (Krkonoše) in Czechia, characterized by mid-elevation mountain forests. This diversity was intentional – we wanted to test the robustness of our model under various topographic and forest conditions to ensure our methods could be applied more broadly.

Each of these study areas presented unique challenges. The Swiss Alps feature steep slopes and high-elevation forests that can be difficult for satellite sensors to characterize accurately. Marlborough in New Zealand combines coastal forests with mountainous terrain, requiring the model to handle diverse environmental gradients. Trinity County’s tall conifers, some reaching over 60 meters, test the upper limits of canopy height estimation. The Giant Mountains, meanwhile, provided an ideal test case for typical Central European mountain forests that have been shaped by both natural processes and human management.

Our Approach and Key Questions

We used a Random Forest machine learning algorithm to predict canopy height based on multiple types of remote sensing data. Each study area was divided into overlapping tiles and a separate machine learning model was fitted in each tile. Then, predictions from individual tiles were overlaid and averaged.
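The tiling and averaging step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our production code: the grid is measured in pixels, the overlap width is chosen arbitrarily, and `predict_tile` is a hypothetical callable standing in for "fit a Random Forest on the GEDI footprints of this tile and predict every pixel in it".

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]  # (x, y) pixel index


def tile_origins(extent: int, tile: int, overlap: int) -> List[int]:
    """Start positions of tiles along one axis so that neighbouring tiles overlap."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile size")
    step = tile - overlap
    origins = list(range(0, max(extent - tile, 0) + 1, step))
    if origins[-1] + tile < extent:
        # The regular grid stopped short of the edge: add one last tile flush with it.
        origins.append(extent - tile)
    return origins


def mosaic(width: int, height: int, tile: int, overlap: int,
           predict_tile: Callable[[int, int, int, int], Dict[Cell, float]]
           ) -> Dict[Cell, float]:
    """Overlay the per-tile predictions and average them where tiles overlap."""
    sums: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for y0 in tile_origins(height, tile, overlap):
        for x0 in tile_origins(width, tile, overlap):
            x1, y1 = min(x0 + tile, width), min(y0 + tile, height)
            for cell, value in predict_tile(x0, y0, x1, y1).items():
                sums[cell] += value
                counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def toy_predict_tile(x0: int, y0: int, x1: int, y1: int) -> Dict[Cell, float]:
    """Hypothetical stand-in for a per-tile model: predicts the tile's x origin."""
    return {(x, y): float(x0) for x in range(x0, x1) for y in range(y0, y1)}


chm = mosaic(width=6, height=4, tile=4, overlap=2, predict_tile=toy_predict_tile)
print(chm[(0, 0)], chm[(2, 0)], chm[(5, 0)])  # prints: 0.0 1.0 2.0
```

Pixels covered by a single tile keep that tile's prediction, while pixels in the overlap zone receive the mean of both models, which softens seams at tile borders.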

Our predictor variables included:

Optical data from Sentinel-2 satellites, providing multispectral information about surface reflectance
Radar data from Sentinel-1, which can penetrate clouds and provide information about vegetation structure
Terrain data from FABDEM (Forest And Buildings removed Copernicus DEM), giving us elevation and topographic information
Spatial predictors including coordinates and distance metrics

The key questions we sought to answer were:

What is the optimal tile size for modeling? Larger tiles include more training data but may obscure local patterns, while smaller tiles capture local variation but may suffer from insufficient training samples.
How do our models compare with existing global products like Lang et al.’s global CHM or Meta’s tree canopy height map?
Which input variables are most important for predicting canopy height, and does this vary across different forest types and regions?
What are the main sources of error, and how can we address them in future work?

Figure 2. Comparison of different canopy height models on an example vertical canopy height profile in the Trinity study area. The top row shows the reference ("true") canopy heights derived from ALS data. The three middle rows show three existing global canopy height products (all based on combining GEDI measurements with auxiliary remotely sensed data). The bottom row shows the predictions of our Random Forest model.

Figure 3. Example of a predicted canopy height map using the Random Forest model for the Trinity study area, compared with global canopy height models and Airborne Laser Scanning (ALS) reference data. An arbitrary rectangle of 56 km² in the middle of the Trinity area is shown.

Main Findings: Balancing Efficiency and Accuracy

Optimal Tile Size: One of our important findings relates to computational efficiency in large-scale mapping projects. We tested tile sizes ranging from 5 km to 20 km and found that variation in prediction accuracy across different tile sizes was smaller than variation between study areas. Tiles of 10 km proved to be the sweet spot – offering the best compromise between computational efficiency and accuracy. This finding has practical implications for operational forest monitoring, as it allows researchers to process large areas efficiently without sacrificing too much accuracy.

Model Performance: Our models achieved root mean square errors (RMSE) ranging from 6.0 to 10.2 meters across the four study areas, with coefficient of determination (R²) values between 0.58 and 0.78. The best performance was in the Swiss Alps (RMSE = 6.0 m, R² = 0.78), while the most challenging area was Trinity County (RMSE = 10.2 m, R² = 0.58). Importantly, our newly developed CHMs consistently outperformed existing global models in all study areas. For example, in the Giant Mountains, our model achieved an RMSE of 7.3 m compared to 12.5 m for one of the leading global products.
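For readers less familiar with these two metrics, the sketch below shows how they can be computed from paired reference (ALS) and predicted heights. It assumes R² is defined as 1 minus the ratio of residual to total sum of squares; some studies instead report the squared correlation, and this post does not specify which variant was used. The height values are invented purely for illustration.

```python
import math
from typing import Sequence


def rmse(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between reference and predicted heights (metres)."""
    squared = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Invented toy heights in metres, not data from the study.
als_heights = [10.0, 20.0, 30.0, 40.0]
model_heights = [12.0, 18.0, 33.0, 37.0]

print(round(rmse(als_heights, model_heights), 2))       # prints: 2.55
print(round(r_squared(als_heights, model_heights), 3))  # prints: 0.948
```

RMSE is expressed in meters and penalizes large errors heavily, while R² is unitless and depends on how much height variation the study area contains, so the two do not always rank areas identically.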

Predictor Importance: Perhaps one of the most interesting findings from our analysis was the consistent pattern in predictor importance across all study areas. Optical predictors from Sentinel-2 consistently ranked highest, particularly the near-infrared and red-edge bands that are sensitive to vegetation characteristics. This was followed by terrain variables from FABDEM, which help account for topographic effects on canopy height. Somewhat surprisingly, radar data from Sentinel-1 and spatial predictors contributed less than we initially expected, though they still provided valuable information for reducing local prediction errors.

This pattern held true across diverse forest types and topographic conditions, suggesting that optical data contain the most information for canopy height prediction in temperate regions. However, the relative importance of different predictors varied somewhat between study areas, reflecting the influence of local forest characteristics and environmental conditions.
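One simple way to summarize importance by predictor group is to sum the per-variable importances within each group and normalize. The sketch below assumes that approach; the variable names and importance values are invented for illustration and are not results from our models, and other aggregation choices (for example, averaging within groups) are equally plausible.

```python
from typing import Dict


def group_importance(importances: Dict[str, float],
                     groups: Dict[str, str]) -> Dict[str, float]:
    """Sum per-variable importances by group and return shares, largest first."""
    totals: Dict[str, float] = {}
    for feature, value in importances.items():
        group = groups[feature]
        totals[group] = totals.get(group, 0.0) + value
    overall = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {group: value / overall for group, value in ranked}


# Hypothetical per-variable importances (illustrative numbers only).
toy_importances = {
    "B8": 0.30, "B5": 0.25,           # Sentinel-2 near-infrared and red-edge bands
    "elevation": 0.20, "slope": 0.10,
    "VV": 0.08, "VH": 0.04,           # Sentinel-1 polarizations
    "x": 0.02, "y": 0.01,
}
toy_groups = {
    "B8": "optical", "B5": "optical",
    "elevation": "terrain", "slope": "terrain",
    "VV": "radar", "VH": "radar",
    "x": "spatial", "y": "spatial",
}

shares = group_importance(toy_importances, toy_groups)
for group, share in shares.items():
    print(f"{group}: {share:.2f}")
```

Summing favors groups with many variables, which is worth keeping in mind when a group such as optical contains far more predictors than the others.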

Figure 4. Importance of the four predictor groups (optical, radar, spatial, and terrain) in Random Forest models fitted on 10 km tiles in the four study areas. Optical data (Sentinel-2 raw bands and vegetation indices) were always the most important group, followed by terrain (elevation and slope). Radar (Sentinel-1) and spatial predictors (X and Y coordinates) were less important overall, although they did matter in some individual tiles.

Understanding the Limitations: Systematic Biases in GEDI Data

Our results also revealed how previously documented limitations in GEDI data affect the predicted canopy height maps. As other studies have shown, we observed a systematic bias that causes overestimation of canopy height in areas with low vegetation and underestimation in areas with tall trees. This pattern was consistent across all our study areas and is evident when comparing predicted heights to reference data from airborne laser scanning.

These systematic biases in GEDI footprints propagate through to our predicted maps, affecting their accuracy for certain applications. For example, predictions tend to smooth out extreme values, showing heights between 10 and 30 meters in areas where reference data indicate heights either below 5 meters or above 35 meters. This has important implications for applications that rely on accurate characterization of canopy height extremes or spatial heterogeneity. Understanding how these known GEDI biases manifest in wall-to-wall predicted maps is crucial for interpreting and using these products appropriately.
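A quick way to check a map for this kind of compression toward mid-range heights is to compute the mean error separately for low, medium, and tall reference heights. The sketch below does this with invented numbers; the bin edges of 5 and 35 meters simply echo the example above, and the function name is an illustrative assumption.

```python
from typing import List, Optional, Sequence


def mean_bias_by_bin(reference: Sequence[float], predicted: Sequence[float],
                     edges: Sequence[float]) -> List[Optional[float]]:
    """Mean (predicted - reference) per reference-height bin [edges[i], edges[i+1])."""
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for ref, pred in zip(reference, predicted):
        for i in range(n_bins):
            if edges[i] <= ref < edges[i + 1]:
                sums[i] += pred - ref
                counts[i] += 1
                break
    # Empty bins are reported as None rather than as a misleading zero.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Invented toy heights in metres, not data from the study.
als_heights = [2, 4, 20, 25, 40, 50]
map_heights = [12, 10, 21, 24, 30, 32]

bias = mean_bias_by_bin(als_heights, map_heights, edges=[0, 5, 35, 100])
print(bias)  # prints: [8.0, 0.0, -14.0]
```

A positive value in the lowest bin together with a negative value in the tallest bin is the signature described above, even when the overall mean error is close to zero.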

What This Means for Forest Science and Management

Despite these limitations, our results demonstrate significant progress in canopy height mapping. Our study also confirms findings from previous research: satellite data provide unprecedented spatial coverage and temporal consistency, but wherever airborne laser scanning (ALS) data are available, ALS remains the gold standard. Even decade-old ALS data typically provide more accurate information than any predicted map available today.

However, for the vast majority of the world’s forests where ALS data are not available, or for applications requiring consistent data across large regions and time periods, our methods represent a significant step forward. The 10-meter resolution of our CHMs is fine enough to capture many important forest structural patterns while remaining computationally feasible for large-area applications.

Practical applications of this work extend across multiple domains:

Carbon accounting and climate change mitigation: More accurate canopy height maps lead to better estimates of stored forest biomass and carbon, essential for national greenhouse gas inventories and carbon offset projects.
Biodiversity conservation: Canopy height is a key habitat variable for many species. Our maps can improve species distribution models and help identify critical habitats that need protection.
Forest management: Foresters can use these maps to assess timber resources, plan harvesting operations, and monitor forest growth over time.
Ecosystem monitoring: The methodology can be applied repeatedly to track changes in forest structure, detect disturbances, and assess recovery after natural disasters or management interventions.
Scientific research: Consistent, high-resolution canopy height data enable new research on forest ecology, biogeography, and ecosystem functioning at scales previously impossible to study.

Looking Ahead: The Future of Canopy Height Mapping

Several exciting developments are on the horizon that could further improve global canopy height mapping. The NASA-ISRO NISAR mission will provide high-resolution radar data that could help address some of the limitations we identified with Sentinel-1 data. The continued operation of GEDI and ICESat-2 will build up multi-temporal datasets that could be used to track forest growth and changes over time.

Future research should continue to focus on several key areas. First, continued development of methods for correcting systematic biases in spaceborne lidar data will be important. This might involve developing footprint-specific corrections based on local forest characteristics or using multi-temporal data to calibrate measurements. Second, integrating data from multiple sensors and platforms could improve accuracy – for example, combining GEDI’s detailed vertical structure information with ICESat-2’s complementary sampling patterns or radar’s all-weather capabilities.

Third, we need to move beyond static maps toward dynamic monitoring systems that can track forest changes in near-real-time. This will require developing automated processing pipelines and change detection algorithms that can identify when and where forest structure is changing. Finally, it’s crucial to continue validating these products with high-quality field measurements and ALS data, particularly in underrepresented forest types and regions.

Conclusion

Our study demonstrates that combining spaceborne lidar data with machine learning and auxiliary remote sensing datasets can produce accurate, high-resolution canopy height maps for temperate forests. While challenges remain, particularly regarding systematic biases in the training data, these methods offer a powerful tool for forest monitoring at scales that were previously impractical or impossible to achieve.

As we continue to refine these approaches and integrate new data sources, we move closer to the goal of comprehensive, accurate, and frequently updated global forest monitoring. This capability is increasingly critical as forests face mounting pressures from climate change, land use change, and other environmental stressors. Understanding where our forests are, how tall they are, and how they are changing is essential for their conservation and sustainable management.

More information about our study can be found in the preprint submitted to Science of Remote Sensing, available on SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5526018

This research was supported by the EarthBridge project – Building Bridges between Earth Observation and Environmental Sciences.