Data and code accompanying the following manuscript, published in the International Journal of Applied Earth Observation and Geoinformation:
	https://doi.org/10.1016/j.jag.2021.102353
	Modeling tree canopy height using machine learning over mixed vegetation landscapes

Dataset Citation:
	Wang, H., Seaborn, T., & Wang, Z. (2021). Data from: Modeling tree canopy height using machine learning over mixed vegetation landscapes [Data set]. University of Idaho. https://doi.org/10.7923/VJ7D-KS92

This document describes the data used in a project that modeled tree canopy height with the geographical random forest algorithm over the Mann Creek Watershed, Idaho. The data sources and main results are explained below, along with caveats for future use of the data. The findings have been published as a research paper in the International Journal of Applied Earth Observation and Geoinformation (https://doi.org/10.1016/j.jag.2021.102353). For more detail on this research, please download the paper or contact the corresponding author, Hui (William) Wang, at huiwang@uidaho.edu.

Introduction:
	This study explored the spatial autocorrelation pattern of residuals in modeling tree canopy height and investigated the relationship between canopy height and model performance. By combining Light Detection and Ranging (LiDAR) and Landsat datasets, we used geographical random forest (GRF) and traditional random forest (TRF) methods to predict tree canopy height in a mixed dry forest woodland in complex mountainous terrain.

Data and Code:
	Data:
		LiDAR:
			The LiDAR point cloud data comprise 4 (column) by 3 (row) image sets. The data were obtained in the American Society for Photogrammetry and Remote Sensing (ASPRS) LAS format (September 9-October 14, 2017) from the National Geospatial Program (https://apps.nationalmap.gov/downloader), developed by the U.S. Geological Survey (USGS). A triangulation-based pit-free algorithm, implemented in the lidR package in R (v.3.6.3), was used to generate the rasterized Canopy Height Model (CHM). The final resolution of the CHM derived from the point cloud data was 0.25×0.25 m.
		
		Landsat:
			Canopy Height & Landsat Features: 
				The canopy height of each 30×30 m pixel was calculated as the mean value of the LiDAR CHM within that pixel. Landsat features are derived from the Landsat 8 Operational Land Imager (OLI) Level 2 product, which has a spatial resolution of 30×30 m and was obtained from the USGS Earth Explorer website (https://earthexplorer.usgs.gov/). The data acquisition date was October 6, 2017.
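				The aggregation from the 0.25 m CHM to 30 m Landsat pixels can be sketched as below. This is a minimal illustration, not the authors' processing code: it assumes the CHM grid is perfectly aligned with the Landsat grid and contains no missing values, and the function name aggregate_chm and the toy grid are hypothetical. At the stated resolutions each Landsat pixel covers 30 / 0.25 = 120 CHM cells per side; the toy example uses 2×2 blocks for readability.

```python
from statistics import mean


def aggregate_chm(chm, block):
    """Summarise a fine CHM grid into coarse pixels of block x block cells.

    Returns a grid of (MIN, MAX, MEAN) tuples, one per coarse pixel.
    Assumes the grid is aligned to the coarse pixels and has no NoData cells.
    """
    n_rows, n_cols = len(chm), len(chm[0])
    out = []
    for r0 in range(0, n_rows - block + 1, block):
        row = []
        for c0 in range(0, n_cols - block + 1, block):
            cells = [
                chm[r][c]
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
            ]
            row.append((min(cells), max(cells), mean(cells)))
        out.append(row)
    return out


# One 30 m Landsat pixel spans this many 0.25 m CHM cells per side.
CELLS_PER_SIDE = round(30 / 0.25)

# Hypothetical 4 x 4 CHM, summarised with 2 x 2 blocks.
toy_chm = [
    [0.0, 1.0, 4.0, 4.0],
    [2.0, 3.0, 4.0, 4.0],
    [5.0, 5.0, 0.0, 0.0],
    [5.0, 7.0, 0.0, 2.0],
]
summary = aggregate_chm(toy_chm, block=2)
```

				With real data, block would be CELLS_PER_SIDE, and the three values of each tuple correspond to the MIN, MAX and MEAN fields described below.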
			Field description:
				MIN: The minimum value of the CHM within the Landsat pixel
				MAX: The maximum value of the CHM within the Landsat pixel
				MEAN: The mean value of the CHM within the Landsat pixel
				ndvi: Normalized Difference Vegetation Index (NDVI)
					formula: (NIR-Red)/(NIR+Red)
				gsavi: Green Soil Adjusted Vegetation Index (GSAVI)
					formula: (NIR-Green)/((NIR+Green+0.5)*(1+0.5))
				gndvi: Green Normalized Difference Vegetation Index (GNDVI)
					formula: (NIR-Green)/(NIR+Green)
				cvi: Chlorophyll Vegetation Index (CVI)
					formula: (NIR/Green)*(Red/Green)
				ndgi: Normalized Difference Greenness Index (NDGI)
					formula: (Green-Red)/(Green+Red)
				nbr: Normalized Burn Ratio SWIR2 (NBR)
					formula: (NIR-SWIR2)/(NIR+SWIR2)
				ndii: Normalized Difference Infrared Index (NDII)
					formula: (NIR-SWIR1)/(NIR+SWIR1)
				gdvi: Green Difference Vegetation Index (GDVI)
					formula: NIR-Green
				msavi: Modified Soil Adjusted Vegetation Index (MSAVI)
					formula: (2*NIR+1-√((2*NIR+1)^2-8*(NIR-Red)))/2
				dvi: Difference Vegetation Index (DVI)
					formula: NIR-Red
				savi: Soil Adjusted Vegetation Index (SAVI)
					formula: (NIR-Red)/((NIR+Green+0.5)*(1+0.5))
				msr: Modified Simple Ratio (MSR)
					formula: ((NIR/Red)-1)/(√(NIR/Red)+1)
				r_g: Red/Green
				s1_n: SWIR1/NIR
				n_g: NIR/Green
				s2_g: SWIR2/Green
				n_r: NIR/Red
				s2_r: SWIR2/Red
				s1_g: SWIR1/Green
				s2_n: SWIR2/NIR
				s1_r: SWIR1/Red
				swir2_1: SWIR2/SWIR1
				green: Green
				red: Red
				nir: NIR
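				As a worked illustration of the field formulas above, the sketch below computes a subset of the indices for a single pixel. It is not the authors' code: the function name vegetation_indices and the input values are hypothetical, and the inputs are assumed to be surface reflectance values that have already been converted from the scaled integers stored in the Landsat Level 2 product.

```python
from math import sqrt


def vegetation_indices(green, red, nir, swir1, swir2):
    """Compute a subset of the listed indices for one pixel.

    Inputs are assumed to be positive surface reflectance values.
    """
    return {
        "ndvi": (nir - red) / (nir + red),
        "gndvi": (nir - green) / (nir + green),
        "ndgi": (green - red) / (green + red),
        "nbr": (nir - swir2) / (nir + swir2),
        "ndii": (nir - swir1) / (nir + swir1),
        "dvi": nir - red,
        "gdvi": nir - green,
        "cvi": (nir / green) * (red / green),
        "msavi": (2 * nir + 1 - sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "msr": (nir / red - 1) / (sqrt(nir / red) + 1),
        "n_r": nir / red,
        "s2_n": swir2 / nir,
    }


# Hypothetical reflectance values for a vegetated pixel.
pixel = vegetation_indices(green=0.1, red=0.1, nir=0.4, swir1=0.2, swir2=0.1)
```

				The remaining band ratios (r_g, s1_n, n_g and so on) follow the same pattern as n_r and s2_n.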
		Vali_Residual: These 11 raster datasets show the spatial distribution of residuals at the validation step for the TRF model (0% LM) and the 10 models that include a local model (LM) component.
				TRF_model: the traditional random forest model, with 0% weight from the local model and 100% weight from the global model
				10_LM: a fusion model with 10% weight from the local model and 90% weight from the global model
				20_LM: a fusion model with 20% weight from the local model and 80% weight from the global model
				30_LM: a fusion model with 30% weight from the local model and 70% weight from the global model
				40_LM: a fusion model with 40% weight from the local model and 60% weight from the global model
				50_LM: a fusion model with 50% weight from the local model and 50% weight from the global model
				60_LM: a fusion model with 60% weight from the local model and 40% weight from the global model
				70_LM: a fusion model with 70% weight from the local model and 30% weight from the global model
				80_LM: a fusion model with 80% weight from the local model and 20% weight from the global model
				90_LM: a fusion model with 90% weight from the local model and 10% weight from the global model
				100_LM: a fully local model, with 100% weight from the local model and 0% weight from the global model
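				The weighting scheme behind the 11 rasters can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: it assumes a fused prediction is a weighted sum of the local and global predictions, that a residual is observed minus predicted height, and that TRF_model corresponds to 0% LM as stated in the Vali_Residual description. The function name fuse and the numeric values are hypothetical.

```python
def fuse(local_pred, global_pred, local_weight):
    """Blend local and global predictions; local_weight must be in [0, 1].

    Assumption: the fusion is a simple weighted sum of the two predictions.
    """
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must be between 0 and 1")
    return local_weight * local_pred + (1.0 - local_weight) * global_pred


# Local-model weight for each raster name (TRF_model taken as 0% LM).
LOCAL_WEIGHTS = {
    "TRF_model": 0.0,
    **{f"{pct}_LM": pct / 100 for pct in range(10, 101, 10)},
}

# Hypothetical canopy heights for a single validation pixel.
observed, local_pred, global_pred = 12.0, 10.0, 14.0

# Assumption: residual = observed - predicted.
residuals = {
    name: observed - fuse(local_pred, global_pred, weight)
    for name, weight in LOCAL_WEIGHTS.items()
}
```

				In this toy case the residual moves linearly from the global-only value (TRF_model) to the local-only value (100_LM) as the local weight increases.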
		residual_quantile_calc:
			Data and code used to create visualizations and downstream calculations (e.g., residuals, quantiles, differences). See the additional readme.txt within the sub-folder for an explanation of the data and code.
	Code:
		The triangulation-based pit-free algorithm can be accessed at https://github.com/Jean-Romain/lidR/wiki/Rasterizing-perfect-canopy-height-models

		The geographical random forest algorithm can be accessed at https://cran.r-project.org/web/packages/SpatialML/index.html






