The Wetland Identification Model (WIM) – A New Arc Hydro Functionality for Predicting Wetland Locations Using LiDAR Elevation Data and Machine Learning

03-06-2020 04:40 PM
GinaO_Neil
Esri Contributor

The Importance of Wetland Protection

Wetlands are important ecosystems that provide habitat for many plant and animal species, improve water quality, recharge groundwater, and ease flood and drought severity. However, the quality and existence of wetlands are threatened by conversion to agriculture or development, pollutant runoff, and climate change. Given the ecological value provided by wetlands and the ongoing threat to wetland health, wetland management and conservation efforts are imperative. Rapid and reliable creation of wetland distribution maps can benefit these efforts. The Wetland Identification Model (WIM) is a proposed framework for creating these data.

 

How the WIM Aims to Support Wetland Protection Efforts

While there are many types of wetlands, all wetlands can be identified by common features, including the presence of hydrologic conditions that inundate the area, vegetation adapted for life in saturated soil conditions, and hydric soils. Light detection and ranging (LiDAR) data offer new opportunities to observe these features at varying scales and provide higher resolution and wider availability than other remote sensing options. LiDAR returns can be interpolated to create high-resolution digital elevation models (DEMs), which can then be used to derive topographic metrics that describe flow convergence and near-surface soil moisture to indicate wetlands. Furthermore, deriving topographic metrics from LiDAR DEMs has been shown to increase the accuracy of saturation extent mapping compared to coarser DEMs (i.e., > 2 m).

The WIM uses LiDAR DEMs to derive topographic metrics that describe hydrologic drivers of wetland formation and uses these as predictors of wetland areas through the random forests algorithm (Breiman, 2001). The WIM consists of three main parts: preprocessing, predictor variable calculation, and classification and accuracy assessment. Required input data are a high‐resolution digital elevation model (DEM) and verified wetland/nonwetland coverage (i.e., ground truth data), both in TIFF format. The current implementation also requires a surface water input raster, although future implementations will derive this directly from the input DEM. Final model outputs are wetland predictions and an accuracy report.

  1. The input DEM is preprocessed using methods specific to hydrologic parameter derivation from high-resolution DEMs.
  2. The preprocessed DEM is used to calculate the predictor variables: the topographic wetness index (TWI), curvature, and cartographic depth‐to‐water index (DTW).
  3. Training data are derived from the ground truth data.
  4. The training data are coupled with the merged predictor variables to train the random forests algorithm (Breiman, 2001).
  5. The ground truth data that were not used to train the model are used to assess the accuracy of predictions. The accuracy metrics generated are chosen to minimize unrepresentative evaluations of model performance due to imbalanced target classes.
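As a rough illustration of the first predictor named in step 2, the topographic wetness index is commonly defined as ln(a / tan β), where a is the upslope contributing area per unit contour width and β is the local slope. The sketch below applies that textbook definition to a single cell. The function name, the minimum-slope guard, and the sample values are assumptions for illustration, and the Arc Hydro tools may compute the index differently.

```python
import math

def topographic_wetness_index(specific_catchment_area, slope_radians, min_slope=1e-3):
    """Textbook TWI = ln(a / tan(beta)) for one DEM cell.

    specific_catchment_area: upslope area per unit contour width (e.g., m^2/m)
    slope_radians: local slope angle in radians
    min_slope: hypothetical floor (radians) that avoids division by zero on flat cells
    """
    if specific_catchment_area <= 0:
        raise ValueError("specific catchment area must be positive")
    beta = max(slope_radians, min_slope)
    return math.log(specific_catchment_area / math.tan(beta))

# A gently sloping cell with a large contributing area scores higher (wetter)
# than a steep cell with a small contributing area. Values are made up.
valley_cell = topographic_wetness_index(500.0, math.radians(1.0))
ridge_cell = topographic_wetness_index(5.0, math.radians(20.0))
```

Higher values indicate cells where flow converges and drains slowly, which is the kind of hydrologic driver the predictor variables are meant to capture.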

Previous Performance of the WIM and Potential Applications

The WIM was created through original research at the University of Virginia. It was originally developed and evaluated for environmental planning applications, specifically to streamline the wetland permitting process by providing accurate wetland inventories that limit manual surveying to likely wetland areas. After calibration for four geographic regions in Virginia using a rich ground truth dataset of jurisdictionally confirmed wetlands and nonwetlands, the WIM was able to identify 80‐90% of true wetlands across the sites. The proportion of wetland predictions that were correct varied from 22 to 69%. Overall, the results suggest strong potential for the WIM to support wetland delineation. However, success in other landscapes will depend on the quality of the DEM and the available ground truth data, because these data allow for the necessary calibration of WIM parameters to specific landscapes. This iterative calibration will likely reveal DEM preprocessing parameters that improve the representation of the land surface for wetlands specific to the region. Further, reliable and abundant ground truth data will allow the model to learn a range of wetland characteristics and provide representative accuracy assessments.
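The two percentages reported above correspond to what are usually called recall (the share of true wetlands that were identified) and precision (the share of wetland predictions that were correct). A minimal sketch of how they are computed from confusion-matrix counts follows. The counts are made-up values chosen only to fall inside the reported ranges, not results from the study.

```python
def recall(true_positives, false_negatives):
    """Share of actual wetland cells that the model labeled as wetland."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives, false_positives):
    """Share of predicted wetland cells that really are wetland."""
    return true_positives / (true_positives + false_positives)

def accuracy(tp, fn, fp, tn):
    """Share of all cells labeled correctly, wetland or not."""
    return (tp + tn) / (tp + fn + fp + tn)

# Hypothetical cell counts for one site where wetlands are rare.
tp, fn, fp, tn = 850, 150, 1150, 97850

site_recall = recall(tp, fn)               # 0.85
site_precision = precision(tp, fp)         # 0.425
site_accuracy = accuracy(tp, fn, fp, tn)   # 0.987
```

With these counts, overall accuracy is close to 99% even though fewer than half of the wetland predictions are correct, which illustrates why metrics that account for imbalanced classes give a more representative picture.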

 

Getting Started with WIM

The WIM has been implemented within Arc Hydro Pro 2.5 and higher. For further documentation on the WIM as an Arc Hydro toolset, see Arc Hydro – Wetland Identification Model. Note that the WIM tools require the scikit-learn Python package to be installed in the ArcGIS Pro Python environment.
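One quick way to confirm that requirement is to ask the active Python environment whether the package can be found. This is a small sketch using only the standard library. The helper name is an assumption, and it should be run from the Python environment that ArcGIS Pro is actually using.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None

# scikit-learn is imported under the name "sklearn".
if not is_installed("sklearn"):
    print("scikit-learn was not found in this Python environment.")
```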

toolset.png

 

Further Reading

For further reading on the development and evaluation of the WIM, see the following publications:

 

O'Neil, G. L., Goodall, J. L., Behl, M., Saby, L. (2020). Deep Learning using Physically-Informed Input Data for Wetland Identification. Environmental Modelling and Software. 104665. https://doi.org/10.1016/j.envsoft.2020.104665.  

O'Neil, G. L., Saby, L., Band, L. E., Goodall, J. L. (2019). Effects of LiDAR DEM Smoothing and Conditioning Techniques on a Topography-Based Wetland Identification Model. Water Resources Research, 55. https://doi.org/10.1029/2019WR024784.

O'Neil, G. L., Goodall, J. L., Watson, L. T. (2018). Evaluating the potential for site-specific modification of LiDAR DEM derivatives to improve environmental planning-scale wetland identification using random forest classification. Journal of Hydrology, 559, 192-208. https://doi.org/10.1016/j.jhydrol.2018.02.009.

 

Citations

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

18 Comments
GinaO_Neil
Esri Contributor

If you are just getting started with WIM, be sure to check out the user guide here!

by Anonymous User
Not applicable

First, thanks for putting this together, it has great potential.

Just working my way through this for the first time and have hit a snag. The python environment is all set up ok as per instructions but when trying to run Train Random Trees I get this error:

('line 243', 'c:\\program files\\arcgis\\pro\\Resources\\ArcToolbox\\Scripts\\archydro\\trainrandomtrees.py', "ModuleNotFoundError: No module named 'gdal'")

gdal is definitely installed according to the Python package manager.

 

Tom

 

GinaO_Neil
Esri Contributor

Hi @Anonymous User ,

 

Very happy to hear you are finding WIM useful. What version of ArcGIS Pro are you using? I've heard of some issues stemming from the version of gdal packaged with the newly released Pro 2.9. 

by Anonymous User
Not applicable

Yes, I am using ArcGIS Pro 2.9. You are correct about the gdal upgrade being the problem. After finding the upgrade notice, I made changes to the Python code and it now works as expected.

Here is the info from the upgrade notice. 

  • GDAL has been upgraded to version 3.3. This version includes improvements to the Python bindings.
    • Top-level imports of gdal, gdalconst, gdalnumeric, ogr, and osr are no longer supported, and should be converted to use the osgeo module. For example, convert import gdal to from osgeo import gdal.
    • The new utility module osgeo_utils can be accessed with import osgeo_utils.
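For anyone hitting the same error, the change described in the notice can be sketched as below. This is an illustration of the import pattern only. The wrapper function name is an assumption, and the actual edit made to trainrandomtrees.py is not shown in this thread.

```python
def load_gdal():
    """Return the gdal module, preferring the osgeo namespace.

    GDAL 3.3+ (as shipped with ArcGIS Pro 2.9, per the notice above) no longer
    supports the top-level 'import gdal', so 'from osgeo import gdal' is tried
    first. The fallback keeps older environments working.
    """
    try:
        from osgeo import gdal
    except ImportError:
        import gdal  # legacy top-level import
    return gdal

# Example use (requires GDAL to be installed in the environment):
# gdal = load_gdal()
# dataset = gdal.Open("dem.tif")  # "dem.tif" is a placeholder path
```

In a script that only needs to run on Pro 2.9 or later, replacing the line `import gdal` with `from osgeo import gdal` is the whole change.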
KavianKoleini
New Contributor

Hello,

In trying to run the Train Test Split, I get the following error:

'archydropro.traintestsplit'

Error HRESULT E_FAIL has been returned from a call to a COM component

Any suggestions about how to resolve this type of error?

GinaO_Neil
Esri Contributor

Hi,

This error was tied to conflicting Pro 2.9 and Pro 3+ versions of tools. There are updated versions of Arc Hydro for both 2.9 and 3 available here 

Does installing the latest version solve the error?

 

Best,

Gina

KavianKoleini
New Contributor
This error is occurring in Pro 2.9 and the Arc Hydro v2.9
GinaO_Neil
Esri Contributor

What version of Arc Hydro 2.9 do you have?

KavianKoleini
New Contributor

Arc Hydro Tools for ArcGIS Pro Version: 2.9.59

ArcGIS Pro Version: 2.9.32739

GinaO_Neil
Esri Contributor

Great. That is an older version of Arc Hydro for Pro 2.9. The error should be fixed if you uninstall Arc Hydro and reinstall the 2.9.88 version found here: https://downloads.esri.com/archydro/archydro/Setup/Pro/2.9/

ArcGIS
by
New Contributor

I am trying to run Train Random Trees and it keeps failing with the error:

('line 368', 'c:\\program files\\arcgis\\pro\\Resources\\ArcToolbox\\Scripts\\archydro\\trainrandomtrees.py', 'numpy.core._exceptions._ArrayMemoryError: Unable to allocate 73.4 GiB for an array with shape (120036, 164241) and data type float32')

Can I get advice on how to fix this?
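The size in that message can be checked with simple arithmetic: rows × columns × 4 bytes for float32. The sketch below reproduces the figure and shows how quickly the requirement drops when the array is smaller. It is arithmetic only, not a confirmed fix. The helper name and the halved shape are assumptions, and whether a smaller extent or coarser cell size suits a given project is a separate question.

```python
def array_gib(shape, bytes_per_value):
    """Memory needed for a dense array of the given shape, in GiB."""
    count = 1
    for dimension in shape:
        count *= dimension
    return count * bytes_per_value / 2**30

FLOAT32_BYTES = 4

# Shape taken from the error message above.
requested = array_gib((120036, 164241), FLOAT32_BYTES)

# Hypothetical array with half as many rows and half as many columns.
halved = array_gib((60018, 82120), FLOAT32_BYTES)

print(f"requested: {requested:.1f} GiB, halved: {halved:.1f} GiB")
```

Halving both dimensions cuts the allocation to roughly a quarter, so reducing the size of the inputs is the usual direction to explore when an allocation of this size fails.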

kawakawa4
New Contributor III

Hi, 

 

I have just installed the latest version of the WIM, but the toolbox seems to be missing a tool: Hydrocondition High-Resolution DEM

kawakawa4_0-1694493366067.png

Any ideas where I can find this tool please?  Many thanks

CandaceM
New Contributor II

@kawakawa4 @GinaO_Neil 

I have also just come across this oversight. Kawakawa4 - have you gotten an answer from ESRI about this missing tool yet?

Update: after running my own Fill Raster tool to stand in for this one, I continue on and see that the most updated tools do not call for the hydroconditioned raster. The Topographic Wetness tool only asks for the smoothed raster, which we have, and it will hydrocondition the raster in this step. (I tried both continuous flow and fill, and fill worked much better for my area). Hope this helps!

CandaceM
New Contributor II

Hi @GinaO_Neil,

I keep getting this issue when running Train Random Trees. It doesn't matter how many input parameters I use, the same error arises:

 ('line 472', 'c:\\program files\\arcgis\\pro\\Resources\\ArcToolbox\\Scripts\\archydro\\trainrandomtrees.py', "sklearn.utils._param_validation.InvalidParameterError: The 'max_features' parameter of RandomForestClassifier must be an int in the range [1, inf), a float in the range (0.0, 1.0], a str among {'log2', 'sqrt'} or None. Got 'auto' instead.")

Any suggestions on getting past this? Thanks 
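The message indicates that the installed scikit-learn no longer accepts 'auto' for max_features. In the releases that did accept it, 'auto' meant the same as 'sqrt' for RandomForestClassifier, so substituting 'sqrt' keeps the earlier behavior. The sketch below shows that substitution. The helper name and the keyword dictionary are hypothetical, and where the value is set inside trainrandomtrees.py is not shown in this thread.

```python
def normalize_max_features(value):
    """Replace the removed 'auto' option with 'sqrt'.

    'sqrt' is what 'auto' meant for random forest classifiers. Any other
    value (int, float, 'log2', 'sqrt', None) passes through unchanged.
    """
    return "sqrt" if value == "auto" else value

# Keyword arguments that could then be passed to
# sklearn.ensemble.RandomForestClassifier(**rf_kwargs).
rf_kwargs = {
    "n_estimators": 100,
    "max_features": normalize_max_features("auto"),
}
```

Installing an Arc Hydro release that matches the scikit-learn version in the Pro environment, if one is available, would avoid editing the script at all.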

DanJansson
New Contributor

Hello!

I am experiencing an error with Train Random Trees.

A raster error has occurred. The messages that follow will provide more detail.
ERROR 160331: The table name is invalid.
The table name is invalid.
No spatial reference exists.
The table was not found. [train_info_temp.tif]
The table was not found. [train_info_temp.tif]

I am using ArcGIS Pro 3.2 and Arc Hydro 3.1.51.

Do you have any workarounds?

Thank you

Dan

GinaO_Neil
Esri Contributor

Hi Dan,

Are your input datasets in a projected coordinate system? I recommend aligning all data with the projection of your DEM. DEMs should have linear horizontal units that match the vertical units (e.g., XY and elevation all in meters).

 

Best,

Gina

DanJansson
New Contributor

Thank you @GinaO_Neil!

It seems the issue was related to the cloud server where I stored my project. It worked perfectly when I ran the tool from my local drive.

Best wishes

Dan

GinaO_Neil
Esri Contributor

Good to know. Thanks for the update, Dan!