Wetland Identification Model--Train Random Trees Error

anninarupe17 · ‎01-28-2021

Hello, all!

I'm using the Wetland Identification Model (following the paper Esri has provided), and I have been getting an error when running the Arc Hydro tool "Train Random Trees." I started out running through the WIM manually, just so I could understand what was going on. I made it to the "Train Random Trees" tool and I receive the following error:

'line 352', 'c:\\program files\\arcgis\\pro\\Resources\\ArcToolbox\\Scripts\\archydro\\trainrandomtrees.py', 'IndexError: boolean index did not match indexed array along dimension 0; dimension is 17000 but corresponding boolean dimension is 17421'

I have the 'scikit-learn' Python package installed (v 0.23.2) and the project environment is using the clone that this package is installed on. I've also tried running the model as a whole, but it fails at the same step and gives the same error. Any ideas on what might be happening?

Annina

GinaO_Neil · ‎01-28-2021

Hi Annina,

This error comes up when the extents of your ground truth raster and your predictor variable raster (the composite) do not match. On the backend, the script flattens both rasters into long 1-dimensional arrays and, for each position along the arrays, compares the landscape class to the predictor information at the same location. Through this process, the model learns which characteristics are common to each landscape class represented in the ground truth raster. When the number of cells in the X and Y directions differs between the two rasters, this one-to-one matching fails and the model could learn erroneous characteristics.
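To illustrate the mismatch described above, here is a toy NumPy reproduction of the same kind of IndexError. It is only a sketch, not the Arc Hydro script: the array sizes, the `select_training_cells` helper, and the choice of 0 as the NoData label are all assumptions made for the example.

```python
import numpy as np

# Toy stand-ins (hypothetical sizes): a composite with 4 rows, 3 columns and
# 2 predictor bands, and a ground truth raster that has one extra row.
composite = np.arange(4 * 3 * 2, dtype=float).reshape(4, 3, 2)
ground_truth = np.array([[1, 0, 2],
                         [0, 1, 1],
                         [2, 2, 0],
                         [1, 0, 0],
                         [0, 2, 1]])


def select_training_cells(truth, predictors, nodata=0):
    """Pair each labelled cell with its predictor values.

    Checks the grids first so a mismatch is reported clearly instead of
    surfacing as an IndexError deep inside the indexing step.
    """
    if truth.shape != predictors.shape[:2]:
        raise ValueError(
            f"ground truth is {truth.shape}, composite is {predictors.shape[:2]}"
        )
    labelled = truth != nodata  # assumption: 0 marks cells with no label
    return predictors[labelled], truth[labelled]


# Indexing the 4-row composite with a 5-row boolean mask raises IndexError.
try:
    composite[ground_truth != 0]
    mismatch_error = None
except IndexError as exc:
    mismatch_error = exc

# Once both arrays cover the same grid, the selection works.
clipped_truth = ground_truth[:4]
features, labels = select_training_cells(clipped_truth, composite)
```

With matching grids, `features` holds one row of band values per labelled cell and `labels` holds the class of each of those cells.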

The fix is to clip your composite data to exactly the same extent as your ground truth data. The best way to do this is to run Extract By Mask with the composite raster as the input raster and the ground truth raster as the mask. In the tool's environments, also set the extent and snap raster settings to the ground truth raster. Before re-running Train Random Trees, open the raster properties for the ground truth and the composite and check that the number of rows and columns is the same in both.

I am working on an update so the tools handle this automatically. Please use the fix above in the meantime, and let me know if you run into anything else.

Thank you for your feedback!

-Gina

anninarupe17 · ‎01-28-2021

Thank you SO much, Gina! I'll be trying this out tomorrow.

Annina

anninarupe17 · ‎02-01-2021

Gina,

Alright, I spent all day Friday trying to get things to work, and I didn't even get to the Train Random Trees part!

I tried Extract by Mask, but for some reason my rasters still weren't the same size, so I decided to start over and get the sizes right from the beginning. I was never able to get the surface water and ground water rasters to match the DEMs, but I hoped that wouldn't matter as long as the rasters that go into composite.tif were the same size. I didn't get to try that out, though, because I ran into another error, this time in the "Train Test Split" tool:

"('line 289', 'c:\\program files\\arcgis\\pro\\Resources\\ArcToolbox\\Scripts\\archydro\\traintestsplit.py', 'MemoryError: Unable to allocate 1.14 GiB for an array with shape (1220521176,) and data type bool')"

I've run this tool successfully before on the same data, and I have plenty of memory. Any ideas on what I should try next?

I appreciate your help,

Annina

GinaO_Neil · ‎02-01-2021

Hi Annina,

My first thought would have been that your raster covers too large an extent for use with the WIM. However, that can't be the case if you were able to run this tool successfully on the same data before. Given that it has run before, I would check that the data type of the ground truth raster has not changed; it may have been converted to a floating point type somewhere in the processing. If so, convert the ground truth raster back to an integer type and try again. You can do this in Raster Calculator with the expression Int("myraster.tif")

My other suggestion would be to split up your raster and run the WIM in "chunks". That would get around the memory error if it is due to too large an area.
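As a rough way to judge whether the area is too large, the size of an in-memory array is just the number of cells times the bytes per cell. The sketch below checks that arithmetic against the numbers in the error message; the `array_gib` helper and the table of data type sizes are my own illustration, not part of the WIM tools.

```python
# Bytes used per cell for a few common NumPy data types.
BYTES_PER_CELL = {"bool": 1, "uint8": 1, "int32": 4, "float32": 4, "float64": 8}


def array_gib(n_cells, dtype="bool"):
    """Return the memory needed for n_cells of the given dtype, in GiB."""
    return n_cells * BYTES_PER_CELL[dtype] / 2**30


# Cell count taken from the MemoryError message in this thread.
reported_cells = 1_220_521_176
reported_gib = array_gib(reported_cells, "bool")
print(f"{reported_gib:.2f} GiB")  # 1.14 GiB

# The same number of cells stored as floating point needs far more memory,
# which is one reason a change of data type can matter.
as_float32_gib = array_gib(reported_cells, "float32")
as_float64_gib = array_gib(reported_cells, "float64")
```

Keep in mind that a tool may hold several arrays of this size at once, so the total can be a multiple of the single allocation that failed.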

What is the resolution of your input DEM?

anninarupe17 · ‎02-02-2021

Gina,

Thanks for your thoughts. I did a little bit of playing around and got the Run Random Trees to complete.

I noticed that I had a lot of "No Data" cells; the mask wasn't working and made the raster extent much larger than what it should've been. I started over, making sure that each raster was obeying the mask extent. These rasters' columns and rows were 10000 and 17000. The steps up to and including Train Random Trees completed successfully, although my CPU usage was maxing out, even though Pro was really the only thing running. It took a while, but it did complete. The Run Random Trees process failed, again with the memory error. I split the composite.tif into 4 tiles (5000 x 8500), and tried rerunning the process on only one file, which was successful.
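For anyone wanting to script the same split, here is a minimal sketch of how the tile windows could be computed. The `tile_windows` helper is hypothetical and only produces row and column offsets; the clipping itself would still be done with whatever tool you prefer.

```python
def tile_windows(n_rows, n_cols, tile_rows, tile_cols):
    """Return (row_off, col_off, rows, cols) windows that cover the raster.

    Tiles on the bottom and right edges are shortened when the raster size
    is not an exact multiple of the tile size.
    """
    windows = []
    for row_off in range(0, n_rows, tile_rows):
        for col_off in range(0, n_cols, tile_cols):
            rows = min(tile_rows, n_rows - row_off)
            cols = min(tile_cols, n_cols - col_off)
            windows.append((row_off, col_off, rows, cols))
    return windows


# 17000 rows x 10000 columns split into tiles of 8500 rows x 5000 columns.
tiles = tile_windows(17_000, 10_000, 8_500, 5_000)
for window in tiles:
    print(window)
```

For these dimensions the function returns four windows of equal size, matching the four tiles described above.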

I've got a 2 m resolution DEM. Is this too fine for the tools? If not, what size (columns/rows) would you suggest I chunk up the DEMs into?

Annina

SusanGale1 · ‎02-03-2021

I'd be interested in some guidance on maximum size as well. I've been playing with the WIM tools using a fairly large raster but at lower resolution (20 ft). Errors are somewhat unpredictable (sometimes a tool works, sometimes it returns error 999999), but if I try to run the smoothing tool it almost always gives me an error that the raster is too large, even when I try on much smaller chunks of the DEM. My workaround has been to call the Focal Statistics GP tool directly to do the smoothing, which seems to work.

I'm not sure if there's an absolute raster size (i.e., total number of cells) or if it is dependent on my workstation specs.

Just thought I'd mention too that I've tried using the TWI previously for modeling wetlands and TWI seemed to be very good at identifying headwater streams. 😉 But Gina's addition of smoothing (as described in her 2018 J. of Hydrology paper) is a really elegant and clever approach! I'm excited to see how it affects my results.

GinaO_Neil · ‎02-03-2021

Hi,

Glad to hear you're trying out the WIM!

Which tools are returning the generic 999999 error? I have seen that happen before during Train Random Trees and have traced it back to the composite bands tool that runs on the backend. You can try opening the composite bands tool, navigating to its environments, and changing the parallel processing factor to 0. Then retry Train Random Trees.

All of the smoothing methods use the Raster to NumPy Array tool before performing operations, so the input raster size constraints are the same as those listed for that core software tool, and they vary with the RAM available. My machine has 64 GB of RAM, and I received the same memory error when clipping my DEM to the extent of a HUC12 watershed (1 m resolution, 20694 columns and 17258 rows). Ideally, the tool should be able to process data at the HUC12 scale, so I will look into releasing improvements to WIM to make that possible. In the meantime, I would recommend clipping to sizes closer to a HUC16 and working in chunks.

I would also recommend using WIM's smoothing tool rather than Focal Statistics to smooth the DEM, specifically so that you can apply the Perona Malik method. Perona Malik has been shown to significantly improve wetland predictions and to outperform other standard smoothing methods. Those results are presented in this paper: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2019WR024784
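If it helps with planning the chunks, here is a small sketch for working out how many splits per axis bring each tile under a target cell count. The target is an assumption: it uses the 5000 x 8500 tile that ran successfully earlier in this thread, which was on a different machine, so treat it as a starting point rather than a limit of the tools. The `splits_per_axis` helper is hypothetical.

```python
import math


def splits_per_axis(n_cols, n_rows, max_cells_per_tile):
    """Smallest n such that an n x n grid of tiles keeps every tile
    at or below max_cells_per_tile cells."""
    n = 1
    while math.ceil(n_cols / n) * math.ceil(n_rows / n) > max_cells_per_tile:
        n += 1
    return n


# Assumed target: the 5000 x 8500 tile reported as working in this thread.
target_cells = 5_000 * 8_500

# HUC12 example from this post: 20694 columns x 17258 rows at 1 m.
n = splits_per_axis(20_694, 17_258, target_cells)
tile_cols = math.ceil(20_694 / n)
tile_rows = math.ceil(17_258 / n)
print(n, tile_cols, tile_rows, tile_cols * tile_rows)
```

The right target depends on the available RAM and on how many arrays a given tool holds in memory at once, so it is worth testing one tile before processing the rest.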

Also note that when you get to the Train Random Trees and Assess Accuracy steps, all rasters should have the same extent as your ground truth raster (i.e., use this raster for the snap raster, extent, and mask environment settings), since training and accuracy assessment can only be done on cells where the ground truth data is known. Setting these constraints automatically for the user is a fix coming in a new version of WIM by the end of this week.

Hope that helps as a temporary fix. I will be working on a better one!

SusanGale1 · ‎02-04-2021

Thanks so much, Gina! The details on size/memory and the reason for the constraints (the NumPy tool) are super helpful, as is the journal article. Looking forward to trying out the new version of WIM too, and I second anninarupe17's comment that you are a rockstar!

GinaO_Neil · ‎02-03-2021

Please see my response to SusanGale1 in this thread. I think that addresses your questions. It sounds like you have already implemented the workaround of chunking up the data; I will be working on a better solution for that. Also, yes, 2 m resolution is great for WIM. I just need to work on the raster extent limitations 🙂