I am working on a custom, Python/arcpy-based QAQC process for some of our corporate data. I am using ArcGIS Desktop 10.3 accessed via Citrix. The feature classes being tested are in a file geodatabase, with intermediate data being stored in a scratch geodatabase.
Among the checks the tool will perform is a check for overlapping features within an input feature class. My current workflow for this check is:
On small and medium-sized feature classes this works fine. However, I am testing it on our largest dataset, a polyline (contour) feature class with over 3 million records and over a billion vertices in total. The intersect tool uses tiling when run on this dataset, but it takes a long time. As in, after four hours the Intersect tool is only 20% complete. I would love to get the processing time down to a minimum. I've read this help doc about geoprocessing with large datasets, and I've tried using the Dice tool to split the lines up into features with fewer vertices. The process is still extremely slow. Our Citrix servers are temporarily unavailable overnight for maintenance, and although I tried using a "batch" version of the software that supposedly will allow processes to run overnight, when I arrived this morning the application had closed without finishing successfully.
Any ideas? Will topology work faster? Should I go for my own custom tiling scheme? Thanks for any help you can provide.
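To make the custom tiling idea concrete, here is a minimal sketch of how the tile extents themselves could be generated in plain Python. The function name `make_tiles` and its parameters are hypothetical, and the arcpy step mentioned in the comments is an assumption that is not run here. The optional `overlap` pads each tile so that lines crossing a tile edge are seen by both neighbouring tiles.

```python
def make_tiles(xmin, ymin, xmax, ymax, rows, cols, overlap=0.0):
    """Split an extent into rows x cols rectangular tiles.

    Returns a list of (xmin, ymin, xmax, ymax) tuples, ordered row by row
    starting from the lower-left corner. `overlap` is added to every side
    of every tile.
    """
    width = (xmax - xmin) / float(cols)
    height = (ymax - ymin) / float(rows)
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append((
                xmin + c * width - overlap,
                ymin + r * height - overlap,
                xmin + (c + 1) * width + overlap,
                ymin + (r + 1) * height + overlap,
            ))
    return tiles


if __name__ == "__main__":
    # Hypothetical extent in map units; substitute the real dataset extent.
    for tile in make_tiles(0, 0, 100, 50, rows=2, cols=4):
        # Assumption: each tuple could be turned into a processing extent
        # (for example via arcpy.env.extent) before running the overlap
        # check on that tile only. That step is not shown or tested here.
        print(tile)
```

If tiles are padded with `overlap`, the same overlap error can be reported by two neighbouring tiles, so the per-tile results would need to be de-duplicated before they are merged.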
We don't have nearly as many polygons as you do, but we have successfully used topology to find overlaps, gaps, etc. We have three feature datasets involved, with about (guessing) 15 different rules. Although we don't have a lot of polygons, the vertices are quite dense. We also have many exceptions to the rules, and topology allows us to mark those.
At least with our topology checks, the process is done in an ArcEdit session (versioned). I can't remember whether this part can be automated with Python, although the setting up of the rules was. Our geographic extent is large (Alaska, including the Aleutians), so the "cleaning" of the topology can take hours, but it can be done in smaller chunks to shorten the time between saves. The areas you have not checked remain as "dirty" areas.
The errors you find can be sorted and filtered, including by the current extent, and you can double-click to zoom into the area. If I remember correctly, you can even reselect them and export/save them to a new feature class if needed.
With your millions of polys, this might not be a practical solution, but it does work. For me, that's saying something since I came from the coverage topology world, and getting the topology to work for us was required for us to move (which we did quite a while ago now).
So, faster? Maybe not. Ability to break into smaller sessions so the overnight offline isn't an issue? Maybe.
Thanks! It is a line feature class. That topology approach sounds good. I may also be overthinking it. I do enjoy the problem-solving aspect but there is a point of diminishing returns after a while.
Thanks Jayanta! The Arcpy cafe snippet looks very promising. I'll test it out today if I can get a spare moment.
For what it's worth, the original process using a simple Intersect ran successfully last night. A mere 15 hours!
Another possible workaround is to install ArcGIS Pro 1.3. Geoprocessing operations are significantly faster in Pro than in ArcMap. Test your workflows there to determine whether there's a reduction in total geoprocessing time.
Thanks Robert. Is that because Pro is a multi-threaded application? My agency is doing a pilot implementation of ArcGIS Pro. Once they make it available to all staff I'll take a crack at it. Should I expect that all my custom script tools that I've developed in ArcGIS Desktop will work in Pro? I know that Pro uses Python 3.x.
I had this first part written last night when my iPad ate it, so it's a bit out of place...
I know what you mean..sometimes saving time in the long run takes too much time when you're just trying to get work done.
I'm not sure why I assumed they were polys. I'm not sure how exactly you expect your features to match, but if you have access to Pro or want to give that a try, there is a Feature Compare tool (Data Management toolbox) that sounds interesting. I have not tried it, but I'm sure someone here can give feedback on it. That is more for when the features are expected to be the same, though, not just overlap.
If you don't want to do it in Pro, I wonder if a spatial join might give you some output to test.
Here are some additional links you can look at if the above doesn't work and you want more blogs/resources on the subject:
Again, I think the other suggestions that came from Jayanta and Robert are probably better, but I wanted to finish this train of thought from last night.
Micah - partially because it's a multi-threaded application, and partially because all the GP tools scripting was rewritten for Pro. ArcMap GP tools scripting used a variety of programming languages, so it was "clunky" for lack of a better term. There is a tool in ArcMap 10.4 as well as Pro 1.x called "Analyze Tools for Pro" that will look at your Python tools from ArcMap and custom tools/toolboxes written with Python 2.7. From there, it will output a text file telling you what won't work in Python 3.4.1. This link should help with the transition: https://pro.arcgis.com/en/pro-app/arcpy/get-started/python-migration-for-arcgis-pro.htm
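As a rough illustration of the kind of thing that differs between Python 2.7 and 3.x, here is a small sketch written so that it should behave the same under both. It is plain Python with no arcpy calls, and the function name `describe_counts` and the sample data are made up for the example; it is not a substitute for running the analyzer tool on real scripts.

```python
def describe_counts(counts):
    """Return report lines; intended to behave the same on Python 2.7 and 3.x."""
    total = sum(counts.values())
    lines = []
    # sorted(d.items()) works on both versions; dict.iteritems() does not
    # exist in 3.x.
    for name, n in sorted(counts.items()):
        # float() forces true division; in 2.7, int / int would truncate.
        share = n / float(total)
        lines.append("{0}: {1} ({2:.1%})".format(name, n, share))
    return lines


if __name__ == "__main__":
    # Hypothetical QAQC summary; the numbers are invented for the example.
    for line in describe_counts({"overlaps": 3, "clean": 9}):
        # print(...) with a single argument works on both versions, while
        # the 2.7-only statement form `print line` is a syntax error in 3.x.
        print(line)
```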