Hi all,
We get quarterly updates of the CA county parcel feature classes from a vendor. I am currently using Python and an arcpy.da InsertCursor to append all rows (it allows me to avoid any schema locks). However, this process takes a long time (5 hours) because the number of records is very large (13 million rows). I was wondering if there is a way to create some sort of stored procedure on the Oracle side to update the data instead?
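For reference, the current load is basically a search cursor on the vendor feature class feeding an insert cursor on the SDE feature class, roughly like this sketch (the paths and field names below are placeholders):

```python
import arcpy

source_fc = r"C:\data\vendor_parcels.gdb\CA_Parcels"   # vendor delivery (placeholder path)
target_fc = r"C:\connections\prod.sde\GIS.CA_PARCELS"  # enterprise GDB target (placeholder path)
fields = ["SHAPE@", "APN", "COUNTY"]                   # placeholder field list

# Read every row from the vendor feature class and insert it into the SDE target.
with arcpy.da.SearchCursor(source_fc, fields) as s_cur, \
     arcpy.da.InsertCursor(target_fc, fields) as i_cur:
    for row in s_cur:
        i_cur.insertRow(row)
```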
Any thoughts on this would be appreciated.
Thanks,
Alex
Typically when data is inserted or updated in database tables, the DBMS updates the indexes as the operations happen. This is good overall, since stale indexes can cause huge performance hits; however, doing so also slows down bulk loading. What you could do is remove all the indexes, including the spatial index, insert all the new data, and then rebuild the indexes after the data is loaded.
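A rough sketch of that order of operations in arcpy, assuming the target is an enterprise geodatabase feature class (the paths, field names, and index names below are placeholders):

```python
import arcpy

target_fc = r"C:\connections\prod.sde\GIS.CA_PARCELS"  # placeholder SDE feature class

# Drop the attribute index(es) and the spatial index before the bulk load.
arcpy.management.RemoveIndex(target_fc, ["APN_IDX"])   # placeholder index name
arcpy.management.RemoveSpatialIndex(target_fc)

# ... run the existing bulk load here (the cursor loop or the Append tool) ...

# Recreate the indexes once all 13 million rows are in.
arcpy.management.AddIndex(target_fc, ["APN"], "APN_IDX")  # placeholder field/index name
arcpy.management.AddSpatialIndex(target_fc)
```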
Have you tried using the Append geoprocessing tool? In general, the geoprocessing tools are designed to work on sets of data, and they tend to be more efficient than ArcPy cursors when working with large amounts of data.
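For example, something along these lines (paths are placeholders):

```python
import arcpy

source_fc = r"C:\data\vendor_parcels.gdb\CA_Parcels"   # placeholder vendor data
target_fc = r"C:\connections\prod.sde\GIS.CA_PARCELS"  # placeholder SDE target

# NO_TEST skips field-by-field schema checking; use "TEST" if the schemas match exactly.
arcpy.management.Append(inputs=source_fc, target=target_fc, schema_type="NO_TEST")
```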
The issue I have had with the Append tool is that sometimes there are locks established by other users and it fails. I have not encountered any issues with the cursors; it seems like locks don't prevent the cursors from running.
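For what it's worth, one thing I could try is a pre-flight check with arcpy.TestSchemaLock before kicking off the load, something like this sketch (paths are placeholders). It only reports whether an exclusive lock can be acquired at that moment, so it may not catch every kind of lock:

```python
import arcpy

target_fc = r"C:\connections\prod.sde\GIS.CA_PARCELS"  # placeholder SDE target

# TestSchemaLock returns True only if an exclusive lock could be acquired right now.
if arcpy.TestSchemaLock(target_fc):
    arcpy.management.Append(r"C:\data\vendor_parcels.gdb\CA_Parcels", target_fc, "NO_TEST")
else:
    print("Could not get an exclusive lock; other sessions are connected.")
```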
As the adage goes, premature optimization is the root of all evil. Given the information you have provided, I really wonder whether optimizing your data loading is worth it. Sure, 5 hours isn't quick, but the process only runs once a quarter, and how much is really gained from trimming 2 or 3 hours off the process if you burn 20 hours researching and testing alternatives?
Setting aside the larger question of whether optimizing is worth it, there are quite a few factors that affect how quickly the data loads. Are you using ArcMap or ArcGIS Pro? Is the data versioned or not? If versioned, how many versions are there in the GDB, and have they been reconciled recently? Has the GDB been compressed recently?
True. I was just wondering if there was a way. 5 hours is no big deal since the process runs at night, but I thought I would ask to see if there are ways to speed it up.
To answer your questions below:
Are you using ArcMap or ArcGIS Pro? I am using Python 3.5 (ArcGIS Pro)
Is the data versioned or not? Not versioned
If versioned, how many versions are there in the GDB and have they been reconciled recently? None
Has the GDB been compressed recently? Yes, we recently upgraded our SDE to 10.6 and went through a compress
Typically when data is inserted or updated in database tables, the DBMS updates the indexes as the operations happen. This is good overall, since stale indexes can cause huge performance hits; however, doing so also slows down bulk loading. What you could do is remove all the indexes, including the spatial index, insert all the new data, and then rebuild the indexes after the data is loaded.
That could be a good solution. We had issues removing and then rebuilding the spatial index the other day; we could not rebuild the index afterwards. Maybe an issue on the Oracle side.
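If it does turn out to be on the Oracle side, one thing we may try next is issuing the rebuild as SQL straight from Python via arcpy.ArcSDESQLExecute, something like this sketch (the connection file and index name are placeholders):

```python
import arcpy

# Connect with an existing .sde connection file (placeholder path).
egdb_conn = arcpy.ArcSDESQLExecute(r"C:\connections\prod.sde")

# Placeholder index name; the real one would come from the DBA or USER_INDEXES.
result = egdb_conn.execute("ALTER INDEX GIS.PARCELS_SHAPE_IDX REBUILD")
print(result)
```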