append vs merge: which is faster?

09-11-2012 09:24 AM
KevinBell
Occasional Contributor III
Does anyone have any idea whether Append or Merge is faster?

I have rasters that I'm converting to points and then loading into a fgdb.  Each raster is 1000 x 1000, so a million points.  The first append takes 5 minutes, but the time grows with each iteration; by the 95th iteration each append is taking almost 20 minutes.  I stopped it, ran a compress, and restarted, but the time didn't change.

I could rewrite the process to generate a fgdb for each million points, then merge later, but would that be faster? 

...maybe a long shot, but I figured I'd ask!

Accepted Solutions
MathewCoyle
Frequent Contributor
KevinBell wrote:
That's exactly what I'm doing.  I use numpy to push the cell values of 24 rasters into a dictionary and then populate an in_memory point FC, which is then appended.  I end up with a point w/ the values of the 24 rasters.

Everything is really fast up until the append.  The in_memory point fc takes no time.


So you have a preexisting point feature class that you want to add the extracted raster data to? Maybe try an insert cursor? You could load that directly from your dictionary; there's no need to create a temporary fc at all.

7 Replies
MathewCoyle
Frequent Contributor
I would imagine the one merge would be faster.

Do you need to keep the output point feature classes from the rasters, or are you only interested in the final output of the merged/appended feature class?

If you do not need to keep the intermediate data, I would export them to an in_memory workspace, which should speed processing time regardless of the tool you choose.

If you want to get really crazy efficient, you can extract the x, y, z, m values for each raster cell and store them in a csv or dictionary and create a point layer out of that. You'd have to manage your memory pretty well if you are processing 95 x 1 million records.
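The cell-to-dictionary idea could be sketched like this. A toy 3x3 array stands in for a raster band, and the cell size and upper-left corner are made-up values; in ArcGIS the array would come from arcpy.RasterToNumPyArray, but plain numpy shows the pattern:

```python
import numpy as np

# Toy stand-in for one 3x3 raster band; real code would get the array via
# arcpy.RasterToNumPyArray. Cell size and upper-left corner are hypothetical.
cell = 10.0
xmin, ymax = 1000.0, 2000.0
band = np.arange(9, dtype=float).reshape(3, 3)

# Map each cell-center coordinate to its value.
values = {}
for (row, col), v in np.ndenumerate(band):
    x = xmin + (col + 0.5) * cell
    y = ymax - (row + 0.5) * cell
    values[(x, y)] = v

# For 24 rasters you'd append each band's value to a list per (x, y) key
# instead of storing a single float.
```

At a million cells per raster a dictionary like this is a few hundred MB per band, hence the memory-management caveat above.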
KevinBell
Occasional Contributor III
That's exactly what I'm doing.  I use numpy to push the cell values of 24 rasters into a dictionary and then populate an in_memory point FC, which is then appended.  I end up with a point w/ the values of the 24 rasters.

Everything is really fast up until the append.  The in_memory point fc takes no time.
MathewCoyle
Frequent Contributor
KevinBell wrote:
That's exactly what I'm doing.  I use numpy to push the cell values of 24 rasters into a dictionary and then populate an in_memory point FC, which is then appended.  I end up with a point w/ the values of the 24 rasters.

Everything is really fast up until the append.  The in_memory point fc takes no time.


So you have a preexisting point feature class that you want to add the extracted raster data to? Maybe try an insert cursor? You could load that directly from your dictionary; there's no need to create a temporary fc at all.
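The pattern being suggested is: one cursor, rows streamed straight from the dictionary, no temporary feature class. A minimal runnable sketch, with a trivial RowSink class standing in for the cursor since arcpy only exists inside ArcGIS (the real call would be along the lines of `arcpy.da.InsertCursor(target_fc, ["SHAPE@XY", "value"])`, used as a context manager):

```python
class RowSink:
    """Stand-in for an insert cursor; just collects the rows it's given."""
    def __init__(self):
        self.rows = []

    def insertRow(self, row):
        self.rows.append(row)

def load_dictionary(in_dict, cursor):
    """Stream every (x, y) -> value pair straight into the target,
    skipping the intermediate in_memory feature class entirely."""
    for (x, y), value in in_dict.items():
        cursor.insertRow(((x, y), value))

cur = RowSink()
load_dictionary({(1005.0, 1995.0): 7.0, (1015.0, 1995.0): 8.0}, cur)
```

The key design point is that the cursor stays open for the whole load, so nothing is rebuilt per batch the way Append rebuilds the spatial index per call.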
KevinBell
Occasional Contributor III
The way the rasters are stored on disk requires looping over them, and I hadn't thought to keep an open insertCursor while the raster values were being slurped up into the dictionary.  Maybe I'll give that a try.

I suppose I'd just need a def loadDictionary(inDict): that would open/close the cursor each time, and that would bypass the entire append. 

Thanks.  I suppose I was thinking too much like ArcToolbox!
ChrisSnyder
Regular Contributor III
Merge just runs Append in a loop... Both tools are slow since they seem to rebuild the spatial index file with each loop of appending. Performance degradation is noticeable if you are merging >2 FCs and/or if they are large FCs.

Using cursors is by far the best way: http://forums.arcgis.com/threads/66434-A-better-way-to-run-large-Append-Merge-jobs
KevinBell
Occasional Contributor III
ChrisSnyder wrote:
Merge just runs Append in a loop... Both tools are slow since they seem to rebuild the spatial index file with each loop of appending. Performance degradation is noticeable if you are merging >2 FCs and/or if they are large FCs.

Using cursors is by far the best way: http://forums.arcgis.com/threads/66434-A-better-way-to-run-large-Append-Merge-jobs


Great stuff thanks!

I just commented out my in_memory temp fc and pointed my dictionary straight at the target and it's taking 5 minutes instead of 20 now : ) 

I think I'll chalk this up to a case of tunnel vision!  The code is very tight, then that Homer Simpson move!  :rolleyes:
ChrisSnyder
Regular Contributor III
One clarification: The cursor method is ONLY faster when using the 10.1 data access cursors... I found that using the "old" cursor model is SLOWER than the Merge or Append tools.

For my particular case, I found, in order of speediness:
1. arcpy (non-data access) cursor method (slowest)
2. Merge tool
3. Append tool (ever so slightly faster than Merge tool)
4. arcpy.da cursor method (fastest BY FAR!)