
Insert Cursor to Table In Memory

10-31-2013 12:10 PM
ClintonCooper1
Deactivated User
I am trying to take a large dataset and import (export or append) it into an "in memory" table where I can then run my calculations. I need to import three fields (SOS_VOTERID, FEAT_SEQ, and YEAR_Of_BIRTH); in the code below I am only working with two. I believe I need to run a search cursor on my original table and then an insert cursor to load the data into my new table. I am running into an error that says:

Runtime error
Traceback (most recent call last):
  File "<string>", line 23, in <module>
TypeError: sequence size must match size of the row


import arcpy, collections
from arcpy import env

env.workspace = r"C:\Users\cc1\Desktop\NEW.gdb\WAYNE"
table = "WAYNE"

table2 = arcpy.CreateTable_management("in_memory", "WAYNE")
arcpy.AddField_management(table2, "SOS_VOTERID", "TEXT", field_length=25)
arcpy.AddField_management(table2, "FEAT_SEQ", "LONG")

newList = {row[0]: row[1] for row in arcpy.da.SearchCursor(table, ["SOS_VOTERID", "FEAT_SEQ"])}

tbl = arcpy.ListTables("*")
for table in tbl:
    fieldList = arcpy.ListFields(table)
    for field in fieldList:
        newList.append([table, field.name])  # this populates the new list with table and field to directly insert to new table

with arcpy.da.InsertCursor(table2, ['SOS_VOTERID', 'FEAT_SEQ']) as insert:
    for f in newList:
        insert.insertRow(f)
del insert


Anyone know where I am going wrong?  Thanks!

Clinton
11 Replies
ChrisSnyder
Honored Contributor
I think you would probably want some code that looked more like this:

http://forums.arcgis.com/threads/66434-A-better-way-to-run-large-Append-Merge-jobs?p=230850&viewfull...

No need to store the data in a list or dictionary. Just read it via the search cursor and then write it directly to the in_memory table.
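My guess on the error itself: looping over a dictionary yields only its keys, so each f in your loop is a single SOS_VOTERID string, and insertRow() cannot match one string against the cursor's two fields. A plain-Python illustration (the voter ID below is made up):

newList = {"OH0012345678": 42}  # the shape your comprehension builds: {SOS_VOTERID: FEAT_SEQ}
for f in newList:
    print(f)                    # just the key: OH0012345678
for f in newList.items():
    print(f)                    # the full pair: ('OH0012345678', 42)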

Nice dictionary comprehension BTW! I forgot those are supported as of Python 2.7... I learned something today.
ClintonCooper1
Deactivated User
I think you would probably want some code that looked more like this:

http://forums.arcgis.com/threads/66434-A-better-way-to-run-large-Append-Merge-jobs?p=230850&viewfull...

No need to store the data in a list or dictionary. Just read it via the search cursor and then write it directly to the in_memory table.

Nice dictionary comprehension BTW! I forgot those are supported as of Python 2.7... I learned something today.


I kind of understand where you are coming from, but not fully. I am having trouble figuring out where to put the line:

arcpy.CreateTable_management("in_memory", "WAYNE")

within the code. Could you give a little more explanation of how to write this code? Thanks!!
ChrisSnyder
Honored Contributor
It'd look something like this, I think - see how, for each loop of the search cursor, the data (the searchRow tuple) gets fed directly to insertRow?

import arcpy
inputTbl = r"C:\Users\cc1\Desktop\NEW.gdb\WAYNE"
outputTbl = str(arcpy.CreateTable_management("in_memory", "WAYNE").getOutput(0))
arcpy.AddField_management(outputTbl, "SOS_VOTERID","TEXT", field_length=25)
arcpy.AddField_management(outputTbl, "FEAT_SEQ","LONG")
insertRows = arcpy.da.InsertCursor(outputTbl, ["SOS_VOTERID","FEAT_SEQ"])
searchRows = arcpy.da.SearchCursor(inputTbl, ["SOS_VOTERID","FEAT_SEQ"])
for searchRow in searchRows:
    insertRows.insertRow(searchRow)
del searchRow, searchRows, insertRows
ClintonCooper1
Deactivated User
It'd look something like this, I think - see how, for each loop of the search cursor, the data (the searchRow tuple) gets fed directly to insertRow?

import arcpy
inputTbl = r"C:\Users\cc1\Desktop\NEW.gdb\WAYNE"
outputTbl = str(arcpy.CreateTable_management("in_memory", "WAYNE").getOutput(0))
arcpy.AddField_management(outputTbl, "SOS_VOTERID","TEXT", field_length=25)
arcpy.AddField_management(outputTbl, "FEAT_SEQ","LONG")
insertRows = arcpy.da.InsertCursor(outputTbl, ["SOS_VOTERID","FEAT_SEQ"])
searchRows = arcpy.da.SearchCursor(inputTbl, ["SOS_VOTERID","FEAT_SEQ"])
for searchRow in searchRows:
    insertRows.insertRow(searchRow)
del searchRow, searchRows, insertRows


Thank you so much!  This worked great!

I do have one more question. To get the benefits of in-memory performance, once I create a table in memory, will all subsequent calculations/operations on that table also run in memory, or do I need to keep directing each operation to the in_memory workspace? Thanks!
by Anonymous User
Not applicable

outputTbl = str(arcpy.CreateTable_management("in_memory", "WAYNE").getOutput(0))


That is a nice trick.
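For anyone else reading: the geoprocessing tools return a Result object rather than a plain path, and getOutput(0) (or wrapping the result in str()) pulls out the catalog path string that the da cursors expect. A quick way to see it, reusing the table from this thread:

import arcpy

result = arcpy.CreateTable_management("in_memory", "WAYNE")  # a Result object, not a path
print(result.getOutput(0))  # the catalog path string, e.g. in_memory\WAYNE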
ClintonCooper1
Deactivated User
I still have this one last question. To get the benefits of in-memory performance, once I create a table in memory, will all subsequent calculations/operations on that table also run in memory, or do I need to keep directing each operation to the in_memory workspace? Thanks!
ChrisSnyder
Honored Contributor
It depends...

Some operations (such as AddField or CalculateField) can operate directly on the in_memory table. Other operations (such as Frequency or SummaryStatistics) create a new output table that, depending on your specifications, may be written to either the in_memory workspace or to disk.

That said, reading input from in_memory and then writing the output to in_memory would of course be the fastest option you have. The only worry is whether you have enough available RAM to complete the process(es)...
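For example (an untested sketch reusing the in_memory table from this thread; the output name is just an example):

import arcpy

# CalculateField operates directly on the existing in_memory table
arcpy.CalculateField_management(r"in_memory\WAYNE", "FEAT_SEQ", "!FEAT_SEQ!", "PYTHON")

# Frequency creates a brand new table - pointing its output at in_memory keeps it off disk
arcpy.Frequency_analysis(r"in_memory\WAYNE", r"in_memory\WAYNE_freq", ["FEAT_SEQ"])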
ClintonCooper1
Deactivated User
I have plenty of memory, but I am not seeing any increase in performance 😞  My workflow consists of creating a new table in memory, adding the fields and indexes, appending the data in, and then performing about seven field calculations using data access cursors. I have run the process both in memory and on disk, and both take around 23 hours to complete. I have read that once your dataset gets to a certain size, you lose the in-memory performance gains; I guess I am seeing that with my dataset. Thanks for all your help!

Clinton
JamesCrandall
MVP Alum
I have plenty of memory, but I am not seeing any increase in performance 😞  My workflow consists of creating a new table in memory, adding the fields and indexes, appending the data in, and then performing about seven field calculations using data access cursors. I have run the process both in memory and on disk, and both take around 23 hours to complete. I have read that once your dataset gets to a certain size, you lose the in-memory performance gains; I guess I am seeing that with my dataset. Thanks for all your help!

Clinton


This may or may not help (it may just confuse the approach), but I wanted to mention that in_memory is definitely a performance gainer as long as the RAM is available. The other performance gain comes from performing the tabular operations on non-Esri-specific objects.

So, we heavily use pandas DataFrame objects to join/merge tabular data as well as populate new fields with math/statistics operations. We simply convert back and forth between Esri and pandas objects using arcpy.da.NumPyArrayToFeatureClass/NumPyArrayToTable and FeatureClassToNumPyArray/TableToNumPyArray.
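A rough sketch of that round trip (untested; the input path and field names are borrowed from this thread, and the output name is just an example):

import arcpy
import numpy
import pandas

tbl = r"C:\Users\cc1\Desktop\NEW.gdb\WAYNE"

# Esri table -> structured NumPy array -> pandas DataFrame
arr = arcpy.da.TableToNumPyArray(tbl, ["SOS_VOTERID", "FEAT_SEQ"])
df = pandas.DataFrame(arr)

# ...joins/merges/field math on the DataFrame go here...

# DataFrame -> structured array -> new table (string columns may need an explicit dtype first)
out = numpy.array(df.to_records(index=False))
arcpy.da.NumPyArrayToTable(out, r"in_memory\WAYNE_pandas")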

Just something to think about as it may provide the performance you require.