POST
I have plenty of memory, but I am not seeing any increase in performance 😞 My workflow consists of creating a new table in memory, adding the fields and indexes, appending the data in, and then performing about 7 field calculations using data cursors. I have run the process both in memory and regular, and they both take around 23 hours to complete. I have seen that once your dataset gets to a certain size, you lose your in-memory performance gains. I guess that I am seeing that with my data set. Thanks for all your help! Clinton
Posted 11-04-2013 04:10 PM

POST
I still have this one last question. To get the benefits of in-memory performance, once I create a table in memory, will all the subsequent calculations/operations done on that table be in memory as well, or do I need to continue to call the in-memory operation? Thanks!
Posted 11-04-2013 01:03 PM

POST
It'd look something like this, I think - see how for each loop of the search cursor, the data (the searchRow tuple) gets fed directly to the insertRow?

import arcpy

inputTbl = r"C:\Users\cc1\Desktop\NEW.gdb\WAYNE"

# Create the output table in the in_memory workspace and add the two fields
outputTbl = str(arcpy.CreateTable_management("in_memory", "WAYNE").getOutput(0))
arcpy.AddField_management(outputTbl, "SOS_VOTERID", "TEXT", field_length=25)
arcpy.AddField_management(outputTbl, "FEAT_SEQ", "LONG")

insertRows = arcpy.da.InsertCursor(outputTbl, ["SOS_VOTERID", "FEAT_SEQ"])
searchRows = arcpy.da.SearchCursor(inputTbl, ["SOS_VOTERID", "FEAT_SEQ"])

# Each searchRow is already a tuple in field order, so it can be passed
# straight to insertRow - no intermediate list or dictionary needed
for searchRow in searchRows:
    insertRows.insertRow(searchRow)
del searchRow, searchRows, insertRows

Thank you so much! This worked great! I do have one more question. To get the benefits of in-memory performance, once I create a table in memory, will all the subsequent calculations/operations done on that table be in memory as well, or do I need to continue to call the in-memory operation? Thanks!
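For what it's worth, a minimal sketch of reusing the in_memory table, assuming the outputTbl variable from the snippet above; the index name, the placeholder calculation, and the WAYNE_OUT output name are all hypothetical, not from the thread:

# Hypothetical follow-on steps, reusing outputTbl from the snippet above.
# As long as the path stays under the in_memory workspace, tools and
# cursors that reference it operate on the in-memory copy.
arcpy.AddIndex_management(outputTbl, "SOS_VOTERID", "voterid_idx")

with arcpy.da.UpdateCursor(outputTbl, ["FEAT_SEQ"]) as rows:
    for row in rows:
        row[0] = row[0] or 0  # placeholder calculation only
        rows.updateRow(row)

# Persist the result back to disk when finished
arcpy.TableToTable_conversion(outputTbl, r"C:\Users\cc1\Desktop\NEW.gdb", "WAYNE_OUT")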
Posted 11-02-2013 04:33 AM

POST
I think you would probably want some code that looked more like this: http://forums.arcgis.com/threads/66434-A-better-way-to-run-large-Append-Merge-jobs?p=230850&viewfull=1#post230850 No need to store the data in a list or dictionary. Just read it via the search cursor and then write it directly to the in_memory table. Nice dictionary comprehension BTW! Forgot that was supported now in v2.7... I learned something today. I kind of understand where you are coming from, but not fully. I am having issues with where to put the line arcpy.CreateTable_management("in_memory", "WAYNE") within the code. Could you give a little more explanation of how to write this code? Thanks!!
Posted 10-31-2013 04:14 PM

POST
I am trying to take a large dataset and import (export or append) it into an "in memory" table where I can then run the calculations. I need to import three fields (SOS_VOTERID, FEAT_SEQ and YEAR_Of_BIRTH); for my code below, I am just working with 2. I believe I need to run a search cursor on my original table, and then run an insert cursor to import the data into my new table. I am running into an error that says:

Runtime error
Traceback (most recent call last):
  File "<string>", line 23, in <module>
TypeError: sequence size must match size of the row

import arcpy, collections
from arcpy import env

env.workspace = r"C:\Users\cc1\Desktop\NEW.gdb\WAYNE"
table = "WAYNE"
table2 = arcpy.CreateTable_management("in_memory", "WAYNE")
arcpy.AddField_management(table2, "SOS_VOTERID", "TEXT", field_length=25)
arcpy.AddField_management(table2, "FEAT_SEQ", "LONG")
newList = {row[0]: row[1] for row in arcpy.da.SearchCursor(table, ["SOS_VOTERID", "FEAT_SEQ"])}
tbl = arcpy.ListTables("*")
for table in tbl:
    fieldList = arcpy.ListFields(table)
    for field in fieldList:
        newList.append([table, field.name])  # this populates the new list with table and field to directly insert to new table
with arcpy.da.InsertCursor(table2, ['SOS_VOTERID', 'FEAT_SEQ']) as insert:
    for f in newList:
        insert.insertRow(f)
del insert

Anyone know where I am going wrong? Thanks! Clinton
Posted 10-31-2013 12:10 PM

POST
In addition, the ESRI Summary Statistics tool (and the Frequency tool) gives incorrect results in the output table when the case field values are either NULL or 0. That was a bug that got fixed a long time ago, but it seems to be back (at least in v10.1 SP1). As a solution to little issues like this, I too have a little collection of Python-based code/tools I have written over the years that are either bug workarounds or major performance enhancements for some of the out-of-the-box geoprocessing tools. Based on your solution, what can I do to fix my code, as the times are way too high given what you have found?
Posted 10-30-2013 02:46 PM

POST
Added this line:

values = [row[0] for row in arcpy.da.SearchCursor(table, ("FULL_ADDRESS_NAME"))]

and it worked with this code:

import arcpy
from arcpy import env

env.workspace = r"C:\Users\ccooper\Desktop\DATA.gdb\WAYNE"
table = "WAYNE"
uniqueValues = {}
# Pull every FULL_ADDRESS_NAME so occurrences can be counted later
values = [row[0] for row in arcpy.da.SearchCursor(table, ("FULL_ADDRESS_NAME"))]
newID = 0

# First pass: assign a sequential ID to each distinct address
with arcpy.da.UpdateCursor(table, ["FULL_ADDRESS_NAME", "FEAT_SEQ"]) as updateRows:
    for row in updateRows:
        nameValue = row[0]
        if nameValue in uniqueValues:
            row[1] = uniqueValues[nameValue]
        else:
            newID += 1
            uniqueValues[nameValue] = newID
            row[1] = newID
        updateRows.updateRow(row)
del row, updateRows

# Count how many times each distinct address occurs
uniqueCount = {}
for val in uniqueValues:
    uniqueCount[val] = values.count(val)

# Second pass: write the counts back to the FREQ_NAME field
with arcpy.da.UpdateCursor(table, ["FULL_ADDRESS_NAME", "FREQ_NAME"]) as updateRows:
    for row in updateRows:
        nameValue = row[0]
        row[1] = uniqueCount[nameValue]
        updateRows.updateRow(row)
del row, updateRows

On my test data (1/100th the size of my main file), it ran for 85 seconds... much slower than my other method. Do you see any inefficiencies in my code that I could change to increase performance?
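One likely inefficiency, offered as a hedged aside: values.count(val) rescans the entire values list once per distinct address, which makes the counting step quadratic. A minimal sketch of a single-pass alternative using collections.Counter (available in Python 2.7, which the thread mentions; table and field names are reused from the snippet above):

import collections

# Count every occurrence in one pass instead of calling
# values.count(val) once per distinct value
uniqueCount = collections.Counter(values)

with arcpy.da.UpdateCursor(table, ["FULL_ADDRESS_NAME", "FREQ_NAME"]) as updateRows:
    for row in updateRows:
        row[1] = uniqueCount[row[0]]
        updateRows.updateRow(row)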
Posted 10-30-2013 02:01 PM

POST
import arcpy
from arcpy import env

env.workspace = r"C:\Users\cc1\Desktop\NEW.gdb\WAYNE"
table = "WAYNE"
uniqueValues = {}
values = []
newID = 1

# First pass: record each value and assign a sequential ID per distinct value
with arcpy.da.UpdateCursor(table, ["FULL_ADDRESS_NAME", "FEAT_SEQ"]) as updateRows:
    for row in updateRows:
        nameValue = row[0]
        values.append(nameValue)
        if nameValue in uniqueValues:
            row[1] = uniqueValues[nameValue]
        else:
            newID += 1
            uniqueValues[nameValue] = newID
            row[1] = newID
        updateRows.updateRow(row)
del row, updateRows

# Count occurrences of each distinct value
uniqueCount = {}
for val in uniqueValues:
    uniqueCount[val] = values.count(val)

# Write the counts into the FREQ field
with arcpy.da.UpdateCursor(table, ["FULL_ADDRESS_NAME", "FREQ"]) as updateRows:
    for row in updateRows:
        nameValue = row[0]
        row[1] = uniqueCount[nameValue]
        updateRows.updateRow(row)
del row, updateRows

I ran the code, and it is writing a zero into the FREQ field for all values?
Posted 10-30-2013 12:42 PM

POST
If you really want to do it all in Python, as a first step add a field called 'FREQ' to the table. Then run this modified version of the above code:

import arcpy
from arcpy import env

env.workspace = r"C:\Users\cc1\Desktop\NEW.gdb\WAYNE"
table = "WAYNE"
uniqueValues = {}
values = []
newID = 1

# First pass: note every value and hand each distinct value a sequential ID
with arcpy.da.UpdateCursor(table, ["FULL_ADDRESS_NAME", "FEAT_SEQ"]) as updateRows:
    for row in updateRows:
        nameValue = row[0]
        values.append(nameValue)
        if nameValue in uniqueValues:
            row[1] = uniqueValues[nameValue]
        else:
            newID += 1
            uniqueValues[nameValue] = newID
            row[1] = newID
        updateRows.updateRow(row)
del row, updateRows

# Count occurrences of each distinct value
uniqueCount = {}
for val in uniqueValues:
    uniqueCount[val] = values.count(val)

# Second pass: write the counts into the new FREQ field
with arcpy.da.UpdateCursor(table, ["FULL_ADDRESS_NAME", "FREQ"]) as updateRows:
    for row in updateRows:
        nameValue = row[0]
        row[1] = uniqueCount[nameValue]
        updateRows.updateRow(row)
del row, updateRows

The arcpy.da.UpdateCursor is definitely faster than the Field Calculator tool, at least in my testing (if you were going to summarize -> join -> calculate field -> remove join); I can't say it will be faster than the summary stats alone.

Thanks Doug for the help! On my main file it takes about 10 minutes to run the summarize tool and another 24 to update it, for a total of 34 minutes. I will run this script and report back if it is faster. Again, thanks for the help. What good resources would you suggest to help with my learning curve in Python for ArcGIS? I do have the Python Scripting for ArcGIS book, but it only goes over the very basics, and I am now starting to get into deeper code writing. Thanks again!!
Posted 10-30-2013 12:28 PM

POST
Summary Statistics is written in C++. It will be significantly faster than any Python solution. Even with having to add/join the data back into the original table? What I have been doing is running the Frequency tool (which creates a new table) and then using a search and update cursor to get the data back into the original file. I guess I am wondering if it would be faster to run a cursor on the original file that calculates the "frequency" and updates the data directly in the original table?
Posted 10-30-2013 08:54 AM

POST
The Summary Statistics tool works exactly like Frequency if you specify the inputs correctly, and it only requires ArcGIS Basic (ArcEditor). The Python syntax is available at the link. Well, I am looking for a Python data-cursor solution for better performance.
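For reference, a minimal sketch of what the Summary Statistics call might look like from Python for a Frequency-style count; the table path and FULL_ADDRESS_NAME field are borrowed from elsewhere in the thread, and the in_memory output name is hypothetical:

import arcpy

# COUNT of a field, grouped by the same field as the case field,
# reproduces what the Frequency tool reports
arcpy.Statistics_analysis(
    r"C:\Users\cc1\Desktop\NEW.gdb\WAYNE",   # input table
    r"in_memory\WAYNE_freq",                 # output summary table (hypothetical name)
    [["FULL_ADDRESS_NAME", "COUNT"]],        # statistics field(s)
    "FULL_ADDRESS_NAME")                     # case field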
Posted 10-30-2013 08:45 AM

POST
This worked great! Thank you! Now if I wanted to find all the duplicates and sum them up, and print that value for each record (basically like the Frequency tool), what would I need to change to get that to work? Thanks again! Clinton
Posted 10-30-2013 06:59 AM

POST
I am trying to create a Python script that will be a substitute for the Find Identical tool. I ran it last night on a large dataset, and it is taking 10+ hours to run. I believe I can run a search cursor and update cursor that will be light years ahead in performance. So far, I have gotten this far with my script:

import arcpy
from arcpy import env

env.workspace = r"C:\Users\cc1\Desktop\NEW.gdb\WAYNE"
table = "WAYNE"
list = []
with arcpy.da.SearchCursor(table, ["FULL_ADDRESS_NAME"]) as cursor:
    for row in cursor:
        list.append(row[0])
del row, cursor

with arcpy.da.UpdateCursor(table, ["FULL_ADDRESS_NAME", "FEAT_SEQ"]) as updateRows:
    for updateRow in updateRows:
        nameValue = updateRow[0]
        if nameValue in list:
            updateRow[1] = lutDict[nameValue]
            updateRows.updateRow(updateRow)
del updateRow, updateRows

To be specific about what I am doing, I need to search through a field (that has duplicate values) and return a new value that is a unique number for each different set of duplicates. For example:

search ID    new Unique ID
aaa          1
aaa          1
bbb          2
ccc          3
ccc          3
aaa          1
ddd          4

So there would be an increment, but only when the value in the search ID field is new, and each successive row with the same search ID would get the same number. Any thoughts on how to accomplish this? Thanks in advance!!
Posted 10-30-2013 05:28 AM

POST
An insert cursor would give you the best performance for this kind of operation. Can you post the code where you aren't getting the performance you require? Sorry, let me explain: I created an example using field mappings. It worked great and had great performance. My test file is 70,000 records (1/100th of my larger set), and it took around 10 seconds to export. I also tried the append method, and I have actually decided to go with that route, as 1. the code is simpler (and allows me to easily set the field length), 2. performance is exactly the same, and 3. it allows me to take the two files that my data comes in and easily merge them together into one. You were saying an insert cursor would still be faster than both these methods? Do you have an example that I could look at to play around with? Thanks!
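A minimal sketch of the search-cursor-into-insert-cursor pattern being suggested here, with placeholder paths and field names rather than anything from this post (the worked version for this thread's data appears in the 11-02-2013 post above):

import arcpy

# Hypothetical paths and fields, for illustration only
src = r"C:\data\source.gdb\src_table"
dst = r"C:\data\target.gdb\dst_table"
fields = ["FIELD_A", "FIELD_B"]

# Read rows from the source and write them straight into the target;
# no intermediate list, dictionary, or field mappings are needed
with arcpy.da.InsertCursor(dst, fields) as ins:
    for row in arcpy.da.SearchCursor(src, fields):
        ins.insertRow(row)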
Posted 10-29-2013 05:52 AM