
Using 'in_memory' workspace yields incorrect results

01-07-2013 10:37 AM
by Anonymous User
Not applicable
I have come across a strange problem while working with the 'in_memory' workspace. Below is a small snippet where I summarize a table (Statistics_analysis tool) by a field called 'TWPSEC' to get the frequency of each unique value in that field. When I use the 'in_memory' workspace to store the output table I get incorrect results, but when I use a file GDB I get the correct summarized table. The in_memory workspace is also a geodatabase (although a temporary one), so I am not sure why I am getting different results. Has anyone else experienced this issue?

#ws = 'in_memory'
ws = r'G:\Map_Documents\Walkin_requests\Scratch.gdb'

# Summarize
sum_tab2 = p.join(ws,'sum_tab2')
arcpy.Statistics_analysis(copy_tab,sum_tab2,'TWPSEC COUNT','TWPSEC')


Here is what happens when I comment out the 'Scratch.gdb' and use the 'in_memory' workspace:
[Attachment 20509: summary table output when using 'in_memory' (incorrect counts)]

And here is what happens when I comment out the 'in_memory' workspace and use the 'Scratch.gdb'.  I have not altered anything else in the code:
[Attachment 20510: summary table output when using 'Scratch.gdb' (correct counts)]


EDIT: I am using ArcGIS 10.1, SP 1 on Windows XP
8 Replies
by Anonymous User
Not applicable
Original User: mzcoyle

Interesting that the totals still match. Can you post the code that goes into creating the table? I know there have been some bugs related to in_memory statistics in the past but I had thought they had all been resolved. If you copy the in_memory output to disk can you narrow down any differences?
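
Something along these lines might help narrow it down (just a rough sketch; the path and table names are assumptions taken from the snippet in your post and may need adjusting):

# Rough sketch: copy the in_memory summary table to disk and print both
# versions side by side for comparison. Paths/table names are assumptions
# based on the snippet in the original post.
import arcpy
import os.path as p

disk_ws = r'G:\Map_Documents\Walkin_requests\Scratch.gdb'

# Copy the in_memory result next to the file-GDB result
arcpy.CopyRows_management(r'in_memory\sum_tab2', p.join(disk_ws, 'sum_tab2_from_mem'))

# Print the rows of each table for a quick visual diff
for tab in ('sum_tab2', 'sum_tab2_from_mem'):
    print '\n' + tab
    with arcpy.da.SearchCursor(p.join(disk_ws, tab), ['TWPSEC', 'FREQUENCY']) as rows:
        for row in rows:
            print '{0} : {1}'.format(row[0], row[1])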
by Anonymous User
Not applicable
Original User: Caleb1987

Interesting that the totals still match. Can you post the code that goes into creating the table? I know there have been some bugs related to in_memory statistics in the past but I had thought they had all been resolved. If you copy the in_memory output to disk can you narrow down any differences?


Yes, interesting indeed. It doesn't make any sense. I did do a test where I copied all the tables to that same Scratch.gdb and there were no problems, so I am at a loss. What is even stranger is that three tables in total were created in the 'in_memory' workspace, and the other two were correct.

Below is the function where this is happening. I do not change anything except the 'ws' variable, and I get the bad result table when using 'in_memory':

def lastnameinput(Parcels, PdfPath, MXD):
    
    try:
        arcpy.env.qualifiedFieldNames = True
        # Collect input from user
        lastname = raw_input("\nEnter the last name and hit ENTER\n").upper()

        # Select input record
        query = "REALDATA.DEEDHOLDER LIKE '*%s*'" % lastname
        arcpy.SelectLayerByAttribute_management(Parcels, "NEW_SELECTION", query)
        result = int(arcpy.GetCount_management(Parcels).getOutput(0))

        # Return message to user
        print "Initial Result: %s" % result

        # set up twpname to avoid crash
        if result == 1:
            twpname = lastname
           
        
        # Determine if more than one result was returned
        if result > 1:

            ws = 'in_memory'  # This creates ONE bad table
##            ws = r'G:\Map_Documents\Walkin_requests\Scratch.gdb'
##            if arcpy.Exists(ws):
##                arcpy.Delete_management(ws)
##            arcpy.CreateFileGDB_management(r'G:\Map_Documents\Walkin_requests','Scratch.gdb')
            arcpy.Statistics_analysis(Parcels,p.join(ws,'sum_tab'),[['PARCEL.PID', 'COUNT']],'REALDATA.DEEDHOLDER')
            sum_tab = p.join(ws,'sum_tab')
            
            # Create name dictionary
            with arcpy.da.SearchCursor(sum_tab,['OBJECTID','REALDATA_DEEDHOLDER','FREQUENCY']) as rows:
                namedict = dict([(row[0],[row[1], row[2]]) for row in rows])
                
            for key,value in namedict.iteritems():
                print '{0} : {1}, Count={2}'.format(key, str(value[0]),
                                                   str(value[-1]))  # print menu

            #  Collect input from user from Name dictionary
            idx = int(raw_input('Enter number for name\n'))
            fin = "'%s'" %str(namedict[idx][0])  
          
            query = "\"REALDATA.DEEDHOLDER\" = %s" %fin # already has ''
            arcpy.SelectLayerByAttribute_management(Parcels,"NEW_SELECTION",query)
       
            # Generate sum table for twpsec
            arcpy.CopyRows_management(Parcels, p.join(ws,'copy_tab')) # copies selected rows
            
            # Add field
            copy_tab = p.join(ws,'copy_tab')
            arcpy.AddField_management(copy_tab,'TWPSEC','TEXT',field_length=5)

            # pop TWPSEC field
            with arcpy.da.UpdateCursor(copy_tab,['PARCEL_PID','TWPSEC']) as rows:
                for row in rows:
                    row[1] = row[0][5:10]
                    rows.updateRow(row)
            
            # Sum 
            sum_tab2 = p.join(ws,'sum_tab2')
            arcpy.Statistics_analysis(copy_tab,sum_tab2,'TWPSEC COUNT','TWPSEC') # this is the ONLY one that comes out wrong if using 'in_memory'
    
            # twpdict
            with arcpy.da.SearchCursor(sum_tab2,['OBJECTID','TWPSEC','FREQUENCY']) as rows:
                twpdict = dict([(row[0],[row[1],row[2]]) for row in rows])
                
            needmenu = raw_input('Do you wish to see Township/Area Menu? (y,n)\n').lower()
            if needmenu == 'y':
                township_file = r'G:\Data\Geodatabase\Cedar_County.mdb\JURISDICTION\POLITICAL_TWP'
                with arcpy.da.SearchCursor(township_file,['NAME','Area_Number']) as rows:
                    TwpNameDict = dict([(row[0],row[1]) for row in rows])
                for key,value in TwpNameDict.iteritems():
                    if len(key) > 7:
                        print '{0}\t: {1}' .format(key,value)
                    else:
                        print '{0}\t\t: {1}' .format(key,value)
            
            # Print display
            print '\n'
            for key,value in twpdict.iteritems():
                print '{0} : TWP-{1}, SEC{2}, Count: {3}'.format(key, str(value[0])[0:2],
                                                                 str(value[0])[2:5],
                                                                 str(value[1]))

            # TWP dictionary index
            twpidx = int(raw_input('\nType number for TWP SEC\n'))
            twp = '%-{0}-%' .format(str(twpdict[twpidx][0]))

            
            # Final Selection 
            query = "\"REALDATA.DEEDHOLDER\" = {0} AND \"PARCEL.PID\" LIKE '{1}'" .format(fin,twp) 
            twpname = lastname +'_TWP_%s' %str(twpdict[twpidx]).split(',')[0][3:-1].replace('-','_')
            arcpy.SelectLayerByAttribute_management(Parcels, "NEW_SELECTION", query)
            result = int(arcpy.GetCount_management(Parcels).getOutput(0))
            
            # Return message to user
            print "Post Result: %s" % result
            
        # Determine if records returned
        if result:
            # Return message to user
            print 'selecting parcel...'
            
            # Modify layout in map document to the selected features
            df.zoomToSelectedFeatures()
            df.extent = parcels.getSelectedExtent(True)
            if result == 1:
                df.scale *= 2.0
            elif result > 1:
                df.scale *= 1.4
            arcpy.RefreshActiveView()
            
            # Local Variable
            pdfownername = '{0}.pdf'.format(twpname)
            pdf = p.join(PdfPath, pdfownername) 
            
            # Function: call printpdf()
            arcpy.env.qualifiedFieldNames = False
            printpdf(MXD, pdf)

        else:
            
            # Return Error message to user
            print '\n#########################################################'
            print '#\tInvalid SQL statement, no selected features\t#'
            print '#\tPlease verify the Name and try again\t\t#'
            print '#########################################################\n'

            # Function: call lastnameinput()
            lastnameinput(Parcels, PdfPath, MXD)
    except:
        print arcpy.GetMessages(2)
        tb     = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = "PYTHON ERRORS:\nTraceback info:\n%s\nError Info:\n%s\n" % (tbinfo, sys.exc_info()[1])
        msgs  = "ArcPy ERRORS:\n%s\n" % arcpy.GetMessages(2)
        arcpy.AddError(pymsg)
        arcpy.AddError(msgs)
        print pymsg
        print msgs
by Anonymous User
Not applicable
Well, I found a workaround that basically replicates the Summary Statistics tool on this subset of records, although it is really clunky and I do not like it. I wish I had an ArcInfo license so I could just use the Frequency tool 😞

Not really sure this long workaround is worth it just to use the in_memory workspace rather than writing the temp tables out onto disk, but here it is:

rows = arcpy.SearchCursor(copy_tab)
unid = {}
dlist = []
count = 0
for r in rows:
    dlist.append(r.TWPSEC)
    if r.TWPSEC not in unid:
        unid[r.TWPSEC] = count
        count += 1
del r, rows

i = 1
twpdict = {}
for key in unid.keys():
    num = dlist.count(key)
    twpdict[i] = [key, num]
    i +=1

for key,value in twpdict.iteritems():
    print '{0} : TWP-{1}, SEC{2}, Count: {3}'.format(str(key),str(value[0])[:2],
                                                     str(value[0])[2:6],str(value[1]))


[Attachment 20528: output of the workaround above]
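
For what it's worth, the same counts could be computed a bit more compactly with a da search cursor and collections.Counter (Python 2.7 ships with 10.1) -- just a sketch of the same idea, not a fix for the underlying in_memory problem:

# Sketch of a shorter equivalent of the workaround above, assuming
# Python 2.7 (ArcGIS 10.1) so collections.Counter and arcpy.da are available.
from collections import Counter
import arcpy

# copy_tab is the selected-rows table created earlier in the function
with arcpy.da.SearchCursor(copy_tab, ['TWPSEC']) as rows:
    counts = Counter(row[0] for row in rows)

# Number the unique TWPSEC values so they can be picked from a menu
twpdict = dict(enumerate(sorted(counts.items()), 1))
for key, value in twpdict.iteritems():
    print '{0} : TWP-{1}, SEC{2}, Count: {3}'.format(key, str(value[0])[:2],
                                                     str(value[0])[2:6], str(value[1]))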

I chatted with Esri about it earlier, and they think it may be a new bug with the in_memory workspace. I am still waiting to hear back from them on the matter.
by Anonymous User
Not applicable
Original User: jamesfreddyc

Do you think the in_memory space is holding on to some leftover table or some other reference?  I rely heavily on the in_memory environment to create and build tables/feature classes for lots of things and I have not run into anything similar (but I am not invoking statistics calcs in ArcGIS 10 either). 

Here is a def I use to prep/clear out the in_memory before I use it (arcgisscripting 9.3):


def ClearINMEM():
    ## clear out the IN_MEMORY workspace of any tables
    try:
        gp.Workspace = "IN_MEMORY"
        tabs = gp.ListTables()

        ### for each table in the list of tabs, delete it
        for tab in tabs:
            gp.Delete_management(tab)
    except:
        gp.AddMessage("The following error(s) occurred: " + gp.GetMessages())
        return
by Anonymous User
Not applicable
I do not think this is the issue since I rarely use the in_memory workspace. According to the ArcGIS help docs, the temporary workspace is deleted after the script executes, so that doesn't appear to be the issue. I also have overwriteOutput set to True (I know this is shaky sometimes). The table has the correct number of records; it just doesn't summarize them properly. It also doesn't make sense that out of the 3 tables generated in the in_memory workspace, only one is incorrect. As Mathew mentioned, there have been bugs found in the past; perhaps this is one that hasn't been discovered yet.

It just doesn't make any sense that when I change the 'ws' variable to a workspace (GDB) on disk, it works without any problems. I just ran a test to list the tables in the 'in_memory' workspace after the script has finished, and it returns an empty list.
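
The check itself is just a couple of lines, something like:

import arcpy

arcpy.env.workspace = 'in_memory'
print arcpy.ListTables()   # returned an empty list -- nothing left behind after the run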
by Anonymous User
Not applicable
Original User: jamesfreddyc

I do not think this is the issue since I rarely use the in_memory workspace. According to the ArcGIS help docs, the temporary workspace is deleted after the script executes, so that doesn't appear to be the issue. I also have overwriteOutput set to True (I know this is shaky sometimes). The table has the correct number of records; it just doesn't summarize them properly. It also doesn't make sense that out of the 3 tables generated in the in_memory workspace, only one is incorrect. As Mathew mentioned, there have been bugs found in the past; perhaps this is one that hasn't been discovered yet.

It just doesn't make any sense that when I change the 'ws' variable to a workspace (GDB) on disk, it works without any problems. I just ran a test to list the tables in the 'in_memory' workspace after the script has finished, and it returns an empty list.


Yeah -- was just throwing out some ideas.

When you change the 'ws' around, I wonder if there is a possibility that the references are holding on to something and your stats are being run on those older referenced items.  Again -- just thinking out loud (I am from the ArcObjects/.NET world and perhaps I am viewing things incorrectly --- I still view everything as an object!).

Good luck.
by Anonymous User
Not applicable
Yeah -- was just throwing out some ideas.

When you change the 'ws' around, I wonder if there is a possibility that the references are holding on to something and your stats are being run on those older referenced items.


I appreciate the input; I think you may be onto something here. I have been doing some more testing and have seen some strange results. When I use both workspaces to write out two tables at a time (one to in_memory (ws), the other to disk (ws2)), ONLY the 'sum_tab2' table is incorrect, and it is incorrect in both workspaces (I used search cursors to print out the results for each). But when I comment out the in_memory workspace and use only the disk GDB, it works perfectly. It seems the mere existence of the in_memory workspace while creating the 'sum_tab2' table causes the strange results. It is very bizarre that it only happens to this one table when the other two tables turn out fine. I am not sure if it is a bug; it may just be a problem with the tables somehow.

I also have not been able to reproduce this behavior with any other table. This is a joined table, so I thought maybe that had something to do with it, but I tested several other joined tables and could not reproduce the error. I even tested each table separately, joined to others, and could not reproduce the error either. Finally, I tested the original joined tables in different MXDs and got the same error again. I am thoroughly confused.
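
For reference, the dual-write test looks roughly like this (a sketch based on the description above; copy_tab and the Scratch.gdb path come from the function posted earlier):

# Rough sketch of the dual-write test described above: create sum_tab2 in
# both workspaces and print the rows from each with a search cursor.
import arcpy
import os.path as p

ws  = 'in_memory'
ws2 = r'G:\Map_Documents\Walkin_requests\Scratch.gdb'

for w in (ws, ws2):
    out = p.join(w, 'sum_tab2')
    arcpy.Statistics_analysis(copy_tab, out, 'TWPSEC COUNT', 'TWPSEC')
    print '\n' + out
    with arcpy.da.SearchCursor(out, ['TWPSEC', 'FREQUENCY']) as rows:
        for row in rows:
            print '{0} : {1}'.format(row[0], row[1])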
by Anonymous User
Not applicable
Original User: ehm119

Has anyone come up with any further solutions? I have a Python script that ranks the values of a valueField and writes the ranks to a rankField. When I run it with an on-disk GDB, I get the proper results. However, when I use an in_memory workspace, the rows are simply ranked in the order they appear in the table (i.e. by OBJECTID). This is a little disconcerting, and a shame, since the in_memory workspace is so much faster.
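
One way to sidestep it might be to compute the ranks in Python with a pair of da cursors, so that nothing depends on the table's internal ordering -- a sketch only, with hypothetical table and field names:

# Sketch of a possible workaround: rank values with da cursors instead of
# relying on table order. Table and field names are hypothetical.
import arcpy

tbl         = r'in_memory\my_table'
value_field = 'VALUE'
rank_field  = 'RANK'

# Sort the (OID, value) pairs ourselves and build an {OID: rank} lookup
with arcpy.da.SearchCursor(tbl, ['OID@', value_field]) as rows:
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
ranks = dict((oid, rank) for rank, (oid, val) in enumerate(ordered, 1))

# Write the ranks back
with arcpy.da.UpdateCursor(tbl, ['OID@', rank_field]) as rows:
    for row in rows:
        row[1] = ranks[row[0]]
        rows.updateRow(row)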