Select to view content in your preferred language

Using count and group by in makeQueryTable_Management

2036
8
10-29-2013 07:24 AM
DaveTaylor
Emerging Contributor
I know you can't use the count and group by in the where clause of the MakeQueryTable_Management function but is there a way using python to count the rows that are the same once done?

I have the function working so that it exports to a .dbf file but instead of having them grouped by ID's it just has them all listed separately. For instance:

ID
1234
2345
2346
1234
2345

What I want is this:

ID       COUNT
1234       2
2345       2
2346       1

Is there something I can use in python to either group all the same id's together and add a new column to my dbf or is there something with ESRI I can use?

My end result is I want to join this to a shapefile I already have and color code them by counts and have it update on the fly.

Thanks in advance!
Tags (2)
0 Kudos
8 Replies
DouglasSands
Deactivated User
Something along the lines of the following code should work:

from collections import Counter

def count_duplicates(path_to_data, field_to_count):
    #Add the count field
    data_layer = arcpy.MakeFeatureLayer_management(path_to_data, 'data')
    arcpy.AddField_management(data_layer, 'COUNT', 'LONG')
    arcpy.Delete_management(data_layer)
    del data_layer

    #Do the counting:
    value_list = []
    with arcpy.da.SearchCursor(path_to_data, [field_to_count]) as search_rows:
        for row in search_rows:
            value_list.append(row[0])

    counts = Counter(value_list)
    del value_list
    
    with arcpy.da.UpdateCursor(path_to_data, [field_to_count, 'COUNT']) as update_rows:
        for row in udpate_rows:
            row[1] = counts[row[0]]
            update_rows.updateRow(row)

0 Kudos
DaveTaylor
Emerging Contributor
Thanks. This is what I put and I am getting a syntax error

def count_duplicates("I/live_data.dbf",'id'):
    #Add the count field
    data_layer = arcpy.MakeFeatureLayer_management("I:/live_data.dbf", 'data')
    arcpy.AddField_management(data_layer, 'COUNT', 'LONG')
    arcpy.Delete_management(data_layer)
    del data_layer

#Do the counting:
    value_list = []
    with arcpy.da.SearchCursor("I/live_data.dbf", ['id']) as search_rows:
        for row in search_rows:
            value_list.append(row[0])

    counts = Counter(value_list)
    del value_list
   
    with arcpy.da.UpdateCursor("I/live_data.dbf", ['id', 'COUNT']) as update_rows:
        for row in udpate_rows:
            row[1] = counts[row[0]]
            update_rows.updateRow(row)
0 Kudos
DouglasSands
Deactivated User
Using your example, to execute the function keep what I originally posted and add the following at the end:

count_duplicates("I/live_data.dbf",'id')


Since the code is defined as a function, you need to call it at the end rather than replacing the parameters. Also to be sure you have the correct file path you can navigate to it in Windows Explorer and hold shift, then right click on the file. This enables the option "Copy as text".
0 Kudos
DouglasSands
Deactivated User
You could also accomplish the same thing by using Summary Statistics, and then joining the resulting DBF table back to your source dataset. This could be easily done in model builder also.
0 Kudos
DuncanHornby
MVP Notable Contributor
Unless I'm missing the point, calling the Summary Statistics tool will do this in one line of code?

arcpy.Statistics_analysis("Export_Output","in_memory/x",[["ID","COUNT"]],"ID")


This creates a table called X in the in_memory workspace and returns a count on a field called ID.
0 Kudos
DouglasSands
Deactivated User
It would. I overthought the issue initially, although cursors are usually faster with large data sets. The join back is necessary so that the features can be symbolized by the count.
0 Kudos
DaveTaylor
Emerging Contributor
Unless I'm missing the point, calling the Summary Statistics tool will do this in one line of code?

arcpy.Statistics_analysis("Export_Output","in_memory/x",[["ID","COUNT"]],"ID")


This creates a table called X in the in_memory workspace and returns a count on a field called ID.


How do I get this out of memory and saved as an actual .dbf file?
0 Kudos
DaveTaylor
Emerging Contributor
nevermind I think I got it!

Thanks that appears to work perfectly!
0 Kudos