Count and sum unique values

5140
8
10-11-2017 10:23 PM
SibghatUllah1
Occasional Contributor

I am writing a code for count and sum unique values and exporting results to csv. I want that to be done in a quick way.I am getting the unique values count on Field "SubtypeCD" and exporting them to csv using following code,but i need sum also on shapelength().Please guide 

import arcpy
import csv
import os
import io
from arcpy import env
env.overwriteOutput = True

mxd=arcpy.mapping.MapDocument("CURRENT")
df = arcpy.mapping.ListDataFrames(mxd, "")[0]
count=0
field = ['SubtypeCD']

with open(r'D:\result.csv','wb') as resultFile:
 wr = csv.writer(resultFile, dialect='excel')
 for lyr in arcpy.mapping.ListLayers(mxd,"",df):
    if lyr.name=="LV UG Electric Line Segment":
       flds = [f.name for f in arcpy.ListFields(lyr)]
       s = set(flds)
       print lyr.name
       if 'SubtypeCD' in s :
          wr.writerow([lyr.name]) 
          CountUnique = {}
          sumUnique={}
          desc_Domain = {key: value['Name'] for (key, value) in arcpy.da.ListSubtypes(arcpy.Describe(lyr).catalogPath).iteritems()}

          with arcpy.da.SearchCursor (lyr,[ "SubtypeCD","Shape.STLength()"]) as cursor:
              for row in cursor:
                if not row[0] in CountUnique.keys():
                   CountUnique[desc_Domain[row[0]]] = 1
                else:
                   CountUnique[desc_Domain[row[0]]] += 1                  
                    sumUnique+=row[1]

for key in CountUnique.keys():
print str(key) + ":", CountUnique[key], "features"
tr(key) ,CountUnique[key],"Features"
wr.writerow(total)
0 Kudos
8 Replies
SibghatUllah1
Occasional Contributor

when i am using desc_Domain[row[0]]] it is only showing description but not count..if i use only row[0] then it is showing count.

0 Kudos
DanPatterson_Retired
MVP Esteemed Contributor

You are counting the same thing twice, shouldn't

CountUnique[desc_Domain[row[0]]] += 1    # the first field

be

CountUnique[desc_Domain[row[1]]] += 1    # the second field

SibghatUllah1
Occasional Contributor

I missed desc_Domain  in If statement...

if not desc_Domain[row[0]] in CountUnique.keys():

I am getting the unique values but i want sum.

CountUnique[desc_Domain[row[1]]] += 1 this one is for count not sum.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Does Summary Statistics—Help | ArcGIS Desktop  not work in this case?

If using ArcGIS geoprocessing tools is out an option for one reason or another, I recommend using pandas now that it is bundled with ArcGIS Desktop's Python implementation.

>>> import random
>>> import pandas
>>> 
>>> faux_cur = [(random.randint(2,8),random.random() * 100) for _ in xrange(20)]
>>> for cd, length in faux_cur:
...     print cd, length
...     
6 96.8258600285
3 77.0591127943
3 40.0932870686
4 30.5733894343
8 89.6075352909
3 60.1866907222
6 48.9614312
3 15.3538726667
4 55.5179664721
4 63.0649217433
7 11.0513070756
3 42.8155108492
8 51.7732680208
8 61.3946527598
3 91.4306233633
4 72.2762171765
3 60.1245526762
6 31.793860784
6 60.9235931092
4 62.8160459607
>>> 
>>> df = pandas.DataFrame(faux_cur)
>>> df.groupby(0)[1].agg(['sum','count'])
          sum  count
0                   
3  387.063650      7
4  284.248541      5
6  238.504745      4
7   11.051307      1
8  202.775456      3
>>> 

I have some code in https://community.esri.com/thread/198817-how-do-i-convert-a-python-collection-to-a-gdb-table-1-to-ma... that shows how to dump a table into a pandas dataframe and export results to CSV file.

SibghatUllah1
Occasional Contributor

Thank you for your help. Actually i am not using  Summary Statisticsas i want to this for multiple layers and export results to csv. I am using arcgis 10.2 and dont have pandas installed.I have searched for pandas installation on 10.2 but could not find solution. Once my script is ready i have to publish GP service and use it in JavaScript API Application.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

I can't remember what Python version is packaged with 10.2.x, but I think it was/is at least 2.5.  You can use defaultdict to craft a fairly compact solution.

>>> import random
>>> import collections
>>> 
>>> faux_cur = [(random.randint(2,8),random.random() * 100) for _ in xrange(20)]
>>> for cd, length in faux_cur:
...     print cd, length
...     
7 92.2475389655
4 83.9462712302
6 81.6108729337
6 74.8669001032
3 12.5043749507
7 31.419143564
5 78.0403699406
6 13.6414803142
4 32.3717355049
8 10.7456093313
3 92.3411062073
8 20.0692694293
2 32.0990645882
3 80.236745944
4 84.6515841245
6 42.7892118033
5 12.0345925176
5 56.4556372617
3 68.8038120762
7 60.5648194955
>>> 
>>> count_sum = collections.defaultdict(lambda: (0,0))
>>> for cd, length in faux_cur:
...     c,s = count_sum[cd]
...     count_sum[cd] = c + 1, s + length
...     
>>> for k,v in sorted(count_sum.iteritems(), key=lambda x: x[0]):
...     print k,v
...     
2 (1, 32.09906458822493)
3 (4, 253.88603917814936)
4 (3, 200.96959085957693)
5 (3, 146.5305997199382)
6 (4, 212.90846515451312)
7 (3, 184.23150202499107)
8 (2, 30.81487876056358)
>>> 
JamesCrandall
MVP Frequent Contributor
import arcpy
import pandas as pd

for lyr in arcpy.mapping.ListLayers(mxd, "", df):
    if lyr.name == 'LV UG Electric Line SegmentH':
        narr = arcpy.da.FeatureClassToNumPyArray(lyr, ('SubtypeCD', 'SHAPE_Length'))
        df = pd.DataFrame(narr, columns=['SubtypeCD', 'SHAPE_Length'])
        idxs = df.groupby(by='SubtypeCD', as_index=True)['SHAPE_Length'].sum()

        print idxs‍‍‍‍‍‍‍‍‍‍‍‍‍‍
SibghatUllah1
Occasional Contributor

thank you very much for the code but i am not using pandas.

0 Kudos