
Efficient counting of features occurrences per attribute value

ZoltanSzecsei — Tue, 02 May 2023 12:43:22 GMT

Hi,

I have over 2700 FileGDBs in which to count feature occurrences based on an attribute value.

The attribute field (FeaType) has over 200 possible values (typically Tree, Road, Building, etc.).

I need to count the occurrences of each feature type per FileGDB, and put them in a table with 1 row per FileGDB and 1 column per FeaType value.

Putting them into a table is no issue - I am using xlsxwriter.

What would be the most efficient way to count these features?

I have tried iterating through each FileGDB, each FeatureClass, row by row incrementing a table entry based on the FeaType value - very slow 😞

I could try iterating through each FeaType value, then using that value to 'select' and 'getcount'.

But surely there is a more efficient way?

Some pointers would be great.

Thanks in advance,

Zoltan

Re: Efficient counting of features occurrences per attribute value

Anonymous User — Tue, 02 May 2023 13:05:19 GMT

Multiprocess it so that each worker process handles one fgdb at a time and returns a dictionary of feature counts. When the workers are done, combine all the dictionaries for the total sum. What code do you have so far?
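The combining step can be sketched with collections.Counter, which adds counts key by key. This is only a minimal sketch, not the poster's code: merge_counts and the sample dictionaries are hypothetical stand-ins for whatever each worker returns.

```python
from collections import Counter


def merge_counts(per_gdb_counts):
    """Sum a list of {FeaType value: count} dictionaries into one Counter."""
    total = Counter()
    for counts in per_gdb_counts:
        # update() adds to existing counts instead of replacing them.
        total.update(counts)
    return total


# Hypothetical results, one dictionary per FileGDB.
per_gdb_counts = [
    {"Tree": 120, "Road": 15},
    {"Tree": 30, "Building": 7},
]
totals = merge_counts(per_gdb_counts)
print(dict(totals))
```

A Counter returns 0 for a value it has never seen, so looking up a FeaType that no FileGDB contains does not raise an error.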

Re: Efficient counting of features occurrences per attribute value

Anonymous User — Tue, 02 May 2023 15:24:35 GMT

Here's a little sample of what I am thinking. You can modify the fields to be more dynamic per fc, but this should get you started.

Edited to use the FeaType field and key off its attribute value.

import multiprocessing as mp
import os

import arcpy


def get_count(fgdb):
    # Each worker process sets its own workspace.
    arcpy.env.workspace = fgdb
    fcDict = {'fgdb': fgdb, 'status': True, 'featVals': {'fc': None, 'feaTypeCnt': {}}}
    # Counts accumulate across every feature class in this fgdb.
    counts = fcDict['featVals']['feaTypeCnt']
    for fc in arcpy.ListFeatureClasses():
        fields = [f.name for f in arcpy.ListFields(fc) if f.name == 'FeaType']
        # Note: 'fc' ends up holding the last feature class visited.
        fcDict['featVals']['fc'] = os.path.basename(fc)
        if fields:
            with arcpy.da.SearchCursor(fc, fields) as sCur:
                for row in sCur:
                    counts[row[0]] = counts.get(row[0], 0) + 1
        else:
            fcDict['status'] = 'Did not contain FeaType'
    return fcDict


if __name__ == '__main__':
    workspace = r"C:\Path\to\explore"
    gdbs = []
    for dirpath, dirnames, filenames in arcpy.da.Walk(workspace, datatype="Container"):
        for dirname in dirnames:
            if ".gdb" in dirname:
                gdbs.append(os.path.join(dirpath, dirname))

    cores = mp.cpu_count()
    with mp.Pool(processes=cores) as pool:
        jobs = [pool.apply_async(get_count, (gdb,)) for gdb in gdbs]
        res = [r.get() for r in jobs]

    for r in res:
        # status is either True or an error string, so compare against True explicitly.
        if r['status'] is True:
            vals = r['featVals']
            print(f'{r["fgdb"]} : {vals["fc"]}')
            for k, v in vals['feaTypeCnt'].items():
                print(f'\t{k}: {v}')
        else:
            print(f'{r["fgdb"]} {r["featVals"]["fc"]} {r["status"]}')
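To get from these dictionaries to the table the original question asks for (one row per FileGDB, one column per FeaType value), the results can be pivoted with plain Python. The following is a rough sketch under assumptions: build_table is a hypothetical helper, the sample input only mimics the shape returned by get_count above, and a FeaType value that is absent from a FileGDB is written as 0.

```python
def build_table(results):
    """Return (header, rows): one row per FileGDB, one column per FeaType value."""
    # Collect every FeaType value seen in any FileGDB.
    # key=str keeps the sort working if a null (None) value is present.
    fea_types = sorted(
        {k for r in results for k in r["featVals"]["feaTypeCnt"]}, key=str
    )
    header = ["fgdb"] + fea_types
    rows = []
    for r in results:
        counts = r["featVals"]["feaTypeCnt"]
        # Fill 0 where this FileGDB has no features of a given type.
        rows.append([r["fgdb"]] + [counts.get(t, 0) for t in fea_types])
    return header, rows


# Hypothetical results in the same shape that get_count returns.
results = [
    {"fgdb": "a.gdb", "status": True,
     "featVals": {"fc": "fc1", "feaTypeCnt": {"Tree": 3, "Road": 1}}},
    {"fgdb": "b.gdb", "status": True,
     "featVals": {"fc": "fc2", "feaTypeCnt": {"Building": 2}}},
]
header, rows = build_table(results)
print(header)
for row in rows:
    print(row)
```

Since the question already uses xlsxwriter, the header and each row could then be written with the worksheet's write_row method.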

Re: Efficient counting of features occurrences per attribute value

GISErik — Mon, 08 May 2023 00:27:22 GMT

Assuming I understand your intent, I would use Pandas and the Spatially Enabled Dataframe for this task. I haven't tested this code, but assuming you have a list of FGDB paths and are always using the same feature class name, this might work as is.

I've found that if you give Pandas a list of dictionaries, it will use the keys as columns, and the keys don't have to be identical in each dictionary.

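import os

import pandas as pd
# Importing the accessors registers the .spatial namespace on DataFrames.
from arcgis.features import GeoAccessor, GeoSeriesAccessor

data = []
# my_list_of_gdb_paths is assumed to be defined elsewhere as a list of FGDB paths.
for this_gdb_path in my_list_of_gdb_paths:
    this_fc_path = os.path.join(this_gdb_path, "ConstantFeatureClassName")
    sdf = pd.DataFrame.spatial.from_featureclass(this_fc_path)
    # One dictionary per FGDB: {FeaType value: count}
    this_dict = sdf.FeaType.value_counts().to_dict()
    this_dict['gdb_path'] = this_gdb_path
    data.append(this_dict)

# Keys become columns; a FeaType missing from an FGDB is left as NaN.
df = pd.DataFrame(data)
df.to_excel('FeaTypesPerGDB.xlsx')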