Efficient counting of feature occurrences per attribute value

05-02-2023 05:43 AM
ZoltanSzecsei
New Contributor III

Hi,

I have over 2700 FileGDBs in which to count feature occurrences based on an attribute value.

The attribute field (FeaType) has over 200 possible values. (Typically: Tree, Road, Building, etc etc.)

I need to count the occurrences of each feature type per FileGDB and put them in a table with one row per FileGDB and one column per FeaType value.

Putting them into a table is no issue - I am using xlsxwriter.

What would be the most efficient way to count these features?

I have tried iterating through each FileGDB and each FeatureClass row by row, incrementing a table entry based on the FeaType value - very slow 😞

I could try iterating through each FeaType value, then using that value to 'select' and 'getcount'.

But surely there is a more efficient way?

Some pointers would be great.

 

Thanks in advance,

Zoltan

3 Replies
by Anonymous User
Not applicable

Multiprocess it so that a worker process handles one FGDB at a time and returns a dictionary of counts per feature type. When the workers are done, combine all the dictionaries for the total sum. What code do you have so far?
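For the "combine all dictionaries" step, here's a minimal sketch using `collections.Counter` (the per-FGDB dicts below are made-up counts, just to show the merging):

```python
from collections import Counter

# Hypothetical per-FGDB count dicts returned by the worker processes
per_gdb = [
    {"Tree": 10, "Road": 4},
    {"Tree": 3, "Building": 7},
]

total = Counter()
for counts in per_gdb:
    total.update(counts)  # Counter.update() adds values for matching keys

print(dict(total))  # {'Tree': 13, 'Road': 4, 'Building': 7}
```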

by Anonymous User
Not applicable

Here's a little sample of what I am thinking - you can modify the fields to be more dynamic per fc, but this should get you started.

Edited to go by the field FeaType and key off attribute value.

 

import os
import multiprocessing as mp

import arcpy
from arcpy import env


def get_count(fgdb):
    env.workspace = fgdb
    fcDict = {'fgdb': f'{fgdb}', 'status': True, 'featVals': {'fc': None, 'feaTypeCnt': {}}}

    for fc in arcpy.ListFeatureClasses():
        fields = [f.name for f in arcpy.ListFields(fc) if f.name == 'FeaType']
        fcDict['featVals']['fc'] = os.path.basename(fc)
        if fields:
            with arcpy.da.SearchCursor(fc, fields) as sCur:
                for row in sCur:
                    # Counts accumulate across all feature classes in this FGDB
                    cnt = fcDict['featVals']['feaTypeCnt']
                    cnt[row[0]] = cnt.get(row[0], 0) + 1
        else:
            fcDict['status'] = 'Did not contain FeaType'

    return fcDict


if __name__ == '__main__':
    workspace = r"C:\Path\to\explore"
    gdbs = []
    for dirpath, dirnames, filenames in arcpy.da.Walk(workspace, datatype="Container"):
        for dirname in dirnames:
            if dirname.lower().endswith(".gdb"):
                gdbs.append(os.path.join(dirpath, dirname))

    cores = mp.cpu_count()
    with mp.Pool(processes=cores) as pool:
        jobs = [pool.apply_async(get_count, (gdb,)) for gdb in gdbs]

        res = [r.get() for r in jobs]

    for r in res:
        if r['status']:
            vals = r['featVals']
            print(f'{r["fgdb"]} : {vals["fc"]}')
            for k, v in vals['feaTypeCnt'].items():
                print(f'\t{k}: {v}')
        else:
            print(f'{r["fgdb"]} {r["featVals"]["fc"]} {r["status"]}')

 

 

GISErik
New Contributor

Assuming I understand your intent, I would use Pandas and the Spatially Enabled Dataframe for this task. I haven't tested this code, but assuming you have a list of FGDB paths and are always using the same feature class name, this might work as is.

I've found that if you give Pandas a list of dictionaries, it will use the union of the keys as columns, and the keys don't have to be identical in each dictionary.
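For example (with made-up counts), Pandas unions the keys into columns and fills the gaps with NaN:

```python
import pandas as pd

# Hypothetical per-GDB count dicts with different FeaType keys
rows = [
    {"gdb_path": "a.gdb", "Tree": 5, "Road": 2},
    {"gdb_path": "b.gdb", "Building": 9},
]
df = pd.DataFrame(rows)

print(df.columns.tolist())  # ['gdb_path', 'Tree', 'Road', 'Building']
print(int(df.isna().sum().sum()))  # 3 cells are NaN where a GDB lacked a type
```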

 

import os

import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor  # registers the .spatial accessor

data = []
for this_gdb_path in my_list_of_gdb_paths:
    this_fc_path = os.path.join(this_gdb_path, "ConstantFeatureClassName")
    sdf = pd.DataFrame.spatial.from_featureclass(this_fc_path)
    this_dict = sdf.FeaType.value_counts().to_dict()
    this_dict['gdb_path'] = this_gdb_path
    data.append(this_dict)

df = pd.DataFrame(data)
df.to_excel('FeaTypesPerGDB.xlsx')

 

 
