Describe the size of a dataset

06-25-2015 02:21 AM
ArcGIS shows the size of each dataset, and I want to use these sizes in Python.

So, the first point: I'm looking for a function that describes the size of a feature class inside my file geodatabase. ArcCatalog shows it on the Contents tab.

I've searched the Describe objects but didn't find it. Did I overlook something?

Second point: I'm looking for a function that describes which files belong to a geodataset as far as ArcGIS is concerned, so that I can sum up the sizes of each part.

For example, the land.shp dataset consists of, e.g., SHP, SHX and DBF files. So initially glob.glob('land.*') seems to be the solution. But I also have an archive or a table land.csv in the same directory...

How can I identify geodata the way ArcGIS does?

Thanks, Lothar

To your first point, file size most likely comes from the OS you're running (Windows, Linux, etc.). You can get it with the os module in Python (e.g. os.path.getsize or os.stat).
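As a minimal sketch of the OS-level approach, assuming the file geodatabase is stored as an ordinary folder of files (which is how a .gdb appears on disk), the total on-disk size can be summed with os.walk and os.path.getsize. The helper name folder_size_bytes is hypothetical. Note that this gives the size of the whole geodatabase, not of individual feature classes inside it.

```python
import os


def folder_size_bytes(path):
    """Return the total size in bytes of all files under `path`.

    Hypothetical helper: a file geodatabase (.gdb) is a folder on disk,
    so this reports the size of the whole geodatabase, not per feature class.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

For example, folder_size_bytes(r"C:\data\my.gdb") (a hypothetical path) returns one number in bytes for the entire geodatabase.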

For your second point, there are probably several ways, but for your example an easy way would be to limit your search to just the possible extensions for files making up a shapefile. This would probably not work for a feature class in a file geodatabase, service, etc., as those are more complicated structures.
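A minimal sketch of that extension-filtering idea follows. The extension list is an assumption based on the component files ArcGIS can write for a shapefile (including the land.shp.xml metadata file); check it against Esri's documentation for your version. The helper names shapefile_parts and shapefile_size_bytes are hypothetical. Because it only looks for files named exactly <base><extension>, a land.csv or land.zip in the same folder is ignored, unlike glob.glob('land.*').

```python
import os

# Assumed list of shapefile component extensions; not exhaustive or authoritative.
SHAPEFILE_EXTENSIONS = (
    ".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".fbn", ".fbx",
    ".ain", ".aih", ".atx", ".ixs", ".mxs", ".cpg", ".qix", ".shp.xml",
)


def shapefile_parts(shp_path):
    """Return the existing component files of a shapefile (hypothetical helper).

    Only files named <base><ext> with ext in SHAPEFILE_EXTENSIONS are kept,
    so unrelated files such as land.csv or land.zip are skipped.
    """
    base, _ = os.path.splitext(shp_path)
    parts = []
    for ext in SHAPEFILE_EXTENSIONS:
        candidate = base + ext
        if os.path.isfile(candidate):
            parts.append(candidate)
    return parts


def shapefile_size_bytes(shp_path):
    """Sum the on-disk sizes of the shapefile's component files."""
    return sum(os.path.getsize(p) for p in shapefile_parts(shp_path))
```

On Windows the file system is case-insensitive, so LAND.SHX is found for land.shp; on a case-sensitive file system the extensions would have to match exactly.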

I haven't come across any way to get the size of a feature class within an FGDB directly using Python. The best workaround I found was to create a new FGDB, copy over each feature class one by one, and monitor the size of the entire FGDB.

If you are willing to install/import comtypes, the following code can be used to give dataset sizes and timestamps for shapefiles in a folder or datasets in a file geodatabase:

def GetDatasetFileStatsFromWorkspace(in_workspace):
    """Return {dataset name: file stats} for a folder of shapefiles or a file GDB."""
    import os
    import arcpy
    from comtypes.client import CreateObject, GetModule
    def _GetDatasetFileStats(pDataset):
        # Read size (bytes) and timestamps from the IDatasetFileStat2 interface.
        from datetime import datetime, timedelta
        from dateutil import tz
        DFS = {}
        d = datetime(1970, 1, 1, tzinfo=tz.tzutc())  # Unix epoch in UTC
        pDFS = pDataset.QueryInterface(esriGeoDatabase.IDatasetFileStat2)
        DFS['StatSize'] = pDFS.StatSize
        DFS['StatTime'] = {
            'LastAccess': d + timedelta(0, pDFS.StatTime(0)),
            'Creation': d + timedelta(0, pDFS.StatTime(1)),
            'LastModification': d + timedelta(0, pDFS.StatTime(2))
        }
        return DFS
    assert os.path.isdir(in_workspace), "Workspace is not folder or file geodatabase"
    # Load the ArcObjects type libraries installed with ArcGIS.
    comDirectory = os.path.join(arcpy.GetInstallInfo()['InstallDir'], 'com')
    esriDataSourcesGDB = GetModule(os.path.join(comDirectory, 'esriDataSourcesGDB.olb'))
    esriDataSourcesFile = GetModule(os.path.join(comDirectory, 'esriDataSourcesFile.olb'))
    esriGeoDatabase = GetModule(os.path.join(comDirectory, 'esriGeodatabase.olb'))
    # Choose the workspace factory based on the workspace type.
    if in_workspace.endswith('.gdb'):
        pWSF = CreateObject(esriDataSourcesGDB.FileGDBWorkspaceFactory,
                            interface=esriGeoDatabase.IWorkspaceFactory)
    else:
        pWSF = CreateObject(esriDataSourcesFile.ShapefileWorkspaceFactory,
                            interface=esriGeoDatabase.IWorkspaceFactory)
    pWS = pWSF.OpenFromFile(in_workspace, 0)
    pEnumDS = pWS.Datasets(1)  # 1 = esriDTAny: enumerate all dataset types
    pDS = pEnumDS.Next()
    DS = {}
    while pDS:
        if pDS.Type == esriGeoDatabase.esriDTFeatureDataset:
            # Recurse into the feature dataset; prefix names with its name.
            pEnumSS = pDS.Subsets
            pSS = pEnumSS.Next()
            while pSS:
                Name = os.path.join(pDS.Name, pSS.Name)
                DS[Name] = _GetDatasetFileStats(pSS)
                pSS = pEnumSS.Next()
        else:
            DS[pDS.Name] = _GetDatasetFileStats(pDS)
        pDS = pEnumDS.Next()
    return DS if DS else None
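The timestamp conversion in _GetDatasetFileStats treats the StatTime values as seconds since 1970-01-01 UTC, which is how the code above interprets them. On Python 3 the same conversion can be done with the standard library alone, without dateutil. A minimal sketch; the helper name epoch_seconds_to_utc is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_seconds_to_utc(seconds):
    """Convert seconds since 1970-01-01 UTC to a timezone-aware datetime.

    Mirrors `d + timedelta(0, secs)` in the function above, but uses
    datetime.timezone.utc (Python 3.2+) instead of dateutil.tz.tzutc().
    """
    return EPOCH + timedelta(seconds=seconds)
```

Inside _GetDatasetFileStats this would replace d + timedelta(0, pDFS.StatTime(0)) with epoch_seconds_to_utc(pDFS.StatTime(0)), and likewise for the other two timestamps.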

A few comments:

  • The dataset size and timestamps are coming from the IDatasetFileStat2 interface of the Geodatabase library.
    • Dataset size is in bytes (original format).
    • Dataset timestamps are Python datetime objects in UTC (converted from the original type).
  • The function returns a dictionary of properties for all shapefiles in a folder or datasets in a file geodatabase.
    • The dictionary keys are dataset names.
      • Feature datasets are recursed, and the feature dataset name is prefixed to the dataset name.
    • The timestamps are stored in a nested dictionary keyed by timestamp type: Creation, LastModification, LastAccess.
  • Error handling is limited; the code is for demonstration, not production.
    • One assertion is included to catch the most likely error, an invalid workspace type being passed, since COM errors can be cryptic (at least in this case).
  • If you do install comtypes and haven't worked with it before, see the following StackExchange post about a configuration change that is necessary to make it work with ArcGIS: ArcObjects + comtypes at 10.1 and newer
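Because the function returns a plain dictionary shaped as described in the comments, summarizing it needs only ordinary Python. A minimal sketch, assuming that key layout ('StatSize' in bytes per dataset, and feature classes inside a feature dataset keyed as os.path.join(feature dataset, feature class)); the helper name summarize_dataset_stats is hypothetical:

```python
import os
from collections import defaultdict


def summarize_dataset_stats(stats):
    """Aggregate the dictionary returned by GetDatasetFileStatsFromWorkspace.

    Returns (total_bytes, bytes_per_group), where a group is the feature
    dataset name for nested feature classes, or '' for top-level datasets.
    Hypothetical helper; assumes keys are 'FC' or os.path.join('FD', 'FC').
    The function above returns None for an empty workspace, so handle that.
    """
    total = 0
    per_group = defaultdict(int)
    for name, props in (stats or {}).items():
        size = props["StatSize"]
        total += size
        group = os.path.dirname(name)  # '' when not inside a feature dataset
        per_group[group] += size
    return total, dict(per_group)
```

For example, summarize_dataset_stats(GetDatasetFileStatsFromWorkspace(r"C:\data\my.gdb")) (a hypothetical path) would give the total bytes plus a per-feature-dataset breakdown.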

Hi Joshua,

Thanks for your comprehensive and insightful answer!

I still have to digest your code, but I think it contains all the answers I'm looking for.


