Hello,
ArcGIS shows the size of each dataset, and I want to use these sizes in Python.
First, I'm looking for a function that returns the size of a feature class inside my file geodatabase. ArcCatalog shows it in the Contents view.
I've searched the Describe objects but didn't find anything. Did I overlook something?
Second, I'm looking for a function that tells me which files ArcGIS considers part of a geodataset, so that I can sum the sizes of the parts.
For example, the land.shp dataset consists of e.g. SHP, SHX and DBF files. So initially glob.glob('land.*') seems to be the solution. But I also have an archive land.zip or a table land.csv in the directory...
How can I treat geodata the way ArcGIS does?
Thanks, Lothar
To your first point, file size most likely comes from the OS you're running (Windows, Linux, etc.). You can get this from the os module in Python (e.g. os.path.getsize).
For your second point, there are probably several ways, but for your example an easy way would be to limit your search to just the possible extensions for files making up a shapefile. This would probably not work for a feature class in a file geodatabase, service, etc., as those are more complicated structures.
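A minimal sketch of that extension-limited approach, in case it helps. The extension list is my own assumption about what typically accompanies a shapefile, not necessarily the exact set ArcGIS uses, and the function name is made up for illustration:

```python
import os

# Assumed list of shapefile component suffixes; extend it as needed.
# This is NOT taken from ArcGIS itself.
SHAPEFILE_SUFFIXES = {
    ".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg", ".shp.xml",
}


def shapefile_size(folder, base_name):
    """Sum the sizes (in bytes) of the files that make up one shapefile.

    Only files named <base_name><suffix> with a suffix from
    SHAPEFILE_SUFFIXES are counted, so land.zip or land.csv are ignored.
    """
    total = 0
    prefix = base_name.lower()
    for entry in os.listdir(folder):
        lower = entry.lower()
        if not lower.startswith(prefix):
            continue
        # Everything after the base name must be a known shapefile suffix.
        suffix = lower[len(prefix):]
        if suffix in SHAPEFILE_SUFFIXES:
            path = os.path.join(folder, entry)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total
```

Calling shapefile_size(r"C:\data", "land") would then count land.shp, land.shx, land.dbf and so on, but skip land.zip and land.csv.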
I haven't come across any way to extract feature class size from within a FGDB using Python, directly. The best workaround I found was to create a new FGDB, copy over each feature class one by one, and monitor the size of the entire FGDB.
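For the "monitor the size of the entire FGDB" part, here is a rough sketch using only the standard library. It relies on a file geodatabase being a folder on disk; folder_size is a hypothetical helper name, and the copy step itself is left out:

```python
import os


def folder_size(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

Measuring folder_size on the scratch .gdb before and after copying one feature class gives a difference that approximates that feature class's size. It is only an approximation, since the geodatabase keeps its own system tables and may not grow by exactly the size of the data.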
If you are willing to install/import comtypes, the following code can be used to give dataset sizes and timestamps for shapefiles in a folder or datasets in a file geodatabase:
import arcpy  # needed for GetInstallInfo(); missing from the original snippet


def GetDatasetFileStatsFromWorkspace(in_workspace):
    from comtypes.client import CreateObject, GetModule
    import os

    def _GetDatasetFileStats(pDataset):
        from datetime import datetime, timedelta
        from dateutil import tz
        DFS = {}
        # StatTime values are seconds since the Unix epoch (UTC).
        d = datetime(1970, 1, 1, tzinfo=tz.tzutc())
        pDFS = pDataset.QueryInterface(esriGeoDatabase.IDatasetFileStat2)
        DFS['StatSize'] = pDFS.StatSize
        DFS['StatTime'] = {
            'LastAccess': d + timedelta(0, pDFS.StatTime(0)),
            'Creation': d + timedelta(0, pDFS.StatTime(1)),
            'LastModification': d + timedelta(0, pDFS.StatTime(2))
        }
        return DFS

    assert os.path.isdir(in_workspace), "Workspace is not folder or file geodatabase"

    # Load the ArcObjects type libraries from the ArcGIS install directory.
    comDirectory = os.path.join(
        os.path.join(arcpy.GetInstallInfo()['InstallDir']), 'com'
    )
    esriDataSourcesGDB = GetModule(os.path.join(comDirectory, 'esriDataSourcesGDB.olb'))
    esriDataSourcesFile = GetModule(os.path.join(comDirectory, 'esriDataSourcesFile.olb'))
    esriGeoDatabase = GetModule(os.path.join(comDirectory, 'esriGeodatabase.olb'))

    # Pick the workspace factory: file geodatabase or shapefile folder.
    if in_workspace.endswith('.gdb'):
        pWSF = CreateObject(esriDataSourcesGDB.FileGDBWorkspaceFactory,
                            interface=esriGeoDatabase.IWorkspaceFactory)
    else:
        pWSF = CreateObject(esriDataSourcesFile.ShapefileWorkspaceFactory,
                            interface=esriGeoDatabase.IWorkspaceFactory)

    pWS = pWSF.OpenFromFile(in_workspace, 0)
    pEnumDS = pWS.Datasets(1)
    pDS = pEnumDS.Next()
    DS = {}
    while pDS:
        if pDS.Type == esriGeoDatabase.esriDTFeatureDataset:
            # Feature datasets: report each contained dataset separately.
            pEnumSS = pDS.Subsets
            pSS = pEnumSS.Next()
            while pSS:
                Name = os.path.join(pDS.Name, pSS.Name)
                DS[Name] = _GetDatasetFileStats(pSS)
                pSS = pEnumSS.Next()
        else:
            DS[pDS.Name] = _GetDatasetFileStats(pDS)
        pDS = pEnumDS.Next()
    return DS if DS else None
A couple or few comments:
Hi Joshua,
Thanks for your comprehensive and thorough answer!
I still have to digest your code, but I think it contains all the answers I'm looking for.
Lothar