Get file size for Shapefile not SHP file

09-14-2011 04:24 PM
HenryColgate
Occasional Contributor
I was wondering if there is a way to get the file size of a shapefile or other multi-part spatial file with a single command.

For instance, a shapefile comprises several files, including the SHP, DBF, SHX, etc., so getting its size with regular Python would involve querying each of the parts. While I can use the os.path.getsize('fileName') command, it does not take into account the other components beyond the SHP file itself.

While it would not be too difficult to write a one-off for, say, a shapefile, it would require writing one for each multi-part file type in ListDatasets.
5 Replies
MarcNakleh
New Contributor III
hmmm... that's an interesting question!

I've wondered the same thing and have so far not had any luck when it comes to finding a built-in function in arcpy to get the size of a feature class or shapefile.

If you were cycling through the various shapefiles in a path, I would go with something using list comprehensions, like:

import arcpy, glob, os

arcpy.env.workspace = path  # folder containing the shapefiles
for shp in arcpy.ListFeatureClasses():
    total_size = sum([os.path.getsize(x) for x in glob.glob(os.path.join(path, shp[:-4]) + '*')])

or, cleaning it up a bit:

for shp in arcpy.ListFeatureClasses():
    mask = os.path.join(path, shp[:-4]) + '*'
    total_size = sum([os.path.getsize(x) for x in glob.glob(mask)])


But it still feels sloppy to me, and it doesn't work for feature classes or datasets (only shapefiles).
Still, I hope this helps!
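For reference, the pattern above can be wrapped into a self-contained helper (plain `os`/`glob`, no arcpy required, so it can be tested outside ArcMap; the function name is my own):

```python
import glob
import os

def shapefile_size(shp_path):
    """Sum the sizes of every component file sharing the shapefile's base name.

    Catches .shp, .dbf, .shx, .prj, .cpg, .sbn, etc. with one glob, so the
    same trick works for other multi-part formats that share a base name.
    """
    base, _ = os.path.splitext(shp_path)
    return sum(os.path.getsize(p)
               for p in glob.glob(base + ".*")
               if os.path.isfile(p))
```

Note the pattern `base + '.*'` rather than `base + '*'`: the latter would also sweep up unrelated datasets whose names merely start with the same text (e.g. `test2.shp` when sizing `test.shp`).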
DanielScott
New Contributor

Henry Colgate and Marc,

Jumping in here: for the last few weeks I've been messing around with this script to get this working correctly, and I'm finding that some layer files get mapped whilst others don't.

What I want to do:

a) Map out the layer files that are in use in an MXD (SHPs should be dealt with by the glob, but that doesn't include TIFFs)

b) Get the size of the collection of files that make up each shapefile (i.e. test.shp consists of test.dbf, test.prj, etc., and I want the size of that whole bunch) - this is the problem

c) Calculate the total size of all the data in the MXD - I can't figure out the best way of doing this while b) is blocking me

I found a really nice library called humanize to help with the size conversion.
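For anyone without humanize installed, a rough stand-in for `humanize.naturalsize` (my own sketch, not the library's implementation; it uses decimal units like the library's defaults) might look like:

```python
def natural_size(num_bytes):
    """Format a byte count with decimal (SI) units, e.g. 2300 -> '2.3 kB'."""
    size = float(num_bytes)
    for unit in ("Bytes", "kB", "MB", "GB", "TB"):
        if size < 1000.0:
            if unit == "Bytes":
                return "%d %s" % (size, unit)
            return "%.1f %s" % (size, unit)
        size /= 1000.0
    return "%.1f PB" % size
```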

import arcpy, os, humanize, glob, datetime

def write_log(text, file):
    f = open(file, 'a')  # 'a' appends to the file if it already exists
    f.write("{}\n".format(text))  # write the text to the logfile and move to the next line
    f.close()
    return

output = r'X:\ds\sizemxd.txt' #directory of the log file

mxd = arcpy.mapping.MapDocument("CURRENT")
df = arcpy.mapping.ListDataFrames(mxd)

write_log("This report summarizes the names of all map documents and data frames within " + mxd.filePath +  "\n",output)
write_log("Date: " + str(datetime.datetime.today().strftime("%d %B, %Y")) + "\n",output)

for d in df:
    write_log("Data Frame: " + d.name, output)
    layers = arcpy.mapping.ListLayers(mxd, "", d)
    for lyr in layers:
        try:
            if lyr.supports("dataSource"):
                lname = lyr.name
                print "lname: " + lname
                datasource = lyr.dataSource
                wspath = lyr.workspacePath
                if datasource.endswith('.shp'):
                    stringwspath = str(wspath).replace("\\", '/')
                    path = stringwspath + "/"
                    print "Path: " + path
                    for shp in glob.glob(os.path.join(path,'{0}.*').format(lname)):
                        shp = str(shp).replace("\\", '/')
                        print "stripped " + shp
                        try:
                            size = os.stat(shp).st_size
                            print "size of shp " + shp + " is " + humanize.naturalsize(size)
                        except:
                            print "unable to access size of " + shp
                else:
                    pass
            else:
                pass
        except:
            print "Unable to analyse size for " + lname.encode('utf-8')
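One caveat about the glob in the loop above: the pattern is built from the layer's display name (`lyr.name`), which does not have to match the file name on disk, and the match also sweeps up ArcGIS `.lock` files. A sketch of a more robust version, basing the pattern on the data source path instead (the helper name is my own):

```python
import glob
import os

def component_files(datasource):
    """List the on-disk component files for a .shp data source.

    Builds the glob from the data source path itself rather than the
    layer's display name, and skips ArcGIS .lock files so they do not
    pad the totals.
    """
    base, _ = os.path.splitext(datasource)
    return sorted(p for p in glob.glob(base + ".*")
                  if os.path.isfile(p) and not p.endswith(".lock"))
```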

The output is:

Path: C:/Users/ds/Desktop/
stripped C:/Users/ds/Desktop/E458_search.cpg
size of shp C:/Users/ds/Desktop/E458_search.cpg is 9 Bytes
stripped C:/Users/ds/Desktop/E458_search.dbf
size of shp C:/Users/ds/Desktop/E458_search.dbf is 391 Bytes
stripped C:/Users/ds/Desktop/E458_search.sbn
size of shp C:/Users/ds/Desktop/E458_search.sbn is 252 Bytes
stripped C:/Users/ds/Desktop/E458_search.sbx
size of shp C:/Users/ds/Desktop/E458_search.sbx is 124 Bytes
stripped C:/Users/ds/Desktop/E458_search.shp
size of shp C:/Users/ds/Desktop/E458_search.shp is 2.3 kB
stripped C:/Users/ds/Desktop/E458_search.shp.NB-055.12412.16264.sr.lock
size of shp C:/Users/ds/Desktop/E458_search.shp.NB-055.12412.16264.sr.lock is 0 Bytes
stripped C:/Users/ds/Desktop/E458_search.shx
size of shp C:/Users/ds/Desktop/E458_search.shx is 204 Bytes
lname: E458_HSR_LITHO_29092017_acQ
lname: E458ThinSections_17082017
Path: C:/shps/
stripped C:/shps/E458ThinSections_17082017.CPG
size of shp C:/shps/E458ThinSections_17082017.CPG is 5 Bytes
stripped C:/shps/E458ThinSections_17082017.dbf
size of shp C:/shps/E458ThinSections_17082017.dbf is 36.7 kB
stripped C:/shps/E458ThinSections_17082017.prj
size of shp C:/shps/E458ThinSections_17082017.prj is 409 Bytes
stripped C:/shps/E458ThinSections_17082017.sbn
size of shp C:/shps/E458ThinSections_17082017.sbn is 260 Bytes
stripped C:/shps/E458ThinSections_17082017.sbx
size of shp C:/shps/E458ThinSections_17082017.sbx is 124 Bytes
stripped C:/shps/E458ThinSections_17082017.shp
size of shp C:/shps/E458ThinSections_17082017.shp is 492 Bytes
stripped C:/shps/E458ThinSections_17082017.shp.NB-055.12412.16264.sr.lock
size of shp C:/shps/E458ThinSections_17082017.shp.NB-055.12412.16264.sr.lock is 0 Bytes
stripped C:/shps/E458ThinSections_17082017.shp.xml
size of shp C:/shps/E458ThinSections_17082017.shp.xml is 789 Bytes
stripped C:/shps/E458ThinSections_17082017.shx
size of shp C:/shps/E458ThinSections_17082017.shx is 212 Bytes
lname: PBT_FieldObservationPoints_24072017
Path: G:/08/E458/Data/Geology/Observation Points/
lname: PBT_FieldDescriptions_24072017
Path: G:/08/E458/Data/Geology/IntegratedGeol/PBT Maps/

So for the last two files, PBT_FieldObservationPoints_24072017 and PBT_FieldDescriptions_24072017, I can see in the folder that the SHP files do exist, but for some reason they don't get registered for the size check.

Would anyone be able to help with this?

Thanks in advance

HenryColgate
Occasional Contributor
Nice bit of code.  Very tidy, and I imagine it will work in most cases.  I slapped myself on the head when I saw it, as I had envisaged doing it the hard way by finding out all the possible file extensions for each file type.  The benefit is that it will also work with MapInfo files and quite a few others.  Thanks, I will definitely be using it.

As you say, it would not work for feature classes or datasets, but I wonder whether this would be feasible anyway, as they are not so much individual entities with a distinct size as parts of the greater GDB.

The only downfall is that it will not work on some of the imagery data I use quite extensively, particularly ER Mapper format imagery files.  Even though they are in BIL format, they do not have a file extension, so a generic extension-based solution would not work in this case.

Likewise, ER Mapper ERS files in general may not necessarily carry the same names as the files (e.g. ECW) they reference, as the linkage is created internally in the ERS, not by an associated name.

Both of these have pretty simple workarounds, but looking at the bigger picture, it could be a fair bit of time and effort to discover, work out, and code the workarounds for all the data types available in ListDatasets.
MarcNakleh
New Contributor III
Hi Henry,

Agreed on all points. To be honest, I'm not 100% sure the os or arcpy modules can do much of what you're wondering about.
My money is on needing more in-depth analysis modules (or the associated programs themselves) to read this kind of metadata, and I doubt those would be part of Python's standard library, though I can't say for sure. Your best bet would be to look around online for GIS modules developed in Python for image analysis or file management.

Cheers,
Marc
TedKowal
Occasional Contributor III

I had a similar issue but a different workflow...

I get shapefiles within ZIP containers, so I use a slightly different technique:

How to know the folder size in a zipfile (Python) - Stack Overflow 
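The technique in that thread amounts to summing the uncompressed `file_size` of each archive entry under a given prefix, using the standard-library `zipfile` module; a minimal sketch:

```python
import zipfile

def folder_size_in_zip(zip_path, prefix=""):
    """Sum the uncompressed sizes of all zip entries whose names start with prefix."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist()
                   if info.filename.startswith(prefix))
```

With an empty prefix this sizes the whole archive's contents, so the same helper covers both a single shapefile's folder inside the zip and the zip as a whole.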

Ted
