using arcpy.da.walk to inventory data and export metadata to csv

11-30-2016 10:34 AM
ShannonGroff
New Contributor II

I'm a novice when it comes to arcpy and am trying to develop a script that will use arcpy.da.walk to inventory our GIS data. As it goes through the folders/gdbs of data that we have, I want it to export a few items to a csv for each feature class (for now I'd be happy with feature class path, filename, spatial reference name and metadata purpose). I've gotten the script to work up until the metadata purpose part. Once I add the lines:

arcpy.ExportMetadata_conversion(feature_class, translatorpath, xmlfile)   
tree = ElementTree()   
tree.parse(xmlfile)    
spot = tree.find("idinfo/descript/purpose")

my script does not return anything. Without those lines, I receive a csv file with feature class path, filename, and spatial reference name, but if I include the lines my csv file is empty. No errors, just empty. My script (included below) is based off of: https://arcpy.wordpress.com/tag/os-walk/ and http://gis.stackexchange.com/questions/34729/creating-table-containing-all-filenames-and-possibly-me....

Any help is greatly appreciated!

EDITED: Some feature classes may not have a spatial reference defined, and many feature classes may not have any metadata associated. I still want these in the csv, but those fields can either be blank or say something along the lines of "No spatial reference defined" and "No metadata purpose defined".

import os
import arcpy
import csv
from xml.etree.ElementTree import ElementTree
from arcpy import env
 
def inventory_data(workspace, datatypes):
    """
    Generates full path names under a catalog tree for all requested
    datatype(s).
 
    Parameters:
    workspace: string
        The top-level workspace that will be used.
    datatypes: string | list | tuple
        Keyword(s) representing the desired datatypes. A single
        datatype can be expressed as a string, otherwise use
        a list or tuple. See arcpy.da.Walk documentation 
        for a full list.
    """
    for path, path_names, data_names in arcpy.da.Walk(
            workspace, datatype=datatypes):
        for data_name in data_names:
            yield os.path.join(path, data_name)

AGSHOME = arcpy.GetInstallInfo("Desktop")["InstallDir"]  
translatorpath = AGSHOME + "Metadata\\Translator\\ARCGIS2FGDC.xml"
outfile = "C:\\GIS\\Records\\Data Management\\Inventories\\GIS_Data_Inventory_daWalk_function_outputtocsv_descitems_try_sr_meta.csv"
xmlfile = "C:\\GIS\\Records\\Data Management\\Inventories\\TempInventoryError\\daWalk_function_outputtocsv_descitems_try_sr_meta.xml"

with open (outfile, 'wb') as csvfile:
    csvwriter = csv.writer(csvfile)
    for feature_class in inventory_data(r"C:\GIS\Data\Natural_Environment\Species_and_Habitats\Habitat_Models", "FeatureClass"):
        try:
            desc = arcpy.Describe(feature_class)
            sr = desc.spatialReference
            arcpy.ExportMetadata_conversion(feature_class, translatorpath, xmlfile)  
            tree = ElementTree()  
            tree.parse(xmlfile)   
            spot = tree.find("idinfo/descript/purpose")
            csvwriter.writerow([desc.path.encode('utf-8'), desc.file.encode('utf-8'), desc.dataType.encode('utf-8'), sr.name.encode('utf-8'), spot.text.encode('utf-8')])
        except:
            pass

Accepted Solutions
ShannonGroff
New Contributor II

Well, I got it working. I tried inserting three if statements (one for each metadata component), but it would only print my else value ("No Abstract") even if there was a value for the abstract. Instead, I inserted three try/except statements (again, one for each metadata component I was trying to grab) and it's working great! Thank you both for your help.
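
One possible reason the if statements always fell through to the else value, assuming they tested the element directly (something like `if tree.find(...):`, which is a guess, since that version of the script is not shown): an ElementTree element with no child elements has historically evaluated as false even when it holds text. Comparing against None sidesteps that. A minimal sketch using a made-up XML snippet rather than real exported metadata:

```python
from xml.etree.ElementTree import fromstring

# Hypothetical FGDC-style snippet, for illustration only.
root = fromstring(
    "<metadata><idinfo><descript>"
    "<abstract>Habitat model output</abstract>"
    "</descript></idinfo></metadata>"
)

abstract_elem = root.find("idinfo/descript/abstract")
purpose_elem = root.find("idinfo/descript/purpose")  # not present in the snippet

# The abstract element exists and has text, but it has no child elements,
# so its length is 0 and a bare truth test on it is unreliable.
print(len(abstract_elem))

# Test for None explicitly instead of relying on truthiness.
abstract = abstract_elem.text if abstract_elem is not None else "No Abstract"
purpose = purpose_elem.text if purpose_elem is not None else "No Purpose"
print(abstract)
print(purpose)
```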

The only missing piece as far as I can tell is a known issue with arcpy.da.Walk, which is that it does not list rasters within gdb. I'm not sure why that's the case, but it only lists fc within the gdb even though it will list rasters that are within a folder. Not sure how to get around that. In a perfect world I would include an if statement somehow (if gdb ...arcpy.ListRasters) but I don't know where to start on that.
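
A possible starting point for the geodatabase rasters, not tested against a real geodatabase: find the .gdb folders with os.walk, then point arcpy.env.workspace at each one and call arcpy.ListRasters(). The helper names find_file_gdbs and list_gdb_rasters are made up for this sketch, and it assumes file geodatabases (folders whose names end in .gdb).

```python
import os


def find_file_gdbs(root):
    """Yield the paths of folders ending in .gdb found under root."""
    for dirpath, dirnames, _ in os.walk(root):
        gdbs = [d for d in dirnames if d.lower().endswith(".gdb")]
        for name in gdbs:
            yield os.path.join(dirpath, name)
        # Do not descend into the geodatabase folders themselves.
        dirnames[:] = [d for d in dirnames if d not in gdbs]


def list_gdb_rasters(gdb_path):
    """Return full paths of the rasters arcpy.ListRasters reports for one gdb."""
    import arcpy  # imported here so find_file_gdbs works without ArcGIS

    previous = arcpy.env.workspace
    arcpy.env.workspace = gdb_path
    try:
        names = arcpy.ListRasters() or []
        return [os.path.join(gdb_path, name) for name in names]
    finally:
        # Restore whatever workspace was set before.
        arcpy.env.workspace = previous
```

In the inventory script, something like `for gdb in find_file_gdbs(workspace): for raster in list_gdb_rasters(gdb): ...` could then feed the raster paths through the same Describe and metadata steps used for feature classes.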

#This script uses arcpy.da.walk to inventory the data contained within a folder. It only captures feature classes, shapefiles and raster bands.

#For each feature class, shapefile or raster band it exports to a csv: file path, data name, data type, spatial reference name, metadata purpose, metadata abstract, and metadata publication date
#For those feature classes that are missing any of the metadata items, it prints "No Purpose", "No Abstract" and/or "No Publication Date".

#A known issue with this script is that it does not include rasters contained within a .gdb (this is an issue with arcpy.da.walk). It may be able to be fixed with an if statement
#(something along the lines of "If gdb...arcpy.ListRasters") but I have not tried to do that.

#This script was created with the help of https://arcpy.wordpress.com/2012/12/10/inventorying-data-a-new-approach/ and https://community.esri.com/thread/30556

#Three variables need to be updated each time this script is run on a new workspace...those variables are commented below (outfile, xmlfile and workspace when calling inventory_data).

#Created 12/1/2016

#Created by: Shannon Groff

import os
import sys  # needed for sys.exc_info() in the except block below
import arcpy
import csv
from xml.etree.ElementTree import ElementTree
from arcpy import env

arcpy.env.overwriteOutput = True
 
def inventory_data(workspace, datatypes):
    for path, path_names, data_names in arcpy.da.Walk(
            workspace, datatype=datatypes):
        for data_name in data_names:
            yield os.path.join(path, data_name)

AGSHOME = arcpy.GetInstallInfo("Desktop")["InstallDir"]  
translatorpath = AGSHOME + "Metadata\\Translator\\ARCGIS2FGDC.xml"

#These two variables need to be updated each time this script is run on a new workspace
outfile = r"C:\GIS\Records\Data Management\Inventories\GIS_Data_Inventory.csv"
xmlfile = r"C:\GIS\Records\Data Management\Inventories\Inventory\TempInventoryError\GIS_Data_Inventory.xml"

with open (outfile, 'wb') as csvfile:
    csvwriter = csv.writer(csvfile)
    #You need to update workspace in the line below
    for feature_class in inventory_data(r"C:\GIS\Data\Natural_Environment\Species_and_Habitats\Species_Data", "FeatureClass"):
        try:
            desc = arcpy.Describe(feature_class)
            sr = desc.spatialReference
            arcpy.ExportMetadata_conversion(feature_class, translatorpath, xmlfile)  
            tree = ElementTree()  
            tree.parse(xmlfile)   
            spot = tree.find("idinfo/descript/purpose")
            try:
                purpose = tree.find("idinfo/descript/purpose").text
            except:
                purpose = "No Purpose"
            try:
                abstract = tree.find ("idinfo/descript/abstract").text
            except:
                abstract = "No Abstract"
            try:
                pubdate = tree.find ("idinfo/citation/citeinfo/pubdate").text
            except:
                pubdate = "No Publication Date"
            csvwriter.writerow([desc.path.encode('utf-8'), desc.file.encode('utf-8'), desc.dataType.encode('utf-8'), sr.name.encode('utf-8'), purpose.encode('utf-8'), abstract.encode('utf-8'), pubdate.encode('utf-8')])
        except Exception:
            e = sys.exc_info()[1]
            print(e.args[0])

15 Replies
RebeccaStrauch__GISP
MVP Emeritus

Shannon, it would help if you edit your original post and format your code.  The easiest way to do that is, when editing, to click on the ... in the toolbar, then select More and Syntax Highlighter.  Choose Python, then paste your code (the formatted code from your editing IDE).  That way it will be easier for people to comment.

Although I have not inventoried the metadata, I have a python addin, /blogs/myAlaskaGIS/2015/08/31/python-addin-for-data-inventory-and-broken-link-repair?sr=search&searc..., that you are welcome to look at. You can grab the .py files from it (instead of changing the .zip to .addin, you can just unzip it).

ShannonGroff
New Contributor II

Thanks, Rebecca. I have formatted my code as you've suggested. I appreciate the direction to your addin, but I think I have the inventory component covered. I'm getting what I need with my script (file path, name, spatial reference name) but just lacking the metadata. Hopefully someone can help out there. Thanks!

BlakeTerhune
MVP Regular Contributor

You have an except block but you just pass. Maybe you are getting an error but you are simply ignoring it. Try printing the error instead of just using pass.

except Exception as err:
    print err
ShannonGroff
New Contributor II

Good thought, Blake. I just tried this and am getting the main error:

Could not find a part of the path "C:\\GIS\\Records\\Data Management\\Inventories\\TempInventoryError\\daWalk_function_outputtocsv_descitems_try_sr_meta.xml".
Failed to execute (Export ArcGIS Metadata).
Failed to execute (Export Metadata).

So, I guess it's not locating the .xml. I looked in the folder and indeed there is no .xml. Any thoughts on why this would occur?

EDIT: I used 

except Exception:
    e = sys.exc_info()[1]
    print(e.args[0])
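
On the "Could not find a part of the path" message above: one thing worth ruling out is that the folder portion of the xml path does not exist, since the export cannot write into a folder that is not there. A quick check, as a sketch only (the helper name missing_output_folder is made up here, and this is not part of the original script):

```python
import os


def missing_output_folder(file_path):
    """Return the parent folder of file_path if it does not exist, else None."""
    folder = os.path.dirname(os.path.abspath(file_path))
    return None if os.path.isdir(folder) else folder


# Path copied from the error message above.
xmlfile = "C:\\GIS\\Records\\Data Management\\Inventories\\TempInventoryError\\daWalk_function_outputtocsv_descitems_try_sr_meta.xml"

problem = missing_output_folder(xmlfile)
if problem:
    print("Output folder does not exist: {0}".format(problem))
```

Running the check once before the loop starts would surface a bad path immediately instead of once per feature class.
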
BlakeTerhune
MVP Regular Contributor

Double check that translatorpath is valid.

RebeccaStrauch__GISP
MVP Emeritus

fyi I used

arcpy.Exists(translatorpath)

and it returned True, so she should be ok there.

RebeccaStrauch__GISP
MVP Emeritus

This would be a handy tool once it works.

I added Blake's suggestion and have a few more comments:

  • add print and/or arcpy.AddMessage statements to make sure the variables are what you expect.  I got   spot = "None" when looking at my FGDB
  • Set the workspace to a GDB instead of a folder
  • for testing, try simpler output paths (just a suggestion)

The error I was getting with my modifications is (one for each FC)

FC: C:\__Data1\_TalkeetnaBU\AllFCNad83.gdb\bl03_point
Failed to execute. Parameters are not valid.
ERROR 000725: Output File: Dataset c:___temp\descitmes.xml already exists.
WARNING 000725: Output File: Dataset c:___temp\descitmes.xml already exists.
Failed to execute (ExportMetadata).

My modified code (mainly just modified the variables and paths and added print statements):

import os
import arcpy
import csv
from xml.etree.ElementTree import ElementTree
from arcpy import env


def inventory_data(workspace, datatypes):
     """
     Generates full path names under a catalog tree for all requested
     datatype(s).

     Parameters:
     workspace: string
         The top-level workspace that will be used.
     datatypes: string | list | tuple
         Keyword(s) representing the desired datatypes. A single
         datatype can be expressed as a string, otherwise use
         a list or tuple. See arcpy.da.Walk documentation 
         for a full list.
     """
     for path, path_names, data_names in arcpy.da.Walk(
          workspace, datatype=datatypes):
          for data_name in data_names:
               yield os.path.join(path, data_name)

AGSHOME = arcpy.GetInstallInfo("Desktop")["InstallDir"]  
translatorpath = AGSHOME + "Metadata\\Translator\\ARCGIS2FGDC.xml"
outfile =  r"c:___temp\descitems.csv" 
# "C:\\GIS\\Records\\Data Management\\Inventories\\GIS_Data_Inventory_daWalk_function_outputtocsv_descitems_try_sr_meta.csv"

xmlfile = r"c:___temp\descitems.xml" 
#"C:\\GIS\\Records\\Data Management\\Inventories\\TempInventoryError\\daWalk_function_outputtocsv_descitems_try_sr_meta.xml"
workspace = r"C:\__Data1\_TalkeetnaBU\AllFCNad83.gdb"

with open (outfile, 'wb') as csvfile:
     csvwriter = csv.writer(csvfile)
     for feature_class in inventory_data(workspace, "FeatureClass"):
          #r"C:\GIS\Data\Natural_Environment\Species_and_Habitats\Habitat_Models", "FeatureClass"):
          print("FC: {0}".format(feature_class))
          try:
               desc = arcpy.Describe(feature_class)
               sr = desc.spatialReference
               arcpy.ExportMetadata_conversion(feature_class, translatorpath, xmlfile)  
               tree = ElementTree()  
               tree.parse(xmlfile) 
               spot = tree.find("title") #("idinfo/descript/purpose")
               arcpy.AddMessage("spot: {0}".format(spot))
               # Pass the values as separate arguments; a single list cannot fill five placeholders
               arcpy.AddMessage("{0} {1} {2} {3} {4}".format(desc.path.encode('utf-8'), desc.file.encode('utf-8'), desc.dataType.encode('utf-8'), sr.name.encode('utf-8'), spot.text.encode('utf-8')))
               print("spot: {0}".format(spot))
               print("{0} {1} {2} {3} {4}".format(desc.path.encode('utf-8'), desc.file.encode('utf-8'), desc.dataType.encode('utf-8'), sr.name.encode('utf-8'), spot.text.encode('utf-8')))
               
               csvwriter.writerow([desc.path.encode('utf-8'), desc.file.encode('utf-8'), desc.dataType.encode('utf-8'), sr.name.encode('utf-8'), spot.text.encode('utf-8')])
          except Exception as err:
               print err
          

I use a slightly different method to write my .csv files  (fwiw)

# .....bunch of stuff before
# Create new output name tagged with YYYYMMDD_HHMM
outfileTXT = os.path.join(theWorkspace, ("{0}{1}.txt".format(outFile, fileDateTime))) 
outFileCSV = os.path.join(theWorkspace, ("{0}{1}.csv".format(outFile, fileDateTime))) 
outFileXLS = os.path.join(theWorkspace, ("{0}{1}.xls".format(outFile, fileDateTime)))
myMsgs("{0}, {1}".format(theWorkspace, outfileTXT))   #(theWorkspace + ", " + outfileTXT)
reportFile = open(outfileTXT, 'w')
csvFile = open(outFileCSV, 'w')
myMsgs(  "File {0} is open? {1}".format(outfileTXT, str(not reportFile.closed)))
myMsgs(  "File {0} is open? {1}".format(str(outFileCSV), str(not csvFile.closed)))
myMsgs("Writing the report to: " + outfileTXT + " and " + outFileCSV)

outText = "List of all GIS data in " + theWorkspace + " on " + currentDate + '\n'
outText += "  Includes coverages (pts, poly, arc, anno), shapes, and FGDB data." + '\n'
outText += "-----------------------------------------------------" + '\n'

reportFile.write(outText)
csvFile.write("FType, FCname, FullPath, recCount\n")

#do a bunch of stuff...but make sure to close the file
reportFile.close()
csvFile.close()

# ...bunch of stuff after
ShannonGroff
New Contributor II

Hi Blake & Rebecca: Thanks so much for your help with troubleshooting. I followed your suggestion Blake: the translator path was correct, but my xmlfile path was incorrect (hence the "Could not find part of the path" error I was receiving). Once I fixed that, I ran into the same error that you got Rebecca - "Dataset .....xml already exists". So, I added env.overwriteOutput = True and it's working (partially). It is exporting the file path, name, spatial reference name and metadata purpose, but only for those datasets that have a metadata purpose. Many of our datasets do not, and it looks like this script is skipping those. That is not really helpful, because part of the purpose of this exercise is to get a handle on just how many datasets need metadata! It looks like the error I'm getting when the script hits a dataset that doesn't have metadata is: 'NoneType' object has no attribute 'text'.

So, I guess now I need to figure out how to include all datasets in the csv, regardless of whether they have metadata. I'm thinking I need an if statement (if metadata exists, write that, else write "None"). The other piece of this puzzle is that I want to include metadata abstract and metadata pub date (included in the script below as spot2 and spot3 but commented out) and I'm guessing the same issue with skipping will arise there. If a dataset does not have these items, I just want it to write "None" (or leave blank, whatever is easier) to the csv instead of skipping. Help? Thanks!  

#Modified from - https://arcpy.wordpress.com/2012/12/10/inventorying-data-a-new-approach/
import os
import sys  # needed for sys.exc_info() in the except block below
import arcpy
import csv
from xml.etree.ElementTree import ElementTree
from arcpy import env

arcpy.env.overwriteOutput = True
 
def inventory_data(workspace, datatypes):
    for path, path_names, data_names in arcpy.da.Walk(
            workspace, datatype=datatypes):
        for data_name in data_names:
            yield os.path.join(path, data_name)

AGSHOME = arcpy.GetInstallInfo("Desktop")["InstallDir"]  
translatorpath = AGSHOME + "Metadata\\Translator\\ARCGIS2FGDC.xml"
outfile = r"C:\GIS\Records\Data Management\Inventories\GIS_Data_Inventory_daWalk_meta_SpeciesData.csv"
xmlfile = r"C:\GIS\Records\Data Management\Inventories\Inventory\TempInventoryError\daWalk_meta_SpeciesData.xml"

with open (outfile, 'wb') as csvfile:
    csvwriter = csv.writer(csvfile)
    for feature_class in inventory_data(r"C:\GIS\Data\Natural_Environment\Species_and_Habitats\Species_Data", "FeatureClass"):
        try:
            desc = arcpy.Describe(feature_class)
            sr = desc.spatialReference
            arcpy.ExportMetadata_conversion(feature_class, translatorpath, xmlfile)  
            tree = ElementTree()  
            tree.parse(xmlfile)   
            spot = tree.find("idinfo/descript/purpose")
            #spot2 = tree.find ("idinfo/descript/abstract")
            #spot3 = tree.find ("idinfo/citation/citeinfo/pubdate")
            csvwriter.writerow([desc.path.encode('utf-8'), desc.file.encode('utf-8'), desc.dataType.encode('utf-8'), sr.name.encode('utf-8'), spot.text.encode('utf-8')])
        except Exception:
            e = sys.exc_info()[1]
            print(e.args[0])
BlakeTerhune
MVP Regular Contributor
I'm thinking I need an if statement (if metadata exists, write that, else write "None").

That's where I would start.
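
A minimal sketch of that idea, assuming the exported file follows the FGDC paths used in the script: ElementTree's findtext returns None when the element is missing and an empty string when it is present but empty, so both cases can be mapped to a default without a try/except. The helper name metadata_value and the sample XML are made up for illustration.

```python
from io import BytesIO
from xml.etree.ElementTree import ElementTree


def metadata_value(tree, path, default):
    """Return the stripped text at path, or default if missing or empty."""
    text = tree.findtext(path)
    if text is None or not text.strip():
        return default
    return text.strip()


# Hypothetical stand-in for an exported metadata file: purpose is filled in,
# abstract is present but empty, and the publication date is missing entirely.
sample = BytesIO(
    b"<metadata><idinfo><descript>"
    b"<purpose> Track habitat change </purpose><abstract/>"
    b"</descript></idinfo></metadata>"
)

tree = ElementTree()
tree.parse(sample)

purpose = metadata_value(tree, "idinfo/descript/purpose", "No Purpose")
abstract = metadata_value(tree, "idinfo/descript/abstract", "No Abstract")
pubdate = metadata_value(tree, "idinfo/citation/citeinfo/pubdate", "No Publication Date")
print(purpose)
print(abstract)
print(pubdate)
```

Because every dataset then produces a value for each column, the writerow call runs for datasets without metadata as well, which is what the inventory needs.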