Batch processing metadata

08-31-2018 06:15 AM
JustinBridwell2
Occasional Contributor II

I am trying to write a Python script (or find a tool) to batch process metadata for a raster dataset. I have a folder with about 300 .tif files. I want to be able to go into ArcCatalog and see the standard (or FGDC) item description metadata for each of these files.

Without creating a literal .xml file for each .tif file, I am currently trying out the arcpy_metadata module (for Python 2.7 only right now). Is there a simpler way to do this using a tool? I've seen more complicated scripts using the lxml library. Would it be better to go that route?

4 Replies
curtvprice
MVP Esteemed Contributor

Wow, arcpy_metadata looks really useful. If it works, that IS the simpler way to do it, though it may be a bit slow. (The readme on arcpy_metadata suggests to me it is using the 32-bit Metadata tools in ArcMap's 32-bit arcpy to dump to an XML file and parse it with Python.) [UPDATE: I realized .tif metadata is in a .tif.xml file so maybe no export/import is needed.]

The main interface to access metadata in a granular way in ArcMap is the .NET Metadata Toolkit, which I think is more complexity than it sounds like you are interested in.

It would be great if we had arcpy methods to read and modify the basic item/description metadata! Even some Describe methods that read the item/description properties would be pretty great. I can dream, can't I?

You were not specific about what kind of "batch" processing you are trying to do. Change data values? Dump a text report? The answer may be to build a model or Python workflow using the XML Transformation tool with other metadata tools, though writing a transform is not simple and takes some non-trivial XML skills. There are some nice examples in the help to get you started if you want to go that route.

Edit metadata for many ArcGIS items—Help | ArcGIS Desktop 

RandyBurton
MVP Regular Contributor

Following up on Curtis Price's .tif.xml file tip, something like the following might let you process the metadata:

import glob
import os
import re
import xml.etree.ElementTree as ET

os.chdir(r'C:\Tif\Directory\Path')

tag_re = re.compile('<.*?>')  # matches HTML tags embedded in the abstract

def text_of(parent, path):
    # text of the first matching element, or '' if it is missing or empty
    node = parent.find(path)
    return node.text if node is not None and node.text else ''

for filename in glob.glob('*.tif.xml'):
    tree = ET.parse(filename)

    # defaults, in case the file has no dataIdInfo element at all
    title = purpose = abstract = ''
    for info in tree.findall('dataIdInfo'):
        title = text_of(info, 'idCitation/resTitle')  # file
        purpose = text_of(info, 'idPurp')  # purpose
        # abstract is inside a span tag; this removes the html tags, leaving any text
        abstract = tag_re.sub('', text_of(info, 'idAbs'))

    keywords = [kw.text for kw in tree.findall('dataIdInfo/searchKeys/keyword') if kw.text]

    print('File: {} \tPurpose: {} \tAbstract: {} \tKeywords: {}'.format(title, purpose, abstract, ', '.join(keywords)))

You will need to adjust the code to pick up the desired xml tags.
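
Going the other way, writing a value back into the same files, might look roughly like the sketch below. The helper name set_purpose is made up for illustration, and the element names are the ones the script above reads. Whether ArcCatalog picks up edits made directly to the .tif.xml is an assumption, so try it on copies first.

```python
import xml.etree.ElementTree as ET


def set_purpose(xml_path, purpose_text):
    """Set dataIdInfo/idPurp in one metadata file, creating the elements if absent."""
    tree = ET.parse(xml_path)
    root = tree.getroot()

    info = root.find('dataIdInfo')
    if info is None:
        info = ET.SubElement(root, 'dataIdInfo')

    purpose = info.find('idPurp')
    if purpose is None:
        purpose = ET.SubElement(info, 'idPurp')

    purpose.text = purpose_text
    # overwrite the file in place; keep a backup while experimenting
    tree.write(xml_path, encoding='utf-8', xml_declaration=True)
```

Calling set_purpose(filename, 'some text') inside the glob loop would apply the same purpose to every .tif.xml in the folder.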

JustinBridwell2
Occasional Contributor II

Hey Guys,

I was actually able to get this working using the arcpy_metadata library. It's super simple, but has some limitations: 1) it only runs with Python 2.7, 2) it's 32-bit, 3) it can't add spatial reference or extent elements (also some of the contact data is a little wonky). Other than that, I made some loops to iterate through the target folder and some conditionals to handle the different file types (it only works on features, .lyr files, .shp files, and raster datasets). Handling some of these file types is a little difficult, but for limited use it works pretty well.
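
The script itself isn't posted here. A rough sketch of just the folder loop and the file-type conditionals, using only the standard library and leaving the arcpy_metadata calls out, might look like this. The extension-to-type mapping and the names HANDLED and find_items are assumptions for illustration, and feature classes inside a geodatabase would not be found by walking files like this.

```python
import os

# Assumed mapping of file extensions to the kinds of items mentioned above
HANDLED = {
    '.tif': 'raster',
    '.img': 'raster',
    '.shp': 'feature',
    '.lyr': 'layer',
}


def find_items(folder):
    """Return sorted (path, kind) pairs for files whose extension is in HANDLED."""
    items = []
    for dirpath, dirnames, filenames in os.walk(folder):
        for name in filenames:
            # splitext only looks at the last extension, so a.tif.xml is skipped
            ext = os.path.splitext(name)[1].lower()
            kind = HANDLED.get(ext)
            if kind is not None:
                items.append((os.path.join(dirpath, name), kind))
    return sorted(items)
```

Each (path, kind) pair can then be handed to whatever does the actual metadata update for that kind of item.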

NeilFordyce
New Contributor III

I use arcpy_metadata to bulk update ISO 19139 format metadata for ArcGIS 10.4-10.7.

It works well for the basic fields that you may need in a company. You cannot update every metadata entry with it currently.

My files tend to follow a naming convention where underscores separate the name components e.g. topo_NSW_Lidar_2015_AreaA_05m_DEM.img so my code separates the name into keyword tags.
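
As a minimal sketch of that name-splitting step (the function name keywords_from_name is hypothetical, not the code I actually run):

```python
import os


def keywords_from_name(path):
    """Split a file name such as topo_NSW_Lidar_2015_AreaA_05m_DEM.img into keyword tags."""
    stem = os.path.splitext(os.path.basename(path))[0]  # drop folder and extension
    return [part for part in stem.split('_') if part]   # ignore empty pieces from '__'


print(keywords_from_name('topo_NSW_Lidar_2015_AreaA_05m_DEM.img'))
# ['topo', 'NSW', 'Lidar', '2015', 'AreaA', '05m', 'DEM']
```

The resulting list is what would be passed on as the keyword tags for that file.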

It's very easy to update just one or two of the fields where they are all the same info e.g. the place keywords or the processing keywords.


I generate a file of descriptions and supplemental information in Excel and read through that to update individual files where they need something different from the en masse processing.
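
A hedged sketch of that lookup, assuming the Excel sheet has been saved as CSV with columns named filename and description (both column names, and the function names, are made up for illustration):

```python
import csv


def load_overrides(lines):
    """Map file name -> row dict from CSV lines that include a 'filename' column."""
    overrides = {}
    for row in csv.DictReader(lines):
        name = (row.get('filename') or '').strip()
        if name:
            overrides[name] = row
    return overrides


def description_for(name, overrides, default):
    """Use the per-file description when one is given, else the bulk default."""
    row = overrides.get(name, {})
    return (row.get('description') or '').strip() or default
```

load_overrides accepts any iterable of text lines, so an open file object works as well as a list of strings.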

It's unbelievable that ESRI have not built in the ability to work with metadata in Python as metadata is the greatest chore in GIS management.

I personally wouldn't waste my time on etree as arcpy_metadata has already dealt with most of the issues.

Neil