python & xtree: extracting information from xml

07-11-2011 05:38 PM
AliceDeschamps1
New Contributor
I am trying to extract the associated text with the following xml tags:
<satellite>
<beamModeMnemonic>
<rawDataStartTime>

For now I am simply trying to print the values, but eventually I would like to 'stuff' this text into variables for geoprocessing with arcpy.

Below is my attempt at coding this, but it is obviously not working. I followed a simple example from this link (Sections: Traversing the Parsed Tree & Parsed Node Attributes):
http://blog.doughellmann.com/2010/03/pymotw-parsing-xml-documents-with.html

Any suggestions on how to do this simple task?

---------------------------------------------------------
from xml.etree import ElementTree
import arcpy, string, os

arcpy.env.workspace='C:/Alice/scripting/script'

with open('product.xml', 'rt') as f:
    tree = ElementTree.parse(f)
    
#this prints all the tags and attributes, needed to know if it was reading file=yes
for node in tree.getiterator():
    print node.tag, node.attrib    

#this gives an error!!!
for path in [ './satellite', './beamModeMnemonic', './rawDataStartTime' ]:
    node = tree.find(path)
    print '   node tag:', node.tag
    print '   node text:', node.text


The errors I get in PythonWin are:
"print node.tag
AttributeError: 'NoneType' object has no attribute 'tag'" and
"print node.text
AttributeError: 'NoneType' object has no attribute 'text'"




Snippet of the xml file showing the required tags:
---------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<product xmlns="http://www.rsi.ca/rs2/prod/xml/schemas" copyright="RADARSAT-2 Data and Products (c) MacDonald, Dettwiler and Associates Ltd., 2011 - All Rights Reserved." xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.rsi.ca/rs2/prod/xml/schemas schemas/rs2prod_product.xsd">
 <productId>PDS_01583330</productId>
 <documentIdentifier>RN-RP-51-2713, Issue 1/8</documentIdentifier>
 <sourceAttributes>
  <satellite>RADARSAT-2</satellite>
  <sensor>SAR</sensor>
  <inputDatasetId>/Fred/rsat2/166026P</inputDatasetId>
  <imageId>128973</imageId>
  <inputDatasetFacilityId>Not Specified</inputDatasetFacilityId>
  <beamModeId>110</beamModeId>
  <beamModeMnemonic>S6</beamModeMnemonic>
  <rawDataStartTime>2011-04-18T12:32:56.257634Z</rawDataStartTime>
  <
  <fullResolutionImageData pole="HV">imagery_HV.tif</fullResolutionImageData>
 </imageAttributes>
</product>

6 Replies
StacyRendall1
Occasional Contributor III
You would be best to ask the guy who made the example in the link; this has nothing to do with ArcGIS 🙂

However, just from looking at your xml and the similar example on his page, I can see that your xml is nested one level deeper (sorry, I don't know much about xml or its terminology). The error Python gives shows that node = tree.find(path) is returning None, which probably means the path wasn't found at that level. When you then ask the node, which is None, for .tag and .text, it fails because None has no such attributes.

You could try looking through more of his examples, or ask the guy how to examine deeper levels of the xml.
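
A minimal sketch of that diagnosis, written for Python 3 (the code in this thread is Python 2). The tiny XML string and the helper name text_or_none are made up for illustration, and the namespace is deliberately left out so that only the nesting depth matters:

```python
from xml.etree import ElementTree

# Hypothetical miniature of the file: the wanted tag sits one level below the root.
SAMPLE = (
    "<product><sourceAttributes>"
    "<satellite>RADARSAT-2</satellite>"
    "</sourceAttributes></product>"
)

root = ElementTree.fromstring(SAMPLE)


def text_or_none(element, path):
    """Return the text of the first match for path, or None if nothing matches."""
    node = element.find(path)
    if node is None:
        # find() returns None instead of raising when the path matches nothing.
        return None
    return node.text


# './satellite' only looks at direct children of the root, so nothing is found.
direct_child = text_or_none(root, './satellite')
print(direct_child)

# Spelling out the intermediate level reaches the element.
nested = text_or_none(root, './sourceAttributes/satellite')
print(nested)
```

Checking for None before touching .tag or .text turns the AttributeError into something that can be handled explicitly.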
Luke_Pinner
MVP Regular Contributor
1. Your example XML is invalid. I assume that's just an error from snipping out the example: you have an unclosed "sourceAttributes" element, which it looks like you're trying to close with an "imageAttributes" element, and you have an open "<" in there as well.

2. Your XPath expression will only search the first level. Use ".//" to recursively search.

3. Your XML declares a namespace, so you need to use it, e.g.:

xmlns='{http://www.rsi.ca/rs2/prod/xml/schemas}'
for path in [ './/%ssatellite', './/%sbeamModeMnemonic', './/%srawDataStartTime' ]:
    node = tree.find(path%xmlns)
    print path, node
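
To see points 2 and 3 side by side, here is a rough Python 3 sketch (the snippet above is Python 2). The SAMPLE string is a cut-down, well-formed stand-in for the real file, and the labels in the attempts dictionary are just illustrative names:

```python
from xml.etree import ElementTree

# Cut-down, well-formed stand-in for product.xml (assumption: same namespace and nesting).
SAMPLE = """<product xmlns="http://www.rsi.ca/rs2/prod/xml/schemas">
 <sourceAttributes>
  <satellite>RADARSAT-2</satellite>
 </sourceAttributes>
</product>"""

root = ElementTree.fromstring(SAMPLE)
ns = '{http://www.rsi.ca/rs2/prod/xml/schemas}'

# Four ways of asking for the same element.
attempts = {
    'shallow, no namespace': './satellite',
    'recursive, no namespace': './/satellite',
    'shallow, with namespace': './%ssatellite' % ns,
    'recursive, with namespace': './/%ssatellite' % ns,
}

results = {}
for label, path in attempts.items():
    node = root.find(path)
    # Store the text, or None when the path matched nothing.
    results[label] = None if node is None else node.text
    print(label, '->', results[label])
```

Only the last path, which is both recursive and namespace-qualified, should find the element in this sample.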
AliceDeschamps1
New Contributor
The snippet of xml is a subset showing the first 17 lines only (the entire file is 2000 lines).
I need to extract the text from specific tags for vector attribution.

I need a simple example to get me going.  This link is a good GIS example but it's a bit more complicated than my current skill level, but I will persist.....
http://blogs.esri.com/Dev/blogs/geoprocessing/archive/2010/05/07/Handling-XML-with-Python-in-ArcGIS....
Luke_Pinner
MVP Regular Contributor
Have you tried to use the code snippet from my previous post?

xmlns='{http://www.rsi.ca/rs2/prod/xml/schemas}'
for path in [ './/%ssatellite', './/%sbeamModeMnemonic', './/%srawDataStartTime' ]:
    node = tree.find(path%xmlns)
    print 'node.text:', node.text
    print 'node.tag:', node.tag

This prints:
node.text: RADARSAT-2
node.tag: {http://www.rsi.ca/rs2/prod/xml/schemas}satellite
node.text: S6
node.tag: {http://www.rsi.ca/rs2/prod/xml/schemas}beamModeMnemonic
node.text: 2011-04-18T12:32:56.257634Z
node.tag: {http://www.rsi.ca/rs2/prod/xml/schemas}rawDataStartTime
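
If the long {namespace} prefix in the paths gets unwieldy, one possible alternative in Python 3 is to pass a prefix-to-URI mapping to find(). This is only a sketch: the prefix rs2 is an arbitrary name chosen here, the SAMPLE string is a trimmed stand-in for the real file, and the split on '}' is just one way to drop the namespace from node.tag:

```python
from xml.etree import ElementTree

# Trimmed, well-formed stand-in for product.xml (assumption for illustration).
SAMPLE = """<product xmlns="http://www.rsi.ca/rs2/prod/xml/schemas">
 <sourceAttributes>
  <satellite>RADARSAT-2</satellite>
  <beamModeMnemonic>S6</beamModeMnemonic>
  <rawDataStartTime>2011-04-18T12:32:56.257634Z</rawDataStartTime>
 </sourceAttributes>
</product>"""

root = ElementTree.fromstring(SAMPLE)

# 'rs2' is an arbitrary prefix; only the URI has to match the document.
NAMESPACES = {'rs2': 'http://www.rsi.ca/rs2/prod/xml/schemas'}

values = {}
for name in ('satellite', 'beamModeMnemonic', 'rawDataStartTime'):
    node = root.find('.//rs2:%s' % name, NAMESPACES)
    # node.tag looks like '{uri}satellite'; keep only the part after '}'.
    local_name = node.tag.split('}', 1)[1]
    values[local_name] = node.text
    print(local_name, node.text)
```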
AliceDeschamps1
New Contributor
Thanks Luke.   Yes your code snippet spits out the 3 text values that I need.  Below is a modified version.

So is this correct:
1) xmlns is the root? 
2) I need to specify the tag + root to get the text  like in this ('.//%ssatellite'%xmlns).   The .// level of tag nesting.  %s converts to string.  How does it concatenate the two?



from xml.etree import ElementTree
import arcpy, string, os
arcpy.env.overwriteOutput = True

arcpy.env.workspace='C:/scripting/script'

with open('product.xml', 'rt') as f:
    tree = ElementTree.parse(f)
root = tree.getroot()
print root

xmlns='{http://www.rsi.ca/rs2/prod/xml/schemas}'

item1=tree.find('.//%ssatellite'%xmlns).text
item2=tree.find('.//%sbeamModeMnemonic'%xmlns).text
item3=tree.find('.//%srawDataStartTime'%xmlns).text
print item1
print item2
print item3



This also works and spits out the same values. I find it has simpler syntax, but I needed your example to get it working!

from xml.etree import ElementTree
import arcpy, string, os
arcpy.env.overwriteOutput = True

arcpy.env.workspace='C:/scripting/script'

with open('product.xml', 'rt') as f:
    tree = ElementTree.parse(f)

root = tree.getroot()
print root
rootText='.//{http://www.rsi.ca/rs2/prod/xml/schemas}'

satellite=tree.find(rootText + 'satellite').text
beamMode=tree.find(rootText + 'beamModeMnemonic').text
timeUTC=tree.find(rootText + 'rawDataStartTime').text
print satellite
print beamMode
print timeUTC


Now let's see if I can manage to do something useful with the info.....
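
As a possible next step, here is a rough Python 3 sketch that collects the three values into a dictionary and turns the timestamp into a datetime object. The helper name read_fields and the SAMPLE string are made up for illustration, and arcpy is left out so the example runs anywhere:

```python
from datetime import datetime
from xml.etree import ElementTree

# Trimmed, well-formed stand-in for product.xml (assumption for illustration).
SAMPLE = """<product xmlns="http://www.rsi.ca/rs2/prod/xml/schemas">
 <sourceAttributes>
  <satellite>RADARSAT-2</satellite>
  <beamModeMnemonic>S6</beamModeMnemonic>
  <rawDataStartTime>2011-04-18T12:32:56.257634Z</rawDataStartTime>
 </sourceAttributes>
</product>"""

NS = '{http://www.rsi.ca/rs2/prod/xml/schemas}'


def read_fields(root, names):
    """Map each tag name to the text of its first match (None if missing)."""
    fields = {}
    for name in names:
        # findtext() returns the text directly, or None when nothing matches.
        fields[name] = root.findtext('.//' + NS + name)
    return fields


root = ElementTree.fromstring(SAMPLE)
fields = read_fields(root, ['satellite', 'beamModeMnemonic', 'rawDataStartTime'])

# Parse the UTC timestamp; the trailing 'Z' is matched as a literal character.
start_time = datetime.strptime(fields['rawDataStartTime'], '%Y-%m-%dT%H:%M:%S.%fZ')

print(fields['satellite'], fields['beamModeMnemonic'], start_time.isoformat())
```

The dictionary values are plain strings, so they could then be handed to whatever attribute-writing step comes next.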
Luke_Pinner
MVP Regular Contributor
So is this correct:
1) xmlns is the root? 
2) I need to specify the tag + root to get the text  like in this ('.//%ssatellite'%xmlns).   The .// level of tag nesting.  %s converts to string.  How does it concatenate the two?


1. No. xmlns is the namespace. The root is {namespace}product
2. If your XML specifies a namespace, you need to use {namespace}tag not just tag. % is a string operator, it is used to replace string format codes (e.g. %s) with one or more values. See http://docs.python.org/library/stdtypes.html#string-formatting for more info.
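
A small Python 3 sketch of both answers, assuming a one-element stand-in document rather than the real file: the % operator substitutes the namespace string where %s appears, which gives the same result as plain concatenation, and the root element's tag is the namespace followed by product.

```python
from xml.etree import ElementTree

xmlns = '{http://www.rsi.ca/rs2/prod/xml/schemas}'

# %s is replaced by the value on the right of the % operator.
path = './/%ssatellite' % xmlns
print(path)

# Plain concatenation builds exactly the same string.
same_path = './/' + xmlns + 'satellite'
print(path == same_path)

# One-element stand-in document (assumption) to show what the root tag looks like.
root = ElementTree.fromstring(
    '<product xmlns="http://www.rsi.ca/rs2/prod/xml/schemas"/>'
)
print(root.tag)
```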