Find and Replace XML tags using Python

MikeMacRae · ‎06-29-2011

Hey everyone,

I am trying to write some Python to find and replace values/variables in XML tags. In the XML attached below, you will see these variables embedded in various tags and enclosed in % (e.g. %SITEDESCR%). I've used the following code to find and replace some values.

from xml.etree import ElementTree as et
import arcpy

# Dates supplied as script tool parameters
BEGDATEPARAM = arcpy.GetParameterAsText(0)
ENDDATEPARAM = arcpy.GetParameterAsText(1)

tree = et.parse(r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009\ECOSITEPHASE_R.xml")

# Overwrite the entire text of the first matching element
tree.find('.//begdate').text = BEGDATEPARAM
tree.find('.//enddate').text = ENDDATEPARAM

# Save the result as a new file
tree.write(r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009\ECOSITEPHASE_R_1.xml")

I am running into a couple of issues that I can't seem to solve:

1. Replacing the whole value in a tag is easy, because that's what tree.find('.//begdate').text = BEGDATEPARAM does. But if a tag contains a paragraph of text and I only want to replace one word (variable) in it, I'm not sure how to make that happen. An example is the <abstract> tag, which holds a sentence containing a variable called %SITEDESCR%. I just want to replace the variable, not the whole sentence.
2. The second issue is replacing values in tags where the tag name and tree path are identical. Near the bottom of the XML there are 4 tags called <placekey>. I want to replace the value in the first <placekey> tag, but I want the values of the remaining 3 tags to stay the same. I can't seem to figure out how to select that first tag and ignore the rest.

[HTML]<metadata>
<idinfo>
    <citation>
      <citeinfo>
           <origin>My Company</origin>
           <pubdate>05/04/2009</pubdate>
           <title>POLYGONS</title>
           <geoform>vector digital data</geoform>
           <onlink>\\ArcGISDevelopment\2009 Geodatabase\PDA_STD_05_25_2009.gdb</onlink>
       </citeinfo>
    </citation>
<descript>
       <abstract>This dataset represents the mapped polygons developed from the field data for the %SITEDESCR%.</abstract>
       <purpose>This dataset was created to accompany the clients Pre-Disturbance Assessment and Conservation and Reclamation Plan.</purpose>
   </descript>
<timeperd>
     <timeinfo>
      <rngdates>
          <begdate>%begdate%</begdate>
          <begtime>unknown</begtime>
          <enddate>%enddate%</enddate>
          <endtime>unknown</endtime>
        </rngdates>
      </timeinfo>
      <current>ground condition</current>
    </timeperd>
<status>
      <progress>Complete</progress>
      <update>None planned</update>
   </status>
<spdom>
    <bounding>
        <westbc> -110.541628</westbc>
        <eastbc> -110.535713</eastbc>
        <northbc> 54.643033</northbc>
        <southbc> 54.639832</southbc>
      </bounding>
   </spdom>
<keywords>
    <theme>
        <themekt>POLYGONS</themekt>
        <themekey>Polygon</themekey>
     </theme>
    <place>
        <placekey>%SITEID%</placekey>
        <placekey>Alberta</placekey>
        <placekey>Canada</placekey>
        <placekey>North America</placekey>
     </place>
[/HTML]

DarrenWiens2 · ‎06-29-2011

1.) Since tree.find('.//abstract').text is a string, I think you should be able to use the usual replace method. (totally untested)

tree.find('.//abstract').text = tree.find('.//abstract').text.replace("%SITEDESCR%", thesubvalue)
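Here is a minimal, self-contained sketch of that idea, in case it helps. It parses a trimmed-down copy of the XML from a string instead of the file on disk, and the helper name replace_in_text and the value 'North Lease site' are made up for illustration. It also guards against the tag being missing or empty, since find() returns None when nothing matches.

```python
from xml.etree import ElementTree as et

# Trimmed-down stand-in for the metadata file in the question.
SAMPLE_XML = """<metadata>
  <descript>
    <abstract>This dataset represents the mapped polygons developed from the field data for the %SITEDESCR%.</abstract>
  </descript>
</metadata>"""


def replace_in_text(root, path, placeholder, value):
    """Replace placeholder inside the text of the first element matching path.

    Hypothetical helper: returns True if an element with text was found,
    False otherwise. Only the placeholder changes; the rest of the
    sentence is left alone.
    """
    elem = root.find(path)
    if elem is None or elem.text is None:
        return False
    elem.text = elem.text.replace(placeholder, value)
    return True


root = et.fromstring(SAMPLE_XML)
# 'North Lease site' is an invented substitute value.
found = replace_in_text(root, './/abstract', '%SITEDESCR%', 'North Lease site')
abstract_text = root.find('.//abstract').text
```

With a real file you would call et.parse(...) as in the original script, run the same replacement on the tree, and then tree.write(...) the result.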

2.) I think you want to use the getchildren() method to create a list of the subelements under the <place> tag. Then, you can reference the children by index number. (again, untested - you can find lots of great ElementTree methods in the ElementTree documentation.)
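A rough sketch of the indexing idea, with two caveats. getchildren() was removed in Python 3.9, so this version uses findall() instead, which returns the matching elements as a list in document order. The XML is a cut-down copy of the <place> block parsed from a string, and 'SITE-001' is an invented value.

```python
from xml.etree import ElementTree as et

# Cut-down copy of the <place> block from the question.
SAMPLE_XML = """<metadata>
  <keywords>
    <place>
      <placekey>%SITEID%</placekey>
      <placekey>Alberta</placekey>
      <placekey>Canada</placekey>
      <placekey>North America</placekey>
    </place>
  </keywords>
</metadata>"""

root = et.fromstring(SAMPLE_XML)

# findall() returns every matching element, in document order.
placekeys = root.findall('.//place/placekey')

# Change only the first one; 'SITE-001' is an invented value.
placekeys[0].text = 'SITE-001'

# Collect the values to confirm the other three are untouched.
values = [pk.text for pk in placekeys]
```

Since find() returns only the first match, root.find('.//placekey').text = 'SITE-001' should have the same effect here. The list approach is more useful when the tag you want is not the first one.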