Importing an XML file into Access using Python

1964
6
Jump to solution
02-02-2018 08:25 AM
DavidDenham
Occasional Contributor

I am working on a python application to download information from Waze that we receive in XML format.  I have the part of the code written to download the file and name it based on the time and date.  The problem that I have is automating the import of the XML file into a personal geodatabase.  I am using pypyodbc to establish the connection to the database and the cursor for the SQL statement to import the xml data.  The only example I have been able to find it for importing an Excel spreadsheet into Access.  

cnxn = pypyodbc.connect(    r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};' +    r'DBQ=C:\Users\Public\Database1.accdb')crsr = cnxn.cursor()crsr.execute(    r"SELECT * INTO fromExcel " +    r"FROM [Excel 12.0 Xml;HDR=YES;IMEX=2;ACCDB=YES;DATABASE=C:\Users\Public\xlData.xlsx].[Sheet1$]")crsr.commit()crsr.close()cnxn.close()

The part after "crsr.execute(" is where I am having problems with the syntax to set the name for the table.  I cannot find much information on this part. Has anyone else dealt with this issue?

Tags (3)
1 Solution

Accepted Solutions
RandyBurton
MVP Regular Contributor

Although these examples work with a file geodatabase, they may give you some ideas.

If the Waze xml uses the GPX format, you could use the GPX To Features tool.  I use this script to convert a number of GPX files to features:

import glob, os
import arcpy
import datetime

gpxPath = r"C:\Path\to\gpx\directory"
gdb = r'C:\Path\to\file.gdb'
arcpy.env.workspace = gdb

os.chdir(gpxPath)
print datetime.datetime.now()
for gpx in glob.glob("*.gpx"):
    gpxStr = filter(str.isdigit, gpx)
    gpxStr = 'GPX_'+gpxStr[2:8]+'_'+gpxStr[8:] # my filename included a date/time
    print gpxStr
    
    arcpy.GPXtoFeatures_conversion(Input_GPX_File=gpxPath+gpx, Output_Feature_class=gpxStr) 
    for i in range(0,arcpy.GetMessageCount()):
        print arcpy.GetMessage(i)
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Many of my GPX files are created by the logging function built into the Bad Elf Pro GPS.  The XML file includes some additional extensions that the GPXtoFeatures tool will not import.  So I use the following to convert the XML into a feature, which uses steps outlined by James MacKay‌:

# import Bad Elf track from GPX xml file and save as file geodatabase feature

import xml.etree.ElementTree as ET
import re
import arcpy
from arcpy import env

# geodatabase and environment to use
geoDB = r"C:\Path\To\file.gdb"
env.workspace = geoDB

ns = { 'gpx': 'http://www.topografix.com/GPX/1/1',
       'elf': 'http://bad-elf.com/xmlschemas' }
# name for feature layer from xml
for trk in tree.findall('gpx:trk', ns):
    feature = trk.find('gpx:name', ns).text
feature = 'BE_'+' '.join(re.sub('[^0-9a-zA-Z ]', '', feature.strip()).split())

# create feature
print "\nCreating feature: " + feature
geometry_type = "POINT"
       
tree = ET.parse(r"C:\Path\To\bad-elf.gpx') # xml file

template = "#"
has_m = "DISABLED"
has_z = "DISABLED"
spatial_reference = arcpy.SpatialReference("WGS 1984")
arcpy.CreateFeatureclass_management(geoDB, feature, geometry_type, template, has_m, has_z, spatial_reference)

# add fields
arcpy.AddField_management(feature,'TimeString','TEXT','#','#','20','Date and Time','Nullable','NON_REQUIRED')
arcpy.AddField_management(feature,'HDOP','DOUBLE','#','#','#','HDOP','Nullable','NON_REQUIRED')
arcpy.AddField_management(feature,'Speed','DOUBLE','#','#','#','Speed','Nullable','NON_REQUIRED')
arcpy.AddField_management(feature,'EleGPS','DOUBLE','#','#','#','GPS Elevation','Nullable','NON_REQUIRED')
arcpy.AddField_management(feature,'EleBP','DOUBLE','#','#','#','BP Elevation','Nullable','NON_REQUIRED')
arcpy.AddField_management(feature,'BP','DOUBLE','#','#','#','Barometric Pressure','Nullable','NON_REQUIRED')

# open an insert cursor
cursor = arcpy.da.InsertCursor(feature, ['TimeString', 'HDOP', 'Speed', 'EleGPS', 'EleBP', 'BP', 'SHAPE@XY'])

# read xml and insert into feature
for trkseg in tree.findall('gpx:trk', ns):
    for trkpt in trkseg.find('gpx:trkseg', ns):
        ts = trkpt.find('gpx:time', ns).text
        hd = float(trkpt.find('gpx:hdop', ns).text)
        sp = float(trkpt.find('gpx:extensions/elf:speed', ns).text)
        el = float(trkpt.find('gpx:ele', ns).text)
        be = float(trkpt.find('gpx:extensions/elf:baroEle', ns).text)
        bp = float(trkpt.find('gpx:extensions/elf:baroPress', ns).text)
        row = (ts, hd, sp, el, be, bp, (float(trkpt.attrib['lon']), float(trkpt.attrib['lat'])))
        print row
        cursor.insertRow(row)

#delete the cursor to finish
del cursor
print "Done."‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Here's a sample of the Bad Elf XML used:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<gpx xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3"
     xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"
     xmlns:badelf="http://bad-elf.com/xmlschemas"
     version="1.1"
     creator="Bad Elf GPS Pro+ 2.1.44"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.garmin.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd http://bad-elf.com/xmlschemas http://bad-elf.com/xmlschemas/GpxExtensionsV1.xsd">
  <metadata>
    <extensions>
      <badelf:modelNickname>Bad Elf GPS Pro+</badelf:modelNickname>
      <badelf:modelSerialNumber>007</badelf:modelSerialNumber>
      <badelf:modelName>Bad Elf GPS Pro+</badelf:modelName>
      <badelf:modelNumber>BE-GPS-2300</badelf:modelNumber>
      <badelf:modelHardwareRevision>8.0.0</badelf:modelHardwareRevision>
      <badelf:modelFirmwareRevision>2.1.44</badelf:modelFirmwareRevision>
      <badelf:elevationSource>gps</badelf:elevationSource>
    </extensions>
  </metadata>
  <wpt lat="1.306545" lon="1.408652">
    <ele>95.8</ele>
    <time>2017-09-21T02:46:58Z</time>
    <name>START (2017-09-21T02:46:58Z)</name>
    </wpt>
  <trk>
    <name>2017-09-21T02:46:58Z</name>
    <trkseg>
      <trkpt lat="1.306545" lon="1.408652">
        <ele>95.8</ele>
        <time>2017-09-21T02:46:58Z</time>
        <hdop>1.2</hdop>
        <extensions>
          <badelf:speed>0.77</badelf:speed>
          <badelf:baroEle>74.10</badelf:baroEle>
          <badelf:baroPress>1007.31</badelf:baroPress>
        </extensions>
      </trkpt>
    </trkseg>
  </trk>
</gpx>‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Hope this helps.

View solution in original post

6 Replies
BruceHarold
Esri Regular Contributor

David, this is the sort of workflow Data Interoperability extension can tackle.  If that is an option for you and you can share some XML we can advise more.

0 Kudos
DavidDenham
Occasional Contributor

Unfortunately we do not have a license for the Data Interoperability extension and I do not think that we can justify that for a single project. I am under the impression that the data belongs to Waze I do not believe that I can share it with the public.  I will try to tell you what I can.  This file is xml version 1.0  and it is from the Waze GeoRSS feed.  When I import the data using the import xml file in Access it generates 2 table 1 called "channel" that contains information on the area covered by the data and another called "item" with contains the incident data with 15 text fields.  Let me know if there is some other information that would help.  

0 Kudos
RandyBurton
MVP Regular Contributor

Do you have python code to read the XML, or are you just interested in the ODBC connection portion?

0 Kudos
DavidDenham
Occasional Contributor

I want to automate the import of the XML file into a personal geodatabase so that I can generate a feature class from the XML file.  So I would appreciate any advise, but the connection is where I am getting stuck now.  I have not been able to find any on how to handle to connection when importing a XML file.  

0 Kudos
JamesMacKay3
Occasional Contributor

If you're planning on putting the XML into a feature class you should use geoprocessing tools like Create Feature Class and Add Field, then insert records using the arcpy.da.InsertCursor class.  If you create your schema through an ODBC connection it won't be a proper feature class as it won't be in the GDB system tables.

(Edit: Even if your schema already exists you should still be using the InsertCursor to create records rather than through pyODBC - you can easily insert shapes with an insert cursor, I imagine it's possible using pyODBC but I'm sure it's a lot more difficult, and there are other factors to consider like spatial indexes.)

RandyBurton
MVP Regular Contributor

Although these examples work with a file geodatabase, they may give you some ideas.

If the Waze xml uses the GPX format, you could use the GPX To Features tool.  I use this script to convert a number of GPX files to features:

import glob, os
import arcpy
import datetime

gpxPath = r"C:\Path\to\gpx\directory"
gdb = r'C:\Path\to\file.gdb'
arcpy.env.workspace = gdb

os.chdir(gpxPath)
print datetime.datetime.now()
for gpx in glob.glob("*.gpx"):
    gpxStr = filter(str.isdigit, gpx)
    gpxStr = 'GPX_'+gpxStr[2:8]+'_'+gpxStr[8:] # my filename included a date/time
    print gpxStr
    
    arcpy.GPXtoFeatures_conversion(Input_GPX_File=gpxPath+gpx, Output_Feature_class=gpxStr) 
    for i in range(0,arcpy.GetMessageCount()):
        print arcpy.GetMessage(i)
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Many of my GPX files are created by the logging function built into the Bad Elf Pro GPS.  The XML file includes some additional extensions that the GPXtoFeatures tool will not import.  So I use the following to convert the XML into a feature, which uses steps outlined by James MacKay‌:

# import Bad Elf track from GPX xml file and save as file geodatabase feature

import xml.etree.ElementTree as ET
import re
import arcpy
from arcpy import env

# geodatabase and environment to use
geoDB = r"C:\Path\To\file.gdb"
env.workspace = geoDB

ns = { 'gpx': 'http://www.topografix.com/GPX/1/1',
       'elf': 'http://bad-elf.com/xmlschemas' }
# name for feature layer from xml
for trk in tree.findall('gpx:trk', ns):
    feature = trk.find('gpx:name', ns).text
feature = 'BE_'+' '.join(re.sub('[^0-9a-zA-Z ]', '', feature.strip()).split())

# create feature
print "\nCreating feature: " + feature
geometry_type = "POINT"
       
tree = ET.parse(r"C:\Path\To\bad-elf.gpx') # xml file

template = "#"
has_m = "DISABLED"
has_z = "DISABLED"
spatial_reference = arcpy.SpatialReference("WGS 1984")
arcpy.CreateFeatureclass_management(geoDB, feature, geometry_type, template, has_m, has_z, spatial_reference)

# add fields
arcpy.AddField_management(feature,'TimeString','TEXT','#','#','20','Date and Time','Nullable','NON_REQUIRED')
arcpy.AddField_management(feature,'HDOP','DOUBLE','#','#','#','HDOP','Nullable','NON_REQUIRED')
arcpy.AddField_management(feature,'Speed','DOUBLE','#','#','#','Speed','Nullable','NON_REQUIRED')
arcpy.AddField_management(feature,'EleGPS','DOUBLE','#','#','#','GPS Elevation','Nullable','NON_REQUIRED')
arcpy.AddField_management(feature,'EleBP','DOUBLE','#','#','#','BP Elevation','Nullable','NON_REQUIRED')
arcpy.AddField_management(feature,'BP','DOUBLE','#','#','#','Barometric Pressure','Nullable','NON_REQUIRED')

# open an insert cursor
cursor = arcpy.da.InsertCursor(feature, ['TimeString', 'HDOP', 'Speed', 'EleGPS', 'EleBP', 'BP', 'SHAPE@XY'])

# read xml and insert into feature
for trkseg in tree.findall('gpx:trk', ns):
    for trkpt in trkseg.find('gpx:trkseg', ns):
        ts = trkpt.find('gpx:time', ns).text
        hd = float(trkpt.find('gpx:hdop', ns).text)
        sp = float(trkpt.find('gpx:extensions/elf:speed', ns).text)
        el = float(trkpt.find('gpx:ele', ns).text)
        be = float(trkpt.find('gpx:extensions/elf:baroEle', ns).text)
        bp = float(trkpt.find('gpx:extensions/elf:baroPress', ns).text)
        row = (ts, hd, sp, el, be, bp, (float(trkpt.attrib['lon']), float(trkpt.attrib['lat'])))
        print row
        cursor.insertRow(row)

#delete the cursor to finish
del cursor
print "Done."‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Here's a sample of the Bad Elf XML used:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<gpx xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3"
     xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"
     xmlns:badelf="http://bad-elf.com/xmlschemas"
     version="1.1"
     creator="Bad Elf GPS Pro+ 2.1.44"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.garmin.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd http://bad-elf.com/xmlschemas http://bad-elf.com/xmlschemas/GpxExtensionsV1.xsd">
  <metadata>
    <extensions>
      <badelf:modelNickname>Bad Elf GPS Pro+</badelf:modelNickname>
      <badelf:modelSerialNumber>007</badelf:modelSerialNumber>
      <badelf:modelName>Bad Elf GPS Pro+</badelf:modelName>
      <badelf:modelNumber>BE-GPS-2300</badelf:modelNumber>
      <badelf:modelHardwareRevision>8.0.0</badelf:modelHardwareRevision>
      <badelf:modelFirmwareRevision>2.1.44</badelf:modelFirmwareRevision>
      <badelf:elevationSource>gps</badelf:elevationSource>
    </extensions>
  </metadata>
  <wpt lat="1.306545" lon="1.408652">
    <ele>95.8</ele>
    <time>2017-09-21T02:46:58Z</time>
    <name>START (2017-09-21T02:46:58Z)</name>
    </wpt>
  <trk>
    <name>2017-09-21T02:46:58Z</name>
    <trkseg>
      <trkpt lat="1.306545" lon="1.408652">
        <ele>95.8</ele>
        <time>2017-09-21T02:46:58Z</time>
        <hdop>1.2</hdop>
        <extensions>
          <badelf:speed>0.77</badelf:speed>
          <badelf:baroEle>74.10</badelf:baroEle>
          <badelf:baroPress>1007.31</badelf:baroPress>
        </extensions>
      </trkpt>
    </trkseg>
  </trk>
</gpx>‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Hope this helps.