List Broken Data Source's Path

LauraMiles1 · ‎08-07-2013

Hi everyone, I'm a very amateur Python user. I've managed to get the following, which I pieced together from here and there, working for the most part. I need to create a csv file which has a column showing the path to the data source which is broken. Everything works fine and dandy if I take out the dataSource part; when I add it in, it will run for a while and then fail. It seems to be getting tripped up on some mxd or data source it doesn't like perhaps? I have no clue. Here's my code:

import arcpy, os
path = r"H:\Plans\GIS Plans\2003"
f = open('BrokenMXD2003.csv', 'w')
f.write("Type, File Path, Layer, Broken Path" + "\n")
for root, dirs, files in os.walk(path):
    for fileName in files:
        basename, extension = os.path.splitext(fileName)
        if extension == ".mxd":
            fullPath = os.path.join(root, fileName)
            mxd = arcpy.mapping.MapDocument(fullPath)
            brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
            for brknItem in brknMXD:
                lyrList = arcpy.mapping.ListLayers(mxd)
                f.write("MXD, " + fullPath + ", " + brknItem.name)
                if brknItem.supports("dataSource"):
                    f.write(", " + brknItem.dataSource + "\n")
                else:
                    f.write("\n")

f.close()

print "Script Completed"

And here's the error I get:

Traceback (most recent call last):
File "X:\Documents\Working Files\Broken Data Sources\ListBrokenMXD.py", line 11, in <module>
    brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\utils.py", line 181, in fn_
    return fn(*args, **kw)
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\mapping.py", line 1465, in ListBrokenDataSources
    result = mixins.MapDocumentMixin(map_document_or_layer).listBrokenDataSources()
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\arcobjects\mixins.py", line 832, in listBrokenDataSources
    broken_sources = [l for l in self.layers if not l._arc_object.valid]
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\arcobjects\mixins.py", line 683, in layers
    for frame in reversed(self.dataFrames):
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\arcobjects\mixins.py", line 695, in dataFrames
    return map(convertArcObjectToPythonObject, self.pageLayout.dataFrames)
AttributeError: 'NoneType' object has no attribute 'dataFrames'

It does run and create several lines of output, but this error appears when it gets near to the end of the mxd's in the directory. I haven't a clue what the problem could be, as I said I'm quite amateur. If anyone can see what it is, I'd be greatly appreciative. Thank you!
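One way to find out which document triggers the AttributeError is to wrap the per-document work in a try/except and record the failing path instead of letting the script stop. The sketch below is not from the thread: `scan_documents` and its `list_broken` argument are hypothetical names, and `list_broken` stands in for the two arcpy calls (`MapDocument`, then `ListBrokenDataSources`) so the pattern can be shown without arcpy installed.

```python
def scan_documents(paths, list_broken):
    """Collect broken items per document, recording documents that raise.

    `list_broken` is a hypothetical callable standing in for
    arcpy.mapping.MapDocument + arcpy.mapping.ListBrokenDataSources.
    """
    results = []   # (path, broken_item) pairs
    failures = []  # (path, error description) pairs
    for path in paths:
        try:
            for item in list_broken(path):
                results.append((path, item))
        except Exception as exc:
            # Broad on purpose: log the document and move on to the next one.
            failures.append((path, "%s: %s" % (type(exc).__name__, exc)))
    return results, failures
```

With arcpy available, `list_broken` could be something like `lambda p: arcpy.mapping.ListBrokenDataSources(arcpy.mapping.MapDocument(p))`. The `failures` list then names the .mxd files worth opening by hand.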

StacyRendall1 · ‎08-08-2013

There are a few other problems with your code:

You use an UpdateCursor when you are only reading data out; try a SearchCursor instead (see below).

Your datelist is not used.

The line if i == idlist: makes no sense, and will always evaluate to False (i.e. the stuff below it isn't happening).

You attempt to do a new selection each time, rather than adding to the selection.

You don't export the selected results.

This should work, but I do not know about the SQL query (Richard's answer looks pretty good...):

import arcpy
from arcpy import env
env.workspace=r"Z:\GIS_Data\gdb\GIS_Test.gdb"

fc="COA_FH_Inspections"
cursor = arcpy.SearchCursor(fc)
idlist = []

for row in cursor:
    idlist.append(row.getValue('HydrID'))

del row, cursor

print idlist
idunique = set(idlist)
print idunique

#Make a layer from the feature class
arcpy.MakeFeatureLayer_management(fc, "lyr")

for ID in idunique:
    arcpy.SelectLayerByAttribute_management("lyr", "ADD_TO_SELECTION",'"TestDate" = (SELECT MAX("TestDate") FROM "lyr") WHERE "HydrID" = %s' % ID)

# EXPORT RESULTS TO ANOTHER FEATURE CLASS
#, i.e. CopyFeatures_management

If the SQL query will not work, and cannot be fixed, I would suggest a solution using dictionaries (main key HydrID, storing OID and date within lists for each Hydrant) and the Python datetime module. Pandas is overkill for this problem. I can provide more information if required.
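A minimal sketch of the dictionary idea, assuming the rows have already been read from a cursor into `(oid, hydr_id, test_date)` tuples with `datetime` values and no null dates. The function name and the tuple layout are hypothetical, and instead of storing lists per hydrant it keeps only the running latest record, which is a simplification of what is described above.

```python
from datetime import datetime


def latest_inspection_oids(records):
    """Return {hydr_id: oid} for the most recent test date of each hydrant.

    `records` is an iterable of (oid, hydr_id, test_date) tuples; this
    layout is an assumption made for the sketch.
    """
    latest = {}  # hydr_id -> (test_date, oid)
    for oid, hydr_id, test_date in records:
        if hydr_id not in latest or test_date > latest[hydr_id][0]:
            latest[hydr_id] = (test_date, oid)
    # Drop the dates and keep only the winning OID per hydrant.
    return dict((hydr_id, oid) for hydr_id, (test_date, oid) in latest.items())
```

The resulting OIDs could then be selected in one pass with a single where clause on the object ID field, rather than one subquery per hydrant.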

RichardFairhurst · ‎08-08-2013

This won't work, since "lyr" is not the underlying feature class name.

arcpy.SelectLayerByAttribute_management("lyr", "ADD_TO_SELECTION",'"TestDate" = (SELECT MAX("TestDate") FROM "lyr") WHERE "HydrID" = %s' % ID)

This may work (but I don't know the Pythonic way to do the fc name substitution):

arcpy.SelectLayerByAttribute_management("lyr", "ADD_TO_SELECTION",'"HydrID" = %s And "TestDate" = (SELECT MAX("TestDate") FROM ' + fc + ') WHERE "HydrID" = %s' % ID)

How do you obtain the underlying feature class name of the layer from the fc variable? In any case, that is what must come after the FROM: the underlying FC name. You also must select the HydrID outside of the subquery, because the subquery will just return a date without any ID association.

StacyRendall1 · ‎08-08-2013

This may work (but I don't know the Pythonic way to do the fc name substitution):

arcpy.SelectLayerByAttribute_management("lyr", "ADD_TO_SELECTION",'"HydrID" = %s And "TestDate" = (SELECT MAX("TestDate") FROM ' + fc + ') WHERE "HydrID" = %s' % ID)

I think this should do it, provided that the query structure is OK:

arcpy.SelectLayerByAttribute_management("lyr", "ADD_TO_SELECTION",'"HydrID" = %s And "TestDate" = (SELECT MAX("TestDate") FROM %s) WHERE "HydrID" = %s' % (ID, fc, ID))

I will do a little testing and see if it works...

StacyRendall1 · ‎08-08-2013

OK. I did some testing, and here below is the final code which should work. Remember that you will still need to do something with the selected values, such as exporting to a new feature class...

import arcpy
from arcpy import env
env.workspace=r"Z:\GIS_Data\gdb\GIS_Test.gdb"

fc="COA_FH_Inspections"
cursor = arcpy.SearchCursor(fc)
idlist = []

for row in cursor:
    idlist.append(row.getValue('HydrID'))

del row, cursor

print idlist
idunique = set(idlist)
print idunique

#Make a layer from the feature class
arcpy.MakeFeatureLayer_management(fc, 'lyr')

for ID in idunique:
    arcpy.SelectLayerByAttribute_management('lyr', 'ADD_TO_SELECTION', '"HydrID" = %s AND "TestDate" = (SELECT MAX("TestDate") FROM %s WHERE "HydrID" = %s)' % (ID, fc, ID))

# EXPORT RESULTS TO ANOTHER FEATURE CLASS
#, i.e. CopyFeatures_management

This will only work if the HydrID values are integers (Short or Long). If they are text you will need to add single quotes and escape them, like so:

    arcpy.SelectLayerByAttribute_management('lyr', 'ADD_TO_SELECTION', '"HydrID" = \'%s\' AND "TestDate" = (SELECT MAX("TestDate") FROM %s WHERE "HydrID" = \'%s\')' % (ID, fc, ID))

Note that the code in my earlier reply had the closing parenthesis in the wrong place. It had:

arcpy.SelectLayerByAttribute_management('lyr', 'ADD_TO_SELECTION','"HydrID" = %s And "TestDate" = (SELECT MAX("TestDate") FROM %s) WHERE "HydrID" = %s' % (ID, fc, ID))

rather than:

arcpy.SelectLayerByAttribute_management('lyr', 'ADD_TO_SELECTION', '"HydrID" = %s AND "TestDate" = (SELECT MAX("TestDate") FROM %s WHERE "HydrID" = %s)' % (ID, fc, ID))
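Because the quoting and the parenthesis placement are easy to get wrong inline, the where clause can be built and printed separately before it is passed to SelectLayerByAttribute_management. This is only a sketch: `build_where` and its `is_text` flag are hypothetical names, and it does not escape single quotes inside a text ID.

```python
def build_where(fc, hydr_id, is_text=False):
    """Build the latest-TestDate where clause for one HydrID."""
    # Text IDs need single quotes around the value; numeric IDs do not.
    value = "'%s'" % hydr_id if is_text else "%s" % hydr_id
    # The subquery's closing parenthesis comes after its own WHERE.
    return ('"HydrID" = %s AND "TestDate" = '
            '(SELECT MAX("TestDate") FROM %s WHERE "HydrID" = %s)'
            % (value, fc, value))
```

Inside the loop, the call would then read `arcpy.SelectLayerByAttribute_management('lyr', 'ADD_TO_SELECTION', build_where(fc, ID))`.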

StacyRendall1 · ‎08-08-2013

The thing making this difficult is where the failure occurs (are you absolutely sure that it hasn't changed location from your original post?). The problem is that the WorkSpace or DataSource, or whatever, should not really affect the ListBrokenDataSources of a future iteration.

If you are slowly building your code up, please show us the most built up code that does work, and the one with the simplest addition on top of that which fails. This might help isolate the error.

Given the apparent failure on a future iteration, you could try deleting leftover variables each iteration. It is a long shot, but it's the only thing I can think of... You will have more things to clean up than this, but here is a simple example (just make sure to delete each variable at the indentation level where it is assigned; deleting a name that was never assigned on that pass raises a NameError):

import arcpy, os
path = r"H:\Plans\GIS Plans\2003"
for root, dirs, files in os.walk(path):
    for fileName in files:
        basename, extension = os.path.splitext(fileName)
        if extension == ".mxd":
            fullPath = os.path.join(root, fileName)
            mxd = arcpy.mapping.MapDocument(fullPath)
            print mxd
            brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
            for brknItem in brknMXD:
                print brknItem.name

            del fullPath, mxd, brknMXD
        del basename, extension

print "Script Completed"

MattSayler · ‎08-09-2013

Looks like two threads were merged? Not seeing a reason.

LauraMiles1 · ‎08-09-2013

Hi Stacy, after some more testing I've found some things are causing issues: empty data frames, and standalone tables. Both cause the script to throw an error. I am certain that in the one folder I was doing testing on, the script was inconsistently stopping at different places. It's very strange indeed. This folder had an .mxd with no data/layers on it - once I found and deleted that, my script runs beautifully.

So my issue now is how to test for empty data frames and standalone tables, and to skip over those items.

LauraMiles1 · ‎08-09-2013

Another update: after more testing, I think it must have been a corrupted .mxd that happened to have an empty data frame that was causing the issue. I do still have trouble with standalone tables though. The error seems to suggest that you can't even test on this data type to see if it supports the dataSource property:

Traceback (most recent call last):
File "X:\Documents\Working Files\Broken Data Sources\ListBrokenDataSources.py", line 24, in <module>
if brknItem.supports("dataSource"):
AttributeError: 'TableView' object has no attribute 'supports'

LaurenYee · ‎08-09-2013

How many mxds and folders are you looping through?

I would try to start replacing the broken data sources, then leave alone the remaining mxds where it fails and look at those individually.
Unless of course there is a large number of mxds, then refine the code!

Perhaps there is a way to cycle through the TOC and only check datasources for those layers that are say: shapefiles or .sdes?
And ignore all the tables. Just a thought!

LauraMiles1 · ‎08-09-2013

Hi Lauren, there are a LOT of .mxd's to go through. In the 2003 folder alone, there are 300...and we have a folder for every year from 2001 until now. I can handle checking into a particular .mxd wherever it fails, if that's necessary, but ideally I'd like to set up a test to skip over any tableViews. If I take out the following code:

                if brknItem.supports("dataSource"):
                    f.write(", " + brknItem.dataSource + "\n")

and instead write the dataSource unconditionally, together with the full path and item name, the script writes the paths for the tableView items fine. So I need to find a way to skip this test if the item is a tableView, but I don't know the syntax. Pseudocode would be something like:

if broken item is a tableView:
    write dataSource
else:
    if brokenItem.supports("dataSource"):
        write dataSource
The issue now is the brokenItem.supports test throws an error if the dataset is a tableView. This makes sense, as a tableView does support dataSource so why would you test for it, right?
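The pseudocode above could be sketched in Python by checking whether the item has a `supports` method at all before calling it. This relies on what the traceback in this thread shows (a TableView has no `supports` attribute) and on the observation that its `dataSource` can be read directly. The helper name `broken_source_path` is hypothetical.

```python
def broken_source_path(item):
    """Return the item's dataSource, or "" when it is not available."""
    supports = getattr(item, "supports", None)
    if supports is None:
        # No supports() method (e.g. a TableView): read dataSource directly.
        return getattr(item, "dataSource", "")
    if supports("dataSource"):
        return item.dataSource
    # A layer type (such as a group layer) that has no data source.
    return ""
```

In the original loop, the if/else block would then collapse to `f.write(", " + broken_source_path(brknItem) + "\n")`. An explicit type check such as `isinstance(brknItem, arcpy.mapping.TableView)` is another option, if you prefer to name the type rather than probe for the method.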