List Broken Data Source's Path

LauraMiles1 · ‎08-07-2013

Hi everyone, I'm a very amateur Python user. I've managed to get the following, which I pieced together from here and there, working for the most part. I need to create a csv file which has a column showing the path to the data source which is broken. Everything works fine and dandy if I take out the dataSource part; when I add it in, it will run for a while and then fail. It seems to be getting tripped up on some mxd or data source it doesn't like perhaps? I have no clue. Here's my code:

import arcpy, os
path = r"H:\Plans\GIS Plans\2003"
f = open('BrokenMXD2003.csv', 'w')
f.write("Type, File Path, Layer, Broken Path" + "\n")
for root, dirs, files in os.walk(path):
    for fileName in files:
        basename, extension = os.path.splitext(fileName)
        if extension == ".mxd":
            fullPath = os.path.join(root, fileName)
            mxd = arcpy.mapping.MapDocument(fullPath)
            brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
            for brknItem in brknMXD:
                lyrList = arcpy.mapping.ListLayers(mxd)
                f.write("MXD, " + fullPath + ", " + brknItem.name)
                if brknItem.supports("dataSource"):
                    f.write(", " + brknItem.dataSource + "\n")
                else:
                    f.write("\n")

f.close()

print "Script Completed"

And here's the error I get:

Traceback (most recent call last):
File "X:\Documents\Working Files\Broken Data Sources\ListBrokenMXD.py", line 11, in <module>
    brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\utils.py", line 181, in fn_
    return fn(*args, **kw)
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\mapping.py", line 1465, in ListBrokenDataSources
    result = mixins.MapDocumentMixin(map_document_or_layer).listBrokenDataSources()
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\arcobjects\mixins.py", line 832, in listBrokenDataSources
    broken_sources = [l for l in self.layers if not l._arc_object.valid]
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\arcobjects\mixins.py", line 683, in layers
    for frame in reversed(self.dataFrames):
File "D:\Program Files (x86)\ArcGIS\Desktop10.2\arcpy\arcpy\arcobjects\mixins.py", line 695, in dataFrames
    return map(convertArcObjectToPythonObject, self.pageLayout.dataFrames)
AttributeError: 'NoneType' object has no attribute 'dataFrames'

It does run and create several lines of output, but this error appears when it gets near to the end of the mxd's in the directory. I haven't a clue what the problem could be, as I said I'm quite amateur. If anyone can see what it is, I'd be greatly appreciative. Thank you!

LaurenYee · ‎08-07-2013

Instead of populating an Excel spreadsheet, this will show the results in IDLE:

import arcpy, os

path = r"C:\Test"

for root, dirs, files in os.walk(path):
    for fileName in files:
        basename, extension = os.path.splitext(fileName)
        if extension == ".mxd":
            fullPath = os.path.join(root, fileName)
            mxd = arcpy.mapping.MapDocument(fullPath)
            brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
            for brknItem in brknMXD:
                print "MXD: " + fileName
                if brknItem.supports("workspacePath"):
                    source = brknItem.workspacePath
                    print str(brknItem) + ": " + source
                else:
                    print "Layer does not support source"

print "Completed"

StacyRendall1 · ‎08-07-2013

The traceback (error report) is indicating that the problem is within the Arcpy libraries, but is caused by line 11 of your script, brknMXD = arcpy.mapping.ListBrokenDataSources(mxd). I would guess that at some point you are passing an invalid mxd path...?

Try printing the mxd path each time (the added line is arcpy.AddMessage(mxd)):

import arcpy, os
path = r"H:\Plans\GIS Plans\2003"
f = open('BrokenMXD2003.csv', 'w')
f.write("Type, File Path, Layer, Broken Path" + "\n")
for root, dirs, files in os.walk(path):
    for fileName in files:
        basename, extension = os.path.splitext(fileName)
        if extension == ".mxd":
            fullPath = os.path.join(root, fileName)
            mxd = arcpy.mapping.MapDocument(fullPath)
            arcpy.AddMessage(mxd)
            brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
            for brknItem in brknMXD:
                lyrList = arcpy.mapping.ListLayers(mxd)
                f.write("MXD, " + fullPath + ", " + brknItem.name)
                if brknItem.supports("dataSource"):
                    f.write(", " + brknItem.dataSource + "\n")
                else:
                    f.write("\n")

f.close()

print "Script Completed"

LauraMiles1 · ‎08-07-2013

Hi Lauren, thanks for your answer. I'm actually successful at creating and populating the csv file, it just hangs up near the end of cycling through all the mxd's and doesn't finish the job. It puts out a csv file just fine, with the majority of the broken links listed, it just doesn't make it through every mxd in the folder and I'm not sure what the error I'm being given means.

I notice you used "workspacePath" rather than "dataSource". I tested out your code in IDLE and it seems to be providing the same information as dataSource. I changed dataSource to workspacePath in my code and tested that out. Both using your code as a straight copy and paste into IDLE, and changing dataSource to workspacePath in my own code, I'm getting the exact same error as I have been. Oddly, it does NOT stop at the same .mxd in both scenarios.

Any clues out there???

LauraMiles1 · ‎08-07-2013

Hi Stacy, I added this one line but it seems to have had no effect. I'm not familiar with addMessage; should there have been a popup or something? You mentioned printing the mxd path but I don't see where the results of addMessage should be added to my csv file or printed in IDLE?

I've checked the .mxd's that aren't getting looped through; they open fine and seem to be "valid" so far as I can tell.

StacyRendall1 · ‎08-07-2013

Hey. This is kind of a long answer; the first part attempts to answer your questions, and the second part gives some general advice about what I think you could do to work out the problem and (hopefully) fix it.

AddMessage is kind of like print, except it will also work if you are running your code as a script tool within ArcMap (it will print to the little progress window). In your case it should simply print to the console output in IDLE, just like print "Script Completed" does. If AddMessage is not working, you could try just making that line print mxd.

I didn't do a great job of describing what this will do, sorry. Having the print (or AddMessage) statement just before the crash will let you check which mxd is causing the crash. So your code will run, it will print each mxd that it analyses, then it will crash. The last mxd listed before the crash is the one potentially causing the problem. Then you can open it in ArcMap and do some other investigation on that particular file...

My guess is still that at some point this statement mxd = arcpy.mapping.MapDocument(fullPath) is returning something bad, and this is causing the next functional statement (arcpy.mapping.ListBrokenDataSources(mxd)) to fail.
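One way to keep a single bad document from stopping the whole run is to wrap the two arcpy calls in a try/except and record the path that failed. The sketch below is only an illustration: scan_documents and its parameters are hypothetical names, and the two arcpy functions are passed in as arguments (in the real script they would presumably be arcpy.mapping.MapDocument and arcpy.mapping.ListBrokenDataSources) so the logic can be tried without ArcGIS installed.

```python
def scan_documents(paths, open_doc, list_broken):
    """Return (results, failures) for the given document paths.

    open_doc and list_broken are stand-ins (assumptions) for
    arcpy.mapping.MapDocument and arcpy.mapping.ListBrokenDataSources.
    """
    results = []   # (path, list of broken items) for documents that were read
    failures = []  # (path, error text) for documents that raised
    for doc_path in paths:
        try:
            doc = open_doc(doc_path)
            results.append((doc_path, list(list_broken(doc))))
        except Exception as err:  # e.g. the AttributeError in the traceback
            failures.append((doc_path, "%s: %s" % (type(err).__name__, err)))
    return results, failures
```

After the run, the failures list names the files worth opening by hand in ArcMap, and the remaining documents are still processed.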

Lauren's answer doesn't change anything from before the crash (apart from setting up the CSV), so it is interesting that it causes a crash on a different mxd. Does your original code always crash at the same place?

In a situation like this I would always recommend doing the simplest possible thing first, then isolate and fix any errors, then add more complexity once it works. For all we know the crash could be due to some strange conflict between Arc and the open file object (f), but to find this out we need to clearly build the code up step by step. So, perhaps just print to screen for now, get rid of everything related to the CSV and everything after your crash:

import arcpy, os
path = r"H:\Plans\GIS Plans\2003"
for root, dirs, files in os.walk(path):
    for fileName in files:
        basename, extension = os.path.splitext(fileName)
        if extension == ".mxd":
            fullPath = os.path.join(root, fileName)
            mxd = arcpy.mapping.MapDocument(fullPath)
            print mxd
            brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)

print "Script Completed"

This should cause the crash, and the last printed mxd name should show the mxd causing the error. Once this is resolved, add the next layer of complexity, printing to the screen:

import arcpy, os
path = r"H:\Plans\GIS Plans\2003"
for root, dirs, files in os.walk(path):
    for fileName in files:
        basename, extension = os.path.splitext(fileName)
        if extension == ".mxd":
            fullPath = os.path.join(root, fileName)
            mxd = arcpy.mapping.MapDocument(fullPath)
            print mxd
            brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
            for brknItem in brknMXD:
                print brknItem.name

print "Script Completed"

You will want to make sure that you have some mxds in your test set that you have designed with broken data sources, to ensure your code works for those.

If this runs OK, add the dataSource or workspacePath check (I don't know what either of these does, sorry...), printing them out. If everything checks out OK printing to screen, finally add the CSV output.

I also noticed that this line in your original post doesn't do anything: lyrList = arcpy.mapping.ListLayers(mxd). I would recommend removing it if it is unused; at best it will slow things down, and it may cause other problems too.

Hope this helps!

LaurenYee · ‎08-08-2013

Yes, in my code where I check datasources I've had more success using workspacePath... I wish I could tell you WHY that was, I'll try and find a resource. I think it may have been because most of the paths I needed to replace referenced SDE databases(?)

Anyway, I would follow the advice on this thread and start with a smaller script to cycle through your mxds and see which one is the culprit.

How many mxds are you cycling through? Also, will you be replacing these data sources with something new?

Edit: I believe this is why I used workspacePath instead. If you were to replace one path with another, it is easier to use workspacePath!
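For what it's worth, the arcpy.mapping Layer documentation describes dataSource as the complete path to the dataset (workspace plus dataset name), while workspacePath is only the folder, geodatabase, or connection file that holds it. A rough sketch of reading whichever of these properties an item supports follows. describe_source and FakeLayer are made-up names, the example paths are invented, and FakeLayer only mimics the supports() method so the example runs without ArcGIS.

```python
def describe_source(item):
    """Return a dict of the source properties that `item` supports."""
    info = {}
    for prop in ("dataSource", "workspacePath", "datasetName"):
        # supports() guards against layer types that lack the property
        if item.supports(prop):
            info[prop] = getattr(item, prop)
    return info


class FakeLayer(object):
    """Hypothetical stand-in for an arcpy.mapping Layer (illustration only)."""

    def __init__(self, **props):
        self.__dict__.update(props)

    def supports(self, prop):
        return prop in self.__dict__


# Invented example values: a feature class inside a file geodatabase
layer = FakeLayer(dataSource=r"H:\Data\Base.gdb\Roads",
                  workspacePath=r"H:\Data\Base.gdb",
                  datasetName="Roads")
group = FakeLayer(name="Group layer")  # supports none of the three
```

When a whole folder or geodatabase has moved, the workspace part is the piece that needs replacing, which may be why workspacePath is the more convenient property to report.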

LauraMiles1 · ‎08-08-2013

Hi Stacy, thanks for your detailed response. Something very odd is happening indeed, because I noticed after I get the error message and the script seems to be just hanging there (tried waiting 1/2 an hour and saw no additions to my csv file), I close the script and after I've closed it there are several more .mxd's that appear in the csv file. I'm wondering if that's the reason it's not "stopping" on the same .mxd every time - it looks to me like it's crashed, but it's actually still running? I'm going to try running it overnight and see what's there in the morning.

I had the script running with no errors before adding in the dataSource or workspacePath. It's definitely something about that which is causing problems. I'll go through and simplify again, as per your suggestions, and see if I can narrow it down to any particular .mxd. The addMessage didn't print anything to the idle window. I also noticed I had that unused lyr line in there - forgot to remove that.

Some data types (e.g. annotation feature classes) won't return a dataSource, and apparently an error will be thrown if you don't test for support first.

Lauren, thanks for the tip on workspacePath - I'll stick with this then. Eventually I want to repair the broken links, I'm trying to find out what/where they are first so I can figure out the paths to where the data has moved to. Our folder system was totally reorganized a few years back and I want to archive all these old .mxd's by creating map packages. Do you know if workspacePath is also better to use on layer files than dataSource? I have pretty much the same script which runs on all the layer files in a directory, not getting any errors with that one.

UPDATE: I managed to isolate an .mxd, and when I checked into it, there was no data in it. No layers, just an empty data frame. This doesn't explain why sometimes the script would go past this point, or stop several .mxd's before this point? One time it even ran through every .mxd in the directory, though it still showed the error. Anyway I've deleted the offending .mxd and will try and find a way to identify an .mxd with no layers in it. I'd like to print this to my csv file as well. Thanks all for your help!
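A possible way to get empty documents into the csv file is to check the layer list before looking at the broken sources. The sketch below assumes the layers come from arcpy.mapping.ListLayers(mxd) and that this call succeeds on an empty document, which nothing in this thread confirms. rows_for_document and the "NO LAYERS" marker are hypothetical.

```python
def rows_for_document(full_path, layers, broken_items):
    """Build CSV rows (as lists) for one map document.

    layers and broken_items are assumed to come from
    arcpy.mapping.ListLayers and arcpy.mapping.ListBrokenDataSources.
    """
    if not layers:
        # No layers at all: emit one marker row instead of nothing
        return [["MXD", full_path, "", "NO LAYERS"]]
    rows = []
    for item in broken_items:
        # Only read dataSource when the layer type supports it
        source = item.dataSource if item.supports("dataSource") else ""
        rows.append(["MXD", full_path, item.name, source])
    return rows
```

If the listing call itself raises on the empty document, the same marker row could be written from an except block instead.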

lelaharrington · ‎04-22-2015

Hello, I have been struggling with the same problem as you. I have an H: drive that has been moved around so much that I need to find all the mxds with broken data. I will then look at how efficient it is to re-source the data or, if the map is too old and I can't find the sources, do something else.

Can you share your final working code, if you don't mind?

LauraMiles1 · ‎04-22-2015

Hi lela, I haven't looked at this in a long time but I do think it was working. I never made it to actually replacing the broken data sources as other things took priority but this should at least list which items have broken sources. I have two different scripts depending on data source type and never got to combining them but here they are. Should be pretty simple to add the table views one in if it's needed. I think it was hanging up before I put the test in to see if the data type is able to return a dataSource value, after I did that it worked fine.

List Broken .mxd and .lyr sources:

import arcpy, os
#Update the following path with the folder you want to inspect
path = r"H:\Plans\GIS Plans\2004\G2004-045"
#Update the .csv file name below
f = open('2004_Test.csv', 'w')
f.write("Type, File Path, Layer, Broken Path" + "\n")
for root, dirs, files in os.walk(path):
    for fileName in files:
        basename, extension = os.path.splitext(fileName)
        # Write the information for all .mxd's with broken data sources
        if extension == ".mxd":
            fullPath = os.path.join(root, fileName)
            mxd = arcpy.mapping.MapDocument(fullPath)
            brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
            for brknItem in brknMXD:
                f.write("MXD, " + fullPath + ", " + brknItem.name)
                # Test to see if the data type is able to return a dataSource value
                if brknItem.supports("dataSource"):
                    f.write(", " + brknItem.dataSource + "\n")
                else:
                    f.write("\n")
        # Write the information for all .lyr's with broken data sources
        elif extension == ".lyr":
            fullPath = os.path.join(root, fileName)
            lyr = arcpy.mapping.Layer(fullPath)
            brknLYR = arcpy.mapping.ListBrokenDataSources(lyr)
            for brknItem in brknLYR:
                f.write("LYR, " + fullPath + ", " + brknItem.name)
                # Test to see if the data type is able to return a dataSource value
                if brknItem.supports("dataSource"):
                    f.write(", " + brknItem.dataSource + "\n")
                else:
                    f.write("\n")
f.close()


print "Script Completed"
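Because the rows in the script above are joined by hand with ", ", a layer name or path that itself contains a comma would shift the columns. A minimal sketch using the standard csv module, which quotes such fields, is shown here. write_rows is a hypothetical helper, and the rows are assumed to be lists of four strings. (On Python 2 the output file is normally opened with 'wb' for the csv module; on Python 3 with 'w' and newline=''.)

```python
import csv


def write_rows(out_file, rows):
    """Write a header plus data rows to an open file-like object."""
    writer = csv.writer(out_file)
    writer.writerow(["Type", "File Path", "Layer", "Broken Path"])
    for row in rows:
        writer.writerow(row)  # fields containing commas are quoted
```

The loops would then collect lists such as ["MXD", fullPath, brknItem.name, source] and hand them to write_rows instead of calling f.write directly.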

List broken table views:

import arcpy, os
#Update the following path with the folder you want to inspect
path = r"H:\Plans\GIS Plans\2004\G2004-045"
#Update the .csv file name below
f = open('2004_TestTableView.csv', 'w')
f.write("Type, File Path, Layer, Broken Path" + "\n")
for root, dirs, files in os.walk(path):
    for fileName in files:
        basename, extension = os.path.splitext(fileName)
        # Write the information for all .mxd's with broken data sources
        if extension == ".mxd":
            fullPath = os.path.join(root, fileName)
            mxd = arcpy.mapping.MapDocument(fullPath)
            brknMXD = arcpy.mapping.ListBrokenDataSources(mxd)
            for brknItem in brknMXD:
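                # NOTE: extension is always ".mxd" inside this branch, so this
                # test never passes and nothing is written as the code stands.
                if extension == ".dbf":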
                    f.write("MXD, " + fullPath + ", " + brknItem.name + ", " + brknItem.dataSource + "\n")


f.close()


print "Script Completed"
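The extension test inside the loop of the table view script compares the .mxd file's own extension with ".dbf", so it can never be true. If the goal is to list broken stand-alone tables, one option (a sketch, not tested against ArcGIS here) is to go through arcpy.mapping.ListTableViews(mxd) and keep the views whose isBroken property is true. broken_table_rows is a hypothetical helper, and the table views are passed in so the logic can be run without arcpy.

```python
def broken_table_rows(full_path, table_views):
    """Return CSV rows (as lists) for the table views flagged as broken.

    table_views is assumed to be the list returned by
    arcpy.mapping.ListTableViews(mxd).
    """
    rows = []
    for view in table_views:
        if view.isBroken:
            rows.append(["TABLE", full_path, view.name, view.dataSource])
    return rows
```

Each returned row has the same four columns as the other script, so the two outputs could share one csv file.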