Detect broken links with arcpy?

03-07-2014 10:59 AM
TracyDash
New Contributor III
Hi,

I adapted some code I found online to search for broken (moved or renamed) hyperlinks in a shapefile. It should write only the broken links to a text file, but instead it is writing ALL the hyperlinks to the text file. Any advice on what to change? I know extremely little about Python.

Also, I have ArcMap 10.1.

Thank you!

import arcpy
import os

fc = r" C:\Users\GIS\Documents\TEST 2-25\project_test.shp"
fields = ["PROJECT_DW"]

with arcpy.da.SearchCursor (fc,fields) as cursor:
    for row in cursor:
        Roll = str(row[0])
        if os.path.exists(Roll):
            pass
        else:
            f = open(r"C:\Users\GIS\Documents\brokenlinks.txt","a")
            f.write(Roll + os.linesep)
            f.close()
20 Replies
TracyDash
New Contributor III
Forgot to add: I also tried James's idea, and all of the links still showed up.
JamesCrandall
MVP Frequent Contributor
Your issue is related to how Python is handling the escape characters ("\") in your path names. As Joshua pointed out when he corrected my post, you will not be able to overcome the problem by simply applying the string.replace() method to the values, and I have not found any other way to deal with this. For now, you'd have to calculate a new field as Joshua suggested.

Edit: I noticed you said you tried Joshua's suggestion, but I think you may have missed the full solution, as it worked for me. That is, if I replace my test string with this:

Roll = "H:\\exists\\broken\\dfBA D.csv"


It will get correctly evaluated in any test logic:

if os.path.exists(Roll):
    print "roll exists: " + Roll
else:
    print "roll failed"
TracyDash
New Contributor III
Well, I added a new field and recalculated it so that every \ became \\. Still, every single hyperlink showed up in my text file when I ran the original code afterward (and I did change the field name). So I'm not sure where to go from here.
JamesCrandall
MVP Frequent Contributor
Maybe another dumb idea, but make sure you can see the F: drive from where you are executing the Python script. I stay away from mapped drives and use UNC paths for everything file- or directory-related (although UNC paths have their own issue with character limits).

Sorry I cannot help any more on this one, as I am not sure what else I could do.
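
One rough way to test the mapped-drive idea is to group the reported links by drive or UNC share: if every broken link sits on the same drive, the drive itself is probably not visible to the script. The sketch below is only an illustration. count_by_drive and the second and third sample paths are made up, and ntpath is used so that Windows-style paths are parsed the same way on any platform.

```python
import ntpath
from collections import Counter

def count_by_drive(paths):
    """Count paths per drive letter or UNC share (hypothetical helper)."""
    return Counter(ntpath.splitdrive(p)[0].upper() for p in paths)

# sample values standing in for the contents of brokenlinks.txt
broken = [
    r"F:\Lp04\PROV NORTHGLEN\DWG\SHEET 1.DWG",
    r"f:\Lp04\other.dwg",
    r"\\server\share\plans\sheet2.dwg",
]

counts = count_by_drive(broken)
for drive, n in sorted(counts.items()):
    print("{0}: {1} broken link(s)".format(drive, n))
```

If one drive accounts for nearly everything, checking os.path.exists on the drive root from the same Python session is a quick follow-up.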
TracyDash
New Contributor III
Yes, will do. Our server can be strange at times.

But thank you so much for all your help!
JoshuaChisholm
Occasional Contributor III
Ok, two more dumb questions:
1) Have you been clearing or deleting the "C:\Users\GIS\Documents\brokenlinks.txt" file before each run of the script? You are opening it in append mode ("a"), so every run adds its broken links (or whatever this script is spitting out) to the end of whatever the file already contains.

If it were me, I would organize my script a little differently:
import arcpy
import os

fc = r"C:\Users\GIS\Documents\TEST 2-25\project_test.shp"
fields = ["PROJECT_DW2"]

# collect every broken link here, then write the file once at the end
toTxtFile = ""

with arcpy.da.SearchCursor(fc, fields) as cursor:
    for row in cursor:
        Roll = str(row[0])
        if os.path.exists(Roll):
            pass
        else:
            # "\n" is enough: text mode converts it to the platform line ending
            toTxtFile += Roll + "\n"

# write to file; note that "w" will overwrite the file if it already exists
with open(r"C:\Users\GIS\Documents\brokenlinks.txt", "w") as f:
    f.write(toTxtFile)


2) Can you provide a sample (copy and paste) of the kind of links that should be working but are appearing in the brokenlinks.txt file anyway?
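
To make the append-versus-overwrite point in 1) concrete, here is a small sketch that writes to a throwaway file in a temporary folder rather than the real brokenlinks.txt. write_report is a hypothetical helper, not part of the script above.

```python
import os
import tempfile

def write_report(path, lines, mode):
    """Write each entry on its own line; mode is "a" (append) or "w" (overwrite)."""
    with open(path, mode) as f:
        for line in lines:
            f.write(line + "\n")

report = os.path.join(tempfile.mkdtemp(), "brokenlinks.txt")

# two runs in append mode: the first run's output is still there
write_report(report, ["first run"], "a")
write_report(report, ["second run"], "a")
with open(report) as f:
    appended = f.read().splitlines()

# one run in write mode: only the latest output remains
write_report(report, ["third run"], "w")
with open(report) as f:
    overwritten = f.read().splitlines()

print(appended)
print(overwritten)
```

With "a", stale entries from earlier runs can make the file look as if every link is broken, so it is worth ruling out first.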
TracyDash
New Contributor III
1. Yes I have

Also, I tried your script (thanks!!), but it still produced the same result.

2. OK, I attached two files: the first is a piece of the attribute table and the second is a piece of the txt file (if you scroll down in the text file there are thousands more hyperlinks). It's a bit funny because some fields are blank (could this be an issue? I realize now that I should have mentioned that sooner).

Also, I forgot to run the second field I made with two backslashes. It has the same result, however. Sorry.
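
On the blank fields: they are worth ruling out, because os.path.exists returns False for an empty string and str(None) produces the text "None", so empty rows would normally be reported as broken links. A minimal sketch of filtering them out before the check follows. clean_link and the sample values are invented for illustration, and how blanks actually come back from the cursor for this shapefile is an assumption.

```python
import os

def clean_link(value):
    """Return the stripped path, or None when the field holds no link."""
    if value is None:
        return None
    text = str(value).strip()   # drop leading/trailing spaces and newlines
    return text or None

# made-up field values: a padded path, a blank, an empty string, a null
values = ["F:\\Lp04\\SHEET 1.DWG  ", " ", "", None]
links = [clean_link(v) for v in values]

# only real links would go on to the os.path.exists test
to_check = [link for link in links if link is not None]

# an empty path never exists, so unfiltered blanks count as "broken"
empty_is_missing = not os.path.exists("")
```

The strip() call also covers the case where a stored link has trailing spaces, which can make an otherwise valid path fail the check.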
JoshuaChisholm
Occasional Contributor III
Ok. I'm really stumped.

Let's try another few small changes to the script:
import arcpy
import os

fc = r"C:\Users\GIS\Documents\TEST 2-25\project_test.shp"
fields = ["PROJECT_D2"]

# collect every broken link here, then write the file once at the end
toTxtFile = ""

with arcpy.da.SearchCursor(fc, fields) as cursor:
    for row in cursor:
        Roll = str(row[0])
        if not os.path.exists(Roll):
            # "\n" is enough: text mode converts it to the platform line ending
            toTxtFile += Roll + "\n"

# write to file; note that "w" will overwrite the file if it already exists
with open(r"C:\Users\GIS\Documents\brokenlinks.txt", "w") as f:
    f.write(toTxtFile)


Secondly, let's pick out a few hyperlinks from brokenlinks.txt that you are 100% sure actually exist (copy and paste them into Windows Explorer). Then open Python (Start > All Programs > ArcGIS > Python [2.7] > IDLE (Python GUI)), type in import os, and try those selected hyperlinks manually. For example:
os.path.exists(r"F:\Lp04\PROV NORTHGLEN\DWG\SHEET 1.DWG")

Just in case, let's test these two lines too:
os.path.exists(r"F:")
os.path.exists(r"C:")


Let me know the results.
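
If a link tests True when typed by hand but still lands in brokenlinks.txt, it may help to compare the typed string with what the script actually sees. Below is a sketch, assuming the difference is something invisible such as a trailing space. The stored value is invented and describe is a hypothetical helper; in the real script the value would come from row[0].

```python
def describe(value):
    """Show a string with repr() so hidden characters become visible."""
    return "{0!r} (length {1})".format(value, len(value))

typed = r"F:\Lp04\PROV NORTHGLEN\DWG\SHEET 1.DWG"
stored = typed + " "   # invented example: same path plus a trailing space

print(describe(typed))
print(describe(stored))

same = (typed == stored)                      # the extra space makes them differ
same_after_strip = (typed == stored.strip())  # equal once whitespace is removed
```

Note that repr() shows each backslash doubled. That is only how Python displays the string, not extra characters in the data.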
TracyDash
New Contributor III
1. code yielded same results as before

2. tested true (also tried a fake hyperlink which tested false)

3. true
JamesCrandall
MVP Frequent Contributor


2. tested true (also tried a fake hyperlink which tested false)



This is because the r prefix forces a raw string literal. As far as I can tell, the problem is that your file paths/links are stored with single "\" backslashes, and Python is evaluating these as escape characters.

If you were to replace:

os.path.exists(r"F:\Lp04\PROV NORTHGLEN\DWG\SHEET 1.DWG") 


With:

os.path.exists("F:\Lp04\PROV NORTHGLEN\DWG\SHEET 1.DWG")


Then it will evaluate as false.



Additionally, I tested these:

These evaluate as True:

os.path.exists("F:\\Lp04\\PROV NORTHGLEN\\DWG\\SHEET 1.DWG")

os.path.exists("F:/Lp04/PROV NORTHGLEN/DWG/SHEET 1.DWG")



I didn't find any way to apply a .replace method to the stored paths you have, because it will always incorrectly evaluate the second backslash in each stored path and fail. You will have to correct the source data by some other means, as far as I can tell. Although I'd really like to see a Python solution.
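
For what it's worth, backslash escapes are processed only when Python parses a string literal in source code. A string that arrives at run time (from a file or a cursor row) keeps its single backslashes as they are. Here is a small sketch to check this, using a temporary text file as a stand-in for the stored values; the shapefile itself is not read here, which is an assumption of the demo.

```python
import ntpath
import os
import tempfile

raw = r"H:\exists\broken\dfBA D.csv"        # raw literal: backslashes kept as typed
doubled = "H:\\exists\\broken\\dfBA D.csv"  # escaped literal: the same string
literal = "H:\broken"                       # plain literal: "\b" becomes a backspace

# write the path to a text file and read it back, as a stand-in for stored data
tmp = os.path.join(tempfile.mkdtemp(), "links.txt")
with open(tmp, "w") as f:
    f.write(raw + "\n")
with open(tmp) as f:
    stored = f.readline().rstrip("\n")

literals_match = (raw == doubled)       # both spellings give one and the same string
data_unchanged = (stored == raw)        # data read at run time is not escape-processed
literal_damaged = ("\x08" in literal)   # only the typed, non-raw literal was altered

# forward slashes normalize to the same Windows-style path
normalized = ntpath.normpath("H:/exists/broken/dfBA D.csv")
```

If the same holds for the values in the shapefile, no .replace() call or doubled-backslash field should be needed, and the cause is more likely in the values themselves (blanks, stray whitespace) or in whether the drive is visible to the script.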