Detect broken links with arcpy?

03-07-2014 10:59 AM
TracyDash
New Contributor III
Hi,

I adapted some code I found online to search for broken (moved or renamed) hyperlinks in a shapefile. It should write all broken links to a text file, but instead it is writing ALL the hyperlinks to the text file. Any advice on what to change? I know extremely little about Python.

Also, I have ArcMap 10.1.

Thank you!

import arcpy
import os

fc = r"C:\Users\GIS\Documents\TEST 2-25\project_test.shp"
fields = ["PROJECT_DW"]

with arcpy.da.SearchCursor(fc, fields) as cursor:
    for row in cursor:
        Roll = str(row[0])
        if os.path.exists(Roll):
            pass
        else:
            f = open(r"C:\Users\GIS\Documents\brokenlinks.txt","a")
            f.write(Roll + os.linesep)
            f.close()
JamesCrandall
MVP Frequent Contributor
In other words, it is evaluating everything as "broken". 

This may seem like a silly question but is "PROJECT_DW" the correct field that contains the paths?

Also: would the evaluation fail if there are leading or trailing blank spaces in your "Roll" string variable? Maybe you should strip them to rule out that possibility:

Roll = str(row[0])
Roll = Roll.strip()
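
A minimal sketch of the strip-then-check idea above, written so it runs without arcpy. The list `sample` is a hypothetical stand-in for the values the SearchCursor would yield, and `find_broken_links` is just an illustrative name, not an arcpy function.

```python
import os

def find_broken_links(paths, exists=os.path.exists):
    """Return the trimmed paths for which `exists` reports False."""
    broken = []
    for raw in paths:
        # Field values may be None or padded with spaces, so clean them first.
        cleaned = "" if raw is None else str(raw).strip()
        if not cleaned:
            continue  # nothing stored in the field, so there is no link to test
        if not exists(cleaned):
            broken.append(cleaned)
    return broken

# Stand-in values; in the real script these would come from row[0].
sample = [os.getcwd(), "  " + os.getcwd() + "  ", None, r"F:\no\such\folder\file.pdf"]
print(find_broken_links(sample))
```

Collecting the broken paths in a list first also means the output file only needs to be opened once, after the loop.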
JoshuaChisholm
Occasional Contributor III
Code looks (really) good to me. A few questions:

  1. Are the hyperlinks to a drive you are now connected to?

  2. Are the hyperlinks pointing to the internet? If so, try this code:

import arcpy
import os
from urllib2 import urlopen  # Python 2, as shipped with ArcMap 10.x

fc = r"C:\Users\GIS\Documents\TEST 2-25\project_test.shp"
fields = ["PROJECT_DW"]

with arcpy.da.SearchCursor(fc, fields) as cursor:
    for row in cursor:
        Roll = str(row[0])
        try:
            # Any failure to open the URL is treated as a broken link
            urlopen(Roll)
        except Exception:
            f = open(r"C:\Users\GIS\Documents\brokenlinks.txt", "a")
            # Text mode already translates "\n" to the platform line ending
            f.write(Roll + "\n")
            f.close()

TracyDash
New Contributor III
jamesfreddyc:

That's a good thought. I was just thinking that 'pass' wasn't working.

That's not a silly question, but yes, it is the correct field.

And thanks for the idea. I tried it, but eliminating spaces didn't make a difference...


hua17:

1. Yes, the hyperlinks are on a drive I'm connected to. The drive can be a bit flaky at times (well, just when the computer falls asleep), but I think it should be OK...

2. No.
JamesCrandall
MVP Frequent Contributor
Can you post an example of how the hyperlink looks? Maybe it is a character evaluation thing that isn't working as expected.

Also, just for kicks, print out "Roll" to verify it:

import arcpy
import os

fc = r"C:\Users\GIS\Documents\TEST 2-25\project_test.shp"
fields = ["PROJECT_DW"]

with arcpy.da.SearchCursor(fc, fields) as cursor:
    for row in cursor:
        Roll = str(row[0])

        # Print out the link to verify it.
        # Try to print it like this first:
        print "{0}".format(row[0])

        # Try to print it like this too:
        print str(Roll)

        if os.path.exists(Roll):
            pass
        else:
            f = open(r"C:\Users\GIS\Documents\brokenlinks.txt", "a")
            # Text mode already translates "\n" to the platform line ending
            f.write(Roll + "\n")
            f.close()
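
A plain print can hide trailing spaces, tabs, or line breaks in a value. As a rough sketch of a more revealing check, `repr` plus the length shows them explicitly. The helper name `describe` and the sample values are hypothetical, and the value is assumed to be a string.

```python
def describe(value):
    """Format a string so that invisible characters and its length are visible."""
    return "{0!r} (length {1})".format(value, len(value))

# Hypothetical values standing in for row[0]
print(describe("F:\\LP04\\COPP DUNN PLANTATION"))
print(describe("F:\\LP04\\COPP DUNN PLANTATION \n"))
```

If the two lines of output differ only by trailing characters inside the quotes, the stored value has padding that os.path.exists would treat as part of the name.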

TracyDash
New Contributor III
Sure, here are examples of types of hyperlinks we will search through:

folders: F:\LP04\COPP DUNN PLANTATION
pdfs: F:\MSL\EMAIL SURVEYS\STANDARD PACIFIC\DUNN PLANTATION\L65 GBBF 10-30-03.PDF
dwgs: F:\LP04\COPP DUNN PLANTATION\DWG\116-03-295 LT65.DWG

If spaces are an issue, then I'm pretty much screwed. There is way too much data to change all the names.

Tried the print function. Again it displayed ALL the hyperlinks instead of just the broken ones (I entered some fake links for testing).
JamesCrandall
MVP Frequent Contributor
I tested for spaces in the filename and os.path.exists picked it up just fine. I am wondering whether the path string is being formed correctly, though.

My best guess is that the backslashes in the path are not being treated literally in the evaluation. For example, this would evaluate as "else":

roll = 'H:\exists\broken\dfBA D.csv'
if os.path.exists(roll):
    print "roll exists: " + roll

else:
    print "roll failed"



However, this would evaluate as "exists":

roll = r'H:\exists\broken\dfBA D.csv'
if os.path.exists(roll):
    print "roll exists: " + roll

else:
    print "roll failed"


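The difference is that the "roll" variable is set with an "r" in front of the string, which makes it a raw string literal, so sequences such as "\b" are kept as a backslash plus a letter instead of being turned into control characters.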
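
To see what the "r" prefix changes, here is a small sketch using a made-up path in which every backslash happens to start a valid escape sequence. Note that this is about literals typed into a script. Whether it has any bearing on values read from the attribute table is a separate question.

```python
# Hypothetical path, chosen so that \t, \n and \b are all escape sequences
plain = 'C:\temp\new\book.csv'         # escapes become tab, newline, backspace
raw = r'C:\temp\new\book.csv'          # raw literal: backslashes kept as typed
escaped = 'C:\\temp\\new\\book.csv'    # doubled backslashes: same text as raw

print(repr(plain))
print(repr(raw))
print(raw == escaped)
print(len(plain))
print(len(raw))
```

The plain literal comes out three characters shorter than the raw one, because each backslash-letter pair collapsed into a single control character.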
JamesCrandall
MVP Frequent Contributor
See if this works: replace "\" with "\\" when forming your path string:


import arcpy
import os

fc = r" C:\Users\GIS\Documents\TEST 2-25\project_test.shp"
fields = ["PROJECT_DW"]

with arcpy.da.SearchCursor (fc,fields) as cursor:
    for row in cursor:
        Roll = str(row[0])
        Roll = Roll.replace("\\", "\\\\")
        if os.path.exists(Roll):
            pass
        else:
            f = open(r"C:\Users\GIS\Documents\brokenlinks.txt","a")
            f.write(Roll + os.linesep)
            f.close()




This is also a valid way to deal with it:

roll = roll.replace('\\', r'/')
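
As an alternative to hand-written replace calls, the standard library can normalize separators. This is only a sketch: it uses `ntpath` (the Windows flavour of `os.path`, importable on any platform) so that it behaves the same wherever it is run, and `normalize_windows_path` is an illustrative name. On a Windows machine, `os.path.normpath` should do the same job.

```python
import ntpath  # Windows path rules, available on every platform

def normalize_windows_path(path):
    """Trim whitespace, collapse repeated separators, and use backslashes."""
    return ntpath.normpath(path.strip())

# Forward slashes and doubled backslashes both come back in one canonical form
print(normalize_windows_path("F:/LP04/COPP DUNN PLANTATION"))
print(normalize_windows_path("F:\\\\LP04\\\\COPP DUNN PLANTATION"))
```

Normalizing only changes how the path is spelled. It does not make a missing file exist, so it helps mainly when the stored links mix separator styles.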
JoshuaChisholm
Occasional Contributor III
I think James might have found the problem. If so, it's a pretty similar problem to this forum page.

I'm not sure if the replace functions will work. As soon as Python reads in the string from the cursor (Roll = str(row[0])), it sees any \'s as special characters. For example, it sees "Path\to\newfolder" as "Path[tab character]o[new line character]ewfolder". I don't think the replace will work, because it no longer thinks \ is in the string.

One (lame) workaround might be to create a new field in the attribute table (maybe called "PROJECT_DW2") and replace "\" with "\\" (using VB).
Something like this: [attachment 32082, not included here]

Then run your original script with "PROJECT_DW2" instead of the original path.
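
One way to test the concern above, that backslashes are lost once a string is read in at run time, is to round-trip a path through a file and inspect it. This sketch uses a temporary text file as a stand-in for the cursor, on the assumption that arcpy hands back field values as ordinary strings. The path itself is made up.

```python
import os
import tempfile

stored = r"F:\LP04\new folder\test.dwg"  # hypothetical attribute value

# Write the value out and read it back, so it arrives as run-time data
# rather than as a literal typed into the script.
handle, tmp_name = tempfile.mkstemp(suffix=".txt")
os.close(handle)
with open(tmp_name, "w") as out:
    out.write(stored)
with open(tmp_name) as src:
    read_back = src.read()
os.remove(tmp_name)

print(repr(read_back))
print(read_back == stored)
print(read_back.count("\\"))
```

In this sketch the backslashes survive the round trip, which suggests escape processing happens only for literals in the source code and that a replace on run-time data would still find them.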
TracyDash
New Contributor III
That is a great thought. I had no idea that could be an issue. Unfortunately, I did what Joshua said and still ALL the hyperlinks showed up.

Is it possible to blame my occasionally flaky network? I usually don't have problems during the day, just when my computer starts up or wakes up from sleeping. Or could it be some other sort of path issue?