
How to validate Hyperlinks that point to pdfs in system

02-12-2026 08:34 AM
FSD_Main
Emerging Contributor

Is there a way to validate which hyperlinks work or don't work from a field in the attribute table? I have a field with hyperlinks that direct to PDFs on my system, and I have over 3000 segments. Not sure if there's an easier way to check besides going through every single feature in the attribute table. Thank you.

5 Replies
D_Atkins
Frequent Contributor

If your hyperlinks are actual http:// web server addresses, it should be straightforward using a Python notebook.

You'd need a hint of ArcPy to access the feature table and iterate over each row, then test each hyperlink attribute's server response with a Python library like 'requests'.

Something like this AI boilerplate? 

import arcpy
import requests

# Set input feature class and URL field name
feature_class = r"C:\Path\To\Your.gdb\FeatureClassName"
url_field = "URL_FieldName"

# Use SearchCursor to iterate over the URL field
with arcpy.da.SearchCursor(feature_class, [url_field]) as cursor:
    for row in cursor:
        url = row[0]
        if url:
            try:
                # Send HTTP GET request
                response = requests.get(url, timeout=5)
                if response.status_code == 200:
                    print(f"Active: {url}")
                else:
                    print(f"Broken ({response.status_code}): {url}")
            except requests.exceptions.RequestException as e:
                print(f"Error: {url} - {e}")





AlfredBaldenweck
MVP Frequent Contributor

Similarly, if they're just a file path in a text field (this will work with HTML too, but you'll have to do some work to extract the text you care about):

import arcpy
import os

tbl = r"path\to\table"
fields = ["OID@", "LinkField"]
with arcpy.da.SearchCursor(tbl, fields) as cursor:
    for row in cursor:
        # Check if file exists
        if not os.path.exists(row[1]):
            # Print the entire row (Object ID and link)
            arcpy.AddMessage(row)
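As a side note on the HTML case mentioned above, a minimal sketch of pulling the path out of an `<a href="...">` tag before testing it, the field value and path here are made up for illustration:

```python
import re

# Hypothetical value of a text field that stores an HTML link
# (the tag and path are invented examples)
cell = '<a href="T:\\reports\\A001.pdf">A001 report</a>'

# Pull the href value out of the tag; fall back to the raw text if no tag found
match = re.search(r'href="([^"]+)"', cell)
path = match.group(1) if match else cell
print(path)
```

You'd run that extraction on each row's field value, then feed `path` to the `os.path.exists()` check in the cursor loop above.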
FSD_Main
Emerging Contributor

Thanks! I'll give this a try though I'm not familiar with ArcPy at all. 

FSD_Main
Emerging Contributor

They are not a web server with a URL, it's just a file path. Here is an example of one: T:\gis_fsdG2\fsdG2_layers\Pipe_Insp_Reports\A001-D119.pdf. And I'm not too experienced with Python or ArcPy... but I'm willing to try and learn. Thank you for the response. 

BobBooth1
Esri Regular Contributor

I'd use Python - you could do it in a Notebook in ArcGIS Pro.

The first step would be to load the unique ID and URLs from the table into a list of tuples:


My data has a key field and a license field (which holds the URL). The idea is to get both values so that, when a URL check fails, you can build a list of the features you need to go back and fix.

import arcpy

# Define the path to your feature class or table
# Example: r"C:\...\your_geodatabase.gdb\your_feature_class"
# Example: r"C:\...\your_shapefile.shp"
feature_class = "M_Monax_NA"

# Define a list with the names of the fields you want to extract values from
field_names = ["key", "license"]

# Use a list comprehension with a SearchCursor to create the list
# row[0] is the first field's value (key) and row[1] is the second field's value (license/URL)
field_values_list = [(row[0], row[1]) for row in arcpy.da.SearchCursor(feature_class, field_names)]

# Print the resulting list of values
print(field_values_list)

 

Once you have the list of tuples of ID and URLs, you can get one to test with.


The notation field_values_list[0] gets the first pair from the list, and the [1] after that gets the URL (which is the second item, after the key value in the tuple).
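To make the indexing concrete, here is the same lookup on a tiny made-up list (the keys and URLs are placeholders, not real data):

```python
# A made-up list shaped like field_values_list: (key, URL) pairs
field_values_list = [
    ("A001", "https://example.com/A001.pdf"),
    ("A002", "https://example.com/A002.pdf"),
]

first_pair = field_values_list[0]  # the first (key, URL) tuple
url = first_pair[1]                # same as field_values_list[0][1]
print(url)
```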

This URL in my data didn't work so well as an example, as it didn't have a file name, so I found the equivalent version of that page as plaintext.


Once you have a URL, you can use the requests library to test if there is a file there.


import requests

def url_file_exists(url: str, timeout: int = 5) -> bool:
    """
    Checks if a file exists and is accessible at a given URL using a HEAD request.
    """
    try:
        # Use HEAD request for efficiency
        response = requests.head(url, timeout=timeout)
        # A status code of 2xx indicates success (e.g., 200 OK)
        return response.status_code // 100 == 2
    except requests.exceptions.RequestException:
        # Catches connection errors, timeouts, etc.
        return False

# Example usage: grab the URL from the first (key, URL) pair
url_to_check = field_values_list[0][1]
if url_file_exists(url_to_check):
    print(f"File exists at {url_to_check}")
else:
    print(f"File does not exist or is not accessible at {url_to_check}")

 

Once you have the url_file_exists function defined, you can run it in a loop on the list and do things conditionally, depending on whether it found a file or not.

Here I'm adding the bad pairs to another list.


# Make an empty list to hold the ones that fail the test
BadList = []
# Loop over the list of key,URL pairs
for testPair in field_values_list:
    # grab the URL out of the pair 
    url_to_check = testPair[1]
    if url_file_exists(url_to_check):
        print(f"File exists at {url_to_check}")
    else:
        # Add that pair to the BadList
        BadList.append(testPair)
        print(f"File does not exist or is not accessible at {url_to_check}")

You could print the BadList, or write it to a text file or a table. Then you know which ones need to be fixed.
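Writing the failures out could look something like this sketch using the standard csv module. `BadList` holds the (key, URL) pairs collected in the loop above; the pairs and the output file name below are made-up examples:

```python
import csv

# BadList as built in the loop above; these pairs are invented for illustration
BadList = [("A001", "https://example.com/missing1.pdf"),
           ("A002", "https://example.com/missing2.pdf")]

# Hypothetical output location -- change to suit
out_path = "broken_links.csv"

# newline="" lets the csv module manage line endings itself
with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["key", "url"])  # header row
    writer.writerows(BadList)
```

The resulting file opens directly in Excel, so you can work through the fixes row by row.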

For large numbers of pairs, this may take a while. Also, firing requests as fast as the loop runs may overwhelm the server (or get you rate-limited), so it might be necessary to add code inside the loop to sleep for a second after each try.

That'll make it take longer, but with less risk of a problem.

import time

# inside the loop, after each request:
time.sleep(1)

 
