Python Tips and Tricks in ArcGIS

TravisOrmsby · 06-13-2024

Check out the Python Tips and Tricks for ArcGIS training seminar where @DerekNelson1 and I showcase dozens of strategies for helping you automate repetitive tasks and complex workflows.

Look around for existing solutions before starting from scratch.

Chances are very good that somebody else has had a similar problem and created a Python module that you can use to help solve the issue.

ArcPy is for the types of tasks you would do in ArcGIS Pro.

If you are looking to automate geoprocessing tasks or work with local files and databases, the ArcPy library has Python functions that replicate the things you can do manually in the graphical user interface.

The ArcGIS API for Python is for the types of tasks you would do in your portal.

Automating the management of users, groups, and content in your ArcGIS Online or ArcGIS Enterprise portal is a great use case for the ArcGIS API for Python. The Python API also has tools for creating a Spatially Enabled DataFrame from either local data or feature services. This is an extension of a pandas DataFrame, so it comes with all the power of pandas tabular data manipulation alongside spatial properties and methods for handling geographic data.

Notebooks are great for teaching, sharing, and storytelling with code.

The ability to chunk code into cells, add markdown to explain the code, and include the output in the notebook makes notebooks powerful tools for iterating on your code and for helping others understand the context and importance of the work represented in the notebook.

An IDE has powerful tools to make it easier to write code.

An integrated development environment like PyCharm or VS Code includes sophisticated debugging tools, error and warning detection, automated linting, and helpful code completion tools to make it easier to write lots of code. It is even becoming common for IDEs to integrate AI code assistants to further accelerate development.

Managing a Python environment is hard. Expect to run into challenges.

Every module in a Python environment needs to be compatible with every other module in that environment. It can be easy to accidentally break your environment in subtle ways.

Use the ArcGIS Pro default Python environment to minimize challenges.

Fortunately, the arcgispro-py3 environment that comes with ArcGIS Pro has a set of packages curated by Esri developers that has the libraries you will need for the vast majority of GIS tasks. Because you cannot alter this default environment, you will always have at least one working Python environment if you are working with ArcGIS Pro.

Clone the default environment to create a custom environment.

For those cases where you need additional libraries that are not included with arcgispro-py3, you can clone the default environment and add those libraries from the Package Manager in the Project tab of ArcGIS Pro. You have access to thousands of Python libraries through conda that you can install from the interface.

Use conda for advanced environment management. Avoid pip if possible.

Some advanced tasks require more complex environment management than can be done through the ArcGIS Pro interface. You can access conda from the command prompt for those tasks, such as specifying a particular channel from which to get a package, or exporting your environment as a text file. While it is possible to install packages with pip, you run the risk of installing incompatible packages, because pip does not fully solve for all dependencies in your environment. Sometimes you need to use pip if a package you require is not available in any conda channel, but it is better to avoid using it if possible.

Be aware of package/library dependencies and licenses listed in Package Manager.

The Package Manager will help you identify dependencies for all your installed libraries. It will also link to the license for any packages available for installation, so you can be sure the license requirements are compatible with your use case.

Send to Python window/Notebook.

You can drag and drop a tool from the Geoprocessing pane into the Python window to get the correct syntax for the tool. In recent versions of ArcGIS Pro, you can also drag it into an ArcGIS Notebook.

Leverage existing resources like ModelBuilder models.

If you already have a model that performs a task, you do not need to rewrite it from scratch to use in a Python script. You can call the model as a geoprocessing tool in your script. You also have the option to have ArcGIS Pro export your model as Python code.

Verify that files and feature classes exist before executing code on them.

The Exists function tests whether a particular data object actually exists. If it does not, you can skip the code that depends on it. Returning early in this way avoids errors and wasted processing time.
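The early-return pattern might look like the following minimal sketch. The dataset path and the `summarize` function are hypothetical. The fallback to `os.path.exists` is an assumption added only so the sketch can run outside an ArcGIS Pro environment; unlike `arcpy.Exists`, it only understands file-system paths and cannot see inside a geodatabase.

```python
import os

try:
    import arcpy
    dataset_exists = arcpy.Exists      # understands feature classes, tables, layers
except ImportError:
    dataset_exists = os.path.exists    # assumption: file-system check as a stand-in


def summarize(dataset):
    """Hypothetical processing step; returns None when the input is missing."""
    if not dataset_exists(dataset):
        print(f"Skipping {dataset}: not found")
        return None                    # early return, nothing below runs
    print(f"Processing {dataset}")
    return dataset


# Hypothetical path that is not expected to exist
result = summarize(r"C:\Data\does_not_exist.gdb\Roads")
```

Putting the check at the top of the function keeps the rest of the code free of nested `if` blocks.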

Verify rows/values exist before executing.

Even if a data object exists, it may have empty data. You can use the GetCount function to identify whether a table is empty and skip code execution if so.

Parameterize model elements to allow for user input.

Models with parameters are easier to incorporate into a script, because you can programmatically insert different input values for its parameters.

Copy Python Command from the graphical user interface in ArcGIS Pro.

When you open a tool in the Geoprocessing pane, there is a little up arrow next to the Run button. One of the options is Copy Python Command. This copies a Python snippet that reproduces the tool call with the parameter values you have populated in the user interface. This is our most impactful tip for users who are familiar with doing geoprocessing tasks in ArcGIS Pro, but new to Python. You can see the correspondence between the parameters you chose in the familiar interface and the values that are passed as arguments to the Python function. In addition to getting a copy of the Python command from the Geoprocessing pane, you can also right-click any item in your Geoprocessing History and get the same snippet.

Use Copy Path in the ArcGIS Pro UI.

File paths are painful to type, and it is easy to make a mistake. In the Catalog pane in ArcGIS Pro, you can right-click a data object and Copy Path so you do not have to type it.

Regularly check that your code is returning expected results while you build.

You do not want to spend hours writing code without checking to see if it actually works. The more code you write between tests, the more code you have to check for bugs when it inevitably fails. Frequent checks can help you identify problematic code early.

Seaborn is great for charts/graphs.

Graphic visualizations can help you (and other people) understand your data better. Seaborn is a good library for creating those visualizations, and when you run it in a notebook, you can see the output directly with your code.

import seaborn as sns

# Dictionary of values; the count variables are assumed to be
# defined earlier in the notebook
cryptid_data = {
    "Lochness Monster": LochnessMonsterCount,
    "Chupacabra": ChupacabraCount,
    "Bigfoot": BigfootCount
}

# Bar chart from the dictionary
cryptid_plot = sns.barplot(
    x=list(cryptid_data.keys()),
    y=list(cryptid_data.values())
)

Never put login credentials in your code.

This is a huge security risk. The ArcGIS API for Python allows you to leverage the login credentials of your environment if you are working in ArcGIS Pro, Notebook Server, or ArcGIS Online Notebook. You can also use profiles, which use your operating system's secrets management tools, or PKI, which uses security certificates. Check out the Working with different authentication schemes documentation for details.

Notebooks have rich output that can include images, hyperlinks, and formatted text.

This is part of what makes a notebook so great for teaching and sharing. The rich output helps people better understand the results of the work you have done in your code.

Use additional parameters to narrow your search in the ArcGIS API for Python.

The search method of the ContentManager can search on categories or item_type or by a query on a number of fields. You can see some example queries in this knowledge base article.

Assign meaningful variable names to the objects you work with.

Assigning variable names will allow you to make use of values you create. Giving them meaningful names will help you understand what those values mean.

x = 2405

Does not really mean anything.

population = 2405

Provides the relevant context for the value within the script.

Be aware of the specific type of object you are working with.

Different objects have different properties and methods. It is easy to get mixed up with similar types of objects, such as FeatureLayerCollection, FeatureLayer, and FeatureSet. Use type checking to confirm what you are working with, and the documentation to check what you can do with a particular object.
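As a rough illustration of type checking, the sketch below uses built-in Python types as stand-ins, because objects such as FeatureLayer need a portal connection to create. The `describe_object` helper is hypothetical; the `type()`, `isinstance()`, and `dir()` calls it relies on work the same way on any object.

```python
def describe_object(obj):
    """Return the type name and public attribute names of any object."""
    public_names = [name for name in dir(obj) if not name.startswith("_")]
    return type(obj).__name__, public_names


# Stand-ins for objects that are easy to confuse: a collection vs. one item
layers = ["roads", "parcels"]   # a list of layer names
layer = layers[0]               # a single layer name

for candidate in (layers, layer):
    type_name, public_names = describe_object(candidate)
    print(type_name, public_names[:3])

# isinstance() lets code branch safely on the type it actually received
layer_names = layers if isinstance(layers, list) else [layers]
```

Printing the type first often explains an error such as a missing method: the method exists, just not on the kind of object you actually have.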

Search “arcpy <thing you are trying to do>”.

Search engines like Google and Bing will almost always return the correct documentation in the first few search results. Results from Esri Community and Stack Exchange also tend to be valuable resources because they come from people asking questions. There is a very good chance that somebody else has already asked the same question you have, and you can see what worked for them.

The docs are your best friend

Nobody remembers all this stuff. The single most important skill you can cultivate is familiarity with the documentation. Use the sample code to see how different functions and objects work.

Use keyword arguments instead of only positional arguments.

Using keywords when calling functions, instead of just relying on argument order, makes your code more readable. It allows you to skip some optional parameters and makes it more obvious to people who read your code later what the different values are for.
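To make the difference concrete, here is a small sketch. The `buffer_features` function is a hypothetical stand-in for a geoprocessing tool, not a real ArcPy function, and its parameter names are assumptions chosen for illustration.

```python
def buffer_features(in_features, out_features, distance,
                    side="FULL", end_type="ROUND", dissolve="NONE"):
    """Hypothetical stand-in for a geoprocessing tool; returns its settings."""
    return {
        "in_features": in_features,
        "out_features": out_features,
        "distance": distance,
        "side": side,
        "end_type": end_type,
        "dissolve": dissolve,
    }


# Positional only: the reader has to know the parameter order, and the
# optional values in the middle must be filled in just to reach the last one.
positional = buffer_features("roads", "roads_buffer", "100 Meters",
                             "FULL", "ROUND", "ALL")

# Keywords: each value is labeled, and untouched defaults are skipped.
keyword = buffer_features(
    in_features="roads",
    out_features="roads_buffer",
    distance="100 Meters",
    dissolve="ALL",
)

print(positional == keyword)
```

Both calls produce the same settings, but only the second one tells a later reader what "ALL" refers to.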

Utilize help() or ?

Pass an object to the help function to get access to the documentation for that object without having to leave your development environment. That reduces the cognitive cost of context switching, and makes it easier to get back to writing code. In a notebook environment, you can also append a ? to an object to get nicely formatted documentation.
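A quick sketch of what this looks like in practice. The `acres_to_hectares` function is a hypothetical example, included to show that help() also works on functions you write yourself as long as they have a docstring.

```python
def acres_to_hectares(acres):
    """Convert an area in acres to hectares (1 acre = 0.40468564224 ha)."""
    return acres * 0.40468564224


# help() prints the signature and docstring without leaving your editor
help(acres_to_hectares)

# It works the same way on built-in and library objects
help(str.split)
```

In a notebook cell, typing the object name followed by ? shows the same documentation in a formatted panel.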

Use lists and loops to batch process data.

ArcPy has a number of different list functions like ListWorkspaces or ListFiles that automatically generate lists of data objects. Once you have a list, you can loop over all the items in that list to perform some task.

import arcpy

# Set the workspace
arcpy.env.workspace = r"C:\Input_Data"

# List all file geodatabases in that workspace
fgdbs = arcpy.ListWorkspaces(workspace_type="FileGDB")

# Run the model for every file geodatabase in the workspace
# (assumes the toolbox that contains Model1 has already been imported,
# for example with arcpy.ImportToolbox)
for gdb in fgdbs:
    arcpy.PythonTipsandTricksatbx.Model1(Database=gdb)

Schedule periodic tasks

If a process needs to be run every month, week, or even every minute, you can schedule that task. Local tools can be scheduled from the ArcGIS Pro interface. You can also schedule notebooks running in ArcGIS Notebook Server or in ArcGIS Online.

Set environment settings to control how ArcPy geoprocessing tools work

If you need to set a workspace, ensure that outputs can be overwritten, specify an output spatial reference, or control any other geoprocessing environment setting, you can declare those using the env class in ArcPy. Not every tool honors every environment setting, so make sure you check the documentation for the specific tool to see which environments it honors.

Use variables for file paths, don’t hard code them into function parameters.

File paths are long, hard to read, and often don't reflect the purpose or meaning of the data they represent. By assigning them to variable names, you can simplify references to the data at that path, and give the data a name that provides the proper context.
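One possible way to organize path variables is sketched below. The folder, geodatabase, and dataset names are hypothetical. `PureWindowsPath` from the standard library is used here so the sketch behaves the same on any operating system; on a Windows machine, `pathlib.Path` or `os.path.join` would work as well.

```python
from pathlib import PureWindowsPath

# Hypothetical locations; replace with your own project paths
project_folder = PureWindowsPath(r"C:\Projects\Airports")
project_gdb = project_folder / "Airports.gdb"

# Each dataset gets a name that says what it is, not where it lives
airports_fc = str(project_gdb / "Airports")
delays_csv = str(project_folder / "tables" / "delays.csv")

print(airports_fc)
print(delays_csv)
```

If the project moves, only `project_folder` needs to change; every function call that uses `airports_fc` or `delays_csv` stays the same.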

Use the Describe function to get information about a data element.

In the user interface, if you want to find out about a data element, you can usually right-click and select Properties. In a script, you cannot right-click on anything. The Describe function can create an object that holds all the properties of that data element. You can access those properties and use them in your script.

import pandas as pd
import arcpy

# ArcPy tools honor the workspace environment, but pandas does not
# Use Describe to extract the full file path of an element
arcpy.env.workspace = r"C:\Data"
delays = "AA_delays_jan_2023.csv"
desc = arcpy.Describe(delays)
delays_df = pd.read_csv(desc.catalogPath)

Use the pandas library for tabular data manipulation.

Pulling tables or feature classes into a pandas DataFrame lets you use the powerful tools of that library to clean up, summarize, group, and aggregate your data (and more!). The Spatially Enabled DataFrame extension to pandas created by the ArcGIS API for Python lets you combine the standard pandas tools with additional spatial functionality for geographic data.

delays_df.fillna(0, inplace=True)
airport_group = delays_df.groupby(['ORIGIN_AIRPORT_SEQ_ID'])
delays_by_airport_df = airport_group.agg({'WEATHER_DELAY': 'mean'})

Join non-spatial data to spatial data to take advantage of spatial analysis tools.

Datasets that you work with will often be implicitly spatial but not have actual geometry, such as data about a known location or data with addresses instead of coordinates. You can incorporate these datasets into your spatial analysis if you can join them with geometries for those locations with tools like Join, Spatial Join, or Geocoding. If you have a Spatially Enabled DataFrame, you can use the merge function in pandas to join spatial and non-spatial data into a new Spatially Enabled DataFrame.

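# Importing GeoAccessor registers the .spatial accessor on pandas DataFrames
from arcgis.features import GeoAccessor

airports_sdf = pd.DataFrame.spatial.from_featureclass(airports_fc)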
airport_delays_sdf = airports_sdf.merge(
    right=delays_by_airport_df, 
    how='inner', 
    left_on='AIRPORT_SEQ_ID', 
    right_on='ORIGIN_AIRPORT_SEQ_ID'
)

Spatially Enabled DataFrames can be inputs for many geoprocessing tools.

While a DataFrame is not a feature class, you can treat it as a feature class for purposes of many tools. That is useful because you do not have to export the DataFrame to a new format just to do some additional processing.

# Spatially join a DataFrame to a feature class
arcpy.analysis.SpatialJoin(
    airport_delays_sdf, 
    climate_data, 
    delays_by_airport_weather, 
    match_option="CLOSEST"
)

Use variables to make it easier to chain geoprocessing tools together.

In complex analyses, you often want to take the output of one tool and pass it as the input to another tool. Variables make it easier to write flexible code, because you can change the value of the variable without having to alter all the geoprocessing tool code you wrote.

# Spatially join a DataFrame to a feature class
arcpy.analysis.SpatialJoin(
    airport_delays_sdf, 
    climate_data, 
    delays_by_airport_weather, 
    match_option="CLOSEST"
)

# Pass output from Spatial Join as input to GLR
arcpy.GeneralizedLinearRegression_stats(
    in_features=delays_by_airport_weather,
    dependent_variable="WEATHER_DELAY",
    explanatory_variables=["Mean_T_f_01_Jan", "Mean_mmPr_01_Jan"],
    model_type="CONTINUOUS",
    output_features=delay_climate_glr
)

Use variables to make it easier to iterate / experiment with parameter values.

Often with a complex task, you will want to try different values for some of the parameters. Or you will want to make those parameters user-defined so the script can handle different users' input. Hard coding the values into the parameters makes it harder to do that. Assigning variables to those values and moving them out of the function call makes it easier.

# Move values out of the function call
dependent_variable = "WEATHER_DELAY"
explanatory_variables = [
    "Mean_T_f_01_Jan",  
    "Mean_mmPr_01_Jan" 
]

arcpy.GeneralizedLinearRegression_stats(
    in_features=delays_by_airport_weather,
    dependent_variable=dependent_variable,
    explanatory_variables=explanatory_variables,
    model_type="CONTINUOUS",
    output_features=delay_climate_glr
)

Learn the productivity shortcuts of your development environment

You are probably familiar with tools like Ctrl+C to copy or Ctrl+Z to undo. Different development environments also have their own keyboard shortcuts and macros that enable you to be more productive. It can take some time to learn these shortcuts, but the more code you write, the more time you will save in the end by learning them. In a notebook environment, you can click on the command palette on the right side of the toolbar to see all the shortcuts.

When you get a frustrating error, walk away for a bit.

By far the best debugging strategy is a break. Take a walk. Get a drink of water. Work on something else. Go to bed. When you come back to the problem, you will often have a different perspective that allows you to break through the issue. If you are getting angry or frustrated, you know it is time to take a break.

Get a buddy to help you.

The second most valuable debugging strategy is to ask for help. Spend some time trying to solve the problem on your own first, but do not keep the issue to yourself forever. Sometimes a buddy can spot a problem very quickly that you did not see. And when your buddy has a problem, they can come to you for help.

Start at the bottom of the error message.

An error message is not a novel that you read from beginning to end for fun. It is a mystery novel: jump to the end to find out the culprit. The last few lines in an error message will likely be the most useful.
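The sketch below shows where the culprit sits in a Python traceback. The `parse_population` function is a hypothetical example; the `traceback` module is part of the standard library.

```python
import traceback


def parse_population(value):
    """Convert a text value to an integer population count."""
    return int(value)


try:
    parse_population("2,405")      # the comma makes this an invalid integer
except ValueError:
    message_lines = traceback.format_exc().strip().splitlines()
    culprit = message_lines[-1]    # exception type and description
    print(culprit)
```

The lines above the last one trace the path the code took to reach the failure, which becomes useful once the final line has told you what went wrong.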

Success in debugging means getting a different error than you did before.

This is a key tip for managing your emotional state during a debugging session. If you conceptualize success as getting your code perfect, you will never win. Because code that does something interesting is always imperfect. But if you think about success as incrementally moving towards perfection, you will experience many wins along the way.

Use print statements to check values and datatypes.

Python variables can change values and even datatypes. That makes it extremely flexible, but that flexibility is also the source of many errors. The simplest way to check if a variable is the correct value or type is to print it out.

# Print the value of the csv_list to make sure it is correct
arcpy.env.workspace = r"C:\LTS"
csv_list = arcpy.ListFiles("*.csv")
print(csv_list)

Incorrect file paths are a frequent source of error.

Copy/pasting file paths can help avoid problems associated with mistyping file paths. But you still might copy the wrong path. Or somebody else using your script may not have the data on the same path you hard-coded into the script. If there is a problem accessing or loading data, the file path should be among the very first things you check.
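One way to make path problems obvious is to check inputs up front and fail with a clear message. This is a minimal sketch using only the standard library; the `check_input` helper and the file names are hypothetical, and a temporary folder is used so the example runs anywhere. For data inside a geodatabase, a file-system check like this is not enough and the ArcPy Exists function is the better fit.

```python
from pathlib import Path
import tempfile


def check_input(path):
    """Return the path if it exists; otherwise raise a clear error."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Input not found: {path}")
    return path


# Demonstrate with a temporary file so the sketch runs anywhere
with tempfile.TemporaryDirectory() as folder:
    good_path = Path(folder) / "delays.csv"
    good_path.write_text("ORIGIN,WEATHER_DELAY\n")
    found = check_input(good_path).name

    try:
        check_input(Path(folder) / "delay.csv")   # typo in the file name
        missing_message = ""
    except FileNotFoundError as error:
        missing_message = str(error)

print(found)
print(missing_message)
```

Because the message includes the full path that was tried, a wrong folder or a one-letter typo is usually visible at a glance.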

Typos are another frequent source of error.

A funny thing about the human brain is that you know what you meant to type, so you see what you meant to type, not what you actually typed. File paths, long function names, or special characters are common sources of typos. Finding typos is a really good use case for having a buddy help you, because they only see what you actually typed.

Code completion can help avoid typos.

An IDE has sophisticated code completion tools that can help you find the right function or parameter name. It not only saves you time to make heavy use of auto-complete, it also dramatically reduces the risk of typing the wrong value. These code completion tools are one of the first things you should practice to make full use of your development environment.

Python complements your subject matter expertise. It does not replace it.

Perfect code that runs without error and solves the wrong problem is worthless. Your subject matter expertise in climatology, demography, transportation, or any other domain that uses spatial data matters immensely. Python is an extra tool you can use to multiply the value of that expertise.

Additional resources

For further Python training, check out Esri's Creating Python Scripts for ArcGIS course: https://go.esri.com/python-course.
Ask and answer questions in the Esri Communities for Python/ArcPy and the ArcGIS API for Python