Python Snippets Blog

cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Latest Activity

(6 Posts)
JeffreyThompson2
MVP Frequent Contributor

I have been struggling for several days trying to convert a legacy Python 2/ArcMap based script to run in the Python 3/ArcPro environment. It is a multi-step, data management script that involves clearing out the old data and putting in the new data. The script kept failing before a call to TruncateTable with "ERROR 000187: Only supports Geodatabase tables and feature classes." and/or at Compact with "ERROR 000535: Workspace does not exist." This didn't make sense since the data being feed to TruncateTable was a feature class and the Workspace being compacted did exist. And this script works just fine in the ArcMap runtime.

We figured pretty early on that it was probably some sort of data locking issue or possibly something to do with the file path names being interpreted differently in the different flavors of Python. We tried writing our file paths differently, explicitly starting and stopping an edit session, and a bunch of other stuff. None of that worked.

Then, I found a StackExchange answer that mentioned ClearWorkspaceCache. I threw arcpy.management.ClearWorkspaceCache() on the line before the TruncateTable and Compact calls and the script just worked. So the data was in fact being locked by the caching.

Moral of the story: If you have a script that doesn't work and you don't know why and it's giving you errors that don't make sense, try ClearWorkspaceCache().

more
3 4 407
DanPatterson
MVP Esteemed Contributor

Yes... some more einsum examples.

Read more...

more
1 0 365
DanPatterson
MVP Esteemed Contributor

A smart person and his notation has many practical applications today

Read more...

more
2 0 290
HaydenWelch
MVP Regular Contributor

A simpler way to use arcpy.List* functions

Read more...

more
4 4 750
DanPatterson
MVP Esteemed Contributor

lists of lists and nothing is equal in size.  how to flatten things down

Read more...

more
2 6 528
EarlMedina
Esri Regular Contributor

These days, it is quite common for people to use the rasterio, rasterstats, numpy, or geopandas Python packages in their Raster processing/analysis workflows. This post aims to illustrate how some of these packages might be used to perform zonal statistics:

  • rasterio
  • rasterstats
  • geopandas
  • fiona

If you're ever used the Summarize Raster Within or Zonal Statistics tools and you've wondered a bit about what goes on behind the scenes his article will help break down a few of the processes.

Prerequisites:
A number of packages are required for the below script to work - some of these packages are dependent on gdal.

The easiest way to get started is to use Anaconda to create a new environment:

conda create -n raster python=3.7

conda install -n raster gdal

From there, you can use conda/pip to install the remainder of the packages:

  • rasterio
  • rasterstats
  • descartes
  • mapclassify
  • geopandas
  • fiona
  • shapely
  • matplotlib

Overview:
When you run a tool like Zonal Statistics or Summarize Raster Within, you are asked for a few things:

  1. Input Zone Layer
  2. Zone Field
  3. Input Raster Layer to Summarize
  4. Statistic Type

As an example, let's say you're working with this then:

  1. A polygon feature class of parcels
  2. Owner Name field
  3. DEM
  4. Average

In the equivalent workflow presented here, the following occurs: The polygon feature class is loaded into data frame > dissolved by the zone field >  > statistics are calculated using the zones > the result is rasterized.

Begin by importing the required packages:

import fiona, rasterio
import geopandas as gpd
from rasterio.plot import show
import matplotlib.pyplot as plt
import rasterio.plot as rplt
from rasterio.features import rasterize
from rasterstats import zonal_stats‍‍‍‍‍‍‍‍‍‍‍‍‍‍

In order to run the required tools, it helps to view the data - the below help with adding a bit of interactivity:

def enum_items(source):
    print("\n")
    for ele in enumerate(source): 
        print(ele)
 
def list_columns(df):
    field_list = list(df)
    enum_items(field_list)
    return field_list‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

These functions are used to load data into a geopandas dataframe for easy manipulation. Either an .shp or a File Geodatabase Feature class (the input vector layer).

# For loading feature classes into geopandas dataframe
def loadfc_as_gpd(fgdb):
    layers = fiona.listlayers(fgdb)
    enum_items(layers)
    index = int(input("Which index to load? "))
    fcgpd = gpd.read_file(fgdb,layer=layers[index])
    return fcgpd
 
# For loading shapefiles into geopandas dataframe
def loadshp_as_gpd(shp):
    data = gpd.read_file(shp)
    return data‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

If you want to filter the data before processing (e.g., maybe you only want to use parcels within a certain county):

# For optional filtering of data
def filter_gpd(fcgpd):
    field_list = list_columns(fcgpd)
    index = int(input("Filter by which field (index)? "))
    field = field_list[index]
    value = input("Enter value: ")
    result = fcgpd[fcgpd[field] == value]
    return result‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Ideally, the input vector layer is in the same projection as the input raster layer. If it's not, this function can be used to re-project the vector layer to the raster's projection. The re-projected vector layer is plotted along with the raster data for visual inspection:

# For re-projecting input vector layer to raster projection
def reproject(fcgpd, raster):
    proj = raster.crs.to_proj4()
    print("Original vector layer projection: ", fcgpd.crs)
    reproj = fcgpd.to_crs(proj)
    print("New vector layer projection (PROJ4): ", reproj.crs)
    fig, ax = plt.subplots(figsize=(15, 15))
    rplt.show(raster, ax=ax)
    reproj.plot(ax=ax, facecolor='none', edgecolor='red')
    fig.show()
    return reproj‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍
  • Note: the to_proj4 method is introduced in later versions of rasterio (e.g. 1.0.22).

This is used to perform a dissolve on the selected zone field:

# For dissolving geopandas dataframe by selected field
def dissolve_gpd(df):
    field_list = list_columns(df)
    index = int(input("Dissolve by which field (index)? "))
    dgpd = df.dissolve(by=field_list[index])
    return dgpd‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Here desired statistics are selected - I've included a list of common statistics but rasterstats offers additional options. You can actually select multiple statistics to calculate and this may be one reason why someone would choose to incorporate a tool like rasterstats in their workflow (Zonal Statistics as Table will let you get multiple statistics, but not Zonal Statistics/Summarize Raster Within😞

# For selecting which raster statistics to calculate
def stats_select():
    stats_list = ['min', 'max', 'mean', 'count', 
              'sum', 'std', 'median', 'majority', 
              'minority', 'unique', 'range']
    enum_items(stats_list)
    indices = input("Enter raster statistics selections separated by space: ")
    stats  = list(indices.split())
    out_stats = list()
    for i in stats:
        out_stats.append(stats_list[int(i)])
    return out_stats
 
# For calculating zonal statistics
def get_zonal_stats(vector, raster, stats):
    # Run zonal statistics, store result in geopandas dataframe
    result = zonal_stats(vector, raster, stats=stats, geojson_out=True)
    geostats = gpd.GeoDataFrame.from_features(result)
    return geostats‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

The result of rasterstats' zonal_stats function is not a raster (e.g. .tif) by default so you must rasterize the result. In the previous function, the result is stored as GeoJSON in a geopandas dataframe and this is what gets rasterized to a .tif.

# For generating raster from zonal statistics result
def stats_to_raster(zdf, raster, stats, out_raster, no_data='y'):
    meta = raster.meta.copy()
    out_shape = raster.shape
    transform = raster.transform
    dtype = raster.dtypes[0]
    field_list = list_columns(stats)
    index = int(input("Rasterize by which field? "))
    zone = zdf[field_list[index]]
    shapes = ((geom,value) for geom, value in zip(zdf.geometry, zone))
    burned = rasterize(shapes=shapes, fill=0, out_shape=out_shape, transform=transform)
    show(burned)
    meta.update(dtype=rasterio.float32, nodata=0)
    # Optional to set nodata values to min of stat
    if no_data == 'y':
        cutoff = min(zone.values)
        print("Setting nodata cutoff to: ", cutoff)
        burned[burned < cutoff] = 0 
    with rasterio.open(out_raster, 'w', **meta) as out:
        out.write_band(1, burned)
    print("Zonal Statistics Raster generated")‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Altogether the script is:

import fiona, rasterio
import geopandas as gpd
from rasterio.plot import show
import matplotlib.pyplot as plt
import rasterio.plot as rplt
from rasterio.features import rasterize
from rasterstats import zonal_stats

def enum_items(source):
    print("\n")
    for ele in enumerate(source): 
        print(ele)

def list_columns(df):
    field_list = list(df)
    enum_items(field_list)
    return field_list

# For loading feature classes into geopandas dataframe
def loadfc_as_gpd(fgdb):
    layers = fiona.listlayers(fgdb)
    enum_items(layers)
    index = int(input("Which index to load? "))
    fcgpd = gpd.read_file(fgdb,layer=layers[index])
    return fcgpd

# For loading shapefiles into geopandas dataframe
def loadshp_as_gpd(shp):
    data = gpd.read_file(shp)
    return data

# For optional filtering of data
def filter_gpd(fcgpd):
    field_list = list_columns(fcgpd)
    index = int(input("Filter by which field (index)? "))
    field = field_list[index]
    value = input("Enter value: ")
    result = fcgpd[fcgpd[field] == value]
    return result


# For re-projecting input vector layer to raster projection (Rasterio v. 1.0.22)
def reproject(fcgpd, raster):
    proj = raster.crs.to_proj4()
    print("Original vector layer projection: ", fcgpd.crs)
    reproj = fcgpd.to_crs(proj)
    print("New vector layer projection (PROJ4): ", reproj.crs)
    fig, ax = plt.subplots(figsize=(15, 15))
    rplt.show(raster, ax=ax)
    reproj.plot(ax=ax, facecolor='none', edgecolor='red')
    fig.show()
    return reproj

# For dissolving geopandas dataframe by selected field
def dissolve_gpd(df):
    field_list = list_columns(df)
    index = int(input("Dissolve by which field (index)? "))
    dgpd = df.dissolve(by=field_list[index])
    return dgpd

# For selecting which raster statistics to calculate
def stats_select():
    stats_list = ['min', 'max', 'mean', 'count', 
              'sum', 'std', 'median', 'majority', 
              'minority', 'unique', 'range']
    enum_items(stats_list)
    indices = input("Enter raster statistics selections separated by space: ")
    stats  = list(indices.split())
    out_stats = list()
    for i in stats:
        out_stats.append(stats_list[int(i)])
    return out_stats

# For calculating zonal statistics
def get_zonal_stats(vector, raster, stats):
    # Run zonal statistics, store result in geopandas dataframe
    result = zonal_stats(vector, raster, stats=stats, geojson_out=True)
    geostats = gpd.GeoDataFrame.from_features(result)
    return geostats

# For generating raster from zonal statistics result
def stats_to_raster(zdf, raster, stats, out_raster, no_data='y'):
    meta = raster.meta.copy()
    out_shape = raster.shape
    transform = raster.transform
    dtype = raster.dtypes[0]
    field_list = list_columns(stats)
    index = int(input("Rasterize by which field? "))
    zone = zdf[field_list[index]]
    shapes = ((geom,value) for geom, value in zip(zdf.geometry, zone))
    burned = rasterize(shapes=shapes, fill=0, out_shape=out_shape, transform=transform)
    show(burned)
    meta.update(dtype=rasterio.float32, nodata=0)
    # Optional to set nodata values to min of stat
    if no_data == 'y':
        cutoff = min(zone.values)
        print("Setting nodata cutoff to: ", cutoff)
        burned[burned < cutoff] = 0 
    with rasterio.open(out_raster, 'w', **meta) as out:
        out.write_band(1, burned)
    print("Zonal Statistics Raster generated")

def main():
    rst = '/path/to.raster.tif'
    raster = rasterio.open(rst)

    fgdb = '/path/to/fgdb.gdb'
    vector = loadfc_as_gpd(fgdb)
    
    p_vector = reproject(vector, raster)
    
    d_vector = dissolve_gpd(p_vector)
    out_rst = '/path/to/out.tif'
    stats_to_get = stats_select()

    zs = get_zonal_stats(d_vector, rst, stats_to_get)
    stats_to_raster(zs, raster, stats_to_get, out_rst)

if __name__ == '__main__':
    main()‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Usage:

As an example, suppose we have the following data where the vector layer is a feature class of of urban areas and the raster layer is a DEM:

Start by setting the paths in the main function of the final script then run either in a terminal or a notebook (I ran it in a notebook). The first prompt will ask you which Feature Class in the File Geodatabase you would like to load. Enter the index:

From review of the data, I know the vector layer needs to be projected so I included the reproject function in main. Here you can see that the data was re-projected to North America Albers Equal Area Conic (the proj4 representation of this projection is shown):

The result is plotted for visual inspection:

Next, a prompt to choose which field to dissolve on:

Here I select the statistics I want to calculate, separating the selections by space:

Finally, the statistic to use for rasterization is selected:

The result is plotted:

This compares well with the result you would get from the Zonal Statistics tool:

more
6 0 21.1K
88 Subscribers