Python help needed... select polygons from one shapefile, based on values in another table...

5704
12
Jump to solution
01-07-2015 02:58 PM
John_S_Wood
New Contributor II

I have a table in a geodatabase with the following field names (types):

OBJECTID (Object ID)

SpeciesNum (String)

Sci_Name  (String)

Spp_number (String)

Sum_A1  (Long)

Sum_A2 (Long)

Sum_A3 (Long)

Sum_H2 (Long)

Sum_H3 (Long)

Sum_H4 (Long)

Sum_H5 (Long)

Sum_H6 (Long)

 

 

I also have a shapefile with regions A through G, subdivided into depth classes 1-6…. Each of the 48 polygons is coded A1, A2, A3….H5, H6. Field Name is ‘Code’ (String).

 

I want to select polygons from the shapefile, based on whether the corresponding field in the table has a ‘1’ or a ‘0’ value. So if Sum_A1 and Sum_A2 both contain ‘1’s, the A1 and A2 polygons will be selected. I then want to export the selected polygons as a new shapefile with the name from Sci_Name, and a field containing the SpeciesNum (a string value).

 

Since I have about 13,000 species, each of which should become a separate shapefile, I need to automate this process. 

 

Oh yeah, I’m working in ArcMap 10.2, with Python 2.7. Any help or helpful fragments ware greatly appreciated! I am new to the Forums, so I've probably just broken all sorts of protocols. I apologize in advance!

0 Kudos
1 Solution

Accepted Solutions
John_S_Wood
New Contributor II

Yes... nice to keep skills sharp! I got it to work... here's what I ended up with:

import arcpy

... import os

... #import codecs

... # parameters

... in_table = r'C:\Users\jwood\Documents\ArcGIS\Default.gdb\BGM_Chpt9'

... in_fc = r'C:\Users\jwood\Documents\ArcGIS\Default.gdb\OctDepths'

... out_folder = r'C:\Users\jwood\Documents\IUCN Red List\Shapefiles\Chpt9'

... polygon_code_field = "Code"

...  #9

... # list all fields from the table

... all_fields = [f.name for f in arcpy.ListFields(in_table)]

...  #12

... # loop through all rows in table

... with arcpy.da.SearchCursor(in_table, '*') as sc:

...     for row in sc:

...  #16

...         item = dict(zip(all_fields, row)) # represent row as a dictionary

...   #18

...         species_name = item.get('BINOMIAL', 'Unknown')

...         species_number = item.get('SpeciesNum', 0)

...         iucn_num = item.get('IUCN_number', 'Unknown')

...         origin = item.get('ORIGIN', 'Unknown')

...         taxon_id = item.get('TAXON_ID', 'Unknown')

...         compiler = item.get('COMPILER', 'Unknown')

...         presence = item.get('PRESENCE', 'Unknown')

...         citation = item.get('Citation', 'Unknown')

...         seasonal = item.get('SEASONAL', 'Unknown')

...         dist_comm = item.get('DIST_COMM', 'Unknown')

...         year_ = item.get('YEAR_', 'Unknown')

...         island_ = item.get('Island_', 'Unknown')

...         source_ = item.get('Source_', 'Unknown')

...         subPop = item.get('SUBPOP', 'Unknown')

...         legend= item.get('LEGEND', 'Unknown')

...         subSpec = item.get('SUBSPECIES', 'Unknown')

...         chaptNum = item.get('ChaptNum', 'Unknown')

...         tax_comm = item.get('TAX_COMM', 'Unknown')

...         BGM_url = item.get('URL', 'Unknown')

... #38

...

...         # get list of polygon codes that should be selected

...         polygons_to_select = []

...         for field_name, value in item.iteritems():

...             field_name_upper = field_name.upper() # to make comparison case insensitive

...             if field_name_upper and value == 1:

...                 polygon_code = field_name_upper

...                 polygons_to_select.append(polygon_code)

... #47

...         if polygons_to_select:

...             # create a string like "'A1','B2',..."

...             codes = ",".join(["'" + str(pg) + "'" for pg in polygons_to_select])

...             # build an SQL expression

...             sql = '"%s" IN (%s)' % (polygon_code_field, codes)

... #53

...             #print codes      

...             #print polygon_code_field

... #56

...             # export as a new shapefile with species_name in the file name

...             out_fc = os.path.join(out_folder, species_name)

...             out_fc = arcpy.analysis.Select(in_fc, out_fc, sql).getOutput(0)

... #60

...             # add the column with species number as string

...             arcpy.management.AddField(out_fc, "BINOMIAL", "TEXT")

...             arcpy.CalculateField_management(out_fc, "BINOMIAL", "'" + str(species_name) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "SpeciesNum", "TEXT")

...             arcpy.CalculateField_management(out_fc, "SpeciesNum", "'" + str(species_number) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "ORIGIN", "TEXT")

...             arcpy.management.AddField(out_fc, "TAXON_ID", "TEXT")   #67

...             arcpy.CalculateField_management(out_fc, "TAXON_ID", "'" + str(taxon_id) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "COMPILER", "TEXT")

...             arcpy.CalculateField_management(out_fc, "COMPILER", "'" + str(compiler) + "'", "PYTHON_9.3") #70

...             arcpy.management.AddField(out_fc, "PRESENCE", "TEXT")

...             arcpy.CalculateField_management(out_fc, "PRESENCE", "'" + str(presence) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "Citation", "TEXT")

...             arcpy.CalculateField_management(out_fc, "Citation", "'" + str(citation) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "SEASONAL", "TEXT")

...             arcpy.CalculateField_management(out_fc, "SEASONAL", "'" + str(seasonal) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "DIST_COMM", "TEXT")

...             arcpy.CalculateField_management(out_fc, "DIST_COMM", "'" + str(dist_comm) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "YEAR_", "TEXT")

...             arcpy.CalculateField_management(out_fc, "YEAR_", "'" + str(year_) + "'", "PYTHON_9.3") #80

...             arcpy.management.AddField(out_fc, "Island", "TEXT")

...             arcpy.CalculateField_management(out_fc, "Island", "'" + str(island_) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "SOURCE", "TEXT")

...             arcpy.CalculateField_management(out_fc, "SOURCE", "'" + str(source_) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "SUBPOP", "TEXT")

...             arcpy.CalculateField_management(out_fc, "SUBPOP", "'" + str(subPop) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "LEGEND", "TEXT")

...             arcpy.CalculateField_management(out_fc, "LEGEND", "'" + str(legend) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "SUBSPECIES", "TEXT")

...             arcpy.CalculateField_management(out_fc, "SUBSPECIES", "'" + str(subSpec) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "ChaptNum", "TEXT")

...             arcpy.CalculateField_management(out_fc, "ChaptNum", "'" + str(chaptNum) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "TAX_COMM", "TEXT")

...             arcpy.CalculateField_management(out_fc, "TAX_COMM", "'" + str(tax_comm) + "'", "PYTHON_9.3")

...             arcpy.management.AddField(out_fc, "BGMUrl", "TEXT")

...             arcpy.CalculateField_management(out_fc, "BGMUrl", "'" + str(BGM_url) + "'", "PYTHON_9.3")

...

...             #print island

...             #print source

...             #arcpy.Dissolve_management(out_fc, out_fc_Dissolv, "SpeciesNum", "", "MULTI_PART", "DISSOLVE_LINES")

...           

...         else:

...             # no 'A1,A2' column contains 1 for this species, skip this species

...             pass

View solution in original post

12 Replies
PaulHuffman
Occasional Contributor III

So you don't want to have the polygon's attribute table contain an item for the presence of each species because that would be 13000 items in the attribute table? 

What about a link or relate between the data table and the attribute table so a query on the data table will select the polygons that contain the species of interest?  But maybe you need to deliver a shapefile, not a map project.

0 Kudos
John_S_Wood
New Contributor II

Yes, I need to deliver a separate shapefile for each species (each polygon shows the distribution of the species) . Filip's code (revised) gets close, and I get a shapefile with the appropriate field, but no polygons.

WARNING 000117: Warning empty output generated.

Succeeded at Thu Jan 08 12:58:12 2015 (Elapsed Time: 0.19 seconds)

Executing: AddField "C:\deleteme\Script_Output\Lyngbya sordida.shp" SpeciesNum TEXT # # # # NULLABLE NON_REQUIRED #

Start Time: Thu Jan 08 12:58:14 2015

Adding SpeciesNum to C:\deleteme\Script_Output\Lyngbya sordida.shp...

Succeeded at Thu Jan 08 12:58:14 2015 (Elapsed Time: 0.01 seconds)

So, what I have so far is:

import arcpy

... import os

...

... # parameters

... in_table = r'C:\Users\jwood\Documents\ArcGIS\Default.gdb\BGM_dummy'

... in_fc = r'C:\Users\jwood\Documents\ArcGIS\Default.gdb\OctDepths'

... out_folder = r'C:\deleteme\Script_Output'

... polygon_code_field = "Code"

...

... # list all fields from the table

... all_fields = [f.name for f in arcpy.ListFields(in_table)]

...

... # loop through all rows in table

... with arcpy.da.SearchCursor(in_table, '*') as sc:

...     for row in sc:

...

...         item = dict(zip(all_fields, row)) # represent row as a dictionary

...

...         species_number = item.get('SpeciesNum', 0)

...         species_name = item.get('Sci_Name', 'Unknown')

...

...         # get list of polygon codes that should be selected

...         polygons_to_select = []

...         for field_name, value in item.iteritems():

...             field_name_lower = field_name.lower() # to make comparison case insensitive

...             if field_name_lower.startswith('sum_') and value == 1:

...                 polygon_code = field_name_lower.replace('sum_', '')

...                 polygons_to_select.append(polygon_code)

...

...         if polygons_to_select:

...             # create a string like "'A1','B2',..."

...             codes = ",".join(["'" + str(pg) + "'" for pg in polygons_to_select])

...             # build an SQL expression

...             sql = '"%s" IN (%s)' % (polygon_code_field, codes)

...

...             #print codes           

...                 # export as a new shapefile with species_name in the file name

...             out_fc = os.path.join(out_folder, species_name)

...             out_fc = arcpy.analysis.Select(in_fc, out_fc, sql).getOutput(0)

...

...             # add the column with species number as string

...             arcpy.management.AddField(out_fc, "SpeciesNum", "TEXT")

...             #arcpy.management.CalculateField(out_fc, "SpeciesNum", "'" + str(species_number) + "'")

...             # It seems to me it would be better to use species number in file name

...             # and to add column with species name though.

...

...         else:

...             # no Sum_* column contains 1 for this species, skip this species

...             pass

...             # or you may want to create an empty shapefile

I suspect the problem is in the select sql statement....

0 Kudos
DarrenWiens2
MVP Honored Contributor

What does 'print sql' look like after you've built it?

Afterthought: try escape-quoting the field name to retain the quotes (see the where clause explanation here for more options): sql = ' \"%s\" IN (%s)' % (polygon_code_field, codes)

0 Kudos
DarrenWiens2
MVP Honored Contributor

Your basic workflow will be:

- create a da.SearchCursor() to loop through each row. In the call to create the cursor, you will list all of your fields.

- Loop through each field inside the cursor. If the value equals 1, add it to an SQL string that would fit into Select By Attributes, like: "POLYGON_ID" in ('A1','A2','C4', etc.)

- Once you've looped through all fields, run Select tool using the constructed where clause

FilipKrál
Occasional Contributor III

Hi John,

It seems to me that whatever you are trying to achieve with this, there might be an easier way. If you give it one more round of thought and still want to get on with what you described, perhaps the code below can help.

I didn't test it because I had no sample data so it may contain some typos or glitches. If you understand the code though, you should be able get the job done.

Hope this helps.

Filip.

# Given that species number is unique for each row in the table:

import arcpy

# parameters
in_table = r'c:\full\path\to\table'
in_fc = r'c:\full\path\to\polygon_fc'
out_folder = r'c:\full\path\to\output\folder'
polygon_code_field = "Code"

# list all fields from the table
all_fields = [f.name for f in arcpy.ListFields(in_table)]

# loop through all rows in table
with arcpy.da.SearchCursor(in_table, '*') as sc:
    for row in sc:

        item = dict(zip(all_fields, row)) # represent row as a dictionary

        species_number = item.get('SpeciesNum', 0)
        species_name = item.get('Sci_Name', 'Uknown')

        # get list of polygon codes that should be selected
        polygons_to_select = []
        for field_name, value in item.iteritems():
            field_name_lower = field_name.lower() # to make comparison case insensitive
            if field_name_lower.startswith('sum_') and value == 1:
                polygon_code = field_name_lower.replace('sum_', '')
                polygons_to_select.append(polygon_code)

        if polygons_to_select:
            # create a string like "'A1','B2',..."
            codes = ",".join(["'" + str(pg) + "'" for pg in polygons_to_select])
            # build an SQL expression
            sql = '"%s" IN (%s)' % (polygon_code_field, codes)

            # export as a new shapefile with species_name in the file name
            out_fc = os.path.join(out_folder, "spp_" + species_name)
            out_fc = arcpy.analysis.Select(in_fc, out_fc, sql).getOutput(0)

            # add the column with species number as string
            arcpy.management.AddField(out_fc, "SpeciesNum", "TEXT")
            arcpy.management.CalculateField(out_fc, "SpeciesNum", "'" + str(species_number) + "'")
            # It seems to me it would be better to use species number in file name
            # and to add column with species name though.

        else:
            # no Sum_* column contains 1 for this species, skip this species
            pass
            # or you may want to create an empty shapefile
John_S_Wood
New Contributor II

Thanks!!! I can hardly wait to try it!!!

Sent from my Verizon Wireless 4G LTE smartphone

0 Kudos
John_S_Wood
New Contributor II

Close... it goes as far as creating the shapefile, but it has no polygons... I'll play with it. Thanks for the start!

0 Kudos
John_S_Wood
New Contributor II

I got it to work... many thanks to you. The script that worked finally is below. Mind if I ask what editor you are using? It sure looks nice with the line numbers and shading.

0 Kudos
FilipKrál
Occasional Contributor III

Hi, it does indeed suggest that the problem may be in the sql statement. Maybe you can print(sql) before arcpy.analysis.Select to see if it is correct.

Try to paste some of the print outs here so we can try to spot some problems.

Filip.

0 Kudos