Select to view content in your preferred language

Get Count returning wrong specified selected feature class

673
15
4 weeks ago
GISIntern21
Emerging Contributor

Hello! I am working on a Python project, and I am just about done with it, but I am struggling with formatting my Get Count function. I am writing a script to get the number of sales within a 1/2 mile of feedlots, but to finish it off, I need to return the number of feedlots that are within that half mile of the sales. So if a particular sale point has 1-3 feedlots within a half mile, I need to return that number, but right now, I am getting the number of sales instead (basically the opposite of what I need). I have tried swapping the feature classes in my function, but I only get '1' as a result. Is there a better way to write my code, or am I missing something?

GISIntern21_0-1747747201722.png

 

0 Kudos
15 Replies
AlfredBaldenweck
MVP Regular Contributor

Hard to say without the actual data, and also hard to conceptualize your dictionary, but here's my guess of where to look:

  • line 39: you set your variable y to 1
    • on line 48, you set count to be y
    • So if you're looking specifically at "count" in the final dictionary, it's always going to be 1.
      • One way to fix this is to copy y=1 (actually y=0) to outside that nested cursor, e.g. line 35-ish, and change what is currently line 39 to be y+=1.
      • This would let you update the count as it loops through the sales, and reset it for each feed point.
 for row in arcpy.da.SearchCursor(feedPoint_lyr, ['PIN']):
        select_feedlot = arcpy.SelectLayerByAttribute_management(
                               feedPoint_lyr, "NEW_SELECTION", 
                               "PIN = '{}'".format(row[0]))
        sales = arcpy.SelectLayerByLocation_management(
                      address_lyr, 'WITHIN_A_DISTANCE', 
                      feedPoint_lyr, '2640 FEET', 'NEW_SELECTION')
        y = 0
        for sale in arcpy.da.SearchCursor(sales, ['CountyPIN']):
            if sale[0] in s:
                x += 1
                y += 1
                #sales_check(sale[0], sale_site)
                saleParcel = sale[0]

The other thing I'm really not sure of is what the purpose of is. You iterate it per sale and use it as a key in your sale_site dictionary, but what's the point? Is it doing the same thing as y?

0 Kudos
MErikReedAugusta
MVP Regular Contributor

I haven't had enough spare moments to dig through the full logic of your code or of some of the other options shared here, yet (hopefully this evening).  But at a glance, I'm still seeing both syntax & best practice issues in your cursor calls:

Line 28:

From the screenshot: MErikReedAugusta_0-1747948273334.png

From the code block: MErikReedAugusta_1-1747948293897.png

Your two calls here don't match in the two versions you posted.  The first one isn't valid python, and the second one isn't the recommended format for cursor calls.

with arcpy.da.SearchCursor(feedPoint_lyr, ['SHAPE@', 'PIN']) as cursor:
    for row in cursor:
        select_feedlot = [...] # Do stuff here
            # (Theoretically this would be Lines 29–48
            #   in your code, though that section has many
            #   syntax & logic problems, too, that I'll
            #   dig into later.)
del cursor

This is the recommended syntax for your call in Line 28.  You really should be using context managers (the with statement) in your cursor calls, and the "with row in [...] as cursor" isn't valid Python.

That example will be true of all of your Cursor calls (except maybe the InsertCursor; I'll need to review the broader logic to comment on that one).

 

Nested Cursors

You also still have a nested SearchCursor at Line 34 that's looping inside of the SearchCursor at Line 28.  As mentioned before, that's always at least a Yellow Flag, and sometimes an outright Red Flag in code.  Again a case where I'll need to follow through the broader logic when I find some time later.

------------------------------
M Reed
"The pessimist may be right oftener than the optimist, but the optimist has more fun, and neither can stop the march of events anyhow." — Lazarus Long, in Time Enough for Love, by Robert A. Heinlein
0 Kudos
AlfredBaldenweck
MVP Regular Contributor

I've been ignoring the nested cursor in this case because it refers to just the layer created from selection a line above.

0 Kudos
MErikReedAugusta
MVP Regular Contributor

Okay, so I took some time and went through this whole script, and I think your first problem is that your code is unnecessarily convoluted.  It's not that it's bad, per se, but that it's not doing you any favors troubleshooting and identifying the code.  There are also some logical analysis concerns, here.  So I'm going to go down the rabbit hole and try to address all of it that I caught.

This is going to be long, so I'm also going to color code the headings (and possibly break the post in two, if I hit forum limits).  I'll also put a letter tag for the color, in case you or anyone else is color-blind: 

  • [G] Green Headings mean you really should do this, but technically it'll probably work if you don't.  I'll give my justifications why I think the change is worth it, though. 
  • [R] Red Headings mean you need to do this.  Either it's a syntax error or a serious logical error, or maybe it's just a massively bad practice that goes beyond "should".
  • [P] Purple Headings are sidenotes.  They might not require any action, but they're generally assumptions or thoughts I had while parsing through; if any are wrong, it might actually be a red flag.

Also, I'm going to refer to variable names in bolded blue, since these boards lack a good inline code option for things that don't warrant a full block.

1a. [G] Variable Names (especially your input variables)

Single-letter variable names like the (s, t, u, r) in your function definition really aren't the time saver many coders think they are.  Trust me when I tell you, the 1–2 seconds longer it takes you to type out sales_listing instead of s is irrelevant against the mental load it's going to take you to troubleshoot and especially the mental load it's going to take others to troubleshoot.  And too often forgotten: "Others" also includes yourself, but months in the future, when someone asks you to run this code again and it breaks, or you need to modify something to alter the calculation you're running.  Be nice to future you; write longer variable names, and do at least some rudimentary documentation as you go.  "Schedule time for machine maintenance, or your machine will schedule it for you," as the saying goes.

1b. [P] Try out an IDE other than IDLE

This is 99% a personal preference thing, so you do you.  I personally use PyCharm and quite like it, but everyone's different.  Two benefits relevant to addressing this script in particular, though: Auto-completion will help those longer variable names like sales_listing from 1a take the same amount of time to type as s.  Now you get all the benefits of longer names, without any of the downsides.  Also, most other IDEs will do things like highlight unused variables.  You might've missed this entirely, but parcel_sites and spatial_ref don't actually do anything, other than get created & forgotten in Lines 9 & 11.  They're not realistically hurting anything, but the extra clutter is going to make it harder to troubleshoot problems in your code.  The same is true of select_feedlot in Line 34, though that might be another error we'll get to.

2. [G/R] Source Parcel Data & Overwrites
(Possibly harmless; possibly catastrophic—depending on source data)

First note: The way you're writing these means that PINs present in the Tax Parcel polygons will overwrite any matching PINs in the Address Points.  If your source data is anything like our municipality's, the locations of the Address Points and the locations of the centroids of the Parcels (as represented by SHAPE@X & SHAPE@Y in Line 17) likely don't match.  And if the parcels are big enough or close enough to that half-mile line, you could end up with lots of false positives and/or false negatives, depending on which point is more "correct".

If you can be sure that all PINs in one data set exist in the other dataset, then I'd just import one, and ignore the other.  If there are unique PINs in either, then you'll need to decide which one wins, and make sure that one is second.  But know that's going to increase the error bars in your resulting analysis.

Also, if you can use a single dataset, then Lines 21–27 become irrelevant, saving you and your computer some work!

2b. [P] dict.iteritems() vs dict.items()

In Line 25, you call iteritems(), but that method was removed for Python 3.  If you're in ArcGIS Pro, you need to be writing in Python 3, even if your code is running outside Pro.  If you're in ArcGIS Desktop, then there might be bigger concerns elsewhere, but for now this will presume you're working in Pro/Python 3.  If you can't go down to one dataset (per #2), then you want lot_site.items().

For more information of the difference, there's a pretty good explanation in the Stack Exchange question here.

2c. [G] Unpacking & repacking lists/tuples

v = (x, y)
# These two lines do the same thing, but notice one is MUCH clearer.  Why
# unpack & repack?  The only difference is tuple vs. list.  You could just
# save your original as a list, or cast it with list(v), instead. Much easier
# to debug later.

feed_point = [v[0],v[1]]
feed_point = v

 3. [R] SelectLayerByLocation against selected parcel or parcel coordinates?

In Line 34, you do a SelectLayerByAttribute to get just the parcel you're looking at in this iteration of the loop.  Then, you feed in the geometry token from the loop into SLBL.  The parcel SLBA doesn't do anything.

Worse, this is the outer Cursor of your nested cursors, and it doesn't appear necessary at all.  Lines 13–27 create a Feature Class that you then iterate through in Line 33, but you could just as easily have been iterating through a dictionary in far fewer lines:

lot_site = {}
with arcpy.da.SearchCursor(address_points, ['CountyPIN', 'SHAPE@']) as cur1:
  for pin, geo in cur1:
    lot_site[pin] = geo
with arcpy.da.SearchCursor(tax_parcels, ['PIN', 'SHAPE@']) as cur1:
  for pin, geo in cur1:
    lot_site[pin] = geo

for pin, geo in lot_site.items():
  saleAddresses = arcpy.management.SelectLayerByLocation(address_lyr,
                                                         'WITHIN_A_DISTANCE',
                                                         geo,
                                                         '2640 FEET')
  # do other stuff

As stated above in #2, you might not even need Lines 2–4 of the code above!

4. [G] Inner Cursor vs Definition Query

The other reason you have nested cursors is because you only want to pull the PINs that are in your Sales Listing.  There's a nonzero chance this could be done by applying a definition query to the PIN data source(s) you're reading in.  Examples are beyond the scope of this already-mammoth post, because they depend on how you're feeding the source PIN data in.  But it's yet another case of "If you need to nest cursors, take a second look at your basic structural assumptions."

5. [R] Variable names again, but way more of a mess.

It took me absurdly long to parse through Lines 41–47.  Partly, that was because of the nested cursors, and partly that was because of all the tuple slicing going on.  Instead of calling sale[0] every time, just save it to a variable.  In my test case where I was scribbling notes, I called it innerPIN, but I still maintain that the nested cursors aren't necessary here.  You might also be able to combine some of those lines, depending on the structure of the data it's reading from the Sales & Feedlot listings:

adj, netsale, ratio = s.get(innerPIN)

6. [R] I think you missed a step...

Lastly, the most critical problem appears to be in Lines 38–48.  As pointed out by Alfred: In Line 39, you set y=1.  Every PIN; every time.  Then, you pass that 1 over to count, which you save in your output dictionary.  Every time.

No matter what the rest of this script does, you're always going to get a count of 1, because you manually passed that in.  But I suspect you missed it because of the complications elsewhere, which is why I wrote out all that stuff above.  At a loose glance, you appear to be missing a step, but to be sure, I would need to see an example output dictionary of what you expect to get back.  You can put dummy PINs and such in, if you're worried about confidentiality concerns.  Instead of the usual string of numbers and dashes most municipalities use, just use "dummy-PIN-1" and "dummy-PIN-2" in your example.

------------------------------
M Reed
"The pessimist may be right oftener than the optimist, but the optimist has more fun, and neither can stop the march of events anyhow." — Lazarus Long, in Time Enough for Love, by Robert A. Heinlein
GISIntern21
Emerging Contributor

Hello! My apologies for the delay in a response, I've been making a lot of changes to the script! 
After looking over everything, I made a lot of tweaks and edits.. I did end up getting rid of the Parcel Polygon layer, as we don't actually need it. I did also skim down some of the Search Cursors as well to make everything function and overall look much more simpler and less messy. Everything is running as it should, but I am still not getting the Get Count down for whatever reason.. I tried implementing the 'Y += 1' suggestion, but when I get my output in my CSV, it is returning an overall count of the amount of times a feedlot is recorded "ex: 1, 2, 3, etc" instead of just an overall total. My end goal is to get the amount of feedlots within a half mile of each address point I am getting, so if 1 address point has 3 feedlots within that half mile, I want it to reflect that result in my count and CSV output. Any suggestions for what I should do next, or any changes I could/should make to my updated script? The overall run time I am getting is around 1-1.5 minutes.

def sales_listing(file_location):

    workbook = xlrd.open_workbook(file_location)
    sheet = workbook.sheet_by_name("SalesListing")
  
    sales_dict = dict()

    for row in range(1, sheet.nrows):
        parcelID = str(sheet.cell(row, 2).value).replace('.','')
        sales_dict.update({parcelID :  [int(sheet.cell(row, 8).value), str(sheet.cell(row, 20).value), str(sheet.cell(row, 9).value), float(sheet.cell(row, 21).value)]})

    return sales_dict
    

########### Geo work


def mapping(s, t, u):

    addy_site = dict()
    sale_site = dict()

    parcel_site =[]
    spatial_ref = arcpy.Describe(u).spatialReference


    fc_salesListing = arcpy.CreateFeatureclass_management('in_memory', 'SalesPoint', 'POINT', '', '', '', spatial_ref)
    arcpy.AddField_management(fc_salesListing, 'PIN', 'TEXT')

    for row in arcpy.da.SearchCursor(u, ['CountyPIN', 'SHAPE@X', 'SHAPE@Y']):
        if row[0] in s:
            addy_site.update({row[0] : (row[1], row[2])})
            
    with arcpy.da.InsertCursor(fc_salesListing, ['SHAPE@', 'PIN']) as cur:
        for k, v in addy_site.items():
            addy_point = v
            cur.insertRow([addy_point, k])


    feedPoint_lyr = arcpy.MakeFeatureLayer_management(t, 'feedlot_lyr')
    address_lyr = arcpy.MakeFeatureLayer_management(fc_salesListing, 'Address_lyr')
    x = 0

    for row in arcpy.da.SearchCursor(feedPoint_lyr, ['ParcelID', 'AU_State', 'Permit_Num']):
        select_feedlot = arcpy.SelectLayerByAttribute_management(feedPoint_lyr, "NEW_SELECTION", "parcelID = '{}'".format(row[0]))
        sales = arcpy.SelectLayerByLocation_management(address_lyr, 'WITHIN_A_DISTANCE', feedPoint_lyr, '2640 FEET', 'NEW_SELECTION')
        y = 0
        for sale in arcpy.da.SearchCursor(sales, ['PIN']):
            if sale[0] in s:
                x += 1
                y += 1
                #sales_check(sale[0], sale_site)
                saleParcel = sale[0]
                feedParcel = row[0]
                feedPerm = row[2]
                adj = s.get(sale[0])[0]
                netsale = s.get(sale[0])[1]
                pt = s.get(sale[0])[2]
                au_state = row[1]
                ratio = s.get(sale[0])[3]
                count = y

                sale_site.update({x : [saleParcel, feedParcel, feedPerm, adj, netsale, pt, au_state, ratio, count]})


    arcpy.Delete_management('in_memory')
    return sale_site

 

0 Kudos
HaydenWelch
MVP Regular Contributor

If you want a really compact way to do this, you can build mappings between features really easily using SearchCursors and dictionary comprehensions:

from typing import TypeAlias
from arcpy import LinearUnitConversionFactor
from arcpy.da import SearchCursor

feedPoint_lyr = "feedPoint"
address_lyr = "address"
feature_units = 'Meters' # Replace with your feature units
_MILES = LinearUnitConversionFactor(feature_units, "Miles")
# Half Mile in in feature units
BUFFER = 0.5*_MILES

# Change this to the datatype of your PIN fields
PinType: TypeAlias = int
#PinType: TypeAlias = str

sale_to_lot: dict[PinType, list[PinType]] = {
    # Find all lots within `BUFFER` of each address
    pin: list(SearchCursor(feedPoint_lyr, ['PIN'], spatial_filter=point.buffer(BUFFER)))
    # Iterate all addresses
    for pin, point in SearchCursor(address_lyr, ['CountyPIN', 'SHAPE@'])
}

lot_to_sale: dict[PinType, list[PinType]] = {
    # Find all addresses within `BUFFER` of each feedlot
    pin: list(SearchCursor(address_lyr, ['CountyPIN'], spatial_filter=point.buffer(BUFFER)))
    # Iterate all feed lots
    for pin, point in SearchCursor(feedPoint_lyr, ['PIN', 'SHAPE@'])
}

This will give you two mappings, one mapping each feedlot to the addresses within 0.5 miles and one mapping addresses to lots within 0.5 miles.

You can then use these PIN mappings however you want without having to do a spatial query again.

This solution is more memory efficient, but spools up an ungodly amount of search cursors (len(address)+len(feedlots)). You can avoid this by loading the features into memory first:

from typing import TypeAlias
from arcpy import LinearUnitConversionFactor, PointGeometry
from arcpy.da import SearchCursor

feedPoint_lyr = "feedPoint"
address_lyr = "address"
feature_units = 'Meters' # Replace with your feature units
_MILES = LinearUnitConversionFactor(feature_units, "Miles")
# Half Mile in in feature units
BUFFER = 0.5*_MILES

# Change this to the datatype of your PIN fields
PinType: TypeAlias = int
#PinType: TypeAlias = str

# Load addresses
addresses: dict[PinType, PointGeometry] = {
    pin: shape
    for pin, shape in SearchCursor(address_lyr, ['CountyPIN', 'SHAPE@'])
}

# Load feedlots
feedlots: dict[PinType, PointGeometry] = {
    pin: shape
    for pin, shape in SearchCursor(feedPoint_lyr, ['PIN', 'SHAPE@'])
}


sale_to_lot: dict[PinType, list[PinType]] = {}
for address_pin, address_shape in addresses.items():
    sale_to_lot[address_pin] = [
        lot_pin
        for lot_pin, lot_shape in feedlots.items()
        # Skip lot if it is not within `BUFFER`
        if not lot_shape.disjoint(address_shape.buffer(BUFFER))
    ]

lot_to_sale: dict[PinType, list[PinType]] = {}
for lot_pin, lot_shape in feedlots.items():
    lot_to_sale[lot_pin] = [
        address_pin
        for address_pin, address_shape in addresses.items()
        # Skip address if it is not within `BUFFER`
        if not address_shape.disjoint(lot_shape.buffer(BUFFER))
    ]