Hi,
I have 5 million points, and I would like to generate buffers, find the points that fall in each buffer region, and perform some calculations.
I tried a Spatial Join and got an out-of-memory exception, so now I am trying multiprocessing.
I would like to know: is there any other way we can do this more efficiently in Python, or any other method?
The tool 'Generate Near Table' (BTW: only available at the ArcInfo license level) would also work for something like this... Basically it builds a distance-based one-to-many association table. A warning that the resulting output table is often many orders of magnitude larger than the number of input features - the explanation being that one house can be within 100 miles of many different restaurants. Be sure to apply a search radius! As an experiment, you might first apply a very small search radius (< 1 mile) to see the format of the output data you will be dealing with... before you accidentally end up with an output table that is 300 million records.
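For example, a minimal call might look like the sketch below (the paths are hypothetical, and the small search_radius keeps the output manageable):

import arcpy

houses = r"C:\temp\test.gdb\houses"           # hypothetical input features
restaurants = r"C:\temp\test.gdb\restaurants"
nearTable = r"C:\temp\test.gdb\near_table"

# one-to-many: every restaurant within the search radius of each house
arcpy.GenerateNearTable_analysis(houses, restaurants, nearTable,
                                 search_radius="1 Miles", closest="ALL")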
If you are running out of RAM, consider installing the 64-bit background geoprocessor: Background Geoprocessing (64-bit)—Help | ArcGIS for Desktop
Not sure exactly what you are doing here, but if I were doing this in a script, I would just loop through the restaurants, select the houses, gather their stats, and write out some result. Something like:
import arcpy

foodPnts = r"C:\temp\test.gdb\restaurants"
housePnts = r"C:\temp\test.gdb\houses"

arcpy.MakeFeatureLayer_management(housePnts, "fl")
searchRows = arcpy.da.SearchCursor(foodPnts, ["SHAPE@", "RESTAURANT_ID"])
for searchRow in searchRows:
    shapeObj, restaurantId = searchRow
    arcpy.SelectLayerByLocation_management("fl", "INTERSECT", shapeObj, "100 MILES")
    recordCount = int(arcpy.GetCount_management("fl").getOutput(0))
    print("There are " + str(recordCount) + " houses within 100 miles of restaurant #" + str(restaurantId))
    # using the selected features/records in "fl", you now have a hook to get their names, incomes, etc.
del searchRow, searchRows
Hi Dan, Chris,
The scenario I explained in my previous post was an example scenario.
I am using the following code to find the customers within a 200-mile radius of our distribution centers.
It took almost 2 hours to analyse 250 distribution center points against 0.4 million customer locations. Is there any way we can improve the performance?
import arcpy

DC_Points = r"C:\temp\TestGDB1.gdb\DistributionSites"
CL_Points = r"C:\temp\TestGDB1.gdb\CustomerLoaction"
fcSumPoint = r"C:\temp\TestGDB1.gdb\fcSumPoint"

arcpy.MakeFeatureLayer_management(CL_Points, "cl_points")

searchRows = arcpy.da.SearchCursor(DC_Points, ["SHAPE@", "site_id", "cov1val"])
print("search cursor")
fields = ("SHAPE@", "dc_Id", "cl_count", "acSum")
inCursor = arcpy.da.InsertCursor(fcSumPoint, fields)
print("insert cursor")
for searchRow in searchRows:
    shapeObj, siteid, acSum = searchRow
    arcpy.SelectLayerByLocation_management("cl_points", "INTERSECT", shapeObj, "20 MILES", "NEW_SELECTION")
    recordCount = int(arcpy.GetCount_management("cl_points").getOutput(0))
    sumValues = 0
    sumValues += acSum
    print(siteid, recordCount, sumValues)
    inCursor.insertRow([shapeObj, siteid, recordCount, sumValues])
    print("There are " + str(recordCount) + " points in #" + str(siteid))
    # using the selected features/records in "cl_points", you now have a hook to get their values, etc.
del searchRow, searchRows, inCursor
Not bad... unless you have to do it daily... If you have lots of memory (i.e. 16-32 GB) and ArcGIS Pro, why don't you benchmark it in that, since it should run without alteration in that environment. It would be an interesting comparison. You are already using the Python 3.4.x version of the print statement, which is good, and I can't see anything else that would cause a fault in the next Python either.
PS
print("some stuff {} plus more stuff {}".format(first_stuff,second_stuff))
takes care of type conversion; the { } brackets are filled sequentially. See Python's mini-formatting language for more details, or one of my blogs on formatting.
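Applied to the print statements in your script above, for example:

# format() converts the numbers for you - no str() calls needed
print("There are {} points in #{}".format(recordCount, siteid))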
I have to run the analysis daily and it is taking too long (for 95,000 points it took almost 9 hours). I will try with ArcGIS Pro.
One more question: is there any other way we can improve the performance?
As indicated previously, you will have to narrow your search from all the locations to just those near the point buffer you are looking at. You won't be able to do all-for-all at once within an ArcMap environment.
I am executing the script as a batch process. I have tried to minimise the candidates, but it is still a huge amount of data.
Then you will need to prune your data set based on some criteria. We still don't know exactly what you are doing and why you need every site within your buffer.
Each candidate in the buffer zone has its own statistical value. So I have to find all candidates in a given buffer, get each candidate's value, and perform an additional statistical calculation like a sum. It is a points-within-polygon analysis. The only problems I am facing are performance and the out-of-memory issue.
Then you need to limit the candidates to those whose values change.
So if your buffer size and locations remain the same, the problem can be greatly reduced to a smaller set by only querying for the 'changelings' after your benchmark conditions are determined.
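For example (a sketch only - LAST_EDITED is a hypothetical date field; you would use whatever marks a changed record in your data):

import arcpy

CL_Points = r"C:\temp\TestGDB1.gdb\CustomerLoaction"

# only re-process customers edited since the last benchmark run
where = "LAST_EDITED >= date '2016-01-01'"
changed = [row for row in arcpy.da.SearchCursor(CL_Points, ["OID@", "SHAPE@XY"], where_clause=where)]
print("{} changed customers to re-process".format(len(changed)))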
Yes, there is a way to improve performance... Like Dan mentioned, don't use ArcGIS (directly at least).
Here is another approach leveraging Python dictionaries. Note that this code only works for planar coordinates (UTM, State Plane, etc.), so if your data is in geographic coordinates you need to use another distance function (a haversine sketch follows)! BTW: this code assumes all fields are fully populated with valid values - if not, you'll have to handle those exceptions.
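For geographic data, something like this haversine function (a sketch, returning miles) could stand in for the planar getDist used below:

import math

def getDistGeo(lon1, lat1, lon2, lat2):
    # great-circle (haversine) distance in miles between two lon/lat points
    R = 3958.8  # mean Earth radius, miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2)**2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2)**2
    return 2 * R * math.asin(math.sqrt(a))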
Numpy has some nice ways to do all this stuff too (a sketch follows the code below) - I usually prefer to roll my own though.
import math
import arcpy

sitesFC = r"C:\temp\test.gdb\restaurants"   # paths to your sites and customers feature classes
custFC = r"C:\temp\test.gdb\customers"

def getDist(x1, y1, x2, y2):
    return math.sqrt((x1 - x2)**2 + (y1 - y2)**2)

sitesDict = {(r[0], r[1]): r[2:] for r in arcpy.da.SearchCursor(sitesFC, ["SHAPE@X", "SHAPE@Y", "RESTAURANT_ID"])}
customersDict = {(r[0], r[1]): r[2:] for r in arcpy.da.SearchCursor(custFC, ["SHAPE@X", "SHAPE@Y", "INCOME"])}

for x1, y1 in sitesDict:
    incomeList = []
    for x2, y2 in customersDict:
        if getDist(x1, y1, x2, y2) <= 1000:  # or whatever map unit threshold...
            income = customersDict[(x2, y2)][0]
            if income > 0:
                incomeList.append(income)
    incomeListSum = sum(incomeList)
    incomeListLen = len(incomeList)
    if incomeListLen:  # skip sites with no customers within the threshold
        print("Restaurant #" + str(sitesDict[(x1, y1)][0]) + " has " + str(incomeListLen) +
              " customers - average income is " + str(incomeListSum / float(incomeListLen)))
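For the NumPy route mentioned above, here is a minimal sketch under the same hypothetical paths. Note the full pairwise distance matrix must fit in memory, so for very large inputs you would chunk the outer sites loop:

import numpy as np
import arcpy

sitesFC = r"C:\temp\test.gdb\restaurants"   # hypothetical paths, as above
custFC = r"C:\temp\test.gdb\customers"

sites = arcpy.da.FeatureClassToNumPyArray(sitesFC, ["SHAPE@X", "SHAPE@Y", "RESTAURANT_ID"])
cust = arcpy.da.FeatureClassToNumPyArray(custFC, ["SHAPE@X", "SHAPE@Y", "INCOME"])

# pairwise planar distances via broadcasting: shape (n_sites, n_customers)
dx = sites["SHAPE@X"][:, None] - cust["SHAPE@X"][None, :]
dy = sites["SHAPE@Y"][:, None] - cust["SHAPE@Y"][None, :]
dist = np.hypot(dx, dy)

near = dist <= 1000  # same map-unit threshold as above
for i in range(len(sites)):
    incomes = cust["INCOME"][near[i] & (cust["INCOME"] > 0)]
    if incomes.size:
        print("Restaurant #{} has {} customers - average income is {}".format(
            sites["RESTAURANT_ID"][i], incomes.size, incomes.mean()))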