Select to view content in your preferred language

How to make this extraction process faster

754
6
03-08-2012 02:07 PM
zhengniu1
Deactivated User
I have a point shapefile wihch has 527k data points.
1st step: I am trying to create a 4 meters buffer at each points to find its neighbours and corresponding information (such as X, Y, direction) within the buffer.
2nd step: To identify the points which fall in the overlapping area of two or more buffer, and assign them to their real buffer by comparing distances to each buffer's center
3rd step:  Using all points (including the focal point) within the buffer to calculate average position (avg_X and avg_Y) and avg_Direction.
4th step: change the buffer point's x, y and direction to the avg_X, avg_Y, and avg_Direction.

Following code is for the 1st step


import arcpy, math
arcpy.env.workspace = r'C:\TempArcGISCalculate\TempFile\TestforReadingFile'
arcpy.env.overwriteOutput=True
ptfc = 'Section01_SplitedTrips.shp'
arcpy.MakeFeatureLayer_management(ptfc,'section01pts')
arcpy.Buffer_analysis(ptfc,'in_memory/Section01Buffer','4 meters')

arcpy.MakeFeatureLayer_management('in_memory/Section01Buffer','S1buffer')
#Cut out points inside each buffer
rows = arcpy.SearchCursor('in_memory/Section01Buffer')
row = rows.next()
count = 1

try:
    while row:
        selPoly =arcpy.SelectLayerByAttribute_management('S1buffer',"NEW_SELECTION","\"FID\"="+str(row.FID))

        selPts = arcpy.SelectLayerByLocation_management('section01pts',"WITHIN",selPoly, 0, "NEW_SELECTION")

        curs = arcpy.SearchCursor(selPts)
        for cur in curs:
            print count, cur.getValue('LON_X')," ", cur.getValue('LAT_Y')," ", cur.getValue('InitialBea')," ", cur.getValue('Trace')," ", row.getValue('LON_X'), " ", row.getValue('LAT_Y'), " ", row.getValue('Trace')

        del cur,curs
        row =rows.next()
        count = count + 1

   del row, rows    
except Exception, e:
    #If an error occurred, print line number and error message
    import traceback, sys
    tb = sys.exc_info()[2]
    print "line %i" % tb.tb_lineno
    print e.message        


    

Q1: Is it possible to make this process faster? it is ok for dataset which has row less than 1000, but takes longer as more date involved
Q2: When I add " print arcpy.AddMessage("Elapsed time: " + str(time.clock() - beginTime))" before "del row, rows", It always tells below. This command used to be ok for my previous script.
Traceback (most recent call last):
  File "C:\Python26\ArcGIS10.0\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 312, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\TempArcGISCalculate\PythonScript\SelectbyLocation.py", line 31, in <module>
    arcpy.AddMessage("Elapsed time: " + str(time.clock() - beginTime))
NameError: name 'time' is not defined
Tags (2)
0 Kudos
6 Replies
ChrisSnyder
Honored Contributor
Anything is possible...

Can you give a narative description of what this code is doing? Looks like it is attempting to define some sort of 'spatial association' - like identifying the points within the buffer of a polygon.

Maybe a conventional overlay tool would better serve this purpose... For example, Intersect (Intersect tool) your polygons buffers with your points and cursor through that output.
0 Kudos
ChrisSnyder
Honored Contributor
And you need to import the time module to get time.clock() to work

import time
0 Kudos
zhengniu1
Deactivated User
Anything is possible...

Can you give a narative description of what this code is doing? Looks like it is attempting to define some sort of 'spatial association' - like identifying the points within the buffer of a polygon.

Maybe a conventional overlay tool would better serve this purpose... For example, Intersect (Intersect tool) your polygons buffers with your points and cursor through that output.



Sorry for making you confuse. I restate my problem again.
0 Kudos
ChrisSnyder
Honored Contributor
Thanks for the narative.

So I think you can:

1. Buffer your points
2. Run an Identity (or Intersect) tool on your points and buffer polygon - This will "tag" each point with what point buffer it is within - be carefull because you will end up with duplicate points where the buffer polygons overlap!!!
3. In a cursor (maybe helpfull to use a Python dictionary to sort out some of the realtionships) - use the Intersect/Identity tabular information (x/y/point_id/buffer_id) to analyze the realtionships.

Run a small sample area of your point and point buffers through the Identity or Intersect tools to see what can be done with the output table info.

A recursive SelectByLocation will be extreemly slow compared with the convetional Intersect or Identity tools.
0 Kudos
zhengniu1
Deactivated User
Thanks for the narative. 

So I think you can: 

1. Buffer your points 
2. Run an Identity (or Intersect) tool on your points and buffer polygon - This will "tag" each point with what point buffer it is within - be carefull because you will end up with duplicate points where the buffer polygons overlap!!!  
3. In a cursor (maybe helpfull to use a Python dictionary to sort out some of the realtionships) - use the Intersect/Identity tabular information (x/y/point_id/buffer_id) to analyze the realtionships. 

Run a small sample area of your point and point buffers through the Identity or Intersect tools to see what can be done with the output table info. 

A recursive SelectByLocation will be extreemly slow compared with the convetional Intersect or Identity tools.


Here is my code using Intersect_analysis instead of SelectLayByLocation
import arcpy, time
arcpy.env.workspace = r'C:\TempArcGISCalculate\TempFile\TestforReadingFile'
arcpy.env.overwriteOutput=True
ptfc = 'Section01_SplitedTrips.shp'
t0=time.clock() 
arcpy.Buffer_analysis(ptfc,'in_memory/Section01Buffer','4 meters')


arcpy.MakeFeatureLayer_management('in_memory/Section01Buffer','S1buffer')

ptCurs=arcpy.SearchCursor('S1buffer')
ptcur = ptCurs.next()
count = 1
while ptcur:
    selPoly=arcpy.SelectLayerByAttribute_management('S1buffer',"NEW_SELECTION","\"FID\"="+str(ptcur.FID))
    themeList=['S1buffer',ptfc]
    arcpy.Intersect_analysis(themeList,'in_memory/points_within_buffers')
    bufPts = 'in_memory/points_within_buffers'
    PtCount = arcpy.GetCount_management(bufPts)
    curs = arcpy.SearchCursor(bufPts)
    print PtCount
    for cur in curs:
        print count, cur.getValue('LON_X')," ", cur.getValue('LAT_Y')

    del cur,curs
    ptcur = ptCurs.next()
    count = count + 1
    if count>10:
        break

del ptcur, ptCurs
print "Elapsed time: " , time.clock() - t0


I use 10 sample data points for both script. the results are same. However, the Intersect takes 39.5106703156 secs; SelectLayByLocation takes 9.16266229449 sec

Could you please have a look at the Intersect one, please?
0 Kudos
ChrisSnyder
Honored Contributor
No need to loop - just this:

import arcpy, time
arcpy.env.workspace = r'C:\TempArcGISCalculate\TempFile\TestforReadingFile'
arcpy.env.overwriteOutput=True
ptfc = 'Section01_SplitedTrips.shp'
t0=time.clock()
bufferFC = 'in_memory/Section01Buffer'
arcpy.Buffer_analysis(ptfc, bufferFC,'4 meters')
identityFC = r"in_memory\identity"
arcpy.Identity_analysis(ptfc, bufferFC, identityFC, "ALL", "", "")


Then I think all the info you should need is it the attribute table of  identityFC
Maybe add/populate an X andY field. Beware that you will now have poiuts that will overlap n times in areas that have n overlapping buffer polygons.
Then use a searchcursor and a python dictionary. You can key the dictionary by (x,y) coordinate pairs. Might have to build a few different dictionaries to keep track of everything.
0 Kudos