Loop through features and create euclidean distance raster per feature

KarenHornigold · ‎08-07-2013

Hi,

I am completely new to Python, but need to turn to it due to the nature of my research.

I have a point shapefile with 45,000 points ('visits') spread across the UK and I want to obtain a euclidean distance raster (with a cut off of say 100km).

I know the tool I need to use to create the distance raster is Euclidean Distance, but I need to iteratively select each point (by 'visitID') from the shapefile and create a distance raster for each one in turn.

Can anyone direct me to useful resources or share their scripting wisdom?

Thank you in advance!

Anonymous User · ‎08-08-2013

Well that is interesting....I don't know why you are getting that error. That means it does not know the format of the output raster. I deliberately did not add a file extension to the name of the raster so it would be the Esri GRID format. I usually just use GRID or tif rasters.

Maybe try and save at as a .tif?

eucDist.save('dist_%s.tif' %i)

Curtis Price is the raster master...perhaps he can weigh in on this?? He may even have a better methodology for this whole process.

KarenHornigold · ‎08-08-2013

Now it creates an output but the raster values are strange (they are not the same as when I run the script with hard-coded paths) and it also doesn't save to the location I am specifying, so I'm still doing something wrong! I may give up on creating a script tool with this.

More important than this is the processing time issue. Is there a way I could batch process subsets of my points file and run them simultanously outside of ArcGIS with a Python programme?

KarenHornigold · ‎08-08-2013

One last idea,

how should I change this line

query = '{0} = {1}'.format(arcpy.AddFieldDelimiters(points,'VISITID'),i)

to change VISITID to the field selected via the tool dialogue box?

Anonymous User · ‎08-08-2013

More important than this is the processing time issue. Is there a way I could batch process subsets of my points file and run them simultanously outside of ArcGIS with a Python programme?

I'm not sure if this will be any faster, but since you have ArcGIS 10.1 you can download Service Pack 1 (if you have not done so already) and then download the 64" rel="nofollow" target="_blank">http://resources.arcgis.com/en/help/main/10.1/index.html#/Backgro... bit background processing. This will allow you to utilize the 64 bit python as a stand alone script.

The 64 bit processing does not necessarily increase speed, but it does take advantage of more processing power.

As for why your parameters are getting switched around, maybe you did not set up the script tool correctly? Based on your post reporting the errors, it looks like it the params were set up correctly but I am not sure as I cannot see your actual tool. The script looks fine to me though (after the cursor change).

I would recommend downloading the 64 bit background processing and run it as a stand alone script. ( The 64 bit BP can be found in your customer support page under downloads) This will take a very long time to run either way.

One last idea,

how should I change this line

Code:

query = '{0} = {1}'.format(arcpy.AddFieldDelimiters(points,'VISITID'),i)

to change VISITID to the field selected via the tool dialogue box?

You had it correct in your last post to be interpreted from the tool dialog (get param as text). So if you are using the "field" variable, just leave it as you had it:

query = '{0} = {1}'.format(arcpy.AddFieldDelimiters(points,field),i)

as a stand alone script you can just use 'VISITID' or make a field variable for it. Your choice.

By the way...You mentioned you wanted to split the data set up into smaller chunks? You could use this script to split it into groups of 1000 (or however many) and run the first code on one or 2 sets of data a day. This shows how you can split your points into 1000's:

import arcpy, itertools

arcpy.env.workspace = r'C:\TEMP\Points_divided'
arcpy.env.overwriteOutput = True
points = r'C:\TEMP\Address_points.shp'

oid = arcpy.AddFieldDelimiters(points, arcpy.Describe(points).oidFieldName)
ids = list(r[0] for r in arcpy.da.SearchCursor(points,['OID@']))

sql_dict = {}
num = 1000  #splits by every 1000
mini = 0
for key, group in itertools.groupby(ids, lambda k: k//num):  
    maxi = int(max(group))
    sql_dict[key] = [mini, maxi]
    mini += num


# Make feature layer
lyr = arcpy.MakeFeatureLayer_management(points, 'points_lyr')
for v in sql_dict.values():
    query = '{0} >= {1} AND {2} <= {3}'.format(oid, v[0], oid, v[1])
    print query
    arcpy.SelectLayerByAttribute_management(lyr, 'NEW_SELECTION', query)
    arcpy.CopyFeatures_management(lyr, 'points_%s.shp' %int(v[1] +1))
    print 'Created points_%s.shp' %int(v[1] + 1)
    
print 'done'

Anonymous User · ‎08-08-2013

I was rereading this post, and I saw that after you ran the first code I posted on 10 points, you said it took ten minutes...I thought that was extremely slow for how small each raster is. So I did a quick test on 36 points and it took way too long as well...Then I saw the size of each raster (600 MB). So from there I knew something was definitely wrong since these rasters are "supposed" to be small. I figured out that each raster is actually very large due to each raster filling in "NODATA" values for the entire extent of the points feature class.....so in your case, the extent is probably massive so there are a bunch of extra "NODATA" cells being created.

So a workaround:

I added a step to the original code to create a buffer (120 m) around each selected point...I then grab the extent of that buffer and set the "extent" environment to the extent of that buffer so now instead of being extremely large rasters they are now 24x24 and only 2.25 KB. This works much faster now. It will still take a very long time but it now does all 36 of my points in ~3 as opposed to ~ 12 minutes. The best performance gain here is it will take up a lot less space on disk...So try this on say 100 points and see how long it takes. If it it does not take too long, I would say try it on your entire data set and *hopefully* it will finish in a couple days.

import arcpy
from datetime import datetime as d
startTime = d.now()
arcpy.CheckOutExtension('Spatial')
from arcpy.sa import *

# set output workspace
arcpy.env.workspace = r'C:\TEMP\Output_Rasters'
arcpy.env.overwriteOutput = True
arcpy.env.nodata = 'NONE'
arcpy.env.pyramid = 'PYRAMIDS'

# your points
points = r'C:\TEMP\Stream_points.shp'

# clean up
for raster in arcpy.ListRasters('dist_*'):
    arcpy.Delete_management(raster)

# oid field and list of id's 
field = 'VISITID'
ids = list(r[0] for r in arcpy.da.SearchCursor(points,[field]))

# loop through table to make a distance raster for each point
# EucDIstance(points, max_distance, cell_size)  <-- parameters
lyr = arcpy.MakeFeatureLayer_management(points, 'points_lyr')
for i in ids:
    query = '{0} = {1}'.format(arcpy.AddFieldDelimiters(points,field),i)
    arcpy.SelectLayerByAttribute_management(lyr, 'NEW_SELECTION', query)
    temp_buff = 'buff_%s.shp' %i
    arcpy.Buffer_analysis(lyr, temp_buff, '120 METERS')
    extent = arcpy.Describe(temp_buff).extent
    arcpy.env.extent = extent
    eucDist = EucDistance(lyr, 100, 10)
    eucDist.save('dist_%s' %i)
    arcpy.Delete_management(temp_buff)
    print 'Created Euclidean Distance raster for point ID: %s' %i
print 'Done'
print '(Elapsed time: %s)' %(str(d.now() - startTime)[:-3])

You can leave out the "clean up" part if you want. This is still pretty slow, but should be at least 30x faster than before.

I did just spot some interesting behavior...About 1/4 of the rasters are still coming out with the large extent (of the whole point fc)....There could be something going on behind the scenes here or a bug...They should ALL come out the extent of the buffer...I think I will contact tech support and see if they can figure out the issue here. In the meantime, you could try the above code on a decent amount of points (i.e. 100) and see how long it takes. I added a statement to test the time.

KarenHornigold · ‎08-09-2013

Wow, this is amazing thank you! You're right, I have unnecessary no data cells for my whole extent. Thanks for your time and effort, I don't know what I would have done without your help!

Interestingly, yesterday I reran the original script on data from one county (560 points) and it was racing through the script- it took only 30 minutes! The things I changed were - 1) output raster format to .img and 2) scratch workspace from default geodatabase(! - don't know why that was the default) to c:\temp. It was really quick even though the extent was still for the whole country.

Nevertheless I would still like to use your buffer version to reduce file size, but I got the following error:

Runtime error 
Traceback (most recent call last):
  File "<string>", line 32, in <module>
AttributeError: DescribeData: Method extent does not exist

Thanks again

Anonymous User · ‎08-09-2013

Karen,

Scratch that, I have found a much improved and lightning fast way to do this (well lightning fast compared to the other way....)

Ok, this was driving me crazy so I spent several hours trying to figure out the most efficient way to do this...I think I finally found something that can do this as quickly as possible.

So, I tried to resolve this issue by creating a buffer around each point that is a little larger than your desired 100 unit neighborhood (I used 120). I then set the output extent environment to that of the buffer...This sped it up a little bit, but the euclidean distance still took a long time to run (since each raster was only 24x24 rows by columns). This got me thinking I could try making ascii files for each one. So I exported 5 or 6 of those rasters to ascii to look at each one...I found that every single ascii file was exactly the same except for the 3rd and 4th line (xllcorner, yllcorner). These coordinates are the x,y coordinates for the lower left corner for each raster. This matched the lower left corner for the extent of each buffer. This was good news...

So knowing that every ascii text file was exactly the same except for each point except for those 2 lines (because they all have the same distance neighborhood), I knew that I could take one of the ascii files and the accompanying prj file and just rename them and change the lower left corner coordinates.

So the script no just creates a buffer for each point and grabs the extent for that buffer. It then fixes the xllcorner/yllcorner coordinates (i.e. just moves a copy of that ascii grid to that lower left corner) and makes an ascii raster for each one....This part is very fast...The slow part will just be converting each ascii file to raster...but this is still about 50x faster than using the euclidean distance tool.

So how it works....It creates one euclidean distance raster for the first point as a template, then converts that raster to ascii and uses that ascii file to build all the others by just adjusting the lower left corner. Anyways it seems to work great...

I did 120 points in ~1 minute, and 1000 points in little over an hour...So this may work for all 45k of your points in a ~50 hours (seems to be a snowball effect on memory, still looking into that). If you have an i7 processor and 16GB of RAM you may have better luck than me; I have i5, 8GB. I'd say try it on 1000 points first and see how that goes. I don't know if you saw, but in post # 14 I added a script to divide the points by every 1000 (or however many you want). It may be better to run this in smaller chunks anyways to not kill your memory.

Here is the results window from running it as a script tool (with a lot of print statements removed of course)...The print statements can be taken out, they will only slow it down for a lot of points. I just left those in there from when I was testing.

Executing: DistanceFromEachPoint C:\TEMP\Points_divided\points_1000.shp C:\TEMP\Points_divided\rasters 100 FEET 10
Start Time: Fri Aug 09 09:36:16 2013
Running script DistanceFromEachPoint...
count: 1000 points...
buff_dist: 120.0
created euclidean distance template
Created ascii templates...
processed point ID: 0
processed point ID: 1
processed point ID: 2
processed point ID: 3
....
....
processed point ID: 998
processed point ID: 999
converting ascii files to raster...this may take a while
Complete (Elapsed time: 1:05:26.909)
Completed script DistanceFromEachPoint...
Succeeded at Fri Aug 09 10:41:43 2013 (Elapsed Time: 1 hours 5 minutes 26 seconds)

Here is the script, if you run by stand alone just change the parts highlighted in red...One note, I have this set up to work on the 'OID' field. You can change this to your 'VISITID' field. I just used OID as a default. Otherwise I have also attached a [bold] version 10.1 Script Tool[/bold]

'''
Euclidean distance for individual points-  This tool was written for cases where
  the user wants Euclidean distance rasters for each point in a feature class. This
  is useful when the neighborhoods will overlap.

Written by:  Caleb Mackey
Date:  8/8/2013
'''

import arcpy, os, shutil, sys
arcpy.CheckOutExtension('Spatial')
from arcpy.sa import *
from datetime import datetime as d
startTime = d.now()

try:
    odir = os.path.dirname(__file__)
except:
    odir = os.getcwd()

def Message(msg):
    print str(msg)
    arcpy.AddMessage(str(msg))
    
def bufferPointExtent(point_lyr, field, value, distance, units):
    query = '''{0} = {1}'''.format(field, value)
    arcpy.SelectLayerByAttribute_management(point_lyr, 'NEW_SELECTION', query)
    temp_buff = r'in_memory\buff'
    arcpy.Buffer_analysis(point_lyr, temp_buff, '{0} {1}'.format(distance, units))
    ext = arcpy.Describe(temp_buff).extent
    arcpy.Delete_management(temp_buff)
    return ext
    

def DistanceFromEachPoint(points, outws, distance, units, cellsize, *args):
    '''
                Params
    points:     points feature class
    outws:      output workspace for euclidean distance rasters
    distance:   distance for euclidean distance tool (units are same as points) i.e. 100
    units:      linear units (used for buffer) i.e. Feet or Meters

    '''
    # set output workspace
    arcpy.env.workspace = outws 
    arcpy.env.overwriteOutput = True
    arcpy.env.nodata = 'NONE'
    arcpy.env.pyramid = 'PYRAMIDS'
    SR = arcpy.Describe(points).spatialReference
    arcpy.env.outputCoordinateSystem = SR
    ws = os.path.join(odir,'ascii')
    if not os.path.exists(ws):
        os.makedirs(ws)

    # oid field and list of id's 
    oid = arcpy.AddFieldDelimiters(points,arcpy.Describe(points).OIDFieldName)
    ids = list(r[0] for r in arcpy.da.SearchCursor(points, ['OID@']))

    # Make feature and count points
    lyr = arcpy.MakeFeatureLayer_management(points, 'points_lyr')
    count = int(arcpy.arcpy.GetCount_management(lyr).getOutput(0))
    Message('count: %s points...' %count)

    # generate template ascii file/prj file
    buff_dist = float(distance) + (float(cellsize)*2)
    buff_ext = bufferPointExtent(lyr, oid, min(ids), buff_dist, units)
    arcpy.env.extent = buff_ext
    query = '''{0} = {1}'''.format(oid, min(ids))
    arcpy.SelectLayerByAttribute_management(lyr, 'NEW_SELECTION', query)
    sample = EucDistance(lyr, distance, cellsize)
    eucDist = os.path.join(odir, 'eucDist_tmplt')
    if arcpy.Exists(eucDist): arcpy.Delete_management(eucDist)
    sample.save(eucDist)
    Message('created euclidean distance template')
    arcpy.env.extent = None # if this is NOT set to none, will cause no selected features

    # templates
    ascii = os.path.join(ws, 'ascii_template.txt')
    arcpy.RasterToASCII_conversion(eucDist, ascii)
    arcpy.Delete_management(eucDist)
    prj = ascii[:-4] + '.prj'
    Message('Created ascii templates...')

    # Get ascii lines
    lines = open(ascii,'r').readlines()

    # Create ascii raster for each point 
    for i in ids:
        ext = bufferPointExtent(lyr, oid, i, distance, units)
        if ext: 
            lines[2] = 'xllcorner     %s\n' %str(float(str(ext.lowerLeft).split()[0]) - (float(cellsize)*2))
            lines[3] = 'yllcorner     %s\n' %str(float(str(ext.lowerLeft).split()[1]) - (float(cellsize)*2))
            copy = os.path.join(ws, 'tmp_%s.txt' %i)
            prj_copy = shutil.copy(prj, copy[:-4] + '.prj')
            with open(copy, 'w') as w:
                w.writelines(lines)
            Message('processed point ID: %s' %i)

    # ascii to raster
    Message('converting ascii files to raster...this may take a while')
    arcpy.env.workspace = ws
    for txt in arcpy.ListFiles('tmp*.txt'):
        number = int(''.join(n for n in txt if n.isdigit()))
        out = os.path.join(outws, 'dist_%s' %number)
        if arcpy.Exists(out): arcpy.Delete_management(out)
        arcpy.ASCIIToRaster_conversion(txt, out)
        arcpy.DefineProjection_management(out, SR)
        arcpy.Delete_management(txt)
        arcpy.Delete_management(txt[:-4] + '.prj')
        
    # delete templates and ascii folder
    for txt in arcpy.ListFiles('ascii_*'):
        arcpy.Delete_management(txt)
    os.rmdir(ws)
    Message('Complete (Elapsed time: %s)' %(str(d.now() - startTime)[:-3]))

if __name__ == '__main__':

    try:
        # always try to run as script tool first, otherwise, exception will allow stand alone params
        # Get params
        argv = tuple(arcpy.GetParameterAsText(i) for i in range(arcpy.GetArgumentCount()))
        DistanceFromEachPoint(*argv)

    except:
        arcpy.AddMessage('Script Tool failed')
        arcpy.AddError(arcpy.GetMessages(2))
        # stand alone
        points = r'C:\TEMP\Stream_points.shp'
        work = r'C:\TEMP\Output_Rasters\ascii_test\Rasters'

        # run it
        DistanceFromEachPoint(points, work, 100, 'Feet',  10)

One other option...If you don't mind them being in ascii format, it would be done in less than 2 hours probably. I am not sure if ascii rasters are valid inputs for processing tools though. The conversion of these to raster is what slows the tool down; but is still faster than running euclidean distance for each one.

One more note...The lines highlighted in blue may need to be changed to + instead of - depending on the units...I had to subtract 2 cell sizes to make sure it was centered. Anyways, this worked perfectly for me, and much faster! Enjoy!

KarenHornigold · ‎08-09-2013

Wow, this is incredible! Thank you so much! I will cite your tool in any papers I submit.

I actually only need the ascii file as I can read this into R where I am doing my analysis. Currently I am reading in the rasters so I can just alter this line of the R script for the different file type! So do I just remove this part of the code:

   # ascii to raster
    Message('converting ascii files to raster...this may take a while')
    arcpy.env.workspace = ws
    for txt in arcpy.ListFiles('tmp*.txt'):
        number = int(''.join(n for n in txt if n.isdigit()))
        out = os.path.join(outws, 'dist_%s' %number)
        if arcpy.Exists(out): arcpy.Delete_management(out)
        arcpy.ASCIIToRaster_conversion(txt, out)
        arcpy.DefineProjection_management(out, SR)
        arcpy.Delete_management(txt)
        arcpy.Delete_management(txt[:-4] + '.prj')
        
    # delete templates and ascii folder
    for txt in arcpy.ListFiles('ascii_*'):
        arcpy.Delete_management(txt)
    os.rmdir(ws)

Anonymous User · ‎08-09-2013

I actually only need the ascii file as I can read this into R where I am doing my analysis.

WOW! That is good news...Well you can eliminate the last part then! The way I wrote that is to have it delete the ascii files after the conversion...I have eliminated the convert ascii to raster so you will just have ascii files. This will probably do all 45k points in a few short hours.

I have just attached a new toolbox, this one has 2 tools...The first one is the tool that converts each ascii to raster...You do not want to use this one. Use the (ascii) one. This will just write out ascii files and not convert to raster...These ascii files can still be used in ArcMap...If you pull them in they will be projected properly.

So if you use the (ascii) tool, it will hopefully be done before you leave the office today on all 45K points! I usually do not spend this much time working on a problem I find on the forums, but this one really interested me. I think this tool is useful. Let me know how it works for you!

One more note....Be sure to pull a few of the ascii rasters into ArcMap overlaid with your points to make sure they are centered...You also may want to use 110 meters for the distance parameter to be safe. Enjoy!

KarenHornigold · ‎08-09-2013

One more note...The lines highlighted in blue may need to be changed to + instead of - depending on the units...I had to subtract 2 cell sizes to make sure it was centered.

Could you just explain this a bit more please? Why did 2 cell sizes need to be subtracted and what will affect this? My units will always be meters. I'd like to be sure I'm using the script correctly, and be able to check it has worked. Thanks!