finding the centroid between two points

998
8
02-01-2013 10:48 AM
janehansen
New Contributor III
Hi, I have a very large data set that contains two sets of Lat/long.  One set of lat/long is for the start point and the other for the end point of a particular event. What I need to do is find the centroid of each start and end point for each row (event) found within the data set, there are over 26,000 rows found in each of the 6 sheet for this particular project.

Thanks,
Jane
Tags (2)
0 Kudos
8 Replies
BenjaminGale
New Contributor III
Sounds like you could do this without out python. You could create two new fields in your table for the center Lat and Long and use the field calculator to calculate the values.

If you'd rather use python you could do something like this...
input = "test_points"
fieldName_01 = "cenLat"
fieldName_02 = "cenLong"
expression_01 = "(!StartLat!+!EndLat!)/2"
expression_02 = "(!StartLong!+!EndLong!)/2"

#Add fields
arcpy.AddField_management(input,fieldName_01,"DOUBLE")
arcpy.AddField_management(input,fieldName_02,"DOUBLE")

#Calculate coordinate between the points
arcpy.CalculateField_management(input,"centLat",expression,"PYTHON")
arcpy.CalculateField_management(input,"cenLong",expression,"PYTHON")



These might be helpful if you decide on the python route...

Add Fields: http://resources.arcgis.com/en/help/main/10.1/index.html#//001700000047000000
Calculate Field: http://resources.arcgis.com/en/help/main/10.1/index.html#//00170000004m000000
0 Kudos
DanPatterson_Retired
MVP Emeritus
The centroid will be along a great circle and not simply the average of the latitudes and longitudes.  It may be more appropriate to use projected data and the sample code provided to accomplish this task
0 Kudos
janehansen
New Contributor III
The centroid will be along a great circle and not simply the average of the latitudes and longitudes.  It may be more appropriate to use projected data and the sample code provided to accomplish this task


Hi Dan, thanks for your imput, I have spent some time going calculating the centroids by the dissolve method.  I will loose time here, but need to be accurate so I will average the start and end lat and start and end long.
Thanks for your help,
Jn
0 Kudos
T__WayneWhitley
Frequent Contributor
Hello Jane, I am posting a Python snippet... very basic stuff, no error trapping, but it's so easy to modify to be made into a script tool, as you see fit.  I think it's just good at first to post 'sample code' at least to show the idea or technique, then improve it if necessary.

I'm not sure about the projection parameters -- very interesting topic though, and very important to get that right, please change those params as necessary to improve your results.

This simply reads Excel for data that is defined (named ranges only), creates an xy multipoint event using an array and multipoint class.  The 'trick' that may not be immediately apparent is where 5 lines need to be commented out if you don't want to 'see' the centroids as computed before projecting...it was easy to leave that in as related to Dan Patterson's comment about the error associated with finding the actual center.  This is much more complex in terms of coordinate systems and transformation methods than it may initially seem.  ...why the comparison was warranted.  You probably cannot determine anything conclusive with this data, but there are differences in the 2 datasets (look at the attached zip, the contained gdb - the script is also listed below for viewing purposes):

multipointPCS_1 (the projected GCS result, centroid computed before projecting)
multipointPCS_2 (the PCS result, centroid computed after event layer was projected)

Enjoy,
Wayne

import arcpy, os, sys
arcpy.env.overwriteOutput = True

scriptLoc = sys.argv[0]
rootpath = scriptLoc.rsplit(os.sep)[0]
arcpy.env.workspace = os.path.join(rootpath, 'avePairCoords.xlsx')
rootpath = os.path.join(rootpath, 'stage.gdb')

multipointGCS = os.path.join(rootpath, 'multipointGCS')
multipointPCS_1 = os.path.join(rootpath, 'multipointPCS_1')
multipointPCS_2 = os.path.join(rootpath, 'multipointPCS_2')

installDir = arcpy.GetInstallInfo()['InstallDir']
SR_GCS = os.path.join(installDir, r'Coordinate Systems\Geographic Coordinate Systems\World\WGS 1984.prj')
SR_PCS = os.path.join(installDir, r'Coordinate Systems\Projected Coordinate Systems\National Grids\Canada\NAD 1983 BC Environment Albers.prj')
transform = 'WGS_1984_(ITRF00)_To_NAD_1983'

arcpy.CreateFeatureclass_management(rootpath, 'multipointGCS', 'MULTIPOINT', os.path.join(rootpath, 'multipointGCS_template'), '', '', SR_GCS)
arcpy.CreateFeatureclass_management(rootpath, 'multipointPCS_2', 'MULTIPOINT', os.path.join(rootpath, 'multipointPCS_template'), '', '', SR_PCS)

pointArray = arcpy.Array()
pnt = arcpy.Point()

worksheets = arcpy.ListTables()

sheets = []
for sheet in worksheets:
 if '$' not in sheet: sheets.append(sheet)

rowsGCS = arcpy.InsertCursor(multipointGCS, SR_GCS)

for sheet in sheets:
        print 'processing: ' + sheet
        fields = arcpy.ListFields(sheet)
        rows = arcpy.SearchCursor(sheet)
        for row in rows:
                ID = int(row.getValue(fields[0].name))
                for i in range(2, len(fields), 3):
                        lat, lon = row.getValue(fields.name), row.getValue(fields[i + 1].name)
                        pnt.X, pnt.Y = lon, lat
                        pointArray.add(pnt)
                feat = rowsGCS.newRow()
                multiPoint = arcpy.Multipoint(pointArray)
                feat.shape = multiPoint
                rowsGCS.insertRow(feat)
                pointArray.removeAll()

                # Comment out the following 4 lines below to prevent adding the centroid to the GCS output.
                # These lines were added to observe and experiment with processing the centroid before projecting the event layer.
                trueCentroid = multiPoint.trueCentroid
                feat = rowsGCS.newRow()
                feat.shape = trueCentroid
                rowsGCS.insertRow(feat)
                
del row, rows, rowsGCS

arcpy.Project_management(multipointGCS, multipointPCS_1, SR_PCS, transform)

rowsPCS1 = arcpy.SearchCursor(multipointPCS_1)
rowsPCS2 = arcpy.InsertCursor(multipointPCS_2)

row = rowsPCS1.next()
while row:
        feat1 = row.shape
        for pnt in feat1:
                pointArray.add(pnt)
        multiPoint = arcpy.Multipoint(pointArray)
        trueCentroid = multiPoint.trueCentroid
        feat2 = rowsPCS2.newRow()
        feat2.shape = multiPoint
        rowsPCS2.insertRow(feat2)
        feat2 = rowsPCS2.newRow()
        feat2.shape = trueCentroid
        rowsPCS2.insertRow(feat2)

        pointArray.removeAll()
        row = rowsPCS1.next()

        # This single following line below will also need commenting out as well if not processing the centroid GCS intermediate output.
        # If the multipoint row does not exist, then there is no need to skip an extra row on the search cursor.
        row = rowsPCS1.next()

del row, rowsPCS1, rowsPCS2
print 'done.'
0 Kudos
janehansen
New Contributor III
Hello Jane, I am posting a Python snippet... very basic stuff, no error trapping, but it's so easy to modify to be made into a script tool, as you see fit.  I think it's just good at first to post 'sample code' at least to show the idea or technique, then improve it if necessary.

I'm not sure about the projection parameters -- very interesting topic though, and very important to get that right, please change those params as necessary to improve your results.

This simply reads Excel for data that is defined (named ranges only), creates an xy multipoint event using an array and multipoint class.  The 'trick' that may not be immediately apparent is where 5 lines need to be commented out if you don't want to 'see' the centroids as computed before projecting...it was easy to leave that in as related to Dan Patterson's comment about the error associated with finding the actual center.  This is much more complex in terms of coordinate systems and transformation methods than it may initially seem.  ...why the comparison was warranted.  You probably cannot determine anything conclusive with this data, but there are differences in the 2 datasets (look at the attached zip, the contained gdb - the script is also listed below for viewing purposes):

multipointPCS_1 (the projected GCS result, centroid computed before projecting)
multipointPCS_2 (the PCS result, centroid computed after event layer was projected)

Enjoy,
Wayne

import arcpy, os, sys
arcpy.env.overwriteOutput = True

scriptLoc = sys.argv[0]
rootpath = scriptLoc.rsplit(os.sep)[0]
arcpy.env.workspace = os.path.join(rootpath, 'avePairCoords.xlsx')
rootpath = os.path.join(rootpath, 'stage.gdb')

multipointGCS = os.path.join(rootpath, 'multipointGCS')
multipointPCS_1 = os.path.join(rootpath, 'multipointPCS_1')
multipointPCS_2 = os.path.join(rootpath, 'multipointPCS_2')

installDir = arcpy.GetInstallInfo()['InstallDir']
SR_GCS = os.path.join(installDir, r'Coordinate Systems\Geographic Coordinate Systems\World\WGS 1984.prj')
SR_PCS = os.path.join(installDir, r'Coordinate Systems\Projected Coordinate Systems\National Grids\Canada\NAD 1983 BC Environment Albers.prj')
transform = 'WGS_1984_(ITRF00)_To_NAD_1983'

arcpy.CreateFeatureclass_management(rootpath, 'multipointGCS', 'MULTIPOINT', os.path.join(rootpath, 'multipointGCS_template'), '', '', SR_GCS)
arcpy.CreateFeatureclass_management(rootpath, 'multipointPCS_2', 'MULTIPOINT', os.path.join(rootpath, 'multipointPCS_template'), '', '', SR_PCS)

pointArray = arcpy.Array()
pnt = arcpy.Point()

worksheets = arcpy.ListTables()

sheets = []
for sheet in worksheets:
 if '$' not in sheet: sheets.append(sheet)

rowsGCS = arcpy.InsertCursor(multipointGCS, SR_GCS)

for sheet in sheets:
        print 'processing: ' + sheet
        fields = arcpy.ListFields(sheet)
        rows = arcpy.SearchCursor(sheet)
        for row in rows:
                ID = int(row.getValue(fields[0].name))
                for i in range(2, len(fields), 3):
                        lat, lon = row.getValue(fields.name), row.getValue(fields[i + 1].name)
                        pnt.X, pnt.Y = lon, lat
                        pointArray.add(pnt)
                feat = rowsGCS.newRow()
                multiPoint = arcpy.Multipoint(pointArray)
                feat.shape = multiPoint
                rowsGCS.insertRow(feat)
                pointArray.removeAll()

                # Comment out the following 4 lines below to prevent adding the centroid to the GCS output.
                # These lines were added to observe and experiment with processing the centroid before projecting the event layer.
                trueCentroid = multiPoint.trueCentroid
                feat = rowsGCS.newRow()
                feat.shape = trueCentroid
                rowsGCS.insertRow(feat)
                
del row, rows, rowsGCS

arcpy.Project_management(multipointGCS, multipointPCS_1, SR_PCS, transform)

rowsPCS1 = arcpy.SearchCursor(multipointPCS_1)
rowsPCS2 = arcpy.InsertCursor(multipointPCS_2)

row = rowsPCS1.next()
while row:
        feat1 = row.shape
        for pnt in feat1:
                pointArray.add(pnt)
        multiPoint = arcpy.Multipoint(pointArray)
        trueCentroid = multiPoint.trueCentroid
        feat2 = rowsPCS2.newRow()
        feat2.shape = multiPoint
        rowsPCS2.insertRow(feat2)
        feat2 = rowsPCS2.newRow()
        feat2.shape = trueCentroid
        rowsPCS2.insertRow(feat2)

        pointArray.removeAll()
        row = rowsPCS1.next()

        # This single following line below will also need commenting out as well if not processing the centroid GCS intermediate output.
        # If the multipoint row does not exist, then there is no need to skip an extra row on the search cursor.
        row = rowsPCS1.next()

del row, rowsPCS1, rowsPCS2
print 'done.'

Thanks Wayne, I really appreciate your help.  I've done one sheet by adding two columns then using the field cal St_lat +End_lat/2 to find the centroids, but it is time consuming.. so thank you for the script, can't wait to try it.

Jane
0 Kudos
T__WayneWhitley
Frequent Contributor
Sure... just thinking, but if I get time, maybe I can show you how to correctly implement it as a script tool, with a few small modifications of course.
If you want, send an xlsx copy containing a couple of abbreviated worksheets you haven't done yet - zip and attach here if not too big.
I say 'abbreviated' because of the size limit you can successfully attach...it's quick and easy to maybe attach 1000 records or so each onto 2 sheets.  (2 sheets are better than 1 because then you'll see the result of looping on the sheets - I could partition it myself, but would rather you do it so that it's as 'original' as possible...I'll note anything I may have to 'fix' to make the run successful.)

You also have 10.0, so whatever I zip and attach you should be able to unzip and execute, simple as that.  I'll make the toolbox 'relative-pathed' to be able to find both the py script and the larger set of sample data you send, no matter where you place it.  You'll see how it runs and apply it to the rest of your xlsx data.

How does that sound?  I'm just curious about it because it could be extremely useful for training purposes - newer users are much more comfortable (and sometimes less suspicious of ArcGIS) when they can use their data in this more immediate way, data that they've worked on in Excel... so it is almost like ArcGIS is 'conforming' to use data in another environment in an interoperable way rather than requiring the complete conversion up front.  Still technically a conversion for the processing to complete, but I think it seems less intrusive to the user, at least.

Enjoy,
Wayne
0 Kudos
janehansen
New Contributor III
Sure... just thinking, but if I get time, maybe I can show you how to correctly implement it as a script tool, with a few small modifications of course.
If you want, send an xlsx copy containing a couple of abbreviated worksheets you haven't done yet - zip and attach here if not too big.
I say 'abbreviated' because of the size limit you can successfully attach...it's quick and easy to maybe attach 1000 records or so each onto 2 sheets.  (2 sheets are better than 1 because then you'll see the result of looping on the sheets - I could partition it myself, but would rather you do it so that it's as 'original' as possible...I'll note anything I may have to 'fix' to make the run successful.)

You also have 10.0, so whatever I zip and attach you should be able to unzip and execute, simple as that.  I'll make the toolbox 'relative-pathed' to be able to find both the py script and the larger set of sample data you send, no matter where you place it.  You'll see how it runs and apply it to the rest of your xlsx data.

How does that sound?  I'm just curious about it because it could be extremely useful for training purposes - newer users are much more comfortable (and sometimes less suspicious of ArcGIS) when they can use their data in this more immediate way, data that they've worked on in Excel... so it is almost like ArcGIS is 'conforming' to use data in another environment in an interoperable way rather than requiring the complete conversion up front.  Still technically a conversion for the processing to complete, but I think it seems less intrusive to the user, at least.

Enjoy,
Wayne

Hi Wayne, thanks for your continued interest.  Here is an excel with the sheets containg some sample data. I look forward to hearing back from you on this.

Cheers,
0 Kudos
T__WayneWhitley
Frequent Contributor
Jane, the pointCentroidTest.zip attachment (post #5 above) contains the following:

- the script, pointCentroidTest.py.
- the file geodatabase, stage.gdb.
- the test Excel spreadsheet, avePairCoords.xlsx.

When you download and unzip the contents, the script is designed to access the xlsx as input and the gdb as output, but they have to remain 'side-by-side' or in the same root folder.  This can very well be made into a script tool, but I didn't bother - the more important task at hand is whether this can handle your input and create centroids in your NAD 1983 BC Environment Albers projection.

To 'gear' this to the sample you provided, I downloaded and copied the Wayne_exel.xls to the same folder as the script and gdb (and, of course, the original test xlsx).  Then I had to change the input file the script accesses.  The only line in the script that needs changing in order to run on the new sample should be where the workspace is set:

arcpy.env.workspace = os.path.join(rootpath, 'Wayne_exel.xls')

However that will not run - Why?  Because the sample you provided is not in the anticipated format we already established -- this script will run on records of the following explicit field formatting, where 'n' represents the nth set of lat/lon coordinates of the same 'station' collected on some corresponding n date:

ID, DATE1, LAT1, LON1, DATE2, LAT2, LON2, ... , DATEn, LATn, LONn

You were missing the header information and initial 1st 2 fields - and Sheet1 made no sense to me.  So what I actually did was enter a 'dummy' header and made sure the necessary fields existed for Sheets 2 and 3 (and deleted Sheet 1).  Also I defined names for ranges on each worksheet and saved to a new file, Wayne_exel.xlsx to use in the script.  It ran without error - I am attaching a small PDF outlining the procedure.

I am also attaching a revised script to 'copy' the centroids only to a new fc (using the same gdb as before).  ...think I'll also try to attach your revised spreadsheet, which I saved to xlsx...the command line in the script reflects the revised input:

arcpy.env.workspace = os.path.join(rootpath, 'Wayne_exel.xlsx')

In order to use this, copy both provided xlsx and script to the same root location the previous attachment was unzipped - that contained gdb will be required.  More info provided in the attached PDF.

Enjoy,
Wayne
0 Kudos