Spatial correlation of two data sets

01-07-2018 08:11 AM
FrankHanssen
New Contributor III

I have one modelled power-line path that I want to compare statistically with an existing power-line path. What I would like to measure is how well the model output correlates spatially with reality, using no variables other than the XY coordinates.

Best,

Frank

Accepted Solutions
XanderBakker
Esri Esteemed Contributor

I just wrote some code to see what can be done. 

The code uses two feature classes containing the two lines (it extracts the first feature of each and uses it as the polyline) and creates two output feature classes. The step_size variable defines the interval at which points are extracted along the line:

def main():
    import arcpy
    arcpy.env.overwriteOutput = True

    fc_gis = r'C:\GeoNet\compareLines\data.gdb\GIS_route'
    fc_real = r'C:\GeoNet\compareLines\data.gdb\Real_route'
    fc_out1 = r'C:\GeoNet\compareLines\data.gdb\dist_gis2real'
    fc_out2 = r'C:\GeoNet\compareLines\data.gdb\dist_real2gis'

    step_size = 100

    # get lines: read the geometry of the first feature in each feature class
    # (next(cursor) works in Python 2 and 3; cursor.next() is Python 2 only)
    with arcpy.da.SearchCursor(fc_real, ["SHAPE@"]) as cursor:
        line_base = next(cursor)[0]
    with arcpy.da.SearchCursor(fc_gis, ["SHAPE@"]) as cursor:
        line_compare = next(cursor)[0]
    sr = arcpy.Describe(fc_gis).spatialReference

    # sample a point every step_size along the real route and connect it
    # to the nearest point on the GIS route
    steps = int(line_base.length // step_size)
    lst_feats = []
    for i in range(steps):
        d = i * step_size
        pnt = line_base.positionAlongLine(d, False)
        # queryPointAndDistance returns a tuple; item 0 is the nearest point
        pnt_near = line_compare.queryPointAndDistance(pnt, False)[0]
        polyline = arcpy.Polyline(arcpy.Array([pnt.firstPoint, pnt_near.firstPoint]), sr)
        lst_feats.append(polyline)

    arcpy.CopyFeatures_management(lst_feats, fc_out1)

    # repeat in the opposite direction: sample along the GIS route and
    # connect each point to the nearest point on the real route
    steps = int(line_compare.length // step_size)
    lst_feats = []
    for i in range(steps):
        d = i * step_size
        pnt = line_compare.positionAlongLine(d, False)
        pnt_near = line_base.queryPointAndDistance(pnt, False)[0]
        polyline = arcpy.Polyline(arcpy.Array([pnt.firstPoint, pnt_near.firstPoint]), sr)
        lst_feats.append(polyline)

    arcpy.CopyFeatures_management(lst_feats, fc_out2)


if __name__ == '__main__':
    main()

It creates two new feature classes by drawing a line from each point sampled on one line (one point every step_size) to the nearest point on the other line. In the resulting map, short distances are symbolized in green and long distances in red.
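For anyone who wants to follow the logic without arcpy, the same idea can be sketched in plain Python. This is only an illustration under assumptions: each line is a list of (x, y) tuples in a projected coordinate system, and the helper names (point_along, nearest_point_on_line, sample_distances) are hypothetical and not part of arcpy.

```python
import math


def line_length(line):
    """Total length of a polyline given as a list of (x, y) tuples."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(line, line[1:]))


def point_along(line, d):
    """Return the point at distance d from the start of the polyline."""
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        seg = math.hypot(x2 - x1, y2 - y1)
        if seg > 0 and d <= seg:
            t = d / seg
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        d -= seg
    return line[-1]  # d is beyond the end: clamp to the last vertex


def nearest_point_on_line(line, p):
    """Return the point on the polyline that is closest to p."""
    px, py = p
    best, best_dist = None, float("inf")
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg2 = dx * dx + dy * dy
        if seg2 == 0:
            t = 0.0
        else:
            # position of the perpendicular foot, clamped to the segment
            t = ((px - x1) * dx + (py - y1) * dy) / seg2
            t = max(0.0, min(1.0, t))
        q = (x1 + t * dx, y1 + t * dy)
        dist = math.hypot(px - q[0], py - q[1])
        if dist < best_dist:
            best, best_dist = q, dist
    return best


def sample_distances(base, compare, step_size):
    """Distance from points sampled every step_size on base to compare."""
    steps = int(line_length(base) // step_size)
    distances = []
    for i in range(steps):
        p = point_along(base, i * step_size)
        q = nearest_point_on_line(compare, p)
        distances.append(math.hypot(p[0] - q[0], p[1] - q[1]))
    return distances


# made-up coordinates, for illustration only
real_route = [(0.0, 0.0), (10.0, 0.0)]
model_route = [(0.0, 3.0), (10.0, 3.0)]
print(sample_distances(real_route, model_route, 2.0))
```

Running it in both directions (real to model, and model to real) matters because the nearest-point distance is not symmetric: a detour on one line shows up only when the points are sampled on that line.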

If you look at the attribute table of one of the results, you can run some statistics on the length of the segments being created.
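As a rough sketch of what such statistics could look like outside the attribute table, the snippet below summarizes a list of segment lengths. It assumes the lengths have already been read into a plain Python list (for example with arcpy.da.SearchCursor and the "SHAPE@LENGTH" token); the function name summarize and the sample values are made up for illustration. The RMSE here treats every segment length as a deviation from a perfect match (distance 0).

```python
import math


def summarize(lengths):
    """Return basic summary statistics for a list of segment lengths."""
    n = len(lengths)
    if n == 0:
        raise ValueError("no segment lengths given")
    mean = sum(lengths) / n
    # population variance of the lengths around their mean
    variance = sum((v - mean) ** 2 for v in lengths) / n
    # root mean square of the deviations from zero distance
    rmse = math.sqrt(sum(v * v for v in lengths) / n)
    return {
        "count": n,
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean,
        "std": math.sqrt(variance),
        "rmse": rmse,
    }


# made-up segment lengths in map units, for illustration only
example_lengths = [0.0, 3.0, 4.0, 5.0]
for name, value in summarize(example_lengths).items():
    print(name, round(value, 3))
```

Comparing the statistics of the two output feature classes gives a quick check on whether the deviation looks similar from both directions.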

Apart from analyzing the variation between them, it might be more interesting to understand why the real line does not follow the one calculated in GIS. Are there some variables that were not included? Are there features on the ground that were not considered by the GIS calculation since they may not have been available in the data used for the analysis?

8 Replies
DanPatterson_Retired
MVP Emeritus

Show the paths you have. Are you looking for deviations of the modelled path from the actual path? I suspect this isn't a correlation in the true sense of the word, since causation isn't at play here.

There are a number of measures that you could derive, including distance differences, directional changes, etc., but there is no reason to suspect that the actual path could be used to 'predict' the location of your modelled path, since you don't have an independent and a dependent variable as is required by correlation.

FrankHanssen
New Contributor III

Thanks Dan! I am interested in getting a measure of how well the model path fits spatially with the actual path. I have derived distance differences and have max, mean and min values. Is it possible to correlate the coordinates?

XanderBakker
Esri Esteemed Contributor

Perhaps you could create a polygon from both lines and use its area divided by the length of the path to say something about the deviation between the two lines.
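A minimal sketch of that area-over-length idea in plain Python follows. It rests on assumptions that may not hold for real data: both lines are lists of (x, y) tuples in a projected coordinate system, they run in the same direction with start and end points close together, and they do not cross each other (where they cross, the signed areas of the loops cancel and the result is too small). The function names are hypothetical.

```python
import math


def line_length(line):
    """Total length of a polyline given as a list of (x, y) tuples."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(line, line[1:]))


def ring_area(ring):
    """Absolute area of a closed ring using the shoelace formula."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def mean_deviation(line_a, line_b):
    """Area enclosed between two lines divided by the length of line_a."""
    # walk along line_a, then back along line_b to form one closed ring
    ring = list(line_a) + list(reversed(line_b))
    return ring_area(ring) / line_length(line_a)


# made-up coordinates, for illustration only
real_route = [(0.0, 0.0), (10.0, 0.0)]
model_route = [(0.0, 0.0), (0.0, 2.0), (10.0, 2.0), (10.0, 0.0)]
print(mean_deviation(real_route, model_route))
```

The result can be read as an average offset between the two lines in map units. For lines that do cross, splitting them at the intersections and summing the absolute areas of the individual loops would be needed.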

DanPatterson_Retired
MVP Emeritus

Association... not correlation.

What parameters/variables are used to determine your predicted path? That is the question.

There is a strong 'association' between the actual and predicted path in several areas and not so much in others.

The question becomes, what is in the vicinity of the locations where the paths are similar?  What is in the vicinity of where they are not?

What is driving your 'predicted path'? Nothing moves by itself, so how do you determine the locations that form the line?

A lot is missing from your description on how the path is derived and why it moves in the direction that it does forming a similar path and length.

DanPatterson_Retired
MVP Emeritus

Xander, that is good. Yes, it is what's on the ground that should be of interest/concern for the predictive model.

FrankHanssen
New Contributor III

Thanks Dan and Xander for all your valuable contributions! The modelled line is calculated from an MCA-generated cost surface and an LCP, based on a subset of the criteria used in the planning of the original power line. I have control over all the criteria; my main issue was just to get a comparison of how well the two lines fit. I did a calculation similar to yours, Xander, in ModelBuilder.

XanderBakker
Esri Esteemed Contributor

Is the thread solved, or do you have additional questions? If it is solved, can you mark the post that best answered your question as the correct answer? If not, can you indicate what additional questions you have?
