Algorithm to compare fields to determine match

LeroneSavage · ‎07-12-2013

it there any function(s) that compares two user's x y fields (or a list of this data) to verify if the the points match over a period of time?

MathewSchmidtlein · ‎07-13-2013

It sounds like you are saying you have two input tables, each with x/y coordinates in each row, and you are wanting to see if the coordinates for rows in one table match the coordinates for rows in the other table. Is that correct?

If so, I don't know of an existing function to do it. But you could likely proceed using the following steps:

1. Join the two tables
2. Create a Boolean variable to be used to indicate if the coordinates match
3. Use a search cursor to run through the rows, and compare the x's and y's for each row. Set the Boolean variable to false if the don't match. Alternatively, if you don't care about slight precisional differences, you could calculate the Euclidean distance between the coordinates, and set the Boolean to false only if it exceeds some threshold.
4. Report out the value of that variable at the end.

If the value of the variable is false at the end, you know that the coordinates don't match at some point.

RichardFairhurst · ‎07-13-2013

it there any function(s) that compares two user's x y fields (or a list of this data) to verify if the the points match over a period of time?

There is no single function that does a fully automated comparisons between two tables or feature classes at all for anything. You have to build the function yourself for your specific table/feature class and matching requirements, since there are too many variations of what you might want to do.

You obviously have two tables or feature classes and I assume that the X and Y values are already in a field. Provide more information about the following to get a better answer:

1. What is the unique field value for each row or feature? Is it the X and Y coordinate pair that is supposed to be your unique key or is there some other field value that is a join key between the two tables/feature classes? Are the X and Y coordinate pairs supposed to be unique for every row or feature in both tables? Or are there multiple rows and features that can share the same X and Y coordinate pair?

2. What are you trying to match up, just the X and Y values for a given unique key field or the set of all other field values associated with each X and Y pair?

3. Are the X and Y pair values supposed to be an exact match in the two tables or do you intend to permit some form of tolerance where nearby points within a given distance are considered identical between the two tables/feature classes?

The techniques that you should apply depend on your set up. All of the techniques I would recommend can be handled using standard Model Builder tools (Join primarily), standard SQL selections and standard field calculations, and do not require a cursor.

For example, if you want to create a join on the X and Y coordinate pairs between two tables where the X and Y coordinates are unique for each row or feature in both tables, you can concatenate the X and Y values in both tables/feature classes and use a join. The formula I use for calculating my X/Y coordinate concatenation join field from a point feature class that has no X and Y field, only point geometry, is:

# Process: Calculate X_Y_Link Field (2)...
arcpy.CalculateField_management(Output_Feature_Class, "X_Y_LINK", "Output(!SHAPE.FIRSTPOINT!)", "PYTHON", "def Output(FirstPoint):\\n FPX = round(float(FirstPoint.split() [0]), 4)\\n FPY = round(float(FirstPoint.split() [1]), 4)\\n return \"{%(FX)012.4f}{%(FY)012.4f}\" % {'FX': FPX, 'FY': FPY}")

I would calculate this into a 28 character text field called X_Y_LINK in both point feature classes using the field calculator tool. In Model Builder this field calculator set up would appear as follows:

Parser: Python

Show Codeblock: Checked

Pre-Logic Codeblock:

def Output(FirstPoint):
  FPX = round(float(FirstPoint.split() [0]), 4)
  FPY = round(float(FirstPoint.split() [1]), 4)
  return "{%(FX)012.4f}{%(FY)012.4f}" % {'FX': FPX, 'FY': FPY}

Expression: Output(!SHAPE.FIRSTPOINT!)

In my case my coordinates are in State Plane with a precision of 1/10,000th of a foot required. For your projection you would have to determine the format and precision of the coordinate numbers you want to use. For example, if your typical coordinates were in Decimal Degrees and you wanted to maintain precision to 8 decimal places you would probably use:

def Output(FirstPoint):
  FPX = round(float(FirstPoint.split() [0]), 8)
  FPY = round(float(FirstPoint.split() [1]), 8)
  return "{%(FX)012.8f}{%(FY)012.8f}" % {'FX': FPX, 'FY': FPY}

(I have not tested the code with signed values, so the formula might need to be adjusted to 013.8f to allow for a negative sign in addition to a maximum 3 digit integer, a decimal point and a maximum of 8 decimal place numbers with leading and trailing zero padding.)

Once joined on these fields, to find out all of the records in the parent table that are not matched at all in the join table, use this SQL on a geodatabase table:

Join_Table.ObjectID is Null

Once you have the selection, break the join and then you can do things like Append or Merge to transfer the selected records to the previously joined table that did not have these records.

For all other field comparisons on a join based on the X_Y_LINK field, you would want the synchronizing table joined to the control table, and the selection would be in the format:

Parent_Table.MyField <> Join_Table.MyField or (Parent_Table.MyField IS NULL and NOT Join_Table.MyField IS NULL)

If any records are selected you can transfer the values of the Join_Table (Control table) to the Parent_Table (Synchronizing Table) with a calculation of:

!Join_Table.MyField!

A cursor routine is preferable if there are many fields to be transferred, but using the join approach you could first select only the records that actually need to be processed by the cursor doing the transfer using this example SQL to find 3 fields that have any differences (extend the example for more fields):

Parent_Table.MyField1 <> Join_Table.MyField1 or (Parent_Table.MyField1 IS NULL and NOT Join_Table.MyField1 IS NULL) or Parent_Table.MyField2 <> Join_Table.MyField2 or (Parent_Table.MyField2 IS NULL and NOT Join_Table.MyField2 IS NULL) or Parent_Table.MyField3 <> Join_Table.MyField3 or (Parent_Table.MyField3 IS NULL and NOT Join_Table.MyField3 IS NULL)

Assuming this subset of records was much smaller than the full set of records and that you had attribute indexes on each field, this approach should process faster than using a paired cursor routine that reads every record of one of the two tables to define a selection against the other. However, involving a dictionary in the paired cursor routine might create a faster performing script. Although, I have not done a benchmark test of the new cursor syntax at 10.1 against my join method. I do know at 10.0, the join method is faster than a 10.0 cursor.

RhettZufelt · ‎07-15-2013

If you are just trying to compare between two existing point datasets, you could select by location, then inverse the selection to find non-matches.

Can apply search distance (buffer) to handle non-exact values.

R_

RichardFairhurst · ‎07-15-2013

My formula used the outdated 9.2 format, which does not work at 9.3 or later in the regular field calculator. To update the calculation for python script and the field calculator the XY concatenation calculation to a point feature class should be:

# Process: Calculate X_Y_Link Field (2)...
arcpy.CalculateField_management(Output_Feature_Class, "X_Y_LINK", "Output(!SHAPE.CENTROID!)", "PYTHON_9.3", "def Output(FirstPoint):\\n FPX = round(float(FirstPoint.X), 8)\\n FPY = round(float(FirstPoint.Y), 8)\\n return \"{%(FX)013.8f}{%(FY)013.8f}\" % {'FX': FPX, 'FY': FPY}")

In the normal 9.3 and later field calculator you would output to a 30 characer field with the calculation entered as shown below:

Parser: Python

Show Codeblock: Checked

Pre-Logic Codeblock:

def Output(FirstPoint):
  FPX = round(float(FirstPoint.X), 8)
  FPY = round(float(FirstPoint.Y), 8)
  return "{%(FX)013.8f}{%(FY)013.8f}" % {'FX': FPX, 'FY': FPY}

Expression: Output(!SHAPE.CENTROID!)

LeroneSavage · ‎07-19-2013

Thnx. I will try it out.

LeroneSavage · ‎07-19-2013

Also, do you happen to know of any python code that converts a excel file into a csv?