Select Max value arcpy

9988
18
Jump to solution
12-13-2015 09:48 PM
timdunlevie
Occasional Contributor

Hi all,

I have a FC in an SDE with Fields: ID & Version.

These are populated like this:

ID "A", Version "1"

ID "A", Version "2"

ID "B", Version "1"

ID "B", Version "2"

etc..

so when the ID object changes we create a new shape with a new Version number, but keep the old shape for historic purposes.

I am doing a join on another table but only want to perform the join on a subset of the original FC, grabbing the latest "Version" of the ID.

For the life of me I cannot seem to be able to select these records.

This will be a python script.

I was thinking of using the arcpy.MakeFeatureLayer_management with an expression and/or arcpy.SelectLayerByAttribute_management function with an expression to form my subset before I join to another table.

"Version" = (SELECT MAX("Version") from FC) - does not seem to work. I also tried a "group by" and got a sql error.

We are using SQL server and ArcGIS 10.2

Even in ArcMap I cannot seem to be able to select the latest version of the ID....I am guessing I don't have the syntax correct.

any help would be appreciated.

thanks

Tags (2)
0 Kudos
1 Solution

Accepted Solutions
JoshuaBixby
MVP Esteemed Contributor

On the surface, it seems like using OBJECTID this way should work, but I also don't understand Esri's sequencing of that field well enough to say accurate results are guaranteed.  One of the classic approaches, i.e., has been around for a long time and is SQL ANSI compatible, to solving this problem in the DBMS realm uses NOT EXISTS:

qry = "NOT EXISTS (SELECT 1 FROM fc f WHERE fc.id = f.id AND f.version > fc.version)"
arcpy.SelectLayerByAttribute_management(layerName, "NEW_SELECTION", qry)

One note of caution with using this approach.  Although the SQL reference for query expressions used in ArcGIS documentation states, "EXISTS is supported in file, personal, and ArcSDE geodatabases ...," the type of expression above provides bogus results for file geodatabases.  The results are accurate for personal geodatabases and all enterprise geodatabases I have tried.

View solution in original post

18 Replies
DanPatterson_Retired
MVP Emeritus

error message?

0 Kudos
ChrisSmith7
Frequent Contributor

Advanced editor works here!

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

I am guessing you are building your where clause incorrectly, but I can't say for sure unless you post the specific code and error message(s).

If the maximum Version number is the same across all IDs at any given time, then it simplifies the situation and using a basic subquery with MAX will work:

>>> lyr_name = 'tmpLayer'
>>> fc_name = 'tmpFC'
>>> ws = arcpy.env.workspace
>>> fc = arcpy.CreateFeatureclass_management(ws, fc_name, "POINT")
>>> arcpy.AddField_management(fc, "ID", "TEXT", field_length=1)
<Result 'C:\\Users\\bixb0012\\Documents\\ArcGIS\\Default.gdb\\tmpFC'>
>>> arcpy.AddField_management(fc, "Version", "LONG")
<Result 'C:\\Users\\bixb0012\\Documents\\ArcGIS\\Default.gdb\\tmpFC'>
>>> cur = arcpy.da.InsertCursor(fc,["ID", "Version"])
>>> [cur.insertRow([i,j]) for i in ("A", "B") for j in (1, 2)]
[1L, 2L, 3L, 4L]
>>> del cur
>>> lyr = arcpy.MakeFeatureLayer_management(fc, lyr_name, "Version = (SELECT MAX(Version) FROM {})".format(fc_name))
>>> arcpy.GetCount_management(lyr)
<Result '2'>
>>> [row for row in arcpy.da.SearchCursor(lyr, ["ID", "Version"])]
[(u'A', 2), (u'B', 2)]
>>>

If the maximum Version number can be different across all IDs at any given time, then the situation gets more complicated because you will have to use a semi-join instead of a basic equality operator.

I recommend looking at query layers regardless of whether the Version numbers increment uniformly.  The documentation for SQL reference for query expressions used in ArcGIS points out some important limitations to using subqueries, especially with versioned data:

Coverages, shapefiles, and other nongeodatabase file-based data sources do not support subqueries. Subqueries that are performed on versioned ArcSDE feature classes and tables will not return features that are stored in the delta tables. File geodatabases provide the limited support for subqueries explained in this section, while personal and ArcSDE geodatabases provide full support. For information on the full set of subquery capabilities of personal and ArcSDE geodatabases, refer to your DBMS documentation.

timdunlevie
Occasional Contributor

Thanks mate.

I didn’t want to bore everyone with the full code as this is a small part of the script.

What I was hoping to do was to simply add an “expression” to either the SelectLayerByAttribute command, or the MakeFeatureLayer command.

(apologies if the format of this code is not what is required).

inFeatures = "MasterFC"

layerName = "Temp"

inField = "ID"

joinTable = "text”

joinField = "Name"

qry = "[Version] = (SELECT MAX() from MasterFC GROUP BY )"

expression = "text.type = ‘active’”

outFeature = “NewTable”

arcpy.MakeFeatureLayer_management (inFeatures, layerName)

arcpy.SelectLayerByAttribute_management(layerName, "NEW_SELECTION", qry)

arcpy.AddJoin_management(layerName, inField, joinTable, joinField)

arcpy.SelectLayerByAttribute_management(layerName, "NEW_SELECTION", expression)

arcpy.CopyFeatures_management(layerName, outFeature)

In the above example I get “ERROR 000358: Invalid expression”

If I modify this to this placing the “qry” in the MakeFeatureLayer command:

arcpy.MakeFeatureLayer_management (inFeatures, layerName, qry)

    1. arcpy.SelectLayerByAttribute_management(layerName, "NEW_SELECTION", qry)

arcpy.AddJoin_management(layerName, inField, joinTable, joinField)

arcpy.SelectLayerByAttribute_management(layerName, "NEW_SELECTION", expression)

arcpy.CopyFeatures_management(layerName, outFeature)

I get an error again with the query….

ERROR 000230: Failed selecting with = (SELECT MAX()

I guess my question is whether it can done this way using an expression parameter in either command? I’m thinking my expression is completely wrong…

Or should I be using a Search cursor with an expression?

And yes – there can potentially be different “versions” for each “ID”…..eg ID A, Version 1, ID C, Version 3, ID F, Version 2 etc…

The Feature Class is not versioned.

Many thanks,

Tim

0 Kudos
JamesCrandall
MVP Frequent Contributor

I don't believe SelectLayerByAttribute_management supports GROUP BY.

0 Kudos
WesMiller
Regular Contributor III

Try using your sql statement in arcmap to make sure it works. I got this to work for me OBJECTID in (select max( OBJECTID ) from bpbSWBilling.DBO.Impervious GROUP BY FEATURE)  ,notice I'm using "in" and not "="

JoshuaBixby
MVP Esteemed Contributor

On the surface, it seems like using OBJECTID this way should work, but I also don't understand Esri's sequencing of that field well enough to say accurate results are guaranteed.  One of the classic approaches, i.e., has been around for a long time and is SQL ANSI compatible, to solving this problem in the DBMS realm uses NOT EXISTS:

qry = "NOT EXISTS (SELECT 1 FROM fc f WHERE fc.id = f.id AND f.version > fc.version)"
arcpy.SelectLayerByAttribute_management(layerName, "NEW_SELECTION", qry)

One note of caution with using this approach.  Although the SQL reference for query expressions used in ArcGIS documentation states, "EXISTS is supported in file, personal, and ArcSDE geodatabases ...," the type of expression above provides bogus results for file geodatabases.  The results are accurate for personal geodatabases and all enterprise geodatabases I have tried.

timdunlevie
Occasional Contributor

Cheers mate.

This gives me no syntax errors when incorporating the qry in the SelectLayerByAttribute command, and seems like it “should’ work.

However, I am still getting both ID A, Version 1 & ID A, Version 2 in the resulting outFC.

I’ll keep playing around with this – I believe it’s what I am looking for!

Thanks,

Tim

0 Kudos
timdunlevie
Occasional Contributor

mate just tested this again and it works! Brilliant!!!!

I had an incorrect reference to my fc in the syntax when I tested it initially!!

Many thanks.

0 Kudos