subquery in personal geodatabase giving incorrect results

4562
6
Jump to solution
05-06-2015 09:49 PM
GarretDuffy
Occasional Contributor

Hi,

I am having a strange problem with an SQL query that I have used successfully in the past, namely:

SELECT * FROM TABLE WHERE:

[ELEVATION] IN (SELECT MAX( [ELEVATION] ) FROM TABLE GROUP BY [GRID_BOX] )

(I have binned many mountain peaks into a grid using the INTERSECT TOOL to give the GRID_BOX field.)

I run the query without any errors but the results are not at all as expected.  Instead of returning a single point per grid square, 1 or more points are being selected.  I know the issues with subqueries but I have run this exact query on another dataset successfully but when running with this dataset it just doesn't work.  Are there any issues with FLOATs or INTs I need to be aware of?  I am at my wits end having spent many hours on this problem and I would be so grateful for any pointers.

I would like to upload the personal geodatabase somewhere but not sure how to do this?

Thanks.

Garret

0 Kudos
1 Solution

Accepted Solutions
JoshuaBixby
MVP Esteemed Contributor

Garret Duffy, thanks for uploading sample data, it helps to troubleshoot and explain what is fouled up.

As I suspected, the problem is with the elevation values being returned by the subquery and how the IN clause is structured.  The elevation values are fine, in and of themselves, and the subquery and query are returning correct results based on the SQL as written.  Let's take a closer look at the example in your attached screenshot:

table_sql_group_by_max_problem.PNG

The Sample_Dataset table is in the background with the results of your subquery in the foreground.  Although the subquery is calculating the maximum elevation value by the FID_T_25_projected column, it is only returning a result set of elevation values.  With the query as written, two rows with FID_T_25_projected equal to 663 are being returned instead of one because there are two 663 records that have an elevation matching one of the elevation values in the subquery.

Even if you were to include FID_T_25_projected in the SELECT list of the subquery, the query as written still wouldn't return the results you want.  If you were working with one of the database platforms that support multiple column IN clauses, like Oracle, a fairly small change to your SQL would get things workings.  Unfortunately, neither file nor personal geodatabases support such functionality.

There are numerous ways to solve this riddle using SQL natively within a DBMS, but you are working within the confines of ArcGIS and the limitations that imposes on structuring SQL.  Fortunately, you are working with personal geodatabases so there is a fairly straightforward solution:

objectid IN (
    SELECT objectid
      FROM sample_dataset t1
         ,(SELECT fid_t_25_projected, MAX(elevation) as maxe
             FROM sample_dataset
         GROUP BY fid_t_25_projected) t2
     WHERE t1.fid_t_25_projected = t2.fid_t_25_projected
       AND t1.elevation = t2.maxe)

The above SQL works with personal geodatabases and desktop geodatabases/personal SDE, but it doesn't work with file geodatabases or shapefiles.  I suspect the above SQL would work with all enterprise geodatabases, but I haven't verified it.

View solution in original post

6 Replies
JoshuaBixby
MVP Esteemed Contributor

It would be good if you could upload your data or some sample data.  To upload an attachment, click on "Use advanced editor" in the upper right corner, and then click on "Attach" in the lower right corner.

Without knowing more about your data, I have to speculate.  Seeing you have run this query before on different data with expected results, my guess is that you have non-unique elevation values being returned by your subquery (multiple grid boxes have same maximum elevation).  The select * part makes it really difficult to understand what you are after and what the unique identifier may be.

GarretDuffy
Occasional Contributor

Hi Joshua,

Thanks for the response and for the info on how to upload .

I ended up working around the problem in elegantly using Excel but obviously that is unsatisfactory moving forward.

I've attached a sample of the data in the personal geodatabase.

If you run the following query in the "Select by Attributes" function within the attributes window in ArcMap:

[ELEVATION] IN (SELECT MAX( [ELEVATION] ) FROM Sample_Dataset GROUP BY [FID_T_25_projected] )

You will see that two points are selected with in one grid square (FID 663 see attached screenshot).  There should only be one selected (the maximum value).  If I uploaded the entire dataset you would see that in some cases as many as 7 points are seemingly randomly selected instead of the single maximum value.

Any tips appreciated!

Garret

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Garret Duffy, thanks for uploading sample data, it helps to troubleshoot and explain what is fouled up.

As I suspected, the problem is with the elevation values being returned by the subquery and how the IN clause is structured.  The elevation values are fine, in and of themselves, and the subquery and query are returning correct results based on the SQL as written.  Let's take a closer look at the example in your attached screenshot:

table_sql_group_by_max_problem.PNG

The Sample_Dataset table is in the background with the results of your subquery in the foreground.  Although the subquery is calculating the maximum elevation value by the FID_T_25_projected column, it is only returning a result set of elevation values.  With the query as written, two rows with FID_T_25_projected equal to 663 are being returned instead of one because there are two 663 records that have an elevation matching one of the elevation values in the subquery.

Even if you were to include FID_T_25_projected in the SELECT list of the subquery, the query as written still wouldn't return the results you want.  If you were working with one of the database platforms that support multiple column IN clauses, like Oracle, a fairly small change to your SQL would get things workings.  Unfortunately, neither file nor personal geodatabases support such functionality.

There are numerous ways to solve this riddle using SQL natively within a DBMS, but you are working within the confines of ArcGIS and the limitations that imposes on structuring SQL.  Fortunately, you are working with personal geodatabases so there is a fairly straightforward solution:

objectid IN (
    SELECT objectid
      FROM sample_dataset t1
         ,(SELECT fid_t_25_projected, MAX(elevation) as maxe
             FROM sample_dataset
         GROUP BY fid_t_25_projected) t2
     WHERE t1.fid_t_25_projected = t2.fid_t_25_projected
       AND t1.elevation = t2.maxe)

The above SQL works with personal geodatabases and desktop geodatabases/personal SDE, but it doesn't work with file geodatabases or shapefiles.  I suspect the above SQL would work with all enterprise geodatabases, but I haven't verified it.

GarretDuffy
Occasional Contributor

Hi Joshua,

Thanks for this clear explanation.  I guess my simple query worked on a simple dataset purely by chance because the ELEVATIONS were floats so were unique figures just by chance. For this dataset, the elevations are integers so were less likely to be unique.

I was just wondering how exactly to implement your SQL into a definition query in Arcmap?

I tried this:

[OBJECTID] IN ( SELECT [OBJECTID] FROM NW_US_mountains_intersect t1, (SELECT [FID_T25km_grid_washington_IDAHO_MONTANA], MAX([ELEVATION]) as maxe FROM NW_US_mountains_intersect GROUP BY [FID_T25km_grid_washington_IDAHO_MONTANA] t2 WHERE t1.[FID_T25km_grid_washington_IDAHO_MONTANA] = t2.FID_T25km_grid_washington_IDAHO_MONTANA] AND t1.[ELEVATION] = t2.maxe)

Thanks!

Garret

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

I just tried my original SQL against your sample dataset, and it worked with a definition query.  So, it appears the functionality is there, now it is a question of why it isn't working for you.

Are you getting an error message or simply unexpected results?  I do notice you are missing a left bracket on the t2.[FID...] in the last statement of the where clause.

0 Kudos
GarretDuffy
Occasional Contributor

Hi Joshua,

I re-checked my SQL and your query works!  Thanks so much for this; I tip my hat to your SQL expertise!

Garret

0 Kudos