Improve Support for Grouping, Predicate, and Aggregate SQL Functions (COUNT, GROUP BY, SUM) in ArcMap, ArcGIS Pro

3947
13
07-13-2018 08:53 AM
Status: Open
Labels (1)
AndrewStickney1
New Contributor III

I recently discovered that you cannot use standard aggregate or grouping SQL keywords (i.e. COUNT, GROUP BY, SUM) in data query functions within both ArcMap (as of 10.6) and ArcGIS Pro (I am running 2.1), at least without using a subquery - which is not always a straightforward use case. 

ArcMap: Building a query expression—Help | ArcGIS Desktop  

ArcGIS Pro: SQL reference for query expressions used in ArcGIS—ArcGIS Pro | ArcGIS Desktop 

This is not as big a deal for functions like Select by Attribute, but is a serious limitation on tools like Make Query Layer (or Table) in terms of their usefulness for data analysis and extraction workflows. 

For example, I recently had a scenario where I had a polygon feature class of tax parcels and three related tables containing tax assessment information for parcels in Missoula County: Property, Commercial, and Residential. Each parcel has a geocode (unique ID). The Property table contains general tax assessment information for each parcel and tax year. The Property ID uniquely identifies a tax parcel in a given tax year (one-to-one relationship with parcels). The Residential and Commercial tables contain assessment information on residential or commercial improvements (structures) for each parcel and tax year. There may be zero or multiple records for each tax parcel/property ID (one-to-many relationship). 

Here is an example of what these tables look like, with PropertyID 327491 highlighted. 

Example of three related tables of property assessment information to join on Property ID and tax year

If I want to count how many residential, commercial, or total structures are associated with each parcel for the whole county, currently I have a process in Modelbuilder using a combination of ArcToolbox tools (i.e. Summary Statistics and Join Field) to:

1) count (SUM) the number of PropertyID instances in the Residential and Commercial Tables by tax year.

2) Join it to the Property table. 

3) Count (SUM) the number of propertyIDs, commercial structures, and residential structures by geocode and tax year (GROUP BY). 

4) Join the aggregate structure and property ID counts to the tax parcel feature class for visualization and analysis. 

model for summarizing property assessment information (i.e. structure count) and joining to tax parcel data

While this model is helpful, it has to process over 15,000 records in each table and takes about 6-8 hours to run, so making adjustments to the model or re-running it is time-intensive. 

When I discovered the Make Query Table and Make Query Layer tools, I thought, what a great replacement! I can write an SQL query that will prepare that table for me and then I can either create a join or write a Python script to extract data from it. An example SQL Query might look like this:

SELECT Property.GeocodeSearch,Property.PropertyID,Property.TaxYear, COUNT(Res.PropertyID) AS ResCount FROM Res, COUNT(Com.PropertyID) AS ComCount FROM Com
LEFT JOIN Property ON Property.PropertyID = Res.PropertyID
LEFT JOIN Property ON Property.PropertyID = Com.PropertyID
GROUP BY Property.Geocode;
WHERE Property.TaxYear = 2017;



‍‍‍‍‍‍‍‍

Unfortunately, this is not possible with SQL on the client side (in ArcGIS Desktop - ArcMap or ArcGIS Pro) unless I converted the above SQL statement into an appropriate series of subqueries, which is beyond my skills. For those GIS users who don't have database privileges to create a view on the database side, this can result in a loss of query/extraction function and having to develop slow or clunky workarounds. 

Please consider expanding support for predicate, aggregate, and grouping functions in ArcMap and ArcGIS Pro to enable these types of data workflows. 

13 Comments
JoshuaBixby

Often times the support for SQL depends on the DBMS.  Are you working with enterprise or file geodatabases?  If the former, what error are you getting when trying to use your SQL statement to generate a Query Layer?

MarcoBoeringa

Yes, you definitely need to give more information as to the exact error you are experiencing. Just saying "it doesn't work" is not going to help to solve this.

As to one tip though:

You may also consider using the WITH "Common Table Expression" construct in your SQL. This doesn't require the right to create views in your database, yet still achieves something close to it, causing the database to create a kind of temporary table that should act much like the view you may need.

SQL WITH: Organize Complex Queries 

AndrewStickney1

Marco Boeringa‌, Joshua Bixby‌. I thought my original post was clear, but here's another example to illustrate the problem further. 

If you followed the help links, you would realize that you cannot use an aggregate function (i.e. COUNT) in a primary query - like the following simple statement that counts the number of PropertyIDs in the Res associated with a geocode uniqueID and the PropertyID Join field. 

SELECT Property.Geocode, Property.PropertyID, Property.TaxYear, Res.PropertyID, Count(Res.PropertyID) as IDCount 
FROM Property, Res
LEFT JOIN Property.PropertyID = Res.PropertyID
WHERE Property.TaxYear = 2017
GROUP BY Property.Geocode
;‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Instead, you can only use aggregate functions within a subquery, which would look something like this. 

SELECT Property.Geocode, Property.PropertyID, Property.TaxYear 
WHERE Property.TaxYear = 2017 AND Property.PropertyID IN 
     (
      SELECT Res.PropertyID, Count(RES.PropertyID) as IDCount FROM PropertyID
      WHERE IDCount > 0
      );
‍‍‍‍‍‍‍‍‍‍‍

The problem with the second example is:

1) I don't have access to the original count of PropertyIDs...I only know which PropertyIDs have a match to the subquery. Even if I used the ANY keyword to get all records, this is a loss of data for me and not helpful to my analysis.

2) To get close to my original goal of compiling counts (or just count matches) from both Res and Com tables, I would need to make two query table versions of the second SQL example and then a third query table or layer to merge the first two together into a single view. 

I hope this explains the problem more clearly. 

JoshuaBixby

For ArcGIS Pro, the Query Layer functionality has been improved.  Using ArcGIS Pro 2.2, I just created a SQL statement using COUNT and GROUP BY on non-spatial data, and it validated and loaded into Pro just fine.  The same query on the same data fails in ArcMap, ostensibly because there is no spatial column, but I am guessing it would fail even with one.

I think what you want is already implemented in Pro.  I also think what you want will never be implemented in ArcMap, but such is life with legacy products.

AndrewStickney1

Joshua Bixby‌,

Would you be able to post an example of the SQL statement with the COUNT that worked in ArcGIS Pro (or send it to me privately)? The Esri support person who I contacted about this previously said that it could not be done in Pro (at 2.1). It would be helpful to see what you did and see whether I misunderstood the limitatons (or misread the documentation). 

Thank you.

JoshuaBixby

I was just doing some quick checking, but the structure of what I tried and got to work was:

SELECT field1, COUNT(field1) FROM table GROUP BY field1

I can try something with a joined table a bit later and let you know

MarcoBoeringa

Andrew,

Have you, or haven't you, tried the SQL WITH that I suggested to you? I am pretty sure this will solve your issue.

I have used WITH with some pretty horrendous SQL statements in both ArcMap and ArcGIS Pro without issues.

You also still haven't answered the question as to what exactly doesn't function or what (database) error you receive when using your SQL statement as part of a Query Layer's SQL. Please post the geoprocessing error or database error here (e.g. as captured screenshot by using PrtScreen and paste) instead of referring back to the Help and another example query.

AndrewStickney1

Marco Boeringa‌, it took me quite a while to understand how to use the WITH statement, but I think I have figured it out. I  use the site SQL Fiddle to test and refine the SQL statements I needed, and here is the final code.

WITH ResQuery As (
SELECT
  Res.PropertyID as ResPID,
  Count(Res.PropertyID) as ResCount
FROM
  Res
WHERE Res.TaxYear = 2017
GROUP BY Res.PropertyID
  ), 
ComQuery As (
SELECT
  Com.PropertyID as ComPID,
  Count(Com.PropertyID) as ComCount
FROM
  Com
WHERE Com.TaxYear = 2017
GROUP BY Com.PropertyID
  ),
PropQuery As (
SELECT
  Property.PropertyID as PropPID,
  Property.Geocode as GCode,
  Property.GeocodeSearch as GCodeSearch,
  Property.TaxYear as Tax
FROM
  Property
WHERE Property.TaxYear = 2017
  ),
ComboQuery as 
(
  SELECT P.PropPID as PID, P.GCodeSearch as Code, Count(R.PropertyID) as RCount, Count(C.PropertyID) as CCount
  FROM PropQuery as P LEFT JOIN Res as R ON P.PropPID = R.PropertyID LEFT JOIN Com as C ON P.PropPID = C.PropertyID
  WHERE P.Tax = 2017
  GROUP BY P.PropPID, P.GCodeSearch
)

SELECT * From ComboQuery
;

A direct link to the SQL Fiddle with my query and test data is here: SQL Fiddle 

Unfortunately, I received an error message when copying and pasting the code into ArcGIS Pro's  Make Query Layer tool. The error message is: 000358: Invalid expression.—Help | ArcGIS Desktop  The specific underlying client error is: 'Microsoft SQL Server Native Client 11.0 - Incorrect syntax near the keyword 'WITH'] as shown in the screenshot below.

Have you encountered errors when using the WITH clause in ArcGIS before? The only thing I have heard on forums is adding a ";" before the WITH clause, but otherwise, I am stumped. 

Any suggestions?

MarcoBoeringa

Hi Andrew,

Yes, WITH can seem a bit confusing at first, but is actually quite simple. Unfortunately, many of the examples given on the internet start out with example SQL statements that are way to complex for anyone trying to understand it properly.

In essence, WITH boils down to:

WITH <YOUR_TABLE_ALIAS_NAME> AS

(

<PUT THE ORIGINAL - UNMODIFIED! - QUERY HERE THAT GIVES YOU ISSUES>

)

SELECT * FROM <YOUR_TABLE_ALIAS_NAME> WHERE <A POTENTIAL WHERE CLAUSE>

So you simply need to put the original query within the brackets of the name alias definition of the WITH, and use the alias name a second time in a SELECT * FROM...WHERE after the closing bracket.

So the alias should appear twice. I only see it once in your query (ResQuery) and I miss the closing bracket for the alias definition.

So I guess you want to have something more like this, notice the last line with closing bracket and second reference to the alias name. Please note I have never used a construct like you do here with queries concatened with comma's, so don't know entirely sure if this is the right syntax to make it work.:

NOTE: SQL removed as the answer was incorrect!

MarcoBoeringa

Andrew,

Sorry, I did not look at the SQL Fiddle link you posted. I now see that the syntax and SQL statement you posted, actually does pass the validation in SQL Fiddle, so is probably correct, and my first suggestion of an alternative or fix in the post above incorrect. I have hence removed the suggestion.