Add support for subqueries for definition queries and selections on hosted feature layers

04-20-2022 08:22 AM
Status: Open
Occasional Contributor II

Add the ability to filter a hosted feature layer by a subquery for both definition queries and "select by attribute" workflows.

For me, I want to build a hosted view on top of a hosted layer.  The hosted view would limit the results to only the "most recent record" for each ID that exists in a column.  This query requires a date column and the ID column which is not unique.  

In a sql database this query would take on the following syntax:

For the example the table we are filtering is named myTable. The column with ID's we want to group by is called myGroupID.  The column with dates to use for finding most recent record is named myDate

select * from myTable t1

where t1.myDate =  (SELECT max(t2.myDate) from myTable t1 where t2.myGroupID = t1.myGroupID)


Other scenarios are just filtering the whole table to the one with a max value over all:

select * from myTable t1

where t1.myDate =  (SELECT max(t2.myDate) from myTable


This came up a lot with Covid dashboard data.  You have a table with one row per day with various Covid numbers.  Then some places you want to filter that master dataset to data from the latest row when sorted by date.  In other places you want to use all the data to see trends.  The operations dashboard widgets provide the extra configuration steps to do this kind of sorting and filtering.  But applying the same kind of filtering to the hosted layer itself via a hosted view is missing.   

Now, I assume that on Esri's side this would require the underlying ArcGIS Online data platform to support some kind of SQL structure.  But it wouldn't have to be RDBMS like PostGres or SQL Server.  Azure Synapse allows relatively advanced SQL queries against data lakes that include subqueries and aggregation functions.  Also, I assume that operations like this will add more load on the Esri systems since the filter needs to scan other rows and compare.  But this is handled in RDBMS with indexes.  There is already the concept in ArcGIS Online of tagging one column on a layer to be the "time aware" layer, and the trackaware GeoAnalytics tools in Pro require you to select a "Track ID" column.  If we could define these columns on the AGOL layer, they could be used to selectively build indexes on the backend to optimize these kinds of queries.