For comparison, I have now also looked at what happens in ArcMap, and there do seem to be some notable differences.
What I did:
- Drag & drop the huge table in ArcMap (You can't actually do this yet in Pro by the way!) from ArcMap's Catalog window. ArcMap then opens a dialog about the need to calculate the "spatial extent". This is also a slow operation in ArcMap, which also seems linked to a full table scan (which I also do not fully understand, can't this information be derived from the spatial index much faster?). The SQL of the resulting Query Layer is equivalent to the SELECT * FROM TABLE_NAME of the last post, albeit with the * wildcard replaced by a full list of all the database columns of the source table.
- Without any Definition Query set, I opened up the Attribute Table in ArcMap. This is still a somewhat costly operation, taking a minute or a few, but not nearly as bad as in Pro, where it takes half an hour in the same situation.
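To make the first point a bit more concrete, this is roughly the shape of the generated Query Layer SQL. The table and column names here are made up for illustration; the only thing I actually observed is the expanded column list in place of the * wildcard:

```sql
-- Hypothetical illustration of the Query Layer SQL ArcMap generated;
-- table and column names are invented, the point is the explicit column list.
SELECT objectid, shape, road_name, road_class, surface_type
FROM my_schema.huge_road_table;
```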
Now, when I look at the PostgreSQL logs, I see at least one difference compared to Pro. Opening the attribute table seems to happen in two steps. Notice that when the first BINARY CURSOR is created, the SQL statement contains a WHERE clause with what appears to be a spatial constraint. I think I saw this step happening in Pro as well.
Anyway, when you look at the second highlighted BINARY CURSOR being created, which represents the moment ArcMap really starts building the actual contents of the Attribute Table of the Query Layer, you will notice (highlighted in blue) that ArcMap actually restricts the record set to roughly the first 100 records, by inserting their OBJECTIDs into a large IN (...) clause.
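For readers without access to the screenshots, the two cursor declarations look roughly like the sketch below. This is a reconstruction from memory, not a verbatim log excerpt; the cursor names, table, columns, envelope values and the exact spatial syntax are all guesses on my part:

```sql
-- Step 1: a cursor whose WHERE clause contains a spatial constraint
-- (names, columns and envelope values are hypothetical).
DECLARE qry_extent BINARY CURSOR FOR
SELECT objectid, shape, road_name
FROM my_schema.huge_road_table
WHERE shape && ST_MakeEnvelope(4.0, 51.0, 7.0, 54.0, 4326);

-- Step 2: a cursor restricted to roughly the first 100 visible records
-- via their OBJECTIDs in a large IN (...) clause.
DECLARE qry_rows BINARY CURSOR FOR
SELECT objectid, shape, road_name
FROM my_schema.huge_road_table
WHERE objectid IN (1, 2, 3, 4, 5 /* ... up to ~100 values ... */);
```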
I also made another observation regarding ArcGIS Pro: the first time I open the Attribute Table, it takes half an hour. Any subsequent opening of the Attribute Table is much faster and seems comparable to ArcMap.
As a consequence, I am now getting convinced that the behavior I am seeing is actually NOT A BUG, but a change in how ESRI decided to handle Query Layers, and specifically the calculation of the spatial extent of such a Query Layer:
- In ArcMap, when you drag a spatial database table into the TOC, a dialog opens up explicitly warning you that the "spatial extent" is being calculated ("Calculate Extent" is the title of the small dialog) and that this may be a costly operation (likely due to the full table scan taking place in the background; see the sketch after this list). It even offers override options, by allowing you to enter the extent manually or to use the Spatial Reference's maximum extent.
- In ArcGIS Pro, no such dialog pops up. Instead, it seems to leave a kind of "uninitialized" layer in the TOC. What then happens is that Pro appears to start calculating this spatial extent the first time you open up the Attribute Table or use some geoprocessing tool on the layer, as I have witnessed with the "Apply Symbology From Layer" tool, which takes forever to open when such a Query Layer is present. Afterwards, this information seems to be cached and re-used the next time you e.g. open the Attribute Table of the Query Layer.
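My assumption is that, in both applications, the extent calculation boils down to an aggregate that has to touch every single row, which would explain the full table scan. In PostGIS terms it would be something like this (again with invented table and column names):

```sql
-- Presumed (not confirmed) shape of the extent calculation: an aggregate
-- over every row, hence the full table scan.
SELECT ST_Extent(shape) FROM my_schema.huge_road_table;
```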
While Pro's implementation takes away the "annoying" spatial extent calculation dialog that pops up in ArcMap, the ArcMap method, with an explicit calculation at the moment you add the data, is far more desirable for large layers, as it won't lock up the application unexpectedly for long periods of time without the user knowing why.
If I am right, I also think the calculation of the spatial extent of Query Layers really needs to be reviewed by ESRI, to find out if there isn't a much more efficient way to do this, e.g. by taking a "random sample" of e.g. 1% of the records in the case of large tables (see the sketch below). It now seems to require a full table scan in both ArcMap and Pro, which is outrageous in the context of database tables of tens of millions of records. I do not know if there are restrictions I don't understand, that ESRI has to abide by, that led to the current implementation in ArcMap and Pro, but a much more efficient method would be highly desirable, as it would take away a performance bottleneck on Query Layers based on millions of records.
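Just to illustrate the kind of cheaper alternatives I have in mind, PostgreSQL/PostGIS itself already offers options along these lines. This is only a sketch with invented table and column names, and the 1% sample size is arbitrary:

```sql
-- Option 1: read an approximate extent from the planner statistics
-- instead of scanning the rows (requires ANALYZE to have run on the table).
SELECT ST_EstimatedExtent('my_schema', 'huge_road_table', 'shape');

-- Option 2: approximate the extent from a ~1% random sample of the table.
SELECT ST_Extent(shape)
FROM my_schema.huge_road_table TABLESAMPLE SYSTEM (1);
```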
