Esri Specs:
ArcGIS Server 11.1, ArcPro 3.1
PostGIS running on AWS Aurora. Not a managed geodatabase. DB views having tables, tables having geography columns, with some foreign data wrapper foreign tables joined, pointing to other systems for the per-user permissions content.
We have a unique relational data model, where the features in layers change based on the user viewing them. The primary layers are groups, users, and devices. A group has users, a user has devices. Data is segmented by customer_id.
There are 2 types of users - full administrators can see everything for their customer_id. Restricted administrators can only see certain records, based on dynamic/realtime membership in groups for their customer_id.
No user can see data for all customer_ids.
We have 2 sets of database views - for both sets of views, they are ONLY valid when a where clause restricts the data returned.
Full admin views - must include filter where customer_id = 123
Restricted admin views - must include filter where customer_id = 123 and admin_id = 456
If we were to query these views without a where clause, we are requesting an insane amount of data, beyond the capacity of the systems to deliver it. When publishing services having query layers to these views, the service times out, and will not start up.
For each layer in the service definition, when the service starts esri is asking postgis - "select * from massiveview". This is an impossible query to request for these views.
One workaround has been to have a view on top of a view - before publishing the service, we make a view
CREATE VIEW vw_all_addresses_layer_v1 as
(
select id, name, description
from all_addresses_layer
where customer_id = 123
);
and then after the service is published and started, we recreate the view to exclude the where clause
DROP VIEW IF EXISTS vw_all_addresses_layer_v1;
CREATE VIEW vw_all_addresses_layer_v1 as
(
select id:, name, description
from all_addresses_layer
);
By doing this, we are able to have the service start up, because the select * query is reasonable.
After publishing, if we query the service using the proper filters, the service responds quickly. However, we cannot ever stop and start (or recycle) this service, because it would timeout, due to the massive amount of data.
Why are "select * from queryLayer" queries being executed?
How can we prevent these "select * from queryLayer" queries from being executed?
When are they executed during a service's lifecycle? (upon start, upload uploading service definition, upon recycling)? When more instances spin up? Just the first instance?
Is there data captured from these select * requests, and stored in a file somewhere, that we could pre-populate with content, so that the query is not executed during service start up? Any count that is captured by a select * would not be valid.
Why do we see queries like this: SELECT * FROM sde.sde_layers LIMIT 0 when we are not using SDE? If we had this SDE table, would the "select *" queries still occur?
Last question - does ArcGIS server have a preference for using geometry or geography as the spatial data type? We see requests like this for tables having geography data: SELECT ST_GEOMETRYTYPE(pin::GEOMETRY), ST_SRID(pin) FROM public.vw_all_geoaddresses_layer_esri WHERE pin is not null limit 1;
Thank you, this is a big bottleneck in our potential adoption of this technology.
@TanuHoque your named was mentioned on a call earlier as an SME on this topic.
I'm not going to pretend to be a DBA or answer things on that line. But in days of old when we had issues like this, we'd publish the service with a minimum number of instances of 0 and a max of X. Clearly before the days of shared services, so you would need dedicated instances now. Once it's published, then you go into server manager and increase the min instances to the value you need.
It was a bit hacky, but got around DB issues. Not promising it will work, but give it a try and see if you can at least publish the service?
Good idea.
That does allow the service to start, with 0 instances running. When I go to the service in the rest/services directory, the first instance starts up and we get the "select *" queries failing.
DEBUG | May 17, 2023, 8:12:16 AM | Shutting down context for service: esriViews_v1.MapServer | Server |
SEVERE | May 17, 2023, 8:12:16 AM | Failed to construct instance of service 'esriViews_v1.MapServer'. Service startup timeout. | Server |
DEBUG | May 17, 2023, 8:07:38 AM | EndCursor;ExecuteQuery 7 ms;Database: ip-10-132-64-159.ec2.internal;SQL: SELECT * FROM maps_db.public.vw_all_geoaddresses_layer_v1;Number of rows returned: 0 | esriViews_v1.MapServer |
DEBUG | May 17, 2023, 8:07:18 AM | GetAllDatasets. | esriViews_v1.MapServer |
This is probably DBA territory. I once saw something similar, and the DBA had to do an awful amount of tuning and optimisation to get the query to perform well enough for ArcGIS to use it. Ultimately "in that situation" all of the issues were Database related and not ArcGIS Server. Once the database was tuned, the server worked beautifully. Sorry, I need to step away on that note.
Sorry for the delay. I just tried with a dataset that has about 65 million records. My query that used to define the query layer is a simple one i.e. "Select * from aTable".
And it works fine for me. Map service gets started without any delay or anything.
I have a question though:
Can you pls do this for me?
if they all look good and you are still running into the performance issue, I'd need you to reach out to Esri Support to help us reproduce this in house.
While we will investigating that, I think I might be able to give your a hack-y workaround 😄 since you mentioned about adding the following where-clause give you a trick to work around the initial startup... and of course you can use that as you run into this issue every time map service recycles.
where customer_id = 123
... of course whether the workaround will be acceptable in your situation will be a different question by itself. 🙂
Thank you. I do see those properties populated correctly. "spatialStorageType": 15 only difference.
The only weird thing I see, which I also noticed in server logs, is the dataConnection.dataset property has an incorrect "%" in there. "dataset": "maps_db.public.%vw_all_geoaddresses_layer_v1",
The "select * " forced us to change the data model quite a bit, removing some postgres foreign database dependencies which were causing the crash. Not optimal. Would be great if we could somehow not run this select * query for general stability reasons, that would make the overall architecture much cleaner.
Thanks @RoyJackson_AlertMedia
That looks like a bug. As I said I was not able to repro this but mine was a very simple use case. I guess this bug must be very specific to your workflow. I'd strongly recommend you to reach out to Esri Support and have an analyst look at this and if needed they will log a bug.
On % sign in the name, long story short this is expected.
While you will be waiting on an analyst took at this, I'd like you to try this workaround. For that we will use discrete parameter.
the basic idea is to use a sql that won't return any records to define the query layer and at run time we will overwrite that parameter value to return results via map service export/query operation.
the sql will look like this
SELECT <.....> from aTable
WHERE 1 = ::aValue
then set the default value for this parameter 0
That makes the query to return no records. Therefore, in the next page of query layer creation dialog, you need to specify geometry type, srs etc. yourself.
once published, you need to update the value for this query parameter with 1 to make the where clause be like 1=1... and then you can add your other clauses with AND or OR operator...
In map service export call you need to use layerParameterValues parameter. https://developers.arcgis.com/rest/services-reference/enterprise/export-map.htm
for query operation use parameterValues parameter. https://developers.arcgis.com/rest/services-reference/enterprise/query-map-service-layer-.htm
If you use map viewer, you can set these values by using custom parameters option.
This is very intriguing, experimenting with it now.
Is it possibly to configure these custom discrete parameters on querylayer using arcpy?
I'm not 100% sure on that. Since we allow accessing the full CIM definition for a layer via arcpy, maybe we should be able to do something.
What is your workflow where you want to use arcpy?
thanks
It looks like I can only specify the parameterValues / layerParameter values for MapService, not FeatureService. Is that correct? (we need feature access in this example)