Esri Specs:
ArcGIS Server 11.1, ArcGIS Pro 3.1
PostGIS running on AWS Aurora. Not a managed geodatabase. Database views over tables; the tables have geography columns, and some foreign data wrapper (FDW) foreign tables are joined in, pointing to other systems for the per-user permissions content.
We have a unique relational data model, where the features in layers change based on the user viewing them. The primary layers are groups, users, and devices. A group has users, a user has devices. Data is segmented by customer_id.
There are 2 types of users - full administrators can see everything for their customer_id. Restricted administrators can only see certain records, based on dynamic/realtime membership in groups for their customer_id.
No user can see data for all customer_ids.
We have 2 sets of database views; both sets are valid ONLY when a WHERE clause restricts the data returned:
Full admin views - must include the filter WHERE customer_id = 123
Restricted admin views - must include the filter WHERE customer_id = 123 AND admin_id = 456
Querying these views without a WHERE clause requests an enormous amount of data, beyond the capacity of the systems to deliver it. When we publish services whose query layers point at these views, the service times out and will not start up.
For each layer in the service definition, when the service starts, Esri asks PostGIS for "select * from massiveview". For these views, that query is impossible to satisfy.
One workaround has been a view on top of a view: before publishing the service, we create a view
CREATE VIEW vw_all_addresses_layer_v1 as
(
select id, name, description
from all_addresses_layer
where customer_id = 123
);
and then, after the service is published and started, we recreate the view without the WHERE clause
DROP VIEW IF EXISTS vw_all_addresses_layer_v1;
CREATE VIEW vw_all_addresses_layer_v1 as
(
select id, name, description
from all_addresses_layer
);
By doing this, we are able to have the service start up, because the select * query is reasonable.
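The view-swap workaround above can be scripted so it runs as part of the publish pipeline rather than by hand. The sketch below is a minimal illustration, not our production code: the view, base table, and column names come from the example above, and the psycopg2 execution step at the bottom is an assumed, commented-out flow.

```python
# Sketch of automating the publish-time view swap described above.
# View/column names come from the example in the post; the database
# execution step (psycopg2) is an assumption, shown commented out.

VIEW = "vw_all_addresses_layer_v1"
BASE = "all_addresses_layer"
COLS = "id, name, description"


def filtered_view_ddl(customer_id: int) -> str:
    """DDL applied *before* publishing, so the startup
    'select * from view' only scans one customer's rows."""
    return (
        f"CREATE OR REPLACE VIEW {VIEW} AS "
        f"SELECT {COLS} FROM {BASE} "
        f"WHERE customer_id = {int(customer_id)};"
    )


def unfiltered_view_ddl() -> str:
    """DDL applied *after* the service has started: drop the filter so
    the API tier can attach its own WHERE clause at query time."""
    return (
        f"DROP VIEW IF EXISTS {VIEW}; "
        f"CREATE VIEW {VIEW} AS SELECT {COLS} FROM {BASE};"
    )


if __name__ == "__main__":
    # Hypothetical execution, e.g. with psycopg2:
    # import psycopg2
    # with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
    #     cur.execute(filtered_view_ddl(123))
    #     ...publish and start the service...
    #     cur.execute(unfiltered_view_ddl())
    print(filtered_view_ddl(123))
```

The downside, of course, is the one described below: the swap must be replayed (filter on, restart, filter off) any time the service recycles.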
After publishing, if we query the service with the proper filters, it responds quickly. However, we can never stop and start (or recycle) this service, because it would time out under the massive amount of data.
Why are "select * from queryLayer" queries being executed?
How can we prevent these "select * from queryLayer" queries from being executed?
When are they executed during a service's lifecycle (upon start, upon uploading a service definition, upon recycling)? When more instances spin up, or just the first instance?
Is there data captured from these select * requests and stored in a file somewhere that we could pre-populate with content, so that the query is not executed during service startup? Any count captured by a select * would not be valid anyway.
Why do we see queries like this: SELECT * FROM sde.sde_layers LIMIT 0 when we are not using SDE? If we had this SDE table, would the "select *" queries still occur?
Last question - does ArcGIS Server have a preference for geometry vs. geography as the spatial data type? We see requests like this for tables with geography data: SELECT ST_GEOMETRYTYPE(pin::GEOMETRY), ST_SRID(pin) FROM public.vw_all_geoaddresses_layer_esri WHERE pin is not null limit 1;
Thank you, this is a big bottleneck in our potential adoption of this technology.
@TanuHoque your name was mentioned on a call earlier as an SME on this topic.
yes, that is correct.
that said, I thought you can't enable Feature Access when a database view is used as the source of a query layer.... An analyzer error should block you from publishing such a layer as an Enterprise by-ref feature service.
So, do you have some editing workflow with this type of layer?
We are exposing customer-specific addresses and mobile devices on a map. No user can ever see the whole database; that would result in giant fines for revealing personally identifiable information. Depending on who you are within the customer, you also see different information.
We need to use a feature service so that we can interact with these pins using the client-side geometry engine for geofencing and dynamic distance calculations.
An API server sits between the web mapping client and ArcGIS Server and determines which features to display (either via a customer identifier extracted from the OAuth token, or via a list of allowable IDs computed by DB lookups / calls to another API). This API constructs the WHERE clause that performs the appropriate filtering and passes it to ArcGIS Server.
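To make the middle-tier filtering concrete, here is a minimal sketch of how such an API might build the mandatory filter. The function name, parameters, and the query URL in the trailing comment are hypothetical illustrations, not our actual API; the two filter shapes match the full-admin and restricted-admin views described earlier.

```python
# Hypothetical sketch of the middle-tier filter construction.
# The filter shapes match the two view types in the post:
#   full admin:       customer_id = 123
#   restricted admin: customer_id = 123 AND admin_id = 456
from typing import Optional, Sequence


def build_where(customer_id: int,
                admin_id: Optional[int] = None,
                allowed_ids: Optional[Sequence[int]] = None) -> str:
    """Return the WHERE clause the API always attaches before
    forwarding a query to ArcGIS Server."""
    clauses = [f"customer_id = {int(customer_id)}"]
    if admin_id is not None:
        # Restricted administrator: also scope by admin membership.
        clauses.append(f"admin_id = {int(admin_id)}")
    if allowed_ids:
        # Optional extra narrowing to ids computed by DB/API lookups.
        ids = ",".join(str(int(i)) for i in allowed_ids)
        clauses.append(f"id IN ({ids})")
    return " AND ".join(clauses)


# The clause is then passed as the standard `where` parameter of a
# feature service query, e.g.:
#   GET .../FeatureServer/0/query?where=<clause>&outFields=*&f=json
```

For example, `build_where(123, admin_id=456)` yields `customer_id = 123 AND admin_id = 456`, the restricted-admin filter from the post. Casting the ids through `int()` is a cheap guard against injection when values originate from tokens or lookups.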
The publishing workflow: we need to publish these services programmatically to a slew of environments (testing, staging, load testing, prod1, prod2, etc.) via CI/CD, so anything a service needs can't depend on someone clicking around in ArcGIS Pro. Ideally we would create one service definition with Pro and deploy it to different ArcGIS Servers, repointing the definition at different databases via scripts running on the server instance (not ArcGIS Pro, since headless processing would be something custom to implement, if I understand correctly).
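For the headless deployment step, one possible flow (an assumption on our side, not a confirmed recipe) is to drive the ArcGIS Server Admin API directly: generate a token, upload the .sd, then submit it to the built-in Publish Service Definition geoprocessing task. The host name, port, and parameter names below are placeholders to be checked against the Admin API docs.

```python
# Hedged sketch of a headless publish flow against the ArcGIS Server
# Admin API. Host/port are placeholders; the endpoint paths and the
# "in_sdp_id" parameter name should be verified against the docs.


def admin_urls(host: str) -> dict:
    """Build the admin endpoints used to upload and publish an .sd
    without ArcGIS Pro in the loop."""
    base = f"https://{host}:6443/arcgis"
    return {
        "token":   f"{base}/admin/generateToken",
        "upload":  f"{base}/admin/uploads/upload",
        # GP task that turns an uploaded .sd into a running service:
        "publish": (f"{base}/rest/services/System/PublishingTools/"
                    "GPServer/Publish%20Service%20Definition/submitJob"),
    }


# Assumed flow (e.g. with the `requests` library):
#   1. POST credentials to urls["token"]            -> token
#   2. POST the .sd file to urls["upload"]          -> itemID
#   3. POST {"in_sdp_id": itemID, "f": "json"} to urls["publish"]
#   4. Poll the returned job until it succeeds.
```

Repointing the same .sd at a different database per environment is the part we have not solved cleanly; that is where the CIM experiments below came in.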
I looked at and played a bit with the CIM object in Python; it didn't seem possible to make this configuration. It barked that the programmatic changes relevant to this topic were not valid.
Thank you.