Geodatabase vs database effective differences in ArcGIS Enterprise

Jay_Gregory · ‎02-23-2022

I'm curious if anyone can tell me the capability differences between a feature service or map service based on a feature class in an enterprise geodatabase vs from a feature class stored in an traditional database.

Lets say I have two PostgreSQL databases: one is an Esri "enterprise geodatabase" and one is just an PostgreSQL out of the box database. Each database has a table in it - the enterprise geodatabase has a point feature class, the regular database has a table with a latitude and longitude field. I can publish map / feature services from each of these - there are a couple different steps to publishing off the table in the regular database (specify coordinate system and x,y fields) but they both result in a map and feature service. Is there any difference between the services?

In general, what functionality might I lose by hosting data in a regular database vs enterprise geodatabase for the purposes of publishing map and feature services?

VinceAngelo · ‎02-24-2022

ArcGIS tools can read and write to PostgreSQL databases natively. No additional client library/configuration is required (unlike all the other supported databases). I just finished a seven year stint as a PG DBA with a group that didn't want to use enterprise geodatabase functionality, but did want hundreds of millions of rows of spatial data in ArcGIS Server services. This worked fine on a query basis. All the editing was done through SQL, or by uploading file geodatabase snapshots via FeatureClassToGeodatabase, then using SQL to drive updates. If you are using ArcGIS Enterprise to render the data, the difference between QueryLayer layers and EGDB FeatureClass layers is pretty subtle. But if you try to use an UpdateCursor or edit through ArcGIS on a non-enabled database, and you're used to an EGDB, you are not likely to enjoy it.

I'm not going to argue that an EGDB isn't useful, just that it isn't required in all situations. You need to evaluate your situation. I've been using enterprise geodatabase since SDBE 1.2.1, and been a GIS DBA for over 33 years, so my view is somewhat unique. One colleague once commented that I could run every version of SDE from 2.0 to 9.3 simultaneously, on a toaster oven (but that wasn't true -- the toaster oven didn't have enough RAM for more than four active database instances). It all depends on what the application demands require [and, I should add, what the DBA is capable of supporting -- Sometimes this is just a bit of training away, others can grow into it on the job, working with geospatial types.]

- V

View solution in original post

Jay_Gregory · ‎02-24-2022

@VinceAngelo - thanks for this thoughtful response - it's along the lines of what I assumed but glad to have a PG DBA's validation. Sounds like "not a lot of differences if you're just querying vector based data."

View solution in original post

VinceAngelo · ‎02-23-2022

The difference in this case isn't "enterprise geodatabase" vs. "regular database", it's "has a spatial index" vs. "doesn't have a spatial index." IMHO, you should always use a native geometry type before constructing geometry on the fly with each query (with the usual low row count caveat, but if you only have a thousand features, why bother with a database?). A GIST index is going to outperform a (float,float) index, and can be leveraged in ST_DWithin queries.

Once you hit the "enterprise geodatabase" vs "non-enabled database", then the difference is more subtle, mostly due to not having an extent property stored in metadata.

- V

Jay_Gregory · ‎02-24-2022

@VinceAngelo Thank you for the response. But we can use native RDMS geometry types for tables, and ensure those columns are indexed without "geoenabling" the database correct? In part I'm playing devils advocate here, the part of a DBA hesitant to convert databases to Esri's enterprise geodatabases (because the DBAs I'm working with are very unfamiliar with ArcGIS Enterprise).

But what I want is to come back and give a list of benefits of using enterprise geodatabases for the purposes of both a) our consumers (who will be accessing the data through Server's web services and b) the DBA.

On the part of the DBA, the justification is more "you don't have to learn how to deal with native geometry types and in part this eases your management role." But I'm curious if there are tangible benefits to the consumers of web services. You mention performance, which makes sense if there is no spatial index and you're performing a lot of spatial queries. Anything else you can think of?

VinceAngelo · ‎02-24-2022

ArcGIS tools can read and write to PostgreSQL databases natively. No additional client library/configuration is required (unlike all the other supported databases). I just finished a seven year stint as a PG DBA with a group that didn't want to use enterprise geodatabase functionality, but did want hundreds of millions of rows of spatial data in ArcGIS Server services. This worked fine on a query basis. All the editing was done through SQL, or by uploading file geodatabase snapshots via FeatureClassToGeodatabase, then using SQL to drive updates. If you are using ArcGIS Enterprise to render the data, the difference between QueryLayer layers and EGDB FeatureClass layers is pretty subtle. But if you try to use an UpdateCursor or edit through ArcGIS on a non-enabled database, and you're used to an EGDB, you are not likely to enjoy it.

I'm not going to argue that an EGDB isn't useful, just that it isn't required in all situations. You need to evaluate your situation. I've been using enterprise geodatabase since SDBE 1.2.1, and been a GIS DBA for over 33 years, so my view is somewhat unique. One colleague once commented that I could run every version of SDE from 2.0 to 9.3 simultaneously, on a toaster oven (but that wasn't true -- the toaster oven didn't have enough RAM for more than four active database instances). It all depends on what the application demands require [and, I should add, what the DBA is capable of supporting -- Sometimes this is just a bit of training away, others can grow into it on the job, working with geospatial types.]

- V

Jay_Gregory · ‎02-24-2022

@VinceAngelo - thanks for this thoughtful response - it's along the lines of what I assumed but glad to have a PG DBA's validation. Sounds like "not a lot of differences if you're just querying vector based data."

Jay_Gregory · ‎08-15-2022

@VinceAngelo Looping back around to this, I've been trying to performance test queries on ArcGIS services built on different types of "layers" to demonstrate we should be storing spatial data with native spatial types. In all cases, I have 500k records with the same points, lat / longs generated randomly roughly over the contiguous US.

1. Native geometry in esri enterprise geodatabase (on top of postGIS) point feature class

2. Query layer on lat / long column from non-spatial table with lat / long columns (table also exists in enterprise geodatabase though because that's all I have access to currently)

3. Database view on same table as 2. above

In the case of the database view and query layer, they were created using SELECT *, ST_SetSrid(ST_MakePoint(longitude, latitude),4326) FROM randomtable.

I have three MapServices published on each of those layers, and have been performing 10 spatial queries with different geometries on each, and taking the average response time of the API for each service.

At 500k records, the differences are in the 100ths or 1000ths of a second, IMO too small. Are the real differences as you indicated going to appear with way more records than 500k, or am I going about this incorrectly?

Thank you!

VinceAngelo · ‎08-16-2022

Sorry, you won't ever get my buy-in on the idea that geometry types are unnecessary.

Storing geometry in multiple scalar columns is a wasted opportunity, and a properly tuned database, even at 500k rows should significantly faster than a "make geometry on the fly" solution, so your test methodology is likely faulty. Randomly distributed features is a known worst case, so you're stacking the deck in the evaluation. You're also testing at the end of a long common pipeline, so the significance of the difference can be masked.

I just did a quick data load of 500k and 2m random global features (by area, so weighted away from poles -- see this StackExchange post for the function), created views, then added an optimized table which was derived from the first by doing an ORDER BY on floor((lon + 180) / 6.0)::int, lat DESC adding an extra integer column of "oldid".

On my home PC (6 years old, 3Ghz 6-cores, 24Gb RAM), using a PG11.8 database, I got the following results (all times in pgAdmin)

query:
SELECT count(geomcol) FROM table
WHERE ST_Within(geomcol,
ST_GeomFromText('POLYGON (( -120 40, -80 40, -80 70, -120 70, -120 40))',4326))

table	rows	returned count	time (in milliseconds)
example1a	500000	8212	119ms
v_example1a	500000	8212	250ms
example1b	500000	8212	80ms
example2a	2000000	32725	126ms
v_example2a	2000000	32725	613ms
example2b	2000000	32725	100ms

query:
SELECT idcol,lat,lon,geomcol
FROM table
WHERE ST_Within(geomcol,
ST_GeomFromText('POLYGON (( 80 40, 120 40, 120 70, 80 70,80 40))',4326))

table	rows	returned count	time (in milliseconds)
example1a	500000	8291	119ms
v_example1a	500000	8291	323ms
example1b	500000	8291	141ms
example2a	2000000	33401	348ms
v_example2a	2000000	33401	738ms
example2b	2000000	33401	315ms

When I changed the search geometry and reversed the query order (v_example2a, example2a, example2b), the view took an additional 100ms over the table, so the example1a query was helping to cache the pages for the view (I should have made 3 tables).

I used to have low-level query performance tools, but the ArcSDE API was deprecated, and those tools haven't been ported to Python yet (too busy). Still we can see that the geometry is clearly better than "on-the-fly" generation, and that the slightest bit of defragmentation (6 degree bands) often improved performance (though not in human perception with so few rows involved).

- V

Jay_Gregory · ‎08-16-2022

Thank you Vince - I am onboard 100%. I always took for granted that native geometry types are required. My challenge is convincing other people (namely DBAs who have little to no spatial experience) of the same. They store polygons as geojson representations in a string field. It makes zero sense. And the geojson doesn't even conform to spec. I'm just trying to make a case for point data with a lat / long column, since like I mentioned I can create a query layer or database view, publish a feature service on top of it and get access to the same geospatial queries. Unfortunately I was raised on esri geodatabases, so dealing with spatial indices, query performance, and the like outside of that environment is somewhat opaque to me.

I'll try PGAdmin directly against my three tables and see if query performance is different. Thanks!

George_Thompson · ‎02-24-2022

Are you using any advanced geodatabase functionality (topology / relationship classes / networks / versioning - branch or traditional / etc.)?

You also cannot edit data in databases directly within ArcGIS Pro, if needed. Here is a great link that has alot of this information: https://pro.arcgis.com/en/pro-app/latest/help/data/databases/databases-and-arcgis.htm

Hope this helps!

--- George T.

Bud · ‎04-04-2022

How To: Set up an Oracle DBMS spatial table for editing via a feature service

I wonder if something similar could be done in PostgreSQL.