
Geodatabase vs. database: effective differences in ArcGIS Enterprise

02-23-2022 04:32 PM
Jay_Gregory
Frequent Contributor

I'm curious whether anyone can tell me the capability differences between a feature service or map service based on a feature class in an enterprise geodatabase vs. one based on a feature class stored in a traditional database.

Let's say I have two PostgreSQL databases: one is an Esri "enterprise geodatabase" and the other is a plain out-of-the-box PostgreSQL database. Each database has a table in it: the enterprise geodatabase has a point feature class, and the regular database has a table with latitude and longitude fields. I can publish map/feature services from each of these; there are a couple of extra steps when publishing from the table in the regular database (specifying the coordinate system and the x,y fields), but both result in a map and feature service. Is there any difference between the services?
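For concreteness, the two storage patterns look roughly like this in SQL (hypothetical table and column names; an enterprise geodatabase adds its registration tables on top, but the feature class still rests on a geometry column like the first one):

    -- Pattern 1: native PostGIS geometry column (what the feature class uses)
    CREATE TABLE points_gdb (
        objectid integer PRIMARY KEY,
        name     text,
        shape    geometry(Point, 4326)
    );

    -- Pattern 2: plain table with scalar coordinate columns
    CREATE TABLE points_plain (
        id        integer PRIMARY KEY,
        name      text,
        latitude  double precision,
        longitude double precision
    );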

In general, what functionality might I lose by hosting data in a regular database rather than an enterprise geodatabase, for the purposes of publishing map and feature services?

9 Replies
VinceAngelo
Esri Esteemed Contributor

The difference in this case isn't "enterprise geodatabase" vs. "regular database"; it's "has a spatial index" vs. "doesn't have a spatial index." IMHO, you should always use a native geometry type rather than constructing geometry on the fly with each query (with the usual low-row-count caveat, but if you only have a thousand features, why bother with a database?). A GiST index is going to outperform a (float,float) index, and it can be leveraged in ST_DWithin queries.
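A minimal sketch of that contrast, reusing the hypothetical tables from the question (assumes PostGIS; distances are in degrees for an SRID 4326 geometry):

    -- GiST index on the native geometry column:
    CREATE INDEX points_gdb_shape_gist ON points_gdb USING gist (shape);

    -- ST_DWithin can leverage that index:
    SELECT objectid, name
    FROM points_gdb
    WHERE ST_DWithin(shape, ST_SetSRID(ST_MakePoint(-104.99, 39.74), 4326), 0.05);

    -- The (float,float) alternative must rebuild the geometry on every row,
    -- so the planner cannot use a spatial index:
    SELECT id, name
    FROM points_plain
    WHERE ST_DWithin(ST_SetSRID(ST_MakePoint(longitude, latitude), 4326),
                     ST_SetSRID(ST_MakePoint(-104.99, 39.74), 4326), 0.05);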

Once you get to "enterprise geodatabase" vs. "non-enabled database", the difference is more subtle, mostly due to not having an extent property stored in metadata.

- V

Jay_Gregory
Frequent Contributor

@VinceAngelo Thank you for the response. But we can use native RDBMS geometry types for tables, and ensure those columns are indexed, without "geoenabling" the database, correct? In part I'm playing devil's advocate here, taking the part of a DBA hesitant to convert databases to Esri's enterprise geodatabases (because the DBAs I'm working with are very unfamiliar with ArcGIS Enterprise).

But what I want is to come back with a list of the benefits of using enterprise geodatabases, for the purposes of both a) our consumers (who will be accessing the data through Server's web services) and b) the DBA.

On the part of the DBA, the justification is more "you don't have to learn how to deal with native geometry types, and that eases your management role." But I'm curious whether there are tangible benefits to the consumers of the web services. You mention performance, which makes sense if there is no spatial index and you're performing a lot of spatial queries. Anything else you can think of?

VinceAngelo
Esri Esteemed Contributor

ArcGIS tools can read and write to PostgreSQL databases natively; no additional client library/configuration is required (unlike all the other supported databases). I just finished a seven-year stint as a PG DBA with a group that didn't want to use enterprise geodatabase functionality, but did want hundreds of millions of rows of spatial data in ArcGIS Server services. This worked fine on a query basis. All the editing was done through SQL, or by uploading file geodatabase snapshots via FeatureClassToGeodatabase and then using SQL to drive updates. If you are using ArcGIS Enterprise to render the data, the difference between QueryLayer layers and EGDB FeatureClass layers is pretty subtle. But if you try to use an UpdateCursor or edit through ArcGIS on a non-enabled database, and you're used to an EGDB, you are not likely to enjoy it.
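A hedged sketch of that SQL-driven editing pattern, with hypothetical names (snapshot_upload standing in for a table loaded from a file geodatabase snapshot):

    -- Single-feature correction, done directly in SQL:
    UPDATE points_gdb
    SET shape = ST_SetSRID(ST_MakePoint(-105.02, 39.75), 4326)
    WHERE objectid = 42;

    -- Set-based bulk update driven from an uploaded snapshot table:
    UPDATE points_gdb p
    SET shape = s.shape
    FROM snapshot_upload s
    WHERE p.objectid = s.objectid;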

I'm not going to argue that an EGDB isn't useful, just that it isn't required in all situations; you need to evaluate your own. I've been using enterprise geodatabases since SDBE 1.2.1, and I've been a GIS DBA for over 33 years, so my view is somewhat unique. One colleague once commented that I could run every version of SDE from 2.0 to 9.3 simultaneously on a toaster oven (but that wasn't true -- the toaster oven didn't have enough RAM for more than four active database instances). It all depends on what the application demands [and, I should add, what the DBA is capable of supporting -- sometimes that's just a bit of training away; others can grow into it on the job, working with geospatial types].

- V

Jay_Gregory
Frequent Contributor

@VinceAngelo - thanks for this thoughtful response - it's along the lines of what I assumed, but glad to have a PG DBA's validation. Sounds like "not a lot of difference if you're just querying vector-based data."

Jay_Gregory
Frequent Contributor

@VinceAngelo Looping back around to this: I've been trying to performance-test queries against ArcGIS services built on different types of "layers", to demonstrate that we should be storing spatial data with native spatial types. In all cases I have 500k records with the same points, with lat/longs generated randomly, roughly over the contiguous US.

1. Native-geometry point feature class in an Esri enterprise geodatabase (on top of PostGIS)

2. Query layer on the lat/long columns of a non-spatial table (the table also lives in the enterprise geodatabase, because that's all I have access to currently)

3. Database view on the same table as 2. above

In the case of the database view and the query layer, both were created using SELECT *, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326) FROM randomtable.
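Spelled out, that view definition would be something like this (hypothetical view name):

    CREATE VIEW v_randomtable AS
    SELECT *,
           ST_SetSRID(ST_MakePoint(longitude, latitude), 4326) AS shape
    FROM randomtable;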

I have three map services published, one on each of those layers, and I have been performing 10 spatial queries with different geometries against each, taking the average API response time for each service.

At 500k records, the differences are in the hundredths or thousandths of a second -- IMO, too small to matter. Are the real differences, as you indicated, going to appear with far more records than 500k, or am I going about this incorrectly?

Thank you!
VinceAngelo
Esri Esteemed Contributor

Sorry, you won't ever get my buy-in on the idea that geometry types are unnecessary.

Storing geometry in multiple scalar columns is a wasted opportunity, and a properly tuned database, even at 500k rows, should be significantly faster than a "make geometry on the fly" solution, so your test methodology is likely faulty. Randomly distributed features are a known worst case, so you're stacking the deck in the evaluation. You're also testing at the end of a long common pipeline, so the significance of the difference can be masked.

I just did a quick data load of 500k and 2M random global features (weighted by area, so weighted away from the poles -- see this StackExchange post for the function), created views, then added an optimized table derived from the first by doing an ORDER BY on floor((lon + 180) / 6.0)::int, lat DESC and adding an extra integer column, "oldid".
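A sketch of how that optimized table could be built (the exact DDL isn't shown, so the names are inferred from the queries below):

    CREATE TABLE example1b AS
    SELECT idcol AS oldid, lat, lon, geomcol
    FROM example1a
    ORDER BY floor((lon + 180) / 6.0)::int, lat DESC;

    CREATE INDEX example1b_geomcol_gist ON example1b USING gist (geomcol);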

On my home PC (6 years old, 3GHz, 6 cores, 24GB RAM), using a PG 11.8 database, I got the following results (all times measured in pgAdmin):

query:
SELECT count(geomcol)
FROM table
WHERE ST_Within(geomcol,
      ST_GeomFromText('POLYGON((-120 40, -80 40, -80 70, -120 70, -120 40))', 4326))

table        rows     returned count  time (ms)
example1a    500000   8212            119
v_example1a  500000   8212            250
example1b    500000   8212            80
example2a    2000000  32725           126
v_example2a  2000000  32725           613
example2b    2000000  32725           100

query:
SELECT idcol, lat, lon, geomcol
FROM table
WHERE ST_Within(geomcol,
      ST_GeomFromText('POLYGON((80 40, 120 40, 120 70, 80 70, 80 40))', 4326))

table        rows     returned count  time (ms)
example1a    500000   8291            119
v_example1a  500000   8291            323
example1b    500000   8291            141
example2a    2000000  33401           348
v_example2a  2000000  33401           738
example2b    2000000  33401           315

When I changed the search geometry and reversed the query order (v_example2a, example2a, example2b), the view took an additional 100ms over the table, so the example1a query was helping to cache the pages for the view (I should have made 3 tables).

I used to have low-level query performance tools, but the ArcSDE API was deprecated, and those tools haven't been ported to Python yet (too busy). Still, we can see that native geometry is clearly better than "on-the-fly" generation, and that the slightest bit of defragmentation (6-degree bands) often improved performance (though not in human perception with so few rows involved).
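Lacking those tools, one way to verify the same thing is EXPLAIN ANALYZE, which reports the chosen plan and the actual execution time (table name from the examples above):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(geomcol)
    FROM example1b
    WHERE ST_Within(geomcol,
          ST_GeomFromText('POLYGON((-120 40, -80 40, -80 70, -120 70, -120 40))', 4326));
    -- An index or bitmap scan on the GiST index confirms it is being used;
    -- a Seq Scan means the index is missing or the filter isn't selective.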

- V

Jay_Gregory
Frequent Contributor

Thank you Vince - I am on board 100%. I always took for granted that native geometry types are required. My challenge is convincing other people (namely DBAs who have little to no spatial experience) of the same. They store polygons as GeoJSON representations in a string field. It makes zero sense. And the GeoJSON doesn't even conform to spec. I'm just trying to make the case for point data with lat/long columns, since, as I mentioned, I can create a query layer or database view, publish a feature service on top of it, and get access to the same geospatial queries. Unfortunately I was raised on Esri geodatabases, so dealing with spatial indices, query performance, and the like outside of that environment is somewhat opaque to me.
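For what it's worth, a hedged sketch of migrating that GeoJSON-in-a-string pattern to a real geometry column (hypothetical table and column names; non-spec GeoJSON will make ST_GeomFromGeoJSON raise an error, which at least surfaces the bad rows):

    -- Generic geometry typmod, since the strings may mix Polygon and MultiPolygon:
    ALTER TABLE parcels ADD COLUMN shape geometry(Geometry, 4326);

    UPDATE parcels
    SET shape = ST_SetSRID(ST_GeomFromGeoJSON(geojson_text), 4326)
    WHERE geojson_text IS NOT NULL;

    CREATE INDEX parcels_shape_gist ON parcels USING gist (shape);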

I'll try pgAdmin directly against my three tables and see if query performance is different. Thanks!

George_Thompson
Esri Notable Contributor

Are you using any advanced geodatabase functionality (topology / relationship classes / networks / versioning - branch or traditional / etc.)?

Also, you cannot edit data in databases directly within ArcGIS Pro, if that is something you need. Here is a great link that has a lot of this information: https://pro.arcgis.com/en/pro-app/latest/help/data/databases/databases-and-arcgis.htm

Hope this helps!

--- George T.
Bud
Esteemed Contributor

How To: Set up an Oracle DBMS spatial table for editing via a feature service

I wonder if something similar could be done in PostgreSQL.
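Speculatively, the PostgreSQL analogue of that Oracle setup would be a table with a sequence-backed unique integer ID and an indexed geometry column (hypothetical names; whether feature-service editing works on top of it depends on the ArcGIS release, so treat this as a sketch):

    CREATE TABLE ed_points (
        objectid serial PRIMARY KEY,   -- sequence-backed, like the Oracle OBJECTID
        name     text,
        shape    geometry(Point, 4326)
    );

    CREATE INDEX ed_points_shape_gist ON ed_points USING gist (shape);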
