We have a few simple but large layers (about 1M features each) that we would like to display and use in Dashboards with some statistics.
We can save them to the hosted database (which is really Postgres under the hood) or build our own Postgres database.
There are advantages and disadvantages to both options.
Do you think the performance would be different?
Do you think Esri builds the Postgres behind the hosted option in a different way than a locally installed Postgres?
Thanks
It depends on the type of work you are going to do with the database, on your current connection to it, and thus on what your expectations are. E.g. do you just want to do an occasional update of a limited number of records and/or view the data, or do you need fast, low-latency access because you are going to rewrite millions of records on a regular basis?
Remote databases can have a very large latency (the time to do a round trip from your local machine to the server and back), which can kill performance if you need to fire thousands or even millions of requests at it. If you run a remote database and need low-latency access, you may need a virtual machine running ArcGIS Pro in the same data center or region, so as to avoid the cost of the round trip.
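Just as a rough illustration of what latency alone can cost (assuming a 100 ms round trip and one record per request, purely hypothetical numbers):

1,000,000 requests x 0.1 s = 100,000 s, or roughly 28 hours spent waiting on the network alone, before the database has done any actual work. That is why batching records into fewer, larger requests, or moving the client next to the server, matters so much.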
E.g. I personally run a local PostgreSQL database filled with the entire planet's worth of OpenStreetMap data on a professional dual-CPU 44-core HP Z840 workstation. In the most outrageous configuration, I filled it with Facebook's now defunct "Daylight distribution" of OpenStreetMap, which integrated an additional >1 billion buildings from Google's "Open Buildings" dataset, causing the largest table to exceed 2.4 billion records. Yes, you read that right, billions!
I subsequently process and update hundreds of millions of records in derived tables. I certainly wouldn't want to do this against a remote database with latencies running in the hundreds of milliseconds. It would take ages. Running this all locally against PCIe NVMe drives, I am capable of updating hundreds of millions of records per hour from ArcGIS Pro (although batched in sets of tens of thousands to reduce the number of requests). This would be unthinkable against a remote database, unless, as suggested above, running a VM with Pro in the same data center as the remote database.
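To illustrate the batching idea in plain SQL (in reality I drive this from ArcGIS Pro, and the table, column and value names below are purely hypothetical): instead of firing one UPDATE per record, you stage a batch of new values in bulk and apply them in a single statement, i.e. a single round trip.

-- stage a batch of new values (hypothetical names); in practice this INSERT carries
-- tens of thousands of rows per statement
CREATE TEMP TABLE staged_values (objectid bigint PRIMARY KEY, new_value text);
INSERT INTO staged_values (objectid, new_value) VALUES
  (101, 'foo'),
  (102, 'bar');  -- ... thousands more rows per batch

-- apply the whole batch to the derived table in one UPDATE, i.e. one round trip
UPDATE derived_table AS d
SET    some_attribute = s.new_value
FROM   staged_values AS s
WHERE  d.objectid = s.objectid;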
Finally, running a local database, although a burden, can teach you a lot about the proper configuration and running of a PostgreSQL database and server, knowledge that you may not gain with a remote database and its fully pre-installed configuration (which may or may not suit your use case or be adjustable to your taste or needs).
Hi Marco
Thanks for the information, but our configuration is a little different.
Both the hosted database and the standalone Postgres are in the same network segment and have identical network connectivity.
The question is really: did Esri do any tricks to make the hosted database better than just a standard installation of Postgres?
Thanks
I am not sure what you mean. If the hosted database is based on PostgreSQL, there is no "magic spell" to make it better than any other PostgreSQL installation.
In the distant past, ArcSDE - one of the backbones of ESRI database technology - supported 'RAW' spatial storage, which, due to being directly convertible to the ESRI internal geometry representation in applications, was generally regarded as a more performant option than a database's 'native' geometry implementation for ESRI software (although ESRI never really made much noise about this or admitted to it). However, 'RAW' has since been abandoned in favor of 'native' or ESRI ST_Geometry storage in all databases AFAIK.
If, with your remark, you are just referring to PostgreSQL configuration as done via e.g. the 'postgresql.conf' file, I suggest you start delving into the wonderful world of PostgreSQL configuration and read up on the subject. There is definitely stuff you need to adjust in there, as the default settings are not suited to large (Post-)GIS databases, but are just the absolute minimum to get a server running on minimal hardware.
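A painless first step, before touching postgresql.conf at all, is simply checking what your server is currently running with. The pg_settings view is standard PostgreSQL, so you can run this from any SQL client:

-- inspect a few of the settings that matter most for a large (Post-)GIS database
SELECT name, setting, unit
FROM   pg_settings
WHERE  name IN ('shared_buffers', 'effective_cache_size', 'work_mem', 'synchronous_commit');

On a stock installation you will typically see 'shared_buffers' at only 128MB, which illustrates the point about the defaults targeting minimal hardware.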
Hi Marco
We are going to use a local PostgreSQL database to keep just a few (5-10) layers with about 1-2M features each.
We need the best possible performance for drawing and for some statistics in dashboards.
Can you give a few hints about the configuration assuming that I do not have the time to become an expert in PostgreSQL configuration?
Thanks
Honestly, that hardly sounds like anything needing "special configuration". It is a tiny database by all measures nowadays. As said, I have run a PostgreSQL database with >2.5 B records on a single server. Secondly, any PostgreSQL configuration is also tightly coupled to the actual hardware specs. E.g. something like the 'shared_buffers' setting is almost always recommended to be set as a percentage of your RAM, so unless you share your hardware configuration here, there is no sensible recommendation to give.
That said, these are the three things that will really make a difference:
- Do not run from HDD; use PCIe-attached NVMe SSDs. It is hardly a question nowadays, but up to maybe five years ago I still regularly read about people putting their database on HDD, which kills performance compared to modern NVMe SSDs.
- Switch off 'synchronous_commit' in the postgresql.conf file. Provided you do not need it (no synchronous replication to standby servers), this is probably the single most effective configuration change you can make to enhance the performance of writes / UPDATEs.
- Set an appropriate 'shared_buffers' value to give PostgreSQL breathing room in RAM (see the sketch at the end of this post).
All other settings are likely to contribute only minor gains, especially in a modest use case like yours.
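To make this a bit more concrete, here is a minimal sketch of the kind of adjustments I mean, assuming a dedicated server with 32 GB of RAM (the values are placeholders, not a recommendation for your specific hardware). You can either edit postgresql.conf directly or apply the settings from a superuser session:

-- give PostgreSQL a decent chunk of RAM for caching; ~25% of total RAM is a common starting point
ALTER SYSTEM SET shared_buffers = '8GB';
-- tell the planner how much of the OS file cache it may assume; 50-75% of RAM is a common value
ALTER SYSTEM SET effective_cache_size = '24GB';
-- only do this if you can tolerate losing the last few transactions after a crash
-- and are not using synchronous replication to a standby
ALTER SYSTEM SET synchronous_commit = off;
-- reload the configuration; note that shared_buffers only takes effect after a server restart
SELECT pg_reload_conf();

After changing shared_buffers, restart the PostgreSQL service and verify the new values with the pg_settings query shown earlier.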