What is ArcGIS Data Store under the hood? SQL Server Express? SQLite? Something proprietary?
Hi Jim! To answer your question, the Datastore uses PostgreSQL under the hood.
Have you seen, or can you recommend, any detailed documentation on the guts of ArcGIS Data Store and how it integrates with Portal? It seems like a black box.
It seems like Datastore is meant to be a simple solution for organizations who want to use Portal to host feature layers without necessarily having to deal with the inner workings of the underlying database. So in a sense it was intentionally designed to be a black box. As such I would not expect to see any detailed information about the framework of Datastore made public.
Yes, PostgreSQL, without the ArcObjects/SDE middleware. That gives the Data Store the advantage of being able to support many more feature services, but on the other hand, don't expect to get much out of a direct connection with pgAdmin. It's possible to connect directly, but you can't do anything with the data because there is no ArcObjects/SDE interface to convert ArcGIS commands into DB operations. It's all REST API with the Data Store.
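To make the REST-only point concrete, here's a minimal sketch of reading hosted data through a feature layer's REST query endpoint with plain Python. The server name and service path are hypothetical placeholders, not a real deployment:

```python
# Sketch: querying a hosted feature layer through its REST endpoint,
# which is the supported way to read Data Store content.
import json
import urllib.parse
import urllib.request

def build_query_url(layer_url, where="1=1", out_fields="*"):
    """Build a /query URL for a hosted feature layer's REST endpoint."""
    params = urllib.parse.urlencode({
        "where": where,             # SQL-style attribute filter
        "outFields": out_fields,    # which attribute fields to return
        "returnGeometry": "true",
        "f": "json",                # ask for a JSON response
    })
    return f"{layer_url}/query?{params}"

# Hypothetical hosted feature layer (layer index 0):
layer = "https://portal.example.com/server/rest/services/Hosted/Parcels/FeatureServer/0"
url = build_query_url(layer, where="OBJECTID < 10")

# An actual request (against a real, token-secured service) would look like:
#   with urllib.request.urlopen(url) as resp:
#       features = json.load(resp)["features"]
```

Secured services also require a `token` parameter, but the URL structure above is the same either way.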
So once you've loaded data into it via a service, can you access it via catalog to export, or do you have to access it completely from the feature service?
In short: yes, you must access it from the feature service exclusively. You can add the feature service to ArcMap, then right-click and export it to a local feature class. However, I don't understand why you would; you'd essentially be creating a standalone copy of that data. A better idea might be to right-click the service and choose Edit Features > Create Local Copy for Editing. That way, you can synchronize your edits back to the service, which will update the data in the Data Store.
The problem with thinking about hosted services the same way you might think about traditional services is that hosted services do not have the SDE middleware sitting on top of the database to expose it to ArcObjects. Without that layer acting as the liaison between ArcCatalog and the database, converting ArcObjects commands into valid DB operations, any GP tool you try to run (even an export, which relies on ArcObjects) can't be performed.
What's the SOP for keeping data up to date?
We script most everything via python.
Is that an option with Data Store?
Or do you just republish the updated data via the original method?
Well, I don't work for Esri, but I suppose it depends on what you mean by keeping things up to date.
The Data Store, in my humble opinion, is meant to be a wild-wild-west of sorts. Since you don't want users creating, editing, and deleting authoritative data, but you do want them to be able to do some self-servicing of their basic GIS needs, you need a place for them to host their own data. That's the point of the Data Store: it's a database that you can set up and pretty much not have to worry about, a decentralized database to host all the crazy permutations of end-user data. The data within the Data Store is not authoritative; it's user data, all their little files and personal layers.
When it comes to authoritative content (which, where I work, we tend to define as content that a team of GIS professionals creates, validates, and publishes for end users to leverage within Portal), that's where setting up an enterprise geodatabase and registering it with your federated server(s) comes into play. To do this, you set up a normal enterprise geodatabase on SQL Server, Oracle, PostgreSQL, etc. (whatever your DB technology of choice is) and then register that enterprise geodatabase with each of your federated servers.
Once that's done, you load your authoritative content into that enterprise geodatabase and work with it the same way you always have. Direct-Connect from Catalog and run GP tools against it or if you're brave, work with it at the DB level. The key to making this work seamlessly with Portal is federation. You must federate your Portal with ArcGIS for Server. Doing so means that anything you publish to ArcGIS for Server will be made available in Portal. It also means that your Portal Groups and Roles will determine who has access to the services being sourced from your Enterprise Geodatabase.
So once you're federated and you've got your data loaded into your enterprise geodatabase and you've added it to an ArcMap session, go ahead and get it all nice and pretty and then publish it directly to your ArcGIS Server Site (Not Portal!). At this point, since ArcGIS Server is Federated with the Portal, it automagically gets picked up by the Portal as well and that resulting feature service will be added to your My Content directory in Portal. Depending on how your database is set up and registered with your Server, you may need to ensure that your Portal Users have database accounts in order to access the service.
With this approach, what you're essentially doing is offering a decentralized data management system (ArcGIS Datastore) for end-user data so they can go nuts and do whatever they want with their personal data while simultaneously offering a centralized data system (your Enterprise Geodatabase) for authoritative/enterprise data.
In terms of managing data hosted in ArcGIS Data Store via Python: you don't. With the above approach, it's not your data to manage; it's the users'. You manage the enterprise geodatabase, users manage their own data within Portal, and the Data Store will take care of itself. If you really need to edit data hosted in the Data Store through programmatic means, you're going to need to get familiar with the REST API and manage the data through the service's REST endpoints.
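If you do go the REST route, the hosted feature layer exposes an applyEdits operation. Here's a minimal sketch of building that request with the standard library; the layer URL, object IDs, and field names are hypothetical:

```python
# Sketch: editing hosted data through the feature service's applyEdits
# REST operation instead of touching the database directly.
import json
import urllib.parse

def build_apply_edits(adds=None, updates=None, deletes=None):
    """Build the form parameters for a layer's /applyEdits endpoint."""
    params = {"f": "json"}
    if adds:
        params["adds"] = json.dumps(adds)          # list of new features
    if updates:
        params["updates"] = json.dumps(updates)    # features keyed by OBJECTID
    if deletes:
        params["deletes"] = ",".join(str(oid) for oid in deletes)
    return params

# Hypothetical edit: update one feature's STATUS field, delete two others.
params = build_apply_edits(
    updates=[{"attributes": {"OBJECTID": 42, "STATUS": "Reviewed"}}],
    deletes=[7, 8],
)
body = urllib.parse.urlencode(params)
# POST `body` to <layer_url>/applyEdits along with a valid token.
```

The response JSON reports per-feature success/failure, which is worth checking before assuming the edit landed.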
The Portal Data Store is a PostgreSQL server. It is essentially a replica of the AGOL data store, but without "some" of the cool JSON API stuff you can do against the AGOL API. "Content" such as shapefiles, file geodatabases, etc. is in a sense stored as BLOBs, so you won't be able to interact with it outside of the Portal-to-ArcCatalog service connection the way you would with a bona fide SDE or CartoDB SQL database sitting on top of PostgreSQL.
If you want to do things with SQL/SDE database content, such as automating data QA with SQL code while still interacting with the geography in Portal, then, as others posted, you would have to federate a server to Portal to act as a hosted feature server. Hypothetically you could stand up another PostgreSQL instance within the Data Store and throw SDE on top of it to do this, but I suspect Esri tech support would hang up on you when they see this (and you will be on the phone weekly with TS even with an out-of-the-box install of Portal).
Does anyone know whether the geometry in an ArcGIS Data Store is PostgreSQL ST_GEOMETRY or PostGIS GEOMETRY? And whether there's any way to connect to and manipulate/query it directly from some other GIS client (perhaps QGIS) that is not dependent on the ArcObjects/SDE interface to understand either of these spatial data types?
Geometry features in the ArcGIS Data Store are stored in the ST_GEOMETRY spatial type. And while it is technically possible to connect to and query this data with a non-ArcGIS client app, it is not supported.
Hope this helps,
Is there any information on how to connect to the datastore?
Connect with pgAdmin to the postgres instance? Connection string?
see also this question:
Publish Mapservice - Where are the data?
Hi GIS Mountains,
> Is there any information how to connect to the datastore? Connect with pgadmin to the postgres? Connectionstring?
No, we don't provide this type of information, because it is NOT a supported workflow. While it is technically possible for you to connect directly to the ArcGIS Data Store using both ArcCatalog/ArcMap and pgAdmin, we don't recommend that you do this. The ArcGIS Data Store was designed to be accessed internally by the ArcGIS software only.
> see also this question: Publish Mapservice - Where are the data?
It looks like 2 Esri staff folks are already addressing this thread. Not sure what you need from me.
What exactly are you trying to accomplish by direct-connecting to the Datastore? There are likely much better alternatives to do whatever it is that you want which would not put the integrity of the Datastore at risk. Direct Connecting to and tinkering with the Datastore is almost always a very bad idea.
You're probably no longer messing with this, but if you are still trying to connect to Portal's db under the hood, see my answer below: https://community.esri.com/thread/160435#comment-772773
It's basically create an SSH tunnel to port 7654, db name = gwdb, user/pass= your portal admin user.
I agree with the sentiment that you should steer clear of, or at least be careful when, inserting/editing data in this instance, but I disagree that merely connecting to it and poking around will cause any damage. I'm trying to troubleshoot my Portal setup, and one of the ideas I had was to expand the logging on this DB instance to see if it is throwing any helpful errors or warnings. And I will remain unconvinced by anyone who says that troubleshooting a config problem by expanding the logging on a DB, or looking through its contents, puts anything at risk.
I definitely agree that increasing the log level to troubleshoot issues is best practice in general, but any of the contents that would require viewing are exposed through supported APIs, such as the Portaladmin API or Sharing API. Even if something were amiss, updating it directly would not be supported, and any updates should be done through the APIs mentioned above. In the case of DB issues with Portal or Data Store, I don't think there's much benefit in accessing the DB directly.
Here's an example of why one would want to access the Data Store directly. I would like to generate automated reports on the data in the Data Store using other programs that can access a Postgres DB. I'm not trying to insert, delete, or update any records. Otherwise, is there a suggestion for how I can create real-time reporting or exporting of data from the Data Store?
You ever come up with a solution for this?
mike team and Payne Ringling: I'm not sure if you guys are still messing with this, or if this is even the same thing, but on my setup I can access Portal's PostgreSQL DB if I create an SSH connection with PuTTY, tunneling port 7654 between the host and localhost, then creating a connection with pgAdmin III (or whatever client you want) to a database named "gwdb" with my Portal admin username and password.
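For anyone scripting this instead of using PuTTY, here's a small sketch that builds the equivalent tunnel command and connection string. All of the specifics (port 7654, database name "gwdb", Portal admin credentials, an "ags" install account) come from the observations in this thread; they are unsupported, undocumented, and may change between versions:

```python
# Sketch of the tunnel-and-connect recipe described above.
# Hostnames and user names below are placeholders.
import shlex

def tunnel_command(portal_host, ssh_user="ags",
                   local_port=7654, remote_port=7654):
    """Build the ssh command that forwards the Portal DB port to localhost."""
    return (f"ssh -N -L {local_port}:localhost:{remote_port} "
            f"{shlex.quote(ssh_user)}@{shlex.quote(portal_host)}")

def libpq_conninfo(user, dbname="gwdb", port=7654):
    """libpq-style connection string for psql/pgAdmin through the tunnel."""
    return f"host=localhost port={port} dbname={dbname} user={user}"

cmd = tunnel_command("portal.example.com")
conninfo = libpq_conninfo("portaladmin")
# Run `cmd` in one terminal, then point psql/pgAdmin at `conninfo`.
```

The same pattern should work for the relational data store instance mentioned later in the thread (reportedly port 9876), swapping the port and database name.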
If you want to access the postgresql.conf or the pg_hba.conf, etc., on my CentOS setup, they are located at:
...where [ags] is the user account you used to install ArcGIS and/or related add-ons. If you're on Windows, it stands to reason that the file path is similar from the ArcGIS install directory forward, and I would expect the DB connection criteria (port, DB name, user/pass) to be identical.
I hope this helps you guys. I'm having trouble just getting the webadaptor to register with portal and found this thread, and wanted to share this info in case it might benefit someone.
Thanks Elijah, this solution worked for me for the hosted feature service database (port 9876), and I really appreciate the help!! We needed to connect to the backend database to troubleshoot converting a hosted feature service back to an enterprise data store. We were running into an issue with column names and case sensitivity, since our widgets broke in an existing web application as we tried to port the database to enterprise.
This helped confirm that all column names exist as lowercase in the backend database, including objectid and globalid. Also, when column names exceeded 32 characters, we did not know how it named them; it turns out it renamed those columns with numeric suffixes and kept the entire original name as the alias.
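For illustration only, here's a tiny sketch that mimics the renaming behavior described above (lowercase storage names, numeric suffixes on over-long names, full name preserved as the alias). The actual scheme Esri applies is undocumented, and the exact truncation/suffix logic here is an assumption:

```python
# Illustrative only: mimics the observed field-name normalization.
# The real rules used by ArcGIS Data Store are not published.
def normalize_field(name, index, max_len=32):
    """Return (stored_name, alias) for a source field name."""
    stored = name.lower()
    if len(stored) > max_len:
        suffix = f"_{index}"  # assumed suffix scheme, not confirmed
        stored = stored[: max_len - len(suffix)] + suffix
    return stored, name  # alias keeps the original full name

stored, alias = normalize_field(
    "ExtremelyLongDescriptiveColumnNameForReporting", index=3)
```

A sketch like this is mainly useful for predicting which of your source fields will come out mangled before you publish.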
Was there a reason you couldn't look at the Fields information within the feature service layer?
Yes, because there are instances when publications fail due to underlying database issues, and these are not reported back to the end user. In the case of BUG-000122241, the publication failed. Knowing as much as possible before going into tech support saves us hours and hours on the phone. The additional detail that the table published but the field names did not all have associations in the service was part of this root-cause analysis.
Thank you so much, Elijah, for this useful answer. I have a follow-up question similar to this one. I recently failed to access Portal for ArcGIS, then found that there was a disk space issue (the disk was full) on the /home directory. I then saw this error in db.log: "2020-04-25 15:22:26 PDT: : PANIC: could not write to file "pg_xlog/xlogtemp.12017": No space left on device".
We added additional storage and restarted the server, but we still couldn't access Portal for ArcGIS. I assume the PANIC state shut down the internal PostgreSQL database. My question is: is there a way I can start it manually? From which path can I find the PostgreSQL service to stop and start it?
(Restarting datastore did not help).
Thank you for any hint on this.
We experienced a very similar issue about a month or so ago. I reported a support incident, and the only thing the support analyst advised me to do was to uninstall Data Store and reinstall. If you had any patches, you will want to install those after the base install as well.
In our case this allowed the database instance to start again and everything was intact, although it took a very long time to uninstall and reinstall.
Hope this helps! Ronnie
Hi Celine. Generally, when a server runs out of disk space unexpectedly like that, it's because the system has been on its feet for a long time and some application with aggressive logging behavior has eventually filled the disk. While you say you've added additional storage, are you sure the new space is available to /home?

If so, I would try renaming 'pg_xlog/xlogtemp.12017' (i.e. mv pg_xlog/xlogtemp.12017 pg_xlog/xlogtemp.12017.OLD), then reboot and see if things come back online. It's possible some aspect of the last-used log file is now corrupted, and Arc-something-or-other is trying to access it and crashing when it fails to open it. Maybe removing that file from scope will force it to start over, clean.

I don't recall what that service name is, and unfortunately I'm not in a position to check. But in the terminal, you might try typing "service p", then hit Tab twice, and see if CentOS suggests a bunch of known service names that start with "p". If there are multiple service entries that look "postgres-like" (one might say "postgresque"... sorry, I realize this is no place for a joke :/), then try stopping and restarting each of those services. Do a full service stop, then a fresh service start, just in case.

The next thing I would try would be clearing out the entire log directory (pg_xlog), in case a related log got hosed. If your organization looks down on deleting logs, copy them into a safe directory before eliminating them so that you'll have backups.

Another thing worth considering: if free space in your /home directory is still an issue, deleting one or two recent log files probably won't help. You might need to figure out what happened to all your space and free some up relative to /home. It might be enough just to find and eliminate the largest and/or oldest logs. I would aim to free up at least a gig to give you some headroom.
Then, once you've done the spring cleaning, reboot and see if the system comes back up. Hopefully it doesn't come down to uninstalling and reinstalling Data Store entirely, as someone else mentioned; that seems like an awfully heavy-handed solution. (Although, unfortunately, it wouldn't surprise me, since it came from the same organization that tightly couples the system's FQDN to the software at install time... an absolutely terrible design decision, IMO.) Anyway, I apologize this isn't more helpful. May the odds be ever in your favor. /Elijah
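As a small aid for the "figure out what happened to all your space" step above, here's a sketch that lists the largest files under a directory so you can decide what to archive or delete. This is generic disk triage, not anything specific to Data Store:

```python
# Sketch: find the biggest files under a directory (e.g. /home or the
# pg_xlog directory) so you know what to archive or remove first.
import os

def largest_files(root, top_n=10):
    """Return [(size_bytes, path)] for the top_n largest files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:top_n]

# Usage (hypothetical path):
#   for size, path in largest_files("/home"):
#       print(f"{size:>12}  {path}")
```

The same result is available from `du -a /home | sort -rn | head` if you prefer the shell.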
For a base deployment (Portal, Server, and Data Store on the same machine), how many PostgreSQL installations are there? Multiple PostgreSQL installs, or one PostgreSQL with multiple instances? The same port and DB name for access in pgAdmin? Can anybody please clarify?
If Portal and Data Store are on the same machine, you'll have two instances of PostgreSQL running, each with different connection details (port, database name, credentials, etc.). Repeating what's been mentioned above, though: it's not supported or recommended to access the database directly. If there is an API missing that would let you retrieve the information you're looking for, feel free to create an idea on the Ideas site.