Better Data Management in ArcGIS Data Store

01-06-2020 12:45 PM
Status: Open
RyanUthoff
Occasional Contributor III

I would like to see some additional data management options in ArcGIS Data Store. We are currently running into issues that could likely be resolved if Esri provided some additional management tools.

For example, our ArcGIS Data Store is getting bloated and we have no way to reduce its size on disk. We host the ArcGIS Data Store on its own VM, but deleting data from a hosted feature service (for example, through ArcGIS Pro) does not actually release the disk space back to the OS.

I would like to see an Esri tool that actually releases the free space back to the OS (in PostgreSQL this is usually accomplished with the VACUUM FULL command). I understand that deleting records frees space for reuse at the table level (via the regular VACUUM command, which does appear to be running according to the ArcGIS Data Store logs), but I would like that space released back to the OS as well. I deleted approximately 100 GB worth of photos, yet none of that space was returned to the OS, and our available storage keeps shrinking because we are still collecting data in other tables. Continually adding disk space to the VM is not sustainable. We need to be able to release free space back to the OS to make room for other surveys that contain photos.
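
To illustrate the difference in plain PostgreSQL terms (the table name below is a hypothetical placeholder; the actual table names inside the Data Store will differ):

```sql
-- Check how much space the table occupies on disk, including
-- indexes and TOAST data (where large attachment blobs end up).
SELECT pg_size_pretty(pg_total_relation_size('hosted.my_survey__attach'));

-- Plain VACUUM marks dead rows as reusable inside the table's
-- existing files; no disk space is returned to the OS.
VACUUM hosted.my_survey__attach;

-- VACUUM FULL rewrites the table into new, compacted files and
-- releases the freed space back to the OS. It takes an exclusive
-- lock on the table, so it needs a maintenance window.
VACUUM FULL hosted.my_survey__attach;
```

An Esri-supported equivalent of that last command is essentially what is missing today.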

I have access to the ArcGIS Data Store through pgAdmin, but would prefer to perform these actions through an Esri-approved workflow, such as an ArcGIS Data Store command utility.

Furthermore, since I do have access to the ArcGIS Data Store through pgAdmin, I can see that Esri does not always delete tables it no longer needs, such as when a hosted feature service is deleted in Portal, or when updating an existing survey forces it to drop and re-create a table because of a schema change. The old tables are left in the database and new ones are simply created alongside them. This is sloppy data management that results in a bloated ArcGIS Data Store, and Esri gives us no way to clean it up short of modifying the ArcGIS Data Store directly through pgAdmin, which Esri does not recommend. A cleanup operation that detected orphaned tables and deleted them would be very welcome.
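
As a stopgap, the standard PostgreSQL catalog views at least make a read-only audit possible. A query like the following lists tables by size so leftovers can be spotted, though nothing in it knows about Esri's internal references, so any candidate still needs manual verification:

```sql
-- List user tables by total on-disk size (table + indexes + TOAST)
-- so leftover tables from deleted or re-published services can be
-- reviewed manually. Read-only; modifies nothing.
SELECT schemaname,
       relname,
       n_live_tup AS approx_rows,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 50;
```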

7 Comments
LarryJahn

To piggyback on this great idea, I believe it would also be very useful to have some basic data recovery tools for Portal Data Stores. For instance, if someone accidentally deletes 100 features from a Hosted Feature Service, restoring all of Enterprise with the webgisdr tool is not an efficient way to recover the deleted data.

1523Carver

How were you able to connect using pgAdmin? I am trying to do just that and I'm struggling with making it happen. Would you be willing to point me in the right direction?

Henry

I've had some frustrations with the Data Store as well, especially around efficient content management.

My experience was limited to Enterprise deployed on an Azure VM, so some of this may be specific to that configuration. Even so, it seemed unnecessarily difficult to generate a list of content and how much space each item was taking up so I could audit what could be removed or relocated, not to mention odd duplications of data; anything with attachments and attachment tables appeared to balloon storage.

I never ended up installing 10.9, so maybe that release added some tools for this purpose.
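
For anyone auditing the same thing, here is a rough way to see how much attachment tables contribute, assuming they follow the common "__attach" naming suffix (that convention is an assumption; verify the actual table names in your own data store first):

```sql
-- Size of attachment tables, assuming the "__attach" suffix convention.
-- Backslashes escape the underscores so LIKE treats them literally.
-- Read-only; results are for auditing only.
SELECT schemaname,
       relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE relname ILIKE '%\_\_attach'
ORDER BY pg_total_relation_size(relid) DESC;
```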

JTessier

Agree this needs some attention/tooling.

JTessier

@ThomasEdghill would love to see this get attention in a future version.

NHannig

Agreed, this would be very helpful

AndrewRyder1

Some ideas for areas to improve, in addition to @RyanUthoff's initial post:

  • Generate a report of feature service (and other applicable item types) size on disk.
    • Perhaps also surface that information in Portal or Server Manager?
  • Clean up data that has been orphaned for whatever reason.
  • Clean up domain item references. Currently, if a feature class is published with a domain table name that already exists in the data store, a new domain item is created with an underscore and number appended. This is evident in the AGS logs.
    • Example scenario: the data store attempts to create a domain table named "YesNo", but two versions of that table already exist, so a new domain item named YesNo_2 is created. There are now three tables (YesNo, YesNo_1, YesNo_2) referenced by three different feature services. If the feature service referencing YesNo is deleted, its domain item still exists in the data store. This can lead to hundreds if not thousands of duplicate domains, with an unknown number of unused versions (a rough detection query is sketched after this list).
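
As an illustration of the domain duplication, this read-only query groups tables whose names differ only by a trailing "_<number>" suffix. The naming pattern and location of domain tables are assumptions here, so treat any hits as candidates to investigate, not to delete:

```sql
-- Surface duplicate-looking tables such as yesno, yesno_1, yesno_2
-- by stripping any trailing "_<number>" and grouping on the base name.
-- Read-only; results are candidates for manual review only.
SELECT regexp_replace(relname, '_\d+$', '') AS base_name,
       count(*) AS copies,
       array_agg(relname ORDER BY relname) AS tables
FROM pg_stat_user_tables
GROUP BY base_name
HAVING count(*) > 1
ORDER BY copies DESC;
```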

I'm sure there are other areas to address as well, but these would go a long way towards helping.