Select to view content in your preferred language

Better Data Management in ArcGIS Data Store

4316
14
01-06-2020 12:45 PM
Status: Open
RyanUthoff
MVP Regular Contributor

I would like to see some additional data management options in the ArcGIS Data Store. I'm currently noticing some data management issues that could probably be easily resolved if Esri provided some additional data management tools.

For example, the size of our ArcGIS Data Store is getting bloated and we have no way to reduce the file size. We host the ArcGIS Data Store on its own VM, but deleting the data from the hosted feature service (such as through ArcGIS Pro) does not actually release data back to the OS.

I would like to see an Esri tool that would actually release the free space back to the OS (usually accomplished by using a Full Vacuum command in PostgreSQL). I understand that deleting records from the tables will release the space back at the table level (usually through the regular vacuum command which appears to be happening, at least according to the ArcGIS Data Store logs), but I would like to extend that further to release it back to the OS. I deleted approximately 100GB worth of photos, but none of that data is being released back to the OS, and our available storage space is still going down because we're still collecting data in other tables. It is not sustainable for us to just continually adding space to the VM. We need to be able release free space back to the OS to make room for other surveys that contain photos.

I have access to the ArcGIS Data Store through pgAdmin, but would prefer to perform these actions using an "Esri approved workflow", such as using an ArcGIS Data Store command utility.

Furthermore, since I do have access to the ArcGIS Data Store through pgAdmin, I am noticing that Esri is not always deleting tables it no longer needs (such as when deleting the hosted feature service in Portal or when updating an existing Survey that requires it to "delete" and re-add the table because it is making a schema change to the survey). It leaves the old tables in the database, and just creates new tables. This is very dirty data management which results in a bloated ArcGIS Data Store which Esri gives us no option to cleanup unless if we modify the ArcGIS Data Store directly through pgAdmin which Esri does not recommend. If there could be some cleanup operation that detected orphaned tables and deleted them, that would be nice.

14 Comments
MarcusAndersson

Hi @RachaelHarbes,

As I understand the PostgreSQL-database that is running in AGE is that it only performs the "VACUUM" command on a schedule (daily?weekly? can't remember now), this makes it so that the database can never shrink in "disk-size". 

To be able to reclaim space that has been previously used, i.e. stop it from bloating, you sometimes need to run the "VACUUM FULL" command on the database. This operation can "free space", but should not be done without a proper understanding of the potential benefits and drawbacks. More info on the topic can be found here: PostgreSQL: Documentation: 17: 24.1. Routine Vacuuming
The postgreSQL-documentation says that if you regularly vacuum the db then you shouldn't need to perform the vacuum full operation.. While this might be true in the abolute majority of cases, I don't think it's true for all, and not in the case of (at least ours) AGE anyways. If it's down to how AGE interacts with the db or that it's a mismatch on some other level I don't know, but I see that you do in some cases need to use the vacuum full-command on a AGE-system.

I would not recommend doing this in a prod environment without having properly tested it in a dev/sandbox first. Preferrably with an exact replica of the data you have in your production environment. You should also have the backups you need readily available and having tested the backup routines before you do this. Prepare yourselves on that the vacuum-full takes a lot longer time than the regular vacuum-command.

Link to the compatibility of the postgreSQL-versions and ArcGIS Enterprise: FAQ: What PostgreSQL Database Versions Are Supported by ArcGIS Enterprise and ArcGIS Pro? 

I'm by no means any postgreSQL-expert, but these are at least some findings I've done when dealing with similar issues/thoughts that you have. So take it for what it is and good luck 🙂

BR,
Marcus

RachaelHarbes

@MarcusAndersson Thank you for the information on the "VACUUM". I was looking for a more robust clean-up of data which is no longer in the portal/hosting server, but remaining in the data store. Do you happen to know of a safe way to remove the orphaned layers/tables? 

MarcusAndersson

@RachaelHarbes 
Yes. That would be the "Vacuum full", then.

BR,
Marcus

RachaelHarbes

@MarcusAndersson VACUUM scans a table, marking tuples that are no longer needed as free space so that they can be overwritten by newly inserted or updated data.

How would it know what tables/feature layers are orphaned and delete them from the database? These tables still exist in the database, but do not exist in hosted/portal. These tables are still valid and exist as functioning tables from a database perspective. Is this a special function ESRI does to compare?