Any problems with shrinking enterprise geodatabase in SQL Server?

TimMinter · ‎02-06-2020

I'm getting charged per GB for SQL Server storage, so I have an incentive to use just as much storage as I need and no more. There are enough GB involved to create an attractive benefit to using less storage. I immediately think, "shrink the database." Then I think, "uh oh."

I notice that the Esri help is silent on the subject:

While the Microsoft help is not:

SQL Server 2016: Shrink a Database - SQL Server | Microsoft Docs

So... have you ever done this successfully with an enterprise geodatabase in SQL Server? Any gotchas, hints, suggestions, soothing sounds, pointing and laughing?

Thanks,

tim

JCGuarneri · ‎02-09-2023

@John_Spence, are you saying there's a geoprocessing tool to release unused space from within the Esri ecosystem, or are you saying to compress the DB and immediately shrink using the tools in SQL Server administrator?

John_Spence · ‎02-09-2023

Do not shrink the DB using the SQL Server admin tools or commands. You will cause yourself more headache than it is worth. Performance problems with your DB is usually due bad indexes & stats or garbage in the metadata or configuration issues or not using best practices in some form or shape. Obviously, all things we all (generally) deal with on a day to day basis.

If you are running into issues, I would...

Check your how versioning is working for you. Make sure you are at state 0 if possible, that is if you are using traditional versioning. Branched versioning adds its own special flavor to the sauce which makes it more difficult to unravel (it essentially acts like archiving, but different).
Rebuild your stats and indexes.
Delete geoprocessing history (https://support.esri.com/en/technical-article/000027149). You would be surprised how much this can gunk up operations.
If you are SQL Server, check if you are on full or simple recovery model. If full, you may need to check your log file size and handle that.
Shrink your database - Use this
And then there are other database configuration options, etc.

Hope this helps.

JCGuarneri · ‎02-09-2023

Thank you for clarifying.

It sounds like I might have to contact Esri tech support to find a path forward. The DB gets compressed and reindexed every morning, I don't record GP history on all my regular scripts (and I delete it periodically just to be sure), and the log file isn't too scary. My big culprit is an archive class that I don't need (but exists anyway because the table is related to a FC that does need archiving). That archive class had ballooned to 140 GB. I deleted it the other day, but the DB is still tying up all that space. The Shrink workflow you link to seems to only apply to SQL Server Express, as I can't find equivalent tools for my SQL Server data connnections.

John_Spence · ‎02-09-2023

Oh geez! You're right on the nose there. My bad.

I have a similar situation on my end with regards to archiving. I just dealt with it by disabling the relate (during maintenance) and dropping the archiving and then turning it back on followed by a rebuild of the relationship. Cleans it up, keeps it smaller (long term).

JCGuarneri · ‎02-09-2023

No worries! If I'm understanding correctly you do the following:

remove relationship class and archiving
compress/index db
set the relationship class back up

and this gets rid of the unused space?

TimMinter · ‎02-04-2022

Hi @John_Spence & @jsarthur,

Good discussion above. Under my standing "I really don't know anything" principle, I'd like to know more about the hugeness of the shrinking eGDB no-no. If you can point me at some help documentation, technical articles, blogs, etc. that might be easiest. Just chiming in with a few items:

Usage - Vector feature classes, tables, relationship classes, feature datasets, time-enabled data, that sort of thing. No or nearly no raster or other data types in the big (-ish) databases. Relative to other data collections I've helped manage during my career, I don't consider these databases to be "heavily used" from the perspective of total number of requests over time and concurrent connections. They are used enough that performance problems are noticed and communicated.
Recovery model - 'Simple' for the databases we need to shrink. We mostly use these as OLAP databases and just don't care about the logs much at all.
Performance - ArcGIS Pro PerfTools suggests that things are going about as well as can be expected for object-relational handling of the data given our computing environment resources (storage, db servers, network, etc.). We haven't noticed any unexpected negative performance effects related to shrinking the database. During larger data loading activities, we expect and experience some performance hit as auto-grow does its thing. This touches on why we shrink - one update we do loads ~600 million records as 'new' tables and feature classes, builds indexes, statistics, etc. over a fairly long time stretch. In order to minimize availability impact to staff and systems, we switch out the 'old' with 'new' during a brief maintenance window, ensure everything is working as expected, then delete the 'old' object. That leaves a bunch of unused storage that unnecessarily costs us money until the next update.
Maintenance - Yes, we regularly rebuild indexes, analyze, compress, etc. as needed. We have a lot of truncate/append or simply append ETL activity as we move record updates into static schema in the eGDB.
Metadata - Nice. We've started to configure ArcGIS to not store GP processing history in the metadata, but not consistently across all machines that do the work. Thanks for the link to that article. We'll investigate to see about a cleanup and keep getting our house in order.

Cheers,

tim

John_Spence · ‎02-04-2022

Good morning Tim,

Caught me at a good time lamenting my choice of which Crypto asset to get into. LOL.

Give me a call when you get a sec (noticed you are local). Details are in your LinkedIn messaging.

Well the scale and scope of what you are dealing with does change the discussion a bit. From the sounds of it, you are working in a cloud situation too which makes things all that much more fun.

600M record changeouts, YES, please by all means shrink that database. You really cannot afford to do otherwise due to the volume of data movement. Take care to rebuild all indexes when you are complete though as that volume of movement and shrinkage will have a impact on the indexes. The below query will rebuild the indexes for the entire database for you afterwards without having to go one or using the standard tools for Esri (python or toolbox).

DECLARE @Database NVARCHAR(255)
DECLARE @Table NVARCHAR(255)
DECLARE @cmd NVARCHAR(1000)
DECLARE DatabaseCursor CURSOR READ_ONLY FOR
SELECT name FROM master.sys.databases
--WHERE name NOT IN ('master','msdb','tempdb','model','distribution') -- databases to exclude
WHERE name IN ('Carta') -- use this to select specific databases and comment out line above
AND state = 0 -- database is online
AND is_in_standby = 0 -- database is not read only for log shipping
ORDER BY 1
OPEN DatabaseCursor
FETCH NEXT FROM DatabaseCursor INTO @Database
WHILE @@FETCH_STATUS = 0
BEGIN
SET @cmd = 'DECLARE TableCursor CURSOR READ_ONLY FOR SELECT ''['' + table_catalog + ''].['' + table_schema + ''].['' +
table_name + '']'' as tableName FROM [' + @Database + '].INFORMATION_SCHEMA.TABLES WHERE table_type = ''BASE TABLE'''
-- create table cursor
EXEC (@cmd)
OPEN TableCursor
FETCH NEXT FROM TableCursor INTO @Table
WHILE @@FETCH_STATUS = 0
BEGIN
BEGIN TRY
SET @cmd = 'ALTER INDEX ALL ON ' + @Table + ' REBUILD'
--PRINT @cmd -- uncomment if you want to see commands
EXEC (@cmd)
END TRY
BEGIN CATCH
PRINT '---'
PRINT @cmd
PRINT ERROR_MESSAGE()
PRINT '---'
END CATCH
FETCH NEXT FROM TableCursor INTO @Table
END
CLOSE TableCursor
DEALLOCATE TableCursor
FETCH NEXT FROM DatabaseCursor INTO @Database
END
CLOSE DatabaseCursor
DEALLOCATE DatabaseCursor

Shrinking that database, here is the right spot to read about it though it does only speak to SQL Server Express, the rules apply.

As your databases are for OLAP typically, have you considered using data warehousing options like Snowflake, etc? Pro has some options for that now and you would get a significant performance improvement from making that switchover.

BillFox · ‎02-09-2023

We have had no problems using native sql shrink (if needed).