Geodatabase size increases dramatically when closing a table

01-20-2014 10:51 PM
UlrichEgger
New Contributor III
I have observed that file geodatabases written with the File Geodatabase API are relatively large. When I open a feature class written with the File Geodatabase API in ArcMap, for example, and then add a field followed by deleting that same field, the size shrinks dramatically (to about 1/10th of the original).

I tried to debug the problem and found that at the end of writing data into the file geodatabase, the size is still reasonable.
The size grows during the call to FileGDBAPI::Geodatabase::CloseTable(Table& table) (see the code snippet below).

In our case the problem is that we often deal with large numbers of polygons, so it makes a big difference whether the resulting database is 1 GB or 10 GB.

int GeodatabaseResultWriter::Terminate()
{
    if (GdbWrapper::CheckResult(gdb->waterLevelTable.FreeWriteLock()) != 0)
        return 1;
    // size of geodatabase still small
    if (GdbWrapper::CheckResult(gdb->geodatabase.CloseTable(gdb->waterLevelTable)) != 0)
        return 1;
    // geodatabase has grown!
    if (GdbWrapper::CheckResult(CloseGeodatabase(gdb->geodatabase)) != 0)
        return 1;

    return 0;
}
15 Replies
LanceShipman
Esri Regular Contributor
Rather than a code fragment, can you send code that reproduces the problem? We have not been able to reproduce this based on your code fragment.

Also:

"I tried to debug the problem and I found that at the end of writing data into the file geodatabase, the size is still o.k.
The size grows during the call of FileGDBAPI::Geodatabase::CloseTable (Table &table) (see code snipped below)"

The size grows while you are writing data to the table, yes? And then the size increases even more (by how much?) on CloseTable?
DavidSousa
New Contributor III
I think that what you are seeing is how the file system works on Windows.

If you open a file, and write data into it, the file is obviously larger than it was before.  One would naturally assume that if you checked the size of the file that you would see that the size had changed.  The confusing thing is that Windows does not guarantee that the file system metadata will be flushed until the file is closed, and even then, there might be a delay until the new size is visible.
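
For what it is worth, that metadata lag can be observed directly. Below is a minimal Win32 sketch; it is not part of the FileGDB API, and the file path is a made-up example. It compares the size recorded in the directory entry (what Explorer or a dir listing shows) with the size reported through an open handle to the same file.

#include <windows.h>
#include <iostream>

int main()
{
    // Hypothetical path to one of the .gdbtable files inside the geodatabase folder.
    const wchar_t* path = L"C:\\temp\\Result2D_writing.gdb\\a00000009.gdbtable";

    // Size as cached in the directory metadata.
    WIN32_FILE_ATTRIBUTE_DATA attr{};
    if (GetFileAttributesExW(path, GetFileExInfoStandard, &attr))
    {
        ULONGLONG dirSize = (static_cast<ULONGLONG>(attr.nFileSizeHigh) << 32) | attr.nFileSizeLow;
        std::wcout << L"Directory entry size: " << dirSize << L" bytes\n";
    }

    // Size as reported through an open handle, which is always current.
    HANDLE h = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE,
                           nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h != INVALID_HANDLE_VALUE)
    {
        LARGE_INTEGER size{};
        if (GetFileSizeEx(h, &size))
            std::wcout << L"Handle-reported size:  " << size.QuadPart << L" bytes\n";
        CloseHandle(h);
    }
    return 0;
}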
UlrichEgger
New Contributor III
It is obvious that the size must grow while the file is being written.
It would also be fine if that growth only became visible when the table is closed rather than before.

The strange thing is that when opening the table in ArcGIS and adding/removing a single field,
the size is reduced! It seems like ArcGIS is able to "repair" it somehow.

I'd like to find out what I am doing wrong when writing the database. Or is it a bug in the API?

In the attachments, you can find an example with which I reproduced the behaviour. It does the
same thing as my program but writes some random dummy data. You can take the Editing
example from the samples in API version 1.3 and replace Editing.cpp.

Then you also need to put the geodatabase Result2D.gdb in the samples/data directory.

After running the program, check the size of the geodatabase Result2D_writing.gdb;
I get 235 MB. Then open the feature class Topo_decimated of the database Result2D_writing.gdb in ArcGIS, open the
attribute table, add a field and delete it again. Then check the size of the geodatabase again;
I get 4.75 MB.
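
For readers without the attachment, the following is a minimal sketch of the kind of repeated row updates that produce this growth. It is not the attached Editing.cpp; the field name, text values and loop counts are illustrative assumptions only.

#include <string>
#include <FileGDBAPI.h>

using namespace FileGDBAPI;

int UpdateManyTimes()
{
    Geodatabase geodatabase;
    if (OpenGeodatabase(L"../data/Result2D_writing.gdb", geodatabase) != S_OK)
        return 1;

    Table table;
    if (geodatabase.OpenTable(L"\\Topo_decimated", table) != S_OK)
        return 1;

    // One pass per timestep: rewrite an attribute on every row. An update that
    // needs more space than before (e.g. a longer text value, or a value in a
    // field that was previously null) is written to a new location in the
    // .gdbtable file and the old location is marked as free, so the file grows.
    for (int timestep = 0; timestep < 100; ++timestep)
    {
        EnumRows rows;
        if (table.Search(L"*", L"", false, rows) != S_OK)
            return 1;

        Row row;
        while (rows.Next(row) == S_OK)
        {
            // "Remark" is a hypothetical text field; the value gets longer every pass.
            row.SetString(L"Remark", std::wstring(timestep + 1, L'x'));
            table.Update(row);
        }
        rows.Close();
    }

    geodatabase.CloseTable(table);
    CloseGeodatabase(geodatabase);
    return 0;
}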
UlrichEgger
New Contributor III
Please see my new message below. I have posted an example to reproduce the error.
Originally I hoped that this might be a known problem.
UlrichEgger
New Contributor III
I have posted an example that reproduces the problem.

DavidSousa
New Contributor III
Thank you for the repro case.

What you are seeing is not at all a bug.  It is the expected behavior when you perform a large number of updates to your data.  Additionally, the same thing happens in ArcGIS.

The reason that the files become very large when you do a great deal of updates is that the update process often creates internal fragmentation in the files.  Usually, this happens when an updated record requires more storage than it did before.  When this occurs, the record must be written out to a new location in the file, and the previous location is marked as deleted or free space within the file.  All of these deleted records are kept track of in the "freelist" file.

There are a couple of ways that the free space can be recovered.  You discovered one of them, which is deleting a field.  When a field is deleted, the entire file is rewritten one record at a time, but the deleted records are skipped.  The better way to recover free space is to run the Compact Database command in ArcGIS.  You can do that from the database property page, or from a GP tool.

In the FileGDB API, we did not expose CompactDatabase functions in the original design.  However, we are in the process of creating an updated release with many enhancements and bug fixes.  We will add CompactDatabase functions as part of this new release.  Our plan is to have the new release ready around the time of the Developer Summit in March.
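
Once such a function exists, it could presumably be called just before closing the geodatabase. The sketch below is only a guess at what that might look like; it assumes the new release exposes a parameterless CompactDatabase() method on Geodatabase, which had not yet been published at the time of this thread.

#include <FileGDBAPI.h>

using namespace FileGDBAPI;

// Hypothetical usage, assuming a future Geodatabase::CompactDatabase() method.
int CloseAndCompact(Geodatabase& geodatabase, Table& table)
{
    if (geodatabase.CloseTable(table) != S_OK)
        return 1;

    // Assumed call: rewrite the data files without the records marked as free,
    // reclaiming the space tracked in the freelist.
    if (geodatabase.CompactDatabase() != S_OK)
        return 1;

    return (CloseGeodatabase(geodatabase) == S_OK) ? 0 : 1;
}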
UlrichEgger
New Contributor III
Dsousa, thank you very much for your reply.

Yes indeed, a function like CompactDatabase is exactly what I was looking for in the documentation.
This would be a great improvement.

Will I be able to call this method before closing the table, or after I have done some
updates (e.g. after each timestep)? Or will I have to close the table/geodatabase
first?

The intention of the question is: if I have a result of about 2 GB, which is about 100 GB
in its fragmented form, will I also be able to avoid the geodatabase needing 100 GB
of free disk space before being compacted?
DavidSousa
New Contributor III
Compact does not require that the Table or Geodatabase be closed first.  Compact can take a significant amount of time to run, depending on how much fragmentation is present and how big the file is.  Because of that, you might want to minimize the frequency of using it.  It is similar in some respects to running a disk de-fragmenter.  Since it takes a lot of time, you might not want to run it several times a day.
UlrichEgger
New Contributor III
I thought I would run it once before the table is closed (which, in my experience, is when the size of the geodatabase on disk grows).