Filter by attributes takes so long on File Geodatabase

yockee · ‎11-03-2024

I have a file geodatabase in a shared folder on a server. It contains 200 million rows with attachment pictures. The size is 133 GB.

I want to do a simple thing like "select by attributes" using OR.

Why does it take so long? It has been nearly half an hour and it is still stuck at the 4% progress mark (the progress is not increasing either).

Here is the attached pic:

DanPatterson · ‎11-03-2024

It is the file size and its location. Even if the data were stored locally, I suspect that it would take an inordinate amount of time as well.

... sort of retired...

RobertKrisher · ‎11-04-2024

Any latency between you and the file geodatabase, such as the latency of accessing a network share or through a VPN, will adversely impact performance.

If you were to put that file geodatabase locally, or if the data were stored in a mobile geodatabase, the operation would be much faster.

A file geodatabase represents data using a file structure, which means that any time the data is accessed your client must interrogate a number of different files. This can result in hundreds or thousands of requests for even a simple operation like panning a map. The latency of your connection to that file geodatabase is added to every request, so even a 100 ms latency can add minutes or hours of processing time to a simple operation.
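
To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes every request waits one full round trip and that requests run one after another; the request counts are hypothetical, since the real number depends on the data and the operation.

```python
def added_wait_seconds(requests: int, latency_ms: float) -> float:
    """Extra wall-clock time if each request waits one round trip, sequentially."""
    return requests * latency_ms / 1000.0


# Hypothetical request counts, chosen only to show how the delay scales.
for requests in (1_000, 50_000, 500_000):
    minutes = added_wait_seconds(requests, latency_ms=100) / 60
    print(f"{requests:>7} requests at 100 ms each: about {minutes:.1f} extra minutes")
```

Under those assumptions, 50,000 requests at 100 ms each already add about 83 minutes of pure waiting, before any actual work is done.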

VinceAngelo · ‎11-05-2024

There are a bunch of things here:

200m rows is an order of magnitude more than I would feel comfortable using in a file geodatabase (yeah, it functions, but a real database would function much better).
Shared folders are performance death for file geodatabase, with a minimum 2x cost for accessing a local network share.
Full-table-scan queries are performance poison for relational databases with very large tables. If it's important enough to do a query, it's important enough to build an index.
You should not be using an OR when you could use an IN: rel_objectid in (26,19804)
Remember that FGDB doesn't have an RDBMS optimizer, so you should always pitch softballs for queries.
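
As a small illustration of the IN advice, here is a sketch of a helper that builds such a where clause from a list of IDs. The helper name is hypothetical; only the field name and the two IDs come from the example above. In ArcGIS Pro the resulting string would typically be passed as the where clause to arcpy.management.SelectLayerByAttribute, and the attribute index could be added with arcpy.management.AddIndex; neither call is run here.

```python
def build_in_clause(field: str, ids: list) -> str:
    """Return a where clause such as 'rel_objectid IN (26, 19804)'."""
    if not ids:
        raise ValueError("need at least one id")
    # int() rejects non-numeric values before they reach the SQL text
    values = ", ".join(str(int(i)) for i in ids)
    return f"{field} IN ({values})"


# One IN list instead of: rel_objectid = 26 OR rel_objectid = 19804
where = build_in_clause("rel_objectid", [26, 19804])
print(where)  # rel_objectid IN (26, 19804)
```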

- V

yockee · ‎11-05-2024

Hi @VinceAngelo, sorry, my bad. I meant 200 thousand rows. It's pretty slow for so few records.

VinceAngelo · ‎11-06-2024

Even 200k rows can be slow if they're wide enough. You should certainly have an index on the query column, but first priority is to copy the FGDB directory to local disk.
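
A file geodatabase is a folder with a .gdb extension, so the copy can be done in a file manager or scripted. A minimal sketch of the scripted version follows; the function name and the stand-in folder are made up so the example runs anywhere, and in practice the source would be the .gdb folder on the share.

```python
import shutil
import tempfile
from pathlib import Path


def copy_gdb_to_local(src_gdb, local_parent) -> Path:
    """Copy a .gdb folder into local_parent and return the path of the copy."""
    src_gdb = Path(src_gdb)
    if not src_gdb.is_dir():
        raise FileNotFoundError(f"not a folder: {src_gdb}")
    dest = Path(local_parent) / src_gdb.name
    # copytree raises if dest already exists, so an older copy is never mixed in
    shutil.copytree(src_gdb, dest)
    return dest


# Demo with a stand-in folder; a real source would be the .gdb on the network share.
with tempfile.TemporaryDirectory() as tmp:
    share_gdb = Path(tmp) / "share" / "demo.gdb"
    share_gdb.mkdir(parents=True)
    (share_gdb / "placeholder.bin").write_bytes(b"placeholder")
    local_dir = Path(tmp) / "local"
    local_dir.mkdir()
    copied = copy_gdb_to_local(share_gdb, local_dir)
    print(copied.name, [p.name for p in copied.iterdir()])
```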

- V

yockee · ‎11-11-2024

Yup. That's correct @VinceAngelo. I eventually copied the data onto my laptop. It's much faster now. The shared folder is nowhere near as fast.