How to promote the performance on querying filegeodatabase entries count??

1368
9
05-24-2018 07:02 PM
Leelin
by
New Contributor II

Hi, I tried to getting the count of some filegeodatabase rows with "File Geodatabase API 1.4", but the performance is very poor as following statements:

The Source code

Geodatabase geodatabase = Geodatabase.Open(@"xxxxxx");

var countStatements = $"SELECT count(*) FROM blk WHERE OBJECTID=1 or OBJECTID=2";
var countRows = geodatabase.ExecuteSQL(countStatements);

var countEnumerator = countRows.GetEnumerator();

if(countRows != null)
{
       countEnumerator.MoveNext();
       count = (int)countEnumerator.Current.GetDouble(0);
}

It took about two seconds to getting the result. The performance is very poor!

How can I promote the performance through query sql statements or the other approach?

I’m very appreciate for your any help.

 

Thanks,

Casper. Lee

0 Kudos
9 Replies
JoshuaBixby
MVP Esteemed Contributor

You state that 2 seconds if very poor performance.  What are you comparing that against?  Have you run an SQL query against a similar data set using a different DBMS or API and seen sub-second response times?

There are several factors that can influence the time it takes for a query to return results.  Since you are working with the File Geodatabase API, you are obviously working with file geodatabases, which means the performance of the underlying storage can influence the time.  Is your data local or on a network share?  If local, what kind of local drives are you using?  Was something else running in the background that was generating disk I/O when you queried?  If the data is on a network share, what kind of latency do you have connecting to the network file server?

I suspect part of what you are experiencing is the cost of setting up the connection to the geodatabase.  If you run the SQL successively without tearing down your connection and creating a new one, does it still take 2 seconds?

0 Kudos
Leelin
by
New Contributor II

Hi Joshua,

Thanks for your replying, I'm trying to query some features from a ,gdb file, the .gdb file size is about 330M on local. I tried to run an SQL query on a .shp file (the size is about 430M), it takes about 980 milliseconds. On the .shp querying side, the querying tool is "OLEDB". 

I also confused on the querying behaviors:

1. Why does querying the count of features is very slow? 

2. The performance is very poor when the iteration of queried features is moved to the last.

My computer environment:

OS: Win10 x64

Hardware:

CPU: i7 4700,

Memory: 16G,

Hard disk: SSD 512G.

I also tried to create a new FGDB and import the data into the new FGDB, but the performance is still poor. It still took about 2 seconds.

Thanks,

Casper. Lee

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

You answered some of my questions, i.e., you are working with file geodatabases on local storage.

Have you had a chance to re-run the query without closing out your connection to see how much of your run time may be related to overhead in creating the connection itself.

In terms of comparing sizes of data sets, size on disk is only one aspect.  One needs to look at numbers of records, numbers of fields, types of fields, etc....  For a query where you are only looking at a specific field, the number of records could be a bigger factor than overall size of the data set.

Also, have you tried modifying your query to not select all fields, e.g., "SELECT count(OBJECTID) FROM blk WHERE OBJECTID=1 or OBJECTID=2".

Leelin
by
New Contributor II

Hi Joshua,

Thanks for your suggestion. Unfortunately, the performance is still poor when I tried to run the querying more time base on a opened connection as following statements:

 

Source code
Geodatabase geodatabase = Geodatabase.Open(@"C:\test\testData.gdb");
var countStatements = $"SELECT count(OBJECTID) FROM blk WHERE OBJECTID=1 or OBJECTID=2";
var countRows = geodatabase.ExecuteSQL(countStatements);
var count = 0;
var countEnumerator = countRows.GetEnumerator();
var stop = Stopwatch.StartNew();
for (int i = 0; i < 2; i++)
{
countEnumerator.MoveNext();
count = (int)countEnumerator.Current.GetDouble(0);
Console.WriteLine(stop.ElapsedMilliseconds / 1000);
countEnumerator.Reset();
}
stop.Stop();

The test result shown: 2 seconds and 4 seconds. It seems like without any changes.

The test data information:

File size: 330M

Records count: 415000

Fields count: 37

Do you have the other suggestion? I'm very appreciate for your any help.

Thanks,

Casper. Lee

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Looking at your code, it appears to be timing the iteration over the enumerator, not the performance of the SQL.  I understand that you are interested in the performance of the whole code block, but it might be good to understand how each sub-part is working. 

What do you get from the following code (I haven't tested, but I think you can get what I am after if it doesn't work as-is😞

var countStatements = $"SELECT count(OBJECTID) FROM blk WHERE OBJECTID=1 or OBJECTID=2";

var stop = Stopwatch.StartNew();
Geodatabase geodatabase = Geodatabase.Open(@"C:\test\testData.gdb");
stop.Stop()
Console.WriteLine(stop.ElapsedMilliseconds / 1000);

var stop = Stopwatch.StartNew();
var countRows = geodatabase.ExecuteSQL(countStatements);
stop.Stop()
Console.WriteLine(stop.ElapsedMilliseconds / 1000);

var stop = Stopwatch.StartNew();
var count = 0;
var countEnumerator = countRows.GetEnumerator();
for (int i = 0; i < 2; i++)
{
countEnumerator.MoveNext();
count = (int)countEnumerator.Current.GetDouble(0);
stop.Stop();
Console.WriteLine(stop.ElapsedMilliseconds / 1000);
}
countEnumerator.Reset();

var stop = Stopwatch.StartNew();
var count = 0;
for (int i = 0; i < 2; i++)
{
countEnumerator.MoveNext();
count = (int)countEnumerator.Current.GetDouble(0);
stop.Stop();
Console.WriteLine(stop.ElapsedMilliseconds / 1000);
}
countEnumerator.Reset();
0 Kudos
Leelin
by
New Contributor II

Hi Joshua,

Thanks for your further information. I have successful run your code. I think you want to find out the place where the performance is poor. The following items are the test result:

1.  Geodatabase.Open(@"C:\test\testData.gdb");

It takes 12 milliseconds.

2. geodatabase.ExecuteSQL(countStatements);

It takes 0 millisecond.

3. The first time:

for (int i = 0; i < 1; i++)
{
countEnumerator.MoveNext();
count = (int)countEnumerator.Current.GetDouble(0);
stop.Stop();
Console.WriteLine(stop.ElapsedMilliseconds / 1000);
}

It takes 2 seconds.

4. The second time:

for (int i = 0; i < 1; i++)
{
countEnumerator.MoveNext();
count = (int)countEnumerator.Current.GetDouble(0);
stop.Stop();
Console.WriteLine(stop.ElapsedMilliseconds / 1000);
}

It takes 2 seconds.

Base on the test result, I guess the iteration of enumerator is the main reason. I haven't any idea to resolve it. Do you have the other suggestion? I'm very appreciate for your any help.

Thanks,

Casper. Lee

JoshuaBixby
MVP Esteemed Contributor

Unfortunately, I don't work with the FileGDB API very much, I usually interact with file geodatabases through ArcPy.  I wish I had some great suggestion at this point, but I don't.  Hopefully some Esri staff that work on FileGDB API development see this and chime in.

0 Kudos
Leelin
by
New Contributor II

Hi Joshua,

Thank you very much. Your suggestion help me to test File GeoDatabase reading/writing performance. I have opened another question to find other answers.

Thanks,

Casper. Lee

0 Kudos
ÁkosHalmai
Occasional Contributor II

Hi,

The problem is in the SQL query: it seems, there is no planer/optimiser in FileGDBApi. To execute the original SQL ($"SELECT count(*) FROM TableName WHERE OBJECTID=1 or OBJECTID=2") takes 4466 ms on my table but to execute this one is only 16 ms on first run and 1 ms on second run: "SELECT Count(*) FROM Original WHERE OBJECTID IN (1, 2);"

The full code:

RowCollection rows = GDB.ExecuteSQL("SELECT Count(*) FROM Original WHERE OBJECTID IN (1, 2);");
foreach (Row row in rows) // Just to be generic.
{
int i = (int)row.GetDouble(0);
}

Ákos Halmai

0 Kudos