I am about to embark on project to edit an existing ESRI desktop AddIn I had created a few years ago for a client which processes AIS data (vessel movement data).
The original tool was designed around a volume of data that was specific to their original project. The data is held in a file geodatabase with key fields having attribute indexes built and the FeatureClass compressed. The client has been throwing larger and larger datasets and the current tool logic needs to be improved so it can deal with these larger datasets. Currently the tool gets slower as the data size increases. The tool is doing a lot of spatial querying and the logic of this needs to change.
I believe their large datasets are about 2 million rows.
I've had a good think about how to approach this and I've come up with 2 solutions: apply definition queries so that the tool is only ever looking at a subset of data based upon date or do some pre-processing splitting out the data into separate FeatureClasses based upon date, so physically creating new but smaller datasets.
Does anyone have any advice, have they done something similar and know of any pit falls? I was wondering if I go down the route of definition queries, is there any performance degradation as the size of the source dataset increases. For argument sake lets say the definition query always subsets a similar number of rows. Is the performance of that influenced by the size of the underlying table? Should I "explode" the dataset into new featureclasses then any processing will always be from a source dataset of a smaller size (probably around 20,000 rows).
I ask this question as the mechanisms for these 2 solutions are quite different and before I start felt I could benefit from other peoples wisdom/experiences?