Optimizing discovery of all layers, datasets, tables, relationships

BillSmith · 07-11-2025

Hello,

I am curious whether there is anything I can do to optimize discovery of the layers in a geodatabase. Basically, I am creating a tree of the datasets, layers, and other tables, much like ArcCatalog would. Here is what I have now. I am wondering whether LINQ might offer a performance boost, or whether there are other suggestions to speed up the method.

Thanks in advance, -Bill

internal int GetEsriDatasetMap(
    int message_id,
    bool include_tables,
    bool layer_names_only)
{
    // Get all the feature datasets and the classes in them.

    // ESRI databases have proper datasets containing feature classes as well
    // as improper datasets which are feature classes with no enclosing dataset.
    // Find all the proper datasets, verify their CRS is WGS84 and
    // if so collect their feature classes. Then find all the improper datasets
    // which are feature classes without an enclosing dataset. These also
    // have a CRS which must be WGS84.
    // Feature classes within an enclosing dataset have the same CRS as the dataset.

    int error = -1;

    //EsriMessageBox.DisplayMessage("Attach debugger bish");

    if (m_dataset_map == null) {
        m_dataset_map = new EsriDatasetMap();
    }

    // Check to see if this has already been done
    if (m_dataset_map.Count > 0)
    {
        error = 0;
        EsriLogger.Log("Datasets already figured. Moving along.");
    }
    else
    {
        string empty_dataset_name = SEE_NO_DATASET;
        List<String> all_layer_names = new List<string>();
        List<string> layer_names_in_datsets = new List<string>();
        string layer_name = "";

        IReadOnlyList<FeatureClassDefinition> featureClassDefinitions = geodb.GetDefinitions<FeatureClassDefinition>();
        IEnumerable<Definition> featureClasses = featureClassDefinitions;

        foreach (var fc in featureClasses)
        {
            layer_name = fc.GetName();
            DatasetType fc_dst = fc.DatasetType;
            all_layer_names.Add(layer_name);
            m_layer_names.Add(layer_name);
            //EsriLogger.Log("Layer name is: " + layer_name + ". Type is: " + fc_dst);
        }

        if (include_tables)
        {
            IReadOnlyList<TableDefinition> tableDefinitions = geodb.GetDefinitions<TableDefinition>();
            IEnumerable<Definition> tables = tableDefinitions;
            foreach (var table in tables)
            {
                layer_name = table.GetName();
                DatasetType td = table.DatasetType;
                m_layer_names.Add(layer_name);
                //EsriLogger.Log("Table name is: " + layer_name + ". Type is: " + td);
            }
        }

        error = 0;
        m_layer_names.Sort();

        // Sometime after release 2.2, dataset definitions were added, making it possible to
        // get the layers in their respective datasets instead of flattening all of them under
        // one dataset.

        // Get feature datasets
        IReadOnlyList<Definition> feature_datasets_list = geodb.GetDefinitions<FeatureDatasetDefinition>();

        // Get all the feature classes under their respective dataset folders and package them up
        // to send back to SGXP.
        foreach (var feature_dataset in feature_datasets_list)
        {
            EsriLogger.Log("feature_dataset name is: " + feature_dataset.GetName() + ". Type is: " + feature_dataset.GetType());

            var defsInDataset = geodb.GetRelatedDefinitions(feature_dataset, DefinitionRelationshipType.DatasetInFeatureDataset);
            List<String> classes = new List<string>();
            bool ds_has_features = false;

            foreach (var def in defsInDataset)
            { //Or use LINQ .Where( d => d.DatasetType ...
                EsriLogger.Log("def name is: " + def.GetName() + ". Type is: " + def.GetType());

                if (def.DatasetType == DatasetType.FeatureClass)
                {
                    classes.Add(def.GetName());
                    m_layer_names.Add(def.GetName());
                    ds_has_features = true;
                    layer_names_in_datsets.Add(def.GetName());
                }
            }

            if (ds_has_features)
            {
                m_dataset_map.Add(feature_dataset.GetName(), classes);
            }
        }

        // Add all features not in datasets or just tables
        List<string> layers_not_in_datasets = new List<string>();
        foreach (var lyr_name in m_layer_names)
        {
            if (!layer_names_in_datsets.Contains(lyr_name))
            {
                layers_not_in_datasets.Add(lyr_name);
            }
        }

        if (layers_not_in_datasets.Count > 0)
        {
            m_dataset_map.Add(empty_dataset_name, layers_not_in_datasets);
        }
    }

    // Pack up the byte stream and send the result via IPC
    int command = SEE_DATASET_MAP_POPULATED;
    int total_len = DEFAULT_BUFLEN;
    int offset = EsriUtil.CreateMessageHeader(out byte[] message, total_len, command, message_id, error);

    // Serialize the map.
    EsriUtil.AddInt(m_dataset_map.Count, ref message, ref offset);

    foreach (var dm in m_dataset_map) {
        // name of the dataset
        EsriUtil.AddStringWithLength(dm.Key, ref message, ref offset);
        // number of layers in this dataset
        EsriUtil.AddInt(dm.Value.Count, ref message, ref offset);

        foreach (var layer in dm.Value) {
            // layer names in this dataset
            EsriUtil.AddStringWithLength(layer, ref message, ref offset);
        }
    }

    EsriManager.Instance.SendMessage(message);
    return error;
}

SumitMishra_016 · 07-14-2025

You're already implementing a well-structured method for collecting and mapping datasets, feature classes, and tables in a geodatabase.

While LINQ doesn’t inherently make things faster, it makes logic more concise and sometimes avoids unnecessary list traversals, which helps performance indirectly.

For example, this block:

foreach (var lyr_name in m_layer_names)
{
   if (!layer_names_in_datsets.Contains(lyr_name))
   {
      layers_not_in_datasets.Add(lyr_name);
   }
}
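
One possible rewrite of the loop above, offered as a sketch rather than the code from the original reply: copy the dataset layer names into a HashSet<string> once, then filter with LINQ. Each membership test then costs O(1) on average instead of a scan of the whole list. LayerFilter and NotInDatasets are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the ArcGIS Pro SDK.
public static class LayerFilter
{
    // Returns the names in allLayerNames that do not appear in namesInDatasets,
    // preserving the order (and any duplicates) of allLayerNames.
    public static List<string> NotInDatasets(
        IEnumerable<string> allLayerNames,
        IEnumerable<string> namesInDatasets)
    {
        // Build the lookup once. HashSet.Contains is O(1) on average,
        // whereas List.Contains scans the list on every call.
        var inDatasets = new HashSet<string>(namesInDatasets);

        return allLayerNames
            .Where(name => !inDatasets.Contains(name))
            .ToList();
    }
}
```

In the method from the question, this would be called as LayerFilter.NotInDatasets(m_layer_names, layer_names_in_datsets). Enumerable.Except would also work, but it removes duplicate names from the result, which the original loop does not.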

Aashis · 07-14-2025

According to MSDN:

LINQ syntax is typically less efficient than a foreach loop. It's good to be aware of any performance tradeoff that might occur when you use LINQ to improve the readability of your code.

BillSmith · 07-15-2025

Thank you. I am trying a LINQ implementation to get performance numbers. I only have to do this once, upon initial connection, so subsequent cached results are very fast. It's just that this is the long tentpole in our performance of connecting. If it's the best we can do, so be it!
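
To get numbers per step rather than for the method as a whole, one option is to time each phase separately with System.Diagnostics.Stopwatch. That should show whether the time goes to the geodatabase calls or to the list handling. A minimal sketch follows; PhaseTimer and the phase label in the usage comment are hypothetical, not part of the ArcGIS Pro SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper for coarse, per-phase wall-clock timing.
public static class PhaseTimer
{
    // Runs the action, stores its elapsed milliseconds under the label,
    // and returns the same value.
    public static long Time(string label, Action action, IDictionary<string, long> results)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();

        results[label] = stopwatch.ElapsedMilliseconds;
        return stopwatch.ElapsedMilliseconds;
    }
}

// Hypothetical usage inside GetEsriDatasetMap:
//   var timings = new Dictionary<string, long>();
//   PhaseTimer.Time("feature class definitions",
//       () => { /* the GetDefinitions<FeatureClassDefinition>() call */ }, timings);
//   foreach (var t in timings) EsriLogger.Log(t.Key + ": " + t.Value + " ms");
```

Wall-clock timings like these are coarse, so they are best read as a way to find the dominant phase, not as a precise benchmark.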