Slow C# query performance vs VBA

3927
19
07-27-2011 09:08 AM
CarlosPiccirillo
New Contributor III
Hi everyone,

I am writing an stand alone C# application and I am having a performance issue. The following code was converted from VBA literally line by line. I am doing a spatial query between two layers. The VBA code runs in between 10 and 15 minutes. The C# code below takes just over 3 hours to do the same query on the same layers as the VBA code. The code below doesn't even start ArcMap, just accesses things directly. I also tried changing the code a bit to start ArcMap in code and then run the code but it still takes the same amount of time.

Obviously, I am missing something/doing something wrong but I can't figure it out. Anyone have any ideas?

Thanks,
Carlos

public static void AppsInBoundary()
{ 
    try
    {
 IFeatureLayer pAppLayer = null;
 IFeatureLayer pEnsLayer = null;
 IFeatureCursor pEvgCursor = null;
 IFeatureCursor pAppCursor = null;
 IFeature pAppFeature = null;
 IFeature pEnsFeature = null;
 IArea pFeatureArea;
 int featureCount = 0;
 int appFieldIndex = 0;
 string geometryFieldName = string.Empty;

 //Get a reference to the two layers involved in the query.
 pAppLayer = GetLayerReference("FILEGDB", "App_Merge", Scratch_Local + "temp.gdb", null) as IFeatureLayer;
 pEnsLayer = GetLayerReference("SDE", "RIM.HYHDB_BSN_EVRGLDS_AREA", null, "gerrpsde") as IFeatureLayer;

 //Make sure layers being queried have the same projection.
 //Otherwise, topological operations later in the code (DetermineOverlapArea) fail.
 SpatialReferenceCheck(pAppLayer, pEnsLayer);

 //Find index of app no field in app_merge layer so query can run faster. 
 appFieldIndex = pAppLayer.FeatureClass.FindField("APP_NO");

 //Select only polygons that have a basin name.
 IQueryFilter pEnsQueryFilter = new QueryFilterClass();
 pEnsQueryFilter.WhereClause = "basin_name <> ' '";

 pEvgCursor = pEnsLayer.Search(pEnsQueryFilter, true);
 pEnsFeature = pEvgCursor.NextFeature();

 ISpatialFilter pAppSpatialFilter = new SpatialFilterClass();

 while (pEnsFeature != null)
 {
     m_Label = pEnsFeature.get_Value(pEnsFeature.Fields.FindField("BASIN_NAME")).ToString();

     pAppSpatialFilter.Geometry = pEnsFeature.Shape;
     pAppSpatialFilter.GeometryField = "Shape";
     pAppSpatialFilter.SpatialRel = esriSpatialRelEnum.esriSpatialRelIntersects;

     pAppCursor = pAppLayer.Search(pAppSpatialFilter, true);
     pAppFeature = pAppCursor.NextFeature();

     while (pAppFeature != null)
     {
  if (pAppFeature.get_Value(appFieldIndex) != DBNull.Value)
  {
      m_AppNo = Convert.ToString(pAppFeature.get_Value(appFieldIndex));
      pFeatureArea = pAppFeature.Shape as IArea;
      m_AppArea = pFeatureArea.Area;

      //Set relation type, in or adjacent.
      if (DetermineOverlapArea(pAppFeature, pEnsFeature) == true)
   m_Relation = "AI";   //Application In
      else
   m_Relation = "AA";   //Application Adjacent

      //If app_no is not null, proceed, otherwise skip it.
      if (m_AppNo != "")
      {
   //Check if app status is "no response", if it is not, proceed.
   if (AppStatusCheck(m_AppNo) == true)
   {
       GetAppInfo(m_AppNo);
       PopAppCmgenTables(m_AppNo, m_Relation);
   }
      }

      pAppFeature = pAppCursor.NextFeature();

      //Clear values.
      m_AppNo = string.Empty;
      m_AppArea = 0;

  }   //APP_NO != DBNull.Value
     }   //pAppFeature != null

     pEnsFeature = pEvgCursor.NextFeature();

     //Clear values.
     m_AppNo = string.Empty;
     m_Relation = string.Empty;
     m_Label = string.Empty;
 }   //pEnsFeature != null
    }
    catch (Exception ex)
    {
 LogError(ex.StackTrace, ex.Message, "AppsInBoundary", null);
    }   
}

internal static ILayer GetLayerReference(string layerType, string featureClassName, string path, string serverName)
{
    try
    {
 IWorkspaceFactory pWorkspaceFactory = null;
 IFeatureWorkspace pFeatureWorkspace = null;
 IFeatureClass pFeatureClass = null;

 if (layerType == "SDE")
 {
     IPropertySet pPropertySet = new PropertySetClass();
     pPropertySet = GetSdePropertySet(serverName, null);
     pWorkspaceFactory = new SdeWorkspaceFactoryClass();
     pFeatureWorkspace = pWorkspaceFactory.Open(pPropertySet, 0) as IFeatureWorkspace;
 }
 else if (layerType == "FILEGDB")
 {
     pWorkspaceFactory = new FileGDBWorkspaceFactoryClass();
     pFeatureWorkspace = pWorkspaceFactory.OpenFromFile(path, 0) as IFeatureWorkspace;
 }
 else if (layerType == "PGDB")
 {
     pWorkspaceFactory = new AccessWorkspaceFactoryClass();
     pFeatureWorkspace = pWorkspaceFactory.OpenFromFile(path, 0) as IFeatureWorkspace;
 }

 pFeatureClass = pFeatureWorkspace.OpenFeatureClass(featureClassName);
 IFeatureLayer pFeatureLayer = new FeatureLayerClass();
 pFeatureLayer.FeatureClass = pFeatureClass;
 return pFeatureLayer;
    }
    catch (Exception ex)
    {
 LogError(ex.StackTrace, ex.Message, "GetGroupLayer", null);
 return null;
    }
}
0 Kudos
19 Replies
NeilClemmons
Regular Contributor III
.NET code using ArcObjects will almost always run slower than code written in VBA/VB6.  This is because .NET does not directly support COM and VBA/VB6 does.  Any call into a COM object in .NET must pass through a runtime callable wrapper (RCW) that acts as a middleman between COM and .NET.  Naturally, this slows things down.  The more ArcObjects calls you make, the slower it will run.  That being said, there are several things you may be able to do to speed things up.  The first is to make sure your data is properly indexed.  Aside from a spatial index on the feature class, all attribute fields involved in the process should also be indexed.  When you execute your queries, setting the Subfields property on the query/spatial filter will help.  You should set this property to include only the fields that you need to be populated in the cursor.  You should also review your logic to see if there is a way to reduce the number of queries you have to make.  I don't know exactly what your code is doing but it looks like you are performing a spatial query on one layer for each feature in the other layer that has a basin name.  Could you possibly combine the geometries of all features with the same basin  name into a single spatial query?  For instance, let's say there are 100 features in the layer with a single basin name.  If you add these 100 geometries into a geometry bag and use it to execute a single spatial query then you will eliminate 99 spatial queries from the process.  Of course, depending on what it is that you're doing with the query results you may not be able to do this but this is the type of thing you should look for.  If all else fails, you may be better off writing this procedure in VB6 or C++ and compiling it into a library that you can then call from your .NET application.
0 Kudos
CarlosPiccirillo
New Contributor III
Thanks for the reply Neil!

All the fields I query are indexed as I learned that lesson from you a couple of years ago. I will try your suggestions and see what happens.

In the case of the two layers I am querying, the outter cursor (basin name) layer has only 17 features and there are no duplicate values of basin name. The inner cursor (app_no) has 76,000+ features and they are already merged by app_no to minimize queries so I guess there is no performance improvement there.

I have very little VB6 experience, mostly using examples as reference for VBA or C# and no C++ so I will try to convert the code to VB inside of Visual Studio 2008 and see if it runs better there.

If you have any other suggestions, I'm all ears. Thanks again for your time!
Carlos


.NET code using ArcObjects will almost always run slower than code written in VBA/VB6.  This is because .NET does not directly support COM and VBA/VB6 does.  Any call into a COM object in .NET must pass through a runtime callable wrapper (RCW) that acts as a middleman between COM and .NET.  Naturally, this slows things down.  The more ArcObjects calls you make, the slower it will run.  That being said, there are several things you may be able to do to speed things up.  The first is to make sure your data is properly indexed.  Aside from a spatial index on the feature class, all attribute fields involved in the process should also be indexed.  When you execute your queries, setting the Subfields property on the query/spatial filter will help.  You should set this property to include only the fields that you need to be populated in the cursor.  You should also review your logic to see if there is a way to reduce the number of queries you have to make.  I don't know exactly what your code is doing but it looks like you are performing a spatial query on one layer for each feature in the other layer that has a basin name.  Could you possibly combine the geometries of all features with the same basin  name into a single spatial query?  For instance, let's say there are 100 features in the layer with a single basin name.  If you add these 100 geometries into a geometry bag and use it to execute a single spatial query then you will eliminate 99 spatial queries from the process.  Of course, depending on what it is that you're doing with the query results you may not be able to do this but this is the type of thing you should look for.  If all else fails, you may be better off writing this procedure in VB6 or C++ and compiling it into a library that you can then call from your .NET application.
0 Kudos
NeilClemmons
Regular Contributor III
Converting the code to VB.NET will not help as it will still run using COM Interop.  It sounds as if the data and the query logic may not the problem.  It could be the code inside your GetAppInfo and PopAppCmgenTables methods.  That's where I'd start to look for ways to speed things up.  How fast do the queries run if you comment out the calls to these two methods.  You can put code at various places to calculate how much time it takes to execute blocks of code.  I would do this to find where the bottleneck is and then look for ways to overcome it.  If you post the code in those bottlenecks I or someone else may be able to offer suggestions.
0 Kudos
CarlosPiccirillo
New Contributor III
I commented out both GetAppInfo and PopAppCmgenTables methods and stopped the code after 20 minutes because it took the same amount of time to get through 7 of 17 records as the original code so it looks like the bottle neck is somewhere in there.

I am including the code for both GetAppInfo and PopAppCmgenTables methods in case you see ways of improving them. GetAppInfo does a bunch of queries against Access tables and PopAppCmgenTables populates a few tables in another Access database.

I have to admit, GetAppInfo is quite large, convoluted and ugly. It's evolved over the years with patches upon patches to deal with quirky data out of my control. When I started converting it to C# from VBA, I thought about trying to redo the method as a clean start but decided against it because I know the original code works well and fast.

I will write out to a text file at the start and end of pieces of code with time stamps and see if I can narrow down the bottleneck.

I had to zip the code up because it was too large to fit in this window and too large to send as a plain text file, sorry.
0 Kudos
NeilClemmons
Regular Contributor III
Use something like this to measure how long a block of code takes to execute.

        Dim startTime As Date = Now
        ' execute a block of code here
        Dim span As TimeSpan = Now.Subtract(startTime)
        Debug.Print("time to execute block:  " & span.Seconds & " seconds")


Just modify the print statement so you can identify what's what in the debug window.
0 Kudos
CarlosPiccirillo
New Contributor III
Neil,

As it turns out, there are two bottlenecks, methods GetAppInfo and DetermineOverlapArea (below). DetermineOverlapArea calculates how much overlap two polygons have. If it's greater than 5%, we record it, otherwise, skip it. No idea why a simple area calculation would slow things down so much.

        public static bool DetermineOverlapArea(IFeature pFeature1, IFeature pFeature2)
        { 
            try
            {
                IArea pFeature1Area = pFeature1.Shape as IArea;
                IArea pFeature2Area = pFeature2.Shape as IArea;
                ITopologicalOperator2 pTopologicalOperator = pFeature2.Shape as ITopologicalOperator2;
                pTopologicalOperator.IsKnownSimple_2 = false;
                pTopologicalOperator.Simplify();

                IArea pIntersectArea = pTopologicalOperator.Intersect(pFeature1.Shape, esriGeometryDimension.esriGeometry2Dimension) as IArea;

                int appOverlap = Convert.ToInt32((pIntersectArea.Area / pFeature1Area.Area) * 100);
                int ensOverlap = Convert.ToInt32((pIntersectArea.Area / pFeature2Area.Area) * 100);

                if (appOverlap > 5 || ensOverlap > 5)
                    return true;
                else
                    return false;
            }
            catch (Exception ex)
            {
                LogError(ex.StackTrace, ex.Message, "DetermineOverlapArea", null);
                return false;
            }           
        }
Use something like this to measure how long a block of code takes to execute.

        Dim startTime As Date = Now
        ' execute a block of code here
        Dim span As TimeSpan = Now.Subtract(startTime)
        Debug.Print("time to execute block:  " & span.Seconds & " seconds")


Just modify the print statement so you can identify what's what in the debug window.
0 Kudos
NeilClemmons
Regular Contributor III
A few things to try:

First, a method should only take what it really needs as parameters.  In this case, you don't really need 2 features, you need 2 polygons.  So I would modify the method signature to this:

public static bool DetermineOverlapArea(IPolygon polygon1, IPolygon polygon2)

When you call the method, pass in the feature geometries cast as polygons.  Use ShapeCopy instead of Shape because you don't want your method altering the feature geometries.

Try skipping the geomtry simplification.  The general idea here is that an unmodified feature geometry from a feature class will already be simple.

Remove the if/else block at the end and replace it with:  return (appOverlap > 5 || ensOverlap > 5);
Probably not much time savings there but every bit counts.
0 Kudos
CarlosPiccirillo
New Contributor III
Neil,

I tried your suggestions for speeding up method DetermineOverlapArea and it's just as slow as before. I tried passing in the geometries both as polygons as well as geometries and there is no improvement at all. It takes 10 minutes to get through 5 records.

A few things to try:

First, a method should only take what it really needs as parameters.  In this case, you don't really need 2 features, you need 2 polygons.  So I would modify the method signature to this:

public static bool DetermineOverlapArea(IPolygon polygon1, IPolygon polygon2)

When you call the method, pass in the feature geometries cast as polygons.  Use ShapeCopy instead of Shape because you don't want your method altering the feature geometries.

Try skipping the geomtry simplification.  The general idea here is that an unmodified feature geometry from a feature class will already be simple.

Remove the if/else block at the end and replace it with:  return (appOverlap > 5 || ensOverlap > 5);
Probably not much time savings there but every bit counts.
0 Kudos
JeffreyHamblin
New Contributor III
I commented out both GetAppInfo and PopAppCmgenTables methods and stopped the code after 20 minutes because it took the same amount of time to get through 7 of 17 records as the original code so it looks like the bottle neck is somewhere in there.


If I am reading this correctly... If you comment out those two methods and the operation is still just as slow, then the bottleneck is not in those methods.
0 Kudos