Need to speed up performance of update cursor.

11-08-2010 10:04 AM
CarlosPiccirillo
New Contributor III
Hi everyone,

I know this question has been asked many times in the past, but I have tried every suggestion I found on the forums (I've been going through them for a week now) and have had little success in improving the speed of my code.

I have 16 SDE layers with a combined 75,000+ features. I have to go through every feature in every layer, run an Oracle query against one table and populate 49 fields in the SDE layer with this info.

The code below works fine but takes three hours to run and my boss wants the info updated every hour.

I've tried joining the table to each layer instead of doing the Oracle query and using ICalculator with IDisplayTable to update the fields, but that took even longer: 9 hours. I've also tried search cursors, IFeature::Store, and several other approaches I can't think of right now, but they all performed worse.

For the sake of brevity, I'm only showing three fields in the query/code and have hard-coded the layer. The full code is MUCH longer than what is shown here.

Does anyone have an idea how I can speed this code up?

Thanks,
Carlos

       public static void ProcessLayer()
        {
            try
            {
                Type appRefType = Type.GetTypeFromProgID("esriFramework.AppRef");
                object appRefObj = Activator.CreateInstance(appRefType);
                IApplication pApplication = appRefObj as IApplication;
                IMxDocument pMxDocument = (IMxDocument)pApplication.Document;
                IMap pMap = pMxDocument.FocusMap;
                IFeatureLayer pFeatureLayer = pMap.get_Layer(0) as IFeatureLayer;
                IFeatureClass pFeatureClass = pFeatureLayer.FeatureClass;

                //get the Workspace from the IDataset interface on the feature class
                IDataset pDataset = pFeatureClass as IDataset;
                IWorkspace pWorkspace = pDataset.Workspace;

                //Get a handle to the editor extension.
                UID pUID = new UIDClass();
                pUID.Value = "esriEditor.Editor";
                IEditor pEditor = (IEditor)pApplication.FindExtensionByCLSID(pUID);
                pEditor.StartEditing(pWorkspace);
                pEditor.StartOperation();

                IFeatureCursor pFeatureCursor = pFeatureClass.Update(null, false);

                int permitNoIdx = pFeatureCursor.FindField("PERMIT_NO");
                int appNoIdx = pFeatureCursor.FindField("APP_NO");
                int projectNameIdx = pFeatureCursor.FindField("PROJECT_NAME");
                string appNo = string.Empty;
                string permitNo = string.Empty;
                string projectName = string.Empty;

                OracleConnection dataConnection = new OracleConnection();
                dataConnection.ConnectionString = "Data Source=genp;" +
                                                  "Persist Security Info=True;" +
                                                  "User ID=pub;" +
                                                  "Password=pub;" +
                                                  "Unicode=True";

                dataConnection.Open();
                OracleCommand dataCommand = new OracleCommand();
                dataCommand.Connection = dataConnection;

                IFeature pFeature = pFeatureCursor.NextFeature();

                while (pFeature != null)
                {
                    appNo = pFeature.get_Value(appNoIdx).ToString();

                    dataCommand.CommandText = "SELECT PERMIT_NO, PROJECT_NAME " +
                                              "FROM reg_gis " +
                                              "WHERE app_no = '" + appNo + "'";

                    OracleDataReader dataReader = dataCommand.ExecuteReader();

                    while (dataReader.Read())
                    {
                        if (!dataReader.IsDBNull(0))
                        {
                            permitNo = dataReader.GetString(0);
                            pFeature.set_Value(permitNoIdx, permitNo);
                        }

                        if (!dataReader.IsDBNull(1))
                        {
                            projectName = dataReader.GetString(1);
                            pFeature.set_Value(projectNameIdx, projectName);
                        }
                    }
                    dataReader.Close();

                    pFeatureCursor.UpdateFeature(pFeature);
                    pFeature = pFeatureCursor.NextFeature();

                    //Clear values.
                    appNo = string.Empty;
                    permitNo = string.Empty;
                    projectName = string.Empty;
                }

                //Stop editing.
                pEditor.StopOperation("Done");
                pEditor.StopEditing(true);

                //Close the Oracle connection and release the cursor.
                dataConnection.Close();
                System.Runtime.InteropServices.Marshal.ReleaseComObject(pFeatureCursor);
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.StackTrace + "\r\n" + ex.Message, "Warning", MessageBoxButtons.OK, MessageBoxIcon.Information);
            }
        }
16 Replies
RichardFairhurst
MVP Honored Contributor
Looking at your code, the performance hit appears to come primarily from having to query the joined table for each feature more or less individually to create the link. Some sort of array or list that reads and stores all of the joined table records in memory once would be more efficient than hitting the on-disk table with repeated queries. 75,000 records total does not seem like it would exhaust your resources, especially if the join table has far fewer records (or at least no more). You would need to choose an appropriate array, collection, or list that gives you an efficient key to match each feature of your layers, or else sort the features to match the array order so that you are not jumping backwards and forwards in the array for a given layer. .NET has a variety of objects for storing arrays, lists, and collections, so something should work (possibly a list of arrays or structs). Anyway, that would be the basic approach I would investigate.
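
A minimal sketch of that idea, assuming the same reg_gis table, column names, and Oracle provider used in the original code (duplicate APP_NO values would simply overwrite each other here):

using System.Collections.Generic;
using System.Data.OracleClient;   // or the equivalent ODP.NET types

public static class RegGisCache
{
    public struct Row
    {
        public string PermitNo;
        public string ProjectName;
    }

    // Read the whole join table once and key it on APP_NO.
    public static Dictionary<string, Row> Load(OracleConnection conn)
    {
        Dictionary<string, Row> lookup = new Dictionary<string, Row>();
        using (OracleCommand cmd = new OracleCommand(
            "SELECT APP_NO, PERMIT_NO, PROJECT_NAME FROM reg_gis", conn))
        using (OracleDataReader rdr = cmd.ExecuteReader())
        {
            while (rdr.Read())
            {
                if (rdr.IsDBNull(0)) continue;
                Row row = new Row();
                row.PermitNo = rdr.IsDBNull(1) ? null : rdr.GetString(1);
                row.ProjectName = rdr.IsDBNull(2) ? null : rdr.GetString(2);
                lookup[rdr.GetString(0)] = row;  // last row wins on duplicate APP_NO
            }
        }
        return lookup;
    }
}

Inside the cursor loop, the per-feature ExecuteReader call would then be replaced by a dictionary lookup such as lookup.TryGetValue(appNo, out row).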
AlexanderGray
Occasional Contributor III
There are a few things you can try, but it is hard to tell what the major bottleneck is: the update cursor or the Oracle queries. If you run the update cursor without the Oracle query as a test (setting some dummy value), you may get a better idea of where to focus.

As far as the update cursor is concerned, you can try removing the edit operation and edit session; that is like using the field calculator outside of an edit session (no undo). You can also try setting recycling to true. Another option is to create a query filter with a null where clause but with the subfields restricted, and pass it into the Update method. Querying only the fields you need might help a little.
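
A minimal sketch of the subfields part, using the field names from the original post (the object ID field name may differ):

// Only the fields being read/written, plus the OID field, come back from the cursor.
IQueryFilter pQueryFilter = new QueryFilterClass();
pQueryFilter.SubFields = "OBJECTID,APP_NO,PERMIT_NO,PROJECT_NAME";
// No WhereClause, so every feature is still returned.
IFeatureCursor pUpdateCursor = pFeatureClass.Update(pQueryFilter, false);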

As far as Oracle is concerned, querying all the rows into an ADO.NET table in local memory and then selecting the rows from there might be more efficient.
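
A sketch of that in-memory ADO.NET approach, assuming APP_NO is unique in reg_gis and the same Oracle provider as the original code:

using System.Data;
using System.Data.OracleClient;   // or the equivalent ODP.NET types

public static class RegGisTableLoader
{
    // Fill a DataTable once and give it a primary key so rows can be found quickly.
    public static DataTable Load(OracleConnection conn)
    {
        DataTable regGis = new DataTable("reg_gis");
        using (OracleDataAdapter adapter = new OracleDataAdapter(
            "SELECT APP_NO, PERMIT_NO, PROJECT_NAME FROM reg_gis", conn))
        {
            adapter.Fill(regGis);
        }
        regGis.PrimaryKey = new DataColumn[] { regGis.Columns["APP_NO"] };
        return regGis;
    }
}

Inside the loop, DataRow match = regGis.Rows.Find(appNo); would replace the per-feature query, with DBNull.Value checks on match["PERMIT_NO"] and match["PROJECT_NAME"] before calling set_Value.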
SteveFang
New Contributor III
You have too much overhead doing your updates with ArcObjects and ADO.NET. If all you're doing is updating attributes, then perhaps you should consider doing this operation in Oracle with PL/SQL. If you are concerned about versioning, then look into the Oracle packages supplied by ESRI for using SQL on versioned data. Just as a reference, I once updated a versioned SDE feature class with 1.7+ million records in 5 to 6 hours using PL/SQL, where it would have taken days using ArcObjects.
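
If this route were taken, the C# side could shrink to a single call into the database. The procedure name below (update_permit_attrs) is hypothetical, and the PL/SQL body itself, including any handling of versioned data through the ESRI-supplied packages, would still have to be written on the Oracle side:

using System.Data;
using System.Data.OracleClient;   // or the equivalent ODP.NET types

public static class OracleSideUpdate
{
    // Call a hypothetical PL/SQL procedure that performs all the attribute updates in the database.
    public static void Run(OracleConnection conn)
    {
        using (OracleCommand cmd = new OracleCommand("update_permit_attrs", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.ExecuteNonQuery();
        }
    }
}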
CarlosPiccirillo
New Contributor III
Alexander,

Thanks for your suggestions. To speed up testing, I reduced my test layer to 1,000 records, and it appears that the bottleneck is the update cursor. Without the Oracle query, using dummy values, it took 2:35 minutes; with the Oracle query it took 2:45 minutes. Setting recycling to true had no effect, and I could not remove the edit operation/session because I am using an update cursor, which requires an edit session. The query with the subfields had no effect either. The only thing I have not tried is the ADO.NET table in local memory, because I do not know how to do that. Do you have an example in VBA/VB/.NET showing how to do this?

Carlos

CarlosPiccirillo
New Contributor III
Steve,

Thanks for the reply. Unfortunately, I am not familiar with PL/SQL at all. 😞

Carlos

AlexanderGray
Occasional Contributor III
The biggest difference I have ever seen came from removing the operation and edit session. You can also try the IFeatureClassWrite interface, but I am not confident you will get much gain. If the Oracle part only accounts for about 6% of the operation's time, there is not much point focusing there; the best you could achieve is less than a 6% gain. It is possible the relationship is not linear, so testing the cursor without the Oracle queries on more features might yield a different curve, but again it looks like the wrong place to make gains.
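
A rough sketch of the IFeatureClassWrite route, assuming it is acceptable to write outside an edit session and that no class extensions or events need to fire (it is a low-level interface, so test it on a copy of the data first); pFeatureClass and permitNoIdx are the variables from the original code:

// Low-level write path: no edit session, no UpdateFeature call.
// Requires ESRI.ArcGIS.Geodatabase.
public static void TimeWriteFeature(IFeatureClass pFeatureClass, int permitNoIdx)
{
    IFeatureClassWrite pWrite = (IFeatureClassWrite)pFeatureClass;
    IFeatureCursor pCursor = pFeatureClass.Search(null, false);
    IFeature pFeature = pCursor.NextFeature();
    while (pFeature != null)
    {
        pFeature.set_Value(permitNoIdx, "TEST");  // dummy value, just for timing
        pWrite.WriteFeature(pFeature);
        pFeature = pCursor.NextFeature();
    }
    System.Runtime.InteropServices.Marshal.ReleaseComObject(pCursor);
}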

Chump's suggestion looks much more likely to yield the 300+% gains you need.
MattMoyles
New Contributor III
We had this same problem. The real bottleneck is the COM interop layer: each time you update a feature from a .NET language, there is an interop delay between COM and the .NET wrapper.

If you use straight C++/COM you will see a HUGE performance increase. We were updating >100,000 records; it took almost an hour to run through the loop with .NET and a couple of minutes with pure C++/COM.

Unfortunately you will not find much information out there about this in the ESRI documentation.  There is one page: http://help.arcgis.com/en/sdk/10.0/arcobjects_net/conceptualhelp/index.html#//0001000000wm000000

It suggests using managed C++, but I think you will find it easier and faster to skip .NET completely and use straight COM, which is not well documented either. If you need some sample code to get started in C++, I can help you with that.
CarlosPiccirillo
New Contributor III
Matt,

Thanks for the reply. Unfortunately, we are only allowed to develop code in C#. Can C++ be easily converted to C#?

Carlos

MattMoyles
New Contributor III
Well, the real issue is that there is an interop layer when accessing the underlying COM objects from a .NET language. You can write equivalent ArcObjects code in both C# and C++; C++ is probably a little trickier since you have to watch your memory management.

You don't have to write your whole application in C++, though. What ESRI's article suggests is creating a small managed C++ ArcObjects library and then loading and using it from your .NET C# application. If you could create a small managed C++ library with only the few functions doing the heavy lifting, you could take a detour around your bottleneck. If you are stuck using only C#, then you are unfortunately out of luck with this approach.