Select to view content in your preferred language

Performance issues with Row.Store() for bulk updates on MSSQL SDE – Can I use direct SQL for attribute-only updates?

188
3
Monday
MaxW
by
Emerging Contributor

Hi everyone,

I am currently developing a Console Host application for scheduled data synchronization using the ArcGIS Pro SDK. My environment is an Enterprise Geodatabase on MSSQL Server.

I am facing a significant performance bottleneck when using RowCursor and Row.Store() to update approximately 240,000 records. Currently, the process takes about 3 hours, which is far too slow for our production requirements.

Here is the context of the Feature Class:

  • Storage: MSSQL Server (SDE).

  • Versioning: Non-Versioned.

  • Archiving: Disabled.

  • Editor Tracking: Enabled.

  • Services: This Feature Class is published as a Map/Feature Service.

My Requirements:

  • I only need to update specific attribute fields.

  • There will be NO changes to the geometry (Shape) or new feature additions.

My Questions:

  1. Given that the data is Non-Versioned, is it safe to perform these attribute updates using direct MSSQL UPDATE statements instead of the SDK's Row.Store()?

  2. Since Editor Tracking is enabled, I understand I would need to manually update the tracking fields (editor/date) via SQL. Are there any other hidden system tables I should be aware of?

  3. What is the impact on the published services? Will the service instances recognize the changes immediately, or do I need to restart the services/clear caches after a direct SQL update?

If anyone has experience handling large-scale attribute updates in an SDE/MSSQL environment, I would greatly appreciate your insights or any alternative high-performance patterns within the SDK (e.g., specific batching techniques).

Thanks in advance!

0 Kudos
3 Replies
MarcoBoeringa
MVP Alum

I can't comment on the specifics of your SDE based implementation, as I don't run an Enterprise Geodatabase, just an ordinary enterprise spatial database (PostgreSQL with 'geometry' storage) that is not "geodatabase" enabled.

However, the first question that comes to mind: what is the latency between your client and server? Latency is vital for performance, and can kill it.

My current test setup is a laptop connected to an older HP Z840 workstation (2x Xeon E5-2699 v4, the fastest processor of that generation) with 512GB RAM and fast NVMe storage. This setup has extremely low latency via one 1GB/s network switch.

Using psycopg2 , ArcPy and Python "concurrent.futures.ThreadPoolExecutor' multithreading options, it allows me to UPDATE up to 500k rows / s(!), so 1.8B rows per hour using 44 concurrent threads (Xeon E5-2699 v4 have 22 cores) from within ArcGIS Pro using Python scripts. Yes, you read that right, more than 1B rows per hour for light UPDATE statements not involving complex operations (e.g generalizing in my specific workflow, which involves considerable calculations, lowers it to about 40k rows / s, so about 150M rows per hour). 

Of course, I UPDATE rows in batches, so this isn't equivalent to TPS (Transactions per second), but it does show what is possible in ideal situations. And with 512GB RAM and clever use of batching, much of the UPDATEs happen in RAM, so you get speeds close to an in-memory database, especially when the PostgreSQL database determines it can do Heap-Only Tuples (HOT) updates. The 512GB RAM is not overkill though, given that my largest processed tables have been > 1.5B records, where I literally update each-and-every record, so essentially a complete rewrite of the tables using UPDATEs. You really want sufficient RAM as well, so as to avoid continuous swapping in-and-out of memory of data in RAM to disk.

0 Kudos
SelimDissem
Esri Contributor

Hi Max,

One thing you can try is to separate the attribute data from the spatial and recombine it later using a view. You can then use "pure" SQL Server capabilities on the attributes and use a "system" account to mark the updates done with the application (I'm not aware of hidden system tables that would need to be updated but don't quote me on that).

If possible, you can reduce network "chat" by having the update be done by a stored procedure that runs on the Server itself and your application would just call that stored procedure.

Note that all of the above can be achieved with a "regular" .NET console app if you're not using any of the ArcGIS Pro functionality for other things.

Services referencing the resulting view should automatically show the latest and greatest.

0 Kudos
sjones_esriau
Esri Contributor

Hi Max,
Give UpdateCursor a try first. You will get better performance than RowCursor/Store.

using ArcGIS.Core.Data;
using ArcGIS.Core.Hosting;
using System;

class Program
{
    static void Main(string[] args)
    {
        // Initialise CoreHost
        Host.Initialize();

        // Path to your .sde connection file
        var sdePath = @"C:\Connections\myDatabase.sde";

        using (var geodatabase = new Geodatabase(
            new DatabaseConnectionFile(new Uri(sdePath))))
        {
            // Wrap all edits in ApplyEdits (critical for CoreHost)
            geodatabase.ApplyEdits(() =>
            {
                using (var table = geodatabase.OpenDataset<Table>("DBO.MyFeatureClass"))
                {
                    // Only request the fields you need
                    var queryFilter = new QueryFilter
                    {
                        WhereClause = "Status = 'Pending'",     // Filter rows to update
                        SubFields = "OBJECTID, Status, LastUpdated" // Required fields only
                    };

                    using (var cursor = table.Update(queryFilter, false))
                    {
                        int updatedCount = 0;

                        while (cursor.MoveNext())
                        {
                            using (var row = cursor.Current)
                            {
                                // Update attributes
                                row["Status"] = "Complete";
                                row["LastUpdated"] = DateTime.UtcNow;

                                // Persist change (buffered under ApplyEdits)
                                cursor.Update(row);

                                updatedCount++;
                            }
                        }

                        Console.WriteLine($"Updated {updatedCount} rows.");
                    }
                }
            });
        }

        Console.WriteLine("Done.");
    }
}

 

0 Kudos