Assuming scripts are run on a machine that already has valid SDE connection files, what is the best way to access data on the applicable server?
```python
# SDE connection file and arcpy.da.SearchCursor
import arcpy
import pandas as pd

db_table = r'C:\DBCon.sde\table_name'
fields = ['FIELD1', 'FIELD2']  # columns to read

data = []
with arcpy.da.SearchCursor(db_table, fields) as cursor:
    for row in cursor:
        data.append(row)

df = pd.DataFrame(data, columns=fields)
```
or
```python
# pandas and SQLAlchemy, reading in chunks
# (query and connection are assumed to be defined elsewhere)
import pandas as pd

data = []
for chunk in pd.read_sql_query(query, connection, chunksize=2000):
    data.append(chunk)

df = pd.concat(data, ignore_index=True)
```
Hi @AustinAverill - it's a bit of an 'it depends'. Are you planning on read-only access or read/write? Do you plan to use spatial functions on the data?
In the current workflow I have put together, the dataset is read from a non-spatial server table; a spatially enabled data frame is built from a column in the table that contains a GeoJSON-like string, and that result is then used to truncate and update an existing dataset.
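A minimal sketch of the parsing step described above, in plain pandas. The table, column, and field names here are illustrative assumptions, not the actual schema:

```python
# Hypothetical sketch: parse a GeoJSON-like string column into geometry
# dicts. Column names ('asset_id', 'shape_json') are assumptions.
import json
import pandas as pd

# Assume the server table was already read into a DataFrame (e.g. via
# arcpy.da.SearchCursor or pd.read_sql_query); stand-in data shown here.
df = pd.DataFrame({
    'asset_id': [1, 2],
    'shape_json': [
        '{"type": "Point", "coordinates": [-97.1, 32.7]}',
        '{"type": "Point", "coordinates": [-96.8, 32.9]}',
    ],
})

def parse_geometry(value):
    # Invalid or empty strings become None instead of raising,
    # so bad records can be inspected afterwards.
    try:
        return json.loads(value)
    except (TypeError, ValueError):
        return None

df['geometry'] = df['shape_json'].apply(parse_geometry)
```

From there, each parsed dict can be converted to an arcpy geometry with `arcpy.AsShape()` before truncating and reloading the target dataset with an `arcpy.da.InsertCursor`.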
Thank you! If the dataset being updated is in a geodatabase-enabled Microsoft SQL Server instance, I'd encourage you to use the arcpy.da functions for better geometry validation on the insert.
If your table is maintained outside of ArcGIS, then direct connections work perfectly fine and avoid any ArcGIS overhead. But if the table was created in ArcGIS, you'll want to use an arcpy cursor to work with the data. You can use direct SQL connections, but you have to take steps to ensure you're only reading the correct data from the table -- is it branch versioned, traditionally versioned, archive-enabled, etc.? -- and writing data directly is far more complex and considered unsupported behavior, so don't expect any support there.
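For example, with traditional versioning the supported way to read via SQL is the versioned view that ArcGIS maintains (named `<table>_evw` by default), not the base table. A hedged sketch, using an in-memory SQLite database as a stand-in for the real server and a hypothetical view name `parcels_evw`:

```python
# Illustrative only: query the (hypothetical) versioned view
# 'parcels_evw' rather than the base 'parcels' table. With a real
# enterprise geodatabase the engine URL would point at SQL Server.
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine('sqlite://')  # stand-in for the real DB connection

# Stand-in for the versioned view that ArcGIS would maintain.
with engine.begin() as conn:
    conn.execute(text('CREATE TABLE parcels_evw (objectid INTEGER, owner TEXT)'))
    conn.execute(text("INSERT INTO parcels_evw VALUES (1, 'Smith'), (2, 'Jones')"))

# Read-only query against the view; writes should still go through arcpy.
df = pd.read_sql_query('SELECT objectid, owner FROM parcels_evw', engine)
```

Keeping direct SQL strictly read-only, and routing all edits through arcpy, sidesteps the unsupported-write problem described above.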