Improved documentation for updateConnectionProperties(): workspace_factory

03-24-2023 11:50 AM
Status: Under Consideration
AlfredBaldenweck
MVP Regular Contributor

Please see this post for more details.

For context, I've been trying to write some code to mass-replace data sources. However, Pro's only way to do this, updateConnectionProperties(), requires you to know not just the file path, but also the type of file and the workspace it belongs to.*

This would be fine except there's no documentation for acceptable values, nor is there an easy way to find out, short of loading all of your files and checking their connection properties yourself.

What I am looking for is an exhaustive list of possible values for the workspace_factory parameter.

Here are places I have checked:

  1. Updating and fixing data sources—ArcGIS Pro | Documentation
  2. Layer—ArcGIS Pro | Documentation
  3. ArcGISProject—ArcGIS Pro | Documentation
  4. Describe object properties—ArcGIS Pro | Documentation
  5. Layer properties—ArcGIS Pro | Documentation
  6. Workspace properties—ArcGIS Pro | Documentation
  7. Parameter data types in a Python toolbox—ArcGIS Pro | Documentation
  8. WorkspaceFactory Class (ArcObjects .NET 10.8 SDK) (arcgis.com) (ArcMap)
  9. IWorkspaceFactory Interface | ArcGIS Enterprise SDK (Pro)
  10. layer.replaceDataSource() (ArcMap)
  11. Just about every python file in the arcpy folder.

At the end of all of this, I still don't have any idea of what my possible options are.

Improved documentation would be a lifesaver, considering updateConnectionProperties() already asks you to do a lot of the thinking that it should be doing for you (meaning it should be able to figure out whether you're feeding it a shapefile without you telling it).

Please update the documentation somewhere, be it at Updating and fixing data sources—ArcGIS Pro | Documentation, or at Layer—ArcGIS Pro | Documentation and ArcGISProject—ArcGIS Pro | Documentation.

Detailed documentation of my search:

1. Updating and fixing data sources—ArcGIS Pro | Documentation only gives the example of “File Geodatabase”, but it is far from exhaustive.

2. Layer—ArcGIS Pro | Documentation Looking at connectionProperties didn't go anywhere, due to the lack of a comprehensive list.

AlfredBaldenweck_0-1679683024396.png


3. ArcGISProject—ArcGIS Pro | Documentation See #2, Layer

4. Describe object properties—ArcGIS Pro | Documentation Nothing helpful in the basic properties, but put me onto Layer Properties (Describe).

5. Layer properties—ArcGIS Pro | Documentation I checked dataElementType and dataType. Neither is formatted correctly, so I would need to build a dictionary to match to the (missing) comprehensive list.

AlfredBaldenweck_1-1679683024398.png

I thought I might be able to make a list by just describing every file in a directory, then returning the set, but surprise: any coverages will crash both Pro and ArcMap (To be clear, I'm not planning on using coverages, but we do have some and they're getting in the way of trying to get this list).

AlfredBaldenweck_2-1679683024399.png

This will crash Pro and ArcMap/Catalog when it reaches a coverage. Changing to arcpy.da.Describe() for Pro doesn’t help.

6. Workspace properties—ArcGIS Pro | Documentation has a very small list of possible properties:

  • esriDataSourcesGDB.AccessWorkspaceFactory.1—Personal geodatabase
  • esriDataSourcesGDB.FileGDBWorkspaceFactory.1—File geodatabase
  • esriDataSourcesGDB.InMemoryWorkspaceFactory.1—In-memory workspace
  • esriDataSourcesGDB.MemoryWorkspaceFactory.1—Memory workspace
  • esriDataSourcesGDB.SdeWorkspaceFactory.1—Enterprise geodatabase
  • esriDataSourcesGDB.SqliteWorkspaceFactory.1—Mobile geodatabase
  • (empty string)—Other

This list is not exhaustive; shapefiles and rasters would not return anything useful, despite having an entry in connectionProperties. Also, this list is not formatted in the way I need for use in updateConnectionProperties().


7. Parameter data types in a Python toolbox—ArcGIS Pro | Documentation See #4, Layer Properties, dataElementType

8. WorkspaceFactory Class (ArcObjects .NET 10.8 SDK) (arcgis.com) (ArcMap) provides a list but not with correct formatting.

9. IWorkspaceFactory Interface | ArcGIS Enterprise SDK (Pro) provides a list but not with correct formatting.

10. replaceDataSource() (ArcMap) has a potentially useful list, although not formatted in a helpful way.

  • ACCESS_WORKSPACE — A personal geodatabase or Access workspace
  • ARCINFO_WORKSPACE — An ArcInfo coverage workspace
  • CAD_WORKSPACE —A CAD file workspace
  • EXCEL_WORKSPACE —An Excel file workspace
  • FILEGDB_WORKSPACE —A file geodatabase workspace
  • NONE —Used to skip the parameter
  • OLEDB_WORKSPACE —An OLE database workspace
  • PCCOVERAGE_WORKSPACE —A PC ARC/INFO Coverage workspace
  • RASTER_WORKSPACE —A raster workspace
  • SDE_WORKSPACE —An SDE geodatabase workspace
  • SHAPEFILE_WORKSPACE —A shapefile workspace
  • TEXT_WORKSPACE —A text file workspace
  • TIN_WORKSPACE —A TIN workspace
  • VPF_WORKSPACE —A VPF workspace

11. I checked _mp, arcobjects (every file), anything to do with toolboxes. Nothing.

*To be clear, depending on your use case, you may need less. In my case, where I'm frequently changing not just the GDB or folder, but also changing to a file with a different name, I do need this information. See "Changing a layer's dataset" here.

10 Comments
JeffMoulds

Alfred,

There are a few new things in the 3.1 Updating Data Sources help that might help with this.

1.) In the first batch of samples, look at sample #6. New at 3.1, you don't need to know anything about the source of the existing layer. You can supply None in the first parameter. E.g.

 

aprx.updateConnectionProperties(None, r'C:\Projects\YosemiteNP\New_Data\Yosemite.gdb')

 

 

2.) Check out the subsections entitled Updating data sources via the CIM and Changing a layer's dataset. Changing a layer's dataset via the CIM might be easier than using the connectionProperties dictionary in your case, and those sections might be more in line with what you are trying to accomplish. There is a trail of breadcrumbs that will lead you to a list of the workspace factory values. (But I admit that it's not the easiest thing to find. I will investigate a way to make this easier in the future.) If you follow the link to the Python CIM Access topic, then to the CIM Spec, they are listed in the CIM spec here. Search for WorkspaceFactory on that page.

Workspace factory types.

Property | Value | Description

SDE | 0 | Enterprise geodatabase.
FileGDB | 1 | File geodatabase.
Raster | 2 | Raster.
Shapefile | 3 | Shapefile.
OLEDB | 4 | OLEDB.
Access | 5 | Microsoft Access.
DelimitedTextFile | 6 | Delimited text file.
Custom | 7 | Custom.
Sql | 8 | SQL query layer.
Tin | 9 | TIN.
TrackingServer | 10 | Tracking server.
NetCDF | 11 | NetCDF.
LASDataset | 12 | LAS dataset.
SQLite | 13 | SQLite.
FeatureService | 14 | Feature service.
ArcInfo | 15 | Arc/INFO Coverage.
Cad | 16 | CAD.
Excel | 17 | Microsoft Excel.
WFS | 18 | Web feature service.
StreamService | 19 | Stream service.
BIMFile | 20 | BIM file.
InMemoryDB | 21 | In memory database.
NoSQL | 22 | NoSQL.
BigDataConnection | 23 | Big Data connection.
KnowledgeGraph | 24 | Knowledge Graph connection.
NITF | 25 | NITF connection.
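
For scripting, the values listed above can be captured as a lookup. This is only a minimal sketch, assuming the names and numbers are exactly as quoted here from the CIM spec; the class name WorkspaceFactory is an illustrative choice and not an arcpy object.

```python
from enum import IntEnum

# Names copied from the CIM spec list quoted above, in order.
# start=0 makes the numeric values match the spec (SDE = 0 ... NITF = 25).
# "WorkspaceFactory" is a hypothetical local name, not part of arcpy.
WorkspaceFactory = IntEnum(
    "WorkspaceFactory",
    [
        "SDE", "FileGDB", "Raster", "Shapefile", "OLEDB",
        "Access", "DelimitedTextFile", "Custom", "Sql",
        "Tin", "TrackingServer", "NetCDF", "LASDataset",
        "SQLite", "FeatureService", "ArcInfo", "Cad",
        "Excel", "WFS", "StreamService", "BIMFile",
        "InMemoryDB", "NoSQL", "BigDataConnection",
        "KnowledgeGraph", "NITF",
    ],
    start=0,
)

# Look up by name or by number.
print(WorkspaceFactory["FileGDB"].value)
print(WorkspaceFactory(13).name)
```

When setting a CIM data connection, the string name (for example WorkspaceFactory.FileGDB.name) is what gets assigned to workspaceFactory in the samples in this thread.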

 

AlfredBaldenweck

Hi Jeff, thanks for the response.

To your first point, it doesn't matter at all what the existing source is; all that matters is what you're trying to replace it with. (As an aside: really, updateConnectionProperties() should be able to just figure this stuff out. If Pro can do it when you go through the GUI, it should be able to do it through Python.)

I gave the CIM properties a shot, and ultimately, they haven't worked out. 1) They still require you to know what kind of file you're replacing with, and 2) I can change the layer source just fine by updating the CIM, but if I save and come back, it's broken and the source is now the name of the project folder plus the source I tried changing it to. I would say this is user error, except that it works fine in the session, so ¯\_ (ツ)_/¯.

AlfredBaldenweck_0-1681324376072.png

(Also this set up will crash Pro if you try to manually replace the sources and it looks like this.)

More importantly, the list that you give (which is comprehensive and great for CIM stuff) is formatted differently from the acceptable values in updateConnectionProperties(). For example, acceptable values there are "Shape File" and "File Geodatabase", not "Shapefile" and "FileGDB". Using either of the latter values doesn't do anything. Like, not even an error. It just skips by. 
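
To make the mismatch concrete, here is a minimal sketch of a translation table between the two spellings. Only the two pairs confirmed in this thread are included; the dictionary and function names are hypothetical, and any other workspace type raises an error instead of being skipped silently.

```python
# CIM workspaceFactory name -> workspace_factory string used in the
# connectionProperties dictionary. Only pairs confirmed in this thread;
# every other type is unknown here and deliberately left out.
CIM_TO_CONNECTION_PROPERTIES = {
    "Shapefile": "Shape File",
    "FileGDB": "File Geodatabase",
}


def to_connection_properties_name(cim_name):
    """Translate a CIM workspace factory name, failing loudly if unknown."""
    try:
        return CIM_TO_CONNECTION_PROPERTIES[cim_name]
    except KeyError:
        raise ValueError(
            f"No confirmed workspace_factory string for {cim_name!r}"
        ) from None


print(to_connection_properties_name("FileGDB"))
```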

 

JeffMoulds

Can you show me your code? For the shapefile example, are you re-sourcing something like, c:\data1\shapefile1.shp to c:\data2\shapefile2.shp?

AlfredBaldenweck

Of course, thank you.

Below is my CIM code. I've just been dropping it into the Python window to test. (I'm stuck in 2.9 for now)

For testing, I'm using a blank shapefile (No records, nothing) and a similarly blank file geodatabase feature class, trying to switch the layer's reference between them. 
I have the full range of acceptable values in wkspfactList, but this error behaviour also occurs if that list of values is just ["FileGDB", "Shapefile"].

Again, it works great as long as the Project is open and falls apart once you exit.

import arcpy  # already loaded in the Pro Python window; needed elsewhere

aprx = arcpy.mp.ArcGISProject('CURRENT')
mp = aprx.activeMap
lays = mp.listLayers()

sourceDict = {r"W:\Downloads\testmdb\test.gdb\ex1gdb": r"W:\Downloads\testmdb\example1.shp",
              r"W:\Downloads\testmdb\example1.shp": r"W:\Downloads\testmdb\test.gdb\ex1gdb"}
              
''' Update CIM workflow'''
wkspfactList = ["SDE", "FileGDB", "Raster", "Shapefile", "OLEDB",
                "Access", "DelimitedTextFile", "Custom", "Sql",
                "Tin", "TrackingServer", "NetCDF", "LASDataset",
                "SQLite", "FeatureService", "ArcInfo", "Cad", 
                "Excel", "WFS", "StreamService", "BIMFile", 
                "InMemoryDB", "NoSQL", "BigDataConnection",
                "KnowledgeGraph", "NITF"]
for lay in lays:
    if lay.isGroupLayer:
        continue
    # If anyone can tell me why sometimes layers support
    #    dataSource and sometimes they don't; that'd be awesome.  
    # Ditto for catalogPath.  
    if lay.supports("dataSource"):
        layPath = lay.dataSource
    else:
        layPath= arcpy.da.Describe(lay)["catalogPath"]
        
    if layPath in sourceDict:
        descR= arcpy.da.Describe(sourceDict[layPath])
        for wkspc in wkspfactList:
            # Get the layer's CIM definition
            layCIM = lay.getDefinition('V2')

            # Create a new CIM data connection
            dc = arcpy.cim.CreateCIMObjectFromClassName('CIMStandardDataConnection', 'V2')
            # Specify the geodatabase
            dc.workspaceConnectionString = f"DATABASE= {descR['path']}"
            
            # Specify the workspace type
            dc.workspaceFactory = wkspc

            # Specify the dataset name
            dc.dataset = descR['name']

            # Set the new data connection to the layer's CIM featureTable
            layCIM.featureTable.dataConnection = dc
                
            # Set the layer's CIM definition
            lay.setDefinition(layCIM)
            
            # Stop if you have something that works.
            if lay.isBroken:
                continue
            else:
                break

For updateConnectionProperties:
This works great, except that I don't know the acceptable values or how to guess them.

import arcpy  # already loaded in the Pro Python window; needed elsewhere

aprx = arcpy.mp.ArcGISProject('CURRENT')
mp = aprx.activeMap
lays = mp.listLayers()

sourceDict = {r"W:\Downloads\testmdb\test.gdb\ex1gdb": r"W:\Downloads\testmdb\example1.shp",
              r"W:\Downloads\testmdb\example1.shp": r"W:\Downloads\testmdb\test.gdb\ex1gdb"}

''' updateConnectionProperties workflow'''
wkspfactDict = {"DEFeatureClass": "File Geodatabase",
                "DEShapeFile": "Shape File"}
for lay in lays:
    if lay.isGroupLayer:
        continue
    # If anyone can tell me why some layers don't support dataSource
    #   that'd be great.
    if lay.supports("dataSource"):
        layPath = lay.dataSource
    else:
        layPath= arcpy.da.Describe(lay)["catalogPath"]
    if layPath in sourceDict:
        descR= arcpy.da.Describe(sourceDict[layPath])
        layCP = lay.connectionProperties
        fCP = {'dataset': descR["name"], 
               'workspace_factory': wkspfactDict[descR["dataElementType"]],
               'connection_info': 
              {'database': descR["path"]}
        }
        lay.updateConnectionProperties(layCP, fCP )
print("done")     

I hate to say it, but this entire thing was a lot easier in ArcMap; you made a list, looped until something didn't break, and there you go. Having to make an educated guess is a lot more trouble, especially since Pro apparently will just change stuff to whatever you say instead of breaking if it doesn't work.

import os

import arcpy


def replaceDS27(sourcemap, sourcedict):
    sourcemap = arcpy.mapping.MapDocument(sourcemap)
    #for lay in arcpy.mapping.ListBrokenDataSources(sourcemap):
    for lay in arcpy.mapping.ListLayers(sourcemap):
        layDS= lay.dataSource
        if layDS in sourcedict:
            layD= sourcedict[layDS]
            dirN= os.path.dirname(layD)
            baseN= os.path.splitext(os.path.basename(layD))[0]
            wkspcType = ['ACCESS_WORKSPACE',
                         'ARCINFO_WORKSPACE',
                         'CAD_WORKSPACE',
                         'EXCEL_WORKSPACE',
                         'FILEGDB_WORKSPACE',
                         'NONE',
                         'OLEDB_WORKSPACE',
                         'PCCOVERAGE_WORKSPACE',
                         'RASTER_WORKSPACE',
                         'SDE_WORKSPACE',
                         'SHAPEFILE_WORKSPACE',
                         'TEXT_WORKSPACE',
                         'TIN_WORKSPACE',
                         'VPF_WORKSPACE'
                         ]
            # Attempt to replace the source by guessing the wkspc type.
            # Really we're just brute-forcing it rather than making
            #   an educated guess.
            for wkspc in wkspcType:
                try:
                    lay.replaceDataSource(dirN, wkspc, baseN)
                    break
                except Exception:
                    continue
                        
    sourcemap.save()

I think that, in addition to a proper list of acceptable workspace factory types, some sort of documentation showing how to programmatically find what types you have would be nice, to avoid the "Throw against the wall and see what sticks" operation I've been running.* What I did for ArcMap works great but is pretty inefficient.

I've made an attempt to do this in the updateConnectionProperties() example by checking the dataElementType, but I'm not sure how viable that is outside of shapefiles and file gdb feature classes.

* The obvious question is "Why don't you know what types of data you have?" and the answer is I'm trying to do this for thousands of files, and it's easier to have the computer figure it out than it is for me. 
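
One way to let the computer make that guess without calling Describe at all is to inspect the path itself. The following is only a sketch under stated assumptions: it covers just the two workspace_factory strings confirmed in this thread ("Shape File" and "File Geodatabase"), the function name is hypothetical, and a feature class inside a feature dataset has not been tested. Anything it does not recognize returns None rather than a guess.

```python
from pathlib import PureWindowsPath


def guess_connection_properties(path):
    """Build a replacement connectionProperties-style dict from a path.

    Returns None when the type cannot be inferred from the path alone.
    """
    p = PureWindowsPath(path)

    # Shapefile: the dataset keeps its .shp extension and the
    # "database" is the containing folder.
    if p.suffix.lower() == ".shp":
        return {
            "dataset": p.name,
            "workspace_factory": "Shape File",
            "connection_info": {"database": str(p.parent)},
        }

    # File geodatabase: some ancestor folder ends in .gdb.
    for parent in p.parents:
        if parent.suffix.lower() == ".gdb":
            return {
                "dataset": p.name,
                "workspace_factory": "File Geodatabase",
                "connection_info": {"database": str(parent)},
            }

    return None


print(guess_connection_properties(r"W:\Downloads\testmdb\test.gdb\ex1gdb"))
```

The returned dictionary has the same shape as the second argument passed to lay.updateConnectionProperties() in the script above, so a None result is the signal to log the path and move on instead of re-sourcing blindly.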

JeffMoulds

Alfred,

 

Try a basic workflow first, and see if you can get it to work. Here is a sample using the CIM:

import arcpy

aprx = arcpy.mp.ArcGISProject("C:\\Temp\\Shp2FGDB.aprx")
m = aprx.listMaps('SHP')[0]
lyr = m.listLayers('ShpStates')[0]
lyrCIM = lyr.getDefinition("V3")
lyrCIM.featureTable.dataConnection.workspaceConnectionString = "DATABASE=C:\\Temp\\States.gdb"
lyrCIM.featureTable.dataConnection.workspaceFactory = "FileGDB"
lyrCIM.featureTable.dataConnection.dataset = "States"
lyr.setDefinition(lyrCIM)
aprx.saveACopy("C:\\Temp\\Output.aprx")
print('done')

 

And here is the same workflow using updateConnectionProperties and the connectionProperties dictionary.

import arcpy

aprx = arcpy.mp.ArcGISProject("C:\\Temp\\Shp2FGDB.aprx")
m = aprx.listMaps('SHP')[0]
lyr = m.listLayers('ShpStates')[0]

find_dict = {'connection_info': {'database': 'C:\\Temp'},
 'dataset': 'ShpStates.shp',
 'workspace_factory': 'Shape File'}

replace_dict = {'connection_info': {'database': 'C:\\Temp\\States.gdb'},
 'dataset': 'States',
 'workspace_factory': 'File Geodatabase'}

lyr.updateConnectionProperties(find_dict, replace_dict)
aprx.saveACopy("C:\\Temp\\Output.aprx")
print('done')

 

AlfredBaldenweck

Hi Jeff, 

I tried the CIM workflow you posted, changing only the necessary values (and the CIM version, since I'm in 2.9 for the foreseeable future.) The same thing happened with the new data source, and a colleague tested the same code with the same result.

AlfredBaldenweck_0-1682952992368.png

I think I'm going to have to submit a support ticket for this specific issue.

That being said, I do want to reiterate the need for a comprehensive list of acceptable values for updateConnectionProperties(), as well as a way to programmatically determine which of those values is appropriate for your data.

JeffMoulds

Is your File GDB feature class residing in a feature data set? If so, there might be another line you need to add to the CIM script. 

AlfredBaldenweck

Nope, all out in the open. The shapefiles I've been testing with are in the same folder (testmdb) as well. I tested converting between the two gdb feature classes and converting between the two shapefiles, using version 2.9.5.

AlfredBaldenweck_1-1682967024622.png

 

 

mj_gis

I also need a comprehensive list of the workspace connection properties. I need to loop through folders containing lyrx files that are sourced from all different types of data, and print out the location of the data (i.e., if map service, I need to print a URL; if raster, I need to print the SDE properties; if shapefile, I need to print the folder and dataset name). A comprehensive list of workspace connection properties would speed up my process enormously.
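
Until such a list exists, one workaround is to print whatever keys a layer's connectionProperties dictionary happens to contain, without assuming which ones are present. This is a minimal sketch with a hypothetical helper name; the example dictionary is the shapefile one posted earlier in this thread, not output from a real layer file.

```python
def flatten_connection_properties(props, prefix=""):
    """Yield (dotted_key, value) pairs from a nested dictionary."""
    for key, value in props.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as connection_info.
            yield from flatten_connection_properties(value, full_key)
        else:
            yield full_key, value


# Example input: the shapefile dictionary from earlier in the thread.
example = {
    "connection_info": {"database": "C:\\Temp"},
    "dataset": "ShpStates.shp",
    "workspace_factory": "Shape File",
}

for key, value in flatten_connection_properties(example):
    print(f"{key}: {value}")
```

In a real run, the input would presumably come from each layer's connectionProperties in the layer file, and collecting the distinct workspace_factory values seen across all files would build up the missing list empirically.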

JeffMoulds
Status changed to: Under Consideration