Need python script to copy SDE to GDB

3627
10
07-25-2022 04:12 PM
Labels (3)
PhilLampe
New Contributor III

Hello,

First off, please forgive my unintentional misuse of the terminologies going forward.  I'm not versed in SDE and may have stated some things incorrectly.

So from the 50,000 foot view my overall intent is to go through this SDE database (that I just got connected to) and find out how many records there are.  This SDE has numerous datasets, and inside those datasets are numerous feature classes.  The feature classes utilize relates and some of the fields are tied to coded value domains.  At the root of the SDE connection are tables that are linked to the relates in the feature classes as well as the coded domains (I believe).

The goal here is to read the contents of the SDE and generate a text (CSV or XLS) file with it's contents.  All I need is 3-columns (Dataset name = Col-A, Feature Class name = Col-B, and total feature count = Col-C).

Now, I HAVE a script in hand that can do ALL of the above, EXCEPT (big EXCEPT) it will not work when trying to read from an SDE connection.  It MUST be able to point to a GDB.

Hence the reason I'm here now.

So, the IMMEDIATE goal here is to find a python script that can take my version instance of SDE and make a duplicate copy of the entire SDE database into a GDB database.

So... is there a python script that can just do the above listed in red?

Or, does anyone know of a python script that can do BOTH listed in red and blue?

If there's more information you need before you can provide an answer, please respond and I'll reply ASAP.

TIA!

Phil

0 Kudos
10 Replies
JohannesLindner
MVP Frequent Contributor

You should be able to copy from SDE to FGDB using the XML Workspace Document:

Export XML Workspace Document (Data Management)—ArcGIS Pro | Documentation

Import XML Workspace Document (Data Management)—ArcGIS Pro | Documentation

 

If this doesn't work, there is a script here: arcgis 10.1 - Copying ArcSDE geodatabase to file geodatabase using ArcPy? - Geographic Information S...

 

I'm interested in your script (and why it doesn't work for SDEs). Could you post it, please?


Have a great day!
Johannes
PhilLampe
New Contributor III

Johannes,

Thanks again for helping me with this. Code is below.

Please keep in mind that this was not written by me (I'm not trying to pass blame or anything, just being honest).  I imagine you may find one (or many) flaw(s) but that doesn't mean I don't want to learn or improve my python skills, just want to let you know where I'm coming from.

And just to be clear, this current script works fine as intended.  That is to say, when done I have an excel file with the output from the GDB that was read by the script.  It just can't read from the Oracle enterprise database, hence the reason we are here now.

In sum, the overall goal here is to add the additional code to this script so it can connect then iterate from the Oracle enterprise database and generate a table with feature counts from the feature classes from within the datasets.

Looking forward to your reply.

Phil

import arcpy
import time
import datetime
import os
import shutil
import Tkinter, tkFileDialog

initdir = "c:\\geobase\\"
print "Please select which file you want run the report on:"
#wait = os.system("pause")
root = Tkinter.Tk()
root.withdraw()

file_path = tkFileDialog.askdirectory(initialdir=initdir)
t=1
while t < 5 and file_path.split("/")[-1][-4:] != ".gdb":
    if t != 4:
        print "No file selected or GDB not selected.  Try again."
        file_path = tkFileDialog.askdirectory(initialdir=initdir)
        t += 1
    else:
        print "No file selected or GDB not selected. Quiting Script"
        wait = os.system("pause")
        quit()

print "Generating Report from file: " + file_path

file_name = file_path.split("/")[-1]

def f_time_hr (seconds):
    if seconds < 60:
        output = str(round(seconds)) + " seconds."
        return output
    if seconds > 60 and seconds < 3600:
        minutes = seconds/60
        output = str(round(minutes)) + " minutes."
        return output
    if seconds > 3600:
        hours= seconds/3600
        minutes=int(round(hours-int(hours),2)*60)
        hours=int(hours)
        output = str(hours) + "hrs " + str(minutes) + " minutes"
        return output


arcpy.env.workspace = file_path

#  the following table has to be created first before script can run
#   see the attribute fields listed at the bottom of the script,
RT_Path = "in_memory"
RT_Name = "GIS_PR_Attributes"
Results_Table = "in_memory\\GIS_PR_Attributes"
Results_Table_Fields = ["Dataset", "Entity", "Combine", "FeatureCount","ISSOLDPROPERTY", "PRUID","FacilityNumber","FacilityID","BuildingNumber","StructureNumber","CategoryCode","True_PR_Asset","GEO_Comment"]

if arcpy.Exists(Results_Table):
    arcpy.Delete_management(Results_Table)

Results_Table_Fields = [["Dataset", "TEXT" ,40], ["Entity", "TEXT", 40], ["Combine", "TEXT", 1], ["FeatureCount","DOUBLE"],["ISSOLDPROPERTY", "TEXT", 1], ["PRUID", "TEXT", 1],["FacilityNumber", "TEXT", 1],["FacilityID", "TEXT", 1],["BuildingNumber", "TEXT", 1],["StructureNumber", "TEXT", 1],["CategoryCode", "TEXT", 1],["True_PR_Asset", "TEXT", 1],["GEO_Comment", "TEXT", 255]]
RT_Table = arcpy.CreateTable_management(RT_Path, RT_Name)
for fld in Results_Table_Fields:
    if fld[1] == "TEXT":
        arcpy.AddField_management(RT_Table, fld[0], "TEXT", field_length=fld[2])
    else:
        arcpy.AddField_management(RT_Table, fld[0], fld[1])

datasets = arcpy.ListDatasets("*", "Feature")
datasets = [''] + datasets if datasets is not None else []

if datasets == [''] or datasets == []:
    print "The file you selected does not seem to contain any data.  Please try again."
    wait = os.system("pause")
    quit()

print "Counting Feature Classes in GDB"
n=0
for dataset in datasets:
    for entity in arcpy.ListFeatureClasses(feature_dataset=dataset):
        n=n+1
        output = "\rCounting: " + str(n)
        sys.stdout.write(output)
        sys.stdout.flush()

print "\nThere are " + str(n) + " feature classes"

print "Starting to look through Feature Classes"
avg_time_taken = 10

#  the following are the attribute fields in the GIS database table
#  the first two need to be text/string 50 char and the next should be text/string 1 char
r=n
x=1
fieldsdict = {}
start_time = time.time()
for dataset in datasets:
    fieldsdict[dataset] = {}
    fcs = arcpy.ListFeatureClasses(feature_dataset=dataset)
    for entity in fcs:
        remaining_time = f_time_hr((avg_time_taken * r))  
        output = "\rWorking on " + str(x) + " out of " + str(n) + "   Current: " + dataset + "\\" + entity + "\t\tETC: " + remaining_time + "\t\t\t\t"
        sys.stdout.write(output)
        sys.stdout.flush()
        fcDesc = arcpy.Describe(entity)
        fields = [f.name for f in arcpy.ListFields(entity)]
        PRUID = "Y" if "PRUID" in fields else "N"
        FACILITYNUMBER = "Y" if "FACILITYNUMBER" in fields else "N"
        FACILITYID = "Y" if "FACILITYID" in fields else "N"
        BUILDINGNUMBER = "Y" if "BUILDINGNUMBER" in fields else "N"
        STRUCTURENUMBER = "Y" if "STRUCTURENUMBER" in fields else "N"
        CATEGORYCODE = "Y" if "CATEGORYCODE" in fields else "N"
        ISSOLDPROPERTY = "Y" if "ISSOLDPROPERTY" in fields else "N"
        fccount = 0
        with arcpy.da.SearchCursor(entity, "OBJECTID") as FeatureClass:
            for fcRow in FeatureClass:
                fccount = fccount + 1
        FeatureCount = fccount
        fieldsdict[dataset][entity] = [FeatureCount, PRUID, FACILITYNUMBER, FACILITYID, BUILDINGNUMBER, STRUCTURENUMBER, CATEGORYCODE, ISSOLDPROPERTY] 
        time_taken = (time.time() - start_time)
        avg_time_taken = (time_taken/x)
        r=r-1
        x=x+1

cursor = arcpy.da.InsertCursor(Results_Table, ["Dataset", "Entity", "FeatureCount", "PRUID", "FACILITYNUMBER","FACILITYID", "BUILDINGNUMBER","STRUCTURENUMBER","CATEGORYCODE","ISSOLDPROPERTY"])
for dataset in fieldsdict:
    for entity in fieldsdict[dataset]:
        cursor.insertRow([dataset, entity] + fieldsdict[dataset][entity])

del cursor

date = datetime.datetime.today()
datet = date.strftime('%Y-%m-%d-%H%M')

fType = ".xls"
excel_file = tkFileDialog.asksaveasfilename(defaultextension=fType,initialdir=initdir, filetypes = [('Excel', '*' + fType)])
x=1
while os.path.exists(excel_file):
    print "File already exist, please select another name"
    excel_file = tkFileDialog.asksaveasfilename(defaultextension=fType,initialdir=initdir, filetypes = [('Excel', '*' + fType)]) 

arcpy.TableToExcel_conversion(Results_Table,excel_file)


print "\nDone"
wait = os.system("pause")
Tags (3)
0 Kudos
VinceAngelo
Esri Esteemed Contributor

There's a lot of fluff unrelated to the problem in this script, but there doesn't seem to be any reason you couldn't tear off the UI prompt that is hard-coded to picking  a directory (.gdb folder) and specify a connection file path instead (.sde)

You certainly don't need to copy the data to a file geodatabase to make this work. It might even just be changes to the 'askdirectory' UI calls.

- V

0 Kudos
PhilLampe
New Contributor III

How would you hard code it?  Can you please include the snippet of code that would replace the UI prompt section with hard code to the database directly?

I found this path to the Oracle instance in ArcCatalog (see JPG) which is C:\Users\Phil.Lampe\AppData\Roaming\ESRI\Desktop10.6\ArcCatalog\Lampe_EDITOR_40.sde

-Phil

 

0 Kudos
VinceAngelo
Esri Esteemed Contributor

I haven't done any Tk with Python.  I use Python toolboxes, or just plain scripts.

The key is changing the arcpy.env.workspace to use the .sde file instead of file_path (or overwrite file_path with the connection file string).

- V

0 Kudos
PhilLampe
New Contributor III

Hi Johannes - did you have a chance to look at the code I posted?  As for the XML suggestion, 1) I don't have ArcPro and 2) I've heard that XML works ok for small GDBs but can get unwieldy with larger GDBs so I won't be pursuing that method.

Thanks for your time looking into this!

-Phil

0 Kudos
PhilLampe
New Contributor III

Hi Johannes - I wanted to let you know that after hard coding the SDE source and destinations for the GDB copy the above script ran perfectly.  Took about an hour and 30 minutes.  I ended up using the modified 2nd version.  So am going to accept this as the solution.  I actually tried this the same day you shared it, but couldn't get it to work because I was using the wrong source.

-Phil

0 Kudos
VinceAngelo
Esri Esteemed Contributor

Making a copy of the enterprise geodatabase to FGDB is the long way around the lake to get an inventory of the contents. And any problem which prevents access to the enterprise geodatabase to make an inventory would prevent data copy as well.  This sure seems to be an XY Problem (a question asking about a flawed solution instead of the root problem).

Since the script should be under ten lines, you really ought to post it.

From a terminology standpoint, it might help to realize there is no longer any such thing as an "SDE database" (and hasn't been since 8.0 was released). There are Oracle, SQL Server, PostgreSQL, Informix and DB2 enterprise databases, but no SDE. Each of the supported RDBMSes are accessed in the same way, with ListDatasets, ListFeatureClasses and ListTables, and GetCount.

- V

 

PhilLampe
New Contributor III

Vince,

Thanks for your assistance with this. Code is above in Johannes' reply.  And thanks for letting me know the proper way to post code.

And YES, you might be on to something with your XY Problem comment.  Perhaps when you read what I wrote above to Johannes it will shed some more light and may further support your theory that I'm going about this in the wrong way.  In which case... I'm all ears!

-Phil

0 Kudos