Now that geopackages using sqlite have been publicised on the front cover of ArcNews, I am wondering how you are getting on testing out this new format.
Of course it will be more fully supported in 10.3, but you can try out a lot of features at 10.2 already. I wanted to test performance on medium datasets, which I count as between 1 million and 10 million records. This size performs very badly with the current geoprocessing tools, and larger tables can be broken down and partitioned. I note that the cartographic team considers 50,000 records to be the basic target for partitioning, using the very useful grid tool to split a large set of points into equal counts.

Sqlite is the base database engine. Spatialite extends it with spatial columns, and the OGC geopackage specification then defines conventions on top of spatialite for data exchange, vendor extensions, projections and so on. It is expected to replace shapefiles for data exchange.
Here is what I have tried:
I haven't tried yet:
The good parts
Sqlite has a very wide, but hidden, support base because it is used in every mobile phone. So you are already using it! It is being used by ArcGIS Mobile apps to manage disconnected editing of a geodatabase.
Python code that manipulates sqlite databases is compact and powerful, because the real work is done in SQL expressions, as the sketch below shows.
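As a minimal sketch (the database file, table and column names here are made up for illustration), a few lines are enough to create, load and query a table:

import sqlite3

conn = sqlite3.connect('demo.sqlite')   # hypothetical database file
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS parcel (id INTEGER, status TEXT(4))")
cur.execute("INSERT INTO parcel VALUES (?, ?)", (1, 'CURR'))
conn.commit()
for row in cur.execute("SELECT id, status FROM parcel WHERE status = 'CURR'"):
    print row
conn.close()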
Loading large CSV tables into a database is blindingly fast compared to the alternatives. I can load a set of tables totalling 50 million records and 4.5 GB in 18 minutes instead of 9 hours.
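My actual loader is not shown here, but a sketch of the approach, assuming a hypothetical headerless data.csv whose three columns match the target table, looks like this:

import csv
import sqlite3

conn = sqlite3.connect('bulk.sqlite')   # hypothetical target database
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS obs (id INTEGER, name TEXT(50), value REAL)")
with open('data.csv', 'rb') as f:       # hypothetical source file
    reader = csv.reader(f)
    # executemany streams the whole file through one prepared statement,
    # which is where the speed comes from
    cur.executemany("INSERT INTO obs VALUES (?, ?, ?)", reader)
conn.commit()
conn.close()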
It is easy to exchange tables between sqlite and filegeodatabases in ArcCatalog.
Complex SQL expressions with multiple joins across multiple databases are very, very fast and don't need indexes. I can extract 1 million records from a series of tables containing 18 million records, with 3 joins and two conditional tests, in less than 100 seconds without indexes.
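As an illustrative sketch (the database and table names are hypothetical), ATTACH DATABASE is what makes cross-database joins this painless:

import sqlite3

conn = sqlite3.connect('parcels.sqlite')               # hypothetical main database
conn.execute("ATTACH DATABASE 'titles.sqlite' AS t")   # second database joins in directly
sql = """SELECT p.id, p.status, o.name
         FROM parcel AS p
         JOIN t.title AS ti ON ti.parcel_id = p.id
         JOIN t.owner AS o ON o.title_id = ti.id
         WHERE p.status = 'CURR' AND o.name IS NOT NULL"""
for row in conn.execute(sql):
    print row
conn.close()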
Since it is entirely open and already well supported, it is a ready-made exchange format with direct access for editing (although it is read-only in ArcGIS at 10.2, you can use python to edit).
The missing parts
There are quite a few tricks to learn to sidestep awkward features in sqlite.
While the text datatype is unlimited in width in sqlite, it is essential to define a width when creating a table if it is going to translate to a filegeodatabase.
The width is ignored by spatialite, but it is kept as metadata and used when translating to other formats where a width is required. If you don't define a width, an index in a filegeodatabase will fail because the width defaults to a huge value (18,000 characters).
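A sketch of what I mean, with a hypothetical table; the declared width costs nothing in sqlite but saves the translation later:

import sqlite3

conn = sqlite3.connect('widths.sqlite')   # hypothetical database
# sqlite itself ignores the (50) and (4), but the declared widths are kept
# as metadata and honoured when the table is copied to a filegeodatabase
conn.execute("CREATE TABLE road (road_name TEXT(50), status TEXT(4))")
conn.commit()
conn.close()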
I am still getting used to the interaction between the internal rowid and primary keys. Do not define your own key column as the primary key, or it will disappear into the OID. Instead, define a separate autoincrement column to become the OID, so that you can keep your static primary key as an ordinary column and edit it as you wish.
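Something like this sketch (table and column names hypothetical): the autoincrement column is surrendered to the OID, and the real key stays editable:

import sqlite3

conn = sqlite3.connect('keys.sqlite')   # hypothetical database
# OBJECTID takes the INTEGER PRIMARY KEY AUTOINCREMENT slot (the rowid/OID),
# leaving parcel_id as an ordinary indexed column you can edit freely
conn.execute("""CREATE TABLE parcel (
                    OBJECTID INTEGER PRIMARY KEY AUTOINCREMENT,
                    parcel_id INTEGER,
                    status TEXT(4))""")
conn.execute("CREATE UNIQUE INDEX ix_parcel_id ON parcel (parcel_id)")
conn.commit()
conn.close()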
I have taken the advice in the O'Reilly book 'Using SQLite' not to use the CREATE TABLE ... AS SELECT construct, because the field widths are not carried across. Instead, make a template from the source table and use INSERT INTO, as in the sketch below.
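Continuing the hypothetical road table from the sketch above, the template-plus-INSERT pattern looks like this:

import sqlite3

conn = sqlite3.connect('widths.sqlite')   # same hypothetical database as above
# a hand-written template repeats the declared widths;
# CREATE TABLE ... AS SELECT would silently lose them
conn.execute("CREATE TABLE road_current (road_name TEXT(50), status TEXT(4))")
conn.execute("INSERT INTO road_current SELECT road_name, status FROM road WHERE status = 'CURR'")
conn.commit()
conn.close()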
Performance using the execute functions in python always fails for large tables, yet the same sql runs just fine from a command line or development tool. At first I did not know why, or how to work around it, and it made it hard to automate queries on large tables. (The culprit turned out to be the out-of-date sqlite3 module, discussed below.)
Adding geometry
You can add geometry by extending sqlite with spatial extensions compatible with spatialite 4.x, using a hidden Arcpy function.
The file extension is .gpkg, but .sqlite also works; thank goodness, because my fingers cannot spell gpkg.
#-------------------------------------------------------------------------------
# Name:        createSqliteDB.py
# Purpose:     now use gpkg as extension
#
# Author:      kimo
#
# Created:     19/09/2014  29/10/2014
# Copyright:   (c) kimo 2014
# Licence:     Creative Commons 3.0 NZ
#-------------------------------------------------------------------------------
import arcpy
import sys
import os

try:
    path = sys.argv[1]
    db = sys.argv[2]
    geom = sys.argv[3]
except:
    path = 'e:/crs/currentgdb'
    db = 'example.gpkg'
    geom = 'ST_GEOMETRY'

arcpy.AddMessage(path + " " + db)
# geomOption = ['ST_GEOMETRY', 'SPATIALITE']
sqlite_database_path = '{}/{}'.format(path, db)
if not arcpy.Exists(sqlite_database_path):
    arcpy.gp.CreateSQLiteDatabase(sqlite_database_path, geom)
else:
    arcpy.AddWarning("db already exists")
    print "db already exists"
This is great, because now you can use the sqlite database as if it were a filegeodatabase. The normal da.InsertCursor works just fine to insert new records, including a geometry column. Unfortunately there is a bug in the da.UpdateCursor that crashes Python if you try to update a geometry column (10.3 Prerelease), but it looks very promising. Note that there is a shortcut for creating a geometry directly from a WKT string, using a token on the da.InsertCursor. It also took me a while to realise that I cannot just call the geometry column 'SHAPE', as reported by ListFields; I have to use a token with an @ in it. No need to fiddle about, and it is fast.
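Stripped to its essentials, the WKT shortcut looks like this sketch (the featureclass path and the ROAD_NAME field are hypothetical):

import arcpy

# hypothetical featureclass inside a geopackage
fc = 'e:/temp/example.gpkg/roads'
with arcpy.da.InsertCursor(fc, ['ROAD_NAME', 'SHAPE@WKT']) as icur:
    # the SHAPE@WKT token converts the WKT string into a geometry on the fly
    icur.insertRow(['Main Road', 'LINESTRING (174.1 -41.3, 174.2 -41.4)'])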
#-------------------------------------------------------------------------------
# Name:        loadfc_gpkg_demo.py
# Purpose:     demonstrate loading a CSV fc into spatialite
#
# Author:      kimo
#
# Created:     29/10/2014
# Copyright:   (c) kimo 2014
# Licence:     Ollivier & Company
#-------------------------------------------------------------------------------
# changed shapefield token to SHAPE@WKT for da.InsertCursor 24 Oct 2014
import arcpy
import sys
import os
import csv
import datetime
import sqlite3

def update_existing(rcl):
    '''use the same changeset to update itself to test out the UpdateCursor
    on a geopackage; read in the entire changeset first and build row lists
    before opening cursors to limit the number of records;
    much simpler to delete updates and re-insert'''
    desc = arcpy.Describe(rcl)
    print "target fc:", rcl, desc.shapeType, desc.spatialReference.name, desc.featureType
    # create a buffer list without objectid and shape_length and rename shape field to a token
    buf = [f.name.upper() for f in desc.fields]
    print "Desc fields in target:", buf
    buf.remove('OBJECTID')
    # buf.remove('SHAPE_LENGTH') there isn't one?
    # shape field in cursor name MUST be one of the special tokens so use SHAPE@WKT
    buf[buf.index("SHAPE")] = 'SHAPE@WKT'
    dBuf = {f.upper(): buf.index(f) for f in buf}
    print "dBuf:", dBuf
    # delete, update or insert records into the featureclass
    # just testing change = 'UPDATE' or 'DELETE'
    debug = True
    # split changeset into action arrays ready to go
    delete_rows = []
    insert_rows = []
    update_rows = []
    with open(csv_file, 'r') as f:
        reader = csv.reader(f)
        n = 0
        for row in reader:
            if reader.line_num == 1:
                print "header:", len(row), row
                # get rid of double underscores in __change__
                rowCSV = [r.upper().replace('__', '') for r in row]
                print "CSV header:", rowCSV
                rowCSV[rowCSV.index("SHAPE")] = 'SHAPE@WKT'
                dCSV = {r.upper(): rowCSV.index(r) for r in rowCSV}
                print "dCSV:", dCSV
                for f in dBuf.keys():
                    try:
                        print "Field: {} Buf {}, CSV {}".format(f, dBuf[f], dCSV[f])
                    except KeyError:
                        print "Field name dictionaries do not match", f
                        sys.exit(1)
                print
                print '-------------------- BEGIN --------------------------'
            else:
                n += 1
                # fieldmapping by name dictionaries; initialise a new buffer
                feat = [None] * len(buf)
                for f in buf:
                    # iterate over field names
                    # just poke shape straight into the buffer field as a WKT string!
                    if row[dCSV[f]]:  # populate if not an empty data element, otherwise leave as null
                        if f == 'FID':
                            feat[dBuf[f]] = int(row[dCSV[f]].split('.')[-1])
                        else:
                            feat[dBuf[f]] = row[dCSV[f]]
                if row[dCSV['CHANGE']] == 'DELETE':
                    delete_rows.append([row[dCSV['ID']], feat])
                if row[dCSV['CHANGE']] == 'UPDATE':
                    update_rows.append([row[dCSV['ID']], feat])
                if row[dCSV['CHANGE']] == 'INSERT':
                    insert_rows.append([row[dCSV['ID']], feat])
    print "{} deletes {} updates {} inserts".format(len(delete_rows), len(update_rows), len(insert_rows))
    # apply deletes (disabled: the UpdateCursor crashes on a geopackage at 10.3 Prerelease)
    sql_query = "ID in " + str(tuple(row[0] for row in delete_rows))
    print sql_query
##    with arcpy.da.UpdateCursor(rcl, '*', sql_query) as cur:  # shape called SHAPE@ or SHAPE@WKT
##        for row in cur:
##            cur.deleteRow(row)
    deltaTime = datetime.datetime.now() - start
    print n, "Well Done", deltaTime
    return

def create_new(rcl):
    '''create a new featureclass from scratch'''
    # arcpy.management.CreateFeatureclass(ws, rcl, template='rcl_changeset_template', spatial_reference=sr)
    arcpy.management.CreateFeatureclass(ws, rcl, 'POLYLINE', spatial_reference=sr)
    arcpy.management.AddField(rcl, 'FID', 'LONG')  # actually not a long but a complex string
    arcpy.management.AddField(rcl, 'CHANGE', 'TEXT', field_length=16)
    arcpy.management.AddField(rcl, 'ID', 'LONG')
    arcpy.management.AddField(rcl, 'ALT_ID', 'LONG')
    arcpy.management.AddField(rcl, 'STATUS', 'TEXT', field_length=4)
    arcpy.management.AddField(rcl, 'NON_CADASTRAL_RD', 'TEXT', field_length=1)
    arcpy.management.AddField(rcl, 'AUDIT_ID', 'LONG')
    arcpy.management.AddField(rcl, 'SE_ROW_ID', 'LONG')
    desc = arcpy.Describe(rcl)
    print "target fc:", rcl, desc.shapeType, desc.spatialReference.name, desc.featureType
    # create a buffer list without objectid and shape_length and rename shape field to a token
    buf = [f.name.upper() for f in desc.fields]
    print "Desc fields in target:", buf
    buf.remove('OBJECTID')
    # buf.remove('SHAPE_LENGTH')
    # shape field in cursor name MUST be one of the special tokens so use SHAPE@WKT
    buf[buf.index("SHAPE")] = 'SHAPE@WKT'
    dBuf = {f.upper(): buf.index(f) for f in buf}
    print "dBuf:", dBuf
    # insert records into the featureclass
    # maybe for speed we could use arcpy.da.Editor for bulk commits?
    debug = False
    icur = arcpy.da.InsertCursor(rcl, buf)  # shape called SHAPE@ or SHAPE@WKT
    with open(csv_file, 'r') as f:
        reader = csv.reader(f)
        n = 0
        for row in reader:
            if reader.line_num == 1:
                print "header:", len(row), row
                # get rid of double underscores in __change__
                rowCSV = [r.upper().replace('__', '') for r in row]
                print "CSV header:", rowCSV
                rowCSV[rowCSV.index("SHAPE")] = 'SHAPE@WKT'
                dCSV = {r.upper(): rowCSV.index(r) for r in rowCSV}
                print "dCSV:", dCSV
                for f in dBuf.keys():
                    try:
                        print "Field: {} Buf {}, CSV {}".format(f, dBuf[f], dCSV[f])
                    except KeyError:
                        print "Field name dictionaries do not match", f
                        sys.exit(1)
                print
                print '-------------------- BEGIN --------------------------'
            else:
                n += 1
                # fieldmapping by name dictionaries; initialise a new buffer
                feat = [None] * len(buf)
                for f in buf:
                    # iterate over field names
                    # just poke shape straight into the buffer field as a WKT string!
                    if row[dCSV[f]]:  # populate if not an empty data element, otherwise leave as null
                        if f == 'FID':
                            feat[dBuf[f]] = int(row[dCSV[f]].split('.')[-1])
                        else:
                            feat[dBuf[f]] = row[dCSV[f]]
                icur.insertRow(feat)
    del icur
    deltaTime = datetime.datetime.now() - start
    print n, "Well Done", deltaTime
    return

# ------------------------------------ main ---------------------------
try:
    csv_file = sys.argv[1]
except:
    csv_file = 'e:/crs/source/future/script/layer-2023-changeset.csv'
start = datetime.datetime.now()
# rcl changeset
if not arcpy.Exists('e:/crs/source/future/future.gpkg'):
    arcpy.gp.CreateSQLiteDatabase('e:/crs/source/future/future.gpkg')
ws = 'e:/crs/source/future/future.gpkg'
arcpy.env.workspace = ws
arcpy.env.overwriteOutput = True
rcl = 'rcl_changeset'
sr = arcpy.SpatialReference(4167)  # 4167 NZGD_2000
# sample is DD not NZTM so cannot use template NZTM projection
if arcpy.Exists(rcl):
    update_existing(rcl)
else:
    create_new(rcl)
# arcpy.management.Delete(rcl)
So what to do about the failed UpdateCursor if I want to do some editing?
A search of the options for open source python tools suggests that a module called pyspatialite will provide an extension to the sqlite3 python module. This works well on Linux, and it comes built into tools such as OSGeo4W and QGIS. There is no stand-alone windows executable I could find for pyspatialite to install in Python27/ArcGIS10.2. However, there is a tricky way you can do this without trying to recompile the package from source on Windows.
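Another possible route, which I have not verified against ArcGIS's python, is to load the spatialite extension directly through the stock sqlite3 module. This sketch assumes your sqlite3 build was compiled with extension loading enabled (many Windows builds are not) and that a spatialite DLL such as mod_spatialite is on your PATH:

import sqlite3

conn = sqlite3.connect('example.sqlite')   # hypothetical database
conn.enable_load_extension(True)           # fails if your sqlite3 build forbids extension loading
conn.load_extension('mod_spatialite')      # assumes a spatialite DLL is on your PATH
print conn.execute("SELECT spatialite_version()").fetchone()[0]
conn.close()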
Test your new module with a script from SpatiaLite / Python
Out of date sqlite3 in Python 2.7
This is a major problem that you can easily fix yourself as a developer, but it is a bit harder to distribute the fix.
I found excellent performance for stand-alone SQL commands run in sqlite directly, but the same commands did not finish when run from python.
It turns out that sqlite has been significantly upgraded to fix these performance issues and is now up to version 3.8.6.
To upgrade the version embedded in python, just replace the sqlite3.dll that resides in c:/python27/ArcGIS10.2/DLLs with the latest one. When you query the version in python you get two answers: one for the python sqlite3 interface module, and one for the underlying sqlite library itself.
>>> import sqlite3
>>> sqlite3.version_info
(2, 6, 0)
>>> sqlite3.version
'2.6.0'
>>> sqlite3.sqlite_version
'3.8.6'
>>>
Do the same for the 64bit version if you have installed that.
Also, to upgrade an existing sqlite database to a geopackage, you can do it from the command line:
sqlite3 my.sqlite
SELECT load_extension('C:\ArcGIS\Desktop10.2\DatabaseSupport\SQLite\Windows32\stgeometry_sqlite.dll','SDE_SQL_funcs_init');
SELECT CreateGpkgTables();
.quit