POST
|
Hi, please log a support call, apologies for the delay.
04-02-2024
07:04 AM
|
0
|
0
|
268
|
IDEA
|
Related, an old sample: https://pm.maps.arcgis.com/home/item.html?id=9398bd2232cb4c8490b0b05015364d28
04-01-2024
09:20 AM
|
0
|
0
|
336
|
POST
|
For followers: the Data Interop product team has recommended testing with the service account owner's credentials embedded in the SQL Server reader.
02-23-2024
01:07 PM
|
0
|
1
|
674
|
BLOG
|
Hello Annarita, since you mention a "workbench" I'll not wait for Jill to talk about replication and collaboration scenarios. You can do bi-directional synchronization between ArcGIS Online and an enterprise geodatabase with Data Interoperability, but you will need to decide which system is parent and which is child when the same feature is edited in both systems. Maybe one system is where features are created and deleted but features can be edited in either, you make the rules. If you want to prefer one system's edits then run a workspace that commits its edits to the child first, then a second workspace to fetch edits from the child. If you do not have a shared key field then you can only generate inserts and deletes, not updates. I'm thinking here of using the ChangeDetector transformer to generate the deltas.
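To illustrate why a shared key matters, here is a minimal plain-Python sketch of delta generation. It is not the ChangeDetector transformer; the function name `compute_delta`, the `id` key and the sample rows are assumptions made up for this example.

```python
def _freeze(row):
    # Hashable form of a row so whole rows can be compared in sets
    return tuple(sorted(row.items()))


def compute_delta(parent_rows, child_rows, key=None):
    """Return (inserts, updates, deletes) needed to make the child match the parent."""
    if key is None:
        # No shared key: rows can only be matched on their whole content,
        # so an edited row appears as one delete plus one insert.
        parent_set = {_freeze(r) for r in parent_rows}
        child_set = {_freeze(r) for r in child_rows}
        inserts = [r for r in parent_rows if _freeze(r) not in child_set]
        deletes = [r for r in child_rows if _freeze(r) not in parent_set]
        return inserts, [], deletes
    parent = {r[key]: r for r in parent_rows}
    child = {r[key]: r for r in child_rows}
    inserts = [r for k, r in parent.items() if k not in child]
    updates = [r for k, r in parent.items() if k in child and r != child[k]]
    deletes = [r for k, r in child.items() if k not in parent]
    return inserts, updates, deletes


# Hypothetical sample data
parent = [{"id": 1, "name": "A"}, {"id": 2, "name": "B2"}, {"id": 4, "name": "D"}]
child = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}, {"id": 3, "name": "C"}]

keyed = compute_delta(parent, child, key="id")
unkeyed = compute_delta(parent, child)
```

With the key, the edit to feature 2 is a single update; without it, the same edit can only be expressed as deleting the old row and inserting the new one.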
02-20-2024
07:33 AM
|
1
|
0
|
1302
|
POST
|
Hello Jeremy, apologies for the delay. Quick Export doesn't know about definition queries. Attached is a Pro 3.2 toolbox with a spatial ETL tool in it that exports your queried features to MBTiles, zoom levels 4-8.
02-16-2024
10:53 AM
|
0
|
0
|
834
|
BLOG
|
@ShareUser I suppose simple datasets in object stores could be used that way, but on the vector data front I'm personally interested in how AI could be used to generate SQL that gives you the view of GeoParquet that you want to consume, in a smart enough client.
02-14-2024
06:29 AM
|
1
|
0
|
1340
|
BLOG
|
@ShareUser Now that is exactly the right question! GeoParquet certainly has some very attractive properties, such as a rich data type set (including arrays and structs so you can have an implied data model in one table), queryable metadata, support in the S3 API across the major object stores etc. This makes it a candidate to replace the venerable shapefile, if the user community chooses.
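To make the "implied data model in one table" point concrete, here is a small plain-Python sketch. Arrays of structs behave much like lists of dictionaries, and expanding them yields the rows of a 1:M related table. The field names and values are invented for illustration.

```python
def explode(rows, parent_key, array_field):
    """Yield one flat row per struct in array_field, carrying the parent key along."""
    for row in rows:
        # A null or missing array contributes no related rows
        for item in row.get(array_field) or []:
            yield {parent_key: row[parent_key], **item}


# Hypothetical rows: each place holds an array of address structs
places = [
    {"id": "a", "addresses": [{"locality": "London", "postcode": "WC2E"},
                              {"locality": "Camden", "postcode": "NW1"}]},
    {"id": "b", "addresses": None},
]

address_rows = list(explode(places, "id", "addresses"))
```

Each expanded row inherits the parent `id`, which is what lets a client rebuild a related table from a single file.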
02-13-2024
11:18 AM
|
0
|
0
|
1397
|
BLOG
|
@ShareUser A feature service with behaviors like relationship classes is easily digestible in ArcGIS Pro for mapping and analysis, a remote hive of GeoParquet files (much) less so, at least for a Pro user.
02-13-2024
11:07 AM
|
0
|
0
|
1408
|
POST
|
Hi Matt, there are two properties in play, feature operation and table handling. If you are always replacing the table data entirely then insert feature operation and drop and create table handling is simple. This will also create the table if it doesn't exist. If the table has relationship classes though, dropping the table will also drop the relates (which Data Interoperability cannot recreate), in which case you need to insert features with a truncate existing table handling. Most elegant is to also read the existing state of the target table, use a ChangeDetector transformer to figure out the delta, then use fme_db_operation feature operation to do insert, update and delete operations as required. Be aware though that ChangeDetector is brutally strict, things like date precision matter if you include datetimes in the compare.
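The date precision point can be shown with a few lines of plain Python. This is only an analogy for what a strict comparison does, not ChangeDetector itself; the helper name `normalize_row` and the sample values are assumptions.

```python
from datetime import datetime


def normalize_row(row):
    """Return a copy of the row with datetimes truncated to whole seconds."""
    return {k: (v.replace(microsecond=0) if isinstance(v, datetime) else v)
            for k, v in row.items()}


# Same edit time, but one system stores sub-second precision and the other does not
source = {"id": 7, "edited": datetime(2024, 2, 13, 8, 21, 5, 123456)}
target = {"id": 7, "edited": datetime(2024, 2, 13, 8, 21, 5)}

strict_match = source == target
relaxed_match = normalize_row(source) == normalize_row(target)
```

A strict compare reports a change here even though nothing meaningful differs, so it is worth bringing datetime attributes to a common precision on both inputs before comparing.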
02-13-2024
08:21 AM
|
1
|
0
|
554
|
BLOG
|
In an earlier post I got my feet wet using DuckDB in a hosted notebook in ArcGIS Online. That post highlighted reading GeoParquet files in an object store to maintain a feature service. That's a powerful workflow but it is much more likely your daily work involves wrangling common file formats which take up way too much time to ingest - like CSV, Excel and JSON. This post is about making your life easier by showing how DuckDB enables SQL for any file format and also supports simple conversion to Esri formats. If you have basic SQL and Python skills - as in you can read this sample and the web help - you will be fine.

Here is some live action, Bloomington Indiana open 311 incidents migrated to a feature class in seconds.

[Image: Bloomington 311 Incidents]

The notebook (in the blog download) reads a CSV file at a public URL and writes a feature class into the project default geodatabase. Notebooks are a good place to start compared to a script tool because you can work in a cell interactively as you figure out your SQL. Once things are working you might copy the code to a script tool for easy automation on a schedule.

But first you need to install DuckDB! The usual route to adding a Python package to your environment is to clone the arcgispro-py3 environment then add a package in the Package Manager UI backstage in Pro. You would look for a package named python-duckdb. This errored for me (it happens) so I did a manual install. I cloned my default environment to one I named arcgispro-py3-quack and installed DuckDB by opening the Python Command Prompt from the ArcGIS program group then running this command ('username' will be yours):

conda install -c conda-forge python-duckdb=1.0.0 -p "C:\Users\username\AppData\Local\ESRI\conda\envs\arcgispro-py3-quack" --yes

Make sure to get the latest DuckDB distribution, at writing it is 1.0.0.

Let's walk through each notebook cell. The first one is pretty simple, the imports and some calls to duckdb to set up the environment.
# Imports, environment
import arcpy
import duckdb
from arcgis.features import GeoAccessor, GeoSeriesAccessor
import os
import requests
from urllib.parse import urlparse
conn = duckdb.connect()
conn.sql("install spatial;load spatial;")
conn.sql("install httpfs;load httpfs;")
conn.sql("set s3_region='us-west-2';")
conn.sql("set enable_object_cache=true;")
arcpy.env.overwriteOutput = True

The second cell makes a pandas dataframe by reading the CSV data while automatically inferring data types - such a big help. You'll notice I code defensively for the situation where the server I'm hitting doesn't support the HTTP range request access DuckDB wants - I download the data - which in this case was necessary.

# Read data into duckdb
url = r'https://data.bloomington.in.gov/resource/aw6y-t4ix.csv?$limit=200000'
filename = os.path.basename(urlparse(url).path)
base, extension = os.path.splitext(filename)
try:
    # Let DuckDB read the CSV directly from the URL
    sql = "create or replace view bloomington311_view as select * from read_csv_auto('{}');".format(url)
    conn.sql(sql)
except Exception:
    # Fall back to downloading the file and reading the local copy
    response = requests.get(url)
    with open(filename, 'w', encoding="utf-8") as f:
        f.write(response.text)
    sql = "create or replace view bloomington311_view as select * from read_csv_auto('{}');".format(filename)
    conn.sql(sql)
# Build the query you want in SQL to rename, cast or otherwise process the data.
# This dataset has a column in WKT already, if your data does not then you'll need
# to make one using the ST_ functions in DuckDB.
sql = """select service_request_id, requested_datetime, updated_datetime, closed_date, status_description, source,
service_name, description, agency_responsible, address, city, state, try_cast (zip as varchar(10)) as zip, sladays,
request_complete_days, sla_diff_days,
geocoded_column as SHAPE from bloomington311_view where SHAPE is not null;"""
df = conn.sql(sql).df()

Note the point about using or creating WKT values for your geometry. The spatial extension for DuckDB has a rich set of spatial functions.

Now convert the dataframe to a feature class. You have other conversion options. Note that the arcgis Python API has no idea where the dataframe came from - DuckDB will do fine! - so I spatially enable the dataframe and then write to a geodatabase feature class. No wrangling of attribute field properties, they just work (see the SQL above, the classic situation of casting ZIP codes to text came up).

# Write the feature class
df.spatial.set_geometry("SHAPE",sr=4326,inplace=True)
aprx = arcpy.mp.ArcGISProject("CURRENT")
gdb = aprx.defaultGeodatabase
out_fc = arcpy.ValidateTableName(base,gdb)
location = os.path.join(gdb,out_fc)
df.spatial.to_featureclass(location)

Lastly, announcement!

print("Created feature class {}".format(location))

That's it, fast, simple, smart migration of data with some help from DuckDB! The notebook is in the blog download, let us know how you get on.
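If your data has coordinate columns rather than a WKT column, the comment in the SQL cell suggests building one with the ST_ functions. Below is a hedged sketch that only assembles such a query. The view name `my_view` and the column names `longitude` and `latitude` are hypothetical, and `ST_Point` and `ST_AsText` are assumed to come from the DuckDB spatial extension loaded in the first cell.

```python
def wkt_point_sql(source, x_col, y_col, keep_cols):
    """Build a select that adds a WKT point column named SHAPE from two numeric columns."""
    cols = ", ".join(keep_cols)
    return (
        f"select {cols}, ST_AsText(ST_Point({x_col}, {y_col})) as SHAPE "
        f"from {source} where {x_col} is not null and {y_col} is not null;"
    )


# Hypothetical view and column names
sql = wkt_point_sql("my_view", "longitude", "latitude",
                    ["service_request_id", "status_description"])

# In the notebook this would be run on the existing connection, for example:
# df = conn.sql(sql).df()
```

The resulting dataframe could then be passed through the same spatial enabling step as in the cell above.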
01-23-2024
01:34 PM
|
8
|
0
|
1237
|
BLOG
|
Updated 7/25/2024 to support Overture release 2024-07-22.0. Breaking changes include place category "main" renamed to "primary" and update_time removed from the base schema. Refer to the notebook in the download for code changes. This blog may not be maintained.

Every now and then a compelling workflow is enabled by new ideas, and data distribution by cloud native formats is my topic today. Here is my example data:

[Image: Places of interest in London, England]

What you are looking at is a hosted group layer in ArcGIS Online with 309,402 place of interest points in London, England, with related tables: 309,402 address details, 18,383 brand names, 239,309 website URLs, 417,720 source attributions, 207,114 social media link URLs, and 281,556 phone numbers. Here is a chocolatier in Covent Garden:

[Image: Chocolatier in Covent Garden]

This being England, you might expect there are tea rooms about the city. Using place category and related tables we can see not just their locations but their address, website, social media links and phone numbers:

[Image: Tea Rooms in London]

[Image: Tea Rooms and related data]

This place data is one of the themes from Overture Maps Foundation and is made available under this license. If you surf the Overture website, you'll see it is a collaboration of Amazon, Meta, Microsoft and TomTom as steering members, Esri as a general member, and many other contributor members and is envisaged as a resource for "developers who build map services or use geospatial data". I'm democratizing it a bit more here, giving you a pattern for consuming the data as hosted feature layers in your ArcGIS Online or ArcGIS Enterprise portals.

Let's dig into the details of how to migrate the data to feature services. The data is made available at Amazon S3 and Azure object stores as Parquet files, with anonymous access.
I'll let you read up on the format details elsewhere but Parquet is definitely one star of the cloud native show because it is optimized for querying by attribute column, and in the case of GeoParquet, this includes a spatial column (technically multiple spatial columns if you want to push your luck). As GeoParquet is an emerging format it still has some things that are TODO, like a spatial index (which would let you query by spatial operators), but Overture very thoughtfully include a bounding box property which is simple to query by X and Y.

The technology that is the second star of the cloud native show is DuckDB. DuckDB enables SQL query of local or remote files like CSV, JSON and of course Parquet (and many more) as if they are local databases. Remote file query is especially powerful when the host supports the Amazon S3 REST API, because a client that speaks it, as DuckDB does, can run SQL against the remote files over HTTP. Just as powerful is DuckDB's ability to unpack complex data types (arrays and structs) into a rich data model like I'm doing here (base features and 1:M related tables), and not just flat tables. Only the query result is returned, not the remote file.

The third star of the cloud native show is ArcGIS Online hosted notebooks, which can be used to orchestrate DuckDB transactions and integrate the results into new or refreshed hosted feature layers in Online or Enterprise.

The superpower of this combination of Parquet, DuckDB and Python is that global scale data can be queried for a subset of interest in a desired schema using industry standard SQL, and the workflow can be automated. This forever deprecates the legacy workflow of downloading superset data files to retrieve the part you want. At writing, the places data resolves to 6 files totaling 5.54GB, not something you want to haul over the wire before you start processing!
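The bounding box property mentioned above works as a cheap prefilter: compare four numbers per feature before doing any exact geometry test. Here is a minimal sketch of the idea in plain Python. The `bbox.xmin` style field names follow the Overture schema used in this post; the area of interest ring and the sample features are invented.

```python
def ring_bounds(ring):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_within(bbox, bounds):
    """True if a feature's bbox struct lies inside the given bounds."""
    xmin, ymin, xmax, ymax = bounds
    return (bbox["xmin"] >= xmin and bbox["ymin"] >= ymin
            and bbox["xmax"] <= xmax and bbox["ymax"] <= ymax)


def bbox_predicate(bounds):
    """Render the same test as a SQL predicate on a struct column named bbox."""
    xmin, ymin, xmax, ymax = bounds
    return (f"bbox.xmin >= {xmin} and bbox.ymin >= {ymin} "
            f"and bbox.xmax <= {xmax} and bbox.ymax <= {ymax}")


# Hypothetical area of interest and features
aoi = [(-0.2, 51.45), (0.0, 51.45), (0.0, 51.6), (-0.2, 51.6), (-0.2, 51.45)]
features = {
    "covent_garden": {"xmin": -0.124, "ymin": 51.5117, "xmax": -0.124, "ymax": 51.5117},
    "paris": {"xmin": 2.35, "ymin": 48.85, "xmax": 2.35, "ymax": 48.85},
}

bounds = ring_bounds(aoi)
selected = [name for name, bbox in features.items() if bbox_within(bbox, bounds)]
predicate = bbox_predicate(bounds)
```

The prefilter only narrows the candidates to the rectangle around the area of interest, so an exact spatial test is still needed afterwards for irregular polygons.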
If you think about it, any file format you have to zip to move around and unzip to query (shapefile, file geodatabase) generates some friction, Parquet avoids this.

The notebook named DuckDBIntegration is what I used to import the places data and is in the blog download. Notebooks aren't easy to share graphically but I'll make a few points. Firstly, ArcGIS notebooks don't include the DuckDB Python API, so it needs to be installed from PyPI. Here is the code that does the import and also loads the spatial and httpfs extensions needed for DuckDB in this workflow. The DuckDB version at writing is 1.0.0.

try:
    import duckdb
except ImportError:
    !pip install duckdb==1.0.0
    import duckdb
conn = duckdb.connect()
conn.sql("install spatial;load spatial;")
conn.sql("install httpfs;load httpfs;")
conn.sql("set s3_region='us-west-2';")
conn.sql("set enable_object_cache=true;")

Once DuckDB is loaded it is a matter of extracting the layers of interest. I did not model the full Overture places schema, for example omitting alternate place categories. If you browse the places schema note the YAML tab. Don't be surprised in 2025 if you see YAML powering AI to let you have conversations with your data 😉

As Overture data is of global scale, with hundreds of millions of features for the base Places layer, the extract must be limited to my area of interest. The method I adopted was to use the Division Area theme as my selection layer. This is all the world's political divisions from country to neighborhood scale - I build a selecting geometry from a SQL query on division area subtypes and names. The blog download includes a notebook DivisionArea which extracts the data to a file geodatabase feature class.

After some data exploration I determined this SQL where clause would define my area of interest:

country = 'GB' and subtype in ('county','locality') and names.primary in ('City Of London','London','Camden','Tower Hamlets','Islington')

Then in my cell that makes my extraction polygon I use this SQL and DuckDB's spatial operators to construct my selection geometry, both as a Well Known Text polygon and XY bounds variables:

sql = f"""select ST_AsText(ST_GeomFromWKB(geometry)) as wkt
from read_parquet('s3://overturemaps-us-west-2/release/{release}/theme=divisions/type=division_area*/*', filename=true, hive_partitioning=1)
where
country = 'GB' and subtype in ('county','locality') and names.primary in ('City Of London','London','Camden','Tower Hamlets','Islington');"""
wktList = [row[0] for row in conn.sql(sql).fetchall()]
wkt = wktList[0]
for w in wktList[1:]:
    wkt = conn.sql(f"""select ST_AsText(ST_Union(ST_GeomFromText('{wkt}'),ST_GeomFromText('{w}')));""").fetchone()[0]
xmin,ymin,xmax,ymax = conn.sql(f"""with aoi as (select ST_GeomFromText('{wkt}') as geom)
select ST_XMin(geom),ST_YMin(geom),ST_XMax(geom),ST_YMax(geom) from aoi;""").fetchone()

I then make a view of the Places theme which defines the features I want:

sql = f"""create or replace view places_view as select *
from read_parquet('s3://overturemaps-us-west-2/release/{release}/theme=places/type=place*/*',filename=false, hive_partitioning=1)
where bbox.xmin >= {xmin}
and bbox.ymin >= {ymin}
and bbox.xmax <= {xmax}
and bbox.ymax <= {ymax}
and ST_Intersects(ST_GeomFromText('{wkt}'), ST_GeomFromWKB(geometry));"""
conn.sql(sql)

The Places data schema models the expansion tables as arrays of structs, so that each place may have a 1:M relationship to its expansion properties. If this terminology is new to you, it is equivalent to Python lists of dictionaries, but with the property that the dictionaries are guaranteed to have the same key names in each row.

Now I have my Places view it is a matter of throwing SQL statements at it to create relations (cursors) to use with ArcPy insert cursors to write each layer. Now, if you inspect the ExtractDivisions notebook you'll see it is possible to use a built-in file geodatabase driver in DuckDB to dump data out quickly and easily. However this doesn't control the schema very well, so I take a longer winded approach to creating the output data, relying on ArcPy to create my output. Here is the code which writes the Places base layer:

arcpy.env.workspace = arcpy.management.CreateFileGDB(out_folder_path=out_folder_path,
                                                     out_name=out_name,
                                                     out_version="CURRENT").getOutput(0)
arcPlaces = arcpy.management.CreateFeatureclass(out_path=arcpy.env.workspace,
                                                out_name="Places",
                                                geometry_type="POINT",
                                                has_m="DISABLED",
                                                has_z="DISABLED",
                                                spatial_reference=sR).getOutput(0)
arcpy.management.AddField(in_table=arcPlaces,field_name="id",field_type="TEXT",field_length=32)
arcpy.management.AddField(in_table=arcPlaces,field_name="name_primary",field_type="TEXT",field_length=1000)
arcpy.management.AddField(in_table=arcPlaces,field_name="category_main",field_type="TEXT",field_length=100)
arcpy.management.AddField(in_table=arcPlaces,field_name="confidence",field_type="FLOAT")
arcpy.management.AddField(in_table=arcPlaces,field_name="update_time",field_type="DATE")
arcpy.management.AddField(in_table=arcPlaces,field_name="version",field_type="SHORT")
sql = f"""select id,
names.primary as name_primary,
categories.main as category_main,
confidence,
try_cast(update_time as datetime) as update_time,
version,
geometry
from places_view;"""
duckPlaces = conn.sql(sql)
with arcpy.da.InsertCursor(arcPlaces,["id","name_primary","category_main","confidence","update_time","version","shape@"]) as iCursor:
    row = duckPlaces.fetchone()
    i = 1
    if row:
        while row:
            if i % 10000 == 0:
                # getNow() is a timestamp helper not shown in this excerpt
                print('Inserted {} Places rows at {}'.format(str(i),getNow()))
            row = list(row)
            # Convert the WKB geometry value into an ArcPy geometry
            row[-1] = arcpy.FromWKB(row[-1])
            iCursor.insertRow(row)
            i+=1
            row = duckPlaces.fetchone()
del iCursor

So, the above creates my point layer, then it is a matter of extracting each expansion table, which all follow the same pattern. Here is the code that extracts addresses from the Places view:

sql = """with address as (select id, unnest(addresses, recursive := true) from places_view where addresses is not null)
select id, freeform, locality, region, postcode, country from address;"""
duckAddresses = conn.sql(sql)
arcAddresses = arcpy.management.CreateTable(out_path=arcpy.env.workspace,out_name="Addresses").getOutput(0)
arcpy.management.AddField(in_table=arcAddresses,field_name="id",field_type="TEXT",field_length=32)
arcpy.management.AddField(in_table=arcAddresses,field_name="freeform",field_type="TEXT",field_length=300)
arcpy.management.AddField(in_table=arcAddresses,field_name="locality",field_type="TEXT",field_length=100)
arcpy.management.AddField(in_table=arcAddresses,field_name="region",field_type="TEXT",field_length=50)
arcpy.management.AddField(in_table=arcAddresses,field_name="postcode",field_type="TEXT",field_length=100)
arcpy.management.AddField(in_table=arcAddresses,field_name="country",field_type="TEXT",field_length=2)
with arcpy.da.InsertCursor(arcAddresses,["id","freeform","locality","region","postcode","country"]) as iCursor:
    row = duckAddresses.fetchone()
    i = 1
    if row:
        while row:
            if i % 10000 == 0:
                print('Inserted {} Addresses rows at {}'.format(str(i),getNow()))
            iCursor.insertRow(row)
            i+=1
            row = duckAddresses.fetchone()
del iCursor

Note the fancy "unnest" operator in the SQL. This expands the array of address structs into new rows for each address, inheriting the id field for each expanded row.

Once all expansion tables are extracted I make the relationship classes from Places points to each expansion table:

for destination in ["Addresses","Brands","Websites","Sources","Socials","Phones"]:
    arcpy.management.CreateRelationshipClass(origin_table="Places",
                                             destination_table=destination,
                                             out_relationship_class="Places_{}".format(destination),
                                             relationship_type="SIMPLE",
                                             forward_label=destination,
                                             backward_label="Places",
                                             message_direction="NONE",
                                             cardinality="ONE_TO_MANY",
                                             attributed="NONE",
                                             origin_primary_key="id",
                                             origin_foreign_key="id",
                                             destination_primary_key="",
                                             destination_foreign_key="")

Then attach some metadata to each layer as a good practice, some counts, times and license details:

current_time_utc = datetime.now(pytz.utc)
pst = pytz.timezone('US/Pacific')
current_time_pst = current_time_utc.astimezone(pst)
current_time_pst_formatted = current_time_pst.strftime('%Y-%m-%d %H:%M:%S')
objDict = {obj:0 for obj in ["Places","Addresses","Brands","Phones","Socials","Sources","Websites"]}
for k in objDict.keys():
    objDict[k] = int(arcpy.management.GetCount(k).getOutput(0))
for obj in objDict.keys():
    objCount = objDict[obj]
    new_md = arcpy.metadata.Metadata()
    new_md.title = f"Overture Maps Foundation {obj} in London, release {release}."
    new_md.tags = f"{obj},London,Overture"
    new_md.summary = f"For mapping and analysis of {obj} in ArcGIS."
    new_md.description = f"{objCount} {obj} features in London, Great Britain."
    new_md.credits = f"Esri, {current_time_pst_formatted} Pacific."
    new_md.accessConstraints = "https://cdla.dev/permissive-2-0/"
    tgt_item_md = arcpy.metadata.Metadata(obj)
    if not tgt_item_md.isReadOnly:
        tgt_item_md.copy(new_md)
        tgt_item_md.save()
    else:
        print(f"{obj} metadata is read only.")

I'll let you inspect the rest of the DuckDBIntegration notebook code yourselves, but all that is left is to zip up the output geodatabase and use it to overwrite the feature service being maintained. The most important point about the feature service is it must have been created by sharing a file geodatabase item, not shared from Pro. Making the initial file geodatabase item is just a matter of manually running the notebook down to the cell that does the zipping, then downloading the zip file manually and using it to create the item.

To go into production with this approach first figure out a query on extent or division area fields that works for you, then plug it into the notebook. You'll need to supply your own credentials of course, and change the target gis if you're using Enterprise for the output feature service. Then at each Overture release, change the release variable to match Overture and refresh your service.

To give you a feel for performance, look at the download notebook, you'll see it takes about 6 1/2 minutes to run - amazingly fast for the data scale. I am though co-located with the S3 region being accessed.

Naturally, if the data schema changes or to support other Overture themes, you'll need to author notebook changes or new notebooks. Do share!

I'm sure you'll agree that this is a powerful new paradigm that will change the industry. I'm just following industry trends. It will be fun to see if Parquet is what comes after the shapefile for data sharing.

The blog download has the notebook but not a file geodatabase (it's too big for the blog technology) but when you generate your own services don't forget the data license here. Have fun!
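For the zipping step mentioned above, here is a minimal standard-library sketch. It is not the notebook's own cell. The function name `zip_geodatabase` is hypothetical, and skipping `.lock` files is an assumption on my part about what is safe to leave out.

```python
import os
import zipfile


def zip_geodatabase(gdb_path, zip_path):
    """Zip a file geodatabase folder so the .gdb folder sits at the archive root."""
    gdb_path = os.path.abspath(gdb_path)
    parent = os.path.dirname(gdb_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(gdb_path)):
            if name.endswith(".lock"):
                continue  # assumption: lock files from open sessions are not needed
            full = os.path.join(gdb_path, name)
            if os.path.isfile(full):
                # Store paths relative to the parent so the archive holds Name.gdb/...
                zf.write(full, os.path.relpath(full, parent))
    return zip_path
```

A call such as `zip_geodatabase(arcpy.env.workspace, "Places.zip")` would produce the file to upload, with the target zip name being whatever suits your item.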
01-19-2024
11:27 AM
|
10
|
7
|
2524
|
IDEA
|
Hello Donald, the Data Interoperability extension for Pro can write TopoJSON. Regards
12-15-2023
06:19 AM
|
0
|
0
|
754
|
POST
|
Hello Ashley, it seems you have struck an issue we haven't found a cause for: https://pro.arcgis.com/en/pro-app/latest/tool-reference/tool-errors-and-warnings/160001-170000/tool-errors-and-warnings-160226-160250-160236.htm If you open a support call the analyst can look at your workflow. Re. subscribing, if you subscribe to the whole Data Interoperability board you'll get everything, and thank you for asking.
12-06-2023
11:28 AM
|
0
|
0
|
606
|
BLOG
|
One of the hidden gems in ArcGIS Pro is an ability to reach out to external geographic data sources using simple scripted workflows, which you can optionally automate. Here is today's subject matter, an OGC API Features layer mirrored to a hosted feature layer in ArcGIS Online, and the notebook that does the job:

[Image: Electricity Transmission Lines]

The secret sauce in my cooking is GDAL - the Geographic Data Abstraction Library. Esri ship GDAL in the standard conda environment in ArcGIS Pro, here is the package description:

GDAL is a translator library for raster and vector geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single raster abstract data model and vector abstract data model to the calling application for all supported formats.

While you can go deep with GDAL, in the way I'm using it here it's "copy-paste" ETL with a small T, I didn't do any feature level transformation. All you have to know is where your source data is, what format it is, and then plug it into the notebook. I count 84 vector formats in the supported vector drivers, so it isn't ArcGIS Data Interoperability but it's a useful subset. Datasets can be files or at HTTP, FTP or even S3 URLs.

My example uses OGC API Features, but this isn't important, GDAL abstracts the concept of a dataset, so you can use the same code for any format. However, on the topic of OGC API Features, the format supports the concept of a landing page, collections and items, and you can supply a URL to any of these endpoints from where you navigate to them in a browser. I imported the electricity transmission lines collection from this landing page.

I'll let you walk through the notebook (in the blog download), but in summary it converts external data to file geodatabase then overwrites a feature service with the data.

To create the feature service in the first place, pause at the cell that creates a zip file, upload and publish it, then use the item ID to set your target. Do not create the feature service from map layers, the system requires a file-based layer definition. Run the notebook on any frequency that suits you, or turn it into a Python script tool and schedule it. Make sure you inject the right format short name ahead of your data path.

There you have it, simple maintenance of a hosted feature service sourced from external data! It isn't real data virtualization, as the data is moved, but it is an easy way to make data available to all.
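As a small illustration of injecting the format short name ahead of the data path, here is a sketch of how such a dataset name could be assembled. The helper name `gdal_dataset_name` and the example URL are hypothetical; `OAPIF` is the GDAL short name for the OGC API Features driver.

```python
def gdal_dataset_name(path_or_url, short_name=None):
    """Prefix a data path with 'SHORTNAME:' when a driver short name is given."""
    return f"{short_name}:{path_or_url}" if short_name else path_or_url


# Hypothetical landing page URL
source = gdal_dataset_name("https://example.com/ogcapi", "OAPIF")
plain = gdal_dataset_name("data.gpkg")

# With GDAL available, the name could then be opened, for example:
# from osgeo import gdal
# ds = gdal.OpenEx(source)
```

Not every driver needs a prefix, so the short name is optional here; check the driver page for the format you are reading.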
11-28-2023
11:29 AM
|
3
|
0
|
986
|