BLOG
It's holiday season in much of the world, so I'm giving myself some light duty with this post. I previously posted about making REST API calls in the context of handling JSON responses in ArcGIS Data Interoperability ETL tools. I'll assume you now know what you're doing there (while authoring, temporarily write the JSON to file, build your JSON query with the JSONExtractor's handy picker, then revert to using the response JSON instead of the file) and encourage you to reach out to as many REST endpoints as you want. By way of example, first check out my video captured from ArcGIS Earth. I'll wager you don't know about the ArcGIS Earth Automation API - I didn't until a couple of days ago!

The video shows navigating the optimal route amongst ten 311 calls taken from New York City's 311 Service Request RSS feed. I calculate the optimal route from the ArcGIS Online routing service's solve endpoint. (Please excuse the video quality mid-show; I'm on a home network.) I start from the Esri campus, fly to an extent 39 km above New York, then draw my route and the 10 events (in order), navigating from first to last. I take a jaunty excursion into space right after the first event visit - if Jeff Bezos and Elon Musk can launch people into space, so can I. My flights are cheaper.

An aside: years ago there was a room at Esri with a projector in it that sampled web maps requested by users of ArcWeb - the precursor to ArcGIS Online - and displayed them on a globe that would rotate to the map footprint. It was mesmerizing to watch where maps were going. I remember being impressed by the projector - this was when the bulbs in those things cost $1000 or some crazy amount if the projector was home-cinema grade.

I didn't get fancy with my flights by exactly following the route; I do crow-flies hops - well, my crow does quite high trajectories, but you get the idea. So this is really just a shout out to a couple of Esri APIs: one you can have fun with as a 'presentation layer' and one that does a real analysis you want to present. APIs are where you find them, just go looking like I did. Oh yes - there is no coding required; when I think of coding I am reminded of the projector bulbs - something you don't have to buy these days ;-). The ETL workspace that does the work is in the download; you can run it yourself if you provide credentials in the ArcGISOnlineTokenGetter.

Workbench

The Earth Automation API is under active development. If you really do want to use it as a presentation layer, talk to your Esri representative and ask to be put in touch with the Product Manager, who will be pleased to hear of your requirements.
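No coding is needed for any of this, but if you're curious what the routing request boils down to, here is a minimal Python sketch of a solve call against the World Route service. Treat it as illustrative only: the stops and token are placeholders, and you should check the endpoint URL and parameter names against the current routing service documentation.

import requests

# Minimal sketch of an optimal-sequence route solve (placeholders throughout).
solve_url = "https://route.arcgis.com/arcgis/rest/services/World/Route/NAServer/Route_World/solve"
params = {
    "f": "json",
    "token": "<your token>",
    "stops": "-117.195,34.056;-73.985,40.748",  # lon,lat pairs, semicolon separated
    "findBestSequence": "true",                 # reorder stops to get the optimal route
    "preserveFirstStop": "true",
    "returnDirections": "false",
}
route = requests.post(solve_url, data=params).json()["routes"]["features"][0]
print(route["attributes"]["Total_Kilometers"])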
12-20-2021 12:39 PM

POST
Hi Jo, thanks for the shout out and tight lines for your retirement! It has been a pleasure working with you.
12-17-2021 10:36 AM

BLOG
This may not be your exact problem - loading 100 million features into a hosted feature service in ArcGIS Online - but the subject here is loading big data into a hosted feature service in Online, and 100 million is the scale I'm talking about. Here is what 100,000,000 features look like at the density I'm dealing with, at an extent about a city block across:

Layer not visible

Then turning layer visibility on:

Layer visible

The data is so big that Pro 2.9 will not attempt to display it at scales smaller than 1:500.

Here is the back story. One of the Esri teams I work with occasionally tackles big data destined for a hosted feature service in Online. The web being what it is, very large sharing transactions can fail due to network or other issues, at which point you have to recover the situation where it broke or re-run the entire sharing process. They wanted a process that works reliably and has a recovery mechanism for any failure. They were headed down the Python path, which is fine for Pythonistas, but I'm the no-code guy in the office (let's say a recovering coder who occasionally lapses). ArcGIS Data Interoperability to the rescue!

My test data is a CSV file of 143,751,910 rows.

Get Count

I took 100 million rows off the top, just as a nice round number. The data has lat/lon values, so the ETL aspect is a simple spatial enablement to make point features. Now, how to send these to a point feature layer? A preliminary step was to take a sample of the data to file geodatabase and make a feature service from it; this just instantiates the target layer. Then the downstream problem is two-fold:

Load the data as performantly as possible
Use a methodology that supports recovery from failure

The Online team suggested a 'sweet spot' for loading data at this scale would be using the Append endpoint for the feature service with batches of 500K records in two concurrent processes, run 'after hours' US time. No problem, here are my workspaces:

LoadTaxis

LoadTaxis creates zipped file geodatabases of sets of 500K records, then hands the data path off to LoadTaxisWorker in a maximum of two processes that run asynchronously. I found the time it takes to cook the data was a good fit for how long it takes Online to eat it - a little less than 2 minutes per batch during low-load hours. Watching the log file go by, there were a few instances of LoadTaxis pausing for a process slot to become available, but usually Online was ready for the next batch, so things were working as fast as my ETL could run.

LoadTaxisWorker

LoadTaxisWorker uploads each zipped file geodatabase to Online, then calls the service's Append endpoint, triggering a load from the new file geodatabase item. This starts a job in Online. A looping custom transformer checks for job completion every 5 seconds (with a maximum check count of 42). Note the HTTPCaller in the custom transformer needs a concurrency setting of 1 to work in a loop. On successful completion the file geodatabase item is deleted, then a second child workspace - LoadTaxisCleanerUpper - is called to delete the input zipped file geodatabase in a separate process (file handles lock deletion otherwise).

LoadTaxisCleanerUpper

You will see in LoadTaxisWorker that I have a couple of Emailer transformers (disabled in the download), and the WorkspaceRunner that calls the cleanup process allows for a bump in the number of child processes so things don't hang. Now, this being the web, you can get failures even when following best practices, but if you do, there is a paper trail built into the methodology.

The file geodatabases made for upload are not deleted from disk or as Online items like they are for successful jobs - you can see any failed data:

File geodatabases not deleted

And here are file geodatabase items relating to failed Append jobs - not deleted either:

File geodatabase items not deleted

However, if you inspect the HTTPCaller that calls Append you'll see I set rollback on failure to true, so all we have to do is:

Manually delete the failed file geodatabase items from Online content
Manually run LoadTaxisWorker using the failed file geodatabase zipfiles and batch IDs as input parameters

So after my hands-free run of the 100 million features, the tool executes without error...

LoadTaxis worked fine

...but the loaded feature count is short because of the failed jobs:

The target feature service didn't get all data

We can see 7 failed jobs equates to the missing 3,500,000 features in the service. Let's get them there! After deleting the file geodatabase items in Online I can run each job manually like this:

Run LoadTaxisWorker manually

I actually prefer to run the LoadTaxisWorker tool in edit mode (i.e. in Workbench) so I can do things like turn the Emailers on to catch the action in detail. There is also this issue to watch for, which I'll put in a spoiler tag:

Warning: Append called by LoadTaxisWorker may take far longer than anticipated (hours, not minutes) if run at a busy time for Online - Append is a shared resource and may get queued. In this case the Emailer will tell you 210 seconds (or your preferred timeout) went by without Append finishing; it will still complete eventually, but the cleanup operations will not trigger. I experienced this and enabled the Emailer that sends the job URL so I could watch it manually.

Anyway, after manually running your failed jobs you will have 100 million features in your service!

Yay my 100 million features are loaded!

How hard was that? The ETL tools are in the post download. I used ArcGIS Pro 2.9 and the Data Interoperability extension.
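For readers who want to see roughly what LoadTaxisWorker's HTTPCaller is doing behind the form, here is a minimal Python sketch of the same idea: upload a zipped file geodatabase as an item, call the hosted feature layer's Append endpoint, and poll the job status. The organization URL, username, item handling and token are placeholders, and you should confirm the exact parameter names against the ArcGIS REST API documentation for your release.

import time
import requests

# Placeholders - substitute your own portal, user, layer URL and token.
token = "<your token>"
portal = "https://www.arcgis.com/sharing/rest"
layer_url = "https://services.arcgis.com/<orgid>/arcgis/rest/services/Taxis/FeatureServer/0"

# 1. Upload the zipped file geodatabase as a portal item.
with open(r"C:\Work\batch_0001.gdb.zip", "rb") as f:
    item = requests.post(
        f"{portal}/content/users/<username>/addItem",
        data={"f": "json", "token": token, "type": "File Geodatabase", "title": "batch_0001"},
        files={"file": f},
    ).json()

# 2. Call Append on the target layer, loading from the uploaded item.
job = requests.post(
    f"{layer_url}/append",
    data={
        "f": "json",
        "token": token,
        "appendItemId": item["id"],
        "appendUploadFormat": "filegdb",
        "rollbackOnFailure": "true",
    },
).json()

# 3. Poll the job status until it completes (LoadTaxisWorker loops like this, max 42 checks).
for _ in range(42):
    status = requests.get(job["statusUrl"], params={"f": "json", "token": token}).json()
    if status.get("status") in ("Completed", "Failed"):
        break
    time.sleep(5)
print(status)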
12-15-2021 05:53 AM

IDEA
Erik, the Data Interoperability extension can do this for you without a CSV download; you just need to build an ETL tool in Pro, no scripting required. Here is one example of accessing 'web CSV' and maintaining a feature service from the data, including creating geometry on the fly: https://community.esri.com/t5/arcgis-data-interoperability-blog/building-a-data-driven-organization-part-8/ba-p/1104900
11-30-2021 09:58 AM

BLOG
This post is in two parts: firstly, handling common problems with CSV data headed into your GIS, and secondly, some useful ways to share messages about data changes. In both cases we will automate processing with ArcGIS Data Interoperability in ArcGIS Pro 2.9, with a nod to ArcPy for a helping hand (but you don't have to code). You will learn how to send a message card to a Microsoft Teams channel so your whole organization can be notified when data events happen:

Teams channel message card

There are oceans of CSV files out there. It's the 'lowest common denominator' of portable tabular 'database' (possibly excepting column-aligned text) which many apps can ingest; CSV is easy to host, send and retrieve, and if you're desperate you can actually read it manually. This has made CSV a highly successful distribution format, including for data that changes frequently. My sample data is daily 311 requests. Here I am reviewing the data:

Every time I tackle CSV the same issue arises - what's the schema? Rarely do curators of the data publish the data definition of each column or provide a schema.ini file, so the schema needs to be discovered if you want a quality outcome. Many subsidiary issues arise frequently. Is what looks like an integer column always an integer in all rows? Can I reliably convert date-looking columns into actual datetime data? CSV has no null value - an empty string "" might mean null, or might need to be an actual empty string. What field width do I need for each string field? What numeric types will work for numbers in the data? It goes on. I'll give you a pattern here to make schema discovery and data definition easy.

You see I started with a clerical examination in Excel. It is OK for browsing, but I would not convert CSV via XLSX and the geoprocessing tool Excel to Table 'hands free' because no data issues get found or fixed. Similarly, Table to Table is robust but does things like create text fields with 8000 byte width. Apart from loose field definitions there are going to be data errors you should repair, and nothing beats a combination of automated and interactive inspection for that.

Then what did browsing the data in Excel tell us? Luckily the web page where the CSV is downloaded from has a data dictionary that describes most field purposes, but not the field definitions. With a little clerical effort we can make these observations:

CSV Field Notes:
service_request_id - Unique identifier, looks to be integer
service_request_parent_id - Self join to service_request_id if it's a duplicate request; I see "nan" where nulls should be
sap_notification_number - If the 311 request is in the city's SAP system this is the identifier, looks like a 64-bit integer
date_requested - Looks like ISO format, great!
case_age_days - How long it has taken to 'get it done' in whole days, so should be small integer
case_record_type - Team that is responsible at the city, text
service_name - Classification of the 311 service type, text
date_closed - Looks like ISO format again
status - Current request state, might update during the request lifespan, text
lat - Latitude (but I see some are missing), double
lng - Longitude (missing if latitude is missing), double
street_address - Civic address of the 311 request, text
zipcode - ZIP code, may be zip+4, should be text but I see some decimal values!
council_district - District, encoded as an integer
comm_plan_code - Community, encoded as an integer
comm_plan_name - Community, as text
park_name - Park or open space name if applicable, text
case_origin - Method used to submit the request, text
referred - If the status field is 'referred' this describes to whom, text
public_description - Crowd sourced text, my favorite data! It can be any width
imfloc - Not described, drop it
floc - Not described, drop it

So far so good, we have a rough idea of how the schema should look. How do we refine it? In the post download toolbox is a Python script tool, ReportCSVProperties, which does a full scan of the data and creates an in-memory metadata table showing each field definition that will work with the data, i.e. numbers will be preserved and text will not be truncated. Running it with the sample data, the message stream looks like this (actually each 1% of progress is reported):

Inspecting 323441 rows...
Making in-memory table...
Inspecting data...
1% complete
...
50% complete
...
100% complete

The output table gives me a good start with the schema:

ReportCSVProperties Output

If I trusted these definitions I could use this schema information in the field map control in Table to Table to directly convert the data to geodatabase, but I happen to know at a minimum there are "nan" string values polluting what should be an integer field (causing it to be treated as string), plus datetime and XY coordinate fields in there which would require subsequent geoprocessing to expose properly. However, for the record, you could adopt the schema and apply field widths like this (I usually round widths up as a little future-proofing, so for example the street_address field width of 104 I bump up to 120):

Table to Table Field Map Example

My goal here is showing you automated ETL, not step-wise processes, so we will jump into ArcGIS Data Interoperability functionality. Beginning with the CSV source: it is at a hyperlink. I right-clicked on the dataset icon to get the download URL. So let's go web-to-spatial-features in one tool, correcting and casting as we go:

Import 311 CSV to Geodatabase Features

Let's look at the CSV reader. I manually set the schema using the number types and string widths found by my helper tool, except where I knew the "nan" values threw it off, plus I detected ISO format datetimes and created geometry from the lat/lng fields. When you specify a field to be integer, any string values are replaced with nulls, so the "nan" values (this means "not a number", it's a computing convention, usually cased as "NaN") are fixed. So just at the reader step I have the exact schema I want.

But wait, there's more! One of the great features of the Workbench app is being able to cache data at each canvas element and inspect it. Look what the reader turned up! Turn caching on and run just the reader, then click the green magnifying glass to inspect the cache.

Reader with caching enabled
Inspecting the cache

The data should all be in San Diego, but I see dots all over the US plus one in Australia. Plus, selecting features with no geometry finds 2450 features. These issues need fixing! To fix null geometries and out-of-area geometries I geocoded features that were not in the city area. If I could not geocode them I emailed a note suggesting the data be reviewed.

Email from the ETL process

That gets my CSV data cleaned up and into a geodatabase, from which I made a feature service. Note there are four layers in the service, one for each request status type - Closed, In Process, New and Referred.
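Before moving on to part two: the ReportCSVProperties tool is in the download, but if you're curious what a full-scan schema discovery looks like in code, here is a minimal sketch of the idea, assuming a local copy of the CSV. It only tracks maximum string widths and whether a column stays integer-like, which is the heart of what the script tool reports; the path is illustrative.

import csv

# Minimal sketch of full-scan schema discovery (the ReportCSVProperties script tool
# in the download does this more completely). Path and column handling are illustrative.
path = r"C:\Work\get_it_done_311.csv"
max_width = {}
is_integer = {}

with open(path, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        for field, value in row.items():
            value = value or ""
            max_width[field] = max(max_width.get(field, 0), len(value))
            if value not in ("", "nan"):
                # A single non-integer value demotes the column to text
                is_integer[field] = is_integer.get(field, True) and value.lstrip("-").isdigit()

for field in max_width:
    kind = "integer" if is_integer.get(field, False) else f"text({max_width[field]})"
    print(field, kind)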
Feature Service from CSV

So that's part one - you can see how easy it is to get CSV data into beautiful shape in your GIS. Now for the fun part, beautiful messaging when the data changes! The data changes daily; to refresh the feature service I made the ETL tool RefreshGetItDone311. Here is a screen grab:

Refresh 311 Feature Service

This tool is a bit busier than its parent that writes to geodatabase because it reads and writes the four layers in the feature service after doing change detection. Step through it yourself; note the two tunnels and how I merge existing geometry and changed attribution to write only the daily delta to the service. This workspace can be made into a scheduled task to maintain the data continually. I'll leave the service up for a while, you can add it to a map yourself.

The real thing I want to highlight, though, is the very last transformer in the workspace, a WorkspaceRunner. This will perform some postprocessing - refresh a pie-chart item and send a Microsoft Teams adaptive card with a map link to a channel - smart and attractive messaging.

Pie Chart made from new 311 features

Here is the postprocessing workspace, ReportGetItDone311.

Update barchart and message Teams

New 311 features with case_age_days equal to zero are counted, a pie chart image is made (segmented by request type), then the pie chart image is used to update an image item in Online, and lastly a card message is sent to a Teams channel webhook.

Firstly the pie chart. While authoring the ReportGetItDone311 tool I created an image item. If you make image items public you can use their data URL as a file path in a card. I also created a web map with the New features, filtered by case_age_days = 0, and added a bar chart to the web map. You can read here that you can create and modify web maps just by sending URLs to ArcGIS Online.

With my pie chart and web map items shared publicly, the last step was to tell the world whenever I run RefreshGetItDone311. I thought this was going to be scary hard, but it turned out to be crazy simple. First I made an incoming webhook item in my Teams channel of choice:

Incoming webhook

This gives me the URL to send a card to with an HTTPCaller. Now I need the payload JSON. If you open App Studio in Teams you get a handy card editor:

Teams Card Editor

I copied out the JSON from the card editor, made an edit manually which a little research indicated was necessary, and then in my HTTPCaller plugged in some runtime variable substitution to pick up the date and feature count for the day's processing. Just to prove it wasn't a fluke I'll run it again in the morning before posting this blog. ...zzzzzzz... Here is the result:

Card in Teams
Web Map Linked From Card
Pie Chart Item In Online

So there you have it, plain old CSV turned into beautifully typed feature service data and talking to us with a beautiful message. The tools are in the post download (I will deactivate the Teams webhook URL).
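In the ETL tool the card is posted by an HTTPCaller, but the same call is easy to picture in code. Here is a minimal sketch, assuming a hypothetical webhook URL and a simple MessageCard payload; the adaptive card JSON I actually used came from the Teams card editor and is in the download, and the message text below is just a placeholder.

import requests

# Hypothetical incoming webhook URL - create your own in the Teams channel settings.
webhook_url = "https://<your-tenant>.webhook.office.com/webhookb2/<webhook-id>"

# A simple MessageCard payload; the card in the blog also carries a map link and the pie chart image.
card = {
    "@type": "MessageCard",
    "@context": "https://schema.org/extensions",
    "summary": "311 refresh",
    "title": "Get It Done 311 refreshed",
    "text": "<feature count> new requests loaded on <date>.",
}
response = requests.post(webhook_url, json=card)
print(response.status_code)  # 200 means the channel accepted the card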
11-17-2021 06:20 AM

BLOG
My ArcGIS Data Interoperability ETL process was going fine, but suddenly its ArcGIS Online Feature Service writer threw an error which terminated my workspace. The error message looked like this:

The error code from the server was '400' and the message was: ''. Details: 'Field public_description has invalid html content.

The data is from a local government 311 site - the field public_description is where the public have captured the details of their service request. How did HTML get in there? What was invalid about it? How do I harden my translation to trap this kind of thing?

After a lot of investigation (and making a note to self to bring writer rejection handling up with Safe again) I found the offending data. The investigation reminded me of math class and the regula falsi method - turning caching on and using two Sampler transformers to bracket the bad guy until I found it. It wasn't even HTML, it was a bracketed email: <somebody@hotmail.com>. OK, I thought, it's reasonable to put an email in a free text field. Why is the feature service so finicky about it? It turns out feature services have a setting (on by default) that causes potentially harmful content to be blocked.

I have a few choices. If my data really did have HTML content I could percent-encode the field with the TextEncoder and HTML mode, I could manually encode the angle brackets as &lt; and &gt;, I could remove anything in tags with a StringReplacer and the regular expression <.*>, or I could switch the service property and let the data through. Let's say I want this last option. I need to use an admin REST API call. The property I need to change is xssPreventionInfo.

To do this, in your browser and logged in as the service owner, go to the Content section in the Home app and navigate to the feature service item. At bottom right is a control that lets you view the item's REST API. Go there in your browser (your URL will of course be different):

https://services.arcgis.com/FQD0rKU8X5sAQfh8/arcgis/rest/services/Get_It_Done_311_Requests/FeatureServer?token=<yourtoken>

To access the admin API, insert the word 'admin' between 'rest' and 'services':

https://services.arcgis.com/FQD0rKU8X5sAQfh8/ArcGIS/rest/admin/services/Get_It_Done_311_Requests/FeatureServer?token=<yourtoken>

Then at the bottom of the page go to the updateDefinition endpoint:

https://services.arcgis.com/FQD0rKU8X5sAQfh8/ArcGIS/rest/admin/services/Get_It_Done_311_Requests/FeatureServer/updateDefinition?token=<yourtoken>

Now search for the xssPreventionInfo property and set the value for xssInputRule to sanitizeInvalid. Click the Update Service Definition button and check that it completes OK. Now you can write HTML and things like <somebody@hotmail.com> to your feature service!
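If you would rather script that property flip than drive it through the browser forms, the same admin call can be made with a simple POST. This is a sketch only: the service URL and token are placeholders, and you should confirm the xssPreventionInfo keys and the updateDefinition behaviour against your service's current JSON and the documentation for your release before pushing an update.

import json
import requests

# Placeholders - use your own admin service URL and a token for the service owner.
admin_url = ("https://services.arcgis.com/<orgid>/ArcGIS/rest/admin/services/"
             "Get_It_Done_311_Requests/FeatureServer/updateDefinition")
token = "<yourtoken>"

# Send the xssPreventionInfo block with the relaxed input rule.
definition = {
    "xssPreventionInfo": {
        "xssPreventionEnabled": True,
        "xssInputRule": "sanitizeInvalid",
    }
}
response = requests.post(
    admin_url,
    data={"f": "json", "token": token, "updateDefinition": json.dumps(definition)},
)
print(response.json())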
11-15-2021 08:00 AM

POST
Hi Joe, there is a link to training resources here - see Helpful Links on the right:

https://community.esri.com/t5/arcgis-data-interoperability/ct-p/arcgis-data-interoperability

The learning pathway entry point is here:

https://learn.arcgis.com/en/paths/integrate-data-and-apps-without-coding-using-arcgis-data-interoperability/

If you have any problems or just questions, email me directly or message in the community and I'll get it.
11-04-2021 07:36 AM

IDEA
Hear hear. I also often navigate in Explorer to my project home folder, so I added a tool to the quick access toolbar to do that for me: https://pm.maps.arcgis.com/home/item.html?id=1dfcb2ddf2c549f7a0cca8c6ffc8985d
11-03-2021 09:36 AM

BLOG
@JosephCarl2 Another option is to use Task Scheduler to execute fme.exe with the fmw file (and other parameters) as arguments. The command syntax is exposed at the top of every workspace log file when the workspace is run interactively. If parameters don't change, then don't publish them and they can be omitted from the command line.
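For illustration only (your fme.exe install path, workspace path and parameter names will differ - copy the exact line from the top of your own log file), the scheduled command generally takes a shape like this:

"<path to fme.exe>" C:\ETL\MyWorkspace.fmw --MY_PARAMETER "some value"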
11-03-2021 06:53 AM

BLOG
I would really like to throw the Data Interoperability extension at this problem; it's what it does ;-).
11-01-2021 08:05 AM

POST
Hi Brittany, I think you found a bug - thank you! I'll follow up with the development folks to take a look. I look forward to hearing from you!
10-21-2021 09:08 AM

POST
Hi Brittany, usually this type of error is an overflow of the data you are attempting to write, meaning the target field public_description is defined with a particular width but a value coming in is wider. "Description" fields are notorious for this - people write 'books' into some web form and you get more characters than the field can take. Truncate the data with a string function like @Left(@Value(public_description),100).
10-20-2021 11:13 AM

BLOG
A colleague recently asked me how to move a couple of billion records to GeoEvent's spatiotemporal big data store (STBDS) at a customer site, using ArcGIS Data Interoperability. These are archived vehicle positions for a utility company, stored in an Oracle geodatabase, and a couple of million new events are coming in daily. The archived data is in 70+ tables.

Because a feature service in the STBDS has the same REST API as an ordinary hosted feature service, the plan going in was to leverage the pattern described in my earlier blog, namely to ETL the archive data into portal shapefile items then use the target layer's Append REST endpoint to asynchronously load the data. An additional wrinkle was to parallelize the ETL as multiple concurrent jobs. This approach would reduce the risk of network outages while streaming small transactions (1000 features is the default batch size) to the portal. It turned out the customer environment wasn't at a release that supported the workflow and we did go with streaming, but it got me thinking about how to ETL big data to the new wave of cloud warehouses, which you can use natively in ArcGIS Pro 2.9.

My basic message is that you can do these lift-and-shift jobs by moving big datasets as files, in multiple concurrent jobs, thus maximizing throughput and minimizing transport risk. Let's see how.

Cloud warehouses like Snowflake, BigQuery and Redshift can be queried and read in ArcGIS Pro 2.9, but not written to using out-of-the-box tools. ArcGIS Data Interoperability can however write to these warehouses, including spatial data, but the default mode is streaming, which might not scale how you need. I'm going to show you a pattern you can use across all three warehouses:

ETL whole datasets, including spatial data, as Apache Parquet or another format like CSV
Geometry is encoded in a character-based standard format
Automate the ETL in multiple concurrent processes
Don't write any code beyond any SQL or macro commands required by the target environment

I'll also throw a bone to any coders lurking in my no-code blog space, see below 😉, i.e. some Python tips on creating Parquet files. (Note: Parquet files are a supported item type in ArcGIS Online from September 22nd 2021. Share some!)

Billions of features are in scope with this pattern, but I'm using a more modest dataset for demonstration purposes - only 2.3 million point features.

2.3 Million Point Features

My data is in 12 feature classes; you can have any number. The pattern I will show works with data split into separate parts that can be processed concurrently. If your data is monolithic then either split it yourself spatially (oriented fishnet, anyone?) or by adding a field signifying a batch identifier populated by row position - you can drop the field during processing (there is a small sketch of this idea at the end of the post).

I mentioned Snowflake, BigQuery and Redshift warehouses. In all cases you can stage Parquet files where the target environment can see them and then load from the Parquet files. For spatial data, the Parquet files will need geometry encoded in a format understood by the target environment (Snowflake and BigQuery support GeoJSON and WKT, Redshift supports WKT). I will only provide a worked example with GeoJSON going to Snowflake. My demo data is point geometry and the field I use to store the GeoJSON has a width of 100; if you are using polyline or polygon data you should investigate how wide your most point-rich features are when encoding the field.
For example, I selected a very point-rich polygon in a layer and as GeoJSON it is 2,071,156 characters:

with arcpy.da.SearchCursor('NZ Property Titles','shape@') as cursor:
    for row in cursor:
        print(len(str(row[0].__geo_interface__)))

2071156

Note that Data Interoperability can control the decimal precision used by GeoJSON; for geographic data a value of 7 is reasonable, and the same polygon then uses 1,274,064 characters. For example, the first coordinate goes from (172.90677540000001,-41.12752416699993) to (172.9067754,-41.1275242). Remember, every byte counts!

Note: For BigQuery, Data Interoperability has a GoogleBigQueryConnector hub transformer that can load CSV to tables. This may be simpler than sending Parquet and using the bq command environment to load data; I have not investigated the scenario.

Let's dig into my particular workflow. The secret sauce is to create two ETL tools: the first marshals the jobs and calls a WorkspaceRunner transformer that calls the second tool, which does the work. It is very simple. Here is LoadManager.fmw; it takes a list of arguments, in my case feature class names in a geodatabase:

LoadManager

WorkspaceRunner starts up to 7 FME processes that run a target tool until the job queue is consumed. Processing is likely to be CPU bound while worker processes extract, encode and upload the dataset files. I allowed each process to run two jobs, which ate my incoming datasets in 6 processes.

WorkspaceRunner

Here is LoadWorkerParquet.fmw.

LoadWorkerParquet

It also is a simple tool: it reads the geodatabase, writes a local Parquet file, then sends the Parquet file to Snowflake where the data is copied into a table. I'll let you inspect the SQLExecutor yourselves, but as an aid to understanding, after variable substitution here is what a statement looks like (to Snowflake):

create or replace file format Canterburyparquet_format
type = 'parquet';
create or replace temporary stage stageCanterbury
file_format = Canterburyparquet_format;
put file://C:\Work\Parquet\TitlesCanterbury.parquet @stageCanterbury;
copy into "INTEROPERABILITY"."PUBLIC"."Titles"
from (select
$1:id::number,
$1:title_no::varchar,
$1:status::varchar,
$1:type::varchar,
$1:land_district::varchar,
$1:issue_date::timestamp,
$1:guarantee_status::varchar,
$1:estate_description::varchar,
$1:number_owners::varchar,
$1:spatial_extents_shared::varchar,
to_geography($1:geom::varchar)
from @stageCanterbury);

Once the Parquet file gets to Snowflake the ingest is blazing fast. By the way, I learned to do this by reading the help; I'm no Snowflake DBA. What performance should you expect? At writing I'm trapped at home like the rest of us, but on my home WiFi I get 2.3 million features loaded to Snowflake in 6 minutes, so with a decent computer and wired network I think conservatively 25 million point features per hour. Of course, for a production environment and a really big job you could use multiple computers; certainly the target cloud warehouses will scale to take the throughput. In the blog download you'll notice I include a second worker tool, LoadWorker.fmw; this was for me to compare performance with the usual way to write to Snowflake, 100K features per transaction, and it was way slower.

Now back in core Pro 2.9 my data is loaded to Snowflake and I can throw queries at it and enjoy the scaled compute experience.

Snowflake in Catalog Pane

I mentioned a Python option for creating Parquet files; it is in the blog download but here it is too:

# Pro 2.9+ example creation of a parquet file from a feature class
# Geometry is encoded as GeoJSON in a field 'geom'
import arcpy
import pyarrow.parquet as pq

arcpy.env.overwriteOutput = True

# Source feature class
Canterbury = r"C:\Work\Parquet\Parquet.gdb\Canterbury"

# Create in-memory feature class in WGS84
with arcpy.EnvManager(outputCoordinateSystem='GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]',
                      geographicTransformations="NZGD_2000_To_WGS_1984_1"):
    arcpy.conversion.ExportFeatures(
        in_features=Canterbury,
        out_features=r"memory\Canterbury")

# Add the geom field (non-point geometry will require a wider field)
Canterbury = arcpy.management.AddField("memory\Canterbury","geom","TEXT",None,None,100,'',"NULLABLE","NON_REQUIRED",'').getOutput(0)

# Derive GeoJSON
with arcpy.da.UpdateCursor(Canterbury,['shape@','geom']) as cursor:
    for row in cursor:
        row[1] = str(row[0].__geo_interface__)
        cursor.updateRow(row)

# Drop geometry by creating a Table
esriTable = arcpy.conversion.TableToTable(Canterbury,"memory","CanterburyTable").getOutput(0)

# Make arrow table
arrowTable = arcpy.da.TableToArrowTable(esriTable)

# Write parquet
pq.write_table(arrowTable,r'C:\Work\Parquet\TitlesCanterbury.parquet',
               version='1.0',
               compression='SNAPPY')

Have fun moving that data around at scale!
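As promised above, here is a minimal sketch of the batch-identifier idea for monolithic data. It assumes you have already added an integer field named batch_id (a hypothetical name) with Add Field; each worker process can then filter on its own batch value, and the field can be dropped after loading.

import arcpy

# Hypothetical inputs - point this at your own feature class and batch size.
fc = r"C:\Work\Parquet\Parquet.gdb\BigMonolith"
batch_size = 500000

# Assign a batch identifier by row position so the data can be processed in concurrent chunks.
with arcpy.da.UpdateCursor(fc, ["batch_id"]) as cursor:
    for i, row in enumerate(cursor):
        row[0] = i // batch_size
        cursor.updateRow(row)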
09-17-2021 11:28 AM

BLOG
"You'll need to write a connector". That's what people used to say when a need arose to build an integration between ArcGIS and another system. Nowadays however, the information technology landscape has matured and how apps communicate with each other has centered on a handful of patterns that everyone (who wants to stay relevant) uses. ArcGIS Data Interoperability walks and talks in this space. A clear winner is called REST, and since I'm not a computer scientist I will not go into what you can read for yourself, all I care about is that pretty much everything on the web can send and receive data in a way I can easily use (mostly JSON, with some holdouts still using XML and a few stretching usability with protocol buffer payloads but I'll keep an eye on them for you, I think that will pass 😉.) with ArcGIS Data Interoperability. I need to add a graphic before the search engines get bored with me and I don't get indexed. Here is a workspace (StubOutAPIJSONReading.fmw in the post download) I used to initially explore an API (details to follow): Exploring an API My sample API is published by the good folks at Clarity Movement. Clarity was founded in 2014 to tackle the global air pollution crisis and now provides cost-effective, scalable, and reliable air quality monitoring to customers in more than 60 countries around the world. Clarity's solution enables governments and communities to collect higher-resolution air quality data by supplementing existing regulatory monitors with dense networks of continuously calibrated air quality sensors. I like Clarity's Measurements endpoint as a good example of the most common pattern you will encounter, namely handling JSON data returned from an HTTP call. Before you panic about things like transport protocols and JSON parsng, relax! Data Interoperability handles it all for you, which is just as well because otherwise you would have to read stuff like this: {
"_id": "6137a995cd42fd51fcda7083",
"device": "609935dcc9348052e0c5d917",
"deviceCode": "AY989QV6",
"time": "2021-09-07T17:00:00.000Z",
"location": {
"coordinates": [
-120.90439519586954,
36.01958888490562
],
"type": "Point"
},
"recId": "averaged:AY989QV6:hour:2021-09-07T18:00:00",
"characteristics": {
"relHumid": {
"value": 30.428397178649904,
"weight": 4,
"raw": 30.428397178649904
},
"temperature": {
"value": 30.882064819335939,
"weight": 4,
"raw": 30.882064819335939
},
"pm2_5ConcNum": {
"value": 24.072547912597658,
"weight": 4,
"raw": 24.072547912597658
},
"pm2_5ConcMass": {
"value": 13.646130166709775,
"weight": 4,
"raw": 21.126213550567628,
"calibratedValue": 13.646130166709775,
"epaNowCast": 15.097605519225573
},
"pm1ConcNum": {
"value": 22.670581817626954,
"weight": 4,
"raw": 22.670581817626954
},
"pm1ConcMass": {
"value": 12.186892986297608,
"weight": 4,
"raw": 12.186892986297608
},
"pm10ConcNum": {
"value": 24.33531427383423,
"weight": 4,
"raw": 24.33531427383423
},
"pm10ConcMass": {
"value": 28.42475652694702,
"weight": 4,
"raw": 28.42475652694702
},
"no2Conc": {
"value": -6.413380280137062,
"weight": 4,
"raw": -6.413380280137062
},
"pm2_5ConcMass_24HourRollingMean": {
"value": 15.003665288163619,
"weight": 88,
"raw": 27.317325342785229,
"calibratedValue": 15.003665288163619
},
"pm2_5ConcNum_24HourRollingMean": {
"value": 26.923120065168903,
"weight": 88,
"raw": 26.923120065168903
},
"pm10ConcMass_24HourRollingMean": {
"value": 37.67677915096283,
"weight": 88,
"raw": 37.67677915096283
},
"pm10ConcNum_24HourRollingMean": {
"value": 27.273549459197306,
"weight": 88,
"raw": 27.273549459197306
}
},
"average": "hour"
}

The job is to turn this data into a hosted feature service, and automate service maintenance. A boon of having a GIS background is you likely know some Python, and JSON payloads like the above look just like Python dictionaries and lists. You only need to learn a couple of tricks before you can tackle pretty much any REST API.

Trick #1: Using HTTPCaller

HTTPCaller

HTTPCaller lets you simply fill in a form to make a web call and receive a response (JSON in this case). First you must read your API doc and determine the required parameters, plus any optional ones you want, and whether to use a GET or POST method. GET is usually for shorter URLs; POST supports longer URLs and also uploads supplied in a body. Another aspect is authentication. Many APIs require an API key (like you see here) or a token. Typically keys do not expire and tokens do, plus tokens require a generation step (which may be done separately via HTTP, or via OAuth2 in a web connection you set up for your account). My example requires a key. My call requests a JSON array of 10 air quality measurements and returns it in an attribute named _response_body.

Trick #2: Unpacking JSON

I'm headed toward ingesting arbitrary numbers of air quality measurements, but first I must figure out how to extract data from the JSON. I requested 10 air quality measurements, so the returned JSON will be an array. Each measurement in the array will have the same schema used by the fields of my features - the above JSON is a measurement feature. I need to figure out extracting field values. The simplest way to do this is to write one array feature's JSON to a file and use the data-aware capability of JSONExtractor to let me build queries without coding any JSON queries. If I temporarily set the JSONExtractor to read from a JSON file then I get a handy-dandy picker to build my JSON queries. Then I can copy-paste the JSONExtractor into a production tool and set it to read from the incoming feature's JSON attribute, and it will extract and expose all the available field values. How easy is that! Here is my JSONExtractor while I'm reading from a file and building my queries:

JSONExtractor

Now it's all downhill to creating and maintaining my feature service. Clarity2GDB.fmw uses the JSON exploration done above to write out a feature class in my Pro project's home geodatabase - there is some renaming of attributes done, and of course they get their desired types set when writing.

Clarity2GDB

Then, after creating my feature service from the geodatabase feature class, I can recycle the work to maintain the feature service with RefreshClarityService.fmw, which only differs at the writer step. You'll notice only changed features are written to the target service; the workspace has a parameter that sets the history interval for my features, and features in the service older than that get aged out.

RefreshClarityService

This final tool can be scheduled or run on demand. That's it! I have conquered the complexities of web integration and have a feature service I can use to power my maps, apps and dashboards.

Feature Service

The blog download has the ETL tools I used, less a functional API key; please contact Clarity if you want to test and implement an integration.
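If you are one of the coders I keep teasing, the 'unpacking' the JSONExtractor does is easy to picture in Python. Here is a minimal sketch that flattens one measurement like the sample above into a flat record; the JSON string stands in for the parsed response body, and the characteristic names are just the ones visible in the sample.

import json

# Stand-in for one measurement from the response body (see the sample payload above).
measurement = json.loads("""
{
  "deviceCode": "AY989QV6",
  "time": "2021-09-07T17:00:00.000Z",
  "location": {"coordinates": [-120.90439519586954, 36.01958888490562], "type": "Point"},
  "characteristics": {
    "relHumid": {"value": 30.428397178649904},
    "pm2_5ConcMass": {"value": 13.646130166709775}
  }
}
""")

# Flatten the nested JSON into the kind of flat attribute schema a feature needs.
record = {
    "deviceCode": measurement["deviceCode"],
    "time": measurement["time"],
    "lon": measurement["location"]["coordinates"][0],
    "lat": measurement["location"]["coordinates"][1],
}
for name, reading in measurement["characteristics"].items():
    record[name] = reading["value"]

print(record)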
09-08-2021 10:07 AM