Every now and then a compelling workflow is enabled by new ideas, and data distribution by cloud native formats is my example today. Take this (very plain) map for example:
Places of interest in London, EnglandPlaces of interest in London, England
What you are looking at is a hosted group layer in ArcGIS Online with 306229 place points around London, England, with related tables; 306229 address details, 18566 business brand names, 237806 website URLs, 414651 source attributions, 206639 social media link URLs, and 279495 phone numbers. If we explore places in Covent Garden we can see dozens of business brands:
Covent GardenCovent Garden
This being England, you might be interested in tea rooms about the city. Using place category and related tables we can see not just their locations but their address, website, social media links and phone numbers:
Tea RoomsTea Rooms
Tea Rooms Related DataTea Rooms Related Data
This place data is one of the themes from Overture Maps Foundation and is made available under this license. If you surf the Overture website, you'll see it is a collaboration of Amazon, Meta, Microsoft and TomTom as steering members, Esri as a general member, and many other contributor members and is envisaged as a resource for "developers who build map services or use geospatial data". I'm democratizing it a bit more here, giving you a pattern for consuming the data as hosted feature layers in your ArcGIS Online or ArcGIS Enterprise portals.
Let's dig into the details of how to migrate the data to feature services.
The data is made available at Amazon S3 and Azure object stores as Parquet files, with anonymous access. I'll let you read up on the format details elsewhere but Parquet is definitely one star of the cloud native show because it is optimized for querying by attribute column, and in the case of GeoParquet, this includes a spatial column (technically multiple spatial columns if you want to push your luck). As GeoParquet is an emerging format it still has some things that are TODO, like a spatial index (which would let you query by spatial operators), but Overture very thoughtfully include a bounding box property which is simple to query by X and Y.
The technology that is the second star of the cloud native show is DuckDB. DuckDB enables SQL query of local or remote files like CSV, JSON and of course Parquet (and many more) as if they are local databases. Remote file query is especially powerful if the host portal supports the Amazon S3 REST API and a client that talks this can use HTTP to send SQL queries, which DuckDB can. Especially powerful is DuckDB's ability to unpack complex data types (arrays and structs) into a rich data model like I'm doing here (base features and 1:M related tables), and not just flat tables. Only the query result is returned, not the remote file.
The third star of the cloud native show is ArcGIS Online hosted notebooks, which can be used to orchestrate DuckDB transactions and integrate the results into new or refreshed hosted feature layers in Online or Enterprise. The superpower of this combination of Parquet, DuckDB and Python is that global scale data can be queried for a subset of interest in a desired schema using industry standard SQL, plus automated . This forever deprecates the legacy workflow of downloading superset data files to retrieve the part you want. At writing, the places data resolves to 6 files totaling 5.54GB, not something you want to haul over the wire before you start processing! If you think about it, any file format you have to zip to move around and unzip to query (shapefile, file geodatabase) generates some friction, Parquet avoids this.
The notebook named DuckDBIntegration is what I used to import the places data and is in the blog download. Notebooks aren't easy to share graphically but I'll make a few points.
Firstly, ArcGIS notebooks don't include the DuckDB Python API, so it needs to be installed from PyPi, here is my cell that does the import and also loads the spatial and httpfs extensions needed for DuckDB in this workflow.
If you browse the places schema note the YAML tab. Don't be surprised in 2025 if you see YAML powering AI to let you have conversations with your data 😉
Each extract from the S3 hive follows the same pattern. Here is the cell that extracts address data - note the UNNEST operator that unpacks a struct.
print('Extracting Addresses starting at {}'.format(getNow()))
sql = """with address as (select id, unnest(addresses, recursive := true) from places_view where addresses is not null
and {}) select id, freeform, locality, postcode, region, country from address;""".format(whereExp)
relation = duckdb.sql(sql)
addresses = arcpy.management.CreateTable(out_path=r"/arcgis/Places.gdb",out_name="Addresses").getOutput(0)
arcpy.management.AddField(in_table=addresses,field_name="id",field_type="TEXT",field_length=40,field_is_nullable="NULLABLE",field_is_required="REQUIRED")
arcpy.management.AddField(in_table=addresses,field_name="freeform",field_type="TEXT",field_length=1000,field_is_nullable="NULLABLE",field_is_required="NON_REQUIRED")
arcpy.management.AddField(in_table=addresses,field_name="locality",field_type="TEXT",field_length=100,field_is_nullable="NULLABLE",field_is_required="NON_REQUIRED")
arcpy.management.AddField(in_table=addresses,field_name="postcode",field_type="TEXT",field_length=1000,field_is_nullable="NULLABLE",field_is_required="NON_REQUIRED")
arcpy.management.AddField(in_table=addresses,field_name="region",field_type="TEXT",field_length=100,field_is_nullable="NULLABLE",field_is_required="NON_REQUIRED")
arcpy.management.AddField(in_table=addresses,field_name="country",field_type="TEXT",field_length=2,field_is_nullable="NULLABLE",field_is_required="NON_REQUIRED")
print('Addresses table created at {}'.format(getNow()))
with arcpy.da.InsertCursor(addresses,["id","freeform","locality","postcode","region","country"]) as iCursor:
row = relation.fetchone()
i = 1
if row:
while row:
if i % 10000 == 0:
print('Inserted {} rows at {}'.format(str(i),getNow()))
iCursor.insertRow(row)
i+=1
row = relation.fetchone()
del iCursor
print('Addresses table populated at {}'.format(getNow()))
You'll see I set specific field widths for text fields. Common web formats (CSV, JSON, Parquet) don't let you tune data types like this but I like to do so. However if you're sharp eyed you'll see the postcode field has a crazy width of 1000. The pattern I settled on to determine field widths was to have a utility cell (in the download) in my notebook during construction which I used to explore the data. I found maximum field sizes that way. It turns out the crazy big postcode I found looks like a data error.
Likely Data ErrorLikely Data Error
To go into production with this approach first figure out a query on extent or address fields (region, country) that works for you, then plug it into the notebook after sharing it to ArcGIS Online. You'll need to supply your own credentials of course, and change target gis if you're using Enterprise for the output feature service. To initialize the feature service, run the notebook manually down to the cell that zips up the working file geodatabase, download the zip file from the notebook and share it as an item and feature service, then plug in the item id. Then at each Overture release, change the theme URL to match the release and refresh your service.
To give you a feel for performance, it takes a few minutes for the S3 hive to be queried for each layer but the results write at tens of thousands of features per second. It takes about 50 minutes for my test extent and data model, including overwrite of a target feature service.
Naturally, if the data schema changes or to support other Overture themes, you'll need to author notebook changes or new notebooks. Do share!
I'm sure you'll agree that this is a powerful new paradigm that will change the industry. I'm just following industry trends. It will be fun to see if Parquet is what comes after the shapefile for data sharing.
The blog download has the notebook but not a file geodatabase (it's too big for the blog technology) but when you generate your own services don't forget the data license here. Have fun!
DuckDBIntegration.zip
7 Comments
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
';
}
}
}
catch(e){
}
}
}
if (newSub.getAttribute("slang").toLowerCase() != code_l.toLowerCase()) {
if (trLabelsHtml != "") {
var labelSname = "";
if(labelEle[i].querySelector("ul li:nth-child(1)").getAttribute("aria-hidden")){
labelSname = labelEle[i].querySelector("ul li:nth-child(1)").outerHTML;
}
labelEle[i].innerHTML = "";
labelEle[i].innerHTML = labelSname + trLabelsHtml;
}
}
}
}
}
catch(e){
}
}
}
/* V 2.0:3 = Store not translated reply id */
if(lingoRSXML.snapshotLength == 0){
if($scope.falseReplyID == "") {
$scope.falseReplyID = value;
}
}
/* Get translated Body of Replies/Comments */
var lingoRBXML = doc.evaluate(lingoRBExp, doc, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
for(var i=0;i 0) {
var attachDiv = rootElement.querySelector('div.lia-quilt-row-main').querySelector('div.custom-attachments');
if (attachDiv) {
attachDiv = attachDiv.outerHTML;
}
else if(rootElement.querySelector('div.lia-quilt-row-main').querySelectorAll('#attachments').length > 0){
if ("BlogArticlePage" == "BlogArticlePage") {
attachDiv = rootElement.querySelector('div.lia-quilt-row-main .lia-message-body-content').querySelector('#attachments');
if (attachDiv) {
attachDiv = attachDiv.outerHTML;
}
else{
attachDiv = "";
}
}else{
attachDiv = rootElement.querySelector('div.lia-quilt-row-main').querySelector('#attachments').outerHTML;
}
}
else {
attachDiv = "";
}
/* Feedback Div */
var feedbackDiv = "";
var feedbackDivs = rootElement.querySelector('div.lia-quilt-row-main').querySelectorAll('div.lia-panel-feedback-banner-safe');
if (feedbackDivs.length > 0) {
for (var k = 0; k < feedbackDivs.length; k++) {
feedbackDiv = feedbackDiv + feedbackDivs[k].outerHTML;
}
}
}
else {
var attachDiv = rootElement.querySelector('div.lia-message-body-content').querySelector('div.Attachments.preview-attachments');
if (attachDiv) {
attachDiv = attachDiv.outerHTML;
} else {
attachDiv = "";
}
/* Everyone tags links */
if (document.querySelectorAll("div.TagList").length > 0){
var everyoneTagslink = document.querySelector('div.lia-quilt-row-main').querySelector(".MessageTagsTaplet .TagList");
if ((everyoneTagslink != null)||(everyoneTagslink != undefined)){
everyoneTagslink = everyoneTagslink.outerHTML;
}
else{
everyoneTagslink = "";
}
}
/* Feedback Div */
var feedbackDiv = "";
var feedbackDivs = rootElement.querySelector('div.lia-message-body-content').querySelectorAll('div.lia-panel-feedback-banner-safe');
if (feedbackDivs.length > 0) {
for (var m = 0; m < feedbackDivs.length; m++) {
feedbackDiv = feedbackDiv + feedbackDivs[m].outerHTML;
}
}
}
}
} catch (e) {
}
if (body_L == "") {
/* V 2.0:7 Replacing translated video data with source video data */
var newBodyVideoData = newBody.querySelectorAll('div[class*="video-embed"]');
angular.forEach($scope.videoData[value], function (sourceVideoElement, index) {
if (index <= (newBodyVideoData.length - 1)) {
newBodyVideoData[index].outerHTML = sourceVideoElement.outerHTML
}
});
/* V 2.0:7 = Replacing translated image data with source data */
var newBodyImageData = newBody.querySelectorAll('[class*="lia-image"]');
angular.forEach($scope.imageData[value], function (sourceImgElement, index) {
if (index <= (newBodyImageData.length - 1)) {
newBodyImageData[index].outerHTML = sourceImgElement.outerHTML;
}
});
/* V 2.0:7 = Replacing translated pre tag data with source data */
var newBodyPreTagData = newBody.querySelectorAll('pre');
angular.forEach($scope.preTagData[value], function (sourcePreTagElement, index) {
if (index <= (newBodyPreTagData.length - 1)) {
newBodyPreTagData[index].outerHTML = sourcePreTagElement.outerHTML;
}
});
}
var copyBodySubject = false;
if (body_L == "") {
copyBodySubject = true;
body_L = newBody.innerHTML;
}
/* This code is written as part of video fix by iTalent */
/* try{
var iframeHTMLText = body_L;
var searchIframeText = "<IFRAME";
var foundiFrameTag;
if (iframeHTMLText.indexOf(searchIframeText) > -1) {
foundiFrameTag = decodeHTMLEntities(iframeHTMLText);
foundiFrameTag = foundiFrameTag.split('src="')[1];
body_L = foundiFrameTag;
}
}
catch(e){
} */
/* This code is placed to remove the extra meta tag adding in the UI*/
try{
body_L = body_L.replace('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />','');
}
catch(e){
}
/** We should not replace the source content if user profile language and selected target language matches with source language **/
if(showTrContent) {
var compiled = false;
rootElement.querySelectorAll('div.lia-message-body-content')[0].innerHTML = null
if("BlogArticlePage"=="IdeaPage"){
// var customAttachDiv = '';
rootElement.querySelectorAll('div.lia-message-body-content')[0].innerHTML = body_L + feedbackDiv ;
$compile(rootElement.querySelectorAll('div.lia-message-body-content')[0])($scope);
compiled = true;
/* Attach atttach div */
// document.querySelector("div.translation-attachments-"+value).innerHTML = attachDiv;
rootElement.querySelectorAll('div.lia-message-body-content')[0].insertAdjacentHTML('afterend',attachDiv);
if(rootElement.querySelectorAll('div.lia-quilt-idea-message .lia-message-body .lia-attachments-message').length > 1){
rootElement.querySelectorAll('div.lia-quilt-idea-message .lia-message-body .lia-attachments-message')[1].remove();
}
} else {
if("BlogArticlePage"=="TkbArticlePage"){
rootElement.querySelectorAll('div.lia-message-body-content')[0].innerHTML = body_L + feedbackDiv ;
}else{
rootElement.querySelectorAll('div.lia-message-body-content')[0].innerHTML = body_L + feedbackDiv + attachDiv;
compiled = true;
}
}
/* Destroy and recreate OOyala player videos to restore the videos in target languages which is written by iTalent as part of iTrack LILICON-79 */ /* Destroy and recreate OOyala player videos */
try{
// $scope.videoData[value][0].querySelector("div").getAttribute("id");
for(var vidIndex=0; vidIndex<$scope.videoData[value].length; vidIndex++){
if( $scope.videoData[value][vidIndex].querySelector("div") != null){
var containerId = LITHIUM.OOYALA.players[$scope.videoData[value][vidIndex].querySelector("div").getAttribute("id")].containerId;
videoId = LITHIUM.OOYALA.players[$scope.videoData[value][vidIndex].querySelector("div").getAttribute("id")].videoId;
/** Get the Video object */
vid = OO.Player.create(containerId,videoId);
/** Destroy the video **/
vid.destroy();
/** recreate in the same position */
var vid = OO.Player.create(containerId,videoId);
}
}
}
catch(e){
}
try{
for(var vidIndex=0; vidIndex<($scope.videoData[value].length); vidIndex++){
if($scope.videoData[value][vidIndex].querySelector('video-js') != null){
var data_id = $scope.videoData[value][vidIndex].querySelector('video-js').getAttribute('data-video-id');
var data_account = $scope.videoData[value][vidIndex].querySelector('video-js').getAttribute('data-account');
var data_palyer = $scope.videoData[value][vidIndex].querySelector('video-js').getAttribute('data-player');
var div = document.createElement('div');
div.id = "brightcove";
div.class = "brightcove-player";
div.innerHTML =
'(view in my videos)'
var data = div.getElementsByClassName("video-js");
var script = document.createElement('script');
script.src = "https://players.brightcove.net/" + data_account + "/" + data_palyer + "_default/index.min.js";
for(var i=0;i< data.length;i++){
videodata.push(data[i]);
}
}
}
for(var i=0;i< videodata.length;i++){
document.getElementsByClassName('lia-vid-container')[i].innerHTML = videodata[i].outerHTML;
document.body.appendChild(script);
}
}
catch(e){
}
if(!compiled){
/* Re compile html */
$compile(rootElement.querySelectorAll('div.lia-message-body-content')[0])($scope);
}
}
if (code_l.toLowerCase() != newBody.getAttribute("slang").toLowerCase()) {
/* Adding Translation flag */
var tr_obj = $filter('filter')($scope.sourceLangList, function (obj_l) {
return obj_l.code.toLowerCase() === newBody.getAttribute("slang").toLowerCase()
});
if (tr_obj.length > 0) {
tr_text = "Esri may utilize third parties to translate your data and/or imagery to facilitate communication across different languages.".replace(/lilicon-trans-text/g, tr_obj[0].title);
try {
if ($scope.wootMessages[$rootScope.profLang] != undefined) {
tr_text = $scope.wootMessages[$rootScope.profLang].replace(/lilicon-trans-text/g, tr_obj[0].title);
}
} catch (e) {
}
} else {
//tr_text = "This message was translated for your convenience!";
tr_text = "Esri may utilize third parties to translate your data and/or imagery to facilitate communication across different languages.";
}
try {
if (!document.getElementById("tr-msz-" + value)) {
var tr_para = document.createElement("P");
tr_para.setAttribute("id", "tr-msz-" + value);
tr_para.setAttribute("class", "tr-msz");
tr_para.style.textAlign = 'justify';
var tr_fTag = document.createElement("IMG");
tr_fTag.setAttribute("class", "tFlag");
tr_fTag.setAttribute("src", "/html/assets/langTrFlag.PNG");
tr_fTag.style.marginRight = "5px";
tr_fTag.style.height = "14px";
tr_para.appendChild(tr_fTag);
var tr_textNode = document.createTextNode(tr_text);
tr_para.appendChild(tr_textNode);
/* Woot message only for multi source */
if(rootElement.querySelector(".lia-quilt-forum-message")){
rootElement.querySelector(".lia-quilt-forum-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-message-view-blog-topic-message")) {
rootElement.querySelector(".lia-message-view-blog-topic-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-blog-reply-message")){
rootElement.querySelector(".lia-quilt-blog-reply-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-tkb-message")){
rootElement.querySelector(".lia-quilt-tkb-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-tkb-reply-message")){
rootElement.querySelector(".lia-quilt-tkb-reply-message").insertBefore(tr_para,rootElement.querySelector(".lia-quilt-row.lia-quilt-row-footer"));
} else if(rootElement.querySelector(".lia-quilt-idea-message")){
rootElement.querySelector(".lia-quilt-idea-message").appendChild(tr_para);
} else if(rootElement.querySelector('.lia-quilt-occasion-message')){
rootElement.querySelector('.lia-quilt-occasion-message').appendChild(tr_para);
}
else {
if (rootElement.querySelectorAll('div.lia-quilt-row-footer').length > 0) {
rootElement.querySelectorAll('div.lia-quilt-row-footer')[0].appendChild(tr_para);
} else {
rootElement.querySelectorAll('div.lia-quilt-column-message-footer')[0].appendChild(tr_para);
}
}
}
} catch (e) {
}
}
} else {
/* Do not display button for same language */
// syncList.remove(value);
var index = $scope.syncList.indexOf(value);
if (index > -1) {
$scope.syncList.splice(index, 1);
}
}
}
}
});
});
/* V 1.1:2 = Reply Sync button for multi source translation */
} catch(e){
console.log(e);
}
};
if((rContent != undefined) && (rContent != "")) {
drawCanvas(decodeURIComponent(rContent));
/** Update variable with selected language code **/
$scope.previousSelCode = code_l;
}
};
/**
* @function manageTranslation
* @description Managess the translation of given language for the thread
* @param {string} langCode - Language Code
* @param {string} tid - Thread ID
*/
$scope.manageTranslation = function (langCode, tid) {
//debugger;
$scope.showTrText = false;
/* V 2.0:5 = actualStatus variable introduced to indicate detailed connector status on UI. This variable holds the actual translation percentage */
$scope.transPercent = "";
$scope.actualStatus = "";
if (tid != "") {
var bulkTranslation = lithiumPlugin.bulkTranslation(langCode, tid);
bulkTranslation.then(function (trContent) {
if(trContent.body != "") {
$scope.showPreview(trContent.body, $scope.mszList, langCode);
if(langCode != "en-US") {
$scope.showTrText = true;
}
}
if((trContent.status != "NA") && trContent.status != null) {
// $scope.transPercent = String(trContent.status);
$scope.actualStatus = String(trContent.status);
} else {
// $rootScope.errorMsg = "Translation is in progress. Please check again a few minutes."
$rootScope.errorMsg = "Translation is in progress. Please retry in a few minutes."
}
$scope.workbench = trContent.wb;
/* V 2.0:4 = Trigger uncalled or delayed callbacks (documnet uploaded/translation completed from lithium).*/
if(trContent.callback == 'true') {
var trCompletCallback = lithiumPlugin.trCompletCallback(langCode, trContent.docID);
trCompletCallback.then(function (callback){
// $rootScope.errorMsg = "Downloading Translated content in " + langCode + " now. Please check again in a few minutes."
$rootScope.errorMsg = "Uploading content to translate. Please check again in a few minutes."
});
} else if (trContent.callback == 'upload') {
var trCompletUpload = lithiumPlugin.trCompletUpload(langCode, trContent.docID);
trCompletUpload.then(function (callback) {
//$rootScope.errorMsg = "Uploading content to translate. Please check again in a few minutes."
$rootScope.errorMsg = "Uploading content to translate. Please check again in a few minutes."
});
} else if ("many" == "one") {
$scope.updateOOS();
} else if("SmartConx" == "SmartConx"){
if ("many" == "many"){
$scope.updateOOS();
}
}else if ((trContent.status != null) && trContent.status.includes("100")) {
/* If everything fine then only check Out of Sync status */
$scope.updateOOS();
} else {
/* If translation perccent is less than 100 then show the percentage on UI */
$scope.transPercent = $scope.actualStatus;
}
});
}
}
/**
* @function selectThisLang
* @description Called on select dropdown.
* @param {string} lang - Language code
*
*/
$scope.selectThisLang = function (lang, anonymousFlag) {
/* 1.4:3 Update Analytics on language selection */
try {
lingoThreadLangSelected(lang, '1372477');
} catch (e) {
}
/** Display Translated content **/
var getTranslation = lithiumPlugin.getTranslation(lang, "1372477");
getTranslation.then(function (trContent) {
if (trContent.body != "") {
$scope.showPreview(trContent.body, $scope.mszList, lang);
} else {
//$rootScope.errorMsg = "Translation is in progress. Please check again in a few minutes."
$rootScope.errorMsg = "Translation is in progress. Please retry in a few minutes."
}
});
};
var decodeEntities = (function() {
// this prevents any overhead from creating the object each time
var element = document.createElement('div');
function decodeHTMLEntities (str) {
if(str && typeof str === 'string') {
// strip script/html tags
str = str.replace(/