Scraping data from web map with embedded data

02-20-2021 01:52 AM
mapalapa
New Contributor II

Hi all,

I'm trying to export data from an online map (https://www.temporary-url.com/288D18). The JSON I've managed to find is here: https://bit.ly/3duvYBM. I've tried the JSON to Features tool in ArcGIS 10.8, but it doesn't accept this JSON format. Can someone help me get this data into a shapefile or similar?

Many thanks in advance....


Accepted Solutions
jcarlson
MVP Esteemed Contributor

That's good to know, especially the link you shared!

I wasn't talking about ethics so much as basic CYA. I've known a few projects that got tanked due to a Data Owner taking issue with how they obtained / used their data, so my default response to getting data in a way that isn't specifically and explicitly permitted is perhaps overly cautious.

Knowing now that this is for academic research, and that you're clearly well aware of your legal "footing", so to speak, I certainly feel more comfortable elaborating.

So, we've got the web map definition JSON. Looking at it, it's clear the data we want is in a series of separate CSV files added to the web map in AGOL, but not published as a standalone service. That's unfortunate, as we can't directly query a service URL for those. But we can still work from the map.
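
For orientation, here is a minimal sketch of how that embedded data is laid out, using only the standard library. The `sample_webmap` dictionary is a made-up, heavily trimmed stand-in for the real web map JSON (the layer titles, field names, and coordinates are all invented), and `embedded_feature_sets` is a hypothetical helper name. It assumes each embedded layer keeps its features under `featureCollection` > `layers` > `featureSet`, which may not hold for every web map.

```python
import json

# Made-up, trimmed stand-in for a web map definition (all values invented).
sample_webmap = json.loads("""
{
  "operationalLayers": [
    {
      "title": "plots_a",
      "featureCollection": {
        "layers": [
          {
            "featureSet": {
              "geometryType": "esriGeometryPoint",
              "features": [
                {"attributes": {"id": 1}, "geometry": {"x": 10.0, "y": 20.0}},
                {"attributes": {"id": 2}, "geometry": {"x": 11.0, "y": 21.0}}
              ]
            }
          }
        ]
      }
    },
    {"title": "basemap_reference", "url": "https://example.com/service"}
  ]
}
""")


def embedded_feature_sets(webmap):
    """Yield (title, featureSet dict) for layers that embed their own data."""
    for layer in webmap.get("operationalLayers", []):
        collection = layer.get("featureCollection")
        if not collection:
            continue  # layer points at a service instead of embedding features
        for sublayer in collection.get("layers", []):
            yield layer.get("title", ""), sublayer["featureSet"]


for title, feature_set in embedded_feature_sets(sample_webmap):
    print(title, len(feature_set["features"]))
```

Layers that only reference a service URL are skipped, so what comes out is the set of layers whose features travel inside the map definition itself.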

Using the arcgis module in Python, we can pull a list of layers for the map, then interact with each to convert the contents to friendlier formats. The JSON says that these layers support using query, but I can't seem to get it to work. Anyhow, it's less elegant, but this still works:

 

 

from arcgis import GIS
from arcgis.features import FeatureSet
from arcgis.mapping import WebMap

# Connect anonymously to ArcGIS Online
gis = GIS()

# Load the web map by its item ID
the_map = WebMap(gis.content.get('df5f3c6ca21a44c5ba7c39d9355ff9dd'))

# Loop over every layer except the last and rebuild its embedded features as a FeatureSet
for l in the_map.layers[:-1]:
    fs = FeatureSet.from_dict(l['featureCollection']['layers'][0]['featureSet'])

This code is still one line short of giving you any kind of output; what that line looks like depends on where you're running your code.

Running in ArcGIS

With access to arcpy, you can call save on a FeatureSet. Simply add "fs.save('your-file-path', l['title'])" inside the for-loop.

Outside of ArcGIS - to JSON / GeoJSON

You can read the property "to_geojson" or "to_json" on the FeatureSet. Write the output of that to a file, and the file can then be used with the JSON To Features tool. Seems redundant, though.
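
A minimal sketch of the GeoJSON route, written so it runs without the arcgis module installed. `write_geojson` is a hypothetical helper, not part of any library; it only assumes the object passed in exposes a `to_geojson` property. Whether that property hands back a string or a dict is treated as an open assumption here, so the helper accepts both. The `_Stub` class is an invented stand-in for a real FeatureSet.

```python
import json
import tempfile
from pathlib import Path


def write_geojson(feature_set, out_dir, title):
    """Write feature_set.to_geojson to <out_dir>/<title>.geojson and return the path."""
    data = feature_set.to_geojson
    # Accept either a ready-made string or a dict (assumption: could be either).
    text = data if isinstance(data, str) else json.dumps(data)
    out_path = Path(out_dir) / f"{title}.geojson"
    out_path.write_text(text, encoding="utf-8")
    return out_path


class _Stub:
    """Stand-in for a FeatureSet; only mimics the to_geojson property."""

    @property
    def to_geojson(self):
        return json.dumps({"type": "FeatureCollection", "features": []})


out_dir = tempfile.mkdtemp()
path = write_geojson(_Stub(), out_dir, "example_layer")
print(path.name)
```

Inside the for-loop, the equivalent call would be something like write_geojson(fs, 'your-folder', l['title']), giving one file per layer.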

Using a Spatially Enabled DataFrame

I like using DataFrames when I can, but it can be irritating if the data's not just so. "fs.sdf" gets you the FeatureSet as a DataFrame, and from there, you can use the spatial.to_featureclass method. You might need to explicitly tell it what the data types are, though, and that could be tricky inside of a loop if the CSVs don't have identical schemas.
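
One way to find out up front whether the schemas match is to compare the field definitions before converting anything. This is only a sketch: `group_by_schema` and `_layer` are hypothetical helpers, the sample layers are invented, and it assumes each embedded layer lists its fields under `layerDefinition` > `fields` as name/type pairs next to the featureSet.

```python
from collections import defaultdict


def group_by_schema(layers):
    """Group layer titles by their (field name, field type) signature."""
    groups = defaultdict(list)
    for layer in layers:
        definition = layer["featureCollection"]["layers"][0]["layerDefinition"]
        signature = tuple(sorted((f["name"], f["type"]) for f in definition["fields"]))
        groups[signature].append(layer["title"])
    return dict(groups)


def _layer(title, fields):
    """Build an invented layer dict shaped like an embedded web map layer."""
    return {
        "title": title,
        "featureCollection": {"layers": [{"layerDefinition": {"fields": fields}}]},
    }


# Invented sample data: two layers share a schema, one does not.
sample_layers = [
    _layer("a", [{"name": "id", "type": "esriFieldTypeInteger"}]),
    _layer("b", [{"name": "id", "type": "esriFieldTypeInteger"}]),
    _layer("c", [{"name": "id", "type": "esriFieldTypeString"}]),
]

groups = group_by_schema(sample_layers)
for signature, titles in groups.items():
    print(titles, signature)
```

More than one group means the layers cannot all be pushed through the loop with a single set of data types.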

- Josh Carlson
Kendall County GIS


6 Replies
jcarlson
MVP Esteemed Contributor

First of all, you really ought to be careful with this sort of thing. While the data may seem "public" because we can see it on a web map, scraping the data itself is often a violation of a website or organization's Terms of Service. If there isn't a big button that says "download this data", that's often an intentional choice by the data owner.

There is also a chance that the data owner is happy to share their data. You should put in some effort looking around their website and contact pages to try and see who maintains the data, what, if any, terms and conditions are in place, etc. For the map in question, the data owners have a dedicated Data Request Form and a Code of Conduct you must agree to. I'd go that route, if you really want to access this dataset.

And finally, to more specifically address your post's technical question: the JSON you've shared looks like the service definition for the map, not the dataset itself. If you were handy with Python or JavaScript, it would be a simple enough matter to extract single portions of this and reshape it into the correct format for working with in a desktop GIS. However, due to the existence of more official means of getting this data, and terms attached to the use of it, I strongly discourage your doing so.

- Josh Carlson
Kendall County GIS
mapalapa
New Contributor II

Thank you for your thoughts, Josh.  While I appreciate your concern, your post is largely unhelpful and does not address my question.  To address your concerns, web scraping of public pages is legal for non-commercial uses (https://medium.com/@tjwaterman99/web-scraping-is-now-legal-6bf0e5730a78), which is true of this use (academic research).  This page requires no login, and is freely available online.  The contact page is generally intended for detailed plot data that is available to authorized users only.  I am accessing only a public-facing data source.  I don't think you would say that it would be unethical for me to take out a ruler and measure the location of every point on the map manually.  There is no real difference between my efforts here and such a manual method.  I would like to ask that further discussion of ethics be stopped at this point - if you're not comfortable with my methods, then please feel free to bow out of this thread.

Regarding your brief thought on my original question: yes, I could rip through the data definition and extract all the relevant data manually, and I may end up doing this.  But that's a bit of messy work, and I'm hoping to find a cleaner way of getting at the data that arcgis can ingest directly.  Thanks for any further thoughts you may have.

AndrewEastop
New Contributor III

Hello Josh

I am doing some university research on planning within council areas and was just reading your very useful answer. It looks like exactly what I need, so thank you for that!

May I please ask whether this is the correct way to use the code?

This is the URL for the online map I want to export: https://bdbc.maps.arcgis.com/apps/webappviewer/index.html?id=dd203d139d254dacb4faab6326d64c66

So I replace the green text with what I have highlighted in bold, and then add in fs.save('C:\Users\aeast\Documents\Uni\council data', l['council data'])?

Any help would be very much appreciated!

from arcgis import GIS
from arcgis.features import FeatureSet
from arcgis.mapping import WebMap

gis = GIS()

the_map = WebMap(gis.content.get('df5f3c6ca21a44c5ba7c39d9355ff9dd'))

for l in the_map.layers[:-1]:
    fs = FeatureSet.from_dict(l['featureCollection']['layers'][0]['featureSet'])

 

mapalapa
New Contributor II

Thanks Josh,

This is super helpful, and is really a full answer.  Thanks so much!

jcarlson
MVP Esteemed Contributor

I love a good puzzle, and this certainly was one. Happy to help!

- Josh Carlson
Kendall County GIS