
Gather data from a regularly formatted webpage (Tax Parcels)

08-01-2022 04:56 PM
AlfredBaldenweck
MVP Regular Contributor

Hi all,

We frequently use the tax parcel layer published by the State of Hawaii to help us plan projects.

In the last few years, the State stopped publishing the layer with ownership information, opting instead to include, as an attribute, a link to a webpage featuring ownership, taxes, etc.

Here is an example, for Hawai'i Volcanoes National Park: qPublic.net - Hawai'i County, HI - Report: 980010010000 (schneidercorp.com)

I'd like to populate a copy of the layer (filtered to the parcels relevant to us) with attributes from the webpage, especially the ownership information.

Does anyone have any tips on how this might be done? It doesn't need to update dynamically.

Thanks!

I_AM_ERROR
Regular Contributor

Since the URL contains the parcel's TMK number, you could use that with the requests library: retrieve the page, parse the response, then repeat for each record of interest.
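A minimal sketch of that loop, assuming Python 3. It uses the standard library's urllib.request in place of requests so it runs without extra installs; with requests, `requests.get(url, timeout=30).text` would do the job of `fetch_html`. `REPORT_URL_TEMPLATE` is hypothetical: copy a real report link from the layer's link attribute and put `{tmk}` where the TMK goes (or skip the template and pass each stored link directly). The 5-second delay is an arbitrary placeholder, not a rate the site publishes.

```python
import time
import urllib.parse
import urllib.request

# HYPOTHETICAL template: replace with a real report link from the layer,
# with the TMK portion swapped for {tmk}.
REPORT_URL_TEMPLATE = "https://example.invalid/report?KeyValue={tmk}"


def build_report_url(tmk: str, template: str = REPORT_URL_TEMPLATE) -> str:
    """Insert a parcel's TMK into the report URL template."""
    return template.format(tmk=urllib.parse.quote(tmk))


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download one report page and return its HTML as text."""
    req = urllib.request.Request(url, headers={"User-Agent": "parcel-lookup-script"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def collect_reports(tmks, fetch=fetch_html, parse=lambda html: html, delay_s=5.0):
    """Fetch and parse the report page for each TMK, pausing between requests."""
    results = {}
    for i, tmk in enumerate(tmks):
        if i > 0:
            time.sleep(delay_s)  # space out requests to go easy on the server
        html = fetch(build_report_url(tmk))
        results[tmk] = parse(html)  # e.g. pull out the owner name
    return results
```

For example, `collect_reports(["980010010000"], parse=my_parser)` returns a dict mapping each TMK to whatever your own (hypothetical) `my_parser` function extracts, which you can then write into the filtered copy of the layer keyed on TMK.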

by Anonymous User
Not applicable

Looking at the site's robots.txt file, it disallows all user agents (web crawlers/automated scraping) for /Application.aspx/, so be respectful and careful about how you go about your data extraction.
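Before scripting anything, you can check a URL against robots.txt with the standard library's `urllib.robotparser`. A minimal sketch, assuming you have already downloaded the robots.txt text; the rules and URLs in the usage line are made-up placeholders, not the site's actual file.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse already-downloaded rules
    return parser.can_fetch(user_agent, url)


# Placeholder rules for illustration only.
EXAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
```

With those placeholder rules, `allowed_by_robots(EXAMPLE_ROBOTS, "https://example.invalid/private/page")` is False. To check the live file instead, `RobotFileParser(url)` followed by `.read()` fetches and parses it in one step.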

You can use the Python package BeautifulSoup to extract elements and text from web pages; there are plenty of tutorials online showing how it can be done.
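As a rough idea of what the parsing step looks like, here is a sketch that pulls a labelled value out of an HTML table. It swaps in the standard library's `html.parser` so it runs without installing anything; with BeautifulSoup, `soup.find_all("tr")` gets you the rows in far less code. It assumes the report lays out fields as label/value cells in table rows; the label "Owner Name" is hypothetical, so inspect the real page's HTML to find the actual labels. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <tr> as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def _close_cell(self):
        # Store the current cell with whitespace collapsed, if one is open.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate an unclosed previous cell
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def find_value(html: str, label: str):
    """Return the cell after the first cell whose text equals label, else None."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        for i, cell in enumerate(row[:-1]):
            if cell == label:
                return row[i + 1]
    return None
```

For example, `find_value(html, "Owner Name")` could serve as the parse step for each downloaded report page.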
