topic Re: Gather data from a regularly formatted webpage (Tax Parcels) in Python Questions

Gather data from a regularly formatted webpage (Tax Parcels)

AlfredBaldenweck — Mon, 01 Aug 2022 23:56:16 GMT

Hi all,

We frequently use the tax parcel layer published by the State of Hawaii to help us plan projects.

In the last few years, the State stopped publishing ownership information in the layer itself. Instead, each parcel now carries an attribute with a link to a webpage showing ownership, taxes, etc.

Here is an example for Hawai'i Volcanoes National Park: qPublic.net - Hawai'i County, HI - Report: 980010010000 (schneidercorp.com)

I'd like to be able to populate a copy of the layer (filtered to what is relevant to us) with attributes from the webpage, especially the ownership information.

Does anyone have any tips as to how this might be done? It doesn't need to update dynamically; a one-time pull is fine.

Thanks!

Re: Gather data from a regularly formatted webpage (Tax Parcels)

I_AM_ERROR — Tue, 02 Aug 2022 01:33:26 GMT

Since the URL contains the parcel's TMK number, you could build each URL from the TMK and fetch it with the requests library. Retrieve the page, parse the response, then repeat for each record of interest.
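
A rough sketch of that loop, in case it helps. The URL pattern below is a placeholder (the real report link comes from the layer's attribute), and the function name, the `delay` pause, and the injectable `session` argument are my own assumptions, not anything the site requires.

```python
import time

# Hypothetical URL pattern: replace with the real report link from the layer,
# keeping {tmk} where the parcel's TMK number appears.
URL_TEMPLATE = "https://example.com/report?KeyValue={tmk}"


def fetch_reports(tmks, session=None, delay=1.0, url_template=URL_TEMPLATE):
    """Return {tmk: html_text} for each TMK, pausing between requests."""
    if session is None:
        import requests  # third-party package: pip install requests
        session = requests.Session()

    pages = {}
    for i, tmk in enumerate(tmks):
        if i:
            time.sleep(delay)  # be gentle on the server between requests
        response = session.get(url_template.format(tmk=tmk), timeout=30)
        response.raise_for_status()  # stop on HTTP errors instead of saving bad pages
        pages[tmk] = response.text
    return pages
```

You would call it with the TMK values from your filtered layer, e.g. `fetch_reports(["980010010000"])`, and then parse each returned page.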

Re: Gather data from a regularly formatted webpage (Tax Parcels)

Anonymous User — Tue, 02 Aug 2022 03:56:57 GMT

Looking at the site's robots.txt file, I see it disallows all user agents (web crawlers/automated scraping) for /Application.aprx/, so be respectful and careful in how you go about your data extraction.
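
If you want to check a URL against robots.txt rules in code, the standard library's `urllib.robotparser` can do it. This is only a sketch: the sample rules and the example.com URLs are made up for illustration, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration; the real file lives at <site>/robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/report"))
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/public/page"))
```

To read the live file instead of a string, `RobotFileParser` also has `set_url()` and `read()`.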

You can use the Python package BeautifulSoup to extract items/text from webpages/URLs. There are a ton of tutorials online showing how it can be done.
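
As a starting point, here is a minimal sketch. I haven't inspected the report page's markup, so the sample HTML below (label/value table rows, with a placeholder owner) is purely an assumption; you would adjust the tag lookups to whatever the real page uses.

```python
from bs4 import BeautifulSoup  # third-party package: pip install beautifulsoup4

# Assumed layout for illustration only; the real page will differ.
SAMPLE_HTML = """
<table>
  <tr><th>Parcel Number</th><td>980010010000</td></tr>
  <tr><th>Owner Name</th><td>EXAMPLE OWNER</td></tr>
</table>
"""


def extract_fields(html):
    """Collect label/value pairs from table rows into a dict."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for row in soup.find_all("tr"):
        label = row.find("th")
        value = row.find("td")
        # Skip rows that do not have both a label cell and a value cell.
        if label is not None and value is not None:
            fields[label.get_text(strip=True)] = value.get_text(strip=True)
    return fields


print(extract_fields(SAMPLE_HTML))
```

The resulting dict can then be written back to the matching parcel record, keyed on the TMK.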