
How to use Beautiful Soup to remove HTML tags from ArcGIS metadata

Question asked by jpilbeam on Dec 22, 2017
Latest reply on Dec 27, 2017 by jpilbeam

I have a script that uses a Python package called arcpy_metadata, which gives you access to ArcGIS metadata.

The script writes the metadata to a text file and runs without errors, but the HTML markup used to format the Description and Limitation items gets written too, which makes the text file hard to read. I contacted the author of the package and he suggested BeautifulSoup, which leads to my question. I have this so far, but am at a loss as to how to implement it:

from bs4 import BeautifulSoup
# raw_html is a placeholder for a string that contains HTML markup
cleantext = BeautifulSoup(raw_html, "html.parser").get_text()
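For what it's worth, here is a minimal sketch of how that one-liner might be wrapped in a reusable function. The name `strip_html` and the sample string are my own inventions for illustration (the sample reuses one sentence from the Description output shown at the end of this post), and the sketch assumes Python 3. It uses Beautiful Soup when it is installed and falls back to the standard library's `html.parser` otherwise.

```python
try:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
except ImportError:
    BeautifulSoup = None  # not installed; use the standard-library fallback

from html.parser import HTMLParser  # Python 3 standard library


class _TextOnly(HTMLParser):
    """Fallback parser that collects text nodes and ignores tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw_html):
    """Return raw_html without tags; None or '' is returned unchanged."""
    if not raw_html:
        return raw_html
    if BeautifulSoup is not None:
        # Naming the parser avoids bs4's "no parser was explicitly specified" warning.
        text = BeautifulSoup(raw_html, "html.parser").get_text()
    else:
        parser = _TextOnly()
        parser.feed(raw_html)
        parser.close()  # flush any text still buffered by the parser
        text = "".join(parser.parts)
    return text.strip()


# Hypothetical sample shaped like the markup ArcGIS stores for Description.
sample = ('<DIV STYLE="text-align:Left;"><DIV><P><SPAN>This layer contains '
          'various parcels within Will County. </SPAN></P></DIV></DIV>')
print(strip_html(sample))
```

In a script like the one below, the idea would be to pass each metadata string through this function before printing or writing it, for example `strip_html(metadata.abstract)`.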

And here is my Metadata2Txt script:

import arcpy
import arcpy_metadata as md
import re


ws = r'Database Connections\ims to Plainfield.sde\gisedit.DBO.Tax_Map_LY\gisedit.DBO.Tax_Map_Parcels_LY'
metadata = md.MetadataEditor(ws)
path = r'\\gisfile\GISstaff\Jared\Python Scripts\Test\Parcels'
##cleantext = BeautifulSoup(raw_html).text

def meta2txt():
    title = metadata.title
    tags = metadata.tags
    purpose = metadata.purpose
    abstract = metadata.abstract
    credits = metadata.credits
    citation = metadata.citation
    limitation = metadata.limitation
    extent_description = metadata.extent_description
    desc = arcpy.Describe(ws)
    sr = desc.spatialReference
    # Output file is the path prefix followed by " Metadata.txt"
    tf = open(path + " Metadata.txt", "w")
    tf.write("Metadata Content:" + "\n")
    tf.write("----------------------------------------------" + "\n")

    if title:
        print('Title:\n{}\n'.format(title))
        tf.write('Title:\n{}\n'.format(title) + '\n')
    else:
        print('Title: \nThere is no title.\n')
        tf.write('Title: \nThere is no title.\n' + '\n')
       
    if tags:
        print('Tags:\n{}\n'.format(tags))
        tf.write('Tags:\n{}\n'.format(tags) + '\n')
    else:
        print("Tags: \nThere are no tags.\n")
        tf.write('Tags: \nThere are no tags.\n' + '\n')

    if purpose:
        print('Summary:\n{}\n'.format(purpose))
        tf.write('Summary:\n{}\n'.format(purpose) + '\n')
    else:
        print('Summary: \nThere is no summary.\n')
        tf.write('Summary: \nThere is no summary.\n' + '\n')

    if abstract:
        print('Description:\n{}\n'.format(abstract))
        tf.write('Description:\n{}\n'.format(abstract) + '\n')
    else:
        print('Description: \nThere is no description.\n')
        tf.write('Description: \nThere is no description.\n' + '\n')

    if credits:
        print('Credits:\n{}\n'.format(credits))
        tf.write('Credits:\n{}\n'.format(credits) + '\n')
    else:
        print('Credits: \nThere are no credits.\n')
        tf.write('Credits: \nThere are no credits.\n' + '\n')

    if citation:
        print('Citation:\n{}\n'.format(citation))
        tf.write('Citation:\n{}\n'.format(citation) + '\n')
    else:
        print('Citation: \nThere is no citation.\n')
        tf.write('Citation: \nThere is no citation.\n' + '\n')

    if limitation:
        print('Limitation:\n{}\n'.format(limitation))
        tf.write('Limitation:\n{}\n'.format(limitation) + '\n')
    else:
        print('Limitation: \nThere is no limitation.\n')
        tf.write('Limitation: \nThere is no limitation.\n' + '\n')

    if extent_description:
        print('Extent:\n{}\n'.format(extent_description))
        tf.write('Extent:\n{}\n'.format(extent_description) + '\n')
    else:
        print('Extent: \nThere is no extent.\n')
        tf.write('Extent: \nThere is no extent.\n' + '\n')

    if sr:
        print('Spatial Reference:\n{}\n'.format(sr.name))
        tf.write('Spatial Reference:\n{}\n'.format(sr.name) + '\n')
    else:
        print('Spatial Reference: \nThere is no spatial reference.\n')
        tf.write('Spatial Reference: \nThere is no spatial reference.\n' + '\n')

    # Close the file so everything written is flushed to disk
    tf.close()

meta2txt()
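Since every item in the script follows the same print-and-write pattern, one possible way to get a single place for cleaning is to loop over the items instead. This is only a sketch under assumptions: `write_items`, `crude_strip`, and the sample values are hypothetical, an in-memory buffer stands in for the text file, and the regex cleaner is a crude stand-in for a real HTML parser such as Beautiful Soup.

```python
import io
import re


def write_items(out, items, clean=lambda text: text):
    """Write each (label, value, missing_message) tuple to out and echo it."""
    for label, value, missing in items:
        # Clean only values that exist; otherwise use the "nothing here" message.
        body = clean(value) if value else missing
        block = "{}:\n{}\n".format(label, body)
        print(block)
        out.write(block + "\n")


def crude_strip(text):
    # Crude stand-in for a real HTML parser: deletes anything that looks like a tag.
    return re.sub(r"<[^>]+>", "", text).strip()


# Hypothetical values standing in for metadata.title, metadata.abstract, and so on.
items = [
    ("Title", "Tax Map Parcels", "There is no title."),
    ("Description",
     "<DIV><P><SPAN>Parcels within Will County. </SPAN></P></DIV>",
     "There is no description."),
    ("Credits", None, "There are no credits."),
]

buffer = io.StringIO()  # stands in for the open text file
write_items(buffer, items, clean=crude_strip)
```

The `clean` hook is called on every non-empty value, so any item that is not a plain string would need its own handling before being passed in.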


Here's how the Description item of this particular feature class looks in the text file after running the script:

Description:
<DIV STYLE="text-align:Left;"><DIV><DIV><P><SPAN>The tax map parcels layer is published
every year normally during the spring through the Will County Clerk Tax Extension. This
layer contains various parcels within Will County. The tax map parcels entered is only
digitized based upon plats and recorded documents received by the Tax Extension within
the Will County Clerk Office. </SPAN></P></DIV></DIV></DIV>
