
Extract data from Microsoft Word .docx for feature class using Python or other tools?

01-07-2021 05:24 AM
AlanMcDowell1
Emerging Contributor

Are there tools or Python scripts that will extract text from Microsoft Word (2016) files (.docx)?

Issue - I have dozens of Word documents, all in the same format, containing data I would like to extract into a feature class. The documents list geocoordinates along with text descriptions that would fit nicely into feature class fields.

Plan - I would like to be able to extract the data to a table in a FGDB (or into a CSV file) that I can then convert to a feature class.

Are there Python modules, Python code, or Esri models that can do this?

Thank you!

5 Replies
GeoJosh
Esri Regular Contributor

Morning, AlanMcDowell1!

I'm not aware of any Esri tools to accomplish what you describe, but the python-docx module (imported in Python as docx) looks like it might do the job.

https://python-docx.readthedocs.io/en/latest/
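
As a starting point, here is a minimal sketch of how that could look. It assumes python-docx is installed and that each record sits in its own paragraph as "latitude, longitude description" in decimal degrees. That layout, the regular expression, and the function and column names are all hypothetical, so they would need adjusting to match the real documents.

```python
import csv
import re

# Hypothetical pattern: assumes one record per paragraph, for example
# "34.0522, -118.2437 Old survey marker". Adjust to the real layout.
COORD_RE = re.compile(r"(-?\d{1,3}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)\s*(.*)")


def read_paragraphs(docx_path):
    """Return the non-empty paragraph texts of one .docx file."""
    # Imported here so the parsing helpers below work without python-docx.
    from docx import Document

    texts = (p.text.strip() for p in Document(docx_path).paragraphs)
    return [t for t in texts if t]


def parse_record(text):
    """Return (lat, lon, description), or None if no coordinate pair is found."""
    match = COORD_RE.search(text)
    if match is None:
        return None
    lat, lon, description = match.groups()
    return float(lat), float(lon), description.strip()


def write_csv(records, csv_path):
    """Write (lat, lon, description) tuples to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["lat", "lon", "description"])
        writer.writerows(records)


def docx_to_csv(docx_paths, csv_path):
    """Collect every parsable record from the given documents into one CSV."""
    records = []
    for path in docx_paths:
        for text in read_paragraphs(path):
            record = parse_record(text)
            if record is not None:
                records.append(record)
    write_csv(records, csv_path)
```

Calling docx_to_csv(["site1.docx", "site2.docx"], "sites.csv") (file names made up) would produce a single CSV that could then be turned into points, for example with the XY Table To Point tool in ArcGIS Pro. Note that the paragraphs property does not include text held inside Word tables; if the data are in tables, the document's tables property would be the place to look instead.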

Josh

AlanMcDowell1
Emerging Contributor

I spent a few hours with the docx Python module but could not get it to work. That is probably because I am a novice Python user and was doing something wrong.

DanPatterson
MVP Esteemed Contributor

Moved to Data Management Questions - GeoNet, The Esri Community, to garner a more focused audience for your question.


... sort of retired...
Robert_LeClair
Esri Notable Contributor

The LocateXT extension for ArcGIS Pro can do this. It works with unstructured data such as a Word document and extracts coordinates and other attributes into a geodatabase point feature class. You can even build custom attributes to pull only certain things out of your unstructured data. Pretty interesting tool!

AlanMcDowell1
Emerging Contributor

Thanks. LocateXT looks like it will work. I will post back after I get a license and try it out.