Are there tools or Python scripts that will extract text from Microsoft Word (2016) files (.docx)?
Issue - I have dozens of Word documents, all in the same format, containing data I would like to extract and put in a feature class. The Word documents have listings of geocoordinates and text descriptions that would fit nicely into feature class fields.
Plan - I would like to be able to extract the data to a table in a FGDB (or into a CSV file) that I can then convert to a feature class.
Are there Python modules, or Python code, or ESRI models that can do this?
Thank you!
Morning, AlanMcDowell1!
I'm not aware of any Esri tools to accomplish what you describe, but the python-docx module (imported as docx) looks like it might do the job.
https://python-docx.readthedocs.io/en/latest/
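As a starting point, here is a minimal sketch of how that could look. It is untested against your documents and rests on an assumption: that each coordinate appears as a decimal-degree pair such as "34.0522, -118.2437" (latitude first) in the same paragraph as its description. The function names, the regex pattern, and the CSV column names are all hypothetical and would need adjusting to your actual layout.

```python
import csv
import re

# Assumption: coordinates are decimal-degree pairs like "34.0522, -118.2437",
# latitude first. Change this pattern to match your documents.
COORD_PATTERN = re.compile(r"(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def read_paragraphs(docx_path):
    """Return the non-empty paragraph texts of a .docx file."""
    # python-docx is imported here so the parsing code below can be
    # tried on plain strings even before the package is installed.
    import docx

    document = docx.Document(docx_path)
    return [p.text.strip() for p in document.paragraphs if p.text.strip()]


def extract_rows(paragraphs):
    """Build (lat, lon, description) rows from paragraphs holding a coordinate pair."""
    rows = []
    for text in paragraphs:
        match = COORD_PATTERN.search(text)
        if match is None:
            continue  # no coordinate pair in this paragraph
        lat, lon = float(match.group(1)), float(match.group(2))
        # Keep everything except the matched pair as the description,
        # collapsing the whitespace left behind.
        remainder = text[:match.start()] + " " + text[match.end():]
        description = " ".join(remainder.split()).strip(" ,;-")
        rows.append((lat, lon, description))
    return rows


def write_csv(rows, csv_path):
    """Write the rows to a CSV file with a header line."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["lat", "lon", "description"])
        writer.writerows(rows)
```

Calling `write_csv(extract_rows(read_paragraphs("listing.docx")), "listing.csv")` (file names are placeholders) would give you a CSV to bring into ArcGIS. One thing to watch for: `document.paragraphs` does not include text inside Word tables. If your listings are in tables, you would walk `document.tables` and read each cell's `text` instead.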
Josh
I spent a few hours with the docx Python module but could not get it to work. That is probably because I am a novice Python user and was doing something wrong.
Moved to Data Management Questions - GeoNet, The Esri Community, to garner a more focused audience for your question.
The LocateXT extension for ArcGIS Pro can do this. It can work with unstructured data such as a Word document and extract coordinates and other attributes into a geodatabase point feature class. You can even create custom attributes to pull out only certain things from your unstructured data. Pretty interesting tool!
Thanks. LocateXT looks like it will work. I will post back after I get a license and try it out.