Geo-reference data in MS Word, PowerPoint, PDF

04-24-2014 10:08 AM
StephenCoppola
New Contributor III
Good afternoon,

Question: Does Geoportal Server support the search and retrieval of Geo-referenced data embedded in MS Word, PowerPoint, and plain-text documents?

Appreciate your insight and suggestions.
1 Reply
MartenHogeweg
Esri Contributor

To do this, you will need to extend the harvesting framework, for example with Apache Tika. The Geoportal Server wiki describes how to extend the metadata harvester. I have attached some code I have been working on to do exactly this. Using Tika, it reads the metadata from the documents (such as Office/PDF document properties) into a Dublin Core XML structure. My planned next step was to see whether I could geolocate places by looking up place names in a gazetteer. I'd be happy to collaborate on developing this further and adding it to Geoportal Server as a feature.
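
To make the flow above concrete, here is a minimal Python sketch of the two steps: mapping extracted document properties to Dublin Core elements, then looking up place names in a gazetteer. It is not the attached code and does not call Tika or any Geoportal Server API. It assumes the properties have already been extracted into a plain dictionary. The property names, the `PROPERTY_TO_DC` mapping, the tiny in-memory `GAZETTEER` with its illustrative bounding boxes, and all function names are hypothetical. The coverage text is only loosely modeled on the DCMI Box style.

```python
import re
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from extracted property names to Dublin Core elements.
PROPERTY_TO_DC = {
    "title": "title",
    "author": "creator",
    "subject": "subject",
    "created": "date",
    "content_type": "format",
}

# Hypothetical gazetteer: place name -> (west, south, east, north).
# The coordinates are rough illustrative values, not authoritative extents.
GAZETTEER = {
    "redlands": (-117.25, 34.0, -117.1, 34.1),
    "rotterdam": (4.35, 51.85, 4.6, 52.0),
}


def to_dublin_core(properties):
    """Build a <metadata> element holding one dc:* child per known property."""
    root = ET.Element("metadata")
    for name, dc_name in PROPERTY_TO_DC.items():
        value = properties.get(name)
        if value:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = str(value)
    return root


def find_places(text, gazetteer=GAZETTEER):
    """Return gazetteer entries whose single-word name appears in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {name: bbox for name, bbox in gazetteer.items() if name in tokens}


def add_coverage(root, places):
    """Append one dc:coverage element per matched place."""
    for name, (west, south, east, north) in sorted(places.items()):
        element = ET.SubElement(root, f"{{{DC_NS}}}coverage")
        element.text = (
            f"name={name}; westlimit={west}; southlimit={south}; "
            f"eastlimit={east}; northlimit={north}"
        )
    return root


if __name__ == "__main__":
    record = to_dublin_core({"title": "Field survey", "content_type": "application/pdf"})
    add_coverage(record, find_places("Survey conducted near Redlands, California."))
    print(ET.tostring(record, encoding="unicode"))
```

The word-level match is deliberately naive: it cannot handle multi-word place names or tell apart places that share a name, so a real gazetteer service would need more careful matching.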
