To do this, you will need to extend the harvesting framework and use a content-extraction library such as Apache Tika. The Geoportal Server wiki describes how to extend the metadata harvester. I have attached some code I have been working on to do exactly this: using Tika, it reads the metadata from the documents (such as Office/PDF document properties) into a Dublin Core XML structure. My next step was to see whether I could geolocate places by looking up their names in a gazetteer. I'd be happy to collaborate on developing this further and adding it to Geoportal Server as a feature.
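
To give a rough idea of the mapping step (this is not the attached code, just a minimal sketch), the snippet below turns a set of document properties into a small Dublin Core XML fragment using only the JDK. Several things here are assumptions for illustration: the class name `DublinCoreSketch`, the property keys (`title`, `author`, `subject`, `created`), and the `metadata` wrapper element. In a real harvester the property map would be filled from whatever Tika extracts, and the wrapper would follow whatever Dublin Core profile Geoportal Server expects.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DublinCoreSketch {
    // Namespace of the Dublin Core Metadata Element Set 1.1.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Hypothetical property keys (left) mapped to Dublin Core elements (right).
    // The keys are placeholders; the names an extractor reports may differ.
    static final Map<String, String> FIELD_MAP = new LinkedHashMap<>();
    static {
        FIELD_MAP.put("title", "dc:title");
        FIELD_MAP.put("author", "dc:creator");
        FIELD_MAP.put("subject", "dc:subject");
        FIELD_MAP.put("created", "dc:date");
    }

    // Builds an XML string with one dc:* element per known, non-empty property.
    // Properties that are not in FIELD_MAP are ignored.
    static String toDublinCoreXml(Map<String, String> props) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // "metadata" is an assumed wrapper element, not a fixed standard.
            Element root = doc.createElementNS(null, "metadata");
            doc.appendChild(root);

            for (Map.Entry<String, String> field : FIELD_MAP.entrySet()) {
                String value = props.get(field.getKey());
                if (value == null || value.trim().isEmpty()) {
                    continue; // skip properties the document does not carry
                }
                Element element = doc.createElementNS(DC_NS, field.getValue());
                element.setTextContent(value.trim()); // escaped on serialization
                root.appendChild(element);
            }

            // Serialize the DOM tree with an identity transform.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException ex) {
            throw new IllegalStateException("Could not build Dublin Core XML", ex);
        }
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for extracted document properties.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("title", "Water Quality Report");
        props.put("author", "J. Smith");
        props.put("created", "2012-05-01");
        System.out.println(toDublinCoreXml(props));
    }
}
```

Keeping the key-to-element mapping in one table means the same routine can serve Office and PDF documents alike, and a gazetteer lookup could later add something like a `dc:coverage` element without touching the serialization code.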