Geoportal Harvester

StephenCoppola · 04-28-2014

Question 1: Can the Geoportal harvester be customized to extract metadata from MS Word, PowerPoint, and PDF documents?
Question 2: If so, which metadata profile (ISO, etc.) do these documents need?
Question 3: Are there existing tools or examples that couple Geoportal Server with textual documents?

Thank you for your time and suggestions.

MartenHogeweg · 07-24-2014

Yes! You can extend the harvester as explained in this example:

Extending the Web Harvester · Esri/geoportal-server Wiki · GitHub

Using Apache Tika, for example, you could extract document information (text and metadata) from various file types.
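Tika itself runs on the JVM. As a dependency-free stand-in for the Word and PowerPoint case only, here is a minimal Python sketch that reads the core document properties (title, author, dates) that Office Open XML files (.docx, .pptx) keep in a `docProps/core.xml` part inside their zip package. It assumes the part sits at that conventional path instead of resolving it through the package relationships, and `read_ooxml_core_properties` is a hypothetical name. PDFs and legacy .doc/.ppt files would still need Tika or a similar parser.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in the OOXML core properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Plain field name -> element path inside docProps/core.xml.
FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "subject": "dc:subject",
    "description": "dc:description",
    "keywords": "cp:keywords",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_ooxml_core_properties(path_or_file):
    """Return the non-empty core properties of a .docx/.pptx file as a dict.

    Hypothetical helper. Assumes the conventional part name
    docProps/core.xml; strictly, _rels/.rels points to the real location.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        try:
            data = zf.read("docProps/core.xml")
        except KeyError:
            return {}  # the core properties part is optional
    root = ET.fromstring(data)
    props = {}
    for name, path in FIELDS.items():
        el = root.find(path, NS)
        if el is not None and el.text and el.text.strip():
            props[name] = el.text.strip()
    return props
```

Fields the author never filled in are simply left out, which matters here because office documents often carry little more than a title and an author.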

You would generate metadata for these documents yourself, in your preferred profile (question 2). I've been using Dublin Core, as there is typically only limited information available when indexing documents.
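A minimal sketch of that step: turning whatever fields were pulled out of a document (for example by Tika) into a simple Dublin Core record. It assumes the OAI-PMH `oai_dc` container. The exact Dublin Core encoding a given Geoportal profile expects may differ (an RDF wrapper, for instance), so the wrapper element is an assumption, and `to_dublin_core` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Assumption: oai_dc container; swap in the wrapper your Geoportal profile expects.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# The 15 Dublin Core Metadata Element Set terms, in emission order.
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")


def to_dublin_core(fields):
    """Build an oai_dc:dc record from {DC element name: value or list of values}.

    Keys that are not Dublin Core elements are ignored.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name in DC_ELEMENTS:
        values = fields.get(name)
        if values is None:
            continue
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

Repeated elements (several creators, say) are emitted once per value, and keys outside the 15 elements are dropped, so sparse document metadata still yields a valid record without padding.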

I have some code that indexes docs that I could post on GitHub (question 3). Perhaps something to work on together?