I want to let our users search for text inside attached documents.
Example scenario: a user needs to find all features whose attached documents contain the word "bicycle".
Does anybody have any ideas how to do this?
As a general approach, I'd use Python, with a read cursor to iterate through each attachment, download it to a temp file, and search using a relevant Python library that reads your attachment file types. If you get a match, return the attachment's GlobalID and use that to find and return parent record data.
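Here is a rough sketch of that loop, not a finished tool. It assumes each row gives you a parent key, a file name, and the raw attachment bytes; in arcpy that would probably come from an arcpy.da.SearchCursor over the attachment table with fields such as REL_GLOBALID, ATT_NAME and DATA, but check the field names in your own table. The function names and the extract_text helper are made up for illustration, and the helper only handles plain text; for PDFs or Word files you would swap in a library that reads those formats.

```python
import tempfile
from pathlib import Path


def extract_text(path):
    """Hypothetical extractor: handles plain-text files only.

    Replace or extend this with a PDF or Word reader for those file types.
    """
    if path.suffix.lower() in {".txt", ".csv"}:
        return path.read_text(encoding="utf-8", errors="ignore")
    return ""  # unsupported type: treated as containing no text


def find_matches(rows, term):
    """Return the parent key of every attachment whose text contains term.

    rows yields (parent_key, file_name, data) tuples, where data is
    bytes or a memoryview (assumed shape of the cursor rows).
    """
    term = term.lower()
    matches = []
    with tempfile.TemporaryDirectory() as tmp:
        for parent_key, file_name, data in rows:
            # Write the attachment to a temp file so a file-based reader can open it.
            path = Path(tmp) / Path(file_name).name
            path.write_bytes(bytes(data))
            if term in extract_text(path).lower():
                matches.append(parent_key)
    return matches
```

You would then use the returned keys in a query against the parent feature class to pull back the matching records.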
Downloading could be slow, but I don't know of any options for reading attachments in situ.
FME can access attachments and should be able to read/search PDFs, but AFAIK it can't read Word documents, so you're back to using Python inside FME if FME is what you have.
Thanks Mic, much appreciated. Having read your response, I think I'll adjust my approach: load the document's text contents as soon as it is attached, and then search through the attachments table:
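A minimal sketch of that idea, using an in-memory SQLite table as a stand-in for the real attachments table. The table layout, the doc_text column, and both function names are assumptions for illustration only; the text is expected to be extracted once, at attach time, by whatever reader suits the file type.

```python
import sqlite3

# Stand-in for the attachments table, with an extra column for extracted text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachments ("
    "att_id INTEGER PRIMARY KEY, rel_globalid TEXT, att_name TEXT, doc_text TEXT)"
)


def add_attachment(conn, rel_globalid, att_name, doc_text):
    """Store the text extracted from a document at the moment it is attached."""
    conn.execute(
        "INSERT INTO attachments (rel_globalid, att_name, doc_text) VALUES (?, ?, ?)",
        (rel_globalid, att_name, doc_text),
    )


def search_attachments(conn, term):
    """Return the parent keys of all attachments whose stored text contains term."""
    # LIKE is case-insensitive for ASCII in SQLite; % and _ in term act as wildcards.
    cur = conn.execute(
        "SELECT DISTINCT rel_globalid FROM attachments "
        "WHERE doc_text LIKE ? ORDER BY rel_globalid",
        (f"%{term}%",),
    )
    return [row[0] for row in cur]
```

The search then becomes a single query with no downloading, at the cost of keeping the stored text in sync if an attachment is replaced.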