Extract Locations from Document (LocateXT) not finding coordinates?

08-06-2019 09:53 AM
RyanHowell1
New Contributor III

I have a big project that involves parsing many PDFs for locations, and I just got a LocateXT license to try the Extract Locations from Document tool. So far, I've had mixed results.

I've run the tool on 6 different PDFs: 2 of them work perfectly, 2 of them give me a bunch of garbage (depending on how the tool is configured) but don't pull out the coordinates, and 2 of them won't give me anything no matter what I try.

The successful ones were in DMS and DD; the unsuccessful ones were a mix of formats, as follows:

Successful:

118°45'15.435"W, 42°33'14.25"N

40.174996° N, 111.072179° W

Unsuccessful:

(116°W, 43°N)

(lat 40°23'N, long 120°34'W).

located at 49'41'N and 74"35'W

(latitude 48°19'35"N, longitude 85°21'01"W)

I've run all of them through with varying combinations of coordinate types to parse for, but as I copied and pasted the coordinates into this post I noticed that a lot of the symbols got messed up (° changed to 0, etc.), so that might be part of my problem. However, converting the PDFs to Word and txt files didn't solve the issue.

My big question: has anyone used this tool and successfully worked through issues like the ones I am having? Are there any resources you found helpful other than the tool documentation (https://pro.arcgis.com/en/pro-app/tool-reference/conversion/extract-locations-from-document.htm)? Any recommendations on how to streamline this for processing many documents would be appreciated.

3 Replies
AryleButler
Esri Contributor

Hello Ryan,

Without seeing the PDFs, I can't say exactly what's going on, but there is some general information I can give you. LocateXT for ArcGIS Pro 2.4.x can currently scan documents for certain coordinate formats. To see the list of these formats, along with examples and their efficacy within LocateXT, open an ArcGIS Pro project and do the following:

  1. Click Add Data and select Extract Locations
  2. In the Extract Locations pane:
    1. Select the Properties tab
    2. In the Properties tab click the red icon
    3. Be sure the Coordinates tab is selected
    4. View the coordinates available

You may want to compare the coordinates that did not match with the coordinate formats available.

For example:

  • 116°W, 43°N may need to be written 43°00'00.0"N 116°00'00.0"W for LocateXT to recognize it. 

If these are locations that come up frequently in your documents, you can create a custom location. For example, you could name it 116°W, 43°N and input the coordinates as 43°00'00.0"N 116°00'00.0"W, so that each time LocateXT sees 116°W, 43°N, it plots the point correctly rather than ignoring the notation entirely. You can read more about this in our documentation: Extract custom locations.
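If the abbreviated notations are too varied to list as custom locations, another option is to pad them to full DMS in the text before running the tool. The following is only a minimal Python sketch of that idea, not part of LocateXT: the names `PART`, `pad_part` and `to_full_dms` are made up for illustration, and it assumes whole-number degrees, an intact ° symbol, and one latitude/longitude pair per string.

```python
import re

# One coordinate part such as 116°W, 40°23'N or 48°19'35"N.
# The lookbehind skips decimal degrees like 40.174996° N, which are left alone.
PART = re.compile(
    r"""(?<![\d.])(?P<deg>\d{1,3})°\s*
        (?:(?P<min>\d{1,2})'\s*)?
        (?:(?P<sec>\d{1,2}(?:\.\d+)?)"\s*)?
        (?P<hem>[NSEW])""",
    re.VERBOSE,
)


def pad_part(match):
    """Fill in missing minutes and seconds so the part is full DMS."""
    minutes = (match.group("min") or "0").zfill(2)
    seconds = match.group("sec") or "00.0"
    return f"{match.group('deg')}°{minutes}'{seconds}\"{match.group('hem')}"


def to_full_dms(text):
    """Return 'lat lon' in full DMS, or None if a pair cannot be found."""
    parts = list(PART.finditer(text))
    lat = next((m for m in parts if m.group("hem") in "NS"), None)
    lon = next((m for m in parts if m.group("hem") in "EW"), None)
    if lat is None or lon is None:
        return None
    return f"{pad_part(lat)} {pad_part(lon)}"
```

With this sketch, `to_full_dms("116°W, 43°N")` returns `43°00'00.0"N 116°00'00.0"W`, the same form as the example above. Strings where the symbols were already garbled during extraction (such as ° turned into 0) would not match and come back as `None`.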

As for the two PDFs that LocateXT seems to produce no data from, are these PDFs readable? Meaning, is the text in the PDF a scanned image, or actual text? LocateXT is currently unable to scan images for text, so PDFs that contain scanned or image-based text must first be run through Optical Character Recognition software; the resulting output can then be scanned by LocateXT.

I hope this helps!

-Aryle

ThomasL
Occasional Contributor

Hi Aryle Butler

I also had poor results with PDF files (vector-based PDF files, no OCR needed), but more success with txt files and LocateXT.

My experience is that LocateXT has very limited PDF-reading capability compared to open source solutions like Ghostscript.

For this reason, I convert all PDF files to txt files before processing them with LocateXT (easily scripted in Python by calling a batch command to Ghostscript).
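For anyone who wants to try this, here is a rough sketch of what such a script could look like. It is not the exact script described above: the function names and folder layout are illustrative, and it assumes Ghostscript is installed and that its console executable (`gswin64c` on 64-bit Windows, `gs` on Linux/macOS) is on the PATH. It relies on Ghostscript's `txtwrite` output device.

```python
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, txt_path, gs_exe="gswin64c"):
    """Build the Ghostscript call that extracts text from one PDF."""
    return [
        gs_exe,
        "-q",                 # suppress startup messages
        "-dNOPAUSE",          # do not pause between pages
        "-dBATCH",            # exit when the file is done
        "-sDEVICE=txtwrite",  # text extraction device
        f"-sOutputFile={txt_path}",
        str(pdf_path),
    ]


def convert_folder(pdf_dir, txt_dir, gs_exe="gswin64c"):
    """Convert every *.pdf in pdf_dir to a .txt file in txt_dir."""
    txt_dir = Path(txt_dir)
    txt_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        out = txt_dir / (pdf.stem + ".txt")
        # check=True raises CalledProcessError if Ghostscript fails.
        subprocess.run(build_gs_command(pdf, out, gs_exe), check=True)
        written.append(out)
    return written
```

Calling `convert_folder("pdfs", "txt")` would then leave one txt file per PDF in the `txt` folder, ready to be passed to LocateXT.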

This is of course a workaround, and not what one would expect given the price we have to pay for the LocateXT extension.

AlfredoConetta2
New Contributor

I found that if you do not have a PDF reader installed on your system, there is likely no IFilter for PDF files, which is a dependency LocateXT needs to work smoothly. The same applies to MS Office documents if you do not have MS Office installed.
