Natural Language Processing function

08-09-2016 05:51 AM
Status: Open
DanCole

I would like to propose that Esri add a Natural Language Processing (NLP) function for geocoding records that currently have only descriptive locations, e.g., an object was collected in 1876 in Nebraska, along the right bank of the Missouri River, 25 miles northwest of its confluence with the Big Sioux River.  This sort of function is available for free in the GeoLocate program developed by Tulane University.  Currently, third-party commercial NLP programs, MetaCarta and LocateXT, are used by the defense and intelligence sectors to obtain geographic coordinates from unstructured text.  What we need, however, is for this function to work with structured text, such as museum database records.  So I'm asking if Esri would be willing to work with Tulane University to incorporate GeoLocate into ArcGIS.
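To illustrate the arithmetic behind a locality such as "25 miles northwest of" a known feature, here is a minimal Python sketch that offsets a reference point by a distance and compass bearing on a spherical Earth. It is not how GeoLocate, MetaCarta, or LocateXT work internally. The function name `offset_point` is made up for this example, and the reference coordinates are a rough, assumed placeholder for where the two rivers meet, not surveyed values. It also treats "northwest" as a straight-line bearing of 315 degrees, whereas a real locality might mean distance along the river.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)
KM_PER_MILE = 1.609344


def offset_point(lat_deg, lon_deg, distance_miles, bearing_deg):
    """Return (lat, lon) in degrees reached by travelling distance_miles
    from the start point along bearing_deg (clockwise from true north)."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    bearing = math.radians(bearing_deg)
    # Angular distance travelled, in radians
    ang = distance_miles * KM_PER_MILE / EARTH_RADIUS_KM

    lat2 = math.asin(
        math.sin(lat1) * math.cos(ang)
        + math.cos(lat1) * math.sin(ang) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(ang) * math.cos(lat1),
        math.cos(ang) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


# ASSUMPTION: approximate placeholder for the river junction, for illustration only
REFERENCE = (42.49, -96.45)

lat, lon = offset_point(*REFERENCE, distance_miles=25, bearing_deg=315)
print(f"Estimated locality: {lat:.4f}, {lon:.4f}")
```

A result like this is only a first estimate. In practice it would need an uncertainty radius and a check that the point actually falls on the named bank of the river.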

8 Comments
DanCole

No one from Esri has commented on this yet.  Is there any chance that it will appear on Esri's radar in the future?

KimOllivier

I would like to see a little NLP to extract an address from surrounding clutter to improve the match rate. At the moment, a person's name in the address, as is put on mail labels, breaks the very strict requirements for an address string.

There are open source packages that do this very well that I have experimented with. They need a body of examples to be trained on, but that isn't very hard to do once. Here is an example: GitHub - datamade/usaddress: a python library for parsing unstructured address strings into address ... 

KevinMacLeod4

I have noticed that searching on fields like Owner Name in a parcel layer with the geocoder is frustrating. For example, a search for Michael Smith works but Smith Michael does not. Sure, we can use T-SQL and the like to engineer around it, but it would be excellent if Locators worked with MS SQL Server's natural language processing out of the box.  The geocoder and Search Widget are the linchpin of almost every map viewer. Leveraging SQL Server's NLP capability could achieve a "Google"-like intelligence and a very friendly user interface.

Thoughts Esri?
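As a rough sketch of the workaround for the name-order problem described above, the Python below builds an order-insensitive key from a name so that "Smith Michael" and "Michael Smith" match the same record. This is not how Esri locators or SQL Server's search features behave. The owner list and the helper names `name_key` and `find_owner` are hypothetical.

```python
def name_key(name):
    """Lower-case the name, treat commas as spaces, and sort the tokens
    so that word order no longer matters."""
    tokens = name.replace(",", " ").lower().split()
    return tuple(sorted(tokens))


# Hypothetical owner names standing in for a parcel layer's Owner Name field
owners = ["Michael Smith", "Jane Doe"]

# Index each owner under its order-insensitive key
index = {}
for owner in owners:
    index.setdefault(name_key(owner), []).append(owner)


def find_owner(query):
    """Return all owners whose name tokens equal the query's tokens."""
    return index.get(name_key(query), [])


print(find_owner("Smith Michael"))   # ['Michael Smith']
print(find_owner("smith, michael"))  # ['Michael Smith']
```

This only handles exact tokens in any order. Misspellings, initials, and partial names would need fuzzy matching on top of it.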

BruceHarold

Kimo, use a Single Field input to geocode noisy data.  The Geocode Addresses tool accepts this mode.  Concatenate (space separated) your input data, people's names and all, into one field.  If your data has multiple countries in it, leave the country value in its own field but concatenate the other fields into one field, then use Multiple Field input but supply only one field to the Address input, plus the country field to Country.
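The concatenation step described above can be sketched in plain Python before the data ever reaches the Geocode Addresses tool. This only prepares the input and does not call any ArcGIS API. The field names (`Name`, `Street`, `City`, `Postcode`, `Country`), the output keys, and the sample record are all hypothetical.

```python
def to_single_field(record, address_fields, country_field="Country"):
    """Join the non-empty address fields with single spaces and keep
    the country value separate, as suggested for multi-country data."""
    parts = []
    for name in address_fields:
        value = record.get(name)
        text = "" if value is None else str(value).strip()
        if text:  # skip nulls and blank strings
            parts.append(text)
    return {
        "SingleLine": " ".join(parts),
        "Country": record.get(country_field, ""),
    }


# Hypothetical noisy record, person's name included
record = {
    "Name": "Mr J Smith",
    "Street": "12 Example Road",
    "City": "Wellington",
    "Postcode": None,
    "Country": "NZL",
}

row = to_single_field(record, ["Name", "Street", "City", "Postcode"])
print(row)
```

The `SingleLine` value would then be mapped to the Address input and `Country` to the Country input.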

NLP is coming to Data Interoperability in 2019.

DarronPustam

Esri has acquired ClearTerra LocateXT technology and it is being implemented as part of the ArcGIS platform.

There will be an ArcGIS LocateXT Extension which will be available for ArcMap, ArcGIS Pro, and Enterprise. This new capability discovers & extracts geocoordinates, place names, and other critical information from multiple types of unstructured data formats and places it into ArcGIS for visualization and analysis.


The ArcGIS LocateXT Extension will be a new extension to the ArcGIS Platform to handle "structured" data, such as records, as well as unstructured data. The first version will be available in ArcGIS Pro 2.3.

Happy to have a conversation with those interested.

Darron

DanPatterson_Retired

There are also many good libraries already in existence that can easily be installed and incorporated in ArcGIS Pro.

e.g.

Natural Language Toolkit — NLTK 3.4 documentation 

DanCole
Darron,
I am looking forward to the new versions of ArcGIS with the LocateXT Extension.  At the Smithsonian, we tried using MetaCarta over 10 years ago with only moderate success, since that program was built to parse only unstructured descriptive locations rather than structured descriptions such as those found in older collection localities within museum databases.
thanks,
Dan Cole
BruceHarold