Calculating Trip Length with ST_GeodesicLengthWGS84

06-20-2015 08:19 AM
DerckVonck
New Contributor

Hi,

We are looking at vehicle tracking data for two of our clients. We have about 1 TB of CSV data containing longitude, latitude, time, speed, bearing, linear g, lateral g and ignition status. We have set up an experimental Hadoop cluster of 9 data nodes on Hortonworks HDP 2.2.6, using Ambari. We managed to ingest the data and convert it to ORC files in about 2 hours. In total we have just over 1.3 billion points.

We want to identify trips (a consecutive series of points in time for a specific vehicle between ignition on and ignition off) to determine movement patterns over time. We can do this quite easily in Hive using the LEAD and LAG functions. My problem comes when we want to calculate the length of each trip. All data is stored as WGS84 latitude/longitude, so I need to use ST_GeodesicLengthWGS84 from the spatial-framework-for-hadoop. I have managed to build the framework and run some test queries on our Hadoop cluster, and all seems to be working well. The difficulty is converting the latitude/longitude points into the line string that ST_GeodesicLengthWGS84 requires as input. I did come across a modified constructor for ST_LineString in the following fork of the spatial-framework-for-hadoop: https://github.com/cartershanklin/spatial-framework-for-hadoop. However, his changes were never merged into the main repository, and the main repository appears to have changed a lot since then.
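As an illustration of the trip logic described above, here is a minimal Java sketch for one vehicle's points. It assumes the fixes are already sorted by time and that ignition status is a per-point boolean; the names `TripLength`, `Fix`, `splitTrips` and `tripLengthMeters` are hypothetical. A trip is taken to be a maximal run of ignition-on points, which is what comparing each point's ignition value with its neighbour's (as LEAD/LAG would in Hive) flags. For the length it swaps in the haversine formula on a sphere of mean radius 6,371,008.8 m. That is a spherical approximation, not the WGS84 ellipsoidal geodesic that ST_GeodesicLengthWGS84 computes, so expect small differences (well under 1%).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: split one vehicle's time-ordered GPS fixes into trips
 * (maximal runs of ignition-on points) and estimate each trip's length.
 * Distances use the haversine formula on a sphere, an approximation that is
 * NOT the ellipsoidal WGS84 geodesic used by ST_GeodesicLengthWGS84.
 */
public class TripLength {
    /** Mean Earth radius in metres (spherical approximation). */
    public static final double EARTH_RADIUS_M = 6371008.8;

    /** One GPS fix; field names are hypothetical, adapt them to the real columns. */
    public static final class Fix {
        public final long time;
        public final double lon, lat;   // degrees, WGS84
        public final boolean ignitionOn;

        public Fix(long time, double lon, double lat, boolean ignitionOn) {
            this.time = time;
            this.lon = lon;
            this.lat = lat;
            this.ignitionOn = ignitionOn;
        }
    }

    /** Great-circle distance in metres between two lon/lat points given in degrees. */
    public static double haversineMeters(double lon1, double lat1, double lon2, double lat2) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double sinDPhi = Math.sin((phi2 - phi1) / 2);
        double sinDLambda = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinDPhi * sinDPhi + Math.cos(phi1) * Math.cos(phi2) * sinDLambda * sinDLambda;
        return 2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    /** Groups consecutive ignition-on fixes into trips; input must be sorted by time. */
    public static List<List<Fix>> splitTrips(List<Fix> fixes) {
        List<List<Fix>> trips = new ArrayList<>();
        List<Fix> current = null;
        for (Fix f : fixes) {
            if (f.ignitionOn) {
                if (current == null) {      // off -> on transition: start a new trip
                    current = new ArrayList<>();
                    trips.add(current);
                }
                current.add(f);
            } else {
                current = null;             // on -> off transition: close the trip
            }
        }
        return trips;
    }

    /** Total length in metres of the polyline through a trip's points, in order. */
    public static double tripLengthMeters(List<Fix> trip) {
        double total = 0.0;
        for (int i = 1; i < trip.size(); i++) {
            Fix p = trip.get(i - 1), q = trip.get(i);
            total += haversineMeters(p.lon, p.lat, q.lon, q.lat);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Fix> fixes = Arrays.asList(
            new Fix(0, 18.40, -33.90, false),
            new Fix(1, 18.40, -33.90, true),
            new Fix(2, 18.41, -33.90, true),
            new Fix(3, 18.41, -33.90, false),
            new Fix(4, 18.41, -33.91, true));
        for (List<Fix> trip : splitTrips(fixes)) {
            System.out.printf("%d points, %.1f m%n", trip.size(), tripLengthMeters(trip));
        }
    }
}
```

Summing the per-segment distances gives the same quantity a line-string length measures: the total length of the polyline through the trip's points in time order.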

Has anybody done a similar analysis and found a way around this issue?

Regards

Derck

Derck Vonck

Technical Lead

esri South Africa

MichaelPark
Esri Contributor

Hi Derck,

We have created an issue for this request on GitHub.

Add ST_LineString Hive UDF constructor for array of points. · Issue #84 · Esri/spatial-framework-for...

Is it more useful for you to have a constructor that takes an array of ST_Point, or one that takes two separate arrays for x and y values?

Mike

DerckVonck
New Contributor

Hi Mike

Either option would be good, although I would think that two arrays of doubles, one for longitude (x) and one for latitude (y), should be a bit faster.
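To make the trade-off between the two constructor shapes concrete, here is a hypothetical Java sketch. `SimpleLineString`, `Point`, `fromPoints` and `fromXY` are illustrative names only, not the spatial-framework-for-hadoop API and not the eventual resolution of issue #84. Both factories end up with the same flat coordinate array; the parallel-array version skips building one point object per vertex, which is the likely basis for expecting it to be a bit faster. That expectation is not measured here.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the two constructor shapes discussed for issue #84.
 * Names are illustrative; this is not the spatial-framework-for-hadoop API.
 */
public final class SimpleLineString {
    /** Minimal point type standing in for an ST_Point value. */
    public static final class Point {
        public final double x, y;
        public Point(double x, double y) { this.x = x; this.y = y; }
    }

    // Interleaved coordinates: x0, y0, x1, y1, ...
    private final double[] coords;

    private SimpleLineString(double[] coords) { this.coords = coords; }

    /** Shape 1: a single array of point objects. */
    public static SimpleLineString fromPoints(Point[] points) {
        if (points.length < 2) throw new IllegalArgumentException("a line string needs at least 2 points");
        double[] c = new double[points.length * 2];
        for (int i = 0; i < points.length; i++) {
            c[2 * i] = points[i].x;
            c[2 * i + 1] = points[i].y;
        }
        return new SimpleLineString(c);
    }

    /** Shape 2: parallel arrays of x (longitude) and y (latitude); no per-vertex objects. */
    public static SimpleLineString fromXY(double[] xs, double[] ys) {
        if (xs.length != ys.length) throw new IllegalArgumentException("x and y arrays differ in length");
        if (xs.length < 2) throw new IllegalArgumentException("a line string needs at least 2 points");
        double[] c = new double[xs.length * 2];
        for (int i = 0; i < xs.length; i++) {
            c[2 * i] = xs[i];
            c[2 * i + 1] = ys[i];
        }
        return new SimpleLineString(c);
    }

    public int numPoints() { return coords.length / 2; }
    public double x(int i) { return coords[2 * i]; }
    public double y(int i) { return coords[2 * i + 1]; }

    // Value equality, so the two construction paths can be compared directly.
    @Override public boolean equals(Object o) {
        return o instanceof SimpleLineString && Arrays.equals(coords, ((SimpleLineString) o).coords);
    }

    @Override public int hashCode() { return Arrays.hashCode(coords); }
}
```

Either shape needs the same validation (matching array lengths, at least two vertices). Any real speed difference inside Hive would also depend on how array arguments are passed to the UDF, which this sketch does not model.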

Thank you very much for your prompt response.

Regards

Derck

P.S. Have you ever used the spatial framework for Hadoop with ORC files? I get an error about a vector not being instantiated when I use ORC data.

MichaelPark
Esri Contributor
> Either option would be good.

Sounds good.  We will probably just add both options anyway.

> Have you ever used the spatial framework for hadoop with ORC files? I get an error about vector not instantiated when I use ORC data.

I have not. Can you create an issue in the GitHub repository above with some information about which Hive/Hadoop versions you are using and the full error message? If you don't have an account there, you can just reply with that information here and I'll create an issue.
