Notebook server spatial dataframe date from query result parsed wrong

946
4
Jump to solution
02-02-2021 11:30 PM
Labels (1)
JoëlHempenius3
Occasional Contributor II

I'm currently testing a Notebook server standard 10.8.1 and see some strange behavior on date fields in the spatial data frame.

I get a layer from my ArcGIS Enterprise:

JoëlHempenius3_0-1612336963824.png

I do a query with a geometry filter on that layer and specify the out fields

JoëlHempenius3_1-1612337009286.png

And from the result I get the spatial data frame:

JoëlHempenius3_2-1612337102018.png

Note that all dates in DATUM_AANLEG column are close to unix epoch time.

When I get feature with OBJECTID 17296550

JoëlHempenius3_3-1612337205274.png

and I convert the date manually, (javascript console new Date(1403748000000) ) I get Thu Jun 26 2014 04:00:00 GMT+0200 (Midden-Europese zomertijd), which is the correct date.

Clearly, something goes wrong when querying the service and converting the results to the spatial data frame. But I do not see where and how I can fix it.

Any ideas?

 

 

-Joël Hempenius.

Languages: JavaScript, Python and Dunglish
0 Kudos
1 Solution

Accepted Solutions
jcarlson
MVP Esteemed Contributor

That's very odd. There's something else going on, it's not just you being new to Pandas.

Here are some other suggestions:

Using "as_df=True"

In the initial query, include "as_df=True", rather than subsequently calling ".sdf" on the featureset. Then your query comes out directly into a dataframe. Saves you a step and an intermediate output as well, even if it doesn't solve the problem.

I've noticed a few differences between the two methods of generating a spatially-enabled dataframes, and I use both, depending on the situation.

 

Get the Integer First

It seems it's converting your field to a datetime64[ns] at the initial conversion to a dataframe.

If we're starting form a datetime instead of an integer, converting that to other datetime types would change the underlying integer value, but remain representative of the same timestamp.

So what we need to do is get the integer that it thinks are in nanoseconds, and call "astype" on that. It seems rather inelegant, but as long as we get the right data in the end, I suppose.

spdf['DATUM_AANLEG'].astype('int').astype('datetime64[ms]')

 

Doing some Math

If that second suggestion doesn't work, you could try to convert your timestamps into nanosecond integers. 1 ms = 1 * 10^6 ns.

(spdf['DATUM_AANLEG'].astype('int') * 10**6).astype('datetime64[ns]')

 

I hope at least one of those things works!

- Josh Carlson
Kendall County GIS

View solution in original post

0 Kudos
4 Replies
jcarlson
MVP Esteemed Contributor

I believe the issue here is that pandas infers timestamps to be in nanoseconds, according to the docs.

To address this, you'll need to explicitly tell pandas that your timestamps are in milliseconds.

Consider the following code block and its output:

rng = np.random.default_rng()

df = pd.DataFrame({'dates':rng.integers(low=1403700000000, high=1403800000000, size=20)})

df.assign(
    nanoseconds=df['dates'].astype('datetime64[ns]'),
    milliseconds=df['dates'].astype('datetime64[ms]')    
)

jcarlson_0-1612363507566.png

Looks familiar!

In your code, just append ".astype({'DATUM_AANLEG':'datetime64[ms]'})" after "sdf", and that should get you on the right track.

 

- Josh Carlson
Kendall County GIS
0 Kudos
JoëlHempenius3
Occasional Contributor II

Hi Josh,

Thanks for your response.
I tried your suggestion this way:

JoëlHempenius3_0-1612369120825.png

But that didn't make any difference. I'm not sure whether I did that right.

I also tried this:

JoëlHempenius3_1-1612369317167.png

but that doesn't fix it either.

And finally:

JoëlHempenius3_3-1612369684178.png

It clearly shows though that I'm new to panda's and don't know what I'm doing 🐼

 

Your code sample looks fine on my Notebook server

JoëlHempenius3_2-1612369427777.png

 

Any other suggestions?


 

 

-Joël Hempenius.

Languages: JavaScript, Python and Dunglish
0 Kudos
jcarlson
MVP Esteemed Contributor

That's very odd. There's something else going on, it's not just you being new to Pandas.

Here are some other suggestions:

Using "as_df=True"

In the initial query, include "as_df=True", rather than subsequently calling ".sdf" on the featureset. Then your query comes out directly into a dataframe. Saves you a step and an intermediate output as well, even if it doesn't solve the problem.

I've noticed a few differences between the two methods of generating a spatially-enabled dataframes, and I use both, depending on the situation.

 

Get the Integer First

It seems it's converting your field to a datetime64[ns] at the initial conversion to a dataframe.

If we're starting form a datetime instead of an integer, converting that to other datetime types would change the underlying integer value, but remain representative of the same timestamp.

So what we need to do is get the integer that it thinks are in nanoseconds, and call "astype" on that. It seems rather inelegant, but as long as we get the right data in the end, I suppose.

spdf['DATUM_AANLEG'].astype('int').astype('datetime64[ms]')

 

Doing some Math

If that second suggestion doesn't work, you could try to convert your timestamps into nanosecond integers. 1 ms = 1 * 10^6 ns.

(spdf['DATUM_AANLEG'].astype('int') * 10**6).astype('datetime64[ns]')

 

I hope at least one of those things works!

- Josh Carlson
Kendall County GIS
0 Kudos
JoëlHempenius3
Occasional Contributor II

I think I will contact my distributor, because "as_df" doesn't work:

JoëlHempenius3_0-1612425151505.png

Your second suggestion also resulted in an exception.

But finally:

JoëlHempenius3_1-1612425321269.png

Still going to contact my distributor, because it shouldn't be so difficult.

Thanks for all your suggestion Josh

-Joël Hempenius.

Languages: JavaScript, Python and Dunglish