Does "watch a folder for a new csv/json file" input connector not support UTF-8 encoding?

08-15-2019 08:44 PM
ShengrongLiu
New Contributor II

@RJ Sunderman   @Earl Medin 

I  use  "wath a folder for a new csv/json file " input connector to ingest the real time data , and find that it works well in files that do not contain chinese characters,  if I change some field names or field value to chinese characters in the same file and it does't work, even though I have changed the file's encoding to "UTF-8". so I want to know that does these input connector support UTF-8?

Another question about this connector: what format does the Expected Date Format parameter convert the datetime to? I set it to "yyyy/MM/dd hh:mm:ss", the input datetime is "2019/7/28 9:12", and the output datetime becomes "2019-07-28T09:12:00.000+08:00" (written to a CSV file output). It seems it changes the datetime to UTC+8? It then looks like "7/28/2019, 09:12 AM" in the attribute table in the portal viewer, where the feature layer is published by GeoEvent with an "Add a Feature to a Spatiotemporal Big Data Store" output connector.

I am a little confused about these datetimes. Do they all mean the same time? Is there any method to keep the three of them expressed the same way?

thank you!

Stefan_Jung
Esri Contributor

Hi Shengrong Liu,

We are also struggling with the encoding, and we also have text files that contain special characters. The first hint I got was to set the 'Is File Text' parameter of the input connector to false.

The file transport has an 'Is File Text' parameter which, when set to 'True' (the default), limits data to single-byte character sets (typically the ASCII character set 0x0020 - 0x007e). The error is misleading, as the file's data *is* UTF-8 encoded. Multi-byte character sets can be UTF-8 encoded, but GeoEvent's FILE transport only accepts single-byte character sets. The request is to enhance the FILE transport to work with multi-byte character sets (and correct the error message suggesting the problem is with UTF-8 encoding).
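To see why a single-byte restriction breaks such files even though they are valid UTF-8, here is a small Java illustration (not GeoEvent code; "温度", a Chinese word for temperature, is just an example field name):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    public static void main(String[] args) {
        // Every character in this string is in the single-byte ASCII range.
        String ascii = "temperature";
        // A Chinese field name encodes to multiple bytes per character in UTF-8.
        String chinese = "温度";

        System.out.println(ascii.length() + " chars -> "
                + ascii.getBytes(StandardCharsets.UTF_8).length + " UTF-8 bytes");   // 11 -> 11
        System.out.println(chinese.length() + " chars -> "
                + chinese.getBytes(StandardCharsets.UTF_8).length + " UTF-8 bytes"); // 2 -> 6
    }
}
```

A transport that assumes one byte per character misreads those three-byte sequences, which matches the behavior described above.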

After changing this parameter, this error is gone:

Could not decode the incoming buffer. Expected UTF-8 encoding. Make sure your input is in UTF-8 Encoding.

But it still does not look like the input connector is able to interpret the complete text file. You should test it; maybe it works for you.

Other workarounds might be to use a different transport, like sending the data via TCP or HTTP POST.
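For the HTTP route, a minimal Java sketch of what the sender side could look like, assuming an input connector that receives JSON over REST is configured (the URL and file name below are placeholders, not real endpoints):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PostUtf8Json {
    public static void main(String[] args) throws Exception {
        // Read the file explicitly as UTF-8 text; HTTP carries the bytes through unchanged.
        String json = Files.readString(Path.of("sensor-data.json"), StandardCharsets.UTF_8);

        // Placeholder URL -- substitute the endpoint exposed by your own REST input.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://geoevent.example.com:6143/geoevent/rest/receiver/my-json-in"))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
    }
}
```

The point of this route is that the character encoding is declared explicitly in the Content-Type header instead of being guessed by a file reader.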

Best,

Stefan

ShengrongLiu
New Contributor II

Hi Stefan P. Jung,

Thanks for your prompt reply. I have tried what you said and set the 'Is File Text' parameter to false. Yes, it seems the input connector can read the file (the input's count changes from 0 to 633), but the in and out counts of the GeoEvent Service are still 0, just as you said: "But it still does not look like the input connector is able to interpret the complete text file." And I found a warning (not an error) in the GeoEvent log like this:

com.esri.ges.fabric.core.ZKSerializer - Cannot unmarshal to class: class com.esri.ges.jaxb.stream.StreamWrapper

I will try another way. Thank you again!

RJSunderman
Esri Regular Contributor

Hello Shengrong Liu,

Stefan P. Jung‌ is correct. There is an issue against the FILE inbound transport both to support multi-byte character sets and to provide a better error message than one suggesting that the problem is with the UTF-8 character encoding. The suggested workaround is to have the inbound transport treat the file as non-text (setting 'Is File Text' to 'False') ... or to use a different transport like HTTP.

You indicated that the input's event count increases when the file is read as non-text ... but the In/Out event counts of the GeoEvent Service do not change. I would try copying the input to create a new instance and then incorporating that new instance in a new GeoEvent Service with no processors or filters, routing the inbound file's content directly to a 'Write to a JSON File' output. What I'm looking for here is whether there is a problem with the existing input instance such that an underlying Kafka topic / consumer is not recognizing that an inbound connector has successfully received and adapted event data. A GeoEvent Service should at least consume data adapted by one of its inputs, so the 'In' event count should increase even if there's some other problem such that no event records actually pass 'Out' from the GeoEvent Service.

For your last question, on how an input's Expected Date Format parameter works and what a "date" will be converted to – I would say that when a GeoEvent Definition specifies that a field is a Date, an inbound adapter will attempt to use the pattern in the Expected Date Format parameter to interpret a string and create a Date. In this context a Date is a Java primitive type, like Long, String, or Double.
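Here is roughly what that string-to-Date step looks like in plain Java (an illustration only, not GeoEvent's actual internal code). Note that SimpleDateFormat requires the full pattern, so seconds are written out here, and that parsing always assumes a time zone; to mirror the +08:00 in your CSV output, it is pinned to UTC+8:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ExpectedDateFormatDemo {
    public static void main(String[] args) throws Exception {
        // The same pattern used in the input's Expected Date Format parameter.
        SimpleDateFormat in = new SimpleDateFormat("yyyy/MM/dd hh:mm:ss");
        // Parsing otherwise assumes the machine's local zone; the UTC+8
        // assumption is made explicit here.
        in.setTimeZone(TimeZone.getTimeZone("Asia/Shanghai"));

        Date d = in.parse("2019/07/28 09:12:00");

        // Internally the Date is just epoch milliseconds -- one unambiguous instant.
        System.out.println(d.getTime()); // 1564276320000
    }
}
```

The Date object itself carries no format; patterns only matter when converting to or from strings, which is why different outputs can render the same instant differently.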

You might want to take a look at a blog post I published: https://community.esri.com/community/gis/enterprise-gis/geoevent/blog/2019/03/14/what-time-is-it-wel...
A date can be represented in several different ways:

  • As a verbose string ... "Thursday, August 22, 2019 3:30:00 AM (GMT)"
  • As an ISO 8601 value ... "2019-08-21T20:30:00-07:00"
  • As a long integer in epoch milliseconds ... 1566444600000

All three representations above are the exact same date/time. When using a 'Write to a CSV File' output, the default is to represent the time as an ISO 8601 formatted string. The +/- 0:00 at the end of the string tells you how many hours the value has been offset from UTC. When adding or updating feature records in a geodatabase like the spatiotemporal big data store, Date values are stored as epoch long integer values in milliseconds.
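If it helps, here is a small java.time sketch (my own illustration, not from the blog) confirming that the three representations above resolve to the same instant:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class SameInstantThreeWays {
    public static void main(String[] args) {
        // The epoch value from the list above -- one unambiguous instant.
        Instant instant = Instant.ofEpochMilli(1566444600000L);

        // Verbose string, rendered in GMT.
        DateTimeFormatter verbose = DateTimeFormatter
                .ofPattern("EEEE, MMMM d, yyyy h:mm:ss a '(GMT)'", Locale.ENGLISH);
        System.out.println(verbose.format(instant.atZone(ZoneId.of("GMT"))));
        // Thursday, August 22, 2019 3:30:00 AM (GMT)

        // ISO 8601 string, offset to UTC-7 (toString omits the :00 seconds).
        System.out.println(instant.atOffset(ZoneOffset.ofHours(-7)));
        // 2019-08-21T20:30-07:00

        // Epoch milliseconds.
        System.out.println(instant.toEpochMilli()); // 1566444600000
    }
}
```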

When you click a feature in a web map it is up to the client how to represent the date value. You should be aware that when an Expected Date Format pattern is specified, it is likely that the date value will be considered a local date/time value. This also likely means that a web application, when it queries the date value from the geodatabase, will assume that it is receiving a UTC value and offset the value for you to a local date/time before representing it as a string. I mention this so that you will look for the possibility that a date you read from an input file is not represented as the expected date value once it is (a) written out to a feature record and (b) queried by a client and converted to a string for display in a pop-up dialog.
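To make the kind of shift to look for concrete, here is a short sketch using the same plain-Java parsing as above. The only assumption is a machine whose local zone is UTC+8: the same input text yields instants eight hours apart depending on which zone the parser assumes.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class ZoneAssumptionDemo {
    public static void main(String[] args) throws ParseException {
        String value = "2019/07/28 09:12:00";
        String pattern = "yyyy/MM/dd hh:mm:ss";

        // Interpreted as a local (UTC+8) wall-clock time...
        SimpleDateFormat local = new SimpleDateFormat(pattern);
        local.setTimeZone(TimeZone.getTimeZone("Asia/Shanghai"));
        long asLocal = local.parse(value).getTime(); // 1564276320000

        // ...versus interpreted as UTC.
        SimpleDateFormat utc = new SimpleDateFormat(pattern);
        utc.setTimeZone(TimeZone.getTimeZone("UTC"));
        long asUtc = utc.parse(value).getTime();     // 1564305120000

        // Same text, two different instants -- the discrepancy to check for
        // when a stored date displays unexpectedly in a client.
        System.out.println((asUtc - asLocal) / 3600000.0 + " hours"); // 8.0 hours
    }
}
```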

Hope this information is helpful –
RJ
