Watching for new CSV Files

05-13-2015 01:10 PM
BrianLocke
Occasional Contributor II

We are watching a folder for CSV files, and occasionally a few of them are not handled. Also, in order for GEP to see a new file, do we have to rename it? We get the CSV files from a service we wrote that downloads each file and saves it with a GUID as its name.

Any ideas?

RJSunderman
Esri Regular Contributor

Hello Brian –

Yes, one of the limitations of the File transport used by the ‘Watch a Folder for New CSV Files’ inbound connector is that it does not monitor file size or the date/time a file was last modified. The inbound connector will cache the name of any file it has previously read and will not re-read the contents of a file with that filename. This is by design; the connector does this to prevent ingesting event data which has already been processed.

Edit:  23-Jan-2018

Behavior for the ‘Watch a Folder for New CSV Files’ inbound connector was changed at 10.5.1 to no longer require that a file’s name be changed for the input to consider it a new file. The mechanism watching the folder for new files still does not consider file properties such as changes to a file’s “last updated” timestamp or file size. However, if you want an input to re-read files you’ve placed in a folder, you can simply stop and restart your input connector; each file will be re-read and its content processed as newly received event records.

The work your service is doing to download event content as a system file, uniquely name that file using a GUID, and place it in a system folder GeoEvent is watching should be sufficient to work around this limitation.

We have an item open in our product backlog related to watching a system folder for JSON files: files larger than about 5100 bytes are not ingested by the inbound connector watching the folder. The root issue may be similar to what you are observing, where some CSV files your service has copied to a system folder are not being picked up by GeoEvent.

There are several problems inherent when working with file-based input. One example is that GeoEvent may attempt to acquire a handle to a file before an external process is finished writing the file to disk. Another example is related to the issue above; GeoEvent needs to read the contents of a file into memory in “chunks” and may encounter a problem emulating a data stream when retrieving blocks of event data from a system file.
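One general way to reduce the risk of the first problem (a reader opening a file that is still being written) is to have the downloading service write each file somewhere else first and only move it into the watched folder once it is complete. The sketch below is an illustration of that pattern in Python, not something GeoEvent provides or requires, and it has not been verified to resolve the skipped-file behavior described in this thread. The function name `deliver_csv` and the `staging_dir` and `watched_dir` parameters are hypothetical; the staging folder is assumed to be on the same volume as the watched folder so that the final move is a rename rather than a copy.

```python
import os
import tempfile
import uuid


def deliver_csv(content: bytes, watched_dir: str, staging_dir: str) -> str:
    """Write content to a staging file, then move it into watched_dir in one step.

    Hypothetical helper: staging_dir is assumed to be on the same volume as
    watched_dir so that os.replace performs a rename instead of a copy.
    """
    os.makedirs(staging_dir, exist_ok=True)
    # Write the full payload outside the watched folder first.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the move
        # Name the finished file with a GUID, as the service in this thread does.
        final_path = os.path.join(watched_dir, f"{uuid.uuid4()}.csv")
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a partial file behind if anything fails.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

With this arrangement the watched folder only ever sees a file name appear once the file's content is fully written, so a watcher that opens files as soon as they appear should not catch one mid-write.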

Inputs which watch a system folder for files are generally intended to prove that event filtering and real-time analytics you have designed into a GeoEvent Service behave as you intend. They give you the ability to repeatedly send a small sample of event data to GeoEvent to test the behavior of your GeoEvent Service before transitioning to a production configuration in which real-time data feeds arrive via a stream (e.g. as an HTTP/POST from an external server, or as a reply to a query you make on an external server's URL).

Are you able to identify specific CSV file content which GeoEvent is not retrieving? Is GeoEvent ever able to successfully retrieve data from those files? Or is it only occasionally skipping a file, and if you were to stop and restart the ‘Watch a Folder for New CSV Files’ input it would read the file the “second” time?

If you can consistently reproduce the issue, please open an incident with Esri Technical Support. It would be helpful if you could provide samples of the data or help Tech Support reproduce the issue so that we can determine if you’re encountering a known limitation of the GeoEvent File transport or if there’s a bug we can address.

Hope this information helps –

RJ