I have a link that I use to get a feed from a site. I pass a date in that link, and it returns the data from that point forward. The issue is that every time the input connector runs, it gets the complete data starting from the date in the query.
Is there a way to get only the new/updated records rather than the whole set? Is it possible for a GeoEvent input connector to check for and retrieve only the new data, and not the full load?
Hello Sardar –
The Poll an External Website for JSON input connector does indeed have a property ‘Receive New Data Only’ … but the property does not provide the behavior I think many folks are looking for.
When you make a request on a site’s endpoint, unless the site is configured to accept a parameter as part of your request specifying the data you want, the only way GeoEvent has to determine if the site has “new” data is to look for a specific header value in the headers returned in the response from the external server.
This “Last-Modified” header (described in Section 14.29 of RFC 2616, under “Header Field Definitions” of the HTTP/1.1 protocol) is what GeoEvent checks. When its value is later than the value GeoEvent has cached, GeoEvent goes ahead and parses the response from the server, as there is new information in the response.
From the spec:
The Last-Modified entity-header field indicates the date and time at which the origin server believes the variant was last modified.
Last-Modified = "Last-Modified" ":" HTTP-date
An example of its use is
Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
The exact meaning of this header field depends on the implementation of the origin server and the nature of the original resource. For files, it may be just the file system last-modified time. For entities with dynamically included parts, it may be the most recent of the set of last-modify times for its component parts. For database gateways, it may be the last-update time stamp of the record. For virtual objects, it may be the last time the internal state changed.
What "Last-Modified" does not do is key in on the specific records which have changed. For example, a site serving information on a dozen different active wildfires might update a few of its wildfire event records and then indicate that updates have been made by setting its “Last-Modified” to the date/time the updates were committed. When GeoEvent polls, it will see that the "Last-Modified" date/time is later than the value it has cached and will ingest the event data returned from the site (all of the wildfire records in this case, not just the few that changed).
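To make the all-or-nothing nature of that check concrete, here is a minimal Python sketch of the comparison described above. It is not GeoEvent's actual implementation; the function name `has_new_data` and the choice to ingest when no header or cached value exists are assumptions made for illustration.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def has_new_data(last_modified_header: Optional[str],
                 cached: Optional[datetime]) -> bool:
    """Decide whether the WHOLE response should be ingested (hypothetical helper)."""
    if last_modified_header is None or cached is None:
        # Assumption: with nothing to compare against, treat the response as new.
        return True
    # Parse the HTTP-date from the header and compare it to the cached value.
    return parsedate_to_datetime(last_modified_header) > cached


# Value remembered from an earlier poll (the example date from the spec).
cached = parsedate_to_datetime("Tue, 15 Nov 1994 12:45:26 GMT")
```

Note that the result is a single yes/no for the entire response. Nothing in the header identifies which wildfire records changed, so a "yes" means every record in the response is ingested.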
The Poll an ArcGIS Server for Features input, on the other hand, is a little more flexible. That input also has a property for polling only for feature records recently added or updated, but in this case GeoEvent is able to work with the Esri Feature Service to incorporate a WHERE clause in its request in order to receive only the specific records which satisfy the WHERE clause. This is not functionality provided by the input you’re using, which makes a more general request to an external server whose interface GeoEvent knows much less about.
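As a rough illustration of what a record-level request looks like, the sketch below builds a feature service query URL with a WHERE clause on a date field. This is not necessarily the request GeoEvent itself sends. The layer URL and the `last_edited` field name are hypothetical, and the `TIMESTAMP '...'` literal is one common way to express dates in such a clause that may need adjusting for the underlying data store.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_query_url(layer_url: str, date_field: str, since: datetime) -> str:
    """Build a query URL asking only for records changed after `since`."""
    # Format the cutoff in UTC as a SQL-style timestamp literal.
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    params = {
        "where": f"{date_field} > TIMESTAMP '{stamp}'",
        "outFields": "*",
        "f": "json",
    }
    return f"{layer_url}/query?{urlencode(params)}"


# Hypothetical layer URL and field name, for illustration only.
url = build_query_url(
    "https://example.com/arcgis/rest/services/Fires/FeatureServer/0",
    "last_edited",
    datetime(2016, 1, 1, tzinfo=timezone.utc),
)
```

The key difference from the "Last-Modified" check is that the server does the filtering, so only the matching records come back.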
It is difficult in the 10.3.x / 10.4.x releases to create a filter expression to compare an event’s date/time against another date/time value. You might be able to implement a workaround by handling the date/time values received from the external server as Long Integer values (in epoch milliseconds) rather than as Date values. I’ve included a couple of references below to discussion threads which mention this. Generally, however, using a GeoEvent Service to detect and discard “old” event data is not recommended. GeoEvent assumes that all of the data either sent to an input via HTTP/POST or polled from an external server is “live” and should be processed as current real-time values.
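The Long Integer workaround mentioned above comes down to a plain numeric comparison. Here is a small Python sketch of the idea, outside of GeoEvent; the helper names `to_epoch_millis` and `is_newer` are hypothetical and only show how a date/time maps to epoch milliseconds and how two such values compare.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def is_newer(event_millis: int, threshold_millis: int) -> bool:
    """Once both values are long integers, 'newer' is a simple greater-than."""
    return event_millis > threshold_millis


threshold = to_epoch_millis(datetime(2016, 1, 1, tzinfo=timezone.utc))
event_time = to_epoch_millis(datetime(2016, 1, 2, tzinfo=timezone.utc))
```

Here `is_newer(event_time, threshold)` is true because the event is one day (86,400,000 ms) past the threshold.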