Data lost when using the "Add a feature" output

Discussion created by MBramer-esristaff Employee on Aug 16, 2013
Latest reply on May 6, 2014 by rsunderman-esristaff

A colleague and I are seeing behavior where messages seem to be getting "dropped" when using the "Add a feature" output.  We can't yet say for certain that this is what's really going on, because we don't know where in the pipeline the loss occurs (GEP not sending all data to AGS? AGS not committing all data sent to it? SQL Server? Somewhere else?).  We do not see any errors in either GEP's logs or ArcGIS Server's logs.  Here are the details:

We're using a TCP/text input to receive CSV data.  The geometry in the messages is POLYGON data in JSON format.  We have one GeoEvent service with this input that passes through a filter allowing only one message type, then outputs to a feature service.  We have a second service that is identical except that it has a processor between the filter and the output.  The feature service is on an ArcGIS Server instance on the same machine, and points to a feature layer hosted in SQL Server 2008 R2 (64-bit) on a different machine.  The layer is POLYGON, not point.  This is very important to note, as I suspect 99.99% of GEP instances are working with points.

We are using the simulator packaged with GEP to send the same record to GEP over and over, just to test throughput.  The message definition is pretty ordinary: about 20 fields, all string except for a date field and a geometry field.  One of the string fields is tagged with TrackID, the date field is tagged with both TIME_START and TIME_END, and the geometry field is tagged with GEOMETRY.

For each test, we truncated the SQL Server table, reset statistics on GEP Manager's Monitor page, and started the simulator(s).  The table below lists, for each test: how many simulators were used and at what rate messages were sent ("Simulator Settings"); the data flow ("GEP Settings"); how many features Monitor displayed as processed ("GEP Monitor"); the number of records in the database table after the simulators were stopped ("DB Records"); and the number of messages not successfully written to the database table ("LOSS", the difference between GEP Monitor and DB Records).  In every test, Monitor reported the same count for the TCP input, the GeoEvent service (in and out), and the feature service output, so the single number in the "GEP Monitor" column applies to all of them.

Here are the specs and results on each test:

Simulator Settings        | GEP Settings                                    | GEP Monitor (features) | DB Records | LOSS
--------------------------+-------------------------------------------------+------------------------+------------+-----
2 simulators, 5 msgs/sec  | tcp->filter->processor->fs output, 1 sec insert | 2070                   | 2065       | 5
2 simulators, 5 msgs/sec  | tcp->filter->processor->fs output, 1 sec insert | 2690                   | 2678       | 12
2 simulators, 5 msgs/sec  | tcp->filter->processor->fs output, 1 sec insert | 3395                   | 3386       | 9
2 simulators, 5 msgs/sec  | tcp->filter->processor->fs output, 3 sec insert | 2125                   | 2123       | 2
2 simulators, 5 msgs/sec  | tcp->filter->processor->fs output, 3 sec insert | 2655                   | 2651       | 4
2 simulators, 5 msgs/sec  | tcp->filter->processor->fs output, 3 sec insert | 3475                   | 3469       | 6
2 simulators, 5 msgs/sec  | tcp->filter->processor->fs output, 6 sec insert | 2045                   | 2040       | 5
2 simulators, 25 msgs/sec | tcp->filter->fs output, 6 sec insert            | 2225                   | 2220       | 5
2 simulators, 5 msgs/sec  | tcp->filter->fs output, 6 sec insert            | 2690                   | 2688       | 2
2 simulators, 25 msgs/sec | tcp->filter->fs output, 12 sec insert           | 2250                   | 2235       | 15
2 simulators, 25 msgs/sec | tcp->filter->fs output, 12 sec insert           | 2425                   | 2419       | 6
2 simulators, 25 msgs/sec | tcp->filter->fs output, 3 sec insert            | 2300                   | 2293       | 7
1 simulator, 25 msgs/sec  | tcp->filter->fs output, 1 sec insert            | 2075                   | 2065       | 10

Based on these numbers, no clear pattern stands out.  One might deduce that a 3 to 6 second feature service output update interval is best, but a) we can't say this definitively, and more importantly, b) we're still losing records in these scenarios.  As far as GEP alone is concerned, based on what Monitor is reporting, GEP has received, processed, and output everything just fine.
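To make the comparison a bit less eyeball-driven, here is a minimal Python sketch that totals the loss per insert interval using the figures from our thirteen runs.  The tuple layout and the function name `loss_by_interval` are just my own choices for illustration; nothing here comes from GEP itself.

```python
from collections import defaultdict

# One tuple per test run, copied from the results table:
# (insert_interval_sec, gep_monitor_count, db_record_count)
TESTS = [
    (1, 2070, 2065), (1, 2690, 2678), (1, 3395, 3386),
    (3, 2125, 2123), (3, 2655, 2651), (3, 3475, 3469),
    (6, 2045, 2040), (6, 2225, 2220), (6, 2690, 2688),
    (12, 2250, 2235), (12, 2425, 2419),
    (3, 2300, 2293), (1, 2075, 2065),
]


def loss_by_interval(tests):
    """Return {interval: (processed, lost, lost_percent)} summed over runs."""
    totals = defaultdict(lambda: [0, 0])  # interval -> [processed, lost]
    for interval, monitor, db in tests:
        totals[interval][0] += monitor
        totals[interval][1] += monitor - db  # LOSS = GEP Monitor - DB Records
    return {
        interval: (processed, lost, 100.0 * lost / processed)
        for interval, (processed, lost) in sorted(totals.items())
    }


if __name__ == "__main__":
    for interval, (processed, lost, pct) in loss_by_interval(TESTS).items():
        print(f"{interval:>2} sec insert: {lost} of {processed} lost ({pct:.2f}%)")
```

Summed this way, the runs lose 36 of 10230 at the 1 second interval, 19 of 10555 at 3 seconds, 12 of 6960 at 6 seconds, and 21 of 4675 at 12 seconds (88 of 32420 overall).  With so few runs per setting, and with the message rate and the presence of the processor also changing between runs, I would not read much into the differences.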


Is there a definitive way to confirm GEP is successfully receiving/processing/outputting everything, other than what Monitor is reporting? 
Or is Monitor a very trustworthy tool for determining that GEP has received/processed/output everything sent to it? 
What ways exist to track down where the message "dropping" is occurring (especially given there are no errors in GEP or AGS logs)?
Is it possible that GEP is successfully passing on everything, but ArcGIS Server is not successfully writing all records?
Could this be a SQL Server thing?
What other places in the entire cosmos of this data flow (not just within GEP) are potential suspects?
Is there any flaw in our methodology that could explain the discrepancy between what GEP reports and what's written to the DB?
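
On the methodology question: because the simulator replays one identical record, we currently cannot tell which messages went missing, only how many.  Below is a minimal Python sketch of the comparison we could run if each message carried a unique sequence number in one of its string fields.  That sequence field is an assumption (our test data does not have one today), and the function names are hypothetical.  The list of stored IDs would come from a query against the SQL Server table, which is not shown here.

```python
def missing_sequences(sent_count, stored_ids, start=1):
    """Return the sorted sequence numbers that were sent but never stored.

    sent_count: number of messages sent, numbered start..start+sent_count-1
    stored_ids: iterable of sequence numbers read back from the database
    """
    expected = set(range(start, start + sent_count))
    return sorted(expected - set(stored_ids))


def gap_runs(missing):
    """Collapse sorted missing numbers into (first, last) runs.

    Long runs would suggest a whole batch was dropped at once;
    isolated single numbers would suggest scattered loss.
    """
    runs = []
    for seq in missing:
        if runs and seq == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], seq)  # extend the current run
        else:
            runs.append((seq, seq))  # start a new run
    return runs


if __name__ == "__main__":
    # Made-up example: 10 messages sent, 7 found in the table.
    missing = missing_sequences(10, [1, 2, 3, 5, 6, 9, 10])
    print(missing)
    print(gap_runs(missing))
```

If the missing numbers cluster into runs whose boundaries line up with the output's insert interval, that would point at the batch insert step rather than at the input side.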