GeoEvent Processor 10.4 Will Not Start

05-12-2016 11:26 AM
IgorBalotsky1
Regular Contributor

I just installed a fresh instance of ArcGIS Server 10.4 and joined it to the cluster. I then installed GeoEvent Processor 10.4 on the same instance, and when I go to log in, I get a "No service was found" message on the Manager login and REST pages. I checked the license and it's fine. I also checked the logs, and these are the errors:

| ERROR | rint Extender: 1 | ConnectionState                  | 198 - curator-client - 2.8.0 | Connection timed out for connection string (GISVM01.MYCOMPANY.COM:2181) and timeout (10000) / elapsed (119714) org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

| ERROR | rint Extender: 1 | ZKPersistenceUtility             | 245 - com.esri.ges.persistence.zookeeper.zk-persistenceutility - 10.4.0 | KeeperErrorCode = ConnectionLoss for /geoevent/config/keys/ageskey.jks

org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /geoevent/config/keys/ageskey.jks

| INFO  | rint Extender: 2 | LicenseChecker                   | 19 - com.esri.ges.framework.i18n.i18n-support - 10.4.0 | License status: GeoEventProcessor: licenseStatusSuccess.

Any ideas? 10.3.1 worked fine until I tried to upgrade.

Thanks,

Igor
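
Both errors in the log excerpt above are connection losses against the host and port in the connection string (port 2181). A minimal sketch, using only the Python standard library, of pulling the host, port, timeout, and elapsed values out of a Curator timeout line and then testing whether a TCP connection to that endpoint can be opened from the GeoEvent machine. The function names are hypothetical, and a successful connect only shows that something is listening on the port, not that the service behind it is healthy.

```python
import re
import socket

# Matches the Curator timeout message format seen in the log excerpt.
_CURATOR_TIMEOUT = re.compile(
    r"connection string \((?P<host>[^:()]+):(?P<port>\d+)\)"
    r" and timeout \((?P<timeout_ms>\d+)\) / elapsed \((?P<elapsed_ms>\d+)\)"
)


def parse_curator_timeout(line):
    """Return host, port, timeout_ms and elapsed_ms from a timeout line, or None."""
    match = _CURATOR_TIMEOUT.search(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "port": int(match.group("port")),
        "timeout_ms": int(match.group("timeout_ms")),
        "elapsed_ms": int(match.group("elapsed_ms")),
    }


def port_is_reachable(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `parse_curator_timeout` on the first error line and passing the resulting host and port to `port_is_reachable` gives a quick yes/no answer on basic network reachability before digging into the GeoEvent configuration itself.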

5 Replies
IrfanClemson
Frequent Contributor

I have the same kind of problem with GEP 10.3.1 in a cluster environment. Whenever one machine is shut down and restarted, I see 'No service found' errors in GEP Manager, and the log file contains entries like these:

m.esri.ges.datastore.agsconnection.AbstractStreamServiceClient Bad response status 400 Bad Request
java.io.IOException: Bad response status 400 Bad Request
    at com.esri.ges.datastore.agsconnection.AbstractStreamServiceClient.getConnection(AbstractStreamServiceClient.java:359)[241:com.esri.ges.framework.datastore.agsconnection-datastore:10.3.1]
    at com.esri.ges.datastore.agsconnection.AbstractStreamServiceClient.setup(AbstractStreamServiceClient.java:311)[241:com.esri.ges.framework.datastore.agsconnection-datastore:10.3.1]
    at com.esri.ges.manager.datastore.agsconnection.AGSConnectionStatusListenerStreamServiceClient.setup(AGSConnectionStatusListenerStreamServiceClient.java:30)[344:com.esri.ges.manager.agsconnectionmanager-api:10.3.1]
    at com.esri.ges.datastore.agsconnection.AbstractStreamServiceClient$1.run(AbstractStreamServiceClient.java:159)[241:com.esri.ges.framework.datastore.agsconnection-datastore:10.3.1]
Sep 8, 2016, 10:27:56 PM INFO
com.esri.ges.datastore.agsconnection.AbstractStreamServiceClient During initialization, an unexpected error has occurred. Cannot communication with the Stream Service 'issmon1f'. Error: 'Bad response status 400 Bad Request'. Sep 8, 2016, 10:27:56 PM ERROR

-------------------------
connection timed out for connection string (XXXXXX:2181) and timeout (30000) / elapsed (42961) org.apache.curator.CuratorConnectionLossException:

------------------

The only 'workaround' for me is to remove the 'WebSocketContextURL' entry in the 'Admin' part of AGS; that gets the GEP publishing to Stream Server going again. Then, when I re-enter the 'WebSocketContextURL' entry, everything runs fine--until the next shut down or reboot.

Am I missing some 'best practice' for GEP clusters, or is this a bug?

Hello ESRI engineers?

AlexanderBrown5
Frequent Contributor

Meengla,

As Mark Bramer points out in this post:

New GeoEvent Setup 

"GeoEvent clustering has not worked out like we hoped and we no longer recommend using it.  If you have a throughput rate greater than what can be handled by a single real-time server, it is certainly possible to set up multiple real-time servers to handle throughput, but you need to really think about things like how to split up your incoming messages, how to handle any stateful scenarios like 'enter' or 'exit' geofence operations or monitoring ongoing incidents, and how to handle event outputs appropriately (i.e. can you afford outputting messages in a non-chronological order?  ...not necessarily an issue if they're time-stamped)?"

We recommend standing up a dedicated ArcGIS Server machine with GeoEvent for real-time deployments. We don't recommend installing GeoEvent on a machine that is part of an ArcGIS Server cluster.
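
One way to approach the message-splitting question raised in the quote is to route every event for a given track to the same server, so that stateful operations such as geofence 'enter' and 'exit' always see that track's events together and in arrival order. The sketch below is a generic illustration, not a GeoEvent feature: the `track_id` field and the server names are hypothetical, and the routing logic is assumed to sit upstream of the GeoEvent inputs.

```python
import hashlib


def pick_server(track_id, servers):
    """Map a track ID to one server, stably across calls and processes."""
    digest = hashlib.sha256(track_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def partition_events(events, servers):
    """Group events by target server, keeping arrival order within each group."""
    buckets = {server: [] for server in servers}
    for event in events:
        buckets[pick_server(event["track_id"], servers)].append(event)
    return buckets
```

Note that with this simple modulo scheme, adding or removing a server remaps most tracks, so any per-track state held on the old server would be lost at that point.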

~Alex

IrfanClemson
Frequent Contributor

Alex,

Thanks for the info. I am kind of relieved that clustering is not being recommended at this time--it's a bit complicated and not living up to the hopes--per that link you provided.

However, I still don't know why restarting any of the machines in the cluster would cause the '400 Bad Request' error. I have a rough idea as to why--here is some info:

I have successfully created a three-machine ArcGIS Server (10.3.1) cluster; all three machines run GEP (10.3.1 with a patch installed). The public-facing machine is a separate machine running IIS with Application Request Routing (ARR), which acts not only as the standard reverse proxy but also as a reverse proxy for WebSocket traffic--I don't think nginx is required on Windows. ArcGIS Server has the WebSocketContextURL pointing to the reverse proxy.

All of that works as expected, I think! But whenever any of these ArcGIS Server machines reboots, I see these '400 Bad Request' errors in the GEP log files. The clustering diagnostic utility for RabbitMQ doesn't find any problem, even after reboots. Here is the error:

com.esri.ges.datastore.agsconnection.AbstractStreamServiceClient During initialization, an unexpected error has occurred. Cannot communication with the Stream Service 'issmon1f'. Error: 'Bad response status 400 Bad Request'. Sep 8, 2016, 10:27:56 PM ERROR

Anyway, I am about to give up on GEP clustering, but I have spent so much time on it that I am curious to know what is happening. I feel like I was almost there: failover and high throughput in an ArcGIS cluster running GEP.

Thanks.

PS. rsunderman-esristaff

Mtclimber03
Regular Contributor

Due to these same issues, I had to switch from using a stream service with GeoEvent to having GeoEvent receive events and add features to a feature service. I then used the ArcToolbox "Create SQL Query Layer" tool to select the max(DATE) per unique ID and published that as a separate service. This workflow kept my AGS machine from running out of resources and still made the "live points" available in a map service that can be consumed online. It was not the ideal workflow I was hoping for, but it let me accomplish what was needed without standing up a separate machine, capable of handling a stream service, just for running GeoEvent.
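
The "max(DATE) by unique ID" selection described above can be sketched as a query that joins the table to its own per-ID maximum timestamp. This is a minimal illustration using Python's built-in sqlite3 module, not the poster's actual query layer: the table name `vehicle_positions` and the columns `track_id`, `obs_time`, `x`, and `y` are hypothetical, and the SQL dialect of an enterprise geodatabase may differ slightly.

```python
import sqlite3

# Latest row per track: join each row to that track's maximum timestamp.
LATEST_PER_TRACK_SQL = """
SELECT p.track_id, p.obs_time, p.x, p.y
FROM vehicle_positions AS p
JOIN (
    SELECT track_id, MAX(obs_time) AS max_time
    FROM vehicle_positions
    GROUP BY track_id
) AS latest
  ON latest.track_id = p.track_id
 AND latest.max_time = p.obs_time
ORDER BY p.track_id
"""


def build_demo_db():
    """Create an in-memory table with a few made-up position reports."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vehicle_positions "
        "(track_id TEXT, obs_time TEXT, x REAL, y REAL)"
    )
    conn.executemany(
        "INSERT INTO vehicle_positions VALUES (?, ?, ?, ?)",
        [
            ("bus-1", "2016-09-08T22:00:00", -82.8, 34.6),
            ("bus-1", "2016-09-08T22:05:00", -82.9, 34.7),
            ("bus-2", "2016-09-08T21:59:00", -81.0, 34.0),
        ],
    )
    return conn


def latest_positions(conn):
    """Return one row per track_id: the most recent observation."""
    return conn.execute(LATEST_PER_TRACK_SQL).fetchall()
```

The timestamps here are ISO 8601 strings, which sort correctly as text; with a true date column the same query shape applies.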

MarcGraham1
Frequent Contributor

Irfan Tak, thanks for the note above about removing and re-adding the WebSocketContextURL. I had the same problem and this helped me!