I sent the following to one of our contractors today. The information on configuring SSL certificates, administrative tips for multi-machine deployments following a 'site' model, and things to check when GeoEvent Server fails to load its ArcGIS Server's configured certificates and instead uses its own SelfSignedCertificate might be of more general use, so I'll leave this here in case it helps someone working with GeoEvent Server deployments.
With a multi-machine ‘site’ configuration it is critical that all machines trust one another. That means that not only do I have to configure an SSL certificate on Box#1 and configure that machine’s ArcGIS Server to use that certificate as its Web Server Certificate … I have to import certificates for Box#2, Box#3, … Box#N into the ArcGIS Server so that it trusts all the other machines participating in the site. I have to do this “fan-out” on every server, setting *that* server’s Web Server Certificate and importing certificates from all the *other* machines onto that server.
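To make the “fan-out” concrete, here is a minimal sketch that only prints a checklist; it does not touch ArcGIS Server. The host names are hypothetical, and the function name `fan_out_plan` is mine, not part of any Esri tooling.

```python
def fan_out_plan(machines):
    """For each machine, name its own certificate and the peers whose certificates it must import."""
    plan = {}
    for machine in machines:
        plan[machine] = {
            # The certificate this machine's ArcGIS Server uses as its Web Server Certificate.
            "web_server_certificate": machine,
            # Certificates from every *other* machine get imported so this one trusts them.
            "import_from": [other for other in machines if other != machine],
        }
    return plan


# Hypothetical three-machine site.
plan = fan_out_plan(["box1.example.com", "box2.example.com", "box3.example.com"])
for machine, tasks in plan.items():
    print(machine, "-> import certificates from:", ", ".join(tasks["import_from"]))
```

For N machines this works out to N × (N − 1) imports, which is why the work grows quickly as machines are added to a site.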
I’ve captured what I do that works for me when setting up a couple of machines. But to be honest, SSL certificate configuration is not something I understand at a deep, technical level. Likely there is a “better” way of doing what I propose in the attached, maybe using a wild-card certificate, but I don’t know how to set that up.
I’d also like to break the problem you’re seeing into two pieces. The first being SSL certificate configuration, for which I’ll capture some screenshots (see attached PDF). The second piece involves things I look at when GeoEvent Server seems unable to locate and load the certificates its ArcGIS Server is configured to use.
The second part probably has more to do with why GeoEvent Server completes a fail over to use its SelfSignedCertificate rather than the certificate its ArcGIS Server is configured to use. I’ll apologize if anything I share is overly pedestrian … like I said, SSL certificates are not my cup of tea, so all I can do is show you what works for me and hope that your experience will allow you to iterate and adapt what I have to share.
The first part, SSL certificate configuration, is attached.
For the second part … I would caution against opening the Java Keystore using a tool like keytool. I’ve watched developers do this, but I’ve never seen administratively editing the JKS do anything to resolve a problem. GeoEvent Server, when it launches for the first time, interrogates its ArcGIS Server for information on its site and SSL certificates. If you would like to see some evidence for this, you can request DEBUG logging on the com.esri.ges.security.arcgis.sslconfig GeoEvent Server logger component. GeoEvent Server will attempt to copy the certificate configuration of the ArcGIS Server it is running beneath. If GeoEvent Server cannot obtain the certificates from the ArcGIS Server configuration, it will fail over to use its own SelfSignedCertificate. The fail over is intended to at least allow GeoEvent Server to complete its startup – but if GeoEvent Server does not trust machines the same way as its ArcGIS Server does, lots of stuff is probably not going to work.
By the way, it is precisely because GeoEvent Server interrogates its ArcGIS Server for information that it is best to have your ArcGIS Enterprise (Portal for ArcGIS, hosting ArcGIS Server, ArcGIS Data Store) fully configured with a site created, federated and all SSL certificates configured before you introduce GeoEvent Server to the Enterprise. Installing – or at least starting the GeoEvent Gateway and GeoEvent Server – before ArcGIS Server and Portal for ArcGIS are fully configured means that the initial interrogation fails. Security topology may change … you may later decide to federate, for example, or SSL certificates have to change … in which case resetting your GeoEvent Server configuration from within GeoEvent Manager (i.e. not an “administrative reset”) should force GeoEvent Server to pick up changes made to the Enterprise configuration. Worst case you have to stop and restart GeoEvent Server after resetting its configuration, then import your inputs, outputs, …etc. You don’t always have to re-install, but installation order can make your life easier administratively when deploying all this s/w for the first time.
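One read-only way to see whether the fail over happened, without opening any keystore, is to compare the certificate each service presents over HTTPS. The sketch below uses only Python’s standard `ssl` module. It assumes the default ports (6443 for ArcGIS Server, 6143 for GeoEvent Server); adjust them if your deployment differs. The function names are mine.

```python
import hashlib
import ssl


def fingerprint(pem_cert):
    """Return the SHA-256 fingerprint of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest()


def presented_fingerprint(host, port):
    """Fetch the certificate a server presents (no validation is done) and fingerprint it."""
    pem = ssl.get_server_certificate((host, port))
    return fingerprint(pem)


def same_certificate(host, server_port=6443, geoevent_port=6143):
    """True when both ports present the same certificate.

    The port numbers are assumed defaults, not values read from your configuration.
    """
    return presented_fingerprint(host, server_port) == presented_fingerprint(host, geoevent_port)
```

Calling `same_certificate("yourmachine.example.com")` on a healthy machine should return `True`. A `False` suggests GeoEvent Server is presenting something other than its ArcGIS Server’s certificate, quite possibly its SelfSignedCertificate.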
There are a few things I check when I find that GeoEvent Server is using its own SelfSignedCertificate rather than the certificate its ArcGIS Server specifies as its Web server SSL certificate.
- Did I accurately follow the certificate configuration laid out in the attached PDF?
Sometimes a machine gets re-imaged, or something else invalidates a certificate I had previously generated, applied, and imported using the attached procedure. That is when I have to walk through that whole process again. Sometimes it is just that a certificate has expired. They do that, and rarely when it’s convenient.
- ArcGIS Server maintains two different certificate stores – do their contents match?
- C:\Program Files\ArcGIS\Server\framework\etc\certificates
The two certificate stores should be identical. I’ve found once or twice that files had not been copied from the Server framework into its configuration store. When this happened I had to stop ArcGIS Server, manually create the folder named for the machine (e.g. CARMON.ESRI.COM beneath …\config-store\machines) and copy the files from the framework into the configuration store folder. When I restarted ArcGIS Server and administratively reset GeoEvent Server, it adopted its Server’s certificates and began working as expected.
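A quick way to check whether the two stores match is to hash the files in each folder and compare. This is a minimal sketch: the function names and result keys are mine, and you supply both folder paths yourself (the framework folder listed above, and the machine’s folder beneath your config-store).

```python
import hashlib
from pathlib import Path


def file_hashes(folder):
    """Map each file name in a folder to the SHA-256 hash of its content."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in Path(folder).iterdir()
        if path.is_file()
    }


def compare_stores(framework_dir, config_store_dir):
    """Report files missing from either store and files whose content differs."""
    framework = file_hashes(framework_dir)
    config_store = file_hashes(config_store_dir)
    shared = set(framework) & set(config_store)
    return {
        "missing_from_config_store": sorted(set(framework) - set(config_store)),
        "missing_from_framework": sorted(set(config_store) - set(framework)),
        "content_differs": sorted(n for n in shared if framework[n] != config_store[n]),
    }
```

If all three lists come back empty, the stores match. This only reads files; any copying should still be done with ArcGIS Server stopped, as described above.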
- ArcGIS Server maintains both JSON and XML copies of its SSL configuration – do they match?
When debugging we’ve found a couple of times that the SSL configuration reported by ArcGIS Server through its Admin API did not match the content of an XML file that GeoEvent Server was using to retrieve certificate information. Specifically, a file D:\arcgisserver\config-store\machines\10.0.0.131.json specified a webServerCertificateAlias which did not match what should have been the same information in a C:\Program Files\ArcGIS\Server\framework\etc\machine-config.xml file.
When this happens you might try stopping GeoEvent Server (and GeoEvent Gateway) and reconfiguring the ArcGIS Server’s certificates. If the files match after ArcGIS Server completes a restart, then you can administratively reset GeoEvent Server and it should pick up the correct certificate configuration.
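The JSON-versus-XML check can be roughed out in a few lines. I don’t know the exact layout of either file, so this sketch makes two loudly stated assumptions: it searches the JSON for a `webServerCertificateAlias` key at any depth, and it only checks whether that alias string appears anywhere in the XML text. That is a crude test, not a real parse of machine-config.xml.

```python
import json
from pathlib import Path


def find_key(node, key):
    """Depth-first search for the first value stored under `key` in parsed JSON."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def alias_matches(json_path, xml_path, key="webServerCertificateAlias"):
    """Return (alias, True/False): does the alias from the JSON appear in the XML text?"""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    alias = find_key(data, key)
    if alias is None:
        return None, False
    # Assumption: a plain substring test is good enough as a first look.
    xml_text = Path(xml_path).read_text(encoding="utf-8", errors="replace")
    return alias, str(alias) in xml_text
```

A result like `("mycert", False)` is the cue to open both files by hand and compare them.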
- Does the GeoEvent Gateway have its correct hostname / IP Address in its com.esri.ges.gateway.cfg file?
Part of the GeoEvent Server administrative reset is to delete this file and make sure that it gets regenerated automatically when GeoEvent Gateway (or maybe it’s when GeoEvent Server) comes up for the “first” time.
If you look at the file’s content in a text editor you’ll see that it instructs the Gateway as to which server and port it should use for connecting to the Zookeeper distributed configuration store which manages your GeoEvent Server’s configuration. It also specifies the Apache Kafka topic partitions, replication and how to reach the broker. If the machine information in this file designates a machine which does not exist – like when you use cloud image utilities to push a machine image out to multiple virtual machine instances – when GeoEvent Gateway launches it never reaches a stable state and cannot support its GeoEvent Server.
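To spot a stale machine name in that file, you could pull out anything that looks like host:port and test whether the host still resolves. This is a hedged sketch: I’m assuming the file is a properties-style text file and that hosts appear as `host:port` (possibly with the colon escaped as `\:`). The key names and ports in the sample string are made up for illustration.

```python
import re
import socket

# Matches host:port pairs such as 10.0.0.131:9092 or gis1.example.com:4181.
HOST_PORT = re.compile(r"([A-Za-z0-9][A-Za-z0-9.\-]*):(\d{2,5})")


def hosts_in_cfg(text):
    """Collect host names that appear as host:port on non-comment lines."""
    hosts = set()
    for line in text.splitlines():
        line = line.strip().replace("\\:", ":")  # undo properties-style escaping
        if not line or line.startswith("#"):
            continue
        for host, _port in HOST_PORT.findall(line):
            hosts.add(host)
    return sorted(hosts)


def unresolvable(hosts, resolve=socket.gethostbyname):
    """Return the hosts that the resolver cannot look up."""
    bad = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:
            bad.append(host)
    return bad


# Hypothetical file content; real key names will differ.
sample = "# gateway settings\nzk=gis1.example.com\\:4181\nbrokers=10.0.0.131:9092\n"
print(hosts_in_cfg(sample))
```

Running `unresolvable(hosts_in_cfg(text))` against the real file’s text would list any machine names that no longer exist on your network, which is the cloned-image situation described above.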
The procedures to administratively reset GeoEvent Server are in a blog: Administratively Reset GeoEvent Server
You can follow the procedures for 10.6.x as they will be the same for 10.7.x and 10.8 deployments. These are the steps, by the way, that you have to run on each server in a multi-machine deployment with a ‘site’ configuration when one of the machines drops out of the configuration and does not automatically re-integrate.
Resetting a multi-machine ‘site’ configuration is both tedious and error prone. You basically have to work as if you’re installing all of the s/w for the first time:
- Install ArcGIS Server, create site, configure certificates, install GeoEvent Server
- Install ArcGIS Server, join site, configure certificates, install GeoEvent Server
- Install ArcGIS Server, join site, configure certificates, install GeoEvent Server (lather, rinse, repeat)
When you already have an ArcGIS Server site with, say, three machines things get messy. I think what you do is use ArcGIS Server Manager to ‘STOP’ two of the machines – you’ll want to stop GeoEvent Gateway and GeoEvent Server on those machines first. The idea is that as far as the ArcGIS Server site is concerned it only has one machine. Complete the admin reset for GeoEvent Server on that machine then start its Gateway, wait a couple minutes, then start its GeoEvent Server.
Then, back in ArcGIS Server Manager, ‘START’ a second machine. The site now thinks it has two machines, only one of which is running GeoEvent Server. Complete the admin reset for GeoEvent Server on the second machine then start its Gateway, wait a couple minutes, then start its GeoEvent Server. As the GeoEvent Gateway and GeoEvent Server come up they’ll discover and coordinate with the running GeoEvent Server, through the AGS site, and work out among themselves how to balance the Kafka topics and brokers.
Finally, in ArcGIS Server Manager, ‘START’ the third machine. The site now thinks it has three machines, only two of which are running GeoEvent Server. Complete the admin reset for GeoEvent Server on the third machine then start its Gateway, wait a couple minutes, then start its GeoEvent Server. As the GeoEvent Gateway and GeoEvent Server come up on this final machine they’ll integrate with the other two.
If you try to bring all three machines on-line at the same time and they were not properly integrated / balanced when they were taken down … they’ll likely not integrate correctly with one another. You have to stage their startup so that the ArcGIS Server site never has more than one machine ‘STARTED’ which does not have a fully initialized and integrated GeoEvent Server. When two or more GeoEvent Servers try to integrate at the same time things tend to fail. It is precisely because of this sort of fragility, and the fact that it is so administratively difficult to determine if the machines were not properly integrated / balanced in the first place, that I feel a ‘site’ configuration really doesn’t provide the resiliency it was designed to provide. Sure, when everything is working it works beautifully. But when a machine falls out of configuration … getting the ‘site’ back to nominal is difficult (to say the least).
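The staged startup is easy to get out of order, so it can help to write the sequence down before you begin. This sketch only generates a checklist from a list of machine names; it does not stop or start anything. The machine names and the function name are hypothetical, and the two-minute wait is just my “couple minutes” from above.

```python
def staged_restart_steps(machines, wait_minutes=2):
    """Return ordered steps so only one machine at a time has an un-integrated GeoEvent Server."""
    rest = machines[1:]
    steps = []
    # Stop GeoEvent on every machine except the first, then take those machines out of the site.
    for m in rest:
        steps.append(f"{m}: stop GeoEvent Server and GeoEvent Gateway")
    for m in rest:
        steps.append(f"{m}: STOP the machine in ArcGIS Server Manager")
    # Bring machines back one at a time, finishing each before starting the next.
    for i, m in enumerate(machines):
        if i > 0:
            steps.append(f"{m}: START the machine in ArcGIS Server Manager")
        steps.append(f"{m}: complete the administrative reset of GeoEvent Server")
        steps.append(f"{m}: start GeoEvent Gateway")
        steps.append(f"{m}: wait about {wait_minutes} minutes")
        steps.append(f"{m}: start GeoEvent Server")
    return steps


for number, step in enumerate(staged_restart_steps(["A", "B", "C"]), start=1):
    print(f"{number:2d}. {step}")
```

The point of the ordering is that a machine is never ‘STARTED’ in the site until the previous machine’s GeoEvent Server is already up.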
Hope this information is helpful –