We have added a portal counter for our high availability enterprise deployment. The test enterprise environment was recently upgraded to 10.9.1 and is able to collect results from the portal counter. However, the production environment is on version 10.8 and is unable to collect results from the portal counter.
The site url is https://<machine_name.domain>:7443/arcgis
The token url is https://<machine_name.domain>:7443/arcgis/sharing/rest/generateToken
We also deleted the logs using both the server and portal managers, but the counter still gives the same error.
UPDATE 1: We confirmed with our network department that our ArcGIS Monitor server (Windows) is allowed to communicate with our high availability portal servers (Linux) on ports 7443 and 22.
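For anyone who wants to verify the firewall rule from the Monitor machine itself, here is a minimal sketch of a TCP reachability check. The function name `port_is_reachable` and the hostnames in the usage note are hypothetical; only ports 7443 and 22 come from the post above.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling something like `port_is_reachable("portal1.example.com", 7443)` for each portal machine only proves the port is open at the network layer. A request can still time out later if the portal takes too long to answer.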
The sampling time for the portal connector is 15 minutes.
I also tried it with 1 hour, 5 minutes and 1 minute sample intervals, but the result is the same.
Is this connection to just monitor some part of Portal, or is this the second connection for the standby machine?
Within the rest of your ArcGIS Monitor implementation, how many connections to various components do you have? What is the amount of memory / RAM on your ArcGIS Monitor machine?
Yes, the counters are set for both the primary and standby portal machines of our HA deployment.
We have the following counters for the production environment
All are working except the portal counters.
We have 32 GB of RAM for the ArcGIS Monitor Windows Server.
When ArcGIS Monitor is connected to ArcGIS Server or Portal for ArcGIS, it looks at the site as a whole, not the individual nodes. My educated guess is that the connection to the standby node is failing because Monitor isn't able to fully reach that "site". ArcGIS Server runs in an active/active capacity, which is why you can reach both nodes, but they will return the same information. Portal runs in an active/passive mode, but only some components are passive, which is why I believe you're getting the timed out error.
If you want to check if the individual nodes are healthy, I recommend creating an HTTP counter and pointing it at the health check URL for each node.
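To try the per-node check outside of ArcGIS Monitor first, a rough sketch along these lines may help. It assumes the Portal health check endpoint is at `/portaladmin/healthCheck` under the `arcgis` context on port 7443; please verify that path against the documentation for your version. The function names and the `verify_tls` switch are hypothetical helpers, not part of any Esri API.

```python
import ssl
import urllib.error
import urllib.request


def health_check_url(machine: str, port: int = 7443, context: str = "arcgis") -> str:
    """Build the per-node health check URL (endpoint path is an assumption)."""
    return f"https://{machine}:{port}/{context}/portaladmin/healthCheck?f=json"


def node_is_healthy(machine: str, port: int = 7443, timeout: float = 10.0,
                    verify_tls: bool = True) -> bool:
    """Return True if the node answers the health check with HTTP 200."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Only for self-signed certificates on internal machines.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    try:
        url = health_check_url(machine, port)
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Non-2xx responses, refused connections, and timeouts all count as unhealthy.
        return False
```

Running `node_is_healthy` against the primary and the standby machine separately shows which node answers directly, independent of how Monitor treats the site as a whole.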
Your guess is correct. However, the same counters are working on our upgraded test environment without an issue. Does it have to do with the version of ArcGIS Enterprise?
Can we use the site URL, something like https://<domain-name>/enterprise/sharing/rest/generateToken, without the port to configure our counter?
Also, the ArcSOC Optimizer tasks that were previously working fine recently started giving the following error:
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>
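Errno 11001 on Windows means the hostname could not be resolved, so the request never left the machine. A small sketch to test name resolution from the same machine that runs the task is below; `resolve_host` is a hypothetical helper name, and it accepts either a full URL with a scheme or a bare hostname.

```python
import socket
from urllib.parse import urlsplit


def resolve_host(url_or_host: str) -> list:
    """Return the sorted IP addresses a name resolves to, or [] if resolution fails."""
    # Pull the hostname out of a full URL; fall back to the raw string otherwise.
    host = urlsplit(url_or_host).hostname or url_or_host
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        # Same failure class that urllib reports as "getaddrinfo failed".
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

If this returns an empty list for the URL configured in the task, the likely culprits are a DNS change, a typo in the hostname, or a proxy setting rather than anything in ArcGIS itself.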
Any ideas as to what is not working would be highly appreciated.
Maybe it's not the same error, but I had the ETIMEDOUT message in portal.
Since it is an HA environment, the counter must reference the web adaptor, which is in charge of tracking the two portal servers.
Site URL: https://wa.domain.local/wa
Token URL: https://wa.domain.local/wa/sharing/rest/generatetoken
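To confirm the token URL works through the web adaptor before putting it into the counter, something like the following sketch can be used. The parameter names (`username`, `password`, `client`, `referer`, `expiration`, `f`) are the ones I understand the Portal generateToken operation to accept; the function names and the 60-minute default are my own assumptions.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(token_url: str, username: str, password: str,
                        referer: str, expiration_minutes: int = 60) -> urllib.request.Request:
    """Build (but do not send) a POST request for a referer-based token."""
    params = {
        "username": username,
        "password": password,
        "client": "referer",
        "referer": referer,
        "expiration": str(expiration_minutes),
        "f": "json",
    }
    data = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(token_url, data=data, method="POST")


def request_token(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the token string, or raise if none is returned."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    if "token" not in payload:
        raise RuntimeError(f"Token request failed: {payload.get('error', payload)}")
    return payload["token"]
```

If `request_token` succeeds against the web adaptor URL but the counter still times out, the problem is more likely the time the portal takes to respond than the URL itself.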
I saw the same problem in an HA environment in 10.9.1.
I looked up the error, and ArcGIS Server appears to have the same one, so I decided to delete the portal logs, then change the log level to SEVERE and retain logs for only one day. With that, it is possible to perform the portal validation normally.
If the ArcGIS Server test results in an ETIMEDOUT error, the ArcGIS Server log file is too large to read before the request times out. Back up and delete the ArcGIS Server logs using ArcGIS Server Manager to resolve the problem. The log level and log contents of your ArcGIS Server should be checked for errors to determine why the log file is so large.