I thought it might be beneficial to hear other people's (including Esri's) thoughts on the quality and stability of the code powering ArcGIS Enterprise. There are two main reasons I bring this up. One is that my organization is still trying to recover the full functionality of our ArcGIS Enterprise site after upgrading from 10.6 to 10.6.1 about a month ago. This is the third time we have upgraded Portal for ArcGIS, ArcGIS Server, and ArcGIS Data Store. Each upgrade has had issues (mostly with ArcGIS Server). Fortunately, up until now, we have been able to fix whatever was broken during the upgrade.
The second reason is how the new versions of Insights for ArcGIS and ArcGIS Enterprise 10.6.1 have been released. ArcGIS Enterprise 10.6.1 was released in July, but a compatible version of Insights for ArcGIS was not released until this week. The only warning not to upgrade was a generic note in the ArcGIS Enterprise 10.6.1 help documentation to make sure you have compatible versions of Workforce and Insights before upgrading. While the Workforce help documentation listed which versions of Workforce were compatible with which versions of ArcGIS Enterprise up to 10.6.1, the Insights help documentation was not updated for 10.6.1. You had to find a post here on GeoNet to know for sure that Insights 2.3 was incompatible with Enterprise 10.6.1. Move on to the release of Insights 3.0, and now Esri has a warning on the download page to still hold off on upgrading because ArcGIS Enterprise 10.6.1 will need to be patched for Insights 3.0 to work correctly. Unfortunately, the needed patch is not available yet. My problem with this whole scenario is not that the initial releases of ArcGIS Enterprise 10.6.1 and Insights 3.0 were not in sync; my problem is how poorly the situation was communicated to customers. An email, a blog post, or at least a tweet letting customers know that anyone using those products needs to wait to upgrade would have been great.
Personally I would like to see a quality improvement release over a new features and functionality release, and an emphasis on ArcGIS Enterprise components being compatible with the current release of ArcGIS Enterprise. We have been using what is now called ArcGIS Enterprise for two years and are getting closer to moving most of our users off of ArcMap. So we now have lots of content and users on ArcGIS Enterprise, but now every upgrade makes me worry that it will cause us to have to rebuild from the ground up (well from backup at least).
I'm a product manager for ArcGIS Enterprise. The code that powers ArcGIS Enterprise is very stable and we always strive to deliver the highest quality possible. But clearly, something is going wrong here. When you upgrade ArcGIS Enterprise, it sounds like you are using a manual approach. Have you considered making the move to one of the automation tools (Chef, PowerShell, the cloud tools, or, depending on the size and scale of your deployment, ArcGIS Enterprise Builder)? Making the move will require effort up front, but should simplify your upgrades later on.
But separate from that, maybe we can find what could be throwing off the upgrade process for you. Did you log any tech support cases as you came across issues?
Thank you for your response. Yes, I have logged about 4 tech support cases, although 2 of those seem to share the same underlying problem, which is the one we have yet to fix. Long story short, we can no longer publish hosted feature services or run hosted analysis tools. Both issues error out with a fail to connect to server error when ArcGIS Server sends a request to Portal for ArcGIS over the admin URLs. Portal for ArcGIS seems to be able to send requests to ArcGIS Server on the admin URLs just fine. Oddly enough, I can publish a traditional map/feature service to Portal for ArcGIS just fine.
I had one Esri support tech want to immediately unfederate and refederate our ArcGIS Server (hosting server) as the first troubleshooting step. Yeah, that didn't happen. The second Esri support tech is convinced that it is the certificates, but we have checked those and they seem to be working fine. He wants me to assign our external CA-signed certificate to the internal admin URLs for ArcGIS Server and Portal for ArcGIS, but the certificate isn't set up for those URLs.
In the debug logs I am seeing an "ARCGIS_PORTAL_TOKEN Authentication, Unable to validate token...<a namespace that I don't know if I should post here> Server machine 'https://<servername.domain.com>:7443/arcgis/sharing/rest/community/self' returned an error. 'Invalid token.'" when our ArcGIS Server (hosting server) tries to talk to our Portal for ArcGIS. I have also noticed that our tokenServicesUrl is now blank on the admin URL rest/info page, but it is correctly populated on the Web Adaptor rest/info page. I have looked for a config file to try and set the tokenServicesUrl for the admin URL but I cannot find anything in either the install directory or the config directory. I know that value used to be populated with the same generatetoken URL that is on the Web Adaptor rest/info page.
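A minimal Python sketch of scripting the rest/info comparison described above, for anyone following along. It assumes both endpoints return JSON with an authInfo object when f=json is appended. The host names and the 6443 port in the usage comment are placeholders, and the helper names are made up for illustration.

```python
import json
import ssl
import urllib.request


def token_services_url(info):
    """Return authInfo.tokenServicesUrl from a parsed rest/info body, or '' if absent."""
    return (info.get("authInfo") or {}).get("tokenServicesUrl") or ""


def compare_token_urls(admin_info, web_adaptor_info):
    """Summarize how the two rest/info responses differ in tokenServicesUrl."""
    admin_url = token_services_url(admin_info)
    wa_url = token_services_url(web_adaptor_info)
    return {
        "admin": admin_url,
        "web_adaptor": wa_url,
        "admin_blank": admin_url == "",
        "match": admin_url == wa_url,
    }


def fetch_info(base_url, verify=True):
    """GET <base_url>/rest/info?f=json and parse the JSON body."""
    context = ssl.create_default_context()
    if not verify:
        # Assumption: only skip verification for a self-signed internal certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    url = base_url.rstrip("/") + "/rest/info?f=json"
    with urllib.request.urlopen(url, context=context, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (placeholder hosts, adjust to your site):
#   admin = fetch_info("https://server.domain.com:6443/arcgis", verify=False)
#   web_adaptor = fetch_info("https://gis.domain.com/server")
#   print(compare_token_urls(admin, web_adaptor))
```

Keeping the comparison separate from the fetch means saved copies of the two responses can be compared as well.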
When I compare the web traffic with running a buffer on ArcGIS Online versus our ArcGIS Enterprise deployment I noticed that ArcGIS Online makes a generatetoken call from one server to another after clicking run. Our deployment makes no such call after clicking on run. Our ArcGIS Server will create a new service for the analysis results just fine but when it tries to pass the new service back to Portal that is when the error is thrown for 'Invalid token'.
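One way to reproduce that token exchange by hand is to request a token from Portal and then present it to community/self, the endpoint named in the error. The sketch below only builds the two requests. It assumes the standard generateToken parameters (username, password, client, referer, expiration, f). The base URL, credentials, and function names are placeholders for illustration, not what Server does internally.

```python
import urllib.parse
import urllib.request


def build_generate_token_request(portal_base, username, password, referer, minutes=60):
    """Build a POST to <portal_base>/sharing/rest/generateToken (not sent here)."""
    form = {
        "username": username,
        "password": password,
        "client": "referer",   # token is tied to the Referer header value
        "referer": referer,
        "expiration": str(minutes),
        "f": "json",
    }
    data = urllib.parse.urlencode(form).encode("utf-8")
    url = portal_base.rstrip("/") + "/sharing/rest/generateToken"
    return urllib.request.Request(url, data=data, method="POST")


def build_community_self_request(portal_base, token, referer):
    """Build a GET to community/self that presents the token with a matching Referer."""
    query = urllib.parse.urlencode({"f": "json", "token": token})
    url = portal_base.rstrip("/") + "/sharing/rest/community/self?" + query
    return urllib.request.Request(url, headers={"Referer": referer})


# Usage (placeholders): send each request with urllib.request.urlopen(req) and
# read the JSON body; an "Invalid token." error on the second call would mirror
# what the debug log shows.
#   req = build_generate_token_request(
#       "https://portal.domain.com:7443/arcgis", "someuser", "secret",
#       "https://server.domain.com")
```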
Any insight on where I need to go next for troubleshooting would be greatly appreciated. Thankfully all of our existing content is working just fine.
Is your web adaptor or reverse proxy on a different domain than your Portal and Server? If so, that's likely the reason why the tokenServicesURL is blank. That shouldn't be an issue, though.
Does the error you posted continue to say "trying a portal token next" or something similar? If so, you can disregard that error.
Can you remote into the Server machine as the user running the Server service and reach the Portal Sharing API in a browser through https://portal.domain.com:7443/arcgis/sharing/rest? Invalid token errors and failed to connect errors are definitely different. Server would only be able to log an invalid token error if it was able to connect to the Portal to actually validate the token.
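The same check can be scripted from the Server machine. This is a minimal Python sketch that separates a connection failure, a TLS trust failure, and an invalid token response. It assumes the REST API returns error code 498 for an invalid token. The function names and result labels are invented for illustration, and Python's trust store may not match the one ArcGIS Server consults, so treat the result as a hint only.

```python
import json
import ssl
import urllib.error
import urllib.request


def classify_payload(payload):
    """Classify a parsed ArcGIS REST JSON body as ok, invalid_token, or rest_error."""
    error = payload.get("error")
    if not error:
        return "ok"
    if error.get("code") == 498:  # assumption: 498 accompanies "Invalid token."
        return "invalid_token"
    return "rest_error"


def probe(url, timeout=15):
    """Request the URL with Python's default trust store and report what happened."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_payload(json.loads(resp.read().decode("utf-8")))
    except urllib.error.HTTPError as exc:
        return "http_error_%d" % exc.code
    except urllib.error.URLError as exc:
        # A certificate that is not trusted surfaces as an SSLError in exc.reason.
        if isinstance(exc.reason, ssl.SSLError):
            return "tls_failure"
        return "connect_failure"
    except ValueError:
        return "not_json"
    except OSError:
        # Covers timeouts raised while reading the response.
        return "connect_failure"


# Usage (placeholder host from the post above):
#   print(probe("https://portal.domain.com:7443/arcgis/sharing/rest?f=json"))
```

A result of invalid_token means the connection and TLS handshake both succeeded, which is why the two errors point to different causes.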
Any trust issues can be resolved by adding the certificate to the Trusted Root Domain Certificate store.
And yes, never unfederate.
1. We don't have a reverse proxy. Our web adaptors are running on the same domain as Portal and Server, however our external urls are different from our internal urls.
2. The error message does not say anything about "trying a portal token next" or similar. Portal and Server start cleaning up whatever they had already created for the task that was attempted.
3. I will try remoting into the Server machine and go to the Portal Sharing API as the service account later today.
4. Are you talking about adding the certificates to the Trusted Root Domain Certificate store as the service account?
Also, I noticed on the Server admin site that the Token Manager Configuration has the Type set to BUILTIN. Is that correct or should that be PORTAL like the user store and role store?
Thank you so much for your response.
1) Since the external URL is different from the internal URL, that's why the tokenServicesURL is blank. That won't affect your ability to publish, though.
2) Ok, there's a DEBUG message that is logged that may seem like it indicates a problem, but it can be ignored. It doesn't seem like you're seeing that error, though.
3) That would tell you whether you can get to the Sharing API (which is what Server is trying to do). Have you configured a forward proxy for Server, either through the System Properties of Server or through the IE settings?
4) Yes. To go a bit deeper, if Server is using a self-signed certificate, Portal doesn't care about it, (as long as it's not a certificate mismatch). However, if Server is using a different certificate, then it must validate it. Let's say you use a wildcard certificate signed by your domain signing authority; Portal will look at the certificate, see it's associated with server.domain.com, but the CN is *.esri.com. It determines those don't match exactly, so it checks the root certificate. Since it doesn't trust your root signing authority, the request is blocked. You can simply tell Portal to trust the root certificate:
Server, on the other hand, does everything through the Trusted Root Domain Certificate store through the service account.
The BUILTIN setting within the tokens page is fine, that's what I see in my environment as well.
So I was finally able to remote onto the server as the account that is running the ArcGIS services. I added the certificates to the Trusted Root Domain Certificate store. I was able to reach the internal Portal Sharing API through that account from the ArcGIS Server machine without any problems. I tried both publishing a hosted feature service and performing a feature analysis but both things still failed the way they have been failing.
Both Server and Portal are using self-signed certificates for the internal urls and they continue to resolve correctly just like they always have.
We are not using a forward proxy, nor do we have one configured in either IE or Server system properties. However, the support tech had me set the WebContextUrl for both Server and Portal. That didn't change anything, and I fail to see why we needed to set that property since we are not running a high availability environment or a reverse proxy. Could the WebContextUrl be causing additional issues?
Is there any way to "re-upgrade" ArcGIS Server without breaking the federation? At this point it seems to me that the upgrade process broke something internally on the ArcGIS Server side of things.
-- Edit 10/23/2018 14:25 --
Made an interesting discovery this afternoon. I signed into Esri Maps for Office in Excel 2016 for the first time since the upgrade and got a warning that I haven't seen before. "Functionality such as the following will not be available due to the configuration of your Portal for ArcGIS. Select Features by drawing a rectangle. Find Near Selection. Add Excel Data using the Address Location Type."
After a quick search on GeoNet I found a post where an Esri employee mentions this is due to Esri Maps for Office not being able to identify a hosting server configured in the Portal. That would actually make a lot of sense, given that it is the creation of new hosted services that keeps erroring out. However, the admin sites for both Portal and Server say that my only federated server's role is that of the hosting server. The config information for the hosting server matches up in both the Server and Portal admin sites. The hosting server also validates successfully on the Portal admin site.
Under the https://machine.domain.com/<context>/sharing/rest/portals/self page, scroll down until you see the supportsHostedServices property. Is that True or False? Pretty sure that's what every client will look at to determine if the Portal has a hosting server.
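If scrolling through the page is tedious, the property can be read from the JSON form of the same resource. This is a minimal Python sketch, assuming portals/self accepts f=json and an optional token parameter. The function names are made up for illustration, and an anonymous request may not return the same properties as a signed-in one.

```python
import json
import urllib.parse
import urllib.request


def portal_self_url(portal_base, token=None):
    """Build the portals/self URL with f=json and, optionally, a token."""
    query = {"f": "json"}
    if token:
        query["token"] = token
    return (portal_base.rstrip("/") + "/sharing/rest/portals/self?"
            + urllib.parse.urlencode(query))


def has_hosting_server(portal_self):
    """True only when the parsed portals/self body reports supportsHostedServices as true."""
    return portal_self.get("supportsHostedServices") is True


def fetch_portal_self(portal_base, token=None, timeout=30):
    """Fetch and parse portals/self (uses Python's default certificate checks)."""
    with urllib.request.urlopen(portal_self_url(portal_base, token), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (replace <context> with your Web Adaptor name):
#   info = fetch_portal_self("https://machine.domain.com/<context>", token="...")
#   print(has_hosting_server(info))
```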