A site is having intermittent issues with Portal dropping out.
The specific problem relates to authentication, which is SAML2 based. Approximately once every 2 weeks, users who go to the ‘sign in’ screen do not get the standard “sign in” prompt. Instead the page just says Unauthorized (in red text) and then goes into a loop of refreshing.
We have provided the site with instructions on how to restart Portal, but the restart is slow and takes the whole ‘enterprise’ down for a while.
My thoughts on securing a workable resolution to this are:
1) Use SCOM monitoring to detect when the word ‘Unauthorized’ is present on the sign-in page, which suggests Portal is down. The check could run every minute or two, and an email could be sent to the relevant parties to advise of the outage.
2) Create some form of web service that can be called on demand. It would stop the Portal service, wait, stop any Postgres tasks that are still running, and then restart Portal.
This web service could be called by SCOM or by an appointed person, which would potentially allow the GIS Management Team to self-service the restart.
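To make option 1 concrete, here is a rough Python sketch of the check. It is only an illustration: the URL is a placeholder, the function names are my own, and it assumes the word ‘Unauthorized’ appears in the HTML the server returns rather than being added by script in the browser, which I have not confirmed.

```python
import urllib.error
import urllib.request

# Hypothetical sign-in URL; replace with the site's real Portal address.
SIGN_IN_URL = "https://gis.example.com/portal/home/signin.html"
MARKER = "unauthorized"


def looks_unauthorized(body: str) -> bool:
    """Return True if the page text contains the failure marker (case-insensitive)."""
    return MARKER in body.lower()


def portal_is_down(url: str = SIGN_IN_URL, timeout: float = 15.0) -> bool:
    """Fetch the sign-in page and report whether it looks like the failed state."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # An HTTP error status (e.g. 401), a timeout, or no response at all
        # is treated as "down" in this sketch.
        return True
    return looks_unauthorized(body)
```

A SCOM script or a scheduled job could call `portal_is_down()` every minute or two and raise the alert or send the email when it returns `True`.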
Alternatively, could a scheduled task restart Portal each night so that it is more stable?
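Whether it is triggered on demand or on a nightly schedule, the restart sequence itself (stop Portal, wait, clear leftover Postgres tasks, start Portal) might look something like the sketch below. The service name and process image name are assumptions that would need to be confirmed on the server, and the wait time is a guess.

```python
import subprocess
import time

# Assumed names; confirm both on the server before use.
SERVICE_NAME = "Portal for ArcGIS"
POSTGRES_IMAGE = "postgres.exe"


def restart_commands(service=SERVICE_NAME, image=POSTGRES_IMAGE):
    """Return the stop, clean-up and start commands in order."""
    return [
        ["net", "stop", service],
        ["taskkill", "/F", "/IM", image],
        ["net", "start", service],
    ]


def restart_portal(wait_seconds=60, run=subprocess.run, sleep=time.sleep):
    """Stop the service, wait, kill leftover processes, then start it again."""
    stop_cmd, kill_cmd, start_cmd = restart_commands()
    # Raise if the stop fails, rather than killing processes blindly.
    run(stop_cmd, check=True)
    sleep(wait_seconds)  # give the service time to shut its child processes down
    # taskkill exits non-zero when nothing matches, which is fine here.
    # Note: this kills every postgres.exe on the machine, not only Portal's.
    run(kill_cmd, check=False)
    run(start_cmd, check=True)
```

The script would need to run from an elevated account. The `run` and `sleep` parameters are only there so the sequence can be exercised without touching the real service.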
Here are the particulars:
Windows Server 2012 R2
Users are managed through ADFS. The login uses SAML 2 authentication.
The failure appears to occur at random, with anywhere from 7 to 14 calendar days between occurrences.
Has anyone seen these issues at another site, or does anyone know of any tools that are already ‘out there’?