The issue below appeared after a change of domain controller in the Portal's IWA identity store configuration.
"Wal files size exceeded the maximum size"
These errors appear in the Portal logs every minute and began at the same timestamp as the IWA configuration change. No other change occurred on the Portal machine at that time.
Despite these errors, ArcGIS Enterprise (10.6) is functioning normally and the underlying PostgreSQL database appears healthy. The walarchive folder keeps growing in line with the 60-minute archive setting in postgresql.conf. Each archived file is 16,384 KB and has a modified date AFTER the change above, that is, after the errors began.
Therefore, I do not believe anything is actually going wrong or affecting users, but there is clearly a configuration setting somewhere that is tied to the old domain controller. Any thoughts on where I can investigate further? The underlying PostgreSQL logs show no issues in this area.
FYI, we do not use the WebGIS DR utility and do not plan to use it going forward because of the scale of our deployment (some incremental backups took more than 7 hours to complete).
The "Wal files size exceeded the maximum size" error is written to the logs because the total size of the walarchive logs under the C:\arcgisportal\backups\walarchive folder (or the equivalent location, depending on where your default content directory was set during installation) is more than 5 GB.
Before you run the DR tool, the total size is capped at 5 GB and the logs roll over. After you run the DR tool, the limit is removed, so the logs can grow without bound. If you run the DR tool again, it deletes all walarchive logs in the directory.
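To see whether a given machine is over the threshold described above, you can total the walarchive folder yourself. The sketch below is a read-only check, not something Portal itself runs. It assumes the default path quoted above, treats "5 GB" as 5 × 1024³ bytes (whether Portal counts binary or decimal gigabytes is an assumption), and sums only the files directly inside the folder.

```python
import os

# Default location quoted above; adjust if your content directory differs.
DEFAULT_WALARCHIVE = r"C:\arcgisportal\backups\walarchive"

# Assumption: the 5 GB limit is measured in binary gigabytes (GiB).
LIMIT_BYTES = 5 * 1024**3


def walarchive_size_bytes(path: str) -> int:
    """Sum the sizes of the regular files directly inside `path` (no recursion)."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def exceeds_limit(path: str, limit: int = LIMIT_BYTES) -> bool:
    """True when the folder total is above `limit` (5 GiB by default)."""
    return walarchive_size_bytes(path) > limit
```

For example, `exceeds_limit(DEFAULT_WALARCHIVE)` on the Portal machine should return `True` whenever the error above is being logged, if the threshold works as described.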
This doesn't have anything to do with IWA/authentication settings.
Our walarchive is actually well over 30 GB now and is slowly growing in line with the 1-hour PostgreSQL transaction log cutoff and/or user activity. We did run the webgisdr tool regularly in the past (last run around June), which would explain why the walarchive is now able to grow beyond 5 GB. From our experience with PostgreSQL, everything is operating normally apart from this error being raised in the Portal logs.
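As a rough check on that growth rate: if one 16,384 KB segment is archived per 60-minute interval, as observed above, the folder grows by about 16 MiB per hour even with little user activity. The sketch below works out how many segments and hours it takes to reach a given size. It assumes a steady rate (real activity adds extra segments and shortens the time) and treats GB as GiB.

```python
import math

# Segment size reported in the thread: 16,384 KB = 16 MiB.
SEGMENT_BYTES = 16_384 * 1024


def segments_to_reach(limit_bytes: int, segment_bytes: int = SEGMENT_BYTES) -> int:
    """Smallest whole number of segments whose combined size is >= limit_bytes."""
    return math.ceil(limit_bytes / segment_bytes)


def hours_to_reach(limit_bytes: int, segments_per_hour: float = 1.0) -> float:
    """Hours to reach limit_bytes at a steady archiving rate of segments_per_hour."""
    return segments_to_reach(limit_bytes) / segments_per_hour
```

With these assumptions, 5 GiB is 320 segments, so `hours_to_reach(5 * 1024**3)` returns 320.0, roughly 13 days at one segment per hour. Once the cap has been lifted, the folder passes the 5 GB mark well within a month of a DR run, even on a quiet Portal.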
Just to clarify, this is a production system that has been in use for several years. The only change made during our regular maintenance was a tweak to the IWA configuration. These errors began at the exact timestamp of that change, hence our confusion: we agree that this has nothing to do with the underlying Portal database, but it is quite a coincidence.
The only reason we create the walarchive logs is to support incremental backups. If you don't plan on using them, you can either run a full backup with the DR tool to clear them out or delete them manually. Take the timestamp of the IWA change and look at the walarchive logs: if they already totaled more than 5 GB before that timestamp, something could have been stuck in the cache. There is an internal flag that essentially says "start logging the space problem", so if that flag wasn't updated, perhaps it was flipped when your IWA change was made. Just a guess, though.
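The timestamp check suggested above can be sketched as a read-only script that totals only the walarchive files last modified before the IWA change. The folder path and cutoff are values you supply, and using a file's modification time as a stand-in for when the segment was archived is an assumption.

```python
import os
from datetime import datetime


def size_before(path: str, cutoff: datetime) -> int:
    """Total bytes of files directly in `path` whose mtime is earlier than `cutoff`."""
    cutoff_ts = cutoff.timestamp()
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            st = entry.stat(follow_symlinks=False)
            if st.st_mtime < cutoff_ts:
                total += st.st_size
    return total


def was_over_limit_before(path: str, cutoff: datetime, limit: int = 5 * 1024**3) -> bool:
    """True if files archived before `cutoff` already totaled more than `limit` bytes."""
    return size_before(path, cutoff) > limit
```

Call `was_over_limit_before(folder, cutoff)` with the walarchive folder and the date and time of the IWA change. If it returns `True`, the folder was already over the threshold before the change, which fits the idea that the change only tripped the warning rather than causing the growth.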
We are having the same issue. We followed the recommended steps and took a backup using the WebGISDR utility. The backup succeeded, but we are still getting the warning, now only from the standby Portal (we have a highly available ArcGIS Enterprise setup). Previously the warning came from both machines; now it only comes from the current standby Portal machine. Is there an additional step we need to follow for a highly available Portal setup?