Frequent "The database server was found to be stopped. Re-starting it."

03-05-2019 11:28 AM
New Contributor II

In our highly available ArcGIS Enterprise deployment, the error message below is logged every 5-10 minutes. As a result, we are seeing Portal performance issues.

The database server was found to be stopped. Re-starting it.

When investigating the Portal DB framework logs, we found this error message logged at the same frequency:

The process cannot access the file because it is being used by another process

Our question is: what file is the database trying to access? Is there a resolution to this error?

database stopped

15 Replies
Esri Frequent Contributor

If you stop the Portal service, do all Portal processes (ArcGISPortal.exe, javaw.exe, java.exe and postgresql.exe) go away? There may be an orphaned process that results in the errors in the db logs. We check to see whether the database is running, and since it can't start due to a potentially orphaned process, you'll see the errors logged in the Portal logs.
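
As a rough way to run that check, here is a minimal Python sketch that filters the output of the Windows `tasklist /FO CSV /NH` command for the image names listed above. The function names are hypothetical, and `postgres.exe` is added on the assumption that the PostgreSQL server binary may show up under that name; adjust the set for your installation.

```python
import csv
import io
import subprocess

# Image names from the reply above, lower-cased for comparison.
# "postgres.exe" is an assumption added for illustration.
PORTAL_IMAGES = {
    "arcgisportal.exe",
    "javaw.exe",
    "java.exe",
    "postgresql.exe",
    "postgres.exe",
}


def leftover_processes(tasklist_csv, images=PORTAL_IMAGES):
    """Return (image name, PID) pairs found in `tasklist /FO CSV /NH` output."""
    found = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Each data row is: image name, PID, session name, session#, memory.
        if len(row) >= 2 and row[0].lower() in images:
            found.append((row[0], int(row[1])))
    return found


def tasklist_snapshot():
    """Run tasklist (Windows only) and return its CSV output as text."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

After stopping the Portal service, `leftover_processes(tasklist_snapshot())` should come back empty. A match is only a candidate for an orphaned process, since javaw.exe and java.exe can belong to other Java software on the same machine.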

New Contributor II

It looks like all Portal processes do go away; some javaw processes remain for the Server and Data Store components.

We've also rebooted the server, and that solves the issue for a few days. Then it suddenly starts again, like some other process is triggering the problem.

Is it possible for Server or Data Store to be conflicting with the Portal processes?

Esri Frequent Contributor

There's no problem with having Server and Data Store on the same machine as Portal.

Are you seeing this on primary or standby? Do you see anything in the Event Viewer?

New Contributor II

99% of the time, this message appears on the primary machine, but every once in a while it pops up on the standby.

Event viewer is showing this error message every time the log entry appears in Portaladmin:

The description for Event ID 0 from source PostgreSQL cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

pg_ctl: another server might be running; trying to start server anyway

Would a reinstallation of portal resolve the issue? Or does it seem like the conflict is a different piece of software?
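
For context, pg_ctl prints that warning when it finds an existing postmaster.pid lock file in the PostgreSQL data directory. Below is a minimal Python sketch for reading that file, assuming the standard layout (PID on line 1, data directory on line 2, port on line 4). The function name is hypothetical, and this thread does not state where Portal keeps its PostgreSQL data directory, so locating the file is left to the caller.

```python
def parse_postmaster_pid(text):
    """Extract the PID, data directory and port from postmaster.pid contents."""
    lines = text.splitlines()
    if not lines or not lines[0].strip().isdigit():
        raise ValueError("not a postmaster.pid file: first line is not a PID")

    info = {"pid": int(lines[0].strip())}
    if len(lines) > 1 and lines[1].strip():
        info["data_dir"] = lines[1].strip()
    # Line 4 holds the port the server was started on, when present.
    if len(lines) > 3 and lines[3].strip().isdigit():
        info["port"] = int(lines[3].strip())
    return info
```

If the PID in the file does not correspond to a running PostgreSQL process, the lock file is probably stale. If it does, something is trying to start a second server against a data directory that is already in use.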

New Contributor III

Hi Michael,

I started my own post, "Warning Message in the Portal Logs", before finding yours. Have you found a resolution to this problem? I am encountering the same issue. I restarted the Portal service. The warning messages are not as frequent but they still show up in the log file. Prior to restarting the service, the messages were showing up in the logs dozens of times throughout the day. Now it is only a few times a day.  

We have the Portal site installed on one server and the Data Store installed on another server. In the Task Manager for the server where the Portal site is installed, I see PostgreSQL Server listed 15 times under the Processes tab. Do you know if this is normal behavior? No one is using our Portal site. I am in the process of federating, but before I do I would like to get this warning message resolved.

Occasional Contributor III

PostgreSQL listed numerous times is normal behavior, assuming that Portal is the one spinning up those instances. It likes to keep individual tasks separate, then terminate each instance once it is finished.

Esri Contributor

Hi all,

Our Portal also logs this error frequently, every 2, 3 or 4 hours throughout the day. We upgraded to 10.7, but these errors occurred even when we were at 10.5.1 and 10.6. We have a dedicated machine for Portal, just one machine, no HA. At the time of these errors in the Portal logs, two events from PostgreSQL are logged in the Event Viewer:

pg_ctl: another server might be running; trying to start server anyway

and after this one

pg_ctl: could not start server
Examine the log output.

PostgreSQL is running and listening on port 7654.

The db.log has the same lines as described by Michael.

So, overall, it looks like the same behavior regardless of HA or other products installed on the same machine. Data Store listens on 9876.

Has anyone figured this out?
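
To confirm which ports are actually accepting connections, a minimal Python sketch along these lines can help. It only tests whether a TCP connection succeeds; the function name is hypothetical, and it assumes the database accepts connections on the loopback address.

```python
import socket


def is_listening(port, host="127.0.0.1", timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

On the Portal machine, `is_listening(7654)` checks the Portal database port mentioned above, and `is_listening(9876)` checks the Data Store port.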

Occasional Contributor III

We were able to resolve this issue. We believe it was caused by a framework-level conflict with a security monitoring agent, in our case an outdated version of Carbon Black Sensor. Updating the sensor software resolved the issue, as did removing it entirely.

I would recommend strategically removing security agents in a test environment (simply disabling them didn't do the trick) to see which one triggers the problem. I suspect that Carbon Black was also using some form of Postgres as its internal database, causing a conflict.

Esteemed Contributor

Has Carbon Black caused any other issues with GIS at your org? My org is installing this software on all servers to prevent and alert us to malicious attacks, so any feedback on the impacts of Carbon Black would be greatly appreciated.