Hello,
I've been managing a highly available production and DR portal 10.5.1 environments for several months now, and as of 2 monthes ago the webgisdr tool stopped importing portal content into the DR portal environment. I've been unable to troubleshoot why. The resulting error from the import:
Failed to restore the Portal for ArcGIS.
Admin Url: https://alias.domain.com/webdadaptor.
{"error":{"code":500,"details":null,"message":"Failed to import site. com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up"}}
After doing some more digging, the portal's postgres sql log shows the following:
FATAL: terminating connection due to administrator command
LOG: disconnection: session time: 15:51:16.671 user={adminaccount} database=gwdb host={serverIP} port=63277
LOG: autovacuum launcher shutting down
LOG: shutting down
LOG: database system is shut down
--4minute break--
LOG: database system was interrupted; last known up at 2018-07-08 06:00:52 PDT
LOG: connection received: host=127.0.0.1 port=55163
FATAL: the database system is starting up
LOG: received fast shutdown request
LOG: redo starts at F/97000028
LOG: shutting down
LOG: database system is shut down
After that point the portal admin page reverts back to Create new site or Join new site page. The data store, host server and another federated server environment all import correctly. The strange thing is, once I re-enter the PSA account the disaster recovery site loads up with the same portal site before the import started. Has anyone come across a similar issue?
To give a little bit of a background, the portal environments are identical except for server names and IP addresses. All servers are running on the same OS (Windows Server 2016). The DR environment machines have had their host files modified to always redirect the web context URL to the VIP in their data center as well. They both use the same context URL allowing for the webgisdr tool to work in the past. I've tried a wide variety of things: re-establishing the PSA and retrying the import, reinstalling portal in the DR environment, reconfiguring the DR environment from scratch, importing a backup that previously worked and no luck.
I noticed the release of a portal security patch recently and that is one of my next steps. However I'm a little hesitant to move forward with applying the patch before identifying the current issue. I'm also trying to identify if any windows patches or ports are being blocked or are interfering with the backup import, but I wanted to put the question here before going down that rabbit hole.
Thanks!
Solved! Go to Solution.
I'm wondering if the HA settings are off. In the console of the DR tool or the logs, do you see messages about unregistering the standby Portal? If so, it unregisters successfully? Can you remove the standby manually and then run the restore? If this doesn't work, then it's likely not related to Portal HA, unless unregistering the standby doesn't leave the primary in a correct state.
Do you see any errors in the Event Viewer related to Postgres? There may be a few different ones, (make sure it's installed, something about a PID file, etc) but if PG can't start up for some reason, one error will be clear about the problem. The others can be ignored.
Here are the event viewer errors from the last import attempt:
ERROR | pg_ctl: the PID file "C:/arcgisportal/db/postmaster.pid" is empty | 09/04/2018 14:45 |
ERROR | pg_ctl: PID file "C:/arcgisportal/db/postmaster.pid" does not exist | 09/04/2018 14:55 |
ERROR | Is server running? | 09/04/2018 14:55 |
ERROR | pg_ctl: could not send stop signal (PID: 11308): No such process | 09/04/2018 14:59 |
Here's portal logs during the same time. After the DR backup failed we went ahead and re-entered the PSA. Sorry, it's not the best format:
WARNING | HA: Error in testing connection to the master database at <computer FQDN>:7,654. null | 2018-09-04T14:45:17,585 |
WARNING | HA: Master down, check if this node needs to be elevated to be the new master. | 2018-09-04T14:45:17,585 |
SEVERE | HA: Error in HA plugin. No node in the standby.nodes property is running | 2018-09-04T14:45:18,588 |
WARNING | The web server was found to be stopped. Re-starting it. | 2018-09-04T14:46:22,821 |
WARNING | The web server was found to be stopped. Re-starting it. | 2018-09-04T14:46:35,201 |
WARNING | Failed to rename the directory <DR content Directory>\arcgisportal\content. | 2018-09-04T14:49:19,246 |
SEVERE | Failed to update log settings. java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up. | 2018-09-04T14:55:32,508 |
SEVERE | Migration failed. com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up | 2018-09-04T14:55:32,509 |
SEVERE | com.esri.arcgis.portal.admin.core.site.migration.MigrationException: com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up at com.esri.arcgis.portal.admin.core.site.migration.Migration.a(Migration.java:277) at com.esri.arcgis.portal.admin.core.site.migration.Migration.migrate(Migration.java:109) at com.esri.arcgis.portal.admin.core.site.backuprestore.BackupRestore.importSite(BackupRestore.java:819) at com.esri.arcgis.portal.admin.rest.site.SiteResource.importSite(SiteResource.java:566) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:910) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:858) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:812) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.opensymphony.sitemesh.webapp.SiteMeshFilter.obtainContent(SiteMeshFilter.java:129) at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:77) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.esri.commons.web.AppFilter.doFilter(AppFilter.java:265) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.esri.arcgis.portal.admin.rest.filters.AdminFilter.doFilter(AdminFilter.java:87) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:218) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at com.esri.arcgis.portal.util.TomcatValve.invoke(TomcatValve.java:43) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:442) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1083) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:640) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:745) Caused by: com.esri.arcgis.portal.admin.core.PortalException: com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up at com.esri.arcgis.portal.admin.core.log.LogManager.updateLogConfigurationLocal(LogManager.java:181) at com.esri.arcgis.portal.admin.core.site.migration.Migration.a(Migration.java:164) ... 51 more Caused by: com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up at com.esri.arcgis.portal.admin.core.store.GenericConfigStore.getLogConfig(GenericConfigStore.java:630) at com.esri.arcgis.portal.admin.core.log.LogManager.updateLogConfigurationLocal(LogManager.java:130) ... 52 more Caused by: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up at com.esri.arcgis.discovery.persistence.impl.DBConfigPersistence.b(DBConfigPersistence.java:494) at com.esri.arcgis.discovery.persistence.impl.DBConfigPersistence.get(DBConfigPersistence.java:288) at com.esri.arcgis.portal.admin.core.store.GenericConfigStore.a(GenericConfigStore.java:839) at com.esri.arcgis.portal.admin.core.store.GenericConfigStore.getLogConfig(GenericConfigStore.java:623) ... 53 more | 2018-09-04T14:55:41,763 |
SEVERE | Failed to import site. com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up | 2018-09-04T14:55:41,764 |
SEVERE | Failed to update log settings. java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.. | 2018-09-04T14:58:54,762 |
SEVERE | The process of creating a new site failed. Reverting site creation. com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections. | 2018-09-04T14:58:54,762 |
I'm wondering if the HA settings are off. In the console of the DR tool or the logs, do you see messages about unregistering the standby Portal? If so, it unregisters successfully? Can you remove the standby manually and then run the restore? If this doesn't work, then it's likely not related to Portal HA, unless unregistering the standby doesn't leave the primary in a correct state.
The console reports that the standby portal machine is removed, usually taking around 2 minutes. After that step it almost immediately fails. However looking back through the logs I do notice that the import for portal started to fail when the primary portal switched to the second machine and the logs also indicate it fails to delete content from the portal it has supposedly removed from the site. I'll try the import without high availability later today.
We were able to import the site successfully after dropping one of the portal machines! Going to do some additional troubleshooting to help identify the cause, but our DR environment is back up and running. Thank you for the help Jonathan!
It looks like we are having the same exact issue, but opposite. Our portal is not high-availability, but the WEBGISDR tool is configuring it as HA. I've posted it in a new question:
https://community.esri.com/thread/223641-failed-to-restore-arcgis-portal