Web GIS DR tool Stopped Working

3793
6
Jump to solution
09-04-2018 02:17 PM
SamJehle1
New Contributor III

Hello,

I've been managing a highly available production and DR portal 10.5.1 environments for several months now, and as of 2 monthes ago the webgisdr tool stopped importing portal content into the DR portal environment. I've been unable to troubleshoot why. The resulting error from the import:

Failed to restore the Portal for ArcGIS.

Admin Url: https://alias.domain.com/webdadaptor.

{"error":{"code":500,"details":null,"message":"Failed to import site. com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up"}}

After doing some more digging, the portal's postgres sql log shows the following:

FATAL:  terminating connection due to administrator command

LOG:  disconnection: session time: 15:51:16.671 user={adminaccount} database=gwdb host={serverIP} port=63277

LOG:  autovacuum launcher shutting down

LOG:  shutting down

LOG:  database system is shut down

--4minute break--

LOG:  database system was interrupted; last known up at 2018-07-08 06:00:52 PDT

LOG:  connection received: host=127.0.0.1 port=55163

FATAL:  the database system is starting up

LOG:  received fast shutdown request

LOG:  redo starts at F/97000028

LOG:  shutting down

LOG:  database system is shut down

After that point the portal admin page reverts back to Create new site or Join new site page. The data store, host server and another federated server environment all import correctly. The strange thing is, once I re-enter the PSA account the disaster recovery site loads up with the same portal site before the import started. Has anyone come across a similar issue? 

To give a little bit of a background, the portal environments are identical except for server names and IP addresses. All servers are running on the same OS (Windows Server 2016). The DR environment machines have had their host files modified to always redirect the web context URL to the VIP in their data center as well. They both use the same context URL allowing for the webgisdr tool to work in the past. I've tried a wide variety of things: re-establishing the PSA and retrying the import, reinstalling portal in the DR environment, reconfiguring the DR environment from scratch, importing a backup that previously worked and no luck.

I noticed the release of a portal security patch recently and that is one of my next steps. However I'm a little hesitant to move forward with applying the patch before identifying the current issue. I'm also trying to identify if any windows patches or ports are being blocked or are interfering with the backup import, but I wanted to put the question here before going down that rabbit hole.

Thanks!

0 Kudos
1 Solution

Accepted Solutions
JonathanQuinn
Esri Notable Contributor

I'm wondering if the HA settings are off. In the console of the DR tool or the logs, do you see messages about unregistering the standby Portal? If so, it unregisters successfully? Can you remove the standby manually and then run the restore? If this doesn't work, then it's likely not related to Portal HA, unless unregistering the standby doesn't leave the primary in a correct state.

View solution in original post

0 Kudos
6 Replies
JonathanQuinn
Esri Notable Contributor

Do you see any errors in the Event Viewer related to Postgres? There may be a few different ones, (make sure it's installed, something about a PID file, etc) but if PG can't start up for some reason, one error will be clear about the problem. The others can be ignored.

0 Kudos
SamJehle1
New Contributor III

Here are the event viewer errors from the last import attempt:

ERRORpg_ctl: the PID file "C:/arcgisportal/db/postmaster.pid" is empty09/04/2018 14:45
ERRORpg_ctl: PID file "C:/arcgisportal/db/postmaster.pid" does not exist09/04/2018 14:55
ERRORIs server running?09/04/2018 14:55
ERRORpg_ctl: could not send stop signal (PID: 11308): No such process09/04/2018 14:59

Here's portal logs during the same time. After the DR backup failed we went ahead and re-entered the PSA. Sorry, it's not the best format:

WARNING

HA: Error in testing connection to the master database at <computer FQDN>:7,654. null

2018-09-04T14:45:17,585
WARNING

HA: Master down, check if this node needs to be elevated to be the new master.

2018-09-04T14:45:17,585
SEVERE

HA: Error in HA plugin. No node in the standby.nodes property is running

2018-09-04T14:45:18,588
WARNING

The web server was found to be stopped. Re-starting it.

2018-09-04T14:46:22,821
WARNING

The web server was found to be stopped. Re-starting it.

2018-09-04T14:46:35,201
WARNING

Failed to rename the directory <DR content Directory>\arcgisportal\content.

2018-09-04T14:49:19,246
SEVERE

Failed to update log settings. java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up.

2018-09-04T14:55:32,508
SEVERE

Migration failed. com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up

2018-09-04T14:55:32,509
SEVERE

com.esri.arcgis.portal.admin.core.site.migration.MigrationException: com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up at com.esri.arcgis.portal.admin.core.site.migration.Migration.a(Migration.java:277) at com.esri.arcgis.portal.admin.core.site.migration.Migration.migrate(Migration.java:109) at com.esri.arcgis.portal.admin.core.site.backuprestore.BackupRestore.importSite(BackupRestore.java:819) at com.esri.arcgis.portal.admin.rest.site.SiteResource.importSite(SiteResource.java:566) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:910) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:858) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:812) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.opensymphony.sitemesh.webapp.SiteMeshFilter.obtainContent(SiteMeshFilter.java:129) at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:77) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.esri.commons.web.AppFilter.doFilter(AppFilter.java:265) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.esri.arcgis.portal.admin.rest.filters.AdminFilter.doFilter(AdminFilter.java:87) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:218) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at com.esri.arcgis.portal.util.TomcatValve.invoke(TomcatValve.java:43) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:442) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1083) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:640) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:745) Caused by: com.esri.arcgis.portal.admin.core.PortalException: com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up at com.esri.arcgis.portal.admin.core.log.LogManager.updateLogConfigurationLocal(LogManager.java:181) at com.esri.arcgis.portal.admin.core.site.migration.Migration.a(Migration.java:164) ... 51 more Caused by: com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up at com.esri.arcgis.portal.admin.core.store.GenericConfigStore.getLogConfig(GenericConfigStore.java:630) at com.esri.arcgis.portal.admin.core.log.LogManager.updateLogConfigurationLocal(LogManager.java:130) ... 52 more Caused by: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up at com.esri.arcgis.discovery.persistence.impl.DBConfigPersistence.b(DBConfigPersistence.java:494) at com.esri.arcgis.discovery.persistence.impl.DBConfigPersistence.get(DBConfigPersistence.java:288) at com.esri.arcgis.portal.admin.core.store.GenericConfigStore.a(GenericConfigStore.java:839) at com.esri.arcgis.portal.admin.core.store.GenericConfigStore.getLogConfig(GenericConfigStore.java:623) ... 53 more

2018-09-04T14:55:41,763
SEVERE

Failed to import site. com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb FATAL: the database system is starting up

2018-09-04T14:55:41,764
SEVERE

Failed to update log settings. java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections..

2018-09-04T14:58:54,762
SEVERE

The process of creating a new site failed. Reverting site creation. com.esri.arcgis.portal.admin.core.PortalException: java.lang.Exception: Cannot connect to database: jdbc:postgresql://localhost:7654/gwdb Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

2018-09-04T14:58:54,762
0 Kudos
JonathanQuinn
Esri Notable Contributor

I'm wondering if the HA settings are off. In the console of the DR tool or the logs, do you see messages about unregistering the standby Portal? If so, it unregisters successfully? Can you remove the standby manually and then run the restore? If this doesn't work, then it's likely not related to Portal HA, unless unregistering the standby doesn't leave the primary in a correct state.

0 Kudos
SamJehle1
New Contributor III

The console reports that the standby portal machine is removed, usually taking around 2 minutes. After that step it almost immediately fails. However looking back through the logs I do notice that the import for portal started to fail when the primary portal switched to the second machine and the logs also indicate it fails to delete content from the portal it has supposedly removed from the site. I'll try the import without high availability later today.

0 Kudos
SamJehle1
New Contributor III

We were able to import the site successfully after dropping one of the portal machines! Going to do some additional troubleshooting to help identify the cause, but our DR environment is back up and running. Thank you for the help Jonathan!

0 Kudos
MichaelSchoelen
Occasional Contributor III

It looks like we are having the same exact issue, but opposite. Our portal is not high-availability, but the WEBGISDR tool is configuring it as HA. I've posted it in a new question:

https://community.esri.com/thread/223641-failed-to-restore-arcgis-portal 

0 Kudos