POST
|
Hello. We are a small conservation non-profit with a single-machine 10.7.1 enterprise configuration. In addition to the enterprise server, we have a web server and a dedicated data server that participate in the system. This configuration is our second run at an optimized design for our geospatial needs. We do not have an extensive amount of content in our system: we have a few web apps, fewer than 30 services (map image and hosted feature), and all our data is currently vector-based from FGDBs. Our webgisdr backup file is just a bit over 7GB. See included diagram below.

We are trying to implement a disaster recovery mitigation strategy using the webgisdr utility to replicate our primary enterprise server to a secondary enterprise server. We have followed the instructions outlined in these two resources:
+ https://www.esri.com/arcgis-blog/products/arcgis-enterprise/administration/migrate-to-a-new-machine-in-arcgis-enterprise-two/
+ https://enterprise.arcgis.com/en/portal/latest/administer/windows/overview-disaster-recovery-replication.htm#ESRI_SECTION1_C2CCF3D59BAE4F93AAF841EED439FE2F

We want to automate the backup and restore, but so far we have not been able to get consistent results when restoring to the secondary server. The restore takes over 9 hours using a local copy of the backup file and does not leave the secondary server's Portal in a working state. We are hesitant to add a great deal more content or upgrade the system to include an Image Server until we know we can consistently back up and restore our environment.

We are curious to hear what others are doing to back up, restore and test their enterprise systems. Is anyone using built-in VM tools such as replication to accomplish the task? Thanks in advance for any responses. Best, Dixie.
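For anyone wanting to script the automation described above, here is a minimal sketch of wrapping the webgisdr utility from Python so it can be scheduled. It assumes the documented `--export`/`--import` and `--file` options; the tool and properties paths are hypothetical placeholders, and I have not confirmed that the utility's exit code reliably reflects failure, so the webgisdr log should still be checked after each run.

```python
import subprocess
from pathlib import Path
from typing import List


def build_webgisdr_command(tool: Path, properties: Path, mode: str) -> List[str]:
    """Build the argument list for a webgisdr export or import run."""
    if mode not in ("export", "import"):
        raise ValueError("mode must be 'export' or 'import'")
    return [str(tool), f"--{mode}", "--file", str(properties)]


def run_webgisdr(tool: Path, properties: Path, mode: str) -> int:
    """Run webgisdr and return its exit code.

    The exit code alone is not assumed to prove success; inspect the
    webgisdr log as well.
    """
    command = build_webgisdr_command(tool, properties, mode)
    completed = subprocess.run(command, check=False)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical locations; substitute the real install and properties paths.
    tool = Path("webgisdr.bat")
    props = Path("webgisdr.properties")
    print(build_webgisdr_command(tool, props, "export"))
```

A scheduled task on the primary could call `run_webgisdr(..., "export")`, and a matching task on the secondary could call it with `"import"` once the backup file has been copied locally.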
05-18-2020
09:10 AM
|
1
|
4
|
1596
|
POST
|
Hello again, Jonathan. In the hopefully helpful feedback department, we conducted another restore test, and this time we got a different set of messages in the webgisdr log (see images below). The failure to validate servers concerned us and we did see errors in the Server Manager logs, but the Data Store looked good after a describe. Initially we could not log into the Portal home, but after checking the status of the index and running a re-index from the Portal admin endpoint, we were able to log in and view content. We have not done an exhaustive test, but except for the Identity Store not being configured, everything looks good.

I am curious to know if others are finding more consistent results. I would also like to know if others with a fairly simple, single-machine (VM) deployment such as ours are using any other tools, such as VM replication. I will start a separate discussion thread. Thanks for all your help!
05-18-2020
08:18 AM
|
0
|
3
|
738
|
POST
|
Wow! Thanks so much, Jonathan. It appears that running the reindex worked. The status returned an equal count for the users, groups and search (content?). I ran reindex in full mode and then content was available. So, since this was a test to restore the primary server, I need to try it again because I need a repeatable process that I can automate. Also, there are about a dozen dbxxxxxxxxxxxxx sub-directories in the arcgisportal directory. I will create another clone from the secondary's snapshot and begin again. I will make sure I create a new backup with the webgisdr utility and try to import it as soon as possible during off-hours. If the webgisdr import fails again, is it standard practice to try the portal endpoint export as I was advised by Support? Thanks again for your patience and all your responses. We really want to get a repeatable, reliable process to act as our backup/disaster recovery strategy. Best, Dixie.
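As a rough sketch of automating the index check mentioned above, the function below compares database and index counts in the index status response. The JSON shape (an `indexes` list whose entries carry `name`, `databaseCount` and `indexCount`) is my assumption about what the Portal admin index status returns and should be verified against a real response; the sample numbers are made up for illustration.

```python
from typing import Dict, List


def out_of_sync_indexes(status: Dict) -> List[str]:
    """Return the names of indexes whose index count differs from the database count."""
    return [
        entry["name"]
        for entry in status.get("indexes", [])
        if entry["databaseCount"] != entry["indexCount"]
    ]


if __name__ == "__main__":
    # Illustrative values only, not taken from our system.
    sample = {
        "indexes": [
            {"name": "users", "databaseCount": 12, "indexCount": 12},
            {"name": "groups", "databaseCount": 4, "indexCount": 4},
            {"name": "search", "databaseCount": 300, "indexCount": 0},
        ]
    }
    print(out_of_sync_indexes(sample))
```

An empty list would suggest the counts line up; any names returned point at the indexes that may need a reindex.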
05-12-2020
11:12 AM
|
0
|
5
|
738
|
POST
|
Hi, Jonathan. Yes, there are item ID folders in the content directory on the secondary machine. The amount is not the same, however. The secondary machine has three more items. How do I check the index to tell if it matches up? As for extracting the backup and inspecting the number of folders, the number of folders in the webgisdr full backup and the number in the portal admin export do match up, but they have roughly 300 fewer items.
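To make the folder comparison above repeatable, something like the following sketch could list which item ID folders exist on one side but not the other, rather than only comparing totals. The directory arguments are hypothetical; it simply treats every immediate sub-directory name as an item ID.

```python
from pathlib import Path
from typing import Set, Tuple


def item_folders(content_dir: Path) -> Set[str]:
    """Collect the names of the immediate sub-directories of a content directory."""
    return {entry.name for entry in content_dir.iterdir() if entry.is_dir()}


def compare_item_folders(first: Path, second: Path) -> Tuple[Set[str], Set[str]]:
    """Return (folders only in first, folders only in second)."""
    first_ids = item_folders(first)
    second_ids = item_folders(second)
    return first_ids - second_ids, second_ids - first_ids
```

Running `compare_item_folders` against the primary's content directory and the secondary's (or an extracted backup) would show exactly which item IDs account for the difference.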
05-09-2020
03:56 AM
|
0
|
7
|
738
|
POST
|
So, we have logged a support request and the analyst advised doing an export of the Portal site from the Portal's administrative REST endpoint. We did that and then imported the export into the secondary machine's Portal administrative REST endpoint. The process seems to have completed but there is no content. The Portal is accessible, members and groups are present, but no content. Does the absence of content indicate the process was not successful? Is there a way to hook up the content somehow?
05-07-2020
05:45 AM
|
0
|
9
|
738
|
POST
|
Hi, Jonathan. Yes, the path "Z:\arcgisportal\db" exists and it is a physical drive. Both the primary and secondary machines use the Z:\ drive for the software install and related directories.
05-05-2020
10:06 AM
|
0
|
0
|
738
|
POST
|
Here's what I found from the Event Viewer and postgres logs. The Event Viewer and webgisdr logs are in EDT. The postgres log is in PDT. The directory referred to below by the Event Viewer does exist, and the service account running the NT services has full NTFS permissions.

Error in webgisdr log - 19:57:48 (this is EDT)
Failed to start the database server. The startup timed out.

Event viewer - 7:52:34 PM
pg_ctl: server does not shut down

Event viewer - 7:52:40 PM
pg_ctl: another server might be running; trying to start server anyway

Event viewer - 7:52:40 PM
pg_ctl: could not start server Examine the log output.

Event viewer - 7:57:41 PM
pg_ctl: directory "Z:/arcgisportal/db" does not exist

postgresql log
2020-04-30 16:58:44 PDT: [13396]: LOG: database system was interrupted; last known up at 2020-04-30 16:33:43 PDT
2020-04-30 16:58:44 PDT: [13396]: LOG: database system was not properly shut down; automatic recovery in progress
2020-04-30 16:58:44 PDT: [13396]: LOG: redo starts at 0/28027350
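Because the logs above are in two time zones, a small conversion helps line the events up. This is a minimal sketch using fixed UTC offsets (PDT as UTC-7, EDT as UTC-4) rather than a time zone database; the function name is just for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed daylight-time offsets, valid for the dates in these logs.
PDT = timezone(timedelta(hours=-7), "PDT")
EDT = timezone(timedelta(hours=-4), "EDT")


def pdt_to_edt(stamp: str) -> datetime:
    """Convert a 'YYYY-MM-DD HH:MM:SS' PDT timestamp to EDT."""
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=PDT).astimezone(EDT)


if __name__ == "__main__":
    print(pdt_to_edt("2020-04-30 16:58:44").strftime("%H:%M:%S %Z"))  # 19:58:44 EDT
    print(pdt_to_edt("2020-04-30 16:33:43").strftime("%H:%M:%S %Z"))  # 19:33:43 EDT
```

In EDT terms, the postgres recovery entries were written at 19:58:44, about a minute after the 19:57:48 timeout error in the webgisdr log, and the database was "last known up" at 19:33:43.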
05-02-2020
03:52 AM
|
0
|
2
|
1489
|
POST
|
Hi, Jonathan. We are using 10.7.1. I will take a peek at the Event Viewer. The primary and secondary systems should be identical.
05-01-2020
01:59 PM
|
0
|
3
|
1489
|
POST
|
I will log a support request to find out what might be going wrong. I feel like I am missing something, because it doesn't seem like it should be so difficult to get the restore completed. We have a relatively simple setup with a single machine and not a great deal of content in our Portal or being served from our Server.
05-01-2020
06:16 AM
|
0
|
10
|
738
|
POST
|
Tried the restore again, but this time copied the backup from the primary server to the secondary machine so that it was local. Updated the webgisdr.properties file and ran the import command again. The restore got further, but failed again. After over 8 hours, it reported that it could not start the database server:

{"error":{"code":500,"details":null,"message":"Failed to import site. java.lang Exception: Failed to start the database server. The startup timed out. Please check the log file at Z:\\arcgisportal\\logs\\database\\pgsql.log."}}
Failed to restore the Portal for ArcGIS.

I checked the log noted and it only had the following line, repeated several times:

1 file(s) copied.

I have not changed the logging level in the logback.xml file within the webgisdr directory. Has anyone seen this behavior before? Does anyone know how to resolve this so we can get a restore working on our secondary machine? Thanks in advance, Dixie.
04-30-2020
05:07 PM
|
0
|
5
|
1489
|
POST
|
After ensuring the primary hostname was part of the URL for launching the Web Adaptor configuration for both Portal and Server, I did successfully start a restore. It failed however after several hours. It completed the restore for Data Store, Server and then failed for Portal. The message was: Failed to restore the Portal for ArcGIS: Url: https://enterprise.domain.com/<portal wa>. {"error":{"code":500,"details":null,"message":"Failed to import site. Failed to delete the database directory."}} I saw from the Portal logs that it was looking for a path in the arcgisportal content directory that did not exist and there are several additional directories (5 of them) in the arcgisportal directory that begin with db and have a string of numbers after them, like db1588229730671. There is also a backedupContents20200429 directory under the arcgisdatastore directory. The Windows service account running Portal, Server and Data Store services has full NTFS permissions on both the arcgisportal and arcgisdatastore directories as set by the software install. That is not the case for the arcgisserver directory or the main software install directory. Just wondering if anyone has insight into this issue? Thanks in advance. Best, Dixie.
04-30-2020
03:10 AM
|
0
|
0
|
1489
|
POST
|
Hi, Jonathan. Thanks for your response. For the Portal Web Adaptor, the configuration page was launched using localhost/<wa>/webadaptor. After getting past the security challenges due to non-HTTPS, it resolved to enterprise1.domain.com, but I forced it back to enterprise.domain.com, which stuck. The configuration message for the Portal WA returned the appropriate hostname and URL for the Portal - https://enterprise.domain.com/<wa>/home. For the Server Web Adaptor, the configuration page was launched using localhost/<wa>/webadaptor/server. I left it as localhost during the configuration. When the configuration was complete, it listed the correct server name as being configured - enterprise.domain.com - and then reported that the URL to use to access the Services Directory was: https://localhost/<wa>/rest/services. Is this the problem, then? That I did not use the FQDN for the primary server in the URL for the web adaptor configuration pages? I used it within the forms for ArcGIS Server URL and for Portal URL. Please see the uploaded image.
04-28-2020
12:41 PM
|
0
|
0
|
1489
|
POST
|
So, I have not been able to solve my issue, but I do think I may know where things are breaking and what is contributing to the problem. Maybe this information will lead to more information or ideas for a resolution from someone.

I believe the problem lies with the configuration of the Data Store. During the preparation of the secondary machine (enterprise1.domain.com), the software installs and configuration of each component except for the Data Store (Portal, WA, Server, WA) work as expected and correctly pick up the hostname for the primary machine (enterprise.domain.com). I believe it is mainly working because the primary machine's domain name is either reflected in messages displayed during configuration or it is reflected in the browser URL. After the configuration of the Data Store, however, one of the properties (owning system URL) includes the secondary machine's hostname (https://enterprise1/<WA for Server>). It is blank before configuration and then it includes the secondary machine's hostname after configuration. I believe this may be why the restore is not working.

I believe what is contributing to the problem is that the VMs we are using (enterprise1.domain.com and enterprise.domain.com) were set up using hostnames in all capital letters. The primary, the secondary, and a common box we use for centralized data access all have hostnames in all caps. We have tried unsuccessfully to change the hostnames (via the system property and Active Directory), but Windows considers the two strings (all lowercase and all uppercase) to be equivalent, so it will not allow us to apply any changes. We have found a reference that states the ComputerName registry value could be changed (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ComputerName\ActiveComputerName\ComputerName), but I am not sure about the downstream effects of this change. We would have to change both the primary and secondary machine names and potentially the data server. These changes would in effect be server name changes, right? So, it could break our primary Portal, right?

I have verified that the test system I used to successfully install, configure and restore a webgisdr export had all lowercase server names, and the Data Store owning system URL was configured on the secondary machine with the primary machine's hostname (https://enterprise.domain.com/<WA for Server>). I know case-sensitivity matters in some of the technologies that go into the different software components in Enterprise, so that is why I think the core of our problems is related to the server names. I am also, therefore, wary of changing them. If anyone has any suggestions/clues please let me know. I really appreciate your time and attention. Best, Dixie.

webgisdr utility datastore #dr strategy
04-28-2020
08:32 AM
|
0
|
2
|
1489
|
POST
|
Hello. Hope all are staying safe and sane during this unusual time. I am having difficulty trying to implement the single-machine deployment approach for a DR strategy as described in this ArcGIS Blog: https://www.esri.com/arcgis-blog/products/arcgis-enterprise/administration/migrate-to-a-new-machine-in-arcgis-enterprise-two/

I tested this same strategy successfully last year, though it took me four attempts and some help from this community (@JQuinn-esristaff, specifically) to get it right. I am now at the final stage of implementing our new configuration using this strategy and find myself stuck again. I am having a similar problem with trying to restore a webgisdr full backup made from the primary machine to a secondary machine that has been installed and configured with 10.7.1 AGS Enterprise software.

Before installing any software on the secondary machine (hostname - enterprise1.domain.com | IP address - 10.0.0.2), I edited the secondary machine's etc/hosts file to include a reference to the primary machine's FQDN (enterprise.domain.com) while using the secondary machine's IP address (10.0.0.2), so that its IP address would resolve to enterprise.domain.com, as indicated in the blog.

I think things break down during the federation, but I am not sure. I suspect this because while I am logged into Portal (accessed successfully using enterprise.domain.com on the secondary machine), I get a login screen for Portal that references the secondary machine's hostname (enterprise1.domain.com). This happens after I have set the server for federation and saved it and before I set the same server as the hosting server (all using the enterprise.domain.com reference). Checking the Server security configuration within the Administration site shows that the "portalUrl" uses enterprise1.domain.com (the secondary FQDN), rather than enterprise.domain.com (the primary FQDN). So, the "privatePortalUrl" and the "portalUrl" values are not the same. Hence, when I tried a restore I got the error that states the public Portal URLs are not the same, and the restore failed.

Can anyone point out what I may be doing wrong? Is there a required order for installing the software or for applying the SSL certs? I have had problems with getting the SSL certs done properly in the past; could this be the issue? We are using a wildcard SSL cert and I have configured both Portal and Server using the cert as an exported root (*.cer) and as an exported existing cert (*.pfx). I have also used the Portal's checkURL utility against the Server's admin URL and it returns a status code of 200, though it does return false for the "secured" value. Thanks in advance for your time and attention. Best, Dixie.
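The "portalUrl" versus "privatePortalUrl" mismatch described above could be checked before attempting a restore with a sketch like this one. It only compares the host portion of two URL strings; how the values are read from the Server security configuration is left out, and the paths and port in the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def hosts_match(portal_url: str, private_portal_url: str) -> bool:
    """Return True when both URLs point at the same host.

    urlparse lowercases the hostname, so this comparison ignores case.
    """
    portal_host = urlparse(portal_url).hostname
    private_host = urlparse(private_portal_url).hostname
    return portal_host is not None and portal_host == private_host


if __name__ == "__main__":
    # Hypothetical paths and port; only the hostnames come from the post.
    print(hosts_match("https://enterprise1.domain.com/portal",
                      "https://enterprise.domain.com:7443/arcgis"))  # False
    print(hosts_match("https://enterprise.domain.com/portal",
                      "https://enterprise.domain.com:7443/arcgis"))  # True
```

A `False` result on the secondary machine would flag the same condition that made the restore fail, before spending hours on an import.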
04-19-2020
07:17 AM
|
0
|
28
|
4621
|