In 2017 this is still a problem. It's a disgrace that an application in this day and age cannot recover gracefully from a brief database/network outage.
It's completely insane that the software doesn't even provide an accurate, user-friendly, human-readable error message for this situation.
Shame, ESRI, shame!
Please understand that you are saying the equivalent of, "It's a disgrace that a painter can't immediately continue his brush stroke when the ladder he was using is dragged down the street in a brief safety line mishap. It's completely insane that the brush box doesn't articulate this hazard."
For a TCP/IP application, losing network access, even briefly, severs the communication link. Even if the database client software has the ability to re-establish communication, it's unlikely that this reconnection protocol will be entirely seamless with respect to the client application, and the hash that this makes within the transaction model isn't likely to make application recovery any easier.
It takes a great deal of effort to find the right balance of database API options to implement geodatabases with reliable network access. It's unlikely that the same order of magnitude of performance would be achievable if the client library were optimized for unreliable connectivity (assuming this is even possible). It doesn't seem fair to blame a third party for error messages generated in some other vendor's library by a failure five or six layers away in the OSI model.
If you need to work with unreliable network access, you need to use tools which do not require continuous connectivity. There are a number of available options using the Esri technology stack, including disconnected editing and feature stream editing with REST endpoints.
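For example (a rough, hypothetical sketch only; the service URL, token handling, and field names below are placeholders), edits can be pushed through a feature service's applyEdits REST operation in short, self-contained requests instead of through a long-lived geodatabase connection:

```python
# Hedged sketch: batching edits through a feature service REST endpoint so that
# a dropped network only affects one short request, not an open edit session.
import json
import requests

# Hypothetical hosted feature layer endpoint (layer index 0).
LAYER_URL = "https://example.com/arcgis/rest/services/Parcels/FeatureServer/0"

def apply_edits(adds=None, updates=None, deletes=None, token=None):
    """POST a batch of edits; each call is stateless from the client's point of view."""
    params = {
        "f": "json",
        "adds": json.dumps(adds or []),
        "updates": json.dumps(updates or []),
        "deletes": ",".join(str(oid) for oid in (deletes or [])),
    }
    if token:
        params["token"] = token
    resp = requests.post(f"{LAYER_URL}/applyEdits", data=params, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Example usage (field names are assumptions):
# new_feature = {
#     "geometry": {"x": -117.2, "y": 34.06, "spatialReference": {"wkid": 4326}},
#     "attributes": {"STATUS": "NEW"},
# }
# print(apply_edits(adds=[new_feature]))
```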
- V
I will have to respectfully disagree. Even on a reliable network, there are many reasons why a database can be momentarily disconnected. Other applications cope with this sort of thing fine. I appreciate that it's not simple, but it's a shame that ESRI are not even trying.
Database transactions make it feasible.
I've done more than my fair share of programming in the TCP/IP stack, and there really isn't any reason for transient disconnections on a reliable network. Since TCP/IP hides the physical layer from the application, frequent TCP connection failures are the very definition of an unreliable network.
The very nature of Direct Connect connections is to utilize the network protocol of the database API. If that API reports a connection failure, it's not possible to reject the error and force the session to remain connected. At that point, all the cached commands in the Direct Connect thread queue are invalidated. It's not just that this is a non-simple task; it's a nigh-on impossible one. Multiply that by a dozen different Direct Connect drivers, and the magnitude of the problem becomes clear.
In fact, database transactions are what make it impossible. You can't rejoin a transaction after it fails, and that low-level failure during a COMMIT is what invalidates the uncommitted changes. Remember that the versioned geodatabase model saves the individual undo-able edits as states in the versioned state tree, and the act of "saving" is really only changing the state of the version associated with the edit session to the last created state. So, in a sense, you can access the committed states through the point where the editing failed, but you'd need to take heroic (and unsupported) measures to align the version to that state. If the edit session is not versioned, then a simple implicit ROLLBACK erases any memory of the edits.
Best practice remains to save frequently during an edit session.
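As a rough sketch of that advice (assuming arcpy is available; the connection file, feature class, and field names are placeholders), committing versioned edits in small batches means a dropped connection only costs the current, unsaved batch:

```python
# Hedged sketch of "save frequently": commit versioned edits in small batches so a
# network drop loses only the uncommitted batch, not the whole session.
import arcpy

SDE = r"C:\connections\production.sde"      # hypothetical connection file
FC = SDE + r"\gis.DBO.Parcels"              # hypothetical feature class

rows_to_add = [((x, x), "BATCHED") for x in range(1000)]   # fake data
BATCH = 100

edit = arcpy.da.Editor(SDE)
edit.startEditing(True, True)        # undo stack on, multiuser (versioned) mode
try:
    for start in range(0, len(rows_to_add), BATCH):
        edit.startOperation()
        with arcpy.da.InsertCursor(FC, ["SHAPE@XY", "STATUS"]) as cursor:
            for row in rows_to_add[start:start + BATCH]:
                cursor.insertRow(row)
        edit.stopOperation()
        edit.stopEditing(True)       # save: edits up to here survive a later outage
        edit.startEditing(True, True)
finally:
    if edit.isEditing:
        edit.stopEditing(False)      # abandon only the last, unsaved batch
```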
- V
This is getting ridiculous. There are plenty of reasons for transient disconnections on a reliable network, and most of them are user error: e.g., playing around with network settings, disconnecting cables, and, most commonly in my experience, admin users actively disconnecting sessions so that they can clear locks for various reasons.
Maybe this doesn't happen to you, but it happens to other users.
No, of course it's not possible to reject the error and remain connected. Nobody is suggesting any such thing. What would be nice (and is possible) is to handle the error gracefully. At the VERY LEAST, the application could report a human-readable error message related to the real underlying cause of the problem (e.g., "Lost connection to database"). But more than this, the application should not need to be killed and restarted before the user can start using the data again! It's completely ridiculous! Or maybe there is some other way?
The application should report a USEFUL error message, let the user know the consequences (e.g., you've now lost all your edits - sorry!), and give the user some options to get back to a sane state without having to restart the entire application (which, incidentally, takes a crazy long time in the case of ArcMap). IT SHOULD THEN RECONNECT TO THE DATABASE AUTOMATICALLY. Not to recover all the data/transaction state, but to allow the user to re-do all the work that they've just lost. Reconnecting to a database is, of course, trivially easy. Even ArcGIS Server does this (after 30 minutes, by default).
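Something as simple as the following sketch would already be an improvement (hypothetical only; `connect` stands in for whatever actually opens the workspace or database session):

```python
# Hedged sketch of a graceful reconnect: tell the user plainly what happened,
# then retry the connection with a short backoff instead of forcing a restart.
import time

def reconnect(connect, retries=5, delay=2.0):
    """Try to re-establish a lost database connection, reporting progress as we go."""
    for attempt in range(1, retries + 1):
        try:
            conn = connect()
            print("Reconnected to database.")
            return conn
        except Exception as exc:   # real code would catch the driver's specific error type
            print(f"Lost connection to database (attempt {attempt}/{retries}): {exc}")
            time.sleep(delay)
    raise RuntimeError("Could not reconnect; unsaved edits are lost and must be redone.")
```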
Completely agree. I get rollback/commit; I thought you were asking to somehow get your unsaved transactions back. I would love a message about what happened, again assuming I can trust that it is 100 percent correct.
Also, I suspect that many GIS users have imperfect, not "reliable," networks. I work at a huge site out in the middle of the desert which, despite all sorts of network and power management smarts, loses power or browns out occasionally. Plus, those smart managers cause problems testing and installing fixes that they are required to push. There are probably lots of single-GIS-analyst, small-agency shops stuck with ancient computers and protocols and idiots for network managers.
Here is my much less technical take on this. Databases can become corrupt; the different files that make them up can get out of synch, or a record can be incompletely saved or saved on a disk sector with problems. I have had this happen many years ago on a small PC network, and the database continued to work until the bad data area was accessed. Then we had to reenter everything entered since the failure after restoring to an earlier, safer date (a date that was not easy to figure out), because a good chunk of the data could not be viewed or printed.
I would prefer database software drop everything in process when there is a lost connection and avoid even the slightest chance of corruption, especially if it does not quickly provide reliable information about the integrity of the database after reconnection. I assume it lost my edits but that I can re-enter them. That is what has always happened with SDE, and I have more confidence in it than in my file-based databases such as FGDBs.
Using database transactions on a proper RDBMS avoids that kind of corruption. A transaction makes each logical group of database operations either all-or-nothing. If the database doesn't get the 'commit' at the end of the transaction, it can roll back all of the parts of the transaction. It is always internally consistent.
Corruption can of course occur for many other reasons.
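As a rough illustration of that all-or-nothing behavior (using the standard-library sqlite3 module only as a stand-in for an enterprise RDBMS; the table and values are made up):

```python
# Hedged illustration: if the connection (or anything else) fails before COMMIT,
# the rollback leaves the table exactly as it was before the transaction began.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO parcels (status) VALUES ('ORIGINAL')")
conn.commit()

try:
    conn.execute("INSERT INTO parcels (status) VALUES ('EDIT 1')")
    conn.execute("INSERT INTO parcels (status) VALUES ('EDIT 2')")
    raise ConnectionError("simulated network drop before COMMIT")
    # conn.commit() would normally go here
except ConnectionError:
    conn.rollback()          # both uncommitted inserts vanish together

# Only the committed row remains; the table is internally consistent.
print(conn.execute("SELECT status FROM parcels").fetchall())   # [('ORIGINAL',)]
```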