We have an ArcGIS 10.5.1 environment with a number of connected one-way replicas (SQL Server 2016). Since October we have had issues with our replicas hanging when synchronizing, to the point where we have to cancel the sync and force close ArcMap and/or Catalog. Here are the things we have noticed:
1) We sometimes have multiple replica versions stuck in the sde_versions table associated with one replica. This means we are unable to do a full compress (almost like orphaned replicas, but associated with registered replicas). We have made copies of the databases and messed with the sde_versions table in a dev environment, but as expected this ends up corrupting the replica.
2) Sometimes a replica hangs on one table. We try it again the next day and it still doesn't work, but when we try again a few days later it suddenly syncs.
3) We did not have a consistent maintenance program, but since this problem started we have recognized the need to fully compress, rebuild indexes, update statistics, etc., and have become more consistent (especially when synchronizing replicas).
4) We are not synchronizing TONS of changes.
5) We have tried using disconnected synchronization (exporting the data changes to XML and importing the XML into the child replica), but it hangs on the same table as connected synchronization does. Also, it's not always failing on one specific table. For example, if a replica hangs on a feature class called "Mains" and we finally recreate the replica to fix it, the next time it fails it may fail on a different feature class altogether.
6) We are not able to constantly recreate the replicas, since they feed our critical web services and our Leadership does not want us to use our production databases for serving web services.
Any thoughts or feedback? Has anyone experienced the same issue? We are all out of ideas here.
I had this happen to me once. The issue was related to a relationship class we had. One table had orphan records with no counterpart in the other table. This caused the sync on the replica to spin and spin until it finally just timed out. When I deleted the orphans, it started working again without issue. ESRI support wasn't much help here and I just had to use trial and error until I found the issue. I have heard this can also happen if you have a bad geometry value in one of the records.
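If you want to look for orphans before the next sync, an anti-join (LEFT JOIN ... IS NULL) is one way to list them. Below is a rough sketch only: it uses Python's built-in sqlite3 so it runs standalone, and the table and column names (mains, service_lines, facility_id, main_id) are made up for illustration, not taken from your schema. The same query pattern should carry over to SQL Server, though against versioned data you would likely need to query the versioned views rather than the base tables.

```python
import sqlite3


def find_orphans(conn, child_table, child_id, child_fk, parent_table, parent_pk):
    """Return (child_id, foreign_key) for child rows with no matching parent row.

    Rows whose foreign key is NULL are reported too, since they match nothing.
    Identifiers are interpolated into the SQL, so pass only trusted names.
    """
    sql = (
        f"SELECT c.{child_id}, c.{child_fk} "
        f"FROM {child_table} AS c "
        f"LEFT JOIN {parent_table} AS p ON c.{child_fk} = p.{parent_pk} "
        f"WHERE p.{parent_pk} IS NULL "
        f"ORDER BY c.{child_id}"
    )
    return list(conn.execute(sql))


# Hypothetical demo data: two mains, four service lines, two of which
# point at mains that do not exist.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mains (facility_id TEXT PRIMARY KEY);
    CREATE TABLE service_lines (objectid INTEGER PRIMARY KEY, main_id TEXT);
    INSERT INTO mains VALUES ('M1'), ('M2');
    INSERT INTO service_lines VALUES (1, 'M1'), (2, 'M9'), (3, 'M2'), (4, 'M7');
    """
)

for objectid, main_id in find_orphans(
    conn, "service_lines", "objectid", "main_id", "mains", "facility_id"
):
    print(f"service_lines OBJECTID {objectid} references missing main {main_id!r}")
```

Run it in both directions (child against parent, then parent against child) if the relationship class expects every record on either side to have a counterpart.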