I did some additional testing, debugging and logging on this issue.
First of all, I simulated a long-running task with simple Thread.sleep() calls, making a scheduled thread work for more than 10 minutes three times per hour. Not a single execution of these simulated workers crashed in the way described above. That led me to the assumption that something is probably going wrong within the ArcObjects interaction.
So I created an additional observer thread that logs the worker thread's state and dumps its stack trace to a file once it exceeds the 10-minute mark. The results are always of one of the following two kinds:
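For illustration, a simulation of this kind could look roughly like the sketch below. It is not the code actually used: the class name SimulatedWorker, the completedRuns counter and the use of a ScheduledExecutorService are assumptions made only to show the idea of a worker that does nothing but sleep.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the real worker: it only sleeps, no ArcObjects calls.
class SimulatedWorker implements Runnable {
    private final long workMillis;
    // Counts runs that slept for the full duration without being cut short.
    final AtomicInteger completedRuns = new AtomicInteger();

    SimulatedWorker(long workMillis) {
        this.workMillis = workMillis;
    }

    @Override
    public void run() {
        try {
            Thread.sleep(workMillis);           // pretend to be busy
            completedRuns.incrementAndGet();    // reached only if the sleep finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    // Runs the worker repeatedly at a fixed rate on a single scheduler thread.
    static ScheduledExecutorService schedule(SimulatedWorker worker, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(worker, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

With an 11-minute sleep and a 20-minute period, SimulatedWorker.schedule(new SimulatedWorker(TimeUnit.MINUTES.toMillis(11)), TimeUnit.MINUTES.toMillis(20)) would approximate "more than 10 minutes, three times per hour".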
Thread-15 102 RUNNABLE
com.esri.arcgis.interop.NativeObjRef.nativeVtblInvokeNative(Native Method)
com.esri.arcgis.interop.NativeObjRef.a(Unknown Source)
com.esri.arcgis.interop.NativeObjRef.a(Unknown Source)
com.esri.arcgis.interop.Dispatch.vtblInvoke(Unknown Source)
com.esri.arcgis.geodatabase.ICursorProxy.nextRow(Unknown Source)
MyCallingClass.pullData(MyCallingClass.java:142)
or
Thread-22 176 RUNNABLE
com.esri.arcgis.interop.NativeObjRef.nativeVtblInvokeNative(Native Method)
com.esri.arcgis.interop.NativeObjRef.a(Unknown Source)
com.esri.arcgis.interop.NativeObjRef.a(Unknown Source)
com.esri.arcgis.interop.Dispatch.vtblInvoke(Unknown Source)
com.esri.arcgis.geodatabase.IQueryDefProxy.evaluate(Unknown Source)
MyCallingClass.pullData(MyCallingClass.java:89)
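An observer producing dumps in roughly this shape can be sketched as follows. This is an assumed reconstruction, not the original code: the class name ThreadObserver, the polling interval and the choice to print to System.err (the original wrote to a file) are all illustrative, and the second column of the header line is assumed to be the thread id.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical watchdog: names, threshold and polling interval are assumptions.
class ThreadObserver implements Runnable {
    private final Thread observed;
    private final long limitMillis;
    private final long pollMillis;
    private final long startNanos = System.nanoTime();

    ThreadObserver(Thread observed, long limitMillis, long pollMillis) {
        this.observed = observed;
        this.limitMillis = limitMillis;
        this.pollMillis = pollMillis;
    }

    // Formats "<name> <id> <state>" followed by one stack frame per line.
    static String dump(Thread t) {
        StringBuilder sb = new StringBuilder();
        sb.append(t.getName()).append(' ')
          .append(t.getId()).append(' ')
          .append(t.getState()).append('\n');
        for (StackTraceElement frame : t.getStackTrace()) {
            sb.append(frame).append('\n');
        }
        return sb.toString();
    }

    @Override
    public void run() {
        try {
            // Poll until the observed thread ends; dump once it is over the limit.
            while (observed.isAlive()) {
                long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
                if (elapsed > limitMillis) {
                    System.err.print(dump(observed)); // the original wrote to a file instead
                }
                Thread.sleep(pollMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop observing when interrupted
        }
    }
}
```

Started as a daemon thread next to the worker, for example new Thread(new ThreadObserver(worker, TimeUnit.MINUTES.toMillis(10), 5000)), it keeps dumping for as long as the worker stays over the limit, which is what makes the sudden end of all logging visible.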
The thread is "stuck" at that method for 1-3 minutes (even though it is in RUNNABLE state, which is what the JVM reports for a thread executing native code, whether or not that code is blocked internally), and then the magic happens (!!): all logs suddenly stop. I never receive any more logs for this SOE instance, and a new one is created subsequently. My conclusion is that AGS is shutting down the SOE instance the hard way (similar to "kill -9 <processId>" on UNIX). No shutdown hook is executed; it is just killed.
I have no clue how to go on. This might be a deadlock bug inside the COM object or something related. It cannot be reproduced reliably; it happens from time to time, on different queries.
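For reference, a shutdown hook of the kind meant here can be registered roughly as in the sketch below. The class name ShutdownProbe and the log message are hypothetical. A hook like this runs on a normal JVM exit or on SIGTERM, but not when the process is terminated forcibly (SIGKILL or the equivalent), so its silence is consistent with a hard kill.

```java
// Hypothetical probe: logs a line if the JVM goes through an orderly shutdown.
class ShutdownProbe {
    // Registers the hook and returns it so it can be removed again if needed.
    static Thread install(String label) {
        Thread hook = new Thread(
            () -> System.err.println(label + ": shutdown hook ran at " + System.currentTimeMillis()),
            "shutdown-probe");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Calling ShutdownProbe.install("SOE instance") once during initialization is enough; if the log line never appears before the instance disappears, the process did not shut down in an orderly way.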