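You are experiencing the classic XA two-phase commit (2PC) race condition: the resources do not commit at exactly the same instant, so the consumer can receive the JMS message before the JDBC commit is visible. It does happen in production environments.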
Three options come to mind:
- Use the last agent optimization, with JDBC as the non-XA resource. (You lose recovery semantics.)
- Set a JMS time-to-deliver on the message. (You deliberately give up real-time delivery.)
- Build retries into the JDBC code. (This has the least effect on functionality.)
WebLogic has the LLR (Logging Last Resource) optimization, which avoids this problem and still gives you all the XA guarantees.
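
For the retry option, here is a minimal sketch of what I mean. It assumes the consumer looks up a row that the producer's transaction wrote, and that the row may not be visible yet on the first attempt. The names `RetryingReader` and `readWithRetry` are made up for illustration, and the lookup is passed in as a plain `Supplier`, so your real JDBC query (with its `SQLException` handling) goes inside it.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper: retries a lookup until the row shows up or attempts run out.
final class RetryingReader {

    private RetryingReader() {
    }

    /**
     * Calls the lookup up to maxAttempts times, sleeping delayMillis between
     * attempts, and returns the first non-empty result (or empty if none).
     */
    static <T> Optional<T> readWithRetry(Supplier<Optional<T>> lookup,
                                         int maxAttempts,
                                         long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> row = lookup.get();
            if (row.isPresent()) {
                return row; // the other resource's commit is now visible
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // give the commit time to land
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

In the message listener you would wrap your SELECT in the supplier, for example `readWithRetry(() -> findOrder(id), 5, 200L)`, where `findOrder` is a placeholder for your own query. If the result is still empty after the last attempt, one reasonable choice is to throw so the message is rolled back and redelivered.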