In a transactional database, a deadlock happens when two processes each within its own transaction updates two rows of information but in the opposite order. For example, process A updates row 1 then row 2 in the exact timeframe that process B updates row 2 then row 1. Process A can't finish updating row 2 until process B is finished, but process B cannot finish updating row 1 until process A is finished. No matter how much time is allowed to pass, this situation will never resolve itself and because of this ClustrixDB will kill the youngest transaction of the two.
In short, deadlocks are the result of two or more concurrent transactions modifying the same data such that the transactions are waiting for each other to commit or abort, and thus neither ever does, so one must be killed.
A simple example of how to create a deadlock.
session1> create table foo (id int auto_increment primary key, v varchar); session1> insert into foo values (1,1); session1> insert into foo values (2,2);
Creating the deadlock
session1> begin; session2> begin; session1> update foo set v = v+1 where id = 1; session2> update foo set v = v+1 where id = 2; session1> update foo set v = v+1 where id = 2;
Session1 will now block, waiting for session2 to commit or abort.
session2> update foo set v = v+1 where id = 1;
One of the two transactions will then fail with the error
[Lock manager deadlock detected: ], as the deadlock detector cancels one of the transactions to preserve the other.
Note that deadlocks generally indicate an application logic problem; to help developers track down the portion of application code causing the issue, one can enable the global variable debug_deadlocks, and then check the clustrix.log to see exactly which queries lead to deadlock.
Even if deadlocks are not causing problems it is always best to eliminate sources of contention wherever possible. Below is our methodology for investigating deadlocks in the system.
sql> set global debug_deadlocks=true;
If you execute the above example to cause a deadlock, you would see the following in clustrix.log:
shell> /opt/clustrix/bin/clx cmd 'grep "^2016-03-15" /data/clustrix/log/clustrix.log' | grep "dbcore/deadlock_detect"
2016-03-15 21:48:34.165629 UTC alpo004.colo.sproutsys.com clxnode: INFO dbcore/deadlock_detect.ct:177 victimize(): survivor xid=:0x56ea2520913ac802 user=root@NULL sql="update foo set v = v+1 where id = 2"
2016-03-15 21:48:34.165629 UTC alpo004.colo.sproutsys.com clxnode: INFO dbcore/deadlock_detect.ct:177 victimize(): victim xid=:0x56ea25232f014802 user=root@NULL sql="update foo set v = v+1 where id = 1"
2016-03-15 21:48:34.165629 UTC alpo004.colo.sproutsys.com clxnode: INFO dbcore/deadlock_detect.ct:181 victimize(): Canceling trx 0x56ea25232f014802 in graph to preserve 0x56ea2520913ac802.
Utilizing the methods above should provide the information needed to identify the source of the deadlocking queries.