Either way, let’s assume the coordinator crashes before it can send the second message (COMMIT/ABORT). Since 2PC uses only two messages, a cohort can never apply a commit without explicitly receiving a COMMIT message; otherwise the cohorts could end up inconsistent. So, when the coordinator crashes, the cohorts can either (1) apply an ABORT after a timeout or (2) remain in the waiting state indefinitely. It is hard to believe that 2PC would employ option (2), i.e. waiting indefinitely. But if option (1) is chosen, why do we call 2PC blocking when the coordinator dies?
"Option 1" does not exist. When a cohort votes 'yes', it waits for the global decision indefinitely. That is 2PC. If you alter it because it does not look sane/plausible enough, that is not 2PC any more.
EDIT:
Some relaxation of the above: the Atif paper (and, for the most part, Wikipedia too) describes the 'classroom' variant of 2PC. The original papers, however, also contained suggestions for distributed recovery. You can find some description in the articles referenced by both Wikipedia and the Atif paper (http://dl.acm.org/citation.cfm?id=850772 is one I managed to access). The 3PC article on Wikipedia mentions it too, in its Motivation section.
I find it hard to access most of these papers (like the 'Gray 78' lecture notes from TU Munich), but as far as I understand:
- The recovery mechanism tries to get the global decision from practically anybody.
- If the global decision is found anywhere, that is the global decision, and it can be carried out. When the coordinator revives, it will do the same. In my interpretation, a cohort could also say 'abort' if its own vote was a 'no'. But again, I have not read the very original paper(s).
- If the global decision has not been found, and everyone (minus the coordinator) is alive, then no cohort has received a 'commit' (the coordinator died before sending any out), so the transaction can be aborted.
- If the global decision has not been found, and there is at least one dead cohort (on top of the coordinator being dead too), that is a problem: in 2PC there is no way to tell whether the unreachable cohort received a 'commit' message and carried it out before it and the coordinator both died. So even if a timeout or failure detection is applied in the 'voted yes and waiting for global decision' step, the recovery procedure itself can end up waiting indefinitely.
(Here I was writing only about the case where the coordinator dies and cohort(s) try to deal with it)
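To make the four cases above concrete, here is a minimal Python sketch of the decision a cohort could reach while the coordinator is down. It only reflects my reading of the recovery idea, not the procedure from any particular paper. The function name `recover` and the report labels (`'commit'`, `'abort'`, `'voted_no'`, `'uncertain'`, `None` for an unreachable cohort) are made up for illustration.

```python
from typing import Optional, Sequence


def recover(reports: Sequence[Optional[str]]) -> str:
    """Decide what to do while the coordinator is down.

    `reports` holds one entry per cohort (coordinator excluded):
      'commit' / 'abort' : the cohort already knows the global decision
      'voted_no'         : the cohort voted 'no' itself
      'uncertain'        : the cohort voted 'yes' and is still waiting
      None               : the cohort is unreachable (assumed dead)
    These labels are hypothetical, chosen for this sketch only.
    """
    # Someone has seen the global decision: adopt it.
    if "commit" in reports:
        return "commit"
    # An 'abort' was seen, or some cohort voted 'no', so the coordinator
    # cannot have decided 'commit'.
    if "abort" in reports or "voted_no" in reports:
        return "abort"
    # Nobody knows the decision and at least one cohort is unreachable:
    # it may have received 'commit' before dying, so we cannot decide.
    if any(r is None for r in reports):
        return "block"
    # Everyone is alive and still uncertain: no 'commit' was ever sent.
    return "abort"


print(recover(["uncertain", "commit"]))     # commit
print(recover(["uncertain", "uncertain"]))  # abort
print(recover(["uncertain", None]))         # block
```

The `'block'` result is the case that makes 2PC blocking: no timeout helps there, because the missing information lives only on the dead nodes.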