Trying to sync a new node and keep getting hashMerkleRoot mismatch

Question

I've tried syncing on this same MacBook Pro multiple times and it keeps stalling. debug.log keeps repeating the same two lines at block 275162:

2017-07-07 03:09:59 ERROR: ConnectBlock: Consensus::CheckBlock: bad-txnmrklroot, hashMerkleRoot mismatch (code 16)
2017-07-07 03:09:59 ERROR: ConnectTip(): ConnectBlock 0000000000000002373cc5cc98604fa31361a495ec20eebb54861e9f489d3336 failed

Any light on the issue would be appreciated.

score 3 · Accepted Answer · answered Jul 07 '17 at 06:09


This error means that you have a corrupted block. The only surefire way to fix it is to redownload the entire blockchain. First shut down Core, then go to the Bitcoin Core datadir (its path is shown in the field labeled datadir under Help > Debug Window) and delete the blocks folder. Then start Core and let it sync from scratch.
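The bad-txnmrklroot error is raised when the Merkle root recomputed from the block's transactions differs from the hashMerkleRoot field in the block header, so a block stored with even one flipped byte fails this check. A minimal Python sketch of that comparison, assuming the 32-byte txids are already in internal byte order (the reverse of the hex shown by RPC and block explorers). It is an illustration, not Bitcoin Core's C++ code, and it omits Core's separate check for the duplicate-transaction mutation (CVE-2012-2459).

```python
from __future__ import annotations

import hashlib


def sha256d(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: list[bytes]) -> bytes:
    """Compute a Bitcoin-style Merkle root from 32-byte txids (internal byte order).

    Hashes pairs level by level; a level with an odd count duplicates its last hash.
    """
    if not txids:
        raise ValueError("a block always contains at least the coinbase transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_root_matches(header_root: bytes, txids: list[bytes]) -> bool:
    """True if the header's hashMerkleRoot agrees with the block's transactions."""
    return merkle_root(txids) == header_root
```

Because the root commits to every transaction and their order, a corrupted transaction on disk (or a bit flipped in RAM while the block is processed) produces a different root, which is why the same block failing repeatedly points at local data or hardware rather than at the network.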


Ava Chow


Thanks for your reply. I had already deleted the blocks directory after an earlier stall, and this has now happened several times when syncing from scratch. Could this be a hardware issue at this point? I can sync on other machines just fine. – Andrew Toth Jul 07 '17 at 14:12

If you are consistently seeing the same error, then you likely have a hardware issue. In that case, you should run hardware diagnostics to see if any errors are detected. The things to check for are the memory and the disk. – Ava Chow Jul 07 '17 at 16:37

verdy_p · Answer 2 · 2022-08-08T18:57:11.127

This IS related; in fact I've found the cause, and it is definitely caused by bugs in the Qt version of Bitcoin Core. That version has severe synchronization issues across threads: it generates uncaught deadlock exceptions, causing race bugs under some conditions, or unchecked conditions based on incorrect assumptions about the order in which events execute. In some cases the trigger may be remote peers that use slightly different implementations of the protocol and send nonsensical requests out of order.

Notably, the Qt engine conflicts with the LevelDB engine and can corrupt its working state, especially during the IBD phase, when LevelDB performs a LOT of I/O across many concurrent threads that Qt manages incorrectly. There are also some unsafe "quirks" in the Qt engine, notably in the memory allocator, with some parts not being properly isolated. Qt also seems to take priority on all locks made by LevelDB, but LevelDB sometimes needs to hold multiple successive locks on files or on thread scheduling.

For now, the solution is NOT to perform the IBD phase in the Qt version. You can start it there, but then exit the Qt app and continue the IBD with the bitcoind daemon until "bitcoin-cli -getinfo" reports that the IBD is complete.
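"bitcoin-cli -getinfo" is meant for a human reading the terminal; for scripting, the getblockchaininfo RPC returns JSON with blocks, headers, initialblockdownload and verificationprogress fields. A minimal sketch of a completion check, assuming a node recent enough to report initialblockdownload (Bitcoin Core 0.16 or later); the helper names ibd_finished and query_node are hypothetical, and the fallback threshold is an assumption rather than Core's own rule.

```python
import json
import subprocess


def ibd_finished(info: dict) -> bool:
    """Decide from getblockchaininfo output whether initial block download is done.

    Uses the `initialblockdownload` flag when the node reports it; otherwise
    falls back to an assumed heuristic: the validated height has caught up
    with the best known header and verification progress is ~100%.
    """
    if "initialblockdownload" in info:
        return not info["initialblockdownload"]
    return info["blocks"] >= info["headers"] and info["verificationprogress"] > 0.9999


def query_node(cli: str = "bitcoin-cli") -> dict:
    """Run `bitcoin-cli getblockchaininfo` against a running bitcoind and parse the JSON."""
    result = subprocess.run(
        [cli, "getblockchaininfo"], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

With bitcoind running, ibd_finished(query_node()) returns True once the node considers itself synced; polling it from a second shell is one way to know when it is safe to stop the daemon and go back to the Qt UI.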

At that point the Qt version is generally safe, because there are far fewer interactions and the LevelDB databases for the blockchain and other indexes are restructured much less often. Something is wrong in the model that lets the Qt GUI and the RPC/IPC mechanism interoperate while each manages its own set of threads.

This is NOT specific to Windows: I've seen exactly the same thing with a Linux version (either in a native machine, or in a VM, or in WSL).

Bitcoin developers themselves admit that the Qt version is not well tested: most tests are performed on the daemon, other tests are still to be written, and there are many open bugs on GitHub, including various race conditions inside dependencies (Qt, LevelDB, GCC and its build chain).

These bugs are NOT caused by hardware, as was suggested above; they can occur at any time and are very hard to test and predict. Developers are trying to restructure Bitcoin Core into more manageable submodules that can be safely tested, but some modules in Bitcoin Core are really huge and do too many unrelated things.

Since the user is asking for hints (even though he is on a MacBook Pro), I strongly suggest he use the daemon rather than the Qt-based UI, especially for syncing. It is much faster and much less error-prone. He first has to close the Qt UI and wait until it terminates after flushing its data to disk, then launch the daemon in a shell window, let it run in the background, and follow its progress by running "bitcoin-cli -getinfo" in a second shell.

Once this info indicates that the node is in sync, he can stop the daemon (press Ctrl+C once and wait for the prompt to return after the daemon flushes its state to disk) and then restart with the Qt UI.

Note that if the logs indicate corruption in transactions, he has to delete the transaction index files. If that does not work, he has to restart with the "-reindex-chainstate" flag. If that still does not work, he has to delete the corrupted block file indicated and all the following ones (the "blk0*.dat" files), the undo files ("rev0*.dat") with the same starting number onward, and the 2 special metadata files in the blocks folder, then relaunch the daemon (avoid retrying with the Qt UI) with "-reindex", which will restart the IBD sync.
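A minimal dry-run sketch of the file-selection step in that last resort, assuming the default blocks layout where block files are named blkNNNNN.dat and undo files revNNNNN.dat. The function names are hypothetical; the code only lists files, it does not handle the transaction-index or metadata files mentioned above, and the daemon must be fully stopped before anything is actually removed.

```python
from __future__ import annotations

import re
from pathlib import Path

# Block files are named blkNNNNN.dat and the matching undo files revNNNNN.dat.
BLOCK_OR_UNDO = re.compile(r"^(blk|rev)(\d+)\.dat$")


def select_from(names: list[str], first_bad: int) -> list[str]:
    """Return blk/rev file names numbered >= first_bad, ordered by file number."""
    picked = []
    for name in names:
        m = BLOCK_OR_UNDO.match(name)
        if m and int(m.group(2)) >= first_bad:
            picked.append((int(m.group(2)), name))
    return [name for _, name in sorted(picked)]


def dry_run(blocks_dir: Path, first_bad: int) -> list[Path]:
    """List (never delete) the blk/rev files a manual cleanup would remove."""
    names = [p.name for p in blocks_dir.iterdir() if p.is_file()]
    return [blocks_dir / n for n in select_from(names, first_bad)]
```

Sorting on the parsed number rather than the raw name keeps the order correct even if a file number ever grows past five digits.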

There are also race issues, even on Linux (not just Windows), when the index files in chainstate get "compacted" because the maximum cache is exhausted (including the additional unused mempool size, which is at least 5MiB): the compactor seems to delete necessary index files even though they are still referenced by later index files that are needed and in use. — verdy_p, Aug 12 '22 at 18:55
There's some incorrect reference counting, and a file may be deleted too early (this is most likely a bug in the Bitcoin sources, not in LevelDB, which does the job it was instructed to do). It occurs somewhere in the "/src/util/cache.cc" source file, which does not correctly track an exception that may occur; there is possibly also a bug in its simple hashing algorithm (which states that collisions per hash bucket should remain small, <=1 on average). — verdy_p, Aug 12 '22 at 18:57
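The comment's argument rests on one invariant: a cached resource (such as an open table file) may be freed only after every holder has released it, even if the cache has already evicted it. A toy Python model of that invariant; the class and method names are hypothetical, and this is a generic illustration, not the code in LevelDB's or Bitcoin Core's cache.

```python
from __future__ import annotations


class Handle:
    """One cached resource plus its reference count."""

    def __init__(self, key: str):
        self.key = key
        self.refs = 1  # the cache table's own reference


class RefCountedCache:
    """Toy model: eviction unlinks an entry, but it is freed only at refs == 0."""

    def __init__(self):
        self._table: dict[str, Handle] = {}
        self.freed: list[str] = []  # keys whose resources were actually released

    def insert(self, key: str) -> None:
        if key in self._table:
            self.evict(key)  # replacing: drop the old entry's table reference
        self._table[key] = Handle(key)

    def lookup(self, key: str) -> Handle:
        h = self._table[key]  # KeyError if absent or already evicted
        h.refs += 1  # the caller now holds a reference
        return h

    def release(self, h: Handle) -> None:
        self._unref(h)

    def evict(self, key: str) -> None:
        h = self._table.pop(key, None)
        if h is not None:
            self._unref(h)  # drop only the table's reference

    def _unref(self, h: Handle) -> None:
        if h.refs <= 0:
            raise RuntimeError(f"double release of {h.key}")
        h.refs -= 1
        if h.refs == 0:
            self.freed.append(h.key)  # safe: no caller can still be using it
```

The bug the comment suspects corresponds to freeing the resource at eviction time instead of at refs == 0: a caller still holding the handle would then be reading a file that no longer exists.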
