7

Why is one word found twice in each seed?

Is it used as a checksum?

ant Bldel

2 Answers

8

Yes. The last word is a checksum word, and it is always a copy of one of the other words in the mnemonic. See here for more details.

In fact, one word could occur even more than twice. For example,

aces aces aces aces aces aces aces aces aces aces aces aces aces aces aces aces aces aces aces aces aces aces aces aces aces,

is a perfectly valid seed; however, you will have a hard time generating it randomly ;) The resulting address is:

47Rz1Ee4GMj3DVyQ56XWLE5ZnffaAj934Sk6fFE7US4CUyHKBKZZNj9cCigdZweGdN1LhsAmx2VvH573AMLdSKj3QBXdg3F

I didn't check if there's any monero there, though.

The words can be repeated because the mnemonic is a way to encode 256-bit numbers. The seed is just a random 256-bit number, and as such, can have repeating digits or sequences or be divisible by some number or powers of that same number.

The mnemonic is a representation of that 256-bit number in a base-1626 number system, where each word of the dictionary represents a digit. With this system, 24 words are required to represent 256 bits of information, and the 25th word is added as a checksum, which is extra information that is not part of the actual seed (the 256-bit number). (Strictly speaking, Monero converts each 32-bit chunk of the seed into three words rather than treating the whole seed as one big number, but the principle is the same.)

If we were to count in such a system, once you reach the last word (zoom) of the dictionary, you need to add one more "digit" to continue counting, just as you do when counting 8, 9, 10, except that here your "9" is "zoom". For example:

0 = abbey
1 = abducts
2 = ability
... 
1625 = zoom
1626 = abducts abbey
1627 = abducts abducts
1628 = abducts ability

and so on. Once you have used up all the words for the second "digit", you increase the first "digit":

3251 = abducts zoom
3252 = ability abbey
3253 = ability abducts
3254 = ability ability
3255 = ability ablaze

and so on, until the first digit reaches zoom; then we add a third digit on the next increase:

2643875 = zoom zoom
2643876 = abducts abbey abbey
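The counting scheme above can be sketched in a few lines of C++. This is only an illustration of positional notation with words as digits, not Monero's actual conversion routine; `to_words` and `toy_words` are hypothetical names, and the four-word dictionary is a stand-in for the real 1626-word list.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy dictionary (assumption): the first four words of the list stand in
// for the full 1626-word dictionary, so the base here is 4, not 1626.
std::vector<std::string> toy_words()
{
    return {"abbey", "abducts", "ability", "ablaze"};
}

// Hypothetical helper: writes `value` in base words.size(), most
// significant "digit" first, using words[i] as the digit i.
// Assumes the dictionary holds at least two words.
std::string to_words(std::uint64_t value, const std::vector<std::string>& words)
{
    const std::uint64_t base = words.size();
    std::string out = words[value % base];   // least significant digit
    value /= base;
    while (value > 0) {
        out = words[value % base] + " " + out;  // prepend the next digit
        value /= base;
    }
    return out;
}
```

With this toy dictionary, `to_words(3, toy_words())` gives "ablaze" (the last word, playing the role of "zoom"), and `to_words(4, toy_words())` gives "abducts abbey", mirroring 1626 = abducts abbey in the real list.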

I think you get the idea by now. So as you can see, any combination of words represents some number, and any number can be represented by a series of words, as long as you have a list of words and a method for converting between the two. In fact, the biggest number you can represent with 24 words (remember, the 25th is the checksum) is

= 1626^24 - 1,

which is a 78-digit number (base 10, what you call a "normal" number), if you were willing to write all the digits.

So, that is why you can have repeating words.
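As a rough check of the 78-digit figure, here is the estimate worked out with rounded logarithms (approximate values, not an exact expansion):

```latex
\log_{10}\!\left(1626^{24}\right) = 24 \log_{10} 1626 \approx 24 \times 3.2111 \approx 77.07
\quad\Longrightarrow\quad
1626^{24} \approx 1.17 \times 10^{77}
```

A number of that size has 78 decimal digits. For comparison, 2^256 is about 1.16 x 10^77, so 24 words are just enough to cover every 256-bit value.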

JollyMort
  • 1
    "I didn't check if there's any monero there, though." Indeed, you could not do so. – Christopher King Sep 10 '16 at 13:54
  • 1
    I couldn't just look up the address on a block explorer, that's for sure. However, I happen to know that at least until block height 1133051, there was 0.01 monero on the address mentioned. – JollyMort Sep 10 '16 at 15:39
4

This is a checksum.

When the word list is created, an index is computed from it: a CRC32 checksum is taken over the words (each trimmed to its unique prefix), the result is reduced modulo the seed length, and the word at that index is appended to the list.

uint32_t create_checksum_index(const std::vector<std::string> &word_list, uint32_t unique_prefix_length)
{
    //[...]
    boost::crc_32_type result;
    result.process_bytes(trimmed_words.data(), trimmed_words.length());
    return result.checksum() % crypto::ElectrumWords::seed_length;
}

The word at that index is then appended:

words += (' ' + words_store[create_checksum_index(words_store, language->get_unique_prefix_length())]);

When the word list is used later, the checksum is verified:

bool checksum_test(std::vector<std::string> seed, uint32_t unique_prefix_length)
{
    // The last word is the checksum.
    std::string last_word = seed.back();
    seed.pop_back();

    std::string checksum = seed[create_checksum_index(seed, unique_prefix_length)];

    std::string trimmed_checksum = checksum.length() > unique_prefix_length ? Language::utf8prefix(checksum, unique_prefix_length) : checksum;
    std::string trimmed_last_word = last_word.length() > unique_prefix_length ? Language::utf8prefix(last_word, unique_prefix_length) : last_word;
    return trimmed_checksum == trimmed_last_word;
}
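In plainer terms: take the first few letters of each word, glue them together, run CRC32 over the result, and use the remainder after dividing by the number of words to pick one of the words; that word is repeated at the end. To check a seed, drop the last word, redo the calculation, and compare. The snippet below is a self-contained sketch of that idea, not the wallet's code. The names `crc32_of`, `prefix`, `checksum_index`, `with_checksum` and `verify` are hypothetical, the CRC is a plain bitwise implementation instead of Boost's, prefixes are cut by bytes (ASCII words only) rather than with `utf8prefix`, and the modulus is the number of words passed in rather than the fixed `seed_length`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit.
std::uint32_t crc32_of(const std::string& data)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

// First `n` bytes of a word (assumption: ASCII-only words).
std::string prefix(const std::string& word, std::size_t n)
{
    return word.substr(0, n);
}

// Index of the word that will serve as the checksum word.
std::size_t checksum_index(const std::vector<std::string>& words, std::size_t prefix_len)
{
    std::string joined;
    for (const std::string& w : words)
        joined += prefix(w, prefix_len);
    return crc32_of(joined) % words.size();
}

// Returns the word list with the checksum word appended.
// Assumes `words` is not empty.
std::vector<std::string> with_checksum(std::vector<std::string> words, std::size_t prefix_len)
{
    const std::string check = words[checksum_index(words, prefix_len)];
    words.push_back(check);
    return words;
}

// Drops the last word, recomputes the checksum word, and compares prefixes.
bool verify(std::vector<std::string> seed, std::size_t prefix_len)
{
    if (seed.size() < 2)
        return false;
    const std::string last = seed.back();
    seed.pop_back();
    const std::string expected = seed[checksum_index(seed, prefix_len)];
    return prefix(expected, prefix_len) == prefix(last, prefix_len);
}
```

Because the checksum word is always copied from the list itself, a seed with a checksum contains at least one word twice. It also explains why 25 times "aces" passes the check: whichever index the CRC selects, the word found there is "aces".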
Clement J.
  • Could you go on a bit more on how the checksum works, in human language rather than code? :) I believe that that was the point of the question, and I am curious to understand it too. – user141 Oct 01 '16 at 15:50