This is probably a very naive question, based on a lack of understanding, but I was watching a video by Andreas M. Antonopoulos about blockchain, Bitcoin and consensus algorithms in which he asked for the SHA-256 output of the string Hello!. Someone told him the first few characters and, being a Windows/C# developer, I decided just for fun to check the answer by implementing this very simple C# method:

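    using System.Security.Cryptography;
    using System.Text;

    // Returns the SHA-256 digest of the ASCII encoding of valueToHash.
    public byte[] Generate(string valueToHash)
    {
        byte[] hashValue;
        // SHA-256 hashes bytes, not characters, so encode the string first.
        byte[] stringToBytes = Encoding.ASCII.GetBytes(valueToHash);

        using (SHA256 hashGenerator = SHA256.Create())
        {
            hashValue = hashGenerator.ComputeHash(stringToBytes);
        }

        return hashValue;
    }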

and then output the result using this method:

    public static void PrintByteArray(byte[] array)
    {
        for (int i = 0; i < array.Length; i++)
        {
            Console.Write($"{array[i]:X2}");
            if ((i % 4) == 3) Console.Write(" ");
        }
        Console.WriteLine();
    }

But I got a different result.

I remember that in the past there were sites that stored huge databases of hashes together with the values that produced them; you could paste in a hash and, if the input was something common, get the original string back. This led me to think that, no matter what language or OS you are using, a value hashed with SHA-256, for example, would produce the same result. However, it seems that this is not entirely the case. The person in the video who provided the answer turned out (according to the comments) to be using a Mac, so what exactly is causing SHA-256 to produce different outputs for the same input? Is it the OS? Is it the programming language? Or have I made a mistake in my simple code?

Maarten Bodewes
Leron
  • SHA-256 is a deterministic algorithm that always returns the same result for the same input. Chances are in this case that the inputs were somehow different (e.g. including a trailing \0 byte or a trailing newline sequence). – SEJPM Jul 31 '19 at 11:17
  • Mainly see the answer here on that question rather than the mathematical one. – Maarten Bodewes Jul 31 '19 at 11:18
  • Please only use code blocks for code and input / output. Code blocks are not a highlighting tool. You can use italics and (sparingly) bold or even *bold italics* instead. – Maarten Bodewes Jul 31 '19 at 11:21
  • Of course the SHA-256 result (aka digest) is the same as long as the inputs are the same. The programming language has no bearing on the hash result; it is just a means of implementing the SHA-256 algorithm... – TJCLK Aug 01 '19 at 07:32
  • @LiDong Thanks, I understand that now. However, I am still wondering. I checked this question -> https://apple.stackexchange.com/questions/310244/how-do-i-find-the-sha256-hash-of-text-on-a-mac and the SHA for "simple text" is different from what I get when I use it as the input to the method in my question. However, other people using Windows/Visual Studio are getting what I do. So what is the difference? – Leron Aug 01 '19 at 07:59
  • OK, I finally figured it out. In the terminal vs. IDE case, the terminal inserts a newline \n at the end of the string, as explained in that question. When I add this to the end of my string, the hashes are the same. Thanks to all. – Leron Aug 01 '19 at 08:14

1 Answer

Yes, SHA-256 is defined to give the same output for the same input byte sequence on every platform. However, by design, even the smallest change in the input will change the output into something completely different. So the most likely reason for the discrepancy you observed is that the input you used is not actually the same as the original input from the video.
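
As a rough illustration of that sensitivity, here is a minimal sketch that hashes Hello! and Hello. and counts how many of the 256 output bits differ. The class and method names (AvalancheDemo, Sha256Of, DifferingBits) are made up for this example, and UTF-8 is assumed as the encoding (for a pure ASCII string it yields the same bytes as Encoding.ASCII). For a well-behaved hash one would expect roughly half of the bits to change, though the exact count depends on the inputs.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class AvalancheDemo
{
    // Hash the UTF-8 bytes of a string with SHA-256 (encoding is an assumption).
    public static byte[] Sha256Of(string text)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return sha.ComputeHash(Encoding.UTF8.GetBytes(text));
        }
    }

    // Count how many bits differ between two digests of equal length.
    public static int DifferingBits(byte[] a, byte[] b)
    {
        int count = 0;
        for (int i = 0; i < a.Length; i++)
        {
            int x = a[i] ^ b[i];          // bits set where the two bytes differ
            while (x != 0)
            {
                count += x & 1;
                x >>= 1;
            }
        }
        return count;
    }

    public static void Main()
    {
        byte[] h1 = Sha256Of("Hello!");
        byte[] h2 = Sha256Of("Hello.");

        Console.WriteLine(BitConverter.ToString(h1).Replace("-", ""));
        Console.WriteLine(BitConverter.ToString(h2).Replace("-", ""));
        Console.WriteLine($"Differing bits: {DifferingBits(h1, h2)} of {h1.Length * 8}");
    }
}
```

Running this on any platform should print the same two digests, since only the input bytes matter.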

Some possible causes for this discrepancy might include:

  • An extra null byte at the end of one of the inputs.
  • An extra newline or space at the end of one of the inputs.
  • A difference in the encoding of newlines (if the string contains one).
  • A difference in the capitalization of one of the letters (e.g. hello vs. Hello).
  • A difference in punctuation (e.g. Hello! vs. Hello. or "Hello!").
  • A difference in the character encoding used to encode the string as a sequence of bytes (unlikely for a plain ASCII string like Hello!, since most commonly used character encodings today are extensions of ASCII, but possible e.g. if one string was encoded as UTF-16).
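
One way to check which of these causes applies is to print the exact bytes being hashed next to the digest. The following is only a sketch of that idea: the names InputVariants, Hex, Sha256Hex and Show are invented for this example, and the variants listed are simply the ones from the list above.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class InputVariants
{
    // Hex-encode a byte array without separators.
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "");
    }

    // Encode the text with the given encoding, then hash the resulting bytes.
    public static string Sha256Hex(string text, Encoding encoding)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return Hex(sha.ComputeHash(encoding.GetBytes(text)));
        }
    }

    // Print the input bytes and the digest so the two can be compared by eye.
    static void Show(string label, string text, Encoding encoding)
    {
        Console.WriteLine($"{label}: bytes={Hex(encoding.GetBytes(text))} sha256={Sha256Hex(text, encoding)}");
    }

    public static void Main()
    {
        Show("plain", "Hello!", Encoding.ASCII);
        Show("trailing LF", "Hello!\n", Encoding.ASCII);
        Show("trailing CRLF", "Hello!\r\n", Encoding.ASCII);
        Show("trailing NUL", "Hello!\0", Encoding.ASCII);
        Show("lowercase h", "hello!", Encoding.ASCII);
        Show("quoted", "\"Hello!\"", Encoding.ASCII);
        Show("UTF-16LE", "Hello!", Encoding.Unicode);   // Encoding.Unicode is UTF-16 little-endian
    }
}
```

Each line should show a different byte sequence and therefore a different digest. If the value from another tool matches the "trailing LF" line, the other tool most likely hashed a newline along with the text, which is what a shell's echo does by default.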
Ilmari Karonen
  • Also BOM and/or BIDI at the beginning of the input (more common for file input); and different encoding/representation of the output as hex, base64 (uucp, MIME, or URL), base32, base58, base96, ... – dave_thompson_085 Aug 01 '19 at 04:14