> Are the results of these two scenarios the same?
For $\mathrm{SHAKE256}$, both cases result in exactly the same final internal state.
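This is easy to check in Python's `hashlib`, which exposes $\mathrm{SHAKE256}$ directly (a minimal sketch):

```python
import hashlib

# One-shot absorption of the full message.
one_shot = hashlib.shake_256(b"ABC")

# Incremental absorption of the same byte stream.
incremental = hashlib.shake_256()
incremental.update(b"A")
incremental.update(b"B")
incremental.update(b"C")

# Both objects have absorbed identical bytes, so their internal
# states are identical & every squeeze length yields identical output.
assert one_shot.digest(64) == incremental.digest(64)
```

The sponge construction only sees the concatenated byte stream; it has no notion of how many `update` calls delivered it.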
> Are there any quirks I should know about?
Hash & XOF functions should map distinct inputs to distinct internal states & output digests, reflecting each input's canonical meaning. However, if the encoding of the inputs is ambiguous, two conceptually different inputs can produce the identical byte stream, & therefore the identical digest.
If $A$, $B$ and $C$ are truly different conceptual things, then the typical way to avoid canonicalization issues is to do:
$$ \mathrm{L} = \mathrm{len}((A, B, C)) $$
$$ \mathrm{\bf{xof}} = \mathrm{SHAKE256}(L \space || \space \mathrm{len}(A) \space || \space A \space || \space \mathrm{len}(B) \space || \space B \space || \space \mathrm{len}(C) \space || \space C) $$
$$ \mathrm{result} = \mathrm{\bf{xof}}{\mathrm{.digest}}(x) $$
which is exactly the same as:
$$ \mathrm{\bf{xof}} = \mathrm{SHAKE256}(\mathrm{len}((A, B, C))) $$
$$ \mathrm{\bf{xof.}}\mathrm{update}(\mathrm{len}(A) \space || \space A) $$
$$ \mathrm{\bf{xof.}}\mathrm{update}(\mathrm{len}(B) \space || \space B) $$
$$ \mathrm{\bf{xof.}}\mathrm{update}(\mathrm{len}(C) \space || \space C) $$
$$ \mathrm{result} = \mathrm{\bf{xof}}{\mathrm{.digest}}(x) $$
Prepending each item's length metadata & the number of items in the whole collection removes the ambiguity. A canonical encoding is one which can be uniquely, completely & unambiguously decoded back into its original meaning. This is similar to what's done in the TupleHash construction from NIST SP 800-185.
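A minimal Python sketch of such an encoder. The fixed 8-byte big-endian length fields are an arbitrary choice for illustration; TupleHash itself uses a different, variable-length encoding:

```python
import hashlib

def canonicalize(*items: bytes) -> bytes:
    # Prepend the item count, then length-prefix each item
    # (8-byte big-endian lengths), so the resulting byte stream
    # decodes back to exactly one tuple of items.
    out = len(items).to_bytes(8, "big")
    for item in items:
        out += len(item).to_bytes(8, "big") + item
    return out

# Without canonicalization, different splits produce the same bytes:
assert b"AB" + b"C" == b"A" + b"BC"

# With it, each split absorbs a distinct byte stream:
d1 = hashlib.shake_256(canonicalize(b"AB", b"C")).digest(32)
d2 = hashlib.shake_256(canonicalize(b"A", b"BC")).digest(32)
assert d1 != d2
```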
Another related quirk, which arises when using hash & XOF objects as entropy pools, is temporal & caller context confusion. For instance, say there's an XOF object like this, which is shared across multiple threads:
$$\mathrm{\bf{xof}} = \mathrm{SHAKE256}(\textrm{sufficient_initial_entropy})$$
And there are three threads using that object, $\mathrm{thread}_A, \mathrm{thread}_B, \mathrm{thread}_C$. Each thread seeds $\mathrm{\bf{xof}}$ prior to retrieving a digest to use as entropic material. They each expect the following to happen:
$$ \mathrm{\bf{xof.}}\mathrm{update}(\mathrm{seed}_A) \tag{$\mathrm{thread}_A$ sees} $$
$$ \mathrm{entropy}_A = \mathrm{\bf{xof}}{\mathrm{.digest}}(x) $$
$$ \mathrm{\bf{xof.}}\mathrm{update}(\mathrm{seed}_B) \tag{$\mathrm{thread}_B$ sees} $$
$$ \mathrm{entropy}_B = \mathrm{\bf{xof}}{\mathrm{.digest}}(x) $$
$$ \mathrm{\bf{xof.}}\mathrm{update}(\mathrm{seed}_C) \tag{$\mathrm{thread}_C$ sees} $$
$$ \mathrm{entropy}_C = \mathrm{\bf{xof}}{\mathrm{.digest}}(x) $$
But, the real execution order looks like this:
$$ \mathrm{\bf{xof.}}\mathrm{update}(\mathrm{seed}_B) \tag{$\mathrm{thread}_B$ sees} $$
$$ \mathrm{\bf{xof.}}\mathrm{update}(\mathrm{seed}_A) \tag{$\mathrm{thread}_A$ sees} $$
$$ \mathrm{\bf{xof.}}\mathrm{update}(\mathrm{seed}_C) \tag{$\mathrm{thread}_C$ sees} $$
$$ \mathrm{entropy}_A = \mathrm{\bf{xof}}{\mathrm{.digest}}(x) \tag{$\mathrm{thread}_A$ sees} $$
$$ \mathrm{entropy}_C = \mathrm{\bf{xof}}{\mathrm{.digest}}(x) \tag{$\mathrm{thread}_C$ sees} $$
$$ \mathrm{entropy}_B = \mathrm{\bf{xof}}{\mathrm{.digest}}(x) \tag{$\mathrm{thread}_B$ sees} $$
Each thread intended to perform incremental absorption & squeeze out its own distinct intermediate digest. But it turns out, they wound up doing all-at-once absorption & each squeezed out the identical final digest. Yikes.
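The failure mode is easy to reproduce sequentially with `hashlib` (a sketch; real code would interleave these calls via actual threads):

```python
import hashlib

xof = hashlib.shake_256(b"sufficient_initial_entropy")

# All three seeds happen to be absorbed before any thread squeezes:
xof.update(b"seed_A")
xof.update(b"seed_B")
xof.update(b"seed_C")

# hashlib's digest() does not advance the sponge state, so every
# thread squeezes the same bytes from the same final state.
entropy_A = xof.digest(32)
entropy_C = xof.digest(32)
entropy_B = xof.digest(32)
assert entropy_A == entropy_B == entropy_C
```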
There are various mitigations for this, such as:
- Making a copy of $\mathrm{\bf{xof}}$ prior to seeding & retrieval within each calling thread.
- Including with each seed temporal & caller metadata as domain separators.
- Using different domain constants when updating the global $\mathrm{\bf{xof}}$ from when updating its copies.
Here's an example:
$$ \mathrm{\bf{xof}}_A = \mathrm{\bf{xof.}}\mathrm{copy}() \tag{$\mathrm{thread}_A$ sees} $$
$$ \mathrm{seed}_{A^*} = \mathrm{\bf{canonicalize}}( \textrm{thread_id}_A, \space \textrm{monotonic_counter}(), \space \textrm{time_now_ns}(), \space \mathrm{seed}_A ) $$
$$ \mathrm{\bf{xof.}}\mathrm{update}(\textrm{\bf{canonical_pad_to_blocksize}}(\text{$``$xof-global-update''}, \space \mathrm{seed}_{A^*})) $$
$$ \mathrm{\bf{xof}}_A.\mathrm{update}(\textrm{\bf{canonical_pad_to_blocksize}}(\text{$``$xof-copy-update''}, \space \mathrm{seed}_{A^*})) $$
$$ \mathrm{entropy}_A = \mathrm{\bf{xof}}_A{\mathrm{.digest}}(x) $$
Since $\mathrm{\bf{xof}}_A$ is $\mathrm{thread}_A$'s independent copy of the global entropy pool, as long as its seed is unique, it is overwhelmingly likely that its internal state will differ from every other thread's. Here, the four values $( \textrm{thread_id}_A,$ $\space \textrm{monotonic_counter}(),$ $\space \textrm{time_now_ns}(),$ $\space \mathrm{seed}_A )$, & the domain (customization) strings, provide ample unique temporal, caller & function context to mitigate the issue described above. However, the specifics of what's needed to uniquely identify a calling context can vary across applications & environments.
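The steps above can be sketched in Python. The helper names (`canonicalize`, `canonical_pad_to_blocksize`, `fresh_entropy`), the 8-byte length fields, & the choice of 136 bytes (the SHAKE256 rate) as the padding block size are all illustrative assumptions; real multithreaded code would also need a lock around the shared pool's `update`:

```python
import hashlib
import itertools
import threading
import time

_counter = itertools.count()  # process-wide monotonic counter (hypothetical)

def canonicalize(*items: bytes) -> bytes:
    # Prepend the item count, then length-prefix each item
    # (8-byte big-endian lengths), as described earlier.
    out = len(items).to_bytes(8, "big")
    for item in items:
        out += len(item).to_bytes(8, "big") + item
    return out

def canonical_pad_to_blocksize(domain: bytes, payload: bytes, block: int = 136) -> bytes:
    # Hypothetical helper: canonicalize the domain string & payload, then
    # zero-pad to a multiple of the SHAKE256 rate (136 bytes) so each
    # update fills whole sponge blocks.
    enc = canonicalize(domain, payload)
    return enc + b"\x00" * (-len(enc) % block)

def fresh_entropy(global_xof, seed: bytes, n: int = 32) -> bytes:
    # Bind the seed to unique temporal & caller context.
    seed_star = canonicalize(
        threading.get_ident().to_bytes(8, "big"),
        next(_counter).to_bytes(8, "big"),
        time.monotonic_ns().to_bytes(8, "big"),
        seed,
    )
    # The thread's independent copy of the global entropy pool.
    xof_local = global_xof.copy()
    # Distinct domain strings for the global pool & the local copy.
    global_xof.update(canonical_pad_to_blocksize(b"xof-global-update", seed_star))
    xof_local.update(canonical_pad_to_blocksize(b"xof-copy-update", seed_star))
    return xof_local.digest(n)
```

Because the counter & timestamp differ on every call, even two calls with the same `seed` from the same thread squeeze from different states.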