The OP is correct:
- One definition of "encoding" is perfectly synonymous with the semantic definition of "embedding," and the two words are used interchangeably, as in the work the OP cites.
- Yet "encode" and "embed" can also be used in a contrasting non-semantic vs. semantic sense, as the OP also suggests.
Synonymous by definition
According to Merriam-Webster, one definition of "encode" is "to convey symbolically", with the illustrative quotation "the capacity of poetry to encode ideology" (J. D. Niles). The symbols of poetry are deeply linked to subtle variations in the meanings of words -- thus poetic "encoding" does indeed "embed" one representation into another, using representations that are strongly tied to the meanings of terms.
Use of "encode" and "embed" in the first Transformer paper
This section is written in the context of LLMs (large language models) like ChatGPT, and specifically of the first Transformer paper, Attention is All You Need.
In this paper, the term "encoding" is indeed used for non-semantic encodings -- that is, for representations that do NOT attempt to push words with similar meanings closer together in the representational (encoding) space. The word "encoding" is used in this sense twice in the paper. The first is for positional encodings, which uniquely encode the position of a word within the input text with something like a soft embedding. The second is for "byte pair encodings" (BPE), which was the term used for word-piece encodings at the time the paper was written. Word-piece encodings map the input into a sequence of indices that correspond to known words or pieces of words. The original words can be recovered from the sequence of indices, and the indices do NOT move meanings closer together.
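As a concrete illustration, here is a minimal NumPy sketch of the sinusoidal positional encoding described in the paper. Note that the result depends only on the position index and the model dimension, never on the meaning of the token at that position -- which is exactly why it is a non-semantic "encoding."

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding from 'Attention is All You Need'.

    The value at (pos, i) depends only on the position pos and the
    dimension index i; the token occupying that position is never consulted.
    """
    positions = np.arange(seq_len)[:, np.newaxis]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]           # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)   # (seq_len, d_model/2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

# Each row encodes a position, not a word.
print(positional_encoding(seq_len=4, d_model=512).shape)   # (4, 512)
```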
The term "embedding" is also used in this paper in the sense the PO and others have discussed. Here, the BPE (word-piece) tokens are embedded into a 512-dimensional space, where, during learning, the embeddings will drive similar words towards similar embeddings, thus producing a semantic embedding.
But, in the same paper, the term "encoder" is consistently used for a large network whose sole purpose is to produce rich semantic embeddings: it embeds each word of the sequence into a 512-dimensional space where the position of the vector encodes not only the meaning of that word, but also the meanings of the other words to which it is connected grammatically within the sentence.
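A rough sketch of that shape-preserving, meaning-enriching step, using PyTorch's stock encoder modules rather than the paper's exact implementation:

```python
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 512, 8, 6
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

# One sentence of 10 tokens, already embedded (and position-encoded) into 512-d.
x = torch.randn(1, 10, d_model)
contextual = encoder(x)          # still (1, 10, 512)

# Each output vector is still an embedding of one token, but its position in
# the 512-d space now also reflects the other tokens it attended to.
print(contextual.shape)
```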
What are we to make of this? The words "encoding" and "embedding" do seem to clearly differ in being non-semantic or semantic. Yet the word "encoder" is clearly semantic.
Are the authors of Attention is All You Need being inconsistent in their use of language? I don't think so. I believe that the "encoder-decoder" architecture, already popular long before Transformers came on the scene, breaks from the encoding/embedding distinction seen elsewhere and indeed uses the term "encoding" synonymously with the term "embedding."
The Encoder-Decoder Architecture
Why are these called encoder-decoder architectures? Here, "encode" is used synonymously with "semantic embedding," and is used instead of "embedding" because entire sentences are considered rather than single words. The d2l.ai text clearly defines the encoder-decoder architecture as "consisting of two major components: an encoder that takes a variable-length sequence as input, and a decoder that acts as a conditional language model, taking in the encoded input and the leftwards context of the target sequence and predicting the subsequent token in the target sequence."
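Following that d2l.ai description, here is a minimal interface sketch (the class and argument names are my own, not the book's): the encoder maps a variable-length source sequence to a semantic representation of the whole input, and the decoder conditions on that representation plus the leftwards target context.

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a variable-length input sequence to an encoded representation."""
    def forward(self, src):
        raise NotImplementedError

class Decoder(nn.Module):
    """Conditional language model: consumes the encoded input plus the
    leftwards target context and predicts the next target token."""
    def forward(self, encoded, tgt_prefix):
        raise NotImplementedError

class EncoderDecoder(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder

    def forward(self, src, tgt_prefix):
        # "Encoding" here plays the role of a semantic embedding of the
        # entire input sequence, not of a single word.
        encoded = self.encoder(src)
        return self.decoder(encoded, tgt_prefix)
```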
Embeddings as manifolds within a larger space
As illustrated clearly in a figure from an answer to another question already linked to this one, the term "embedding" can refer to the process of putting one space within another according to a nonlinear mapping.
When we use this term to talk about embedding the meanings of words within a higher-dimensional space, we should NOT assume that the embeddings fall on a lower-dimensional manifold (surface) within that space. They may very well fill the entire dimensionality of that space. But the word still carries this picture of embedding points into a space, just as one embeds nails into wood to make nail art.
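As a toy illustration of that geometric sense of "embedding" (nothing to do with language models), the NumPy snippet below nonlinearly maps a 1-dimensional space into 3 dimensions; the image is a curve, i.e. a lower-dimensional manifold sitting inside the larger space, which is the picture word embeddings may or may not follow.

```python
import numpy as np

# A nonlinear embedding of the 1-D interval [0, 2*pi) into 3-D space:
# the image is a closed curve (a 1-D manifold) that does not fill the space.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
embedded = np.stack([np.cos(t), np.sin(t), np.sin(2.0 * t)], axis=1)

print(embedded.shape)   # (200, 3): 200 points, each living in 3 dimensions
```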
The word "encoding" describes exactly the same process as "embedding," but with an emphasis on preserving the meaning of the original text. Then again, an embedding can also preserve the meaning of the original text. For example, the embeddings of word pieces must be reversible, or some tokens in the vocabulary could never be copied by the network to the output.
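One hedged way to see that reversibility in code (the sizes and the token index are placeholders): as long as the rows of the embedding table are distinct, a nearest-neighbor search over the table recovers the original token index from its embedding.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 512              # placeholder sizes
embed = nn.Embedding(vocab_size, d_model)

token_id = torch.tensor([42])                # placeholder token index
vec = embed(token_id)                        # (1, 512)

# Recover the token by finding the closest row of the embedding table.
distances = torch.cdist(vec, embed.weight)   # (1, vocab_size)
recovered = distances.argmin(dim=1)
print(recovered.item() == 42)                # True: row 42 is at distance zero
```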
Conclusion
It is clear from this brief review of a tiny fraction of the literature that the terms "encode" and "embed" can be used synonymously (in the sense of transforming one representation into another while taking into account the meaning of what is being transformed), but they can also be used in contrast to one another (with the word "encode" implying that the meanings of terms are NOT taken into account, while the word "embed" does take those meanings into account).
- The encoder-decoder architecture uses "encode" in a sense synonymous with "embed."
- Positional encodings and byte-pair encodings (word-piece encodings) use "encode" in a sense that contrasts with the semantic meaning of "embedding."