Reading Stephen Wolfram's explanation of ChatGPT, it sounds as if you first train a very powerful "autocomplete" function that doesn't know anything specifically about chatbots, and then you train it further on a corpus of chatbot dialogs to show it how to be a chatbot specifically. I'm curious if anyone can explain in a bit more detail how this process works technically, i.e., taking an already-trained model and "specializing" it with a second training corpus. How are the two neural networks related to one another?
- It's just more training. There's no "end of training process" that stops you from doing any more training after that; you can just keep doing even more training. There's probably some kind of secret sauce in ChatGPT, but this isn't it. – user253751 May 09 '23 at 12:56
- ChatGPT is a decoder-only transformer model. For a full (hopefully not too technical) explanation, please read this extensive post. It also goes into how the transformer is trained (in two stages). – Robin van Hoorn May 09 '23 at 20:09
1 Answer
ChatGPT also derives from InstructGPT, which undergoes reinforcement learning from human feedback (RLHF) in order to enable instruction following.
Basically, to get ChatGPT you take a GPT model, pre-train it to predict the next token in a sequence (so that it learns the structure of the language), and then apply RLHF so that it follows the instructions specified in a prompt. Otherwise it would just give you random (but plausible) completions of sentences.
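To answer the "how are the two networks related" part of the question directly: there is only one network. Fine-tuning simply resumes gradient descent on the same weights with a different corpus. Here is a minimal toy sketch of that idea (a softmax-regression "language model" with a handful of made-up dimensions, not a real transformer) where the fine-tuning stage reuses the pre-training loop unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a language model: one weight matrix mapping a context
# vector to next-token logits. (Hypothetical sizes, for illustration only.)
n_vocab, n_ctx = 8, 4
W = rng.normal(scale=0.1, size=(n_ctx, n_vocab))

def train(W, X, y, steps=200, lr=0.5):
    """Cross-entropy training loop; fine-tuning calls this same function."""
    for _ in range(steps):
        logits = X @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        grad = p.copy()
        grad[np.arange(len(y)), y] -= 1.0      # d(cross-entropy)/d(logits)
        W -= lr * (X.T @ grad) / len(y)
    return W

# Stage 1: "pre-training" on a generic corpus (random toy data here).
X_pre = rng.normal(size=(64, n_ctx))
y_pre = rng.integers(0, n_vocab, size=64)
W = train(W, X_pre, y_pre)

# Stage 2: "fine-tuning" is literally more training on the same weights,
# just with a second, task-specific corpus (e.g. chatbot dialogs).
X_chat = rng.normal(size=(16, n_ctx))
y_chat = rng.integers(0, n_vocab, size=16)
W_before = W.copy()
W = train(W, X_chat, y_chat, steps=100)
```

So "the two neural networks" are the same object at two points in time: pre-training sets the initial values of the weights, and specialization nudges those same weights further.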
The power of ChatGPT is the combination of a very large model (billions of parameters), a huge dataset, and a large collection of human-annotated prompts and responses for the instruction-following part.
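To give a flavor of how those human annotations are used: in the RLHF pipeline, labelers rank pairs of model outputs, and a reward model is trained so that the preferred ("chosen") response scores higher than the "rejected" one via a pairwise loss, -log σ(r_chosen - r_rejected). Here is a hedged toy sketch with a linear reward model over made-up response features (synthetic data, not the actual OpenAI setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy reward model: a linear score over response features.
n_feat = 6
w = np.zeros(n_feat)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic preference pairs: one chosen and one rejected response per prompt.
chosen = rng.normal(loc=0.5, size=(32, n_feat))
rejected = rng.normal(loc=-0.5, size=(32, n_feat))

for _ in range(300):
    margin = (chosen - rejected) @ w          # r(chosen) - r(rejected)
    p = sigmoid(margin)                       # model's P(chosen preferred)
    # Gradient of the pairwise loss -log sigmoid(margin) w.r.t. w
    grad = -((1.0 - p)[:, None] * (chosen - rejected)).mean(axis=0)
    w -= 0.5 * grad
```

The trained reward model then scores the policy's outputs during the reinforcement-learning stage, standing in for a human rater.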

– Luca Anzalone