
I'm asking specifically about ChatGPT running GPT-4, but the question could apply to either that or GPT-3.5.

When you use the ChatGPT API, it's of course up to you to manage the conversation history yourself and to include it in successive API calls, within the available context length, in whatever manner you choose.
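
For concreteness, here's the kind of bookkeeping I mean on the API side. This is a minimal sketch, assuming the v1.x `openai` Python client; the model name and the naive append-everything strategy are just illustrative, not anything OpenAI prescribes:

```python
# Minimal sketch of manual history management against the stateless
# Chat Completions API, assuming the v1.x `openai` Python client.
# The model name and append-everything strategy are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_input: str) -> str:
    # The API is stateless, so the full history is resent on every call.
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply
```

Because every call resends the whole messages list, some history-management strategy becomes necessary as the list grows. That's the part I'm asking about.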

In the case of the web interface, they've obviously implemented some system for fitting the conversation history into the model's context. It clearly doesn't "remember" the entire thing once the conversation gets very long, because the model doesn't have infinite context length. So, what strategy does it use to send conversation history to the model once the conversation has exceeded the context length? Does it truncate all content prior to the max context length? Does it summarize earlier parts of the conversation to fit them into the context more efficiently? Does it use some dynamic strategy that combines several approaches?
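
To make the first of those options concrete, here's a hedged sketch of a sliding-window truncation strategy: keep the system prompt and drop the oldest turns until the remainder fits a token budget. This is just one plausible approach, not something OpenAI has confirmed; the token counting is approximate and the budget number is made up:

```python
# Sketch of a "drop the oldest turns" sliding window: one plausible
# strategy, not confirmed to be what the ChatGPT web UI actually does.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

def count_tokens(messages) -> int:
    # Rough count: content tokens only, ignoring per-message overhead.
    return sum(len(enc.encode(m["content"])) for m in messages)

def truncate_history(messages, budget: int = 7000):
    # Always keep the system prompt (assumed to be messages[0]).
    system, rest = messages[:1], messages[1:]
    while rest and count_tokens(system + rest) > budget:
        rest.pop(0)  # discard the oldest user/assistant message
    return system + rest
```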

Or is this just another case where we just don't know, and OpenAI is being tight-lipped about what it's actually doing?

Ascendant
  • well, either it uses some "last N tokens" window, or it might just ask itself to summarize the history, and the next time summarize the summary plus the new question, and so on (a sketch of that idea follows these comments) – Alberto Sep 01 '23 at 15:08
  • @Alberto is there a source for this? Transformers can be fine-tuned to handle longer contexts (e.g., see https://lmsys.org/blog/2023-06-29-longchat/), so compressing the input tokens doesn't seem absolutely necessary to me. – Alexander Wan Sep 03 '23 at 07:09
  • @AlexanderWan it's not a matter of architecture, it's a matter of efficiency (price-wise)... no, "it might just ask itself" was just a personal thought, but you can see something like this implemented in RMT – Alberto Sep 04 '23 at 11:00
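
For reference, here is the rolling-summarization idea from the first comment as a minimal sketch. It is purely speculative, not documented OpenAI behavior; the budget, model name, and prompt wording are all made up:

```python
# Sketch of the rolling-summarization idea from the comment above:
# once the history exceeds a budget, ask the model to compress it and
# carry the summary forward in place of the raw turns. Speculative,
# not a documented OpenAI behavior; budget and prompt are made up.
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-4")

def count_tokens(messages) -> int:
    return sum(len(enc.encode(m["content"])) for m in messages)

def compress_history(messages, budget: int = 6000):
    if count_tokens(messages) <= budget:
        return messages  # still fits; nothing to do
    system, rest = messages[:1], messages[1:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in rest)
    summary = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Summarize this conversation, keeping every fact "
                       "needed to continue it:\n\n" + transcript,
        }],
    ).choices[0].message.content
    # Next time this overflows, the summary itself gets re-summarized
    # together with the new turns, as the comment suggests.
    return system + [{"role": "assistant",
                      "content": "Summary of earlier conversation: " + summary}]
```

Truncation is cheap but loses the head of the conversation outright, while summarization preserves the gist at the cost of an extra model call, which is presumably why a production system might blend both.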

0 Answers