Understanding memory mechanisms
LangChain chains, and any code you build around them, are stateless by default. When you deploy a LangChain application to production, it should stay stateless so that it can scale horizontally (more on this in Chapter 9). In this section, we'll discuss how to organize memory so that your generative AI application can keep track of its interactions with a specific user.
Trimming chat history
Every chat application needs to preserve a dialogue history. In a prototype, you can keep it in a variable, but this won't work in production: the history disappears on restart and isn't shared across replicas. We'll address production-grade storage in the next section.
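As a quick illustration, here is a minimal prototype-only sketch that keeps the history in process memory using InMemoryChatMessageHistory from recent versions of langchain-core (the user name and the assistant reply are made up for illustration):

```python
from langchain_core.chat_history import InMemoryChatMessageHistory

# Prototype only: the history lives in process memory, so it is lost
# on restart and is not shared across horizontally scaled replicas.
history = InMemoryChatMessageHistory()

history.add_user_message("Hi! My name is Anna.")
history.add_ai_message("Hi Anna! How can I help you today?")  # placeholder reply

# history.messages is a plain list of HumanMessage/AIMessage objects
for message in history.messages:
    print(f"{message.type}: {message.content}")
```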
The chat history is essentially a list of messages, but there are situations where trimming this list becomes necessary. Trimming was a critical design pattern when LLMs had small context windows; it matters less today, since most models (even small open-source ones) now support context windows of 8,192 tokens or more.
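When you do need to trim, recent versions of langchain-core ship a trim_messages helper. The sketch below uses token_counter=len to count messages rather than real tokens, so it stays self-contained; in practice you would pass a chat model or a tokenizer-backed callable instead. The conversation itself is made up for illustration:

```python
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    trim_messages,
)

history = [
    SystemMessage("You are a helpful assistant."),
    HumanMessage("Hi! My name is Anna."),
    AIMessage("Hi Anna! How can I help you today?"),
    HumanMessage("What is the capital of France?"),
    AIMessage("The capital of France is Paris."),
    HumanMessage("And what is its population?"),
]

# Keep the most recent messages that fit the budget. With
# token_counter=len, max_tokens is interpreted as a message count.
trimmed = trim_messages(
    history,
    strategy="last",      # drop the oldest messages first
    token_counter=len,    # here, 1 "token" == 1 message
    max_tokens=3,         # keep at most 3 messages in total...
    include_system=True,  # ...but always preserve the system message
    start_on="human",     # the first non-system message must be human
)

for message in trimmed:
    print(f"{message.type}: {message.content}")
```

Keeping the system message while dropping the oldest turns (strategy="last" with include_system=True) is a common choice, since the system prompt usually carries instructions the model must never lose.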