I tried building an LLM coding assistant using GPT-4. This is what I learned about the current limits of this tech.
The main challenge is handling projects that exceed the LLM’s context size, a problem familiar to other LLM applications like document Q&A. This is especially challenging when each query needs to use the most recent data, and the data is updated frequently.
Fine-tuning the model for every request isn’t feasible, and the context window can’t hold the whole project, so we need another way to give the model access to the data. Here are a few possibilities:
- Limit support to the current project scope - Copilot Chat follows this strategy. It uses your most recently opened files for its prompts. However, tasks unrelated to these files can’t be done.
- Use vector stores - This works for some tasks but falls short for complex ones. For example, a task like “write a readme file for this project” isn’t semantically linked to specific code, so the retrieved chunks probably won’t be enough.
- Grant the LLM access to the file system - This approach can theoretically handle larger projects but involves long LLM chains (requesting one file after another), which don’t work in practice yet. It’s like a new developer looking at your codebase for the first time - they might need to check many files for an accurate response.
- Mix vector stores and file system access with long-term memory management (like LLM reflection) - This resembles how an experienced developer might operate. They build a mental model of your codebase over time and usually need to review just a few files per task.
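To make the vector store limitation concrete, here is a minimal sketch. It is not a real vector store: the `embed` function is a toy stand-in (plain word counts) for a learned embedding model, and the file names and chunk contents are made up for illustration. A real embedding model would give nonzero scores for the broad query, but the scores would still say little about which chunks a readme actually needs.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query: str, chunks: dict, k: int = 2) -> list:
    """Return the k (score, name) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), name) for name, text in chunks.items()]
    return sorted(scored, reverse=True)[:k]

# Hypothetical project chunks, one per file.
CHUNKS = {
    "auth.py": "def login(user, password): check password hash and create session token",
    "db.py": "def connect(url): open database connection pool and return cursor",
    "cli.py": "def main(): parse command line arguments and run the selected command",
}

# A focused task shares vocabulary with the code it concerns.
focused = top_k("fix the password hash check in login", CHUNKS)

# A project-wide task shares nothing with any single chunk.
broad = top_k("write a readme file for this project", CHUNKS)

print(focused)
print(broad)
```

The focused query ranks `auth.py` first, while the readme query matches no chunk at all in this toy setup, so whatever gets retrieved is essentially arbitrary.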
You might think, “As context sizes grow, won’t this issue resolve itself?” While that might be true to an extent, there are trade-offs limiting the practical use of large context sizes: hardware cost, energy consumption, and inference speed.
Unless a significant tech breakthrough happens, we’ll still need to limit the context in many situations.
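In practice, limiting the context means choosing what goes into the prompt under a token budget. Below is a rough sketch of one way to do that, preferring the most recently opened files. The `SourceFile` type, the `pack_context` helper, and the 4-characters-per-token estimate are all my assumptions for illustration; a real tool would count tokens with the model's tokenizer and would likely rank files by more than recency.

```python
from dataclasses import dataclass

@dataclass
class SourceFile:
    path: str
    text: str
    last_opened: float  # timestamp; larger means more recent

def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)

def pack_context(files: list, budget_tokens: int) -> tuple:
    """Greedily pick the most recently opened files that fit the budget."""
    chosen, used = [], 0
    for f in sorted(files, key=lambda f: f.last_opened, reverse=True):
        cost = estimate_tokens(f.text)
        if used + cost > budget_tokens:
            continue  # too big for the remaining budget, try the next file
        chosen.append(f.path)
        used += cost
    return chosen, used

# Hypothetical files: b.py is recent but too large for the budget.
files = [
    SourceFile("a.py", "x" * 400, last_opened=3.0),
    SourceFile("b.py", "x" * 4000, last_opened=2.0),
    SourceFile("c.py", "x" * 800, last_opened=1.0),
]

chosen, used = pack_context(files, budget_tokens=500)
print(chosen, used)
```

Here `b.py` is skipped because it alone exceeds the budget, which is exactly the kind of gap that the retrieval and file system approaches above try to close.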
What is your experience with handling large changing data with LLMs? Let me know on Twitter.