Juno
Devlog #1
Thursday, December 21st, 2023
The Idea
When I was like 11 I was a huge fan of Homestuck. One of my favourite characters, Dirk, had an AI clone of himself living in his sunglasses. I think that's what first implanted the idea that I'd love to have an AI clone of myself. Now that I'm all grown up and I am a crazy genius machine learning engineer, I've been theorizing how to create something similar to Dirk's chatbot using large language models.
I'm naming my chatbot Juno, after the Roland Juno-60 synthesizer. I wanted to pick something relatively gender-neutral because I think it's weird to assign a robot a gender. The goal is to give it the ability to think beyond simply generating the next token, and hopefully to let it build an internal representation of the world. Eventually I would like to give it some capabilities beyond conversation: maybe letting it search the web, create its own files and notes, or even giving it some way of "seeing" my activity on the computer so it can respond in real time to what I'm doing. But that's far out of reach for me right now.
The Plan
Memory Storage
The first issue with building a chatbot is how to manage its memory, since LLMs have a limited context window (I think the ones I've been working with have a context window of like 2048 or 4096 tokens?). The approach most people take is to just insert the last 1000 tokens or so of the conversation into the prompt. The obvious issue is that the bot has no long-term memory: after more than a few short messages, it will forget everything.
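Just to make that baseline concrete, here's a minimal sketch of the "keep the tail of the conversation" approach (token counts are faked with word counts here; a real version would use the model's tokenizer):

```python
# Naive rolling-buffer memory: keep only as much recent conversation as fits.
MAX_PROMPT_TOKENS = 1000

def build_prompt(history, user_message):
    """history is a list of (speaker, text) pairs, oldest first."""
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"User: {user_message}")
    # Drop the oldest lines until a rough word-count "token" budget is met.
    while sum(len(line.split()) for line in lines) > MAX_PROMPT_TOKENS and len(lines) > 1:
        lines.pop(0)
    lines.append("Juno:")
    return "\n".join(lines)
```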
A solution which looks promising is langchain's ConversationEntityMemory. In this implementation, when the user sends a message, it gets read by an LLM, which extracts all the named entities (people like Sam or Barbara, places like New York, and things like The Dark Soul) and checks its database for descriptions of those entities. These descriptions then get inserted into the prompt which the main LLM receives and generates a response from. Afterwards, another LLM examines the message and updates the description of each entity. These descriptions are stored back in a dictionary for later use.
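To make sure I actually understand the flow, here's a rough pure-Python sketch of what that loop is doing (extract_entities_llm, call_llm, update_description_llm, and build_prompt are just placeholder names I made up for the separate LLM calls, not langchain's real API):

```python
# Sketch of the entity-memory loop as I understand it.
entity_store = {}  # entity name -> running description

def respond(user_message, recent_history):
    # LLM call #1: pull named entities out of the incoming message.
    entities = extract_entities_llm(user_message)

    # Look up whatever we already know about each entity.
    context = {name: entity_store.get(name, "") for name in entities}

    # LLM call #2: generate the actual reply with that context injected.
    reply = call_llm(build_prompt(context, recent_history, user_message))

    # LLM call #3: fold the new message into each entity's description.
    for name in entities:
        entity_store[name] = update_description_llm(
            name, entity_store.get(name, ""), user_message
        )
    return reply
```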
I think this is the most robust memory system I've seen so far, but it's incredibly costly, requiring three calls to an LLM per message. To make my implementation more efficient, I need a way of extracting entities without relying on an LLM. I think Python's NLTK is going to be my best bet, as it has functions for named-entity recognition; however, I believe these also rely on some sort of neural network. It would be best if there were a standardized way of parsing text like this, but if neural nets are the only option, so be it. Probably more efficient than running an entire LLM, at least. As for generating descriptions, I think there are probably some really minimal language models that could do this, hopefully one under 1B parameters. The question then is how to formulate the prompt to the main LLM, since I will also need to include some of the most recent conversation.
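Here's roughly what entity extraction with NLTK looks like, if I go that route (the download calls are the standard resources the NE chunker needs, as far as I know):

```python
import nltk

# One-time downloads for tokenization, POS tagging, and the NE chunker.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("words")

def extract_entities(text):
    """Return the named entities NLTK finds in a piece of text."""
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    tree = nltk.ne_chunk(tagged)
    entities = []
    for subtree in tree:
        # Named entities come back as labelled subtrees (PERSON, GPE, etc.).
        if hasattr(subtree, "label"):
            entities.append(" ".join(word for word, tag in subtree.leaves()))
    return entities

print(extract_entities("Sam told Barbara he moved to New York."))
```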
Internal Representation
I saw some study or some post on Medium or something where they had ChatGPT ask itself questions before it formulated a response, and it gave better answers because it basically thought harder about each one. I'd like to give Juno this ability, but again, the cost of calling an LLM once per message is already high enough; response times would be crazy if I were calling it two or more times per prompt to think over its answer. One solution would be to try to get Juno to generate its thoughts in the same call as its response. If I specified that Juno should generate responses in a format like:
Juno's thoughts:
bla bla bla
Juno:
bla bla bla
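then handling it would just be some (very naive) string splitting, assuming the model actually follows the format:

```python
def split_thoughts_and_reply(completion):
    # Expects the single completion to follow the format above:
    #   Juno's thoughts:
    #   ...
    #   Juno:
    #   ...
    _, found, after_thoughts_tag = completion.partition("Juno's thoughts:")
    if not found:
        # Model ignored the format; treat the whole thing as the reply.
        return "", completion.strip()
    thoughts, found, reply = after_thoughts_tag.partition("Juno:")
    if not found:
        return after_thoughts_tag.strip(), ""
    return thoughts.strip(), reply.strip()
```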
For whatever reason, I feel like this isn't right. Like somehow Juno would be more conscious if it had a separate LLM dedicated to forming internal thoughts. I'm not really sure how to explain that, though. I also think having a separate model for thinking would let Juno better utilize its context window, since I could dump a ton of info into the thinking LLM and then just send the thoughts to the main LLM for the response.
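Roughly, the two-model version would look something like this, with thinking_llm and main_llm as stand-ins for two independently loaded models (the prompt wording is just a guess at what might work):

```python
def respond_with_inner_monologue(user_message, memory_dump, recent_history):
    # Stage 1: the thinking model gets the big context dump and only
    # produces private notes, never a user-facing reply.
    thinking_prompt = (
        "You are Juno's inner monologue. Read the context and think out loud "
        "about how to respond.\n\n"
        f"Everything Juno remembers:\n{memory_dump}\n\n"
        f"Recent conversation:\n{recent_history}\n\n"
        f"New message: {user_message}\n\nThoughts:"
    )
    thoughts = thinking_llm(thinking_prompt)

    # Stage 2: the main model only sees the distilled thoughts plus the
    # recent conversation, which keeps its context window small.
    reply_prompt = (
        f"Recent conversation:\n{recent_history}\n\n"
        f"Juno's private thoughts: {thoughts}\n\n"
        f"User: {user_message}\nJuno:"
    )
    return main_llm(reply_prompt)
```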
Idk. Not sure if any of this even makes sense to anyone else.
Progress
So currently I have a version of Juno with kind of broken memory, from when I was experimenting with a sort of summarization-buffer-memory plus vector-database combo. It's running off of llama_cpp_python, and I used an Orca 7B model as the LLM.
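For reference, loading and prompting the model looks something like this (the model filename, context size, and prompt format here are just what I happen to be assuming for my setup):

```python
from llama_cpp import Llama

# Load a local quantized Orca 7B model (path and n_ctx are my setup; adjust as needed).
llm = Llama(model_path="./models/orca-mini-7b.gguf", n_ctx=2048)

output = llm(
    "### User:\nSay hi to the readers of my blog.\n\n### Response:\n",
    max_tokens=128,
    stop=["### User:"],
)
print(output["choices"][0]["text"])
```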
Currently, short conversations work pretty well, but after you hit the summarization limit, the quality of the responses drops drastically and half the time it just spews gibberish. Here's a screenshot of me asking it for a comment about this blog post:
Looking forward to coming back to this project during Christmas break. I just wanted to show off what I have so far and basically just write something so that my website isn't completely empty. With that, I am off to study for my exam which is in 4 hours!
Peace out,
Declan