Poking around OpenAI.


I haven’t spent much time playing around with the latest LLMs, and decided to spend some time doing so. I was particularly curious about the use case of using embeddings to supplement user prompts with additional, relevant data (e.g., supplying the current status of a user’s recent tickets in a prompt asking about progress on those tickets). This use case is interesting because it’s very attainable for existing companies and products to take advantage of, and I imagine it’s roughly how, say, Stripe’s GPT-4 integration with their documentation works.

To play around with that, I created a script that converts all of my writing into embeddings, embeds the user-supplied prompt to identify the most relevant sections of my content, injects those sections into an expanded prompt, and sends that expanded prompt to OpenAI’s API.
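To make that flow concrete, here’s a minimal sketch of the whole loop (not the actual repo code), assuming the pre-1.0 `openai` Python package; the model names, toy corpus, and prompt wording are my own illustrative choices:

```python
import os

import numpy as np
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

EMBEDDING_MODEL = "text-embedding-ada-002"
CHAT_MODEL = "gpt-3.5-turbo"


def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(input=[text], model=EMBEDDING_MODEL)
    return np.array(resp["data"][0]["embedding"])


# 1. Embed each document section once, up front.
sections = ["First blog post text...", "Second blog post text..."]
section_embeddings = [embed(s) for s in sections]

# 2. Embed the question and rank sections by cosine similarity.
question = "What is the current status of my recent tickets?"
q = embed(question)
scores = [
    float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e)))
    for e in section_embeddings
]
best = sections[int(np.argmax(scores))]

# 3. Inject the best match into an expanded prompt and ask the model.
prompt = f"Answer using this context:\n\n{best}\n\nQuestion: {question}"
resp = openai.ChatCompletion.create(
    model=CHAT_MODEL,
    messages=[{"role": "user", "content": prompt}],
)
print(resp["choices"][0]["message"]["content"])
```

The important property is that the corpus is embedded once up front, while each incoming query only costs one additional embedding call.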

You can see the code on Github, and read my notes on this project below.

References

This exploration is inspired by the recent work by Eugene Yan and Simon Willison. I owe particular thanks to Eugene Yan for his suggestions to improve the quality of the responses.

The code I’m sharing below is cobbled together from a number of sources.

I found that none of the examples quite worked as documented, but I was ultimately able to get them working with some poking around, relearning Pandas, and so on.

Project

My project was to make the OpenAI API answer questions with awareness of all of my personal writing from this blog, StaffEng, and Infrastructure Engineering. Specifically, this means creating embeddings from Hugo blog posts written in Markdown and using them with OpenAI’s API.
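As a rough sketch of that step (not the repo’s actual code), the snippet below walks hypothetical Hugo content directories, strips `---`-delimited frontmatter, and stores one embedding per post in a CSV via Pandas; the directory names and output file are assumptions, and a real version would also chunk long posts to stay under the embedding model’s token limit:

```python
import os
from pathlib import Path

import openai
import pandas as pd

openai.api_key = os.environ["OPENAI_API_KEY"]

EMBEDDING_MODEL = "text-embedding-ada-002"
CONTENT_DIRS = ["content/posts", "content/staffeng"]  # hypothetical paths


def read_post(path: Path) -> str:
    text = path.read_text()
    # Strip `---`-delimited Hugo frontmatter if present.
    if text.startswith("---"):
        parts = text.split("---", 2)
        if len(parts) == 3:
            text = parts[2]
    return text.strip()


rows = []
for content_dir in CONTENT_DIRS:
    for path in Path(content_dir).glob("**/*.md"):
        body = read_post(path)
        resp = openai.Embedding.create(input=[body], model=EMBEDDING_MODEL)
        rows.append(
            {
                "path": str(path),
                "text": body,
                "embedding": resp["data"][0]["embedding"],
            }
        )

# Persist so the embeddings only have to be paid for once.
pd.DataFrame(rows).to_csv("embeddings.csv", index=False)
```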

You can read the code on Github. I’ve done absolutely nothing to make it easy to read, but it is a complete example, and you could use it with your own writing by changing Line 112 to point at your blog’s content directories. (Oh, and by changing the prompts on Line 260.)

You can see a screenshot of what this looks like below.

Screenshot of terminal program running Github lethain/openai-experiment

This project is pretty neat, in the sense that it works. It did take a bit longer than expected, probably about three hours given some interruptions, mostly because the documentation’s examples were all subtly broken or didn’t actually connect together into working code. After it was working, I inevitably spent a few more hours fiddling around as well. My repo is terrible code, but it is a complete working example, in case anyone else has had similar issues getting question answering over embeddings to work!

The other comment on this project is that I don’t really view this as a particularly effective solution to the problem I wanted to solve: it performs a fairly basic nearest-neighbor search, comparing an embedding of the query against the embeddings of my blog posts, and then injects the best matches into the GPT query as context. Going into this, I expected, I dunno, something more sophisticated. It’s a very reasonable and cost-efficient solution, because it avoids any model (re)training, but it feels a bit more basic than I imagined.
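Concretely, that retrieval step looks something like the sketch below, which reuses the hypothetical `embeddings.csv` from the earlier sketch; the query text and top-3 cutoff are arbitrary illustrative choices:

```python
import ast
import os

import numpy as np
import openai
import pandas as pd

openai.api_key = os.environ["OPENAI_API_KEY"]

df = pd.read_csv("embeddings.csv")
# CSV round-trips the embedding lists as strings, so parse them back.
df["embedding"] = df["embedding"].apply(ast.literal_eval)


def cosine_similarity(a, b) -> float:
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


query = "What does Staff-plus career progression look like?"
q = openai.Embedding.create(
    input=[query], model="text-embedding-ada-002"
)["data"][0]["embedding"]

# Rank by nearest neighbors under cosine similarity (no clustering).
df["similarity"] = df["embedding"].apply(lambda e: cosine_similarity(q, e))
top = df.nlargest(3, "similarity")

context = "\n\n---\n\n".join(top["text"])
prompt = f"Context:\n{context}\n\nQuestion: {query}"
```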

Also worth noting: the total cost of developing this app and running it a few dozen times was $0.50.

Thoughts

This was a fun project, in part because it was a detour away from what I’ve spent most of my time on the last few months, which is writing my next book. Writing and editing a book is very valuable work, but it lacks the freeform joy of hacking on a small project with zero users. Without overthinking or overstructuring things too much, here are some bullet-point thoughts about this project and the expansion of AI across the industry at large:

Anyway, it was a fun project, and I have a much better intuitive sense of what’s possible in this space after spending some time here, which was my goal. I’ll remain very curious to see what comes together as the timeline progresses.