- title: Poking around OpenAI.
- url: https://lethain.com/openai-exploration/
- hash_url: 4a485034e94dc6123a624e8a589e8dac
-
- <p>I haven’t spent much time playing around with the latest LLMs,
- and decided to spend some time doing so. I was particularly curious
- about the use case of using embeddings to supplement user prompts
- with additional, relevant data (e.g. supply the current status of their
- recent tickets into the prompt where they might inquire about progress on
- said tickets). This use case is interesting because it’s very attainable
- for existing companies and products to take advantage of, and I imagine it’s
- roughly how e.g. Stripe’s GPT4 integration with their documentation works.</p><p>To play around with that, I created a script that converts all of my writing
- into embeddings, tokenizes the user-supplied prompt to identify relevant sections
- of my content to inject into an expanded prompt, and sent that expanded prompt
- to OpenAI’s API.</p><p>You can <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py">see the code on Github</a>,
- and read my notes on this project below.</p><h2 id="references">References</h2><p>This exploration is inspired by the recent work
- by <a href="https://eugeneyan.com/writing/llm-experiments/#llm-tools-to-summarize-query-and-advise">Eugene Yan</a>
- and <a href="https://simonwillison.net/2023/Apr/4/llm/">Simon Willison</a>.
- I owe particular thanks to <a href="https://twitter.com/eugeneyan/status/1646336530695467010">Eugene Yan</a>
- for his suggestions to improve the quality of the responses.</p><p>The code I’m sharing below is scraped together from a number of sources:</p><p>I found none of the examples quite worked as documented, but ultimately I was able to get them working
- with some poking around, relearning Pandas, and so on.</p><h2 id="project">Project</h2><p>My project was to make the OpenAI API answer questions with awareness of all of my personal writing from this blog,
- <a href="https://staffeng.com">StaffEng</a> and <a href="https://infraeng.dev/">Infrastructure Engineering</a>.
- Specifically this means creating embeddings from Hugo blog posts in Markdown to use with OpenAI.</p><p>You can <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py">read the code on Github</a>.
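- The corpus half of that pipeline reduces to splitting each Markdown post into chunks and embedding each chunk. A minimal sketch of the chunking step (the word-based split and the `max_words` value are my simplifications; the actual script works with tokens rather than words):

```python
def chunk_text(text, max_words=300):
    # Split a post into roughly fixed-size chunks so each embedding
    # covers a focused span of text rather than a whole article.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
```

- Each chunk is then embedded once, up front, and the vectors are cached so queries don’t re-embed the corpus.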
- I’ve done absolutely nothing to make it easy to read, but it is a complete example, and you could use
- it with your own writing by changing <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py#L112">Line 112</a>
- to point at your blog’s content directories. (Oh, and changing the prompts on <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py#L260">Line 260</a>.)</p><p>You can see a screenshot of what this looks like below.</p><p><img src="/static/blog/2023/openai-experiment.png" alt="Screenshot of terminal program running Github lethain/openai-experiment"></p><p>This project is pretty neat, in the sense that it works. It did take me a bit longer than expected, probably about three hours
- to get it working given some interruptions, mostly because the documentation’s examples were all subtly broken or didn’t actually connect
- together into working code. After it was working, I inevitably spent a few more hours fiddling around as well.
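- Stripped of plumbing, the retrieval step is a cosine-similarity ranking of the embedded prompt against the embedded corpus chunks, with the winners pasted into the expanded prompt as context. A hedged sketch (the function names and prompt template are mine, not the repo’s exact code; embeddings are assumed precomputed, e.g. via OpenAI’s embeddings endpoint):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_sections(prompt_vec, corpus, k=3):
    # corpus: list of (text, embedding) pairs, with embeddings
    # precomputed once per chunk rather than per query.
    ranked = sorted(corpus, key=lambda s: cosine_similarity(prompt_vec, s[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, sections):
    # Inject the best-matching sections as context ahead of the question.
    context = "\n\n".join(sections)
    return f"Answer the question using this context:\n\n{context}\n\nQuestion: {question}"
```

- In the real script the output of `build_prompt` is what gets sent to OpenAI’s completion API; only the ranking-and-injection logic is shown here.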
- My repo is terrible code, but it is a fully working example if anyone
- else has had similar issues getting question answering with embeddings working!</p><p>The other comment on this project is that I don’t really view this as a particularly effective solution to the problem I wanted to solve,
- as it’s performing a fairly basic nearest-neighbor search to match embedded versions of my blog posts against the query,
- and then injecting the best matches into the GPT query as context. Going into this, I expected, I dunno, something more
- sophisticated than this. It’s a very reasonable solution, and a cost efficient solution because it avoids any model (re)training,
- but feels a bit more basic than I imagined.</p><p>Also worth noting, the total cost to developing this app and running it a few dozen times: $0.50.</p><h2 id="thoughts">Thoughts</h2><p>This was a fun project, in part because it was a detour away from what I’ve spent most of my time on the last few months,
- which is writing my next book. Writing and editing a book is very valuable work, but it lacks the freeform joy of
- hacking around a small project with zero users. Without overthinking or overstructuring things too much,
- here are some bullet-point thoughts about this project and the expansion of AI in the industry at large:</p><ul><li>As someone who’s been working in the industry for a while now, it’s easy to get jaded about new things.
- My first reaction to the recent AI hype is very similar to my first reaction to the crypto hype:
- we’ve seen hype before, and initial hype is rarely correlated with long-term impact on the industry
- or on society. In other words, I wasn’t convinced.</li><li>Conversely, I think part of long-term engineering leadership is remaining open to new things.
- The industry has radically changed from twenty years ago, with mobile development as the most obvious proof point.
- Most things won’t change the industry much, but some things will completely transform it,
- and we owe cautious interest to these potentially transformational projects.</li><li>My personal bet is that the new AI wave is moderately transformative but not massively so.
- Expanding on my thinking a bit, LLMs are showing significant promise at mediocre solutions to very general problems.
- A very common, often unstated, Silicon Valley model is to hire engineers, pretend the engineers are
- solving a problem, hire a huge number of non-engineers to actually solve the problem “until the technology automates it”,
- grow the business rapidly, and hope automation solves the margins in some later year.
- LLM adoption should be a valuable tool in improving margins in this kind of business,
- which in theory should enable new businesses to be created by improving the potential margin.
- However, we’ve been in a decade of <a href="https://www.readmargins.com/p/zirp-explains-the-world">zero-interest-rate policy</a>
- which has meant that current-year margins haven’t mattered much to folks,
- which implies that most of these ideas that should be enabled by improved margins should
- have already been attempted in the preceding margin-agnostic decade.
- This means that LLMs will make those businesses better, but the businesses themselves should
- already have been tried, and many of them ultimately failed because the market was too small to generate
- the required returns, more so than because of the cost of the large internal teams masking the missing margin-enhancing technology.</li><li>If you ignore the margin-enhancement opportunities represented by LLMs,
- which I’ve argued shouldn’t generate new business ideas but improve existing business ideas already
- tried over the last decade, then it’s interesting to ponder what the sweet spot is for these tools.
- My take is that they’re very good at supporting domain experts, where the potential damage caused by
- inaccuracies is constrained, e.g. Github Copilot is a very plausible way to empower a proficient programmer,
- and a very risky way to train a novice in a setting where the code has access to sensitive resources or data.
- However, to the extent that we’re pushing experts from authors to editors, I’m not sure that’s an actual speed
- improvement for our current generation of experts, who already have mastery in authorship and (often) a lesser
- skill in editing. Maybe there is a new generation of experts who are exceptional editors first, and authors second,
- which these tools will foster. If that’s true, then likely the current generation of leaders is unable to
- assess these tools appropriately, but… I think that most folks make this argument about most new technologies,
- and it’s only true sometimes. (Again, crypto is a clear example of something that has not overtaken existing
- technologies in the real world, where regulatory overhead is significant.)</li></ul><p>Anyway, it was a fun project, and I have a much better intuitive sense of what’s possible
- in this space after spending some time here, which was my goal. I’ll remain very curious to
- see what comes together here as the timeline progresses.</p>