Words, words, words
The history and evolution of messaging optimizations at Twitter.
How it all started
One of our early hypotheses was that while we were trying hard to find the best Tweets for our users, we were not necessarily presenting them the right way. We framed the problem as “Imagine a restaurant that has the best dishes one could taste. If the dishes are not plated well, not a lot of people would go in.” The same analogy applied to notifications: the words users read had to deliver the intent of the content shown to them.
One small change seeded the beginnings of something new. We changed “In case you missed Elon’s Tweet” to “Elon just Tweeted”, and it instantly showed an uptick on our growth dashboards. Nothing had changed in the quality of the Tweets recommended; we had only changed the notification's title.
A few weeks later we changed the title from “@elonmusk Retweeted” to “Elon Retweeted”, with even better results. This time we recorded a lift in user engagement too, and to our surprise, fewer users turned off notifications. By then, the value of picking the right words was apparent to us. We knew we had to run more copy experiments to simplify our product messaging and genuinely help users make informed decisions.
How it quickly became annoying
Over the next few quarters, our team invested regularly in copy changes. With these frequent changes came several challenges:
The number of copy changes we could test was bottlenecked by the single content designer shared across all of the engineering teams.
Each copy change took 2-3 weeks of coordination across design, product, and engineering to get the new copy reviewed, translated into 42 languages, and added to the code base. We would then deploy the entire application and monitor the release, for every single copy change.
As the number of copy changes grew, cataloging them alongside their metadata and aggregating experiment learnings became a chore. Every time someone wanted to change a piece of copy, we’d go back and review what we’d learned from previous experiments. Reading through several Google documents to extract the key points and surface them for the next iteration started eating up scarce engineering hours.
It became evident that we needed a smarter way to expedite this process.
How we patched it
To bring efficiency to this laborious process, Murph Finncum built internal tooling to run experiments on copy changes without redeploying code. Now we could change copy in a simple UI, translate it, test it, and ship it to production with a few button clicks.
Soon, this tooling was organically adopted by various teams and non-technical functions. We were shipping copy changes more frequently and rapidly improving product comprehension. Over time, the tooling evolved to support more features for rapid copy iteration.
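The core idea behind tooling like this can be sketched in a few lines. The sketch below is hypothetical (the names `REMOTE_COPY`, `FALLBACKS`, `get_copy`, and the copy keys are all invented for illustration, not the actual internal system): copy strings live in a remote store keyed by identifier, each key can carry experiment variants, users are deterministically bucketed into a variant, and the app falls back to shipped copy if the remote store has no entry.

```python
import hashlib

# Stand-in for the remote copy store; in a real system this would be
# fetched from a config service and refreshed periodically, so new copy
# ships without redeploying the application.
REMOTE_COPY = {
    "notif.retweet.title": {
        "control": "@{handle} Retweeted",
        "variant_a": "{name} Retweeted",
    },
}

# Copy compiled into the app binary, used when the remote store has no entry.
FALLBACKS = {"notif.retweet.title": "@{handle} Retweeted"}


def bucket(user_id: str, key: str, variants: list[str]) -> str:
    """Deterministically assign a user to one variant for this copy key."""
    digest = hashlib.sha256(f"{key}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def get_copy(key: str, user_id: str, **params: str) -> str:
    variants = REMOTE_COPY.get(key)
    if not variants:  # key unknown remotely: use the shipped fallback copy
        return FALLBACKS[key].format(**params)
    chosen = bucket(user_id, key, sorted(variants))
    return variants[chosen].format(**params)


print(get_copy("notif.retweet.title", "user123", handle="elonmusk", name="Elon"))
```

Because bucketing hashes the key together with the user id, the same user always sees the same variant of a given string, while assignments across different copy keys stay independent.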
State of things today
Turns out there’s a pattern that emerges as companies scale and more functions are added to the mix. Content writers write copy and hand it off to product managers, who paste it across spreadsheets, documents, and work tickets for reviews, translations, and so on; engineers then transfer it into code. This game of pass-the-parcel often results in human errors (“Oops, I missed the quotation marks”) and the process repeats all over again. Once there’s a critical mass of copy to manage, teams start building tooling out of frustration. The problem? Teams operate in silos, building tools that solve only their own needs. These tools are not designed to scale across teams and use cases, so many different copy management systems float around the company and the chaos continues.
Add LLMs to this chaos
Now, with the adoption of LLMs, specifically for content creation at scale, we’re presented with two additional challenges:
LLM-generated content will be massive, and content lifecycles will look drastically different: content will be created, experimented with, shipped, and iterated on continuously, adapting to every individual’s preferences. Current CMSes are built around human workflows, not the scale and speed of LLM content generation, which demands far more agility, variation, and testing.
LLM-generated content also tends to be generic, because tools like ChatGPT are unaware of the context locked within a company’s data sources and get no feedback on what’s working in the real world: https://www.reddit.com/r/copywriting/comments/1akn0e4/comment/kpgahj6
Caught in the middle of chaos
As companies rush to infuse AI into their products, many are stuck on how to make the large volume of AI-generated content useful for themselves and their customers.
The challenge is twofold: (1) GPT-generated content is generic, not informed by real-world data on what works (and what doesn’t), and (2) it is generated at a pace that manual processes built on spreadsheets and documents cannot keep up with.
Conclusion
Recapping our thoughts and learnings so far:
Words make the first impression on users. If we cannot make them sound like we crafted them by hand, we might as well get off the AI bandwagon right now.
Large organizations have to tread cautiously with this content. The risk of putting untested content in front of users is too high.
As LLMs evolve, serving unlimited variations of content is not so distant.
As the cost of content creation drops with LLMs, there will be new requirements for platforms to store, organize, experiment with, and serve this massive volume of content. Andrew Chen states it beautifully here.
At just words, we are excited for this future. To that end, we are on a mission to empower companies to continuously and efficiently test high-quality LLM content, keeping their organizational complexities in mind.