Beyond the Hype: The Long-Term Business Value of Generative AI
The Generative AI industry is on fire:
- OpenAI's ChatGPT reached 100 million users within two months, faster than any other consumer app before it.
- Generative AI companies received over $2.5 billion in funding in 2022.
- Stable Diffusion models have been downloaded from Hugging Face almost 10 million times in the last month alone.
However, what is the long-term business value that can be derived from these emerging technologies? Let's explore both current and future use cases.
Today
Generative AI tools can currently be grouped into two main use cases.
Use case 1: Autocomplete for everything
This term was coined by Noah Smith to describe the potential of generative AI models to support us in creative endeavors. A human enters an idea in the form of a prompt, the AI system generates a first draft from that prompt, and the human then edits and refines the output into a much more polished final product. In fact, I used this technique to write this post with Notion AI. The approach has already been applied in several use cases (a minimal code sketch of the loop follows the list below):
- Jasper.ai is an AI writing assistant that can help with creative writing.
- GitHub Copilot uses large language models to assist with programming.
- Text-to-image models like Stable Diffusion or Midjourney can be used by creatives to prototype assets quickly, e.g., app icons or interior designs.
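To make the draft-then-edit loop concrete, here is a minimal sketch in Python. It assumes the openai package (v1+) and an OPENAI_API_KEY in the environment; the model name and prompts are illustrative choices, not a recipe tied to any of the tools above.

```python
# Minimal "autocomplete for everything" loop: the human provides an idea,
# the model produces a first draft, and the human edits the result by hand.
# Assumes the openai Python package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def draft(idea: str) -> str:
    """Turn a rough idea (the prompt) into a first draft for human editing."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[
            {"role": "system", "content": "You write concise first drafts."},
            {"role": "user", "content": f"Write a first draft for: {idea}"},
        ],
    )
    return response.choices[0].message.content

# The human stays in the loop: generate, then edit and refine the draft.
print(draft("a blog post intro on the business value of generative AI"))
```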
Use case 2: Answer engines
Although GPT-3 has been available for some time now, it was ChatGPT that popularized generative question answering. This technology has the potential to revolutionize search engines, as demonstrated by Bing, You.com, and Neeva, which have incorporated it into their products. Instead of receiving links based on their query, users of these enhanced search engines get answers directly.
Another exciting use of similar technology is called "chat-your-data". These systems enable users to ask custom chatbots questions about a given piece of data. The data can take any form that can be represented as text, such as meeting notes, software code, or podcast transcripts. The user receives an answer directly, without needing to scan the entire text (a minimal sketch of the underlying retrieval pattern follows the list below). Example use cases are:
- Context is a tool that allows users to search through YouTube playlists (e.g., of popular podcasts like The Tim Ferriss Show) based on keywords or topics of interest.
- Elicit uses GPT-3 to summarize research findings based on the user's interests, saving researchers time and effort by automating parts of the literature review process.
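Under the hood, most chat-your-data systems follow the same retrieval pattern: embed the text chunks, find the chunk most similar to the question, and let the model answer from that chunk instead of returning links. Here is a minimal sketch assuming the openai package and numpy; the chunks and the question are placeholder data.

```python
# Minimal "chat-your-data" sketch: retrieve the most relevant text chunk via
# embedding similarity, then answer the question from that chunk.
# Assumes the openai package (v1+) and numpy; the data below is illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()
chunks = [
    "Episode 42: the guest discusses morning routines and deep work.",
    "Episode 43: the guest explains how they evaluate startup investments.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

chunk_vecs = embed(chunks)

def answer(question: str) -> str:
    # Cosine similarity between the question and each chunk.
    q = embed([question])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    context = chunks[int(np.argmax(sims))]
    # Answer directly from the retrieved context instead of returning links.
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Answer using only this context:\n{context}\n\nQ: {question}",
        }],
    )
    return resp.choices[0].message.content

print(answer("Which episode covers startup investing?"))
```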
In the (near) future
With the available technology and current developments in the field, I predict that there will be at least two main use cases for even more powerful generative AI tools in the not-so-distant future.
Use case 1: Personal assistant
Hypothesis: Future personal assistants will assist users with a wide range of tasks, from answering simple questions to performing complex actions.
Creating Iron Man-like personal assistants will require solving the following problems:
- The next generation of chatbots has to be multimodal by nature, capable of understanding various forms of input, such as speech, text, images, and data tables. They will also be able to choose the most appropriate representation of results for a given task. Initial research in this field has experimented with multimodal chatbots, and YouChat is already capable of outputting a handful of formats depending on the input query.
- Personal assistants will need to rely heavily on external knowledge retrieval to output faithful information. Knowledge can be retrieved from sources such as knowledge graphs (e.g., biomedical ontologies), tools (e.g., Wolfram Alpha), and APIs (e.g., Crunchbase). Models like RETRO from DeepMind can already incorporate external knowledge into their outputs, and initial experiments show that LLMs can teach themselves how to use external tools (a toy sketch of such a tool-use loop follows this list). You can test this out for yourself in this cool demo by James Weaver.
- Moreover, personal assistants must be extensible and capable of providing end-to-end process support. For example, they should be able to set up background processes to monitor prices and perform actions such as booking flights or purchasing items without the user having to interface with a website. Some tools on the market are already going in this direction. Although still in private beta, Fixie.ai seems to position itself as a platform for building agents capable of solving complex problems. Additionally, HTTPie's AI could potentially be useful for interfacing with APIs.
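To illustrate the tool-use idea, here is a toy host loop in Python. The TOOL: protocol is invented purely for illustration (systems like Toolformer learn tool calls during training instead), and the calculator stands in for richer tools such as Wolfram Alpha or the Crunchbase API.

```python
# Toy tool-use loop: the model is instructed to emit "TOOL: calculator: <expr>"
# when it needs arithmetic; the host runs the tool and feeds the result back.
# The protocol is invented for illustration; assumes the openai package (v1+).
import re
from openai import OpenAI

client = OpenAI()

def calculator(expression: str) -> str:
    # Restricted eval for the demo only; never eval untrusted input like this.
    return str(eval(expression, {"__builtins__": {}}))

def ask(question: str) -> str:
    prompt = (
        "If you need arithmetic, reply with exactly "
        "'TOOL: calculator: <expression>'. Otherwise answer directly.\n"
        f"Question: {question}"
    )
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    match = re.match(r"TOOL: calculator: (.+)", reply.strip())
    if match:  # The model asked for the tool: run it, then ask for a final answer.
        result = calculator(match.group(1))
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": f"{prompt}\nTool result: {result}\nFinal answer:",
            }],
        ).choices[0].message.content
    return reply

print(ask("What is 1234 * 5678?"))
```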
Use case 2: Content creator
Hypothesis: Powerful generative AI models beyond just image generation will speed up all content creation and design processes.
However, in order to fully realize the potential of generative AI in this area, the following technical challenges must be addressed:
- Creating long-form multimedia content is essential for supporting creatives in a wide range of fields. Whether you are working on a film, designing a video game, or writing a book, modeling temporal dependencies is necessary to generate coherent outputs. Existing tools that support the creation of longer-form content include Tome (for presentations), Runway (for video editing), and Luma (for 3D rendering).
- Content creation systems will not be limited to text-to-X functionality but will also provide intuitive interaction features. Creators will be able to input rough sketches in different formats instead of just text prompts, and the AI outputs will be editable through text-based interfaces as well as gestures. Recent research already provides better guidance approaches for text-to-image models such as Stable Diffusion. For example, ControlNet (demo) allows users to control Stable Diffusion generation more precisely by providing sketches or example images (a minimal code sketch follows this list). In addition, tools like remove.bg have established what good UX for image editing should look like.
- In addition, future content will be easily customizable. For example, it will be possible to automatically personalize content for the user. Children's books will feature the reader and their loved ones, and it will be possible to insert oneself into any video game with the click of a button. Personalization is already being used in text-to-image models. The most established personalization method is DreamBooth (which is used to generate avatars in Lensa). Additional approaches, such as Custom Diffusion by Adobe Research or LoRA, are being developed to overcome DreamBooth's limitations, such as large checkpoint file sizes and limited combination possibilities. Personalization features also seem to be part of Runway's impressive Gen-1 text-to-video model.
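As a concrete example of sketch-guided generation, here is a minimal ControlNet sketch using the diffusers library. It assumes a CUDA GPU and an edge-map image file, sketch.png, provided by the user; the checkpoint names follow the public Hugging Face model IDs.

```python
# Minimal sketch-guided generation with ControlNet via diffusers.
# Assumes a CUDA GPU and a user-provided edge map "sketch.png" (placeholder).
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The rough sketch steers the composition; the text prompt fills in the style.
sketch = Image.open("sketch.png")
image = pipe("a cozy reading nook, watercolor style", image=sketch).images[0]
image.save("result.png")
```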
Conclusion
Beyond the current hype around models like ChatGPT, the potential for generative AI is real. Even contemporary approaches allow professionals to become more productive by supporting tasks like writing or programming. It's difficult to say how long it will take to realize the outlined future use cases. However, it should be clear that chatbots and other virtual assistants might change the way we interact with the web. Thus, organizations and individuals should prepare for the foreseeable future by asking themselves the following questions:
- How can we use existing tooling today to increase our productivity?
- Which of our processes will be most strongly influenced by future developments in this field?
- How will users interact with our services in a world where the classical search-engine-to-website route is no longer the main entry point to our business?