Generative vs. Predictive AI, Part 2

Why can generative AI companies launch products so quickly?

A while ago, I stumbled upon this tweet by Alex Ratner, CEO of the data-centric AI platform Snorkel.

tl;dr: Generative AI companies can launch their products more quickly, but most high-value AI use cases are driven by predictive AI.

The first point especially resonated with me, given my recent experience.

While I’ve been working on predictive AI systems for years, I recently got the chance to work on an app powered by DreamBooth-customized Stable Diffusion models.

In fact, we were able to launch a working prototype much faster than we usually can, e.g., when working on visual inspection systems.

Naturally, I wanted to dive in deeper.

Here’s my attempt to compare predictive and generative AI systems from a technical perspective.

The comparison framework

I will compare both approaches along the typical tech stack for AI-powered applications.

Here’s what it looks like:

Tech stack for AI-powered products

It’s largely inspired by this great blog post from Cohere’s Jay Alammar.

All AI companies rely on ML models which are trained using data and MLOps tooling.

Don’t get me wrong: by MLOps tooling, I simply mean the software required to train the models.

It can be as simple as a collection of Jupyter notebooks or as complex as a Kubeflow pipeline for automatic retraining.
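For illustration, here’s a minimal sketch of the complex end of that spectrum, using the Kubeflow Pipelines (kfp) v2 SDK. The component body and the pipeline name are placeholders, not a real training job:

```python
from kfp import compiler, dsl


@dsl.component(base_image="python:3.11")
def train_model(data_path: str) -> str:
    # Placeholder training step: a real component would load data,
    # fit a model, and push the resulting artifact to a registry.
    print(f"Training on data at {data_path}")
    return "model-v1"


@dsl.pipeline(name="nightly-retraining")
def retraining_pipeline(data_path: str):
    train_model(data_path=data_path)


if __name__ == "__main__":
    # Compile to a YAML spec that a Kubeflow cluster can run on a schedule.
    compiler.Compiler().compile(retraining_pipeline, "pipeline.yaml")
```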

The model training is typically run on cloud platforms, which allow for flexible compute.

Finally, trained models are exposed to the end user through the application layer, for example a web application.

Why is building predictive AI systems so hard?

As described in my previous post, predictive AI systems typically have very high performance requirements.

Let’s say we want to develop an AI-powered application that supports the detection of lung tumors in chest CT scans.

Clinical application of such a system will require precision and recall beyond 95%.
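For concreteness, here’s how that bar translates into an evaluation check with scikit-learn; the labels below are invented for illustration:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical per-scan labels: 1 = tumor present, 0 = no tumor.
y_true = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

meets_bar = precision >= 0.95 and recall >= 0.95
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, clinical bar met: {meets_bar}")
```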

To achieve these metrics, we will need high-quality training data that covers a lot of edge cases.

How do we accomplish that?

Sure, we can download a publicly accessible dataset like LIDC-IDRI and start from there.

But it won’t be enough. Why?

  • The dataset contains a little over 1k scans. This won’t be enough to cover the natural variance in the population.

  • The annotations are not optimal. Some tumors were marked by only one annotator, others by all four. In addition, the annotators often disagree on the severity of a tumor.

Thus, we need to build a data engine to derive additional edge cases and high-quality consensus annotations.

I will go into the details of a data engine in a future post.

For now, it’s sufficient to understand that it involves many iterations of sourcing data, deriving and evaluating labels, and performing detailed error analysis to inform the next steps.
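To make one ingredient concrete: deriving a consensus label when annotators disagree. Here’s a minimal majority-vote sketch; the severity ratings are invented, and real consensus protocols for CT annotations are far more involved:

```python
from collections import Counter


def consensus_label(annotations: list[int]) -> int:
    """Majority vote over per-annotator labels, ties broken toward higher severity."""
    counts = Counter(annotations)
    # Rank labels by vote count first, then by severity on a tie.
    best = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return best[0]


# Hypothetical severity ratings (1-5) from four radiologists for one nodule:
print(consensus_label([3, 3, 4, 2]))  # -> 3
print(consensus_label([4, 2, 4, 2]))  # -> 4 (tie broken toward higher severity)
```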

On the model side, we will most likely use a proven neural network architecture (maybe even pre-trained weights), but it will require fine-tuning on our custom dataset.
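As a minimal sketch of that pattern in PyTorch, using an ImageNet-pretrained ResNet as a stand-in (a real CT model would likely use a 3D architecture and a proper data pipeline):

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a proven architecture with pre-trained weights...
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# ...replace the head for our task (tumor vs. no tumor)...
model.fc = nn.Linear(model.fc.in_features, 2)

# ...and fine-tune at a low learning rate on the custom dataset.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = nn.CrossEntropyLoss()


def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```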

At this point, you can see that the development process for predictive AI systems is experimentation-heavy.

It requires an iterative and more scientific approach.

As a result, predictive AI companies typically own large parts of their tech stack: usually everything besides compute (the cloud platform) and some parts of their MLOps tooling (e.g., an experimentation platform).

Now, what about generative AI?

Why can generative AI apps launch so quickly?

As you may have noticed, generative AI apps are popping up left and right.

I wanted to post a recent market map here, but by the time you’re reading this, it will most likely already be outdated.

Why is this the case?

As described in my previous post, generative AI systems are built on top of foundation models.

These foundation models can be open source (e.g., Stable Diffusion on Hugging Face) or closed source (e.g., ChatGPT by OpenAI).

Let’s say we want to launch an assistant for creating meeting notes. It automatically creates notes for our Zoom meetings and lets users interact with the notes using a chatbot.

We can easily build this using the OpenAI API endpoints (see the sketch after this list):

  • Transcribe the Zoom recording using the Whisper model.

  • Summarize the meeting transcript using the GPT-4 endpoint.

  • Power the chatbot with the ChatGPT endpoint.
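Here’s a rough sketch of that chain with the OpenAI Python SDK. The model names, prompts, and file path are illustrative, and the exact SDK surface may have changed since writing:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Transcribe the Zoom recording with Whisper.
with open("zoom_recording.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

# 2. Summarize the transcript into meeting notes with GPT-4.
notes = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Summarize this meeting transcript as structured notes."},
        {"role": "user", "content": transcript.text},
    ],
)

# 3. Let the user chat with the notes.
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": f"Answer questions about these notes:\n{notes.choices[0].message.content}"},
        {"role": "user", "content": "What were the action items?"},
    ],
)
print(answer.choices[0].message.content)
```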

We don’t need to train custom models or collect proprietary datasets to launch this app.

You’re mostly playing at the application layer. The development process is more engineering-heavy and requires traditional software engineering techniques.

Our main tasks will involve building a web application and developing middleware to keep track of the various API calls.
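That middleware can start out very simple. As a hedged sketch (purely illustrative, not a production pattern), here’s a decorator that logs the latency of every model call:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)


def track_api_call(func):
    """Log the duration of each external model call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logging.info("%s took %.2fs", func.__name__, elapsed)
    return wrapper


@track_api_call
def summarize(transcript: str) -> str:
    # Would wrap the GPT-4 call from the sketch above.
    return "..."
```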

To conclude: generative AI apps are less vertically integrated at launch because they only require building the application layer.

Tech stack of predictive vs. generative AI companies at launch (blue = owned, yellow = externally sourced, green = part owned, part externally sourced)

How will this play out over time?

At this point, you might ask yourself: How defensible is our product? What keeps other startups or incumbents like Microsoft or Zoom from replicating our meeting notes app?

Good question.

Right now, a superb user experience and sheer speed seem to be our product’s only moats, and neither is defensible over time.

Incumbents are moving quickly these days, as exemplified by the daily product announcements from big tech companies.

What can we do about this? Here are a few options:

  • Use the chatbot interactions and user feedback to train better custom models that enhance the user experience.

  • Connect additional data sources to enrich the meeting notes, e.g., a company’s Notion workspace.

  • Deploy our own transcription and summarization models (e.g., via Hugging Face) to save costs and create a pricing moat (see the sketch after this list).
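For that third option, here’s a minimal sketch using the Hugging Face transformers library; the model choice (facebook/bart-large-cnn) is just an example, not a recommendation:

```python
from transformers import pipeline

# Self-hosted summarization instead of a per-token API bill.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

transcript = (
    "Alice proposed moving the launch to May. "
    "Bob agreed and took ownership of the rollout plan."
)
notes = summarizer(transcript, max_length=60, min_length=10, do_sample=False)
print(notes[0]["summary_text"])
```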

All of these approaches require us to own deeper parts of the tech stack.

Prediction: The few big winners in generative AI outside of big tech companies will likely be vertically integrated.

To justify this prediction, let’s look at the most established startups in this space:

  • Runway trains custom models for its video and image editing platform, going as far as publishing its own research on video generation models.

  • Jasper for Business allows companies to fine-tune writing assistants to match their brand voice.

Let’s see how this prediction ages. At least a16z tends to agree with me.

What do you think about this topic? Please let me know by replying to this email or DM’ing me on LinkedIn or Twitter.

If you came across this article on my website, please consider subscribing to my newsletter to receive these kinds of insights straight to your inbox. You can find the archive of previous issues here.