How to Double Your Iteration Speed in AI Projects
Three simple steps to run more experiments and ship faster
The key to success in AI projects is to iterate quickly.
In the past, I’ve made the mistake of overthinking and over-engineering. Especially in the early phases of a project, it’s tempting to tackle every problem at once. But that approach leads to delays and frustrates all stakeholders.
Over time, I adopted a different approach to running my projects. It is primarily based on the Lean Startup methodology.
Let’s dive right in.
Learn more and ship quicker with fast iterations
The Lean Startup methodology says entrepreneurs should focus on running many small iterations to find product-market fit. You build an early MVP based on your business model, evaluate it with prospective customers, and adjust your business model and/or product accordingly.
We can apply this to AI projects as well. In our context, the MVP is a complete ML pipeline, and an iteration equals an experiment.
This approach has many advantages:
You will learn more in a shorter amount of time.
You will avoid overthinking and over-engineering.
You will be able to demo your current model at all times.
You will not run the same experiment twice.
Three simple steps are required to run more experiments in less time.
Let’s go through them one by one.
Step #1: Create a reproducible environment
Nothing creates more friction than having to install specific package versions over and over again, just to run your preprocessing notebook.
A reproducible environment lets you run experiments quickly without repeatedly setting up the required infrastructure.
Which environment you choose depends on the use case.
In most cases, a simple Python virtual environment will be sufficient. I prefer a combination of pyenv (for managing Python versions) and venv (for managing package versions) for most of my projects.
If you’re switching between platforms a lot, or have multiple people working on the same project, containerized environments like VSCode Dev Containers might be a more suitable choice.
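As a concrete starting point, here is a minimal sketch of the pyenv + venv setup (the pyenv lines assume pyenv is already installed, so they are shown as comments; the file names are just conventions):

```shell
# Pin the interpreter version for everyone working on the repo
# (assumes pyenv is installed):
#   pyenv install --skip-existing 3.11
#   pyenv local 3.11

# Isolate package versions in a project-local virtual environment:
python3 -m venv .venv
. .venv/bin/activate

# Record the exact installed package versions so the environment
# can be recreated later with `pip install -r requirements.txt`:
pip freeze > requirements.txt
```

Committing the version pin and `requirements.txt` to the repo means anyone can rebuild the same environment with two commands.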
Step #2: Establish an end-to-end pipeline early on
You should be able to go from raw data to performance metrics on your test set at all times during the project.
This way, you will always have up-to-date metrics to share with major project stakeholders.
Don’t get me wrong. This does not have to be a completely automated Kubeflow pipeline.
It can be as simple as a collection of notebooks and scripts, plus a configuration and readme file.
Typical pipelines in my projects look like this:
An EDA notebook for examining large amounts of data and descriptive statistics
A data preparation script that (optionally) downloads the raw data and converts it to the required format
A model training script that takes the preprocessed data, trains a model, tracks the experiment, and stores the resulting artifacts in the artifact store
An inference script applying the trained model to the test set
A performance evaluation notebook that calculates all relevant metrics on the test set and enables qualitative evaluation
This is all that’s needed in the early stages of a project.
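To make the idea concrete, here is a toy sketch of such a pipeline in a single Python file. All names (`prepare_data`, `train_model`, `evaluate`, `run_pipeline`) and the "model" itself are illustrative placeholders, not from any real project; the point is only the shape: raw data in, metrics out, driven by one config object.

```python
"""Toy end-to-end pipeline sketch: raw data -> test-set metrics."""
from dataclasses import dataclass


@dataclass
class Config:
    test_size: float = 0.2  # fraction of the raw data held out as test set
    seed: int = 42          # placeholder for reproducibility settings


def prepare_data(raw: list[float], cfg: Config) -> tuple[list[float], list[float]]:
    # Stand-in for the data preparation script: split raw data into train/test.
    n_test = int(len(raw) * cfg.test_size)
    return raw[n_test:], raw[:n_test]


def train_model(train: list[float], cfg: Config) -> float:
    # Stand-in for the training script: a "model" that predicts the train mean.
    return sum(train) / len(train)


def evaluate(model: float, test: list[float]) -> float:
    # Stand-in for the evaluation step: mean absolute error on the test set.
    return sum(abs(model - y) for y in test) / len(test)


def run_pipeline(raw: list[float], cfg: Config = Config()) -> dict[str, float]:
    # One call takes you from raw data to metrics -- the property to preserve.
    train, test = prepare_data(raw, cfg)
    model = train_model(train, cfg)
    return {"mae": evaluate(model, test)}
```

In a real project each function would live in its own script or notebook, but the contract stays the same: you can always call `run_pipeline` and get fresh metrics for your stakeholders.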
Step #3: Document your findings
I know, I know. Nobody wants to do documentation. But hear me out.
Documenting the main findings of your experiment will save you lots of time down the road.
You don’t need to write pages of text. Just summarize a few key points of each experiment for your future self (and your teammates):
The primary experiment variables
Link to the tracking run for this experiment (e.g., in Weights & Biases)
Short description of the results (with some key visuals)
Proposals for the next experiments based on your findings
You can do this in markdown pages in your repo, or in a dedicated solution such as a Weights & Biases report. What matters is that you do it.
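If you go the markdown route, a minimal per-experiment template along these lines is enough (the field names are just suggestions):

```markdown
## Experiment <id>: <short title>

- **Variables:** <what changed vs. the previous experiment>
- **Tracking run:** <link to the run, e.g., in Weights & Biases>
- **Results:** <one or two sentences, plus key visuals>
- **Next:** <proposed follow-up experiments based on these findings>
```

Filling this in takes a few minutes per experiment and is usually all your future self needs.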
That's it.
If you focus on these three factors, you will run two iterations where you previously ran one.
As a result, your AI-powered product will improve faster.
And that's what we're all after, right? 😉
If you find value in these posts, consider forwarding them to your friends/colleagues (if you’re receiving the email), or subscribing to my newsletter (if you found this via the blog). Also, follow me on LinkedIn for daily tips on building AI-powered products.