
How to Automate a Machine Learning Workflow Using Only 10 Lines of Python

Machine learning feels magical – until you try to determine which model to use for your dataset. Should you use a random forest or logistic regression? What if a naive Bayes model beats both? For most of us, answering that means hours of manual testing, model building, and confusion.

But what if you could automate the entire model selection process?
In this article, I will walk you through a simple but powerful Python automation that selects the best machine learning model for your dataset. You don’t need in-depth ML knowledge or tuning skills. Just plug in your data and let Python do the rest.

Why automate ML model selection?

There are several reasons. Consider the following:

  • Most datasets can be modeled in a number of ways.
  • Trying each model manually is time-consuming.
  • Choosing the wrong model early on may derail your project.

Automation allows you to:

  • Compare dozens of models in minutes.
  • Get performance metrics without writing repetitive code.
  • Identify the best-performing algorithm based on accuracy, F1 score, or RMSE.

This is not only convenient, but also smart ML hygiene.

The libraries we will use

We will explore two underrated Python ML automation libraries: LazyPredict and PyCaret. You can install both using the pip commands given below.

pip install lazypredict
pip install pycaret

Import the required libraries

Now that we have installed the required libraries, let’s import them. We will also import a few other libraries that will help us load the data and prepare it for modeling. We can import them using the code given below.

import pandas as pd
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyClassifier
from pycaret.classification import *

Loading the dataset

We will use a freely available diabetes dataset. The code below loads the data into a DataFrame and then defines X (the features) and y (the target).

# Load dataset (replace with the path or URL of your copy of the diabetes CSV)
url = "<path-or-url-to-diabetes-dataset.csv>"
df = pd.read_csv(url, header=None)

X = df.iloc[:, :-1]
y = df.iloc[:, -1]
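
Before handing the data to any automated tool, it is worth a quick sanity check that the features and target were separated correctly. A minimal sketch using plain pandas (the exact shape and class counts depend on the copy of the dataset you load):

# Quick sanity check on the loaded data
print(X.shape, y.shape)   # row counts should match
print(y.value_counts())   # class balance of the outcome column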

Using LazyPredict

Now that we have loaded the dataset and imported the required libraries, let’s split the data into training and test sets. Then we pass them to LazyPredict to find out which model fits our data best.

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# LazyClassifier
clf = LazyClassifier(verbose=0, ignore_warnings=True)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

# Top 5 models
print(models.head(5))

In the output, we can see that LazyPredict fits the data to 20+ ML models and reports performance in terms of accuracy, ROC AUC, F1 score, and so on, making it clear which model suits the data best. This makes the decision faster and better informed. We can also plot the accuracy of these models to make the comparison more visual. Note that the time taken per model is negligible, which saves even more time.

import matplotlib.pyplot as plt

# Assuming `models` is the LazyPredict DataFrame
top_models = models.sort_values("Accuracy", ascending=False).head(10)

plt.figure(figsize=(10, 6))
top_models["Accuracy"].plot(kind="barh", color="skyblue")
plt.xlabel("Accuracy")
plt.title("Top 10 Models by Accuracy (LazyPredict)")
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
Model performance visualization
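
If you would rather pick the winner programmatically instead of reading the table, you can pull the top row straight out of the results DataFrame. A small sketch, assuming the LazyPredict results contain an "Accuracy" column as in the run above:

# Select the single best model name and score from the LazyPredict results
best_name = models["Accuracy"].idxmax()
best_score = models["Accuracy"].max()
print(f"Best model: {best_name} (accuracy = {best_score:.3f})")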

Using PyCaret

Now, let’s check how PyCaret works. We will create models and compare performance using the same dataset. We will pass in the entire dataset because PyCaret performs the train-test split itself.

The following code will:

  • Run more than 15 models
  • Evaluate them with cross-validation
  • Return the best one based on performance

All in just two lines of code.

clf = setup(data=df, target=df.columns[-1])
best_model = compare_models()
PyCaret data analysis
PyCaret model performance

As we can see, PyCaret provides more information on model performance. It may take a few seconds longer than LazyPredict, but it also gives us more detail, so we can make an informed decision about which model to use.
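
Once compare_models() has returned a winner, PyCaret can take it a couple of steps further with its classification API (predict_model, finalize_model, save_model). A brief sketch; the file name is just an example:

# Score the best model on PyCaret's internal hold-out split
predict_model(best_model)

# Retrain the winning pipeline on the full dataset and save it to disk
final_model = finalize_model(best_model)
save_model(final_model, "best_diabetes_model")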

Real-life use cases

Some real-life cases where these libraries may be beneficial are:

  • Rapid prototyping for hackathons
  • Internal dashboards that surface the best model for analysts
  • Teaching ML without drowning students in syntax
  • Pre-testing ideas before a full deployment

Conclusion

Using AutoML libraries like the ones we discussed does not mean you should skip learning the math behind the models. But in a fast-paced world, they are a huge productivity gain.

What I like about LazyPredict and PyCaret is that they give you a quick feedback loop, so you can focus on feature engineering, domain knowledge, and interpretation.

If you are starting a new ML project, try this workflow. You will save time, make better decisions and impress your team. Let Python do the heavy lifting when building smarter solutions.
