Machine learning is magical, right up until you’re stuck trying to decide which model to use for your dataset. Should you go with a random forest or logistic regression? What if a naïve Bayes model outperforms both? For most of us, answering that means hours of manual testing, model building, and second-guessing.
But what if you could automate the entire model selection process?
In this article, I’ll walk you through a simple but powerful Python automation that selects the best machine learning models for your dataset automatically. You don’t need deep ML knowledge or tuning skills. Just plug in your data and let Python do the rest.
Why Automate ML Model Selection?
There are several reasons. Think about it:
- Most datasets can be modeled in multiple ways.
- Trying each model manually is time-consuming.
- Picking the wrong model early can derail your project.
Automation lets you:
- Compare dozens of models instantly.
- Get performance metrics without writing repetitive code.
- Identify top-performing algorithms based on accuracy, F1 score, or RMSE.
It’s not just convenient; it’s smart ML hygiene.
Libraries We Will Use
We will explore two underrated Python AutoML libraries: lazypredict and pycaret. You can install both using the pip commands given below.
pip install lazypredict
pip install pycaret
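Both libraries pull in a long list of pinned dependencies, so it’s wise to install them into a fresh virtual environment to avoid version conflicts. A minimal sketch (the environment name is just an example):
python -m venv automl-env
source automl-env/bin/activate  # on Windows: automl-env\Scripts\activate
pip install lazypredict pycaret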
Importing Required Libraries
Now that we have installed the required libraries, let’s import them. We will also import pandas and scikit-learn’s train_test_split, which we will use to load the data and prepare it for modelling. We can import everything using the code given below.
import pandas as pd
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyClassifier
from pycaret.classification import *
Loading Dataset
We will be using the freely available Pima Indians Diabetes dataset, which you can find at the URL shown in the code below. The snippet downloads the data, stores it in a DataFrame, and defines X (the features) and y (the outcome).
# Load dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
df = pd.read_csv(url, header=None)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
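Before modelling, it’s worth a quick sanity check that the data loaded as expected. A short sketch (the Pima dataset has 768 rows: 8 feature columns plus a binary outcome):
# Confirm the shape and that nothing went missing on download
print(df.shape)  # expect (768, 9)
# Class balance of the target (0 = no diabetes, 1 = diabetes)
print(y.value_counts())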
Using LazyPredict
Now that we have the dataset loaded and the required libraries imported, let’s split the data into training and testing sets. We then pass both to lazypredict to see which model fits our data best.
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# LazyClassifier fits 20+ scikit-learn classifiers with sensible defaults
clf = LazyClassifier(verbose=0, ignore_warnings=True)
# `predictions` only holds per-model predictions if you pass predictions=True above
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
# Top 5 models
print(models.head(5))

In the output, we can clearly see that LazyPredict fit the data to 20+ ML models and reports each one’s accuracy, balanced accuracy, ROC AUC, and F1 score, so selecting the best model for the data becomes less time-consuming and better informed. The time taken per model is also reported, and it is negligible, which makes the whole exercise a real time-saver. We can also plot the models’ accuracies to make the comparison more visual.
import matplotlib.pyplot as plt
# `models` is the leaderboard DataFrame returned by LazyPredict
top_models = models.sort_values("Accuracy", ascending=False).head(10)
plt.figure(figsize=(10, 6))
top_models["Accuracy"].plot(kind="barh", color="skyblue")
plt.xlabel("Accuracy")
plt.title("Top 10 Models by Accuracy (LazyPredict)")
plt.gca().invert_yaxis()  # best model at the top
plt.tight_layout()
plt.show()
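Because the models table is a plain pandas DataFrame, you can also pull out the winner programmatically or keep the full leaderboard for later. A small sketch (the CSV file name is just an example):
# Name of the single best model by accuracy
best_name = models["Accuracy"].idxmax()
print(f"Best model: {best_name}")
# Save the full leaderboard for your records
models.to_csv("lazypredict_leaderboard.csv")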

Using PyCaret
Now let’s see how PyCaret works. We will use the same dataset to build models and compare their performance. This time we pass in the entire DataFrame, since PyCaret performs its own train-test split.
The code below will:
- Run 15+ models
- Evaluate them with cross-validation
- Return the best one based on performance
All in two lines of code.
# Set up the experiment: PyCaret infers column types, preprocesses, and splits the data
clf = setup(data=df, target=df.columns[-1])
# Train and cross-validate the candidate models, returning the best by accuracy
best_model = compare_models()


As we can see here, PyCaret provides much more information about each model’s performance. It may take a few seconds longer than LazyPredict, but the extra detail helps us make an informed decision about which model to go ahead with.
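From here, PyCaret’s own helpers can take the winning pipeline further, scoring it on the hold-out split and saving it to disk. A minimal sketch (the file name is just an example):
# Score the best model on PyCaret's hold-out split
predict_model(best_model)
# Save the full fitted pipeline (preprocessing + model) to disk
save_model(best_model, "best_diabetes_pipeline")
# Reload it later for inference
loaded_pipeline = load_model("best_diabetes_pipeline")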
Real-Life Use Cases
Some real-life use cases where these libraries can be beneficial are:
- Rapid prototyping in hackathons
- Internal dashboards that suggest the best model for analysts
- Teaching ML without drowning in syntax
- Pre-testing ideas before full-scale deployment
Conclusion
Using AutoML libraries like the ones we discussed doesn’t mean you should skip learning the math behind models. But in a fast-paced world, it’s a huge productivity boost.
What I love about lazypredict and pycaret is that they give you a quick feedback loop, so you can focus on feature engineering, domain knowledge, and interpretation.
If you’re starting a new ML project, try this workflow. You’ll save time, make better decisions, and impress your team. Let Python do the heavy lifting while you build smarter solutions.