
Introduction

Today, something strange happened.
My model scored 99.7% accuracy on the test set. At first, I was thrilled — then skeptical:
Is this overfitting, or did I make a mistake somewhere?

This post documents the journey from that suspicion to discovery and learning.


1. “99% Accuracy… Is That Even Real?”

from sklearn.ensemble import RandomForestClassifier

# Baseline model on the existing train/test split
model = RandomForestClassifier()
model.fit(X_train, y_train)
model.score(X_test, y_test)  # mean accuracy on the held-out test set
# Output: 0.997

First suspicion: Maybe there’s data leakage.
Check: Reinspect the train/test split.

from sklearn.model_selection import train_test_split

# 80/20 split with a fixed seed; stratify keeps class ratios equal in both sets
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42, stratify=target
)

✅ Split looks fine. But the accuracy is still abnormally high.
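
One extra sanity check (assuming features is a pandas DataFrame, so the split carries row indices along) is confirming that no row landed in both sets:

overlap = set(X_train.index) & set(X_test.index)
print(len(overlap))  # Expect 0 if train and test share no rows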


2. “Am I Accidentally Predicting the Target Itself?”

Maybe the target or a proxy made it into the features? Let’s check .corr():

import pandas as pd

# Correlation of every numeric feature with the target column
correlations = df.corr(numeric_only=True)["target"].sort_values(ascending=False)
print(correlations.head(5))

🔍 It turns out some features are nearly perfectly correlated with the target.
The culprit: a derived column closely related to the label, quite possibly calculated from it.
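
A quick way to turn that eyeball check into a concrete list of suspects (the 0.95 cutoff is an arbitrary heuristic, not something from the original run):

# Features whose absolute correlation with the target is near-perfect
suspects = correlations.drop("target").abs()
leak_candidates = suspects[suspects > 0.95].index.tolist()
print(leak_candidates)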


3. “What Features Should Be Dropped Then?”

Conclusion:

  • ID-like columns can act as surrogate keys that indirectly encode the label (for example, when IDs were assigned in an order related to the outcome).
  • Derived columns that are direct transformations of the target must be excluded.
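
Concretely, the cleanup amounted to dropping those columns and retraining; the column names below are placeholders for the actual leaky features:

# Drop the ID column and the label-derived column, then retrain
leaky_columns = ["id", "derived_from_target"]  # placeholders for the real names
X_train_clean = X_train.drop(columns=leaky_columns)
X_test_clean = X_test.drop(columns=leaky_columns)

model = RandomForestClassifier()
model.fit(X_train_clean, y_train)
model.score(X_test_clean, y_test)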

✅ After removing those columns: new accuracy is 81.2% — much more believable.


What I Realized

  • High accuracy can be a red flag.
  • Data leakage is common and often subtle.
  • Simple tools like .corr() can uncover big issues.
  • Before trusting performance metrics, always question your data.

What I Want to Do Next

  • Automate feature checks for target leakage (a rough sketch follows this list)
  • Add column validation in data split logic
  • Build rules to detect label-like engineered features
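
For the first item, a minimal sketch of what such a check could look like (the function name and 0.95 threshold are my own placeholders, not an existing API):

def check_target_leakage(df, target_col, threshold=0.95):
    """Return numeric columns whose correlation with the target looks leak-like."""
    corr = df.corr(numeric_only=True)[target_col].drop(target_col).abs()
    return corr[corr > threshold].index.tolist()

# Run it on the raw frame before splitting, and review whatever it flags
print(check_target_leakage(df, "target"))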

When your model’s too good to be true,
check the data before you celebrate.