
Introduction

I thought logistic regression would be a reliable baseline.
But I got 58% accuracy — not even close to acceptable.

Here’s how I debugged it, what mistakes I found, and what ultimately helped.


1. “Did I Forget to Normalize?”

Logistic regression isn’t distance-based, but feature scaling still matters.
→ Apply StandardScaler:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit scaling statistics on training data only
X_test_scaled = scaler.transform(X_test)        # reuse them on test data to avoid leakage

📈 Accuracy rose slightly, to 60%.
✅ Somewhat better, but still lacking.


2. “Is This Due to Class Imbalance?”

import numpy as np

np.bincount(y_train)
# Output: [870, 130]

💡 Severe imbalance. Most predictions defaulted to class 0.
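
A quick way to confirm that, using the same bincount trick on the predictions (a minimal check; it assumes the unweighted baseline model from the 60% run above is available as model, which isn't shown in this post):

# Count predicted labels; `model` here is the unweighted baseline
# (an assumption, since that fit isn't shown above)
np.bincount(model.predict(X_test_scaled))
# If almost everything lands in class 0, imbalance is driving predictions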

Fix: Use class_weight='balanced', which weights each class inversely to its frequency so the minority class counts more in the loss.

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(class_weight='balanced')  # upweight the minority class
model.fit(X_train_scaled, y_train)

📈 Accuracy now 66%. Recall improved significantly.


3. “What If Accuracy Isn’t the Best Metric?”

Check precision, recall, and F1 instead:

from sklearn.metrics import classification_report

y_pred = model.predict(X_test_scaled)
print(classification_report(y_test, y_pred))

Insight: Accuracy hid key issues.

  • Precision was okay, but recall was still low
  • The F1-score showed room for threshold tuning

What I Realized

  • Logistic regression is sensitive to preprocessing.
  • Imbalanced data can ruin performance.
  • Accuracy alone is a misleading metric — look at the full picture.

What I Want to Do Next

  • Try threshold tuning for F1 maximization (first sketch below)
  • Experiment with SMOTE for resampling (second sketch)
  • Compare logistic regression with tree-based models (third sketch)
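
For threshold tuning, the idea is to sweep the decision threshold over the predicted probabilities and keep the one that maximizes F1. A minimal sketch, reusing the variables above (in practice the threshold should be picked on a separate validation split, not the test set):

import numpy as np
from sklearn.metrics import f1_score

# Probability of the positive class from the balanced model above
probs = model.predict_proba(X_test_scaled)[:, 1]

# Sweep candidate thresholds and keep the one with the best F1
thresholds = np.linspace(0.1, 0.9, 81)
scores = [f1_score(y_test, (probs >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
print(f"best threshold: {best_t:.2f}, F1: {max(scores):.3f}")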
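
For SMOTE, the imbalanced-learn package has a ready-made implementation. A sketch, assuming imbalanced-learn is installed (resampling must only touch the training split, never the test data):

from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression

# Synthesize new minority-class samples on the training set only
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train_scaled, y_train)

smote_model = LogisticRegression()
smote_model.fit(X_res, y_res)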
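
And for the tree-based comparison, a random forest seems like a reasonable first candidate; this is just a sketch of the experiment, not a tuned model:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

rf = RandomForestClassifier(class_weight='balanced', random_state=42)
rf.fit(X_train, y_train)  # unscaled features are fine for trees
print(classification_report(y_test, rf.predict(X_test)))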

The model wasn’t bad —
I just didn’t understand it well enough. Until now.