2 minute read

Introduction

When I first encountered machine learning (ML), it was exciting — and also totally overwhelming.
What does it mean to train a model? How do I handle raw data? Do I really need preprocessing if the algorithm is strong enough?

This post is a reflection on the questions I asked, the false starts I had, and how I’m finally beginning to see why feature engineering is the real power behind effective ML.


The Early Questions I Had

Here are some of the honest, beginner questions I asked while starting ML:

  • “Isn’t it OK to skip preprocessing?”
  • “Why does feature engineering matter? The graphs won’t be used by the model anyway.”
  • “What are X and y? How are they used in the algorithm?”
  • “What are the differences among the many ML models? What is the purpose of each, and why do they matter?”

And one of the biggest:

  • “How many features are needed for the model to score well?”

These weren’t just questions about code — they were about understanding how ML thinks.


Working With the Russian Number Plate Dataset

To make this more real, I worked on a dataset containing Russian car auction records, which included:

  • plate (numberplate)
  • date, price
  • region_code, region_name
  • government_code, agency
  • forbidden, priority, significance

What I Tried

  • Extracted meaningful segments from the plate (e.g. prefix letters)
  • Determined whether a plate belonged to a government vehicle using partial string matching (startswith)
  • Merged regional dictionaries using region_code
  • Removed or transformed outliers in price
  • Created new binary columns from categoricals (like forbidden == 1)
  • Converted date into month and year features (see the pandas sketch after this list)
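To make those steps concrete, here is a rough pandas sketch. The column names (plate, date, price, region_code, forbidden) come from the dataset above, but the file names, the government prefixes, and the clipping thresholds are my own placeholder assumptions, not the real rules for Russian plates:

```python
import pandas as pd

# Load the auction records (the file name here is an assumption).
df = pd.read_csv("plates.csv", parse_dates=["date"])

# Extract a meaningful segment: the leading letter of the plate string.
df["prefix"] = df["plate"].str[0]

# Flag likely government vehicles by partial string matching on the plate.
# These prefixes are placeholders, not the actual government patterns.
gov_prefixes = ("AMP", "EKX")
df["is_government"] = (
    df["plate"].str.startswith(gov_prefixes).fillna(False).astype(int)
)

# Merge a regional lookup table keyed on region_code (lookup file assumed).
regions = pd.read_csv("regions.csv")  # e.g. region_code, federal_district
df = df.merge(regions, on="region_code", how="left")

# Tame price outliers by clipping to the 1st/99th percentiles.
low, high = df["price"].quantile([0.01, 0.99])
df["price"] = df["price"].clip(low, high)

# Turn a categorical flag into an explicit binary feature.
df["is_forbidden"] = (df["forbidden"] == 1).astype(int)

# Split the date into month and year features.
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year
```

Clipping instead of dropping keeps the rows while limiting the pull of extreme prices; dropping is just as defensible if the outliers turn out to be data errors.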

What I Learned

  • Raw columns ≠ features. Just having data isn’t enough — you have to transform it.
  • Feature engineering means knowing the domain. In this case, understanding Russian plates helped me create useful features.
  • Visualizing outliers helped me decide what to drop, not just guess.
  • Government vehicle classification wasn’t in the original data — I had to create that signal from patterns.

What I Thought About

All this made me realize:

  • Machine learning without feature engineering is like cooking without prepping ingredients
  • You can’t rely on the algorithm to figure out messy signals
  • Even unsupervised tasks benefit from careful feature shaping

I used to think ML was about fancy algorithms. Now I think it’s about thoughtful representation.


What I Want to Do Next

  • Finish building a preprocessing pipeline for the plate dataset
  • Apply LabelEncoder or OneHotEncoder where appropriate
  • Use correlation matrices to filter out redundant features
  • Build a basic classification model (e.g. RandomForest) with these features, as sketched below
  • Practice on another public dataset and compare modeling results with and without preprocessing
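As a starting point for that pipeline, here is a minimal scikit-learn sketch. It assumes the engineered frame df from the earlier sketch, and it uses is_government as a stand-in target purely to make the example runnable:

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Stand-in target: predicting the engineered is_government flag.
y = df["is_government"]
X = df[["prefix", "region_code", "price", "month", "year", "is_forbidden"]]

# A quick correlation matrix over the numeric features to spot redundancy.
print(X[["price", "month", "year"]].corr())

# One-hot encode the categoricals; numeric columns pass through unchanged.
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["prefix", "region_code"])],
    remainder="passthrough",
)

model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

Wrapping the encoder and the classifier in one Pipeline means the same preprocessing is applied at fit and predict time, which is exactly the habit I want to build before comparing results with and without it.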

Machine learning didn’t “click” for me because of math — it clicked because of feature engineering.
And now, I finally understand why preprocessing is half the work.