AI Stack Layer · 7 of 8

Classic ML

Pre-LLM machine learning — and still the right tool for huge swaths of real problems. Tabular prediction, computer vision, recommendations, fraud, forecasting. Cheap, fast, interpretable.

Quick Facts

Basic Concepts

  • Supervised learning: predict a label / value from features (regression, classification).
  • Unsupervised: find structure without labels (clustering, dimensionality reduction).
  • Reinforcement learning: an agent learns by taking actions and getting rewards.
  • Train / validation / test split — never evaluate on data the model has seen (see the sketch after this list).
  • Overfitting = great on training, bad on new data. Regularization fights it.
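
A quick scikit-learn sketch of the split-and-evaluate discipline above; the data here is synthetic and purely illustrative:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for real features / labels
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# An unconstrained tree memorizes the training set...
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(model.score(X_train, y_train))   # ~1.0 on data it has seen
print(model.score(X_test, y_test))     # noticeably lower on unseen data: overfitting

# ...and capping depth (one form of regularization) narrows that gap
regularized = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(regularized.score(X_test, y_test))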
Landscape

The Major Libraries

  • scikit-learn: The Swiss army knife — regression, classification, clustering, pipelines, all in clean Python (see the sketch after this list).
  • XGBoost: Gradient-boosted trees — wins most tabular Kaggle competitions.
  • LightGBM: Microsoft's GBT — faster training, similar accuracy.
  • CatBoost: Yandex's GBT — best-in-class with categorical features.
  • PyTorch: The default deep-learning framework today; flexible, research-friendly.
  • TensorFlow / Keras: Google's DL framework; strong production tooling (TFX, TF Serving).
  • JAX: Composable function transforms for high-performance ML / research.
  • Hugging Face Transformers: Pre-trained model hub for NLP, vision, audio.
  • statsmodels: Classical statistics — linear models, hypothesis tests, time series.
  • Prophet / NeuralProphet: Forecasting; Prophet is from Meta, NeuralProphet is a community reimplementation built on PyTorch.
  • OpenCV: Computer vision toolkit (faces, edges, tracking).
  • Surprise / implicit / RecBole: Recommender systems.
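
To ground the scikit-learn row, a minimal pipeline sketch; the column names and toy data are hypothetical:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

# Hypothetical toy frame: one numeric and one categorical feature
df = pd.DataFrame({"age": [25, 40, 31, 52], "plan": ["free", "pro", "free", "pro"],
                   "churned": [1, 0, 1, 0]})

# Preprocessing + model in one object: fit once, deploy once
pre = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])
clf = Pipeline([("pre", pre), ("model", LogisticRegression())])
clf.fit(df[["age", "plan"]], df["churned"])
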
Mechanics

Common Tasks

Tabular Prediction

The bread & butter — predicting churn, fraud, prices, conversion from a row of features. Gradient-boosted trees (XGBoost / LightGBM / CatBoost) almost always win, with neural nets close behind on very large datasets.

import xgboost as xgb

# X_train / y_train / X_test come from a prior train/test split
model = xgb.XGBClassifier(n_estimators=500, max_depth=6, learning_rate=0.05)
model.fit(X_train, y_train)
pred = model.predict_proba(X_test)[:, 1]   # probability of the positive class

Computer Vision

  • Classification: "what's in this image?" — ResNet, EfficientNet, ViT (loaded in the sketch after this list).
  • Detection: "where are the things?" — YOLO, DETR, Mask R-CNN.
  • Segmentation: "label every pixel" — U-Net, SAM.
  • Generation: Stable Diffusion, Flux, Imagen.
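
A minimal classification sketch with a pre-trained torchvision ResNet; the image path is hypothetical:

from PIL import Image
import torch
from torchvision import models
from torchvision.models import ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()                     # resize / crop / normalize preset
batch = preprocess(Image.open("photo.jpg")).unsqueeze(0)   # hypothetical image file

with torch.no_grad():
    logits = model(batch)
idx = logits.softmax(dim=-1).argmax().item()
print(weights.meta["categories"][idx])                # human-readable ImageNet label
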
NLP (pre-LLM)

Tokenization, TF-IDF, word2vec / GloVe, named-entity recognition, topic modeling. Most of these still matter as building blocks — and BERT-family encoders remain the best choice for cheap, fast classification & embeddings.
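
A minimal TF-IDF + logistic regression sketch in scikit-learn; the toy reviews and labels are made up for illustration:

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts  = ["great product, fast shipping", "terrible support, never again",
          "love it", "waste of money"]      # hypothetical toy reviews
labels = [1, 0, 1, 0]                       # 1 = positive sentiment

# Unigrams + bigrams in, class probabilities out — fast and cheap to serve
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["support was terrible"]))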

Forecasting & Time Series

ARIMA, Prophet, neural forecasting (N-BEATS, Temporal Fusion Transformer). Most production forecasting still combines a tree model on lagged features with seasonal decomposition.
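
A minimal sketch of that tree-on-lagged-features pattern, assuming a synthetic daily series and LightGBM:

import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor

# Hypothetical daily series: trend + weekly seasonality + noise
t = np.arange(365)
y = pd.Series(0.1 * t + 5 * np.sin(2 * np.pi * t / 7) + np.random.randn(365),
              index=pd.date_range("2024-01-01", periods=365, freq="D"))

# Lagged copies of the target become the feature matrix
df = pd.DataFrame({"y": y})
for lag in (1, 7, 28):                     # yesterday, last week, four weeks back
    df[f"lag_{lag}"] = df["y"].shift(lag)
df = df.dropna()

model = LGBMRegressor(n_estimators=300)
model.fit(df.drop(columns="y"), df["y"])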

Evaluation Metrics

  • Regression: RMSE, MAE, R²
  • Binary classification: AUC-ROC, precision, recall, F1, log-loss
  • Multi-class: accuracy, macro F1, confusion matrix
  • Ranking / RecSys: NDCG, MRR, MAP, hit-rate
  • Forecasting: MAPE, sMAPE, MASE
  • Detection: mAP @ IoU thresholds
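
A quick scikit-learn sketch of a few of these, on hypothetical binary-classification outputs:

from sklearn.metrics import roc_auc_score, log_loss, f1_score

y_true  = [0, 1, 1, 0]            # hypothetical ground-truth labels
y_score = [0.2, 0.8, 0.6, 0.4]    # predicted P(positive) per example

print(roc_auc_score(y_true, y_score))                 # 1.0: every positive outranks every negative
print(log_loss(y_true, y_score))                      # penalizes confident wrong probabilities
print(f1_score(y_true, [s > 0.5 for s in y_score]))   # 1.0 after thresholding at 0.5
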
Classic ML vs LLMs — When to Use Which
Use Classic ML when…
  • You have structured / tabular data.
  • You need millisecond latency & pennies-per-prediction costs.
  • You need interpretability & explainability.
  • Compliance demands deterministic models.
Reach for LLMs when…
  • Inputs are free-form natural language.
  • You need reasoning, summarization, or generation.
  • You want zero-shot — no labeled training data.
  • Latency & cost budgets can absorb a few hundred milliseconds and a few cents per call.