Summaries for machine learning course (3)
Feature scaling:
When different features take on very different ranges of values, gradient descent can run slowly. By rescaling the features so that they all take on a comparable range of values, gradient descent can run much faster.
How?
Common scaling methods:
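As a minimal sketch (the feature values below are illustrative, not from the course), the three most common approaches — dividing by the maximum, mean normalization, and z-score normalization — can be written with NumPy as follows:

```python
import numpy as np

# Toy design matrix: each column is one feature, and the two columns
# have very different ranges (e.g. house size in sq. ft. vs. bedrooms).
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [ 852.0, 2.0]])

# 1) Max scaling: divide each feature by its maximum -> values in (0, 1].
X_max = X / X.max(axis=0)

# 2) Mean normalization: center on the mean, divide by the range (max - min).
X_mean = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# 3) Z-score normalization: subtract the mean, divide by the standard deviation.
X_zscore = (X - X.mean(axis=0)) / X.std(axis=0)
```

After any of these transformations, every column ends up in a comparable range, which is what lets gradient descent take larger, better-conditioned steps.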
For feature scaling, if there are multiple features, should I use the same scaling method for all of them?
When you have multiple features in your dataset, it is generally recommended to use the same scaling method for all features to maintain consistency and comparability between them. Here's why and how to approach it:
Why Use the Same Scaling Method?
- Uniformity: Different scaling methods can transform features into ranges or distributions that may not align well, potentially confusing machine learning models.
- Model Sensitivity: Many models (e.g., distance-based models like k-Nearest Neighbors (k-NN), Support Vector Machines (SVM), or Principal Component Analysis (PCA)) assume that features are on the same scale. Using different scaling methods might bias the model toward features with a wider range.
- Interpretability: Having a consistent scaling method ensures that all transformed features are comparable, which aids in model interpretation and debugging.
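A minimal sketch of this advice, assuming scikit-learn is available: fit a single scaler on the training data, apply it to every feature, and reuse the same fitted transformation on the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: two features on very different scales (illustrative values).
X_train = np.array([[2104.0, 5.0],
                    [1416.0, 3.0],
                    [ 852.0, 2.0]])
X_test = np.array([[1200.0, 3.0]])

# One scaler, fit on the training data only, applied uniformly to all
# features, then reused unchanged on the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting once and reusing the same scaler keeps every feature on the same footing and avoids leaking test-set statistics into training.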
Exceptions
- Heterogeneous Features: If your features are of different types (e.g., one is a count and another is a percentage), consider whether scaling them differently makes sense based on domain knowledge. For example:
- A feature measured in dollars might use logarithmic scaling to reduce skewness.
- A binary feature might not need scaling at all.
- Sparse Data: If some features are sparse (e.g., encoded categorical variables), scaling might affect sparsity and could require a tailored approach, such as preserving the 0 values while scaling non-zero values (see the sketch after this list).
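As a hedged illustration of these exceptions (the column names below are hypothetical, not from the course): a skewed monetary feature can be log-transformed, a binary feature left untouched, and a sparse feature scaled with something like scikit-learn's MaxAbsScaler, which keeps zero entries at zero.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical features: a skewed dollar amount, a binary flag, and a
# sparse count column.
dollars = np.array([120.0, 5400.0, 89.0, 250000.0])
is_member = np.array([0, 1, 1, 0])  # binary: no scaling needed
counts = sparse.csr_matrix(np.array([[0.0], [3.0], [0.0], [12.0]]))

# Log-transform the skewed monetary feature to reduce skewness.
dollars_scaled = np.log1p(dollars)

# MaxAbsScaler divides by the maximum absolute value, so zeros stay zero
# and the sparsity pattern of the matrix is preserved.
counts_scaled = MaxAbsScaler().fit_transform(counts)
```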