Summaries for machine learning course (4)
Checking gradient descent for convergence:
There are two ways to check whether gradient descent is working correctly:
1. A graph of the learning curve: If gradient descent is working correctly, the cost J should decrease after every iteration. If J increases, it usually means the learning rate is too high or there is a bug in the code. For example, if the cost curve starts to level off by around 300 iterations and flattens out by 400, gradient descent has mostly converged, since the cost is no longer decreasing significantly. (See the sketch after this list for how to plot the curve and run a convergence test.)
For a different application, gradient descent might require 1,000 or even 100,000 iterations to converge. Predicting the exact number of iterations needed in advance is often challenging.
2. Automatic convergence test: declare convergence when the cost J decreases by less than some small threshold ε in a single iteration.
Disadvantage:
Choosing an appropriate threshold (ε) is not straightforward. If ε is too small, the algorithm may run unnecessarily long, wasting computational resources. If it is too large, the algorithm might stop prematurely before reaching a good solution.
#iterations on the learning-curve plot refers to the number of gradient descent iterations; each iteration updates all parameters (w and b) simultaneously.
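A minimal sketch of both checks, using hypothetical data and hyperparameters (the `alpha`, `num_iters`, and `epsilon` values are illustrative, not recommendations): gradient descent for linear regression that records the cost J at every iteration so the learning curve can be plotted, plus an automatic convergence test that stops when J decreases by less than ε.

```python
import numpy as np
import matplotlib.pyplot as plt

def compute_cost(X, y, w, b):
    """Mean squared error cost J(w, b) = (1 / 2m) * sum((X·w + b - y)^2)."""
    m = X.shape[0]
    errors = X @ w + b - y
    return (errors @ errors) / (2 * m)

def gradient_descent(X, y, alpha=0.1, num_iters=1000, epsilon=1e-6):
    """Batch gradient descent that records J every iteration and stops early
    when J decreases by less than epsilon (the automatic convergence test)."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    cost_history = []
    for i in range(num_iters):
        errors = X @ w + b - y                # shape (m,)
        w -= alpha * (X.T @ errors) / m       # simultaneous update of all weights
        b -= alpha * errors.sum() / m
        cost_history.append(compute_cost(X, y, w, b))
        if i > 0 and cost_history[-2] - cost_history[-1] < epsilon:
            print(f"Converged after {i + 1} iterations")
            break
    return w, b, cost_history

# Hypothetical training data; in practice X and y come from your dataset.
X = np.random.rand(100, 2)
y = X @ np.array([3.0, -2.0]) + 1.0

w, b, costs = gradient_descent(X, y)

# Learning curve: J should go down on every iteration if alpha is well chosen.
plt.plot(costs)
plt.xlabel("# iterations")
plt.ylabel("cost J")
plt.show()
```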
How to choose the learning rate?
If the cost J keeps fluctuating or increases during iterations, there are two possible reasons: either there is a bug in the code, or the learning rate is too high. To check for a bug, try setting a much smaller learning rate and observe whether J still increases; with a sufficiently small learning rate, J should decrease on every iteration, so if it still goes up, the cause is likely a bug. Such a tiny learning rate is only for debugging, since it would make training too slow in practice.
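One common way to pick a learning rate is to try a small range of candidate values and compare the resulting cost curves. A minimal sketch on hypothetical data (the candidate rates below are just typical values, roughly 3x apart):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data; the learning rates below are illustrative candidates,
# not a recommendation for any particular dataset.
X = np.random.rand(100, 2)
y = X @ np.array([3.0, -2.0]) + 1.0
m, n = X.shape

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1]:
    w, b = np.zeros(n), 0.0
    costs = []
    for _ in range(100):                      # a small, fixed number of iterations
        errors = X @ w + b - y
        w -= alpha * (X.T @ errors) / m       # simultaneous parameter update
        b -= alpha * errors.sum() / m
        costs.append(((X @ w + b - y) ** 2).sum() / (2 * m))
    plt.plot(costs, label=f"alpha = {alpha}")

plt.xlabel("# iterations")
plt.ylabel("cost J")
plt.legend()
plt.show()
```

The largest learning rate whose curve still decreases smoothly on every iteration is usually a good choice.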
Apply scaling before generating polynomial features:
Scale the original features (e.g., using StandardScaler or MinMaxScaler) before generating the polynomial terms.
This ensures that both the original features and their polynomial terms are on a similar scale.
Key Points to Remember
- Always scale the input features to prevent the polynomial terms from growing disproportionately.
- If you're using scikit-learn, consider using pipelines to streamline preprocessing and ensure consistent scaling (see the sketch below).
- Use standardization (z-score scaling) or min-max scaling, depending on your model and dataset.
Proper feature scaling ensures numerical stability, faster convergence, and better model performance in polynomial regression tasks.
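A minimal pipeline sketch on hypothetical data, following the advice above of scaling the raw feature first and only then generating polynomial terms (the data, degree, and step names are illustrative):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical data: one raw feature on a large scale and a quadratic target.
X = np.random.rand(200, 1) * 100
y = 0.5 * X[:, 0] ** 2 + 3 * X[:, 0] + np.random.randn(200) * 10

model = Pipeline([
    ("scaler", StandardScaler()),                              # z-score scale the original feature
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),  # then generate polynomial terms
    ("regressor", LinearRegression()),
])

model.fit(X, y)
print("R^2 on training data:", model.score(X, y))
```

Because every prediction goes through the same pipeline, the scaling applied at training time is applied identically to new data, which is the consistency the key points above call for.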