Summaries for machine learning course (4)

Checking gradient descent for convergence:

There are two ways to check whether gradient descent is working correctly:

1. Plot the learning curve: graph the cost J against the number of iterations. If gradient descent is working correctly, J should decrease after every single iteration. If J ever increases, it usually means the learning rate is too high or there's a bug in the code. In the example learning curve, the cost starts to level off by around 300 iterations and flattens out by 400; at that point gradient descent has mostly converged, since the cost is no longer decreasing significantly.

For a different application, gradient descent might require 1,000 or even 100,000 iterations to converge. Predicting the exact number of iterations needed in advance is often challenging.
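As a rough illustration, here is a minimal Python sketch of plotting such a learning curve, using made-up toy data and a hand-rolled gradient descent for simple linear regression (the data and hyperparameters are arbitrary choices for this example, not from the course):

import numpy as np
import matplotlib.pyplot as plt

# Toy data: y is roughly 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 50)
y = 2 * x + 1 + rng.normal(0, 0.1, 50)

w, b, alpha = 0.0, 0.0, 0.1
costs = []
for _ in range(400):
    err = w * x + b - y                                            # prediction errors
    w, b = w - alpha * np.mean(err * x), b - alpha * np.mean(err)  # simultaneous update of w and b
    costs.append(np.mean((w * x + b - y) ** 2) / 2)                # squared-error cost J after the update

plt.plot(costs)
plt.xlabel("iteration")
plt.ylabel("cost J")
plt.show()

If the learning rate were set too high, this curve would oscillate or climb instead of decreasing smoothly.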



2. Automatic convergence test: let ε (epsilon) be a small number, such as 0.001. If the cost J decreases by less than ε in a single iteration, declare convergence, since the parameters are likely near the bottom of the cost curve.

Disadvantage:

Choosing an appropriate threshold ε is not straightforward. If ε is too small, the algorithm may run unnecessarily long, wasting computational resources. If it's too large, the algorithm might stop prematurely before reaching a good solution.
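Here is a minimal sketch of how such an ε-based stopping rule might look inside the same kind of hand-rolled gradient descent loop (the toy data and the threshold value are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 50)
y = 2 * x + 1 + rng.normal(0, 0.1, 50)

w, b, alpha = 0.0, 0.0, 0.1
epsilon = 1e-3                  # convergence threshold; picking this is the hard part
prev_cost = float("inf")
for i in range(100_000):
    err = w * x + b - y
    w, b = w - alpha * np.mean(err * x), b - alpha * np.mean(err)
    cost = np.mean((w * x + b - y) ** 2) / 2
    if prev_cost - cost < epsilon:        # J decreased by less than epsilon
        print(f"stopped after {i + 1} iterations, cost {cost:.4f}")
        break
    prev_cost = cost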





Note: #iterations on the horizontal axis of the learning curve counts gradient descent updates; each iteration updates all parameters (w and b) simultaneously.


How to choose the learning rate?

If the cost J keeps fluctuating or increases during iterations, there are two possible reasons: either there is a bug in the code, or the learning rate is too high. To check for a bug, set the learning rate to a very small value and observe whether J still increases; with a sufficiently small learning rate, J should decrease on every iteration, so if it doesn't, the problem is a bug rather than the learning rate.




When tuning, try a range of values roughly 3x apart, such as 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, running a handful of iterations on a small subset of the data for each, then pick the largest learning rate for which J still decreases steadily (or something slightly smaller), as in the sketch below.
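A minimal sketch of such a sweep, again with made-up toy data (each run uses only a few dozen iterations, just enough to see the trend):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 50)
y = 2 * x + 1 + rng.normal(0, 0.1, 50)

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    w, b, costs = 0.0, 0.0, []
    for _ in range(50):
        err = w * x + b - y
        w, b = w - alpha * np.mean(err * x), b - alpha * np.mean(err)
        costs.append(np.mean((w * x + b - y) ** 2) / 2)
    plt.plot(costs, label=f"alpha={alpha}")

plt.yscale("log")               # log scale keeps any diverging runs from hiding the rest
plt.xlabel("iteration")
plt.ylabel("cost J")
plt.legend()
plt.show()

On this toy problem, the smallest rates decrease J painfully slowly, while the largest value is likely to make J oscillate or blow up; a value in between wins.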







Feature engineering: using intuition or domain knowledge to design new features by transforming or combining the original features. For example, given a lot's frontage and depth as separate features, their product (the lot area) is often a more predictive feature than either one alone.
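A hedged sketch of that example, with invented housing numbers:

import numpy as np

# Hypothetical lot dimensions in feet (values invented for illustration)
frontage = np.array([50.0, 40.0, 80.0])
depth = np.array([100.0, 120.0, 90.0])

# Intuition: total lot area may predict price better than either dimension alone
area = frontage * depth                        # engineered feature
X = np.column_stack([frontage, depth, area])   # feature matrix with the new column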

Polynomial regression:

For polynomial regression, be especially cautious about feature scaling: if a feature x ranges from 1 to 1,000, then x² ranges up to 1,000,000 and x³ up to 1,000,000,000, so the polynomial terms end up on wildly different scales.

How to handle feature scaling?

Apply scaling before generating polynomial features:

Scale the original features (e.g., using StandardScaler or MinMaxScaler) before generating the polynomial terms.

This keeps the polynomial terms from blowing up: powers of standardized values stay in a modest range, so the original features and their polynomial terms remain comparable (see the sketch below).
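A minimal scikit-learn sketch of this order of operations (the numbers are made up):

import numpy as np
from sklearn.preprocessing import StandardScaler, PolynomialFeatures

X = np.array([[100.0], [500.0], [1000.0]])      # one raw feature with a wide range

X_scaled = StandardScaler().fit_transform(X)    # scale first...
X_poly = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X_scaled)
print(X_poly.round(2))   # ...so x, x^2, x^3 all stay within a few units of zero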



Include Polynomial Features in Scaling:
If scaling is applied after generating polynomial features, make sure all terms are included in the scaling process.

Key Points to Remember

  • Always scale the input features to prevent the polynomial terms from growing disproportionately.
  • If you're using scikit-learn, consider using pipelines to streamline preprocessing and ensure consistent scaling (see the pipeline sketch below).
  • Use standardization (z-score scaling) or min-max scaling, depending on your model and dataset.

Proper feature scaling ensures numerical stability, faster convergence, and better model performance in polynomial regression tasks.
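To make the second option concrete, here is a minimal sketch of a scikit-learn pipeline that generates the polynomial terms first and then scales all of them together before fitting (the quadratic toy data is invented for illustration):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 20, (100, 1))
y = 1 + X[:, 0] ** 2 + rng.normal(0, 1, 100)     # quadratic toy target

model = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),  # generate x, x^2
    ("scale", StandardScaler()),                                 # scale every term
    ("reg", LinearRegression()),
])
model.fit(X, y)
print(model.predict([[5.0]]))   # roughly 26, i.e. 1 + 5^2

Because the scaler sits inside the pipeline, the exact same scaling is reapplied automatically at prediction time, which is the consistency the bullet above refers to.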



FYI:

Scikit-learn is a popular open-source Python library for machine learning. It provides simple and efficient tools for data mining and analysis, making it a go-to library for building, testing, and deploying machine learning models.
NumPy is the fundamental Python package for numerical computing with arrays and matrices.
Matplotlib is a widely used Python library for plotting graphs.


