Summaries for machine learning course (4)
Checking gradient descent for convergence:
There are two ways to check whether gradient descent is working correctly:
1. A graph of the learning curve: If gradient descent is working correctly, the cost J should decrease after every iteration. If J increases, it usually means the learning rate is too high or there is a bug in the code. For example, if the cost curve starts to level off by around 300 iterations and flattens out by 400, gradient descent has mostly converged, since the cost is no longer decreasing significantly. (See the sketch after this list for how to plot the curve and run a convergence test.)
For a different application, gradient descent might require 1,000 or even 100,000 iterations to converge. Predicting the exact number of iterations needed in advance is often challenging.
2. Automatic convergence test: declare convergence when the cost J decreases by less than some small threshold ε in a single iteration.
Disadvantage:
Choosing an appropriate threshold (ε) is not straightforward. If ε is too small, the algorithm may run unnecessarily long, wasting computational resources. If it is too large, the algorithm might stop prematurely before reaching a good solution.
#iterations on the learning-curve plot refers to the number of gradient descent iterations; each iteration updates all parameters (w and b) simultaneously.
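A minimal sketch of both checks, using hypothetical data and hyperparameters (the `alpha`, `num_iters`, and `epsilon` values are illustrative, not recommendations): gradient descent for linear regression that records the cost J at every iteration so the learning curve can be plotted, plus an automatic convergence test that stops when J decreases by less than ε.

```python
import numpy as np
import matplotlib.pyplot as plt

def compute_cost(X, y, w, b):
    """Mean squared error cost J(w, b) = (1 / 2m) * sum((X·w + b - y)^2)."""
    m = X.shape[0]
    errors = X @ w + b - y
    return (errors @ errors) / (2 * m)

def gradient_descent(X, y, alpha=0.1, num_iters=1000, epsilon=1e-6):
    """Batch gradient descent that records J every iteration and stops early
    when J decreases by less than epsilon (the automatic convergence test)."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    cost_history = []
    for i in range(num_iters):
        errors = X @ w + b - y                # shape (m,)
        w -= alpha * (X.T @ errors) / m       # simultaneous update of all weights
        b -= alpha * errors.sum() / m
        cost_history.append(compute_cost(X, y, w, b))
        if i > 0 and cost_history[-2] - cost_history[-1] < epsilon:
            print(f"Converged after {i + 1} iterations")
            break
    return w, b, cost_history

# Hypothetical training data; in practice X and y come from your dataset.
X = np.random.rand(100, 2)
y = X @ np.array([3.0, -2.0]) + 1.0

w, b, costs = gradient_descent(X, y)

# Learning curve: J should go down on every iteration if alpha is well chosen.
plt.plot(costs)
plt.xlabel("# iterations")
plt.ylabel("cost J")
plt.show()
```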
How to choose the learning rate?
If the cost J keeps fluctuating or increases during iterations, there are two possible reasons: either there is a bug in the code, or the learning rate is too high. To check for a bug, try setting a much smaller learning rate and observe whether J still increases; with a sufficiently small learning rate, J should decrease on every iteration, so if it still goes up, the cause is likely a bug. Such a tiny learning rate is only for debugging, since it would make training too slow in practice.
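One common way to pick a learning rate is to try a small range of candidate values and compare the resulting cost curves. A minimal sketch on hypothetical data (the candidate rates below are just typical values, roughly 3x apart):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data; the learning rates below are illustrative candidates,
# not a recommendation for any particular dataset.
X = np.random.rand(100, 2)
y = X @ np.array([3.0, -2.0]) + 1.0
m, n = X.shape

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1]:
    w, b = np.zeros(n), 0.0
    costs = []
    for _ in range(100):                      # a small, fixed number of iterations
        errors = X @ w + b - y
        w -= alpha * (X.T @ errors) / m       # simultaneous parameter update
        b -= alpha * errors.sum() / m
        costs.append(((X @ w + b - y) ** 2).sum() / (2 * m))
    plt.plot(costs, label=f"alpha = {alpha}")

plt.xlabel("# iterations")
plt.ylabel("cost J")
plt.legend()
plt.show()
```

The largest learning rate whose curve still decreases smoothly on every iteration is usually a good choice.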
Apply scaling before generating polynomial features:
Scale the original features (e.g., using StandardScaler or MinMaxScaler) before generating the polynomial terms.
This ensures that both the original features and their polynomial terms are on a similar scale.
Key Points to Remember
- Always scale the input features to prevent the polynomial terms from growing disproportionately.
- If you're using scikit-learn, consider using pipelines to streamline preprocessing and ensure consistent scaling (see the sketch below).
- Use standardization (z-score scaling) or min-max scaling, depending on your model and dataset.
Proper feature scaling ensures numerical stability, faster convergence, and better model performance in polynomial regression tasks.
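A minimal pipeline sketch on hypothetical data, following the advice above of scaling the raw feature first and only then generating polynomial terms (the data, degree, and step names are illustrative):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical data: one raw feature on a large scale and a quadratic target.
X = np.random.rand(200, 1) * 100
y = 0.5 * X[:, 0] ** 2 + 3 * X[:, 0] + np.random.randn(200) * 10

model = Pipeline([
    ("scaler", StandardScaler()),                              # z-score scale the original feature
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),  # then generate polynomial terms
    ("regressor", LinearRegression()),
])

model.fit(X, y)
print("R^2 on training data:", model.score(X, y))
```

Because every prediction goes through the same pipeline, the scaling applied at training time is applied identically to new data, which is the consistency the key points above call for.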