Posts

Showing posts from January, 2025

Summaries for machine learning course (4)

Checking gradient descent for convergence: There are two ways to check whether gradient descent is working correctly:

1. A graph of the learning curve: If gradient descent is working correctly, the cost J should decrease with each iteration. If J increases, it usually means the learning rate $\alpha$ is too high or there's a bug in the code. By around 300 iterations the cost curve starts to level off, and by 400 iterations it flattens out; this shows that gradient descent has mostly converged, as the cost is no longer decreasing significantly. For a different application, gradient descent might require 1,000 or even 100,000 iterations to converge, and predicting the exact number of iterations needed in advance is often challenging.

2. Automatic convergence test: declare convergence if J decreases by less than some small threshold $\epsilon$ in one iteration. Disadvantage: choosing an appropriate threshold $\epsilon$ is not straightforward. If $\epsilon$ is too small, the algorithm may run unnecessarily long, wasting computational resources. If it's too large,...
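A minimal sketch of both checks, assuming the training loop records the cost after every iteration in a list (the names cost_history, plot_learning_curve, and has_converged are illustrative, not from the course):

import matplotlib.pyplot as plt

def plot_learning_curve(cost_history):
    # Check 1: plot the cost J against the iteration number.
    plt.plot(cost_history)
    plt.xlabel("iteration")
    plt.ylabel("cost J")
    plt.show()

def has_converged(cost_history, epsilon=1e-3):
    # Check 2: automatic convergence test. Declare convergence
    # if J decreased by less than epsilon in the last iteration.
    if len(cost_history) < 2:
        return False
    return cost_history[-2] - cost_history[-1] < epsilon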

Summaries for machine learning course (3)

Feature scaling: When we have different features that take on very different ranges of values, gradient descent can run slowly. By rescaling the different features so they all take on a comparable range of values, we can make gradient descent run much faster.

How? Common scaling methods (sketched in code below):
1. Min-Max Scaling (Normalization): scales features to a specific range.
2. Mean normalization: centers each feature around zero by subtracting its mean, then divides by its range.
3. Standardization: transforms features to have zero mean and unit variance.

When? Question: For feature scaling, if there are multiple features, should I use the same method for all of them? When you have multiple features in your dataset, it is generally recommended to use the same scaling method for all features to maintain consistency and comparability between them. Here's why and how to approach it: Why use the same scaling method? Uniformity: different scaling methods can transform features into ranges or distributions that may not align well, potentially...
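A minimal NumPy sketch of the three methods, assuming X is a 2-D array with one row per training example and one column per feature (the function names are illustrative):

import numpy as np

def min_max_scale(X):
    # 1. Min-Max Scaling: rescale each feature to the [0, 1] range.
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def mean_normalize(X):
    # 2. Mean normalization: center each feature around zero,
    #    divided by its range.
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def standardize(X):
    # 3. Standardization (z-score): zero mean, unit variance per feature.
    return (X - X.mean(axis=0)) / X.std(axis=0)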

Summaries for machine learning course (2)

Multiple linear regression: linear regression with multiple variables, vs. univariate regression. Vectorization using NumPy makes your code shorter and makes it run faster: f = np.dot(w, x) + b. Vectorization increases efficiency for large-scale data through parallel processing. Gradient descent for univariate linear regression vs. multivariate linear regression. Normal equation:
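A minimal sketch contrasting the unvectorized loop with the vectorized NumPy form, assuming w and x are 1-D arrays of the same length (the example values are illustrative):

import numpy as np

w = np.array([1.0, 2.5, -3.3])    # parameters
x = np.array([10.0, 20.0, 30.0])  # features
b = 4.0

# Without vectorization: an explicit loop over the features.
f = 0.0
for j in range(len(w)):
    f = f + w[j] * x[j]
f = f + b

# With vectorization: shorter, and faster on large arrays because
# NumPy computes the products and the sum with parallel hardware.
f_vec = np.dot(w, x) + b

assert np.isclose(f, f_vec)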

Summaries for machine learning course (1)

Machine learning algorithms: supervised learning (used most in real-world applications), unsupervised learning, recommender systems, reinforcement learning. (The tool vs. how to apply the tool.)

1. Supervised learning: maps X to Y, input to output/label; learns from being given "right answers". Algorithms:
1.1 Regression: predict house price vs. size.
1.1.1 Terminology: x = input variable/feature; y = output variable/target; m = number of training examples; (x, y) = a single training example. The squared-error cost is $J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\big(f_{w,b}(x^{(i)}) - y^{(i)}\big)^2$; the 1/2 is added to make the derivative look neater. Find w and b to minimize J.
1.1.2 Train the model with gradient descent: Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of the function's steepest descent, as defined by the negative of the gradient. It is widely used in machine learning and deep learning to minimize the cost or loss function and optimize the parameters of a model. J will not always be a convex bowl shape, so it may have more than one minimum...
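A minimal sketch of batch gradient descent for univariate linear regression under that cost, assuming 1-D NumPy arrays x and y (the learning rate and iteration count are illustrative choices, not from the course):

import numpy as np

def gradient_descent(x, y, alpha=0.01, num_iters=1000):
    # Fit f(x) = w * x + b by minimizing the squared-error cost J(w, b).
    m = len(x)
    w, b = 0.0, 0.0
    for _ in range(num_iters):
        f = w * x + b                          # predictions for all m examples
        dj_dw = (1 / m) * np.sum((f - y) * x)  # partial derivative dJ/dw
        dj_db = (1 / m) * np.sum(f - y)        # partial derivative dJ/db
        w = w - alpha * dj_dw                  # simultaneous parameter update
        b = b - alpha * dj_db
    return w, b

# Example: data lying on y = 2x converges toward w ≈ 2, b ≈ 0.
w, b = gradient_descent(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0]))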