Summaries for machine learning course (1)

Machine learning algorithms:

Supervised learning: used most in real-world applications 

Unsupervised learning

Recommender systems

Reinforcement learning

(The course covers both the tools themselves and how to apply them.)





1. Supervised learning:

X   to   Y

input  to  output/label

Learns from being given "right answers" (labeled examples).

Algorithms:

1.1 Regression: predict a number, e.g. house price vs. size

1.1.1 Terminology:

x = input variable / feature

y = output variable / target

m = number of training examples

(x, y) = a single training example



The model: f_wb(x) = w*x + b

The cost function (squared error):

J(w, b) = (1/(2m)) * sum over i of (f_wb(x^(i)) - y^(i))^2

The 1/2 is added to make things look neater: it cancels the 2 that appears when taking the derivative.

Find w and b to minimize J.
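The squared error cost J described above can be sketched in code. This is a minimal sketch, assuming a 1-D linear model f(x) = w*x + b; the function name and the tiny dataset are made up for illustration.

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared error cost with the 1/2 factor that makes the derivative neater."""
    m = x.shape[0]                      # number of training examples
    predictions = w * x + b            # model output f_wb(x) for every example
    squared_errors = (predictions - y) ** 2
    return squared_errors.sum() / (2 * m)

# Tiny usage example with made-up data:
x_train = np.array([1.0, 2.0])         # sizes
y_train = np.array([300.0, 500.0])     # prices
cost = compute_cost(x_train, y_train, w=200.0, b=100.0)
# With w=200, b=100 the model fits these two points exactly, so cost is 0
```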


1.1.2 Train the model with gradient descent

Gradient descent is an optimization algorithm that minimizes a function by iteratively moving in the direction of steepest descent, defined by the negative of the gradient. It is widely used in machine learning and deep learning to minimize the cost (loss) function and optimize a model's parameters.
J will not always be bowl-shaped, so it may have more than one local minimum.

Learning rate: alpha (a small positive number, between 0 and 1)
Derivative term: the partial derivative of J with respect to the parameter

w and b are updated simultaneously as follows:

w = w - alpha * dJ/dw
b = b - alpha * dJ/db

Repeat until the algorithm converges. Converging refers to reaching a local minimum where the parameters w and b no longer change much with each additional step you take.

Remember that alpha, the learning rate, is always positive.


The derivative term is equal to the slope in this example.
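The update rule can be illustrated on a simple one-parameter cost. This is a sketch using a made-up function J(w) = (w - 3)^2, whose derivative is 2*(w - 3); the function and starting point are not from the notes.

```python
def gradient_descent_1d(w_init, alpha, steps):
    """Repeatedly apply w = w - alpha * dJ/dw for J(w) = (w - 3)**2."""
    w = w_init
    for _ in range(steps):
        dj_dw = 2 * (w - 3)      # derivative (slope) of J at the current w
        w = w - alpha * dj_dw    # step in the opposite direction of the slope
    return w

w_final = gradient_descent_1d(w_init=0.0, alpha=0.1, steps=100)
# w_final ends up very close to 3, the minimizer of J
```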



1.1.3 How do you choose a learning rate?

If the learning rate is too small, gradient descent may be slow.
If the learning rate is too large, gradient descent may:
                 -- overshoot and never reach the minimum
                 -- fail to converge, or even diverge





Gradient descent can reach a local minimum even with a fixed learning rate, because as it gets near a local minimum:
   --- the derivative (the slope in the example) becomes smaller
   --- the update steps become smaller
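Both behaviors can be seen on a toy problem. This sketch (made-up cost J(w) = w^2, derivative 2*w, not from the notes) runs the same fixed-learning-rate descent with a small and a too-large alpha:

```python
def run_gd(alpha, steps, w_init=1.0):
    """Gradient descent on J(w) = w**2 with a fixed learning rate."""
    w = w_init
    for _ in range(steps):
        w = w - alpha * 2 * w   # steps shrink automatically as w nears 0
    return w

small = run_gd(alpha=0.1, steps=50)   # converges toward the minimum at 0
large = run_gd(alpha=1.5, steps=50)   # each step overshoots: w grows, diverges
```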


1.1.4 Example: Gradient descent for linear regression

For the linear regression model f_wb(x) = w*x + b with the squared error cost, the derivative terms work out to:

dJ/dw = (1/m) * sum over i of (f_wb(x^(i)) - y^(i)) * x^(i)
dJ/db = (1/m) * sum over i of (f_wb(x^(i)) - y^(i))

If using the squared error cost function, there is a single global minimum. This is due to its bowl shape; such a function is also called a convex function.



Some other cost functions may have more than one local minimum.
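Putting the pieces together, here is a minimal sketch of gradient descent for the linear regression model f(x) = w*x + b with squared error cost; the dataset and hyperparameters are made up for illustration.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, steps=10000):
    """Fit w, b by repeated simultaneous gradient descent updates."""
    m = x.shape[0]
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = (w * x + b) - y            # f_wb(x^(i)) - y^(i) for every i
        dj_dw = (err * x).sum() / m      # (1/m) * sum(err * x)
        dj_db = err.sum() / m            # (1/m) * sum(err)
        w -= alpha * dj_dw               # update both parameters
        b -= alpha * dj_db               # using the same old w, b
    return w, b

x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([2.0, 4.0, 6.0])      # exactly y = 2x, so expect w≈2, b≈0
w, b = gradient_descent(x_train, y_train)
```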

1.1.5 Example: Batch gradient descent for the linear regression model. "Batch" means every step uses all the training examples; other variants of gradient descent use subsets of the data at each step.
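The "subsets" idea can be sketched as mini-batch gradient descent: each step is taken on a small random subset instead of all m examples. The batch size, data, and function names here are illustrative assumptions, not from the notes.

```python
import numpy as np

def minibatch_step(x, y, w, b, alpha, batch_size, rng):
    """One gradient descent step computed on a random subset of the data."""
    idx = rng.choice(x.shape[0], size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]              # the subset used for this step
    err = (w * xb + b) - yb
    w = w - alpha * (err * xb).sum() / batch_size
    b = b - alpha * err.sum() / batch_size
    return w, b

rng = np.random.default_rng(0)
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = 2.0 * x_train                  # exactly y = 2x
w, b = 0.0, 0.0
for _ in range(5000):
    w, b = minibatch_step(x_train, y_train, w, b, 0.05, 2, rng)
# w, b end up close to 2 and 0, the same answer batch gradient descent finds
```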









1.2 Classification: predict categories, e.g. breast cancer detection: benign vs. malignant



2. Unsupervised learning: find something interesting in unlabeled data. The data comes only with inputs x, not output labels y; the algorithm has to find structure in the data on its own.

vs. supervised learning, which learns from data labeled with the right answers.

Algorithms:

2.1 Clustering: group similar data points together, e.g. Google News, DNA microarrays

2.2 Anomaly detection: find unusual data points, e.g. fraud detection

2.3 Dimensionality reduction: compress data using fewer numbers
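As one concrete clustering example, here is a minimal k-means sketch on 1-D points. This algorithm, the centroid initialization shortcut, and the data are all illustrative assumptions, not taken from the notes.

```python
import numpy as np

def kmeans_1d(points, k, iters=20):
    """Group unlabeled 1-D points into k clusters by alternating two steps."""
    # initialize centroids to the first k points (a simplification)
    centroids = np.array(points[:k], dtype=float)
    for _ in range(iters):
        # step 1: assign each point to its nearest centroid
        labels = np.argmin(np.abs(points[:, None] - centroids[None, :]), axis=1)
        # step 2: move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean()
    return centroids, labels

points = np.array([1.0, 1.2, 0.8, 10.0, 10.5, 9.5])
centroids, labels = kmeans_1d(points, k=2)
# the two centroids settle near 1.0 and 10.0, the two natural groups
```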


Reference:

Code/class notes for reference: https://github.com/greyhatguy007/Machine-Learning-Specialization-Coursera/tree/main/C1%20-%20Supervised%20Machine%20Learning%20-%20Regression%20and%20Classification