Classification with logistic regression
Logistic regression: the model passes a linear function through the sigmoid, $f_{w,b}(x) = g(w \cdot x + b) = \frac{1}{1 + e^{-(w \cdot x + b)}}$, and predicts $y = 1$ when this estimated probability is at least 0.5.
For a non-linear (e.g. circular) decision boundary, points inside the boundary are classified as $y = 1$ and points outside as $y = 0$.
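A minimal sketch of the model in NumPy (the names sigmoid, predict_proba, and predict, and the example numbers, are illustrative rather than from these notes):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)) maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    # f_{w,b}(x) = g(w . x + b): the model's estimate of P(y = 1 | x)
    return sigmoid(X @ w + b)

def predict(X, w, b, threshold=0.5):
    # classify as y = 1 when the estimated probability reaches the threshold
    return (predict_proba(X, w, b) >= threshold).astype(int)

# tiny example with made-up parameters
X = np.array([[0.5, 1.0], [2.0, 2.5]])
w = np.array([1.0, -1.5])
b = 0.2
print(predict_proba(X, w, b), predict(X, w, b))
```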
Cost function for logistic regression
The squared-error cost function used for linear regression is convex.
However, if you apply that same squared-error cost to logistic regression (with the sigmoid inside it), the resulting cost function is not convex; a small numerical check below illustrates this.
Definition of Convexity
A function $f$ is convex if for any two points $x_1$ and $x_2$ in its domain and for any $\lambda \in [0, 1]$, the following inequality holds:
$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$
This means that the function does not have multiple local minima—there is only one global minimum, making optimization easier.
Convex vs. Non-Convex Functions
- Convex Function: Has a bowl-like shape, ensuring gradient-based optimization methods (e.g., gradient descent) reliably converge to the global minimum.
- Non-Convex Function: Can have multiple local minima, making optimization more difficult since gradient descent might get stuck in a local minimum.
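To make the non-convexity claim concrete, here is a hedged numerical check on a hypothetical one-example dataset ($x = 1$, $y = 0$): the convexity inequality fails for the squared-error cost applied to the sigmoid at one particular triple of points, while the log loss introduced below satisfies it at the same triple. One violated triple is enough to show non-convexity; a satisfied triple does not prove convexity, it merely fails to contradict it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical one-example "dataset": x = 1, y = 0
def squared_error_cost(w):
    # squared error applied to the sigmoid output
    return (sigmoid(w * 1.0) - 0.0) ** 2

def log_loss_cost(w):
    # logistic (log) loss for the same example
    return -np.log(1.0 - sigmoid(w * 1.0))

# convexity requires f(0.5*w1 + 0.5*w2) <= 0.5*f(w1) + 0.5*f(w2)
w1, w2 = 0.0, 10.0
mid = 0.5 * (w1 + w2)
for name, f in [("squared error", squared_error_cost), ("log loss", log_loss_cost)]:
    lhs = f(mid)
    rhs = 0.5 * f(w1) + 0.5 * f(w2)
    print(f"{name}: f(mid)={lhs:.4f}, chord={rhs:.4f}, convexity holds here: {lhs <= rhs}")
```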
Cost function for logistic regression:
$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L\big(f_{w,b}(x^{(i)}), y^{(i)}\big)$
where the loss on a single training example is
$L\big(f_{w,b}(x^{(i)}), y^{(i)}\big) = -y^{(i)} \log\big(f_{w,b}(x^{(i)})\big) - \big(1 - y^{(i)}\big) \log\big(1 - f_{w,b}(x^{(i)})\big)$
Maximum Likelihood Estimation (MLE) for the Cost Function of Logistic Regression
1. Understanding the Likelihood Function
Logistic regression is used for binary classification, where the target variable $y$ takes values 0 or 1. The model predicts the probability of $y = 1$ given input features using the sigmoid function:
$h_\theta(x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$
where:
- $h_\theta(x)$ is the probability of $y = 1$ given $x$,
- $\theta$ is the vector of parameters,
- $x$ is the feature vector.
Given a dataset of $m$ independent observations $(x^{(i)}, y^{(i)})$, $i = 1, \dots, m$, we assume the outputs $y^{(i)}$ are Bernoulli distributed:
$P(y^{(i)} \mid x^{(i)}; \theta) = h_\theta(x^{(i)})^{\,y^{(i)}} \big(1 - h_\theta(x^{(i)})\big)^{\,1 - y^{(i)}}$
2. Constructing the Likelihood Function
The likelihood function is the product of these individual probabilities over the whole training set:
$L(\theta) = \prod_{i=1}^{m} P(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{\,y^{(i)}} \big(1 - h_\theta(x^{(i)})\big)^{\,1 - y^{(i)}}$
3. Taking the Log-Likelihood
Since likelihood calculations involve products, it's more convenient to work with the log-likelihood function:
$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$
This is the function that logistic regression aims to maximize to find the best parameters θ. However, since optimization is typically done by minimization, we use the negative log-likelihood as the cost function.
4. Cost Function for Logistic Regression
By negating (and averaging) the log-likelihood, we obtain the log loss function, which is the cost function for logistic regression:
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$
This function is:
- Convex, ensuring a single global minimum.
- Differentiable, allowing optimization with gradient descent.
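A minimal NumPy sketch of this cost function (the clipping constant eps and the tiny dataset are implementation conveniences added here, not part of the derivation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss_cost(theta, X, y, eps=1e-12):
    # h_theta(x) = sigma(theta^T x) for every row of X
    h = sigmoid(X @ theta)
    h = np.clip(h, eps, 1.0 - eps)  # keep log() finite
    # negative average log-likelihood (log loss)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

# tiny made-up dataset: 4 examples, 2 features
X = np.array([[0.5, 1.2], [1.0, -0.3], [-1.5, 0.8], [2.0, 2.0]])
y = np.array([1, 0, 0, 1])
print(log_loss_cost(np.zeros(2), X, y))  # with theta = 0, cost is log(2) ~ 0.693
```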
Gradient descent for logistic regression
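A hedged sketch of batch gradient descent for the log loss above (the learning rate, iteration count, and toy data are illustrative choices). The partial derivatives are $\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_i \big(f_{w,b}(x^{(i)}) - y^{(i)}\big)x_j^{(i)}$ and $\frac{\partial J}{\partial b} = \frac{1}{m}\sum_i \big(f_{w,b}(x^{(i)}) - y^{(i)}\big)$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_logistic(X, y, alpha=0.1, iterations=1000):
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iterations):
        error = sigmoid(X @ w + b) - y        # f_{w,b}(x^(i)) - y^(i) for every example
        grad_w = (X.T @ error) / m            # dJ/dw_j = (1/m) * sum_i error_i * x_j^(i)
        grad_b = np.mean(error)               # dJ/db   = (1/m) * sum_i error_i
        w -= alpha * grad_w                   # simultaneous update of all parameters
        b -= alpha * grad_b
    return w, b

# made-up, roughly separable data just to exercise the routine
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w, b = gradient_descent_logistic(X, y)
print(w, b, sigmoid(X @ w + b).round(2))
```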
The problem of overfitting
Regularization to reduce overfitting
What does overfitting refer to?
1. The model does not fit the training set well: underfitting / high bias.
2. The model fits the training set pretty well: the "just right" case.
3. The model fits the training set extremely well: overfitting / high variance. If the training set were just a little bit different, the fitted function could end up looking totally different (see the sketch after this list).
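As a rough illustration of point 3 (the data, polynomial degrees, and random seed are all hypothetical), a very high-degree polynomial can reproduce a small noisy training set almost exactly, yet its coefficients swing wildly when the targets are perturbed slightly, whereas a degree-1 fit barely moves:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical data: a roughly linear trend plus noise
x = np.linspace(0, 1, 10)
y = 2.0 * x + rng.normal(scale=0.1, size=x.size)

# fit a low-degree and a high-degree polynomial to the same points
# (NumPy may warn that the degree-9 fit is poorly conditioned; that instability is the point)
low = np.polyfit(x, y, deg=1)
high = np.polyfit(x, y, deg=9)   # degree 9 can pass through all 10 points

# perturb the targets slightly and refit
y2 = y + rng.normal(scale=0.05, size=y.size)
low2 = np.polyfit(x, y2, deg=1)
high2 = np.polyfit(x, y2, deg=9)

# the low-degree coefficients barely move; the high-degree ones change far more
print("degree-1 coefficient change:", np.max(np.abs(low - low2)))
print("degree-9 coefficient change:", np.max(np.abs(high - high2)))
```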
Addressing overfitting
Option 1: collect more training data.
Option 3: regularization, which the rest of these notes focus on.
What regularization does is encourage the learning algorithm to shrink the values of the parameters, without necessarily demanding that any parameter be set to exactly 0.
Regularization is applied to the weights $w_j$; there is no need to regularize the bias term $b$.
Cost function with regularization (shown here for linear regression; the same penalty term is added to the logistic cost):
$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \big(f_{w,b}(x^{(i)}) - y^{(i)}\big)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$
If $\lambda = 0$, the regularization term vanishes, and minimizing the cost function can lead to overfitting.
If $\lambda$ is huge (e.g. $10^{10}$), minimizing the cost function forces every $w_j$ to be very close to 0, and the model underfits.
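A small NumPy sketch of this regularized cost (the dataset and parameter values are made up) showing the two extremes of $\lambda$:

```python
import numpy as np

def regularized_cost(w, b, X, y, lam):
    m = X.shape[0]
    squared_error = np.sum((X @ w + b - y) ** 2) / (2 * m)
    penalty = lam * np.sum(w ** 2) / (2 * m)   # note: b is not penalized
    return squared_error + penalty

# made-up data and parameters that fit the data perfectly
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])
w, b = np.array([1.0]), 0.0

print(regularized_cost(w, b, X, y, lam=0.0))     # no penalty: only the fit matters
print(regularized_cost(w, b, X, y, lam=1e10))    # huge penalty: any nonzero w_j is punished
```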
Regularized linear regression:
With regularization, gradient descent shrinks $w_j$ a little on every iteration compared with the usual update:
$w_j := w_j\Big(1 - \alpha\frac{\lambda}{m}\Big) - \alpha \frac{1}{m}\sum_{i=1}^{m}\big(f_{w,b}(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}$
The update for $b$ is unchanged, since $b$ is not regularized.
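A hedged sketch of one regularized gradient descent step for linear regression (alpha, lambda, and the data are illustrative); the comment shows the equivalent shrink-then-update form:

```python
import numpy as np

def regularized_linear_update(w, b, X, y, alpha, lam):
    m = X.shape[0]
    error = X @ w + b - y
    grad_w = (X.T @ error) / m + (lam / m) * w   # regularization adds (lambda/m) * w_j
    grad_b = np.mean(error)                      # b is not regularized
    # equivalently: w * (1 - alpha*lam/m) - alpha * (X.T @ error) / m
    return w - alpha * grad_w, b - alpha * grad_b

# one illustrative step on made-up data
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w, b = np.array([5.0]), 0.0
print(regularized_linear_update(w, b, X, y, alpha=0.01, lam=1.0))
```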
Regularized logistic regression:
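Regularized logistic regression adds the same $\frac{\lambda}{2m}\sum_j w_j^2$ penalty to the log loss, so the gradient step only differs by the extra $\frac{\lambda}{m}w_j$ term. A minimal sketch under those assumptions (toy data and hyperparameters are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_step(w, b, X, y, alpha, lam):
    m = X.shape[0]
    error = sigmoid(X @ w + b) - y
    grad_w = (X.T @ error) / m + (lam / m) * w   # same extra (lambda/m) * w_j term
    grad_b = np.mean(error)                      # b is still not regularized
    return w - alpha * grad_w, b - alpha * grad_b

# a few illustrative steps on made-up data
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w, b = np.zeros(1), 0.0
for _ in range(100):
    w, b = regularized_logistic_step(w, b, X, y, alpha=0.1, lam=0.1)
print(w, b)
```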