REF: http://www.holehouse.org/mlclass/06_Logistic_Regression.html
Because it took me a while to finally derive this, I decided to write the details down here. Since I cannot typeset the equations nicely here, I simplify the notation.
Note 1: Gradient of the logistic cost function
Here the cost function is denoted $F(t)$, where $t$ stands for $\theta$ and $m$ is the number of samples. $(x^{(i)}, y^{(i)})$, $i = 1, \dots, m$, is the training set, and $h_t(x) = 1/(1 + \exp(-t^T x))$ is the logistic function with parameter $t$.
$$F(t) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_t(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_t(x^{(i)})\big) \Big]$$
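To make the definitions concrete, here is a minimal NumPy sketch (my own illustration, not from the course notes); the rows of X are the samples $x^{(i)}$ and y is the 0/1 label vector:

```python
import numpy as np

def h(t, X):
    # Logistic function h_t(x) = 1 / (1 + exp(-t^T x)), applied to every row of X.
    return 1.0 / (1.0 + np.exp(-X @ t))

def cost(t, X, y):
    # F(t) = -(1/m) * sum_i [ y_i * log h_t(x_i) + (1 - y_i) * log(1 - h_t(x_i)) ]
    m = len(y)
    p = h(t, X)
    return -(y @ np.log(p) + (1 - y) @ np.log(1 - p)) / m
```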
The partial derivative w.r.t. $t_j$ is denoted $\partial/\partial t_j = \partial_j$. Then $\partial_j F(t)$ is derived as follows:

$$\partial_j F(t) = -\frac{1}{m} \sum_{i=1}^{m} \left[ \frac{y^{(i)}}{h_t(x^{(i)})}\,\partial_j h_t(x^{(i)}) + \frac{1 - y^{(i)}}{1 - h_t(x^{(i)})}\,\partial_j \big(1 - h_t(x^{(i)})\big) \right]$$

$$= -\frac{1}{m} \sum_{i=1}^{m} \left[ \frac{y^{(i)}}{h_t(x^{(i)})}\,\partial_j h_t(x^{(i)}) - \frac{1 - y^{(i)}}{1 - h_t(x^{(i)})}\,\partial_j h_t(x^{(i)}) \right]$$

(the constant 1 contributes nothing to the derivative, so $\partial_j(1 - h_t) = -\partial_j h_t$)

$$= -\frac{1}{m} \sum_{i=1}^{m} \partial_j h_t(x^{(i)}) \left[ \frac{y^{(i)}}{h_t(x^{(i)})} - \frac{1 - y^{(i)}}{1 - h_t(x^{(i)})} \right]$$

(factor $\partial_j h_t(x^{(i)})$ out of both terms)

$$= -\frac{1}{m} \sum_{i=1}^{m} \partial_j h_t(x^{(i)})\,\frac{y^{(i)}\big(1 - h_t(x^{(i)})\big) - h_t(x^{(i)})\big(1 - y^{(i)}\big)}{h_t(x^{(i)})\big(1 - h_t(x^{(i)})\big)}$$

$$= -\frac{1}{m} \sum_{i=1}^{m} \partial_j h_t(x^{(i)})\,\frac{y^{(i)} - h_t(x^{(i)})}{h_t(x^{(i)})\big(1 - h_t(x^{(i)})\big)}$$
where
$$\partial_j h_t(x^{(i)}) = \partial_j \big(1 + \exp(-t^T x^{(i)})\big)^{-1}$$

$$= \big(1 + \exp(-t^T x^{(i)})\big)^{-2} \exp(-t^T x^{(i)})\,x^{(i)}_j$$

(the two minus signs produced by the chain rule cancel)

$$= \frac{1}{1 + \exp(-t^T x^{(i)})} \cdot \frac{\exp(-t^T x^{(i)})}{1 + \exp(-t^T x^{(i)})}\,x^{(i)}_j = h_t(x^{(i)})\big(1 - h_t(x^{(i)})\big)\,x^{(i)}_j$$
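This identity is easy to check numerically; a one-off sketch reusing NumPy from the snippet above, with a made-up point $z = 0.7$:

```python
# Check sigma'(z) = sigma(z) * (1 - sigma(z)) for the scalar logistic function.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
z, eps = 0.7, 1e-6
numeric_dz = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.allclose(numeric_dz, sigmoid(z) * (1 - sigmoid(z))))  # expect True
```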
Thus

$$\partial_j F(t) = -\frac{1}{m} \sum_{i=1}^{m} \partial_j h_t(x^{(i)})\,\frac{y^{(i)} - h_t(x^{(i)})}{h_t(x^{(i)})\big(1 - h_t(x^{(i)})\big)}$$

$$= -\frac{1}{m} \sum_{i=1}^{m} \big(y^{(i)} - h_t(x^{(i)})\big)\,\frac{h_t(x^{(i)})\big(1 - h_t(x^{(i)})\big)\,x^{(i)}_j}{h_t(x^{(i)})\big(1 - h_t(x^{(i)})\big)}$$

$$= \frac{1}{m} \sum_{i=1}^{m} \big(h_t(x^{(i)}) - y^{(i)}\big)\,x^{(i)}_j$$
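As a sanity check on this result, the following sketch (continuing the one above, with made-up data) compares the analytic gradient against a central finite difference of the cost:

```python
def grad(t, X, y):
    # d_j F(t) = (1/m) * sum_i (h_t(x_i) - y_i) * x_ij, computed for all j at once.
    return X.T @ (h(t, X) - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=5).astype(float)
t = rng.normal(size=3)

eps = 1e-6
numeric = np.array([(cost(t + eps * e, X, y) - cost(t - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(numeric, grad(t, X, y)))  # expect True
```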
Note 2: How is logistic regression related to MLE?

The logistic regression cost can be interpreted through the likelihood of a Bernoulli random variable. Here $P[x; t] = h_t(x)$ is the probability that $x$ belongs to class 1, so the probability of observing label $y$ for a sample $x$ is

$$f(x) = P[x; t]^{\,y}\,\big(1 - P[x; t]\big)^{1 - y}$$
and the log-likelihood of $t$ is

$$L(t) = \log f(x) = y \log P[x; t] + (1 - y) \log\big(1 - P[x; t]\big) = y \log h_t(x) + (1 - y) \log\big(1 - h_t(x)\big)$$
Since $F(t)$ is exactly the negative average of $L(t)$ over the training set, minimizing the cost function $F$ is equivalent to finding the maximum likelihood estimator.
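Continuing the numerical sketch above, one can check that $F(t)$ is exactly the negative mean of the per-sample Bernoulli log-likelihoods:

```python
# Per-sample Bernoulli log-likelihood log f(x_i) with p_i = h_t(x_i).
p = h(t, X)
log_lik = y * np.log(p) + (1 - y) * np.log(1 - p)
print(np.allclose(cost(t, X, y), -log_lik.mean()))  # expect True
```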
