<p>If the above condition is not met for a given vector \( \boldsymbol{x}_i \) we have </p>
<p> <br>
$$
y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) < 0.
$$
<p> <br>
<p>If a separating hyperplane exists, we can use it to construct a
natural classifier: a test observation is assigned a given class
depending on which side of the hyperplane it is located.
</p>
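<p>As a concrete sketch of this classification rule (a minimal Python
illustration with invented weights and test points, not code from these
notes):</p>

<pre><code>
import numpy as np

# Hypothetical hyperplane parameters; for illustration only.
w = np.array([1.0, -2.0])
b = 0.5

def classify(x):
    # Assign class +1 or -1 according to which side of the
    # hyperplane w^T x + b = 0 the observation x falls on.
    return 1 if w @ x + b >= 0 else -1

for x in [np.array([2.0, 0.0]), np.array([0.0, 2.0])]:
    print(x, classify(x))
</code></pre>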
</section>
<section>
<h2 id="the-two-dimensional-case">The two-dimensional case </h2>

<p>We should then choose the hyperplane so that the distance from it to
the nearest data point on each side is maximized, since this gives
some reinforcement so that future data points can be classified with
more confidence.
</p>
</section>

<section>
<h2 id="linear-classifier">Linear classifier </h2>
<p>What a linear classifier attempts to accomplish is to split the
feature space into two half-spaces by placing a hyperplane between the
two classes.
</p>
</section>

<section>
<h2 id="first-attempt-at-a-minimization-approach">First attempt at a minimization approach </h2>
<p>We could now for example define all values \( y_i =1 \) as misclassified
in case we have \( \boldsymbol{w}^T\boldsymbol{x}_i+b < 0 \) and the opposite
if we have \( y_i=-1 \). Taking the derivatives with respect to \( \boldsymbol{w} \)
and \( b \) of the corresponding cost function, summed over the set
\( \mathcal{M} \) of misclassified points, gives us
</p>
<p> <br>
$$
-\sum_{i\in\mathcal{M}}y_i\boldsymbol{x}_i \hspace{0.5cm}\mathrm{and}\hspace{0.5cm} -\sum_{i\in\mathcal{M}}y_i.
$$
<p> <br>
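<p>A minimal sketch of how such a gradient-based update could be coded
(a simplified perceptron-like step in Python; the data and learning
rate are invented for illustration):</p>

<pre><code>
import numpy as np

# Toy linearly separable data with labels y_i in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)
b = 0.0
eta = 0.1  # learning rate, an assumed value

for epoch in range(100):
    for xi, yi in zip(X, y):
        # A point is misclassified when y_i (w^T x_i + b) <= 0;
        # the update follows the negative gradient for that point.
        if yi * (w @ xi + b) <= 0:
            w += eta * yi * xi
            b += eta * yi

print("w =", w, "b =", b)
</code></pre>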
</section>

<section>
<h2 id="problems-with-the-simpler-approach">Problems with the simpler approach </h2>
<p>There are however problems with this approach, although it looks
pretty straightforward to implement. When running such a code, we see
that we can easily end up with many different lines which separate the
two classes.
</p>
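<p>A quick numerical check illustrates this non-uniqueness; in the
sketch below (with invented data) two clearly different parameter sets
both separate the same points perfectly:</p>

<pre><code>
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

def separates(w, b):
    # True when every point satisfies y_i (w^T x_i + b) > 0.
    return np.all(y * (X @ w + b) > 0)

# Two different hyperplanes, both valid separators for these data.
print(separates(np.array([1.0, 1.0]), 0.0))  # True
print(separates(np.array([2.0, 1.0]), 0.3))  # True
</code></pre>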
<p>All points are thus at a signed distance from the decision boundary defined by the line \( L \). The parameters \( b \), \( w_1 \), and \( w_2 \) define this line. </p>
</section>

<section>
<h2 id="largest-value-m">Largest value \( M \) </h2>

<p>We seek the largest value \( M \) defined by</p>
<p> <br>
$$
\frac{1}{\vert \vert \boldsymbol{w}\vert\vert}y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq M \hspace{0.1cm}\forall i=1,2,\dots, n,
$$
<p> <br>
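<p>Numerically, the signed distances and the resulting value of \( M \)
for a candidate hyperplane can be evaluated as in this sketch (data and
parameters invented for illustration):</p>

<pre><code>
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 1.0])
b = 0.0

# Signed distances y_i (w^T x_i + b) / ||w|| from the decision
# boundary; their minimum is the margin M achieved by (w, b).
distances = y * (X @ w + b) / np.linalg.norm(w)
print(distances, distances.min())
</code></pre>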
</section>

<section>
<h2 id="a-quick-reminder-on-lagrangian-multipliers">A quick reminder on Lagrangian multipliers </h2>

<p>We need to remember that we took \( dx \) and \( dy \) to be arbitrary and thus we must have</p>
<p> <br>
$$
\frac{\partial f}{\partial x}+\lambda\frac{\partial g}{\partial x}=0 \hspace{0.5cm}\mathrm{and}\hspace{0.5cm} \frac{\partial f}{\partial y}+\lambda\frac{\partial g}{\partial y}=0.
$$
<p> <br>
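<p>As a small worked illustration of these conditions (an example of
our own choosing): maximize \( f(x,y)=xy \) subject to
\( g(x,y)=x+y-1=0 \). The equations above become</p>
<p> <br>
$$
y+\lambda=0, \hspace{0.5cm} x+\lambda=0, \hspace{0.5cm} x+y-1=0,
$$
<p> <br>

<p>with solution \( x=y=1/2 \) and \( \lambda=-1/2 \), giving the
maximum \( f=1/4 \).</p>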
</section>
<section>
<h2 id="setting-up-the-problem">Setting up the problem</h2>
<p>In order to solve the above problem, we define the following Lagrangian function to be minimized </p>
<p> <br>
$$
{\cal L}(\lambda,b,\boldsymbol{w})=\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}-\sum_{i=1}^n\lambda_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1\right],
$$
<p> <br>
<p>where \( \lambda_i \) is a so-called Lagrange multiplier subject to the condition \( \lambda_i \geq 0 \).</p>
</section>

<section>
<h2 id="setting-up-derivaties">Setting up derivatives </h2>
<p>Taking the derivatives with respect to \( b \) and \( \boldsymbol{w} \) we obtain </p>
<p> <br>
$$
\frac{\partial {\cal L}}{\partial b} = -\sum_{i}\lambda_iy_i = 0 \hspace{0.5cm}\mathrm{and}\hspace{0.5cm} \frac{\partial {\cal L}}{\partial \boldsymbol{w}} = \boldsymbol{w}-\sum_{i}\lambda_iy_i\boldsymbol{x}_i = 0.
$$
<p> <br>
<p>Inserting these conditions into the Lagrangian we obtain the dual problem</p>
<p> <br>
$$
{\cal L}=\sum_{i}\lambda_i-\frac{1}{2}\sum_{ij}\lambda_i\lambda_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j,
$$
<p> <br>
<p>subject to the constraints \( \lambda_i\geq 0 \) and \( \sum_i\lambda_iy_i=0 \). </p>
<p>We must in addition satisfy the <a href="https://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions" target="_blank">Karush-Kuhn-Tucker</a> (KKT) condition</p>
<p> <br>
$$
\lambda_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1\right] = 0 \hspace{0.1cm}\forall i.
$$
<p> <br>
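<p>Spelling out the consequence (our gloss, following standard results): for each \( i \) the KKT condition forces</p>
<p> <br>
$$
\lambda_i=0 \hspace{0.5cm}\mathrm{or}\hspace{0.5cm} y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)=1,
$$
<p> <br>

<p>so that only the points lying exactly on the margin, the support vectors, enter with \( \lambda_i>0 \).</p>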
</section>

<section>
<h2 id="the-last-steps">The last steps </h2>
<p>Averaging over the \( N_s \) support vectors we obtain the intercept</p>
<p> <br>
$$
b = \frac{1}{N_s}\sum_{j\in N_s}\left(y_j-\sum_{i=1}^n\lambda_iy_i\boldsymbol{x}_i^T\boldsymbol{x}_j\right).
$$
<p> <br>
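<p>These last steps are easy to sketch in code (Python, assuming the
multipliers \( \lambda_i \) have already been found by some optimizer;
the function and tolerance are our own illustrative choices):</p>

<pre><code>
import numpy as np

def weights_and_bias(lmbda, X, y, tol=1e-8):
    # Recover (w, b) from the multipliers lambda_i.
    # lmbda and y have shape (n,); X has shape (n, p).
    # Stationarity condition: w = sum_i lambda_i y_i x_i.
    w = (lmbda * y) @ X
    # Support vectors are the points with lambda_i > 0.
    sv = lmbda > tol
    # b = (1/N_s) sum_{j in N_s} (y_j - sum_i lambda_i y_i x_i^T x_j),
    # where the inner sum equals w^T x_j.
    b = np.mean(y[sv] - X[sv] @ w)
    return w, b
</code></pre>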
<p>The vector \( \boldsymbol{\lambda}=[\lambda_1, \lambda_2,\dots, \lambda_n] \) is the optimization variable we are dealing with.
</p>
<p>In our case we are particularly interested in a class of optimization
problems called convex optimization problems.
</p>
<p>Convex optimization problems play a central role in applied mathematics and we strongly recommend <a href="http://web.stanford.edu/~boyd/cvxbook/" target="_blank">Boyd and Vandenberghe's text on the topic</a>.</p>
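<p>To illustrate how compactly such problems can be posed numerically,
here is a minimal sketch using the <code>cvxpy</code> library (our
choice of tool, assumed installed; the problem data are invented):</p>

<pre><code>
import cvxpy as cp
import numpy as np

# A small convex problem: least squares with a non-negativity constraint.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

x = cp.Variable(2)
prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0])
prob.solve()
print("optimal value:", prob.value, "x =", x.value)
</code></pre>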
</section>

<section>
<h2 id="a-simple-example">A simple example </h2>
</section>

<section>
<h2 id="rewriting-in-terms-of-vectors-and-matrices">Rewriting in terms of vectors and matrices </h2>
<p>The minimization problem can be rewritten in terms of vectors and matrices as (with \( x \) and \( y \) being the unknowns)</p>
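<p>As a generic illustration of such a rewriting (with coefficients of
our own choosing rather than those of the example above), a quadratic
function like \( f(x,y)=x^2+2y^2+3xy+4x+5y \) takes the form</p>
<p> <br>
$$
f(x,y)=\frac{1}{2}\begin{bmatrix}x & y\end{bmatrix}\begin{bmatrix}2 & 3\\ 3 & 4\end{bmatrix}\begin{bmatrix}x \\ y\end{bmatrix}+\begin{bmatrix}4 & 5\end{bmatrix}\begin{bmatrix}x \\ y\end{bmatrix}.
$$
<p> <br>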