
Commit 874d6d1

update week 13
1 parent f5df0b2 commit 874d6d1


6 files changed: +951 -488 lines changed

doc/pub/week13/html/week13-bs.html

Lines changed: 171 additions & 69 deletions
Large diffs are not rendered by default.

doc/pub/week13/html/week13-reveal.html

Lines changed: 121 additions & 54 deletions
@@ -473,6 +473,10 @@ <h2 id="classical-support-vector-machines-overarching-aims-best-visualized-with-
 intuitive way in terms of lines in a two-dimensional space separating
 the two classes (see figure below).
 </p>
+</section>
+
+<section>
+<h2 id="basic-mathematics">Basic mathematics </h2>
 
 <p>The basic mathematics behind the SVM is however less familiar to most of us.
 It relies on the definition of hyperplanes and the
@@ -601,10 +605,15 @@ <h2 id="what-is-a-hyperplane">What is a hyperplane? </h2>
 distinctly classifies the data points.
 </p>
 
-<p>In a \( p \)-dimensional space, a hyperplane is what we call an affine subspace of dimension of \( p-1 \).
-As an example, in two dimension, a hyperplane is simply as straight line while in three dimensions it is
-a two-dimensional subspace, or stated simply, a plane.
+<p>In a \( p \)-dimensional space, a hyperplane is what we call an affine
+subspace of dimension of \( p-1 \). As an example, in two dimension, a
+hyperplane is simply as straight line while in three dimensions it is
+a two-dimensional subspace, or stated simply, a plane.
 </p>
+</section>
+
+<section>
+<h2 id="two-dimensional-case">Two-dimensional case </h2>
 
 <p>In two dimensions, with the variables \( x_1 \) and \( x_2 \), the hyperplane is defined as</p>
 <p>&nbsp;<br>
@@ -647,6 +656,10 @@ <h2 id="a-p-dimensional-space-of-features">A \( p \)-dimensional space of featur
 \boldsymbol{x}_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \dots \\ \dots \\ x_{ip} \end{bmatrix}.
 $$
 <p>&nbsp;<br>
+</section>
+
+<section>
+<h2 id="more-details">More details </h2>
 
 <p>If the above condition is not met for a given vector \( \boldsymbol{x}_i \) we have </p>
 <p>&nbsp;<br>
@@ -675,7 +688,10 @@ <h2 id="a-p-dimensional-space-of-features">A \( p \)-dimensional space of featur
 $$
 <p>&nbsp;<br>
 
-<p>When we try to separate hyperplanes, if it exists, we can use it to construct a natural classifier: a test observation is assigned a given class depending on which side of the hyperplane it is located.</p>
+<p>When we try to separate hyperplanes, if it exists, we can use it to
+construct a natural classifier: a test observation is assigned a given
+class depending on which side of the hyperplane it is located.
+</p>
 </section>
 
 <section>
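
The classifier described in the rewrapped paragraph above amounts to checking the sign of \( \boldsymbol{w}^T\boldsymbol{x}+b \). A minimal Python sketch of that decision rule, with illustrative values for \( \boldsymbol{w} \) and \( b \) (they are assumptions, not fitted parameters from the slides):

import numpy as np

# Decision rule sketched above: assign a class by the side of the
# hyperplane w^T x + b = 0 on which the observation falls.
w = np.array([1.0, -2.0])   # illustrative normal vector, not a fitted one
b = 0.5                     # illustrative intercept

def classify(x):
    return 1 if w @ x + b >= 0 else -1

print(classify(np.array([2.0, 0.0])))   # -> 1
print(classify(np.array([-2.0, 1.0])))  # -> -1
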
@@ -690,6 +706,10 @@ <h2 id="the-two-dimensional-case">The two-dimensional case </h2>
 some reinforcement so that future data points can be classified with
 more confidence.
 </p>
+</section>
+
+<section>
+<h2 id="linear-classifier">Linear classifier </h2>
 
 <p>What a linear classifier attempts to accomplish is to split the
 feature space into two half spaces by placing a hyperplane between the
@@ -740,7 +760,11 @@ <h2 id="first-attempt-at-a-minimization-approach">First attempt at a minimizatio
 $$
 <p>&nbsp;<br>
 
-<p>We could now for example define all values \( y_i =1 \) as misclassified in case we have \( \boldsymbol{w}^T\boldsymbol{x}_i+b < 0 \) and the opposite if we have \( y_i=-1 \). Taking the derivatives gives us</p>
+<p>We could now for example define all values \( y_i =1 \) as misclassified
+in case we have \( \boldsymbol{w}^T\boldsymbol{x}_i+b < 0 \) and the opposite if we have
+\( y_i=-1 \). Taking the derivatives gives us
+</p>
+
 <p>&nbsp;<br>
 $$
 \frac{\partial C}{\partial b} = -\sum_{i\in M} y_i,
@@ -776,41 +800,15 @@ <h2 id="solving-the-equations">Solving the equations </h2>
 </section>
 
 <section>
-<h2 id="code-example">Code Example </h2>
+<h2 id="problems-with-the-simpler-approach">Problems with the Simpler Approach </h2>
 
 <p>The equations we discussed above can be coded rather easily (the
-framework is similar to what we developed for logistic
-regression). We are going to set up a simple case with two classes only and we want to find a line which separates them the best possible way.
+framework is similar to what has been developed for say logistic
+regression).
 </p>
 
-<!-- code=python (!bc pycod) typeset with pygments style "perldoc" -->
-<div class="cell border-box-sizing code_cell rendered">
-<div class="input">
-<div class="inner_cell">
-<div class="input_area">
-<div class="highlight" style="background: #eeeedd">
-<pre style="font-size: 80%; line-height: 125%;">
-</pre>
-</div>
-</div>
-</div>
-</div>
-<div class="output_wrapper">
-<div class="output">
-<div class="output_area">
-<div class="output_subarea output_stream output_stdout output_text">
-</div>
-</div>
-</div>
-</div>
-</div>
-</section>
-
-<section>
-<h2 id="problems-with-the-simpler-approach">Problems with the Simpler Approach </h2>
-
 <p>There are however problems with this approach, although it looks
-pretty straightforward to implement. When running the above code, we see that we can easily end up with many diffeent lines which separate the two classes.
+pretty straightforward to implement. When running such a code, we see that we can easily end up with many diffeent lines which separate the two classes.
 </p>
 
 <p>For small
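
The hunk above drops the empty code cell but keeps the claim that these equations are easy to code. A minimal sketch of the misclassification-driven updates implied by the derivative \( \partial C/\partial b=-\sum_{i\in M}y_i \) quoted earlier (together with the analogous gradient with respect to \( \boldsymbol{w} \)); the synthetic data and step size are my own assumptions, not material from the repository. Rerunning it with different random initializations illustrates the "many different separating lines" problem the new slide points out.

import numpy as np

rng = np.random.default_rng()

# Synthetic, linearly separable two-class data (illustrative only).
n = 50
X = np.vstack([rng.normal(loc=[-2, -2], size=(n, 2)),
               rng.normal(loc=[ 2,  2], size=(n, 2))])
y = np.hstack([-np.ones(n), np.ones(n)])

# Gradient-descent sketch of the misclassification cost discussed above:
# dC/db = -sum_{i in M} y_i and dC/dw = -sum_{i in M} y_i x_i,
# where M is the set of currently misclassified points.
w = rng.normal(size=2)
b = 0.0
eta = 0.01
for _ in range(1000):
    margins = y * (X @ w + b)
    M = margins < 0                 # misclassified points
    if not M.any():
        break
    w += eta * (y[M, None] * X[M]).sum(axis=0)
    b += eta * y[M].sum()

print("separating line:", w, b)
# A different random initialization typically ends in a different,
# equally valid separating line, which is the ambiguity noted above.
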
@@ -839,8 +837,12 @@ <h2 id="a-better-approach">A better approach </h2>
 <p>&nbsp;<br>
 
 <p>All points are thus at a signed distance from the decision boundary defined by the line \( L \). The parameters \( b \) and \( w_1 \) and \( w_2 \) define this line. </p>
+</section>
 
-<p>We seek thus the largest value \( M \) defined by</p>
+<section>
+<h2 id="largest-value-m">Largest value \( M \) </h2>
+
+<p>We seek the largest value \( M \) defined by</p>
 <p>&nbsp;<br>
 $$
 \frac{1}{\vert \vert \boldsymbol{w}\vert\vert}y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq M \hspace{0.1cm}\forall i=1,2,\dots, n,
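
A step worth spelling out after the hunk above (standard support-vector-machine material, stated here for clarity rather than quoted from the unshown lines): the constraint is unchanged if \( \boldsymbol{w} \) and \( b \) are rescaled, so one may fix \( \vert\vert\boldsymbol{w}\vert\vert=1/M \), and maximizing the margin \( M \) becomes

$$
\min_{\boldsymbol{w},b}\frac{1}{2}\vert\vert\boldsymbol{w}\vert\vert^2 \quad \mathrm{subject\ to}\quad y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq 1 \hspace{0.1cm}\forall i=1,2,\dots,n.
$$
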
@@ -895,6 +897,10 @@ <h2 id="a-quick-reminder-on-lagrangian-multipliers">A quick Reminder on Lagrangi
 df = \frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial y}dy+\frac{\partial f}{\partial z}dz.
 $$
 <p>&nbsp;<br>
+</section>
+
+<section>
+<h2 id="not-all-variables-are-indepenent-of-each-other">Not all variables are indepenent of each other </h2>
 
 <p>In many problems the variables \( x,y,z \) are often subject to constraints (such as those above for the margin)
 so that they are no longer all independent. It is possible at least in principle to use each
@@ -918,6 +924,10 @@ <h2 id="a-quick-reminder-on-lagrangian-multipliers">A quick Reminder on Lagrangi
 d\phi = \frac{\partial \phi}{\partial x}dx+\frac{\partial \phi}{\partial y}dy+\frac{\partial \phi}{\partial z}dz =0.
 $$
 <p>&nbsp;<br>
+</section>
+
+<section>
+<h2 id="only-two-independent-variables">Only two independent variables </h2>
 
 <p>Now we cannot set anymore</p>
 <p>&nbsp;<br>
@@ -958,6 +968,10 @@ <h2 id="adding-the-multiplier">Adding the Multiplier </h2>
 \frac{\partial f}{\partial z}+\lambda\frac{\partial \phi}{\partial z} =0.
 $$
 <p>&nbsp;<br>
+</section>
+
+<section>
+<h2 id="more-details">More details </h2>
 
 <p>We need to remember that we took \( dx \) and \( dy \) to be arbitrary and thus we must have</p>
 <p>&nbsp;<br>
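
A one-line worked example of the multiplier recipe in the hunk above (my own illustration, not taken from the slides): maximize \( f(x,y)=xy \) subject to \( \phi(x,y)=x+y-1=0 \). Setting the partial derivatives of \( f+\lambda\phi \) to zero gives

$$
y+\lambda=0,\quad x+\lambda=0 \quad\Rightarrow\quad x=y=\frac{1}{2},\quad \lambda=-\frac{1}{2},
$$

so the constrained maximum is \( f=1/4 \).
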
@@ -987,7 +1001,7 @@ <h2 id="adding-the-multiplier">Adding the Multiplier </h2>
 </section>
 
 <section>
-<h2 id="setting-up-the-problem">Setting up the Problem </h2>
+<h2 id="setting-up-the-problem">Setting up the problem </h2>
 <p>In order to solve the above problem, we define the following Lagrangian function to be minimized </p>
 <p>&nbsp;<br>
 $$
@@ -996,6 +1010,10 @@ <h2 id="setting-up-the-problem">Setting up the Problem </h2>
 <p>&nbsp;<br>
 
 <p>where \( \lambda_i \) is a so-called Lagrange multiplier subject to the condition \( \lambda_i \geq 0 \).</p>
+</section>
+
+<section>
+<h2 id="setting-up-derivaties">Setting up derivaties </h2>
 
 <p>Taking the derivatives with respect to \( b \) and \( \boldsymbol{w} \) we obtain </p>
 <p>&nbsp;<br>
@@ -1018,9 +1036,13 @@ <h2 id="setting-up-the-problem">Setting up the Problem </h2>
 $$
 <p>&nbsp;<br>
 
-<p>subject to the constraints \( \lambda_i\geq 0 \) and \( \sum_i\lambda_iy_i=0 \).
-We must in addition satisfy the <a href="https://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions" target="_blank">Karush-Kuhn-Tucker</a> (KKT) condition
-</p>
+<p>subject to the constraints \( \lambda_i\geq 0 \) and \( \sum_i\lambda_iy_i=0 \). </p>
+</section>
+
+<section>
+<h2 id="karush-kuhn-tucker-condition">Karush-Kuhn-Tucker condition </h2>
+
+<p>We must in addition satisfy the <a href="https://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions" target="_blank">Karush-Kuhn-Tucker</a> (KKT) condition</p>
 <p>&nbsp;<br>
 $$
 \lambda_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) -1\right] \hspace{0.1cm}\forall i.
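
A clarifying remark after the KKT hunk above (standard complementary slackness, added here rather than quoted from the unshown lines): the condition reads

$$
\lambda_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) -1\right]=0 \hspace{0.1cm}\forall i,
$$

so for every observation either \( \lambda_i=0 \) or \( y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)=1 \); the points with \( \lambda_i>0 \) lie exactly on the margin and are the support vectors.
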
@@ -1094,7 +1116,10 @@ <h2 id="the-last-steps">The last steps </h2>
 b = \frac{1}{N_s}\sum_{j\in N_s}\left(y_j-\sum_{i=1}^n\lambda_iy_i\boldsymbol{x}_i^T\boldsymbol{x}_j\right).
 $$
 <p>&nbsp;<br>
+</section>
 
+<section>
+<h2 id="classifier-equations">Classifier equations </h2>
 <p>With our hyperplane coefficients we can use our classifier to assign any observation by simply using </p>
 <p>&nbsp;<br>
 $$
@@ -1132,10 +1157,17 @@ <h2 id="a-soft-classifier">A soft classifier </h2>
 $$
 <p>&nbsp;<br>
 
-<p>with the requirement \( \xi_i\geq 0 \). The total violation is now \( \sum_i\xi \).
-The value \( \xi_i \) in the constraint the last constraint corresponds to the amount by which the prediction
-\( y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)=1 \) is on the wrong side of its margin. Hence by bounding the sum \( \sum_i \xi_i \),
-we bound the total amount by which predictions fall on the wrong side of their margins.
+<p>with the requirement \( \xi_i\geq 0 \). The total violation is now \( \sum_i\xi \). </p>
+</section>
+
+<section>
+<h2 id="misclassification">Misclassification </h2>
+
+<p>The value \( \xi_i \) in the constraint the last constraint corresponds to
+the amount by which the prediction \( y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)=1 \) is on
+the wrong side of its margin. Hence by bounding the sum \( \sum_i
+\xi_i \), we bound the total amount by which predictions fall on the
+wrong side of their margins.
 </p>
 
 <p>Misclassifications occur when \( \xi_i > 1 \). Thus bounding the total sum by some value \( C \) bounds in turn the total number of
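
One way to make the role of the slack variables in the hunk above concrete (a standard identity, added for clarity): the smallest admissible slack for each observation is

$$
\xi_i = \max\left(0, 1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)\right),
$$

which is zero for points on the correct side of their margin, between zero and one for points inside the margin but still correctly classified, and larger than one for misclassified points.
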
@@ -1161,6 +1193,10 @@ <h2 id="soft-optmization-problem">Soft optmization problem </h2>
 <p>&nbsp;<br>
 
 <p>with the requirement \( \xi_i\geq 0 \).</p>
+</section>
+
+<section>
+<h2 id="derivatives-with-respect-to-b-and-boldsymbol-w">Derivatives with respect to \( b \) and \( \boldsymbol{w} \) </h2>
 
 <p>Taking the derivatives with respect to \( b \) and \( \boldsymbol{w} \) we obtain </p>
 <p>&nbsp;<br>
@@ -1182,6 +1218,10 @@ <h2 id="soft-optmization-problem">Soft optmization problem </h2>
 \lambda_i = C-\gamma_i \hspace{0.1cm}\forall i.
 $$
 <p>&nbsp;<br>
+</section>
+
+<section>
+<h2 id="new-constraints">New constraints </h2>
 
 <p>Inserting these constraints into the equation for \( {\cal L} \) we obtain the same equation as before</p>
 <p>&nbsp;<br>
@@ -1306,7 +1346,9 @@ <h2 id="kernels-and-non-linearity">Kernels and non-linearity </h2>
 <section>
 <h2 id="the-equations">The equations </h2>
 
-<p>Suppose we define a polynomial transformation of degree two only (we continue to live in a plane with \( x_i \) and \( y_i \) as variables)</p>
+<p>Suppose we define a polynomial transformation of degree two only (we
+continue to live in a plane with \( x_i \) and \( y_i \) as variables)
+</p>
 <p>&nbsp;<br>
 $$
 z = \phi(x_i) =\left(x_i^2, y_i^2, \sqrt{2}x_iy_i\right).
@@ -1327,9 +1369,13 @@ <h2 id="the-equations">The equations </h2>
 $$
 <p>&nbsp;<br>
 
-<p>from which we also find \( b \).
-To compute \( \boldsymbol{z}_i^T\boldsymbol{z}_j \) we define the kernel \( K(\boldsymbol{x}_i,\boldsymbol{x}_j) \) as
-</p>
+<p>from which we also find \( b \).</p>
+</section>
+
+<section>
+<h2 id="defining-the-kernel">Defining the kernel </h2>
+
+<p>To compute \( \boldsymbol{z}_i^T\boldsymbol{z}_j \) we define the kernel \( K(\boldsymbol{x}_i,\boldsymbol{x}_j) \) as</p>
 <p>&nbsp;<br>
 $$
 K(\boldsymbol{x}_i,\boldsymbol{x}_j)=\boldsymbol{z}_i^T\boldsymbol{z}_j= \phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j).
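
A quick numerical check of the kernel identity in the hunk above, using the degree-two map \( \phi(x)=(x_1^2,x_2^2,\sqrt{2}x_1x_2) \) from the slides; the two data points are arbitrary values chosen for illustration.

import numpy as np

def phi(x):
    # Degree-two polynomial map from the slides: (x1^2, x2^2, sqrt(2) x1 x2)
    return np.array([x[0]**2, x[1]**2, np.sqrt(2.0) * x[0] * x[1]])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

lhs = phi(xi) @ phi(xj)        # z_i^T z_j via the explicit feature map
rhs = (xi @ xj)**2             # (x_i^T x_j)^2, the kernel evaluated directly
print(lhs, rhs)                # both give 1.0 here
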
@@ -1342,6 +1388,10 @@ <h2 id="the-equations">The equations </h2>
 K(\boldsymbol{x}_i,\boldsymbol{x}_j)=[x_i^2, y_i^2, \sqrt{2}x_iy_i]^T\begin{bmatrix} x_j^2 \\ y_j^2 \\ \sqrt{2}x_jy_j \end{bmatrix}=x_i^2x_j^2+2x_ix_jy_iy_j+y_i^2y_j^2.
 $$
 <p>&nbsp;<br>
+</section>
+
+<section>
+<h2 id="kernel-trick">Kernel trick </h2>
 
 <p>We note that this is nothing but the dot product of the two original
 vectors \( (\boldsymbol{x}_i^T\boldsymbol{x}_j)^2 \). Instead of thus computing the
@@ -1381,7 +1431,10 @@ <h2 id="the-problem-to-solve">The problem to solve </h2>
 \( \boldsymbol{y}=[y_1,y_2,\dots,y_n] \).
 If we add the slack constants this leads to the additional constraint \( 0\leq \lambda_i \leq C \).
 </p>
+</section>
 
+<section>
+<h2 id="convex-optimization">Convex optimization </h2>
 <p>We can rewrite this (see the solutions below) in terms of a convex optimization problem of the type</p>
 <p>&nbsp;<br>
 $$
@@ -1399,7 +1452,7 @@
 </section>
 
 <section>
-<h2 id="different-kernels-and-mercer-s-theorem">Different kernels and Mercer's theorem </h2>
+<h2 id="different-kernels">Different kernels </h2>
 
 <p>There are several popular kernels being used. These are</p>
 <ol>
@@ -1410,6 +1463,10 @@ <h2 id="different-kernels-and-mercer-s-theorem">Different kernels and Mercer's t
 </ol>
 <p>
 <p>and many other ones.</p>
+</section>
+
+<section>
+<h2 id="mercer-s-theorem">Mercer's theorem </h2>
 
 <p>An important theorem for us is <a href="https://en.wikipedia.org/wiki/Mercer%27s_theorem" target="_blank">Mercer's
 theorem</a>. The
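
For completeness, the kernels listed in the hunk above are available off the shelf; a minimal sketch using scikit-learn's SVC (the library and the toy data are my assumptions, they do not appear in this diff):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 1.0).astype(int)   # a non-linear boundary

# Compare the linear, polynomial and radial-basis-function kernels.
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(kernel, clf.score(X, y))
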
@@ -1666,8 +1723,8 @@ <h2 id="mathematical-optimization-of-convex-functions">Mathematical optimization
 vector \( \boldsymbol{\lambda}=[\lambda_1, \lambda_2,\dots, \lambda_n] \) is the optimization variable we are dealing with.
 </p>
 
-<p>In our case we are particularly interested in a class of optimization problems called convex optmization problems.
-In our discussion on gradient descent methods we discussed at length the definition of a convex function.
+<p>In our case we are particularly interested in a class of optimization
+problems called convex optmization problems.
 </p>
 
 <p>Convex optimization problems play a central role in applied mathematics and we recommend strongly <a href="http://web.stanford.edu/~boyd/cvxbook/" target="_blank">Boyd and Vandenberghe's text on the topics</a>.</p>
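
Since the slides above phrase the soft-margin dual as a convex quadratic program, here is a sketch of how it could be handed to a QP solver; the choice of CVXOPT and the synthetic data are my assumptions, not something shown in this diff.

import numpy as np
from cvxopt import matrix, solvers

# Soft-margin SVM dual as a QP:
#   min_l 1/2 l^T P l - 1^T l   s.t.  0 <= l_i <= C,  y^T l = 0,
# with P_ij = y_i y_j x_i^T x_j (linear kernel), as outlined above.
rng = np.random.default_rng(0)
n = 40
X = np.vstack([rng.normal(-1.5, 1.0, size=(n // 2, 2)),
               rng.normal( 1.5, 1.0, size=(n // 2, 2))])
y = np.hstack([-np.ones(n // 2), np.ones(n // 2)])
C = 1.0

K = X @ X.T
P = matrix(np.outer(y, y) * K)
q = matrix(-np.ones(n))
G = matrix(np.vstack([-np.eye(n), np.eye(n)]))
h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
A = matrix(y.reshape(1, -1))
b = matrix(0.0)

lam = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

# Support vectors have lambda_i > 0; points with 0 < lambda_i < C sit on
# the margin and can be used to recover the intercept, as in the slides.
w = ((lam * y)[:, None] * X).sum(axis=0)
sv = lam > 1e-6
on_margin = sv & (lam < C - 1e-6)
pick = on_margin if on_margin.any() else sv
b0 = np.mean(y[pick] - X[pick] @ w)
print("w =", w, "b =", b0)
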
@@ -1740,6 +1797,10 @@ <h2 id="a-simple-example">A simple example </h2>
 \end{align*}
 $$
 <p>&nbsp;<br>
+</section>
+
+<section>
+<h2 id="rewriting-in-terms-of-vectors-and-matrices">Rewriting in terms of vectors and matrices </h2>
 
 <p>The minimization problem can be rewritten in terms of vectors and matrices as (with \( x \) and \( y \) being the unknowns)</p>
 <p>&nbsp;<br>
@@ -1754,6 +1815,10 @@ <h2 id="a-simple-example">A simple example </h2>
 \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ -1 & -3 \\ 2 & 5 \\ 3 & 4\end{bmatrix}\begin{bmatrix} x \\ y\end{bmatrix} \preceq \begin{bmatrix}0 \\ 0\\ -15 \\ 100 \\ 80\end{bmatrix}.
 $$
 <p>&nbsp;<br>
+</section>
+
+<section>
+<h2 id="rewriting-inequalities">Rewriting inequalities </h2>
 
 <p>We have collapsed all the inequalities into a single matrix \( \boldsymbol{G} \). We see also that our matrix </p>
 <p>&nbsp;<br>
@@ -1835,9 +1900,11 @@ <h2 id="back-to-the-more-realistic-cases">Back to the more realistic cases </h2>
 </section>
 
 <section>
-<h2 id="support-vector-machines-for-regression">Support vector machines for regression </h2>
-
-<p>Material will be added here.</p>
+<h2 id="plans-for-next-week">Plans for next week </h2>
+<ol>
+<p><li> Discussion of quantum support vector machines</li>
+<p><li> Introducing quantum neural networks</li>
+</ol>
 </section>
 
 