
Commit 8f8dbb7

update week 16
1 parent 20e9fca commit 8f8dbb7

File tree

8 files changed: +312 −552 lines changed

doc/pub/week16/html/week16-bs.html

Lines changed: 38 additions & 72 deletions
@@ -104,7 +104,11 @@
 ('Implementation with PennyLane',
 2,
 None,
-'implementation-with-pennylane')]}
+'implementation-with-pennylane'),
+('Kullback-Leibler divergence',
+2,
+None,
+'kullback-leibler-divergence')]}
 end of tocinfo -->

 <body>
@@ -166,6 +170,7 @@
 <!-- navigation toc: --> <li><a href="#gibbs-sampling" style="font-size: 80%;">Gibbs sampling</a></li>
 <!-- navigation toc: --> <li><a href="#parameter-optimization-and-variational-techniques" style="font-size: 80%;">Parameter Optimization and Variational Techniques</a></li>
 <!-- navigation toc: --> <li><a href="#implementation-with-pennylane" style="font-size: 80%;">Implementation with PennyLane</a></li>
+<!-- navigation toc: --> <li><a href="#kullback-leibler-divergence" style="font-size: 80%;">Kullback-Leibler divergence</a></li>

 </ul>
 </li>
@@ -241,8 +246,7 @@ <h2 id="introduction" class="anchor">Introduction </h2>
 first classical Boltzmann machines and restricted Boltzmann machines (RBMs).
 Thereafter we
 introduce QBMs and their restricted variant (RQBM), discuss training
-methods, and illustrate practical implementation using
-PennyLane.
+methods, and illustrate practical implementation using PennyLane.
 </p>
 </div>
 </div>
@@ -623,16 +627,15 @@ <h2 id="observables" class="anchor">Observables </h2>
 \langle \hat{O}\rangle = \mathrm{Tr} (\rho \hat{O}).
 $$

-<p>In the QBM
-context, one is interested in the probability \( p(v) \) of measuring the
+<p>In the QBM context, one is interested in the probability \( p(v) \) of measuring the
 visible qubits in computational basis state \( v \). If the full thermal
 state lives on both visible and hidden qubits, this probability is
 </p>
 $$
-p_\Theta(v)=\mathrm{Tr}\bigl[Pi_v^{(\text{vis})}\,\rho(\theta)\bigr],
+p_{\Theta}(v)=\mathrm{Tr}\bigl[\Pi_v^{(\mathrm{vis})}\rho(\Theta)\bigr],
 $$

-<p>where \( Pi_v^{(\text{vis})}=|v\>\ < v| \) acts on the visible subspace.</p>
+<p>where \( \Pi_v^{(\mathrm{vis})}=\vert v\rangle\langle v\vert \) acts on the visible subspace.</p>

 <p>Equivalently, one may <em>trace out</em> the hidden qubits and work with the
 reduced density matrix on the visible subsystem. Computing these
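
A quick numerical sanity check of the projector formula above: a minimal NumPy sketch, assuming one visible and one hidden qubit with illustrative couplings (not the notes' actual model):

# Check p(v) = Tr[Pi_v^(vis) rho] for a 2-qubit Gibbs state rho = e^{-H}/Z.
import numpy as np
from scipy.linalg import expm

Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

# Illustrative Hamiltonian: qubit 0 visible, qubit 1 hidden
H = -0.7 * np.kron(Z, I2) - 0.3 * np.kron(I2, Z) \
    - 0.5 * np.kron(Z, Z) - 0.4 * np.kron(X, I2)

rho = expm(-H)
rho /= np.trace(rho)                         # Gibbs state rho = e^{-H}/Z

for v in (0, 1):                             # visible basis states |0>, |1>
    ket = np.zeros(2); ket[v] = 1.0
    Pi_v = np.kron(np.outer(ket, ket), I2)   # projector acts only on the visible qubit
    print(v, np.trace(Pi_v @ rho).real)      # p(v); the two values sum to 1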
@@ -645,7 +648,7 @@ <h2 id="observables" class="anchor">Observables </h2>
 <h2 id="quantum-boltzmann-machines" class="anchor">Quantum Boltzmann Machines </h2>

 <p>The model distribution over
-classical bitstrings \( v \) is given by the diagonal of the quantum Gibbs
+classical bitstrings \( v \) is given by the diagonal of the quantum Gibbs (see the whiteboard notes for the definition of the quantum Gibbs state)
 state \( \rho = e^{-H}/Z \).
 </p>

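"Diagonal of the quantum Gibbs state" has a direct numerical meaning; continuing the NumPy sketch above (it assumes the rho built there):

# The model distribution over bitstrings is the diagonal of rho = e^{-H}/Z.
p_model = np.real(np.diag(rho))   # p(v) for v in {00, 01, 10, 11}; sums to 1
print(p_model)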
@@ -679,7 +682,7 @@ <h2 id="restricted-qbm-rqbm" class="anchor">Restricted QBM (RQBM) </h2>
 couplings, and \( V_{ii{\prime}} \) are possible visible-visible couplings.
 (Classically, \( V=0 \) in an RBM).
 Importantly, there are no hidden-hidden \( ZZ \) terms in
-this restricted model. Equation&#8239;xxxx is a
+this restricted model. The above equation is a
 direct quantum analogue of the RBM energy function, promoting it to an
 operator acting on qubits.
 .
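
The bipartite restriction ("no hidden-hidden ZZ terms") is easy to encode programmatically. A sketch in PennyLane, with illustrative sizes and coefficient values (the notes' actual numbers are not in this hunk):

# RQBM Hamiltonian: biases, transverse fields, and visible-hidden ZZ only.
import pennylane as qml

n_v, n_h = 2, 1
vis = range(n_v)
hid = range(n_v, n_v + n_h)

coeffs, ops = [], []
for i in vis:                              # visible biases and transverse fields
    coeffs += [-0.5, -0.3]
    ops += [qml.PauliZ(i), qml.PauliX(i)]
for j in hid:                              # hidden biases
    coeffs += [-0.2]
    ops += [qml.PauliZ(j)]
for i in vis:                              # couplings W_ij: visible-hidden pairs
    for j in hid:                          # only, never hidden-hidden
        coeffs += [-0.4]
        ops += [qml.PauliZ(i) @ qml.PauliZ(j)]

H = qml.Hamiltonian(coeffs, ops)
print(H)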
@@ -701,40 +704,41 @@ <h2 id="energy-based-training-objective-and-gradients" class="anchor">Energy-Based Training Objective and Gradients </h2>
 <div class="panel panel-default">
 <div class="panel-body">
 <!-- subsequent paragraphs come in larger fonts, so start with a paragraph -->
-<p>The goal is to adjust the Hamiltonian parameters \theta so that the
+<p>The goal is to adjust the Hamiltonian parameters \( \Theta \) so that the
 model distribution
 </p>
 $$
-\(p_\theta(v)=\ < v|\rho(\theta)|v\>\),
+p_{\Theta}(v)=\langle v\vert\rho(\Theta)\vert v\rangle,
 $$

 <p>approximates
-\( p_{\rm data}(v) \). Equivalently, one can view the data distribution as
+\( p_{\mathrm{data}}(v) \). Equivalently, one can view the data distribution as
 a target density matrix \( \eta \) (diagonal in the computational basis) and
 minimize the quantum relative entropy (quantum KL divergence)
 </p>
 $$
-S(\eta\Vert \rho(\theta)) = \mathrm{Tr}\!\bigl[\eta\ln\eta\bigr] - \mathrm{Tr}\!\bigl[\eta\ln\rho(\theta)\bigr].
+S(\eta\Vert \rho(\Theta)) = \mathrm{Tr}\!\bigl[\eta\ln\eta\bigr] - \mathrm{Tr}\!\bigl[\eta\ln\rho(\Theta)\bigr].
 $$
 </div>
 </div>

+
 <!-- !split -->
 <h2 id="non-negative-loss" class="anchor">Non-negative loss </h2>

 <p>This loss is non-negative and equals zero only when \( \eta=\rho(\theta) \).
 Writing \( \rho=e^{-H}/Z \), one finds the gradient of the relative entropy
-(for parameter \theta in \( H \)) as
+(for parameter \( \Theta \) in \( H \)) as
 </p>
 $$
-\frac{\partial}{\partial\theta} S(\eta\Vert\rho)
-= \mathrm{Tr}\!\Bigl[\eta\,\partial_\theta(\beta H + \ln Z)\Bigr]
-= \beta\Bigl(\Tr[\eta\,\partial_\theta H] - \mathrm{Tr}[\rho\,\partial_\theta H]\Bigr).
+\frac{\partial}{\partial\Theta} S(\eta\Vert\rho)
+= \mathrm{Tr}\!\Bigl[\eta\,\partial_\Theta(\beta H + \ln Z)\Bigr]
+= \beta\Bigl(\mathrm{Tr}[\eta\,\partial_\Theta H] - \mathrm{Tr}[\rho\,\partial_\Theta H]\Bigr).
 $$

 <p>In other words,</p>
 $$
-\nabla_\theta S \;=\; \beta\Bigl(\langle \partial_\theta H\rangle_{\rm data} \;-\; \langle \partial_\theta H\rangle_{\rm model}\Bigr).
+\nabla_\Theta S = \beta\Bigl(\langle \partial_\Theta H\rangle_{\rm data}-\langle \partial_\Theta H\rangle_{\rm model}\Bigr).
 $$


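The gradient identity above can be verified by finite differences on a small instance. A sketch with \( \beta=1 \), a single parameter theta multiplying a ZZ term, and an assumed diagonal data state eta (all values illustrative):

# Verify dS/dtheta = Tr[eta dH] - Tr[rho dH] for S(eta||rho(theta)), beta = 1.
import numpy as np
from scipy.linalg import expm, logm

Z = np.diag([1.0, -1.0]); X = np.array([[0.0, 1.0], [1.0, 0.0]]); I2 = np.eye(2)
dH = -np.kron(Z, Z)                        # dH/dtheta for H = theta*(-ZZ) + fixed part
H_fixed = -0.4 * np.kron(X, I2) - 0.3 * np.kron(I2, Z)

def rho(theta):
    M = expm(-(theta * dH + H_fixed))
    return M / np.trace(M)

eta = np.diag([0.4, 0.1, 0.1, 0.4])        # assumed data density matrix (diagonal)

def S(theta):                              # relative entropy S(eta || rho(theta))
    return (np.trace(eta @ logm(eta)) - np.trace(eta @ logm(rho(theta)))).real

theta, eps = 0.7, 1e-5
analytic = (np.trace(eta @ dH) - np.trace(rho(theta) @ dH)).real
numeric = (S(theta + eps) - S(theta - eps)) / (2 * eps)
print(analytic, numeric)                   # the two values should agree closely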
@@ -747,11 +751,11 @@ <h2 id="analogy-with-classical-rbm" class="anchor">Analogy with classical RBM </h2>
 distribution
 </p>
 $$
-\eta\Vert\rho)=\mathrm{Tr}[\eta\ln\eta]-\mathrm{Tr}[\eta\ln\rho].
+S(\eta\Vert\rho)=\mathrm{Tr}[\eta\ln\eta]-\mathrm{Tr}[\eta\ln\rho].
 $$

-<p>In practice, one computes $\langle \( \partial_\theta H\rangle_{\rm data} \) by averaging over the training
-set, and estimates \( \langle \partial_\theta H\rangle_{\rm model} \) by
+<p>In practice, one computes \( \langle \partial_{\Theta} H\rangle_{\mathrm{data}} \) by averaging over the training
+set, and estimates \( \langle \partial_{\Theta} H\rangle_{\mathrm{model}} \) by
 sampling from the quantum model.
 </p>

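The data-side term is a purely classical average. A tiny sketch for a single ZZ coupling, with assumed training bitstrings:

# <dH/dtheta>_data for a Z_0 Z_1 term is an average over training samples.
import numpy as np

samples = np.array([[0, 0], [1, 1], [1, 1], [0, 1]])   # assumed visible data
z = 1 - 2 * samples                                    # map bit 0 -> +1, bit 1 -> -1
print(np.mean(z[:, 0] * z[:, 1]))                      # <Z_0 Z_1> over the data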
@@ -761,7 +765,7 @@ <h2 id="gibbs-sampling" class="anchor">Gibbs sampling </h2>
 <p>Note that preparing exact Gibbs samples of a non-commuting Hamiltonian
 is hard. Many methods have been proposed to approximate the model
 expectation. For example, one may use a bound on the quantum free
-energy (as in Amin et al. ), or perform contrastive divergence with a
+energy, or perform contrastive divergence with a
 quantum device. Recent theoretical work shows that minimizing the
 relative entropy in QBM training can be done with stochastic gradient
 descent in polynomial sample complexity under reasonable assumptions .
@@ -770,7 +774,7 @@ <h2 id="gibbs-sampling" class="anchor">Gibbs sampling </h2>
 <!-- !split -->
 <h2 id="parameter-optimization-and-variational-techniques" class="anchor">Parameter Optimization and Variational Techniques </h2>

-<p>Given the gradient above, one can optimize \theta by standard
+<p>Given the gradient above, one can optimize \( \Theta \) by standard
 gradient-based methods (SGD, Adam, etc.). In a gate-based setting, we
 implement the RQBM Hamiltonian via a parameterized quantum circuit
 (ansatz) and use variational quantum algorithms (VQAs). Each
@@ -779,39 +783,11 @@ <h2 id="parameter-optimization-and-variational-techniques" class="anchor">Parameter Optimization and Variational Techniques </h2>
 parameter-shift rule or automatic differentiation.
 </p>

-<p>One approach is the \( \beta \)-Variational Quantum Eigensolver (&#946;-VQE)
-technique. Liu et al. (2021) proposed a variational ansatz to
-represent a thermal (mixed) state using a combination of a classical
-neural network and a quantum circuit. Huijgen et al. (2024) applied
-this to QBM training: an inner loop runs &#946;-VQE to approximate the
-Gibbs state of H(\theta), while an outer loop updates \theta to
-minimize the relative entropy to the data . This &#8220;nested loop&#8221;
-algorithm effectively sidesteps direct sampling of the true quantum
-Boltzmann state by variational approximation. It has been shown to
-work on both classical and quantum target data, achieving
-high-fidelity learning for up to 10 qubits .
-</p>
-
-<p>Other sophisticated ans&#228;tze exist. For example, Evolved Quantum
-Boltzmann Machines (Minervini et al., 2025) prepare a thermal state
-under one Hamiltonian and then evolve it under another, combining
-imaginary- and real-time evolution. They derive analytical gradient
-formulas and propose natural-gradient variants . There are also
-&#8220;semi-quantum&#8221; RBMs (sqRBMs) which commute in the visible subspace and
-treat the hidden units quantum-mechanically. Intriguingly, sqRBMs
-were found to be expressively equivalent to classical RBMs, requiring
-fewer hidden units for the same number of parameters . In practice,
-however, variational optimization in high dimensions can suffer from
-barren plateaus. Recent analysis shows that training QBMs using the
-relative entropy objective avoids many of these concentration issues,
-with provably polynomial complexity under realistic conditions .
-</p>
-
 <!-- !split -->
 <h2 id="implementation-with-pennylane" class="anchor">Implementation with PennyLane </h2>

 <p>As a concrete example, we outline how to implement an RQBM in
-PennyLane. We consider n_v visible and n_h hidden qubits. The ansatz
+PennyLane. We consider \( n_v \) visible and \( n_h \) hidden qubits. The ansatz
 can be, for instance, layers of parameterized single-qubit rotations
 and entangling gates that respect the bipartite structure. Below is
 illustrative code (in Python) using PennyLane&#8217;s default.qubit
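
The circuit itself falls outside this diff's context. A minimal sketch of a circuit matching the surrounding description (the gate layout and sizes are assumptions, not the notes' actual ansatz):

# Bipartite ansatz on default.qubit, measuring only the visible wires.
import pennylane as qml
import numpy as np

n_v, n_h = 2, 1
dev = qml.device("default.qubit", wires=n_v + n_h)

@qml.qnode(dev)
def visible_probs(params):
    # one parameterized rotation per qubit (n_v + n_h parameters in total)
    for w in range(n_v + n_h):
        qml.RY(params[w], wires=w)
    # entanglers respect the bipartite visible-hidden structure
    for i in range(n_v):
        for j in range(n_v, n_v + n_h):
            qml.CNOT(wires=[i, j])
    # measuring only wires 0..n_v-1 marginalizes out the hidden qubit
    return qml.probs(wires=list(range(n_v)))

params = np.random.uniform(0, np.pi, n_v + n_h)
print(visible_probs(params))   # 2^n_v probabilities, one per visible bitstring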
@@ -822,17 +798,20 @@ <h2 id="implementation-with-pennylane" class="anchor">Implementation with PennyLane </h2>

 <!-- !ec -->

-<p>This circuit takes a parameter vector params of length n_v+n_h and
-returns the probabilities q_\theta(v) of measuring each visible
-bitstring v. Notice we measure only the visible wires (the
-wires=list(range(n_v)) in qml.probs marginalizes out the hidden
+<p>This circuit takes a parameter vector params of length \( n_v+n_h \) and
+returns the probabilities \( p_{\Theta}(v) \) of measuring each visible
+bitstring \( v \). Notice we measure only the visible wires (the
+wires=list(range(n_v)) in qml.probs marginalizes out the hidden
 qubit).
 </p>

+<!-- !split -->
+<h2 id="kullback-leibler-divergence" class="anchor">Kullback-Leibler divergence </h2>
+
 <p>Next, we train this model to match a target dataset distribution.
-Suppose our data has distribution target = [p(00), p(01), p(10),
-p(11)]. We can define the (classical) loss as the Kullback-Leibler
-divergence D_{\rm KL}(p_{\rm data}\Vert q_\theta) or simply the
+Suppose our data has distribution target = \( [p(00), p(01), p(10),
+p(11)] \). We can define the (classical) loss as the Kullback-Leibler
+divergence \( D_{\mathrm{KL}}(p_{\mathrm{data}}\Vert q_\Theta) \) or simply the
 negative log-likelihood. Then we update params by gradient descent.
 PennyLane&#8217;s automatic differentiation can compute gradients via the
 parameter-shift rule, but we show an explicit parameter-shift
@@ -869,19 +848,6 @@ <h2 id="implementation-with-pennylane" class="anchor">Implementation with PennyLane </h2>
 gradient, etc.). The above shows that PennyLane can seamlessly
 integrate quantum circuit definitions with classical training logic.
 </p>
-
-$$
-H = -\sum_{a=1}^N \Gamma_a\,\sigma^x_a \;-\;\sum_{a=1}^N b_a\,\sigma^z_a \;-\;\sum_{a < b} u_{ab}\,\sigma^z_a \sigma^z_b,
-$$
-
-<p>where \( \sigma_a^x,\sigma_a^z \) are the Pauli matrices on qubit \( a \),
-\( \Gamma_a \) is a &#8220;transverse&#8221; field, \( b_a \) are local fields (biases),
-and \( u_{ab} \) are interaction strengths . Here each qubit can be
-interpreted analogously to a classical unit, but the \( \sigma^x \) term
-induces quantum superposition. One can also include \( \sigma^y \) or
-more complex terms, but we focus on this common ansatz.
-</p>
-
 <!-- ------------------- end of main content --------------- -->
 </div> <!-- end container -->
 <!-- include javascript, jQuery *first* -->
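
A sketch of the training step the text describes, combining the KL loss with an explicit parameter-shift gradient. It reuses visible_probs and params from the circuit sketch above; the target distribution, learning rate, and step count are illustrative assumptions:

# KL-divergence training with an explicit parameter-shift gradient.
import numpy as np

target = np.array([0.4, 0.1, 0.1, 0.4])   # hypothetical [p(00), p(01), p(10), p(11)]
shift, lr = np.pi / 2, 0.2

def kl_loss(params):
    q = visible_probs(params)
    return float(np.sum(target * np.log(target / (q + 1e-12))))

for step in range(100):
    q = visible_probs(params)
    grad = np.zeros(len(params))
    for k in range(len(params)):
        e = np.zeros(len(params)); e[k] = shift
        # parameter-shift gives the exact derivative of each probability
        dq = (visible_probs(params + e) - visible_probs(params - e)) / 2
        grad[k] = np.sum(-target / (q + 1e-12) * dq)   # chain rule through the KL
    params = params - lr * grad

print(kl_loss(params))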
