104104 ('Implementation with PennyLane',
105105 2,
106106 None,
107- 'implementation-with-pennylane')]}
107+ 'implementation-with-pennylane'),
108+ ('Kullback-Leibler divergence',
109+ 2,
110+ None,
111+ 'kullback-leibler-divergence')]}
108112end of tocinfo -->
109113
110114< body >
166170 <!-- navigation toc: --> < li > < a href ="#gibbs-sampling " style ="font-size: 80%; "> Gibbs sampling</ a > </ li >
167171 <!-- navigation toc: --> < li > < a href ="#parameter-optimization-and-variational-techniques " style ="font-size: 80%; "> Parameter Optimization and Variational Techniques</ a > </ li >
168172 <!-- navigation toc: --> < li > < a href ="#implementation-with-pennylane " style ="font-size: 80%; "> Implementation with PennyLane</ a > </ li >
173+ <!-- navigation toc: --> < li > < a href ="#kullback-leibler-divergence " style ="font-size: 80%; "> Kullback-Leibler divergence</ a > </ li >
169174
170175 </ ul >
171176 </ li >
@@ -241,8 +246,7 @@ <h2 id="introduction" class="anchor">Introduction </h2>
241246first classical Boltzmann machines and restricted Boltzmann machines (RBMs).
242247Thereafter we
243248introduce QBMs and their restricted variant (RQBM), discuss training
244- methods, and illustrate practical implementation using
245- PennyLane.
249+ methods, and illustrate practical implementation using PennyLane.
246250</ p >
247251</ div >
248252</ div >
@@ -623,16 +627,15 @@ <h2 id="observables" class="anchor">Observables </h2>
623627\langle \hat{O}\rangle = \mathrm{Tr} (\rho \hat{O}).
624628$$
625629
626- < p > In the QBM
627- context, one is interested in the probability \( p(v) \) of measuring the
630+ < p > In the QBM context, one is interested in the probability \( p(v) \) of measuring the
628631visible qubits in computational basis state \( v \). If the full thermal
629632state lives on both visible and hidden qubits, this probability is
630633</ p >
631634$$
632- p_\Theta(v)=\mathrm{Tr}\bigl[Pi_v^{(\text {vis})}\,\ rho(\theta )\bigr],
635+ p_{\Theta}(v)=\mathrm{Tr}\bigl[\Pi_v^{(\mathrm{vis})}\,\rho(\Theta)\bigr],
633636$$
634637
635- < p > where \( Pi_v^{(\text {vis})}=|v\ > \ < v | \) acts on the visible subspace. </ p >
638+ < p > where \( \Pi_v^{(\mathrm{vis})}=\vert v\rangle\langle v\vert \) acts on the visible subspace.</ p >
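< p > As a quick numerical illustration, the projector formula can be evaluated
directly with dense matrices for very small systems. The sketch below assumes
a toy Hamiltonian on one visible and one hidden qubit; the coefficients are
arbitrary illustration values.
</ p >

<!-- !bc pycod -->
< pre >
# Sketch: evaluate p(v) = Tr[ Pi_v rho ] for 1 visible + 1 hidden qubit
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# toy Hamiltonian: transverse fields plus one visible-hidden ZZ coupling
H = 0.7 * np.kron(X, I2) + 0.4 * np.kron(I2, X) + 0.9 * np.kron(Z, Z)

rho = expm(-H)
rho = rho / np.trace(rho)          # Gibbs state rho = e^{-H} / Z

for v in (0, 1):
    ket = np.zeros((2, 1), dtype=complex)
    ket[v, 0] = 1.0
    Pi_v = np.kron(ket @ ket.conj().T, I2)   # projector on the visible qubit only
    print(f"p({v}) = {np.trace(Pi_v @ rho).real:.4f}")
</ pre >
<!-- !ec -->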
636639
637640< p > Equivalently, one may < em > trace out</ em > the hidden qubits and work with the
638641reduced density matrix on the visible subsystem. Computing these
@@ -645,7 +648,7 @@ <h2 id="observables" class="anchor">Observables </h2>
645648< h2 id ="quantum-boltzmann-machines " class ="anchor "> Quantum Boltzmann Machines </ h2 >
646649
647650< p > The model distribution over
648- classical bitstrings \( v \) is given by the diagonal of the quantum Gibbs
651+ classical bitstrings \( v \) is given by the diagonal of the quantum Gibbs
649652state \( \rho = e^{-H}/Z \) (see the whiteboard notes for the definition of the quantum Gibbs state).
650653</ p >
651654
@@ -679,7 +682,7 @@ <h2 id="restricted-qbm-rqbm" class="anchor">Restricted QBM (RQBM) </h2>
679682couplings, and \( V_{ii'} \) are possible visible-visible couplings.
680683(Classically, \( V=0 \) in an RBM).
681684Importantly, there are no hidden-hidden \( ZZ \) terms in
682- this restricted model. Equation xxxx is a
685+ this restricted model. The above equation is a
683686direct quantum analogue of the RBM energy function, promoting it to an
684687operator acting on qubits.
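< p > As a sketch of how such a restricted Hamiltonian might be assembled in
code, the snippet below builds transverse fields, biases and visible-hidden
\( ZZ \) couplings as a PennyLane Hamiltonian; all coefficient values are
assumed purely for illustration.
</ p >

<!-- !bc pycod -->
< pre >
# Sketch: RQBM Hamiltonian with no hidden-hidden ZZ terms (toy coefficients)
import pennylane as qml

n_v, n_h = 2, 1
visible = list(range(n_v))
hidden = list(range(n_v, n_v + n_h))

coeffs, ops = [], []
for q in visible + hidden:
    coeffs += [-0.5, -0.3]                   # transverse field, bias
    ops += [qml.PauliX(q), qml.PauliZ(q)]
for i in visible:
    for j in hidden:                         # visible-hidden couplings only
        coeffs.append(-0.8)
        ops.append(qml.PauliZ(i) @ qml.PauliZ(j))

H = qml.Hamiltonian(coeffs, ops)
print(H)
</ pre >
<!-- !ec -->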
@@ -701,40 +704,41 @@ <h2 id="energy-based-training-objective-and-gradients" class="anchor">Energy-Bas
701704< div class ="panel panel-default ">
702705< div class ="panel-body ">
703706<!-- subsequent paragraphs come in larger fonts, so start with a paragraph -->
704- < p > The goal is to adjust the Hamiltonian parameters \theta so that the
707+ < p > The goal is to adjust the Hamiltonian parameters \( \Theta \) so that the
705708model distribution
706709</ p >
707710$$
708- \(p_\theta (v)=\ < v |\ rho(\theta)|v\ > \) ,
711+ p_{\Theta}(v)=\langle v\vert\rho(\Theta)\vert v\rangle,
709712$$
710713
711714< p > approximates
712- \( p_{\rm data}(v) \). Equivalently, one can view the data distribution as
715+ \( p_{\mathrm{data}}(v) \). Equivalently, one can view the data distribution as
713716a target density matrix \( \eta \) (diagonal in the computational basis) and
714717minimize the quantum relative entropy (quantum KL divergence)
715718</ p >
716719$$
717- S(\eta\Vert \rho(\theta )) = \mathrm{Tr}\!\bigl[\eta\ln\eta\bigr] - \mathrm{Tr}\!\bigl[\eta\ln\rho(\theta )\bigr].
720+ S(\eta\Vert\rho(\Theta)) = \mathrm{Tr}\!\bigl[\eta\ln\eta\bigr] - \mathrm{Tr}\!\bigl[\eta\ln\rho(\Theta)\bigr].
718721$$
719722</ div >
720723</ div >
721724
725+
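< p > For small systems the relative entropy can be checked numerically. The
sketch below uses an assumed diagonal data state \( \eta \) and a toy model
Hamiltonian; both are illustration values only.
</ p >

<!-- !bc pycod -->
< pre >
# Sketch: S(eta || rho) = Tr[eta ln eta] - Tr[eta ln rho] for dense matrices
import numpy as np
from scipy.linalg import expm, logm

def relative_entropy(eta, rho):
    # assumes eta has full support so logm(eta) is well defined
    return np.trace(eta @ (logm(eta) - logm(rho))).real

p_data = np.array([0.4, 0.1, 0.1, 0.4])        # classical data distribution
eta = np.diag(p_data).astype(complex)          # diagonal data state

H = np.diag([0.0, 1.0, 1.0, 0.2]).astype(complex)   # toy Hamiltonian
rho = expm(-H)
rho = rho / np.trace(rho)                      # model Gibbs state

print("S(eta||rho) =", relative_entropy(eta, rho))
</ pre >
<!-- !ec -->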
722726<!-- !split -->
723727< h2 id ="non-negative-loss " class ="anchor "> Non-negative loss </ h2 >
724728
725729< p > This loss is non-negative and equals zero only when \( \eta=\rho(\Theta) \).
726730Writing \( \rho=e^{-H}/Z \), one finds the gradient of the relative entropy
727- (for parameter \theta in \( H \)) as
731+ (for parameter \( \Theta \) in \( H \)) as
728732</ p >
729733$$
730- \frac{\partial}{\partial\theta } S(\eta\Vert \rho)
731- = \mathrm{Tr}\!\Bigl[\eta\,\partial_\theta (\beta H + \ln Z)\Bigr]
732- = \beta\Bigl(\Tr [\eta\,\partial_\theta H] - \mathrm{Tr}[\rho\,\partial_\theta H]\Bigr).
734+ \frac{\partial}{\partial\Theta} S(\eta\Vert\rho)
735+ = \mathrm{Tr}\!\Bigl[\eta\,\partial_\Theta(\beta H + \ln Z)\Bigr]
736+ = \beta\Bigl(\mathrm{Tr}[\eta\,\partial_\Theta H] - \mathrm{Tr}[\rho\,\partial_\Theta H]\Bigr).
733737$$
734738
735739< p > In other words,</ p >
736740$$
737- \nabla_\theta S \;=\; \beta\Bigl(\langle \partial_\theta H\rangle_{\rm data} \;-\; \ langle \partial_\theta H\rangle_{\rm model}\Bigr).
741+ \nabla_\Theta S = \beta\Bigl(\langle \partial_\Theta H\rangle_{\mathrm{data}} - \langle \partial_\Theta H\rangle_{\mathrm{model}}\Bigr).
738742$$
739743
740744
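< p > In code, a single gradient component then takes this positive-minus-negative
phase form. A minimal sketch, assuming \( H \) depends linearly on the
parameter so that \( \partial_\Theta H \) is a fixed operator \( O_k \):
</ p >

<!-- !bc pycod -->
< pre >
# Sketch: one gradient component, beta * ( data phase - model phase )
import numpy as np

def grad_component(O_k, eta, rho, beta=1.0):
    positive_phase = np.trace(eta @ O_k).real   # expectation under the data
    negative_phase = np.trace(rho @ O_k).real   # expectation under the model
    return beta * (positive_phase - negative_phase)
</ pre >
<!-- !ec -->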
@@ -747,11 +751,11 @@ <h2 id="analogy-with-classical-rbm" class="anchor">Analogy with classical RBM </
747751distribution
748752</ p >
749753$$
750- \eta\Vert \rho)=\mathrm{Tr}[\eta\ln\eta]-\mathrm{Tr}[\eta\ln\rho].
754+ S(\eta\Vert\rho)=\mathrm{Tr}[\eta\ln\eta]-\mathrm{Tr}[\eta\ln\rho].
751755$$
752756
753- < p > In practice, one computes $\langle \( \partial_\theta H\rangle_{\rm data} \) by averaging over the training
754- set, and estimates \( \langle \partial_\theta H\rangle_{\rm model} \) by
757+ < p > In practice, one computes \( \langle \partial_{\Theta} H\rangle_{\mathrm{data}} \) by averaging over the training
758+ set, and estimates \( \langle \partial_{\Theta} H\rangle_{\mathrm{model}} \) by
755759sampling from the quantum model.
756760</ p >
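< p > For example, if \( H \) contains a bias term \( -b_i\sigma^z_i \), then
\( \partial H/\partial b_i = -\sigma^z_i \) and the data-side average is just
minus the mean of the \( Z_i \) eigenvalues over the training bitstrings. A
sketch with an assumed toy dataset:
</ p >

<!-- !bc pycod -->
< pre >
# Sketch: data-side expectation of dH/db_i for bias terms (toy data)
import numpy as np

data = np.array([[0, 0], [0, 1], [0, 0], [1, 1]])   # training bitstrings v
z_vals = 1 - 2 * data              # bit 0 maps to eigenvalue +1, bit 1 to -1
print("data-side gradient terms:", -z_vals.mean(axis=0))
</ pre >
<!-- !ec -->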
757761
@@ -761,7 +765,7 @@ <h2 id="gibbs-sampling" class="anchor">Gibbs sampling </h2>
761765< p > Note that preparing exact Gibbs samples of a non-commuting Hamiltonian
762766is hard. Many methods have been proposed to approximate the model
763767expectation. For example, one may use a bound on the quantum free
764- energy (as in Amin et al. ) , or perform contrastive divergence with a
768+ energy, or perform contrastive divergence with a
765769quantum device. Recent theoretical work shows that minimizing the
766770relative entropy in QBM training can be done with stochastic gradient
767771descent in polynomial sample complexity under reasonable assumptions.
@@ -770,7 +774,7 @@ <h2 id="gibbs-sampling" class="anchor">Gibbs sampling </h2>
770774<!-- !split -->
771775< h2 id ="parameter-optimization-and-variational-techniques " class ="anchor "> Parameter Optimization and Variational Techniques </ h2 >
772776
773- < p > Given the gradient above, one can optimize \theta by standard
777+ < p > Given the gradient above, one can optimize \( \Theta \) by standard
774778gradient-based methods (SGD, Adam, etc.). In a gate-based setting, we
775779implement the RQBM Hamiltonian via a parameterized quantum circuit
776780(ansatz) and use variational quantum algorithms (VQAs). Each
@@ -779,39 +783,11 @@ <h2 id="parameter-optimization-and-variational-techniques" class="anchor">Parame
779783parameter-shift rule or automatic differentiation.
780784</ p >
781785
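< p > The two-term parameter-shift rule is easy to state in code. The sketch
below differentiates a toy two-qubit ansatz (assumed here for illustration,
not the RQBM circuit itself); for Pauli rotations the rule is exact.
</ p >

<!-- !bc pycod -->
< pre >
# Sketch: two-term parameter-shift rule for an RY-based toy ansatz
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def expval_z0(theta):
    qml.RY(theta[0], wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RY(theta[1], wires=1)
    return qml.expval(qml.PauliZ(0))

def parameter_shift(theta, k):
    shifted = theta.copy()
    shifted[k] += np.pi / 2
    plus = expval_z0(shifted)
    shifted[k] -= np.pi
    minus = expval_z0(shifted)
    return (plus - minus) / 2      # exact derivative for Pauli rotations

theta = np.array([0.3, 0.7])
print(parameter_shift(theta, 0))
</ pre >
<!-- !ec -->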
782- < p > One approach is the \( \beta \)-Variational Quantum Eigensolver (β-VQE)
783- technique. Liu et al. (2021) proposed a variational ansatz to
784- represent a thermal (mixed) state using a combination of a classical
785- neural network and a quantum circuit. Huijgen et al. (2024) applied
786- this to QBM training: an inner loop runs β-VQE to approximate the
787- Gibbs state of H(\theta), while an outer loop updates \theta to
788- minimize the relative entropy to the data . This “nested loop”
789- algorithm effectively sidesteps direct sampling of the true quantum
790- Boltzmann state by variational approximation. It has been shown to
791- work on both classical and quantum target data, achieving
792- high-fidelity learning for up to 10 qubits .
793- </ p >
794-
795- < p > Other sophisticated ansätze exist. For example, Evolved Quantum
796- Boltzmann Machines (Minervini et al., 2025) prepare a thermal state
797- under one Hamiltonian and then evolve it under another, combining
798- imaginary- and real-time evolution. They derive analytical gradient
799- formulas and propose natural-gradient variants . There are also
800- “semi-quantum” RBMs (sqRBMs) which commute in the visible subspace and
801- treat the hidden units quantum-mechanically. Intriguingly, sqRBMs
802- were found to be expressively equivalent to classical RBMs, requiring
803- fewer hidden units for the same number of parameters . In practice,
804- however, variational optimization in high dimensions can suffer from
805- barren plateaus. Recent analysis shows that training QBMs using the
806- relative entropy objective avoids many of these concentration issues,
807- with provably polynomial complexity under realistic conditions .
808- </ p >
809-
810786<!-- !split -->
811787< h2 id ="implementation-with-pennylane " class ="anchor "> Implementation with PennyLane </ h2 >
812788
813789< p > As a concrete example, we outline how to implement an RQBM in
814- PennyLane. We consider n_v visible and n_h hidden qubits. The ansatz
790+ PennyLane. We consider \( n_v \) visible and \( n_h \) hidden qubits. The ansatz
815791can be, for instance, layers of parameterized single-qubit rotations
816792and entangling gates that respect the bipartite structure. Below is
817793illustrative code (in Python) using PennyLane’s default.qubit
@@ -822,17 +798,20 @@ <h2 id="implementation-with-pennylane" class="anchor">Implementation with PennyL
822798
823799<!-- !ec -->
824800
825- < p > This circuit takes a parameter vector params of length n_v+n_h and
826- returns the probabilities q_\theta(v ) of measuring each visible
827- bitstring v . Notice we measure only the visible wires (the
828- wires=list(range(n_v )) in qml.probs marginalizes out the hidden
801+ < p > This circuit takes a parameter vector params of length \( n_v+n_h \) and
802+ returns the probabilities \( p_{\Theta}(v) \) of measuring each visible
803+ bitstring \( v \). Notice we measure only the visible wires (the
804+ wires=list(range(n_v)) in qml.probs marginalizes out the hidden
829805qubit).
830806</ p >
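< p > For concreteness, one possible realization of a circuit with this
interface is sketched below; the single-rotation-per-qubit layout and the
CNOT pattern are illustrative assumptions, not the only choice.
</ p >

<!-- !bc pycod -->
< pre >
# Sketch: bipartite ansatz with n_v + n_h parameters, visible probabilities out
import pennylane as qml
from pennylane import numpy as np

n_v, n_h = 2, 1
dev = qml.device("default.qubit", wires=n_v + n_h)

@qml.qnode(dev)
def circuit(params):
    for w in range(n_v + n_h):              # one rotation per qubit
        qml.RY(params[w], wires=w)
    for i in range(n_v):                    # entangle visible with hidden only
        for j in range(n_v, n_v + n_h):
            qml.CNOT(wires=[i, j])
    # measuring only the visible wires marginalizes out the hidden qubit
    return qml.probs(wires=list(range(n_v)))

params = np.random.uniform(0, np.pi, n_v + n_h, requires_grad=True)
print(circuit(params))                      # [p(00), p(01), p(10), p(11)]
</ pre >
<!-- !ec -->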
831807
808+ <!-- !split -->
809+ < h2 id ="kullback-leibler-divergence " class ="anchor "> Kullback-Leibler divergence </ h2 >
810+
832811< p > Next, we train this model to match a target dataset distribution.
833- Suppose our data has distribution target = [p(00), p(01), p(10),
834- p(11)]. We can define the (classical) loss as the Kullback-Leibler
835- divergence D_{\rm KL}(p_{\rm data}\Vert q_\theta ) or simply the
812+ Suppose our data has distribution target = \( [p(00), p(01), p(10),
813+ p(11)] \). We can define the (classical) loss as the Kullback-Leibler
814+ divergence \( D_{\mathrm{KL}}(p_{\mathrm{data}}\Vert p_\Theta) \) or simply the
836815negative log-likelihood. Then we update params by gradient descent.
837816PennyLane’s automatic differentiation can compute gradients via the
838817parameter-shift rule, but we show an explicit parameter-shift
@@ -869,19 +848,6 @@ <h2 id="implementation-with-pennylane" class="anchor">Implementation with PennyL
869848gradient, etc.). The above shows that PennyLane can seamlessly
870849integrate quantum circuit definitions with classical training logic.
871850</ p >
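< p > A compact version of such a training loop, assuming the circuit sketch
above and an assumed four-outcome target distribution, could look like this:
</ p >

<!-- !bc pycod -->
< pre >
# Sketch: KL-divergence training of the circuit sketched above (toy target)
import pennylane as qml
from pennylane import numpy as np

target = np.array([0.4, 0.1, 0.1, 0.4], requires_grad=False)

def kl_loss(params):
    q = circuit(params)                     # model probabilities p_Theta(v)
    # a small epsilon keeps the log finite for near-zero model probabilities
    return np.sum(target * (np.log(target) - np.log(q + 1e-12)))

opt = qml.GradientDescentOptimizer(stepsize=0.2)
params = np.random.uniform(0, np.pi, 3, requires_grad=True)   # n_v + n_h = 3
for step in range(200):
    params = opt.step(kl_loss, params)

print("trained model probabilities:", circuit(params))
</ pre >
<!-- !ec -->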
872-
873- $$
874- H = -\sum_{a=1}^N \Gamma_a,\sigma^x_a ;-;\sum_{a=1}^N b_a,\sigma^z_a ;-;\sum_{a < b } u_{ab},\sigma^z_a \sigma^z_b,
875- $$
876-
877- < p > where \( \sigma_a^x,\sigma_a^z \) are the Pauli matrices on qubit \( a \),
878- \( \Gamma_a \) is a “transverse” field, \( b_a \) are local fields (biases),
879- and \( u_{ab} \) are interaction strengths . Here each qubit can be
880- interpreted analogously to a classical unit, but the \( \sigma^x \) term
881- induces quantum superposition. One can also include \( \sigma^y \) or
882- more complex terms, but we focus on this common ansatz.
883- </ p >
884-
885851<!-- ------------------- end of main content --------------- -->
886852</ div > <!-- end container -->
887853<!-- include javascript, jQuery *first* -->