Quantum Computing for the Uninitiated: The Basics
TL;DR: Cheat Sheet for Quantum Computing. Target audience is non-physicists and no physics background required. This is the very first post in the series presenting the very basics to get started. Long and technical.
Posts in this series (so far).
My apologies for incomplete references—this should merely serve as an overview.
This will be a series on quantum computing. Our perspective here will be a more mathematical or computer science one. I am not a physicist so I will not be able to provide sophisticated physical interpretations. Nonetheless, I will try to provide physics context here and there to highlight the difficulties when going from the rather abstract mathematical formalism of quantum mechanics and quantum computing to the real (physical) world, which leads to many challenging—sometimes philosophical—problems. Feel free to comment if you have suggestions for improvements.
In this first installment we will really just look at the basics of quantum computing and I will end this post with a famous motivating example showing the power of quantum mechanics. Most of what we are going to see today is linear algebra with Dirac notation; consider this a warm-up to get used to the notation as well as a refresher on linear algebra basics in the context of quantum mechanics, which is the basis for quantum computing. For a more extensive introduction, check out [dW19], [M07], and [P21], which I heavily relied upon and from which some of the examples are taken. I also extensively used wikipedia, which has quite accessible articles on most of the basic stuff that we will see today.
Dirac notation
We will be working in Hilbert spaces over the complex numbers. A very useful notation in quantum mechanics is the Dirac notation (also called: bra-ket notation), which is used to write quantum states, which in turn are nothing else but special vectors in that Hilbert space. Slightly abusing notation, and following Dirac’s original intent, an element $\phi \in \mathcal H$ on the primal side is a ket, written as $\ket{\phi}$, and corresponds to a column vector, while an element $\psi \in \mathcal H$ on the dual side is a bra, written as $\bra{\psi}$, corresponding to a row vector. This notation has many advantages: it ensures that we automatically distinguish between primal and dual, and the inner product follows naturally; we will see all this in a second.
ket and bra
Usually we will have an orthonormal basis (say, $\ket{0}, \dots, \ket{N-1}$ of $N$ vectors) that generates our Hilbert space $\mathcal H = \langle \ket{0}, \dots, \ket{N-1} \rangle$ and each element $\ket{\phi} \in \mathcal H$ (abusing notation here), is given by:
\[\ket{\phi} = \sum_{i = 0}^{N-1} \alpha_i \ket{i} \qquad \text{ with } \qquad \alpha_i \in \CC,\]equivalently, due to the standard isomorphism in the finite dimensional case, we can write
\[\newcommand\vec[1]{\begin{pmatrix}#1\end{pmatrix}} \ket{\phi} = \vec{\alpha_0 \\ \vdots \\ \alpha_{N-1}},\]and naturally associated with each ket $\ket{\phi}$ is a bra $\bra{\phi}$, which is defined as the conjugate transpose of $\ket{\phi}$:
\[\newcommand\vec[1]{\begin{pmatrix}#1\end{pmatrix}} \bra{\phi} = \vec{\alpha_0^\esx, \dots, \alpha_{N-1}^\esx},\]where $\esx$ denotes complex conjugation, mapping a complex number $\alpha = x + iy$ to $\alpha^\esx = x + i (-y) = x - i y$. Note that the bra-ket notation is simply a different notation for vectors; in particular, it holds:
\[\ket{a \phi + b \gamma} = a \ket{\phi} + b \ket{\gamma} \qquad \text{ and } \qquad \bra{a \phi + b \gamma} = a^\esx \bra{\phi} + b^\esx \bra{\gamma}.\]However, the bra notation has the built-in conjugate for its coefficients, which ensures that basically all properties, e.g., of the inner product simply follow from applying the “Euclidean”-style inner product.
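As a quick sanity check on the notation, the ket/bra correspondence is one line in plain numpy (a sketch, not part of the original setup): a ket is a column vector and its bra is the conjugate transpose.

```python
import numpy as np

# |phi> as a column vector with complex coefficients
ket_phi = np.array([[1 + 2j], [3 - 1j]])
# <phi| is the conjugate transpose of |phi>
bra_phi = ket_phi.conj().T
print(bra_phi)  # [[1.-2.j 3.+1.j]]
```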
Inner product
With the above we naturally obtain our scalar product as our basis is orthonormal, i.e., \(\braket{i \mid j} = \delta_{ij}\). To this end, let
\[\newcommand\vec[1]{\begin{pmatrix}#1\end{pmatrix}} \ket{\phi} = \vec{\alpha_0 \\ \vdots \\ \alpha_{N-1}} \qquad \text{ and } \qquad \bra{\psi} = \vec{\beta_0^\esx, \dots, \beta_{N-1}^\esx},\]then we have that
\[\newcommand\vec[1]{\begin{pmatrix}#1\end{pmatrix}} \braket{\psi \mid \phi} = \vec{\beta_0^\esx, \dots, \beta_{N-1}^\esx} \cdot \vec{\alpha_0 \\ \vdots \\ \alpha_{N-1}} = \sum_{i = 0}^{N-1} \beta_i^\esx \alpha_i,\]where we have exploited the built-in conjugate in the bra.
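In code, the inner product is just the bra (row vector) times the ket (column vector); a minimal numpy sketch, where `np.vdot` conveniently conjugates its first argument:

```python
import numpy as np

alpha = np.array([1j, 1]) / np.sqrt(2)  # coefficients of |phi>
beta = np.array([1, 0], dtype=complex)  # coefficients of |psi>
# <psi|phi> = sum_i beta_i^* alpha_i; np.vdot conjugates the first argument
inner = np.vdot(beta, alpha)
assert np.isclose(inner, 1j / np.sqrt(2))
```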
Properties of $\braket{\psi \mid \phi}$.
a. $\braket{\psi \mid \phi}$ is a Hermitian form, i.e., $\braket{\psi \mid \phi} = \braket{\phi \mid \psi}^\esx$
b. linear in right-hand side: $\braket{\psi \mid a \phi + b \gamma} = a \braket{\psi \mid \phi} + b \braket{\psi \mid \gamma}$
c. anti-linear in left-hand side: $\braket{a \psi + b \delta \mid \phi} = a^\esx \braket{\psi \mid \phi} + b^\esx \braket{\delta \mid \phi}$
d. $\braket{\psi \mid \phi} \in \CC$
e. $\braket{\phi \mid \phi} \in \RR$ and $\braket{\phi \mid \phi} > 0$ iff $\ket{\phi} \neq 0$
We also obtain the squared norm $\norm{\ket{\phi}}^2$
\[\norm{\ket{\phi}}^2 = \braket{\phi \mid \phi} =\sum_{i = 0}^{N-1} \alpha_i^* \alpha_i = \sum_{i = 0}^{N-1} \abs{\alpha_i}^2 \in \RR.\]In the following let $A^\dagger$ denote the adjoint of the matrix $A$, which is nothing else but the conjugate transpose of $A$, i.e., $A^\dagger = (A^T)^\esx$. In particular, if $A$ corresponds to multiplication with $z \in \CC$, then $A^\dagger$ corresponds to multiplication with $z^\esx$. It is useful here to extend the dagger notion also to vectors to render bras and kets dual to each other, i.e., $(\ket{\phi})^\dagger = \bra{\phi}$, which is in line with our definition of the bra as the conjugate transpose of the ket. With the general rule that the adjoint of a product is equal to the reverse-order product of the adjoints, most of the rules below follow naturally; see also [M07] for a broader exposition. For some of those rules, we assume that we are working with finite-dimensional vector spaces.
Useful rules.
a. $\ket{A \phi} = A \ket{\phi}$ and $\bra{A\phi} = \bra{\phi} A^\dagger$.
b. $(A \ket{\phi})^\dagger = (\ket{A \phi})^\dagger = \bra{\phi} A^\dagger$.
c. $A (\alpha \ket{\phi} + \beta \ket{\psi}) = \alpha A \ket{\phi} + \beta A \ket{\psi}$ and $(\alpha \bra{\phi} + \beta \bra{\psi}) A = \alpha \bra{\phi} A + \beta \bra{\psi} A = \alpha \bra{A^\dagger \phi} + \beta \bra{A^\dagger \psi}$.
d. If $U$ is a unitary matrix, i.e., $U^\dagger U = U U^\dagger = I$ then $\braket{\psi \mid \phi} = \braket{U \psi \mid U\phi}$.
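Rule (d) is easy to verify numerically; here is a small sketch in plain numpy (not part of the original post) that obtains a random unitary from the QR decomposition of a random complex matrix and checks that the inner product is preserved:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
# QR decomposition of a random complex matrix yields a unitary Q
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
lhs = np.vdot(psi, phi)          # <psi|phi>
rhs = np.vdot(Q @ psi, Q @ phi)  # <U psi|U phi>
assert np.isclose(lhs, rhs)
```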
Pure States
With this we can now define what pure (quantum) states are. These are nothing else but linear combinations of elements from our orthonormal basis $\{\ket{i}\}_{i = 0, \dots, N-1}$.
\[\ket{\phi} = \sum_{i = 0}^{N-1} \alpha_i \ket{i} \qquad \text{ with } \qquad \alpha_i \in \CC,\]and additionally we require that
\[\norm{\ket{\phi}}^2 = \braket{\phi \mid \phi} =\sum_{i = 0}^{N-1} \alpha_i^* \alpha_i = \sum_{i = 0}^{N-1} \abs{\alpha_i}^2 = 1,\]and hence also $\norm{\ket{\phi}} = 1$.
Measurements
An important operation that we can apply to a state is a measurement with the aim to extract information from the state.
Projective Measurements
We first consider so-called projective measurements. To this end, let us briefly recall the definition and properties of an orthogonal projection matrix:
Definition and Properties: Orthogonal projection matrices. A square matrix $P : \mathcal H \rightarrow \mathcal H$ is an orthogonal projection matrix if:
\[P^2 = P = P^\dagger\]
Properties.
a. $\braket{\psi \mid P \phi} = \braket{P \psi \mid \phi}$ (using $P = P^\dagger$)
b. Eigenvalues of $P$ are $0$ and $1$ only
c. $\norm{\ket{P \phi}}^2 = \braket{P \phi \mid P \phi} = \braket{\phi \mid P^\dagger P \mid \phi} = \braket{\phi \mid P \mid \phi} = \tr(P \ketbra{\phi}{\phi})$. The matrix $\rho = \ketbra{\phi}{\phi}$ here is called density matrix and we will revisit it later.
See also wikipedia for more useful properties. With this we can define the measurement operation:
Definition: Measurement. A measurement with $m$ outcomes is a set of orthogonal projection matrices $P_1, \dots, P_m$ that decompose the identity matrix $I = \sum_{i = 1}^m P_i$.
Note that the above definition implies that $P_i P_j = 0$ for $i \neq j$: simply multiply $I = \sum_{i = 1}^m P_i$ with some $P_j$ from the right, then reorder to $0 = P_1P_j + \dots + P_j(P_j - I) + \dots + P_mP_j$. Since the images of two distinct $P_i$ only intersect in $0$, it follows that $P_iP_j = 0$ for all $i \neq j$; full proof left to the interested reader or see, e.g., Theorem 2.13 here.
We can now write $\ket{\phi} = I \ket{\phi} = \sum_{i = 1}^m P_i \ket{\phi}$. Since $\norm{\ket{\phi}}^2 = 1$ and the projections are orthogonal, i.e., $P_iP_j = 0$ for $i \neq j$ and $P_i^2 = P_i$, we have that $1 = \sum_{i = 1}^m \norm{P_i \ket{\phi}}^2$, i.e., we obtain a probability distribution. The process of measuring now samples an $i$ according to this probability distribution, i.e., with probability $\norm{\ket{P_i\phi}}^2$, and maps $\ket{\phi} \mapsto \ket{P_i \phi} / \norm{\ket{P_i\phi}}$, which is again a (valid) state. After measuring, the state $\ket{\phi}$ ends up in an eigenstate of the measurement and thus the state changes, except when $\ket{\phi}$ already is in an eigenstate of the measurement, in which case it does not change.
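The measurement process just described can be simulated directly; a minimal numpy sketch (not part of the original post) measuring a random state with the rank-1 projectors $P_i = \ketbra{i}{i}$:

```python
import numpy as np

rng = np.random.default_rng(1)
# a random pure state on N = 4 basis vectors
phi = rng.normal(size=4) + 1j * rng.normal(size=4)
phi /= np.linalg.norm(phi)
# outcome probabilities ||P_i |phi>||^2 = |alpha_i|^2 form a distribution
probs = np.abs(phi) ** 2
assert np.isclose(probs.sum(), 1.0)
# sample an outcome i and collapse: |phi> -> P_i|phi> / ||P_i|phi>||
i = rng.choice(4, p=probs)
post = np.zeros(4, dtype=complex)
post[i] = phi[i] / abs(phi[i])
assert np.isclose(np.linalg.norm(post), 1.0)  # the post-measurement state is again a valid state
```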
Note that measurements are invariant w.r.t. the global phase, i.e., $\ket{\phi}$ and $e^{ir} \ket{\phi}$ produce the same measurement outcomes and statistics and the obtained states after measurement are also identical up to $e^{ir}$-rotation. In fact the global rotation $e^{ir}$ only affects the phase of the complex coefficients but not their absolute value. This is not to be confused with the relative phase differences in superpositions which are important.
Measuring in the computational basis
Later, we will be mostly concerned with the case where the $P_i$ are given as rank-1 projectors into the actual (computational) basis $\ket{0}, \dots, \ket{N-1}$, i.e., $P_i = \ketbra{i}{i}$. Let $\ket{\phi} = \sum_{i = 0}^{N-1} \alpha_i \ket{i}$. We then have:
\[P_j \ket{\phi} = \ketbra{j}{j} \ket{\phi} = \sum_{i = 0}^{N-1} \alpha_i \ketbra{j}{j} \ket{i} = \alpha_j \ket{j},\]where we purposefully (only this time) did not clean up the doubled bra-ket separators, for the sake of exposition. Thus we obtain that we measure $P_j = \ketbra{j}{j}$ with probability $\norm{\ket{P_j\phi}}^2 = \norm{\alpha_j \ket{j}}^2 = \Abs{\alpha_j}^2$. Alternatively, just for the sake of getting used to the bra-ket notation:
\[\begin{align*} \norm{\ket{P_j\phi}}^2 & = \norm{\ket{j}\bra{j} \ket{\phi}}^2 = \braket{\phi \mid \ketbra{j}{j} \mid \phi} = \braket{\phi \mid j} \braket{j \mid \phi} \\ & = \braket{j \mid \phi}^\esx \braket{j \mid \phi} = \Abs{\braket{j \mid \phi}}^2 \\ & = \Abs{\sum_{i = 0}^{N-1} \braket{j \mid \alpha_i i}}^2 = \Abs{\sum_{i = 0}^{N-1} \alpha_i \braket{j \mid i}}^2 = \Abs{\alpha_j}^2. \end{align*}\]The resulting state after measuring $j$ via $P_j$ is
\[\ket{P_j\phi} / \norm{\ket{P_j\phi}} = \frac{\alpha_j}{\Abs{\alpha_j}} \ket{j},\]i.e., when measuring in the computational basis our superposition collapses to a classical state.
The Physics spin: Measurements, collapse of superpositions, and Schrödinger’s cat. While we quite nonchalantly applied our measurements, e.g., by simply multiplying with the projection matrix and renormalizing, the physical reality seems to be much more complicated. In fact, up to today it is unclear when exactly the measurement happens that forces the quantum superposition to collapse to a classical state. The famous thought experiment of Schrödinger made this problem very apparent. Simplifying, the box is built so that the life of the cat inside is linked one-to-one to a quantum superposition, i.e., it is a mechanism to upscale the effect from the atomic domain to the macroscopic one. Now when does the measurement take place that decides the fate of the cat? When you open the box? What if you can hear the cat being alive in the box? I.e., when exactly does the superposition cease to be a superposition and collapse to a classical state? There are tons of interpretations of quantum mechanics that give different answers to the questions posed by Schrödinger’s cat. The most prevalent one, which also seems to be the most unsatisfying one as it basically states the obvious, is the so-called Copenhagen interpretation: “A system stops being a superposition of states and becomes either one or the other when an observation takes place.” Now, what is an “observation”? For further reading check out wikipedia, but beware: this easily becomes a rabbit hole.
Note that while we have seen only rank-1 projectors in this section, it is very well possible to also have higher rank projectors. For example consider the state:
\[\ket{\phi} = \frac{1}{\sqrt{3}} \ket{1} + \sqrt{\frac{2}{3}} \ket{N},\]and the projectors (assuming $N$ is even)
\[P_1 = \sum_{i = 1}^{N/2} \ketbra{i}{i} \qquad \text{and} \qquad P_2 = \sum_{i = N/2 + 1}^{N} \ketbra{i}{i}.\]Clearly, $I = P_1 + P_2$. We measure with the first projector $P_1$ with probability
\[\norm{P_1 \ket{\phi}}^2 = \tr(P_1 \ketbra{\phi}{\phi}) = 1/3\]and we end up in state $P_1 \ket{\phi} / \norm{P_1 \ket{\phi}} = \ket{1}$. Similarly, we measure the second projector $P_2$ with probability $\norm{P_2 \ket{\phi}}^2 = \tr(P_2 \ketbra{\phi}{\phi}) = 2/3$ ending up in state $\ket{N}$.
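This example is easily verified numerically; a small numpy sketch (not part of the original post) with $N = 4$, mapping the labels $\ket{1}, \dots, \ket{N}$ to standard basis indices $0, \dots, N-1$:

```python
import numpy as np

N = 4
phi = np.zeros(N, dtype=complex)
phi[0] = 1 / np.sqrt(3)      # coefficient of |1>
phi[N - 1] = np.sqrt(2 / 3)  # coefficient of |N>
P1 = np.diag([1, 1, 0, 0]).astype(complex)  # projects onto span{|1>, ..., |N/2>}
P2 = np.eye(N) - P1                         # projects onto span{|N/2+1>, ..., |N>}
p1 = np.linalg.norm(P1 @ phi) ** 2          # p1 ≈ 1/3
p2 = np.linalg.norm(P2 @ phi) ** 2          # p2 ≈ 2/3
```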
Remark (Probability of state transition). Finally, we consider a curiosity that we are going to revisit later. Let $\ket{\phi} = \sum_{i = 0}^{N-1} \alpha_i \ket{i}$ and $\ket{\psi} = \sum_{i = 0}^{N-1} \beta_i \ket{i}$ be two states expressed in our computational basis, let us define the rank-1 projector $P = \ketbra{\psi}{\psi}$, and let $Q = I - P$ be its complementary projector. Now let us consider the probability of measuring $\phi$ with $P$. By the above this is: \[ \norm{P\ket{\phi}}^2 = \braket{\phi \mid P \mid \phi} = \braket{\phi \mid \ketbra{\psi}{\psi} \mid \phi} = \Abs{\braket{\psi \mid \phi}}^2, \] and the last expression can be written in the computational basis by linearity \[ \Abs{\braket{\psi \mid \phi}}^2 = \Abs{\sum_{i = 0}^{N-1}\sum_{j = 0}^{N-1} \beta_i^\esx \alpha_j \braket{i \mid j}}^2 = \Abs{\sum_{i = 0}^{N-1} \beta_i^\esx \alpha_i}^2, \] using that $\braket{i \mid j} = \delta_{ij}$. Moreover, if we end up measuring with $P$ we obtain the post-measurement state: \[ \ket{P \phi} / \norm{\ket{P \phi}} = \ket{\psi}\braket{\psi \mid \phi} / \norm{\ket{P \phi}} = \ket{\psi}. \] So what did this exercise show us? In some meaningful way, the probability of $\phi$ transitioning to $\psi$ is equal to $\Abs{\braket{\psi \mid \phi}}^2 = \Abs{\sum_{i = 0}^{N-1} \beta_i^\esx \alpha_i}^2$. I am simplifying a little here because there is some arbitrariness in applying the measurement $\{P, Q\}$ and not any other. We are going to discuss this a little later, but keep this formula in mind. It will prove quite helpful.
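A quick numerical sanity check of this remark, with random states in plain numpy (a sketch, not part of the original post):

```python
import numpy as np

rng = np.random.default_rng(2)
phi = rng.normal(size=3) + 1j * rng.normal(size=3)
phi /= np.linalg.norm(phi)
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)
P = np.outer(psi, psi.conj())     # rank-1 projector |psi><psi|
p = np.linalg.norm(P @ phi) ** 2  # probability of measuring phi with P
assert np.isclose(p, abs(np.vdot(psi, phi)) ** 2)
# the post-measurement state equals |psi> up to a global phase
post = P @ phi / np.linalg.norm(P @ phi)
assert np.isclose(abs(np.vdot(psi, post)), 1.0)
```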
Observables
Closely connected to projective measurements are observables.
Definition: Observable. A projective measurement with $m$ distinct outcomes $\lambda_1, \dots, \lambda_m \in \RR$ given by a set of orthogonal projection matrices $P_1, \dots, P_m$ that decompose the identity matrix $I = \sum_{i = 1}^m P_i$ form the observable $M = \sum_{i = 1}^m \lambda_i P_i$.
Observe that $M$ is Hermitian, i.e., $M = M^\dagger$ as $\lambda_i \in \RR$ and $P_i = P_i^\dagger$ are Hermitian themselves for $i = 1, \dots, m$ (recall: if $M$ is Hermitian, all its eigenvalues are real and eigenvectors of distinct eigenvalues are orthogonal). Moreover, any Hermitian matrix $M$ corresponds to an observable, simply by taking its spectral decomposition $M = \sum_{i = 1}^m \lambda_i P_i$ with $\lambda_i \in \RR$ as $M$ is Hermitian. Thus there is a correspondence between observables and Hermitian matrices.
Observables allow us to very easily compute the expected value of a measurement. As before, the probability of measuring outcome $i$ is simply $\norm{P_i \ket{\phi}}^2$, thus we obtain the expected value of the measurement as:
\[\tag{EObservable} \sum_{i = 1}^m \lambda_i \norm{P_i \ket{\phi}}^2 = \sum_{i = 1}^m \lambda_i \tr(P_i \ketbra{\phi}{\phi}) = \tr(M \ketbra{\phi}{\phi}).\]Positive-Operator-Valued Measure (POVM) measurements
The measurements above are so-called projective measurements as they use projection matrices. However, if we are not interested in the resulting state after measuring, there is another form of measurement, the so-called Positive-Operator-Valued Measure (POVM) measurement. I will keep it brief for now until we need POVMs; for more details see wikipedia. Here we are given $m$ positive semidefinite matrices $E_1, \dots, E_m$ (effectively relaxing the 0/1 eigenvalue requirement of the projection matrices), so that $I = \sum_{i = 1}^{m} E_i$. Similar to what we have done before, given a state $\ket{\phi}$ the probability of measuring outcome $j$ is $\tr(E_j \ketbra{\phi}{\phi})$; however, and this is important, it might not hold that this probability is given by $\norm{E_j \ket{\phi}}^2$. In the derivation from earlier we basically used
\[\tr(E_j \ketbra{\phi}{\phi}) = \tr(E_j^2 \ketbra{\phi}{\phi}) = \tr(E_j \ketbra{\phi}{\phi} E_j) = \norm{E_j \ket{\phi}}^2\]in particular the first equality can easily fail if $E_j$ is not a projector, i.e., $E_j^2 = E_j$ might not hold.
There are a couple of things that are different compared to projective measurements (also sometimes abbreviated PVM for projection-valued measure) and we will look at them in more detail below. Most importantly, the elements $E_1, \dots, E_m$ of the POVM do not have to be orthogonal anymore and as such, in particular, we can have $m \geq N$ elements where $N$ is the dimension of the Hilbert space under consideration. This can be helpful in some applications and was not possible for PVMs due to the orthogonality condition. In fact, projective measurements are the special case of POVMs with the additional conditions $E_i^2 = E_i$ and $E_i E_j = 0$ for $i \neq j$. On the other hand, it is not obvious how to characterize the post-measurement state. We might think of POVMs as being to PVMs what mixed states are to pure states.
Why do we care? The reason is that when the two states we want to distinguish are orthogonal, we can simply use a PVM; however, if they are not orthogonal then there is neither a PVM nor a POVM that can separate these two with certainty; it is simply impossible. In fact, this impossibility is used in several quantum applications. However, there are POVMs that never make a mistake but sometimes return that they cannot distinguish the states, i.e., return “I don’t know”. As an example consider the two states:
\[\ket{0} \qquad \text{and} \qquad \ket{+} \doteq \frac{1}{\sqrt{2}}(\ket{0} + \ket{1})\]and we consider the three psd matrices (with $\ket{-} \doteq \frac{1}{\sqrt{2}}(\ket{0} - \ket{1})$):
\[E_0 \doteq \frac{1}{2}\ketbra{-}{-} \qquad \text{and} \qquad E_1 \doteq \frac{1}{2} \ketbra{1}{1} \qquad \text{and} \qquad E_2 \doteq I - E_0 - E_1,\]which are psd with eigenvalues \(\{0, 1/2\}\) for $E_0$ and $E_1$ and \(\{\approx 0.146, \approx 0.854\}\) for $E_2$ and by definition sum up to $I$. We obtain the following measurement outcomes. If the state is $\ket{0}$ and we measure with the POVM we have the outcomes
\[0 \text{ w.p. } \tr(E_0 \ketbra{0}{0}) = 1/4 \qquad 1 \text{ w.p. } \tr(E_1 \ketbra{0}{0}) = 0 \qquad 2 \text{ w.p. } \tr(E_2 \ketbra{0}{0}) = 3/4.\]If on the other hand the state is $\ket{+}$ and we measure with the POVM we have the outcomes
\[0 \text{ w.p. } \tr(E_0 \ketbra{+}{+}) = 0 \qquad 1 \text{ w.p. } \tr(E_1 \ketbra{+}{+}) = 1/4 \qquad 2 \text{ w.p. } \tr(E_2 \ketbra{+}{+}) = 3/4.\]While there is no PVM in the original space that can achieve the same thing, by slightly extending the dimension of the space we can find a PVM that generates the same outcome distribution. This is known as Naimark’s dilation theorem (also Neumark’s theorem; see also here for a formulation directly applicable to POVMs). This theorem is crucial as it allows one to physically realize POVMs by means of PVMs. Moreover, there is also an interesting twist in terms of the post-measurement state that we brushed aside so far: when measuring with a POVM the post-measurement state is actually not defined by the POVM but rather by the PVM that physically realizes it. There is an infinite number of such realizations of the POVM by means of PVMs, simply via applying unitaries. Thus if we need the post-measurement state we need to realize the POVM by means of a PVM and compute its post-measurement state. Moreover, note that due to non-orthogonality, when applying a POVM the measurement is not repeatable in the sense that measuring twice can change the result the second time.
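The outcome distributions of the unambiguous-discrimination POVM above are easy to reproduce numerically; a plain numpy sketch (not part of the original post):

```python
import numpy as np

k0 = np.array([1, 0], dtype=complex)  # |0>
k1 = np.array([0, 1], dtype=complex)  # |1>
plus = (k0 + k1) / np.sqrt(2)         # |+>
minus = (k0 - k1) / np.sqrt(2)        # |->
E0 = 0.5 * np.outer(minus, minus.conj())
E1 = 0.5 * np.outer(k1, k1.conj())
E2 = np.eye(2) - E0 - E1
assert np.allclose(E0 + E1 + E2, np.eye(2))  # the POVM decomposes the identity

def outcome_probs(state):
    rho = np.outer(state, state.conj())  # density matrix |state><state|
    return np.array([np.trace(E @ rho).real for E in (E0, E1, E2)])

assert np.allclose(outcome_probs(k0), [1/4, 0, 3/4])    # outcome 0 rules out |0>... never fires wrongly: it rules out |+> for sure
assert np.allclose(outcome_probs(plus), [0, 1/4, 3/4])  # outcome 1 rules out |0>; outcome 2 is "don't know"
```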
Pure states vs. Mixed states vs. Ensembles
We will now discuss pure states and mixed states. You might want to read this twice as there is something non-trivial going on here. We will later revisit pure vs. mixed states also for more complex setups but it is instructive to start with the simple case first.
Let us first consider the pure state
\[\ket{\phi} = \frac{1}{\sqrt{2}} (\ket{0} + \ket{1}).\]As stated above this is a pure state as it is a vector of norm $1$ in the Hilbert space generated by $\ket{0}$ and $\ket{1}$. Now let us further define the observable
\[M = \ketbra{0}{0} - \ketbra{1}{1}.\]If we now measure with $M$, we obtain that the expected value of the measurement is
\[\tr(M \ketbra{\phi}{\phi})\]and after measuring via $M$, we find the system in state $\ket{0}$ with probability $1/2$ and in state $\ket{1}$ with probability $1/2$.
We can also define a so-called ensemble which is a statistical mixture of states via a so-called density matrix $\rho$
\[\rho = \frac{1}{2} \ketbra{0}{0} + \frac{1}{2} \ketbra{1}{1}.\]It is easy to see that the density matrix is positive semidefinite, Hermitian, and has trace $1$. Density matrices are a generalization of the usual (pure) state description and can also capture mixed states and ensembles (as we do here); see wikipedia for more. In a nutshell, mathematically a mixed state is a convex combination of pure states. This ensemble describes our degree of knowledge, stating that with probability $1/2$ we have that $\rho$ is the state $\ket{0}$ and with probability $1/2$ we have that $\rho$ is the state $\ket{1}$.
It is very important not to confuse a superposition, which captures fundamental quantum uncertainty, with ensembles, which capture our degree of knowledge about the system. So in some sense we have two types of uncertainties: fundamental quantum uncertainty and statistical uncertainty. I found the following two statements helpful to differentiate the two:
Statistical mixtures represent the degree of knowledge whilst the uncertainty within quantum mechanics is fundamental. [wikipedia]
and
A mixed state is a mixture of probabilities of physical states, not a coherent superposition of physical states.
Note we can also measure an ensemble w.r.t. an observable $M$ via its density matrix $\rho$:
\[\tag{EEnsemble} \tr(M\rho),\]which is nothing else but the probability weighted average of the outcomes for the individual states comprising the ensemble.
The Physics spin: Ensemble interpretation. A way to think about ensembles is that if we have infinitely many copies of a system then the ensemble captures the distribution of states. Closely related to this is the Ensemble Interpretation (EI) that considers a quantum state not to be an exhaustive representation of an individual physical system but only a description of an ensemble of similarly prepared systems. This is in contrast to the Copenhagen Interpretation (CI). From wikipedia; see [B14] for more background:
CI: A pure state \(\ket{y}\) provides a “complete” description of an individual system, in the sense that a dynamical variable represented by the operator \(Q\) has a definite value (\(q\), say) if and only if \(Q \ket{y} = q \ket{y}\).
EI: A pure state describes the statistical properties of an ensemble of identically prepared systems, of which the statistical operator is idempotent.
Now you might be tempted to think that this is a more metaphysical problem than a mathematical one. Let me convince you with the next example that this is not the case and, in fact, quantum uncertainty behaves very differently than normal statistical uncertainty and probability theory.
Example: Superposition vs. mixture of states. Consider the following two states:
\[\phi_1 = \frac{1}{\sqrt{2}} (\ket{0} + \ket{1}) \qquad \text{and} \qquad \phi_2 = \frac{1}{\sqrt{2}} (\ket{0} - \ket{1}),\]
and let us define the observable
\[M = \ketbra{0}{0} - \ketbra{1}{1}.\]
With what we have seen so far, when measuring with $M$, for state $\ket{\phi_1}$ we end up in state:
\[
\ket{0} \text{ w.p. } \norm{\ketbra{0}{0} \phi_1}^2 = 1/2 \qquad\qquad \ket{1} \text{ w.p. } \norm{\ketbra{1}{1} \phi_1}^2 = 1/2,
\]
and for state $\ket{\phi_2}$ we end up in state:
\[
\ket{0} \text{ w.p. } \norm{\ketbra{0}{0} \phi_2}^2 = 1/2 \qquad\qquad \ket{1} \text{ w.p. } \norm{\ketbra{1}{1} \phi_2}^2 = 1/2,
\]
where we used “w.p.” as a short-hand for “with probability”. Although $\phi_1 \neq \phi_2$, under the observable $M$ we end up in states $\ket{0}$ and $\ket{1}$ uniformly, with the same distribution for $\ket{\phi_1}$ and $\ket{\phi_2}$.
Now let us first consider a uniform mixture of these two states via the density matrix:
\[\rho = \frac{1}{2} \ketbra{\phi_1}{\phi_1} + \frac{1}{2} \ketbra{\phi_2}{\phi_2}.\]
So if we measure with $M$, with what probability do we obtain state $\ket{0}$? With probability $1/2$, the system is in state $\ket{\phi_1}$ and we have just computed that in this case we measure $\ket{0}$ with probability $1/2$, i.e., by the product rule that is a probability of $1/4$. Moreover, with probability $1/2$ the system is in state $\ket{\phi_2}$ and we have just computed that in this case we measure $\ket{0}$ with probability $1/2$ as well. Thus again $1/4$ probability, so that we obtain a total probability of measuring $\ket{0}$ of $1/4 + 1/4 = 1/2$; basic probability calculation. Moreover, we can also compute the expected value of the observable via the rules from above. Via (EEnsemble) we have
\[
\tr(M\rho) = \frac{1}{2} \tr(M \ketbra{\phi_1}{\phi_1}) + \frac{1}{2} \tr(M \ketbra{\phi_2}{\phi_2}),
\]
and via (EObservable) we obtain
\[
\tr(M\rho) = \frac{1}{2} (\norm{\ketbra{0}{0} \ket{\phi_1}}^2 - \norm{\ketbra{1}{1} \ket{\phi_1}}^2) + \frac{1}{2} (\norm{\ketbra{0}{0} \ket{\phi_2}}^2 - \norm{\ketbra{1}{1} \ket{\phi_2}}^2) = 0.
\]
Now let us consider the “uniform” superposition of $\phi_1$ and $\phi_2$. Recall that both $\phi_1$ and $\phi_2$ are in state $\ket{0}$ and $\ket{1}$ with probability $1/2$ after measurement with $M$. We consider the superposition $\phi$ defined as:
\[\phi = \frac{1}{\sqrt{2}} (\phi_1 + \phi_2) = \ket{0}.\]
Now we have
\[\ket{0} \text{ w.p. } \norm{\ketbra{0}{0} \phi}^2 = 1,\]
and the expected value under $M$ is:
\[\tr(M\ketbra{\phi}{\phi}) = 1.\]
So what happened here and how is this possible? The key is that in a superposition the amplitudes can interfere, as is the case here. Slightly metaphysical: this interference allows for something like “negative probabilities”, so that both $\phi_1$ and $\phi_2$ are maximally random but their superposition is not.
For those of you that like to implement things, a quick computation of the above with qutip in python roughly looks as follows; see also this colab notebook:
from qutip import *
import math
N = 2
b0 = basis(N, 0) # |0>
b1 = basis(N, 1) # |1>
phi1 = 1/math.sqrt(2) * (b0 + b1)
phi2 = 1/math.sqrt(2) * (b0 - b1)
M = b0.proj() - b1.proj() # the observable
print("Probability: ", (b0.proj() * phi1).norm()**2) # prob of |0> when measuring |\phi_1> via M: 1/2
rho = 1/2 * phi1.proj() + 1/2 * phi2.proj() # density matrix
print("Expected Value Mixture: ", (M * rho).tr()) # expected value of M for mixed state: 0.0
phi = phi1 + phi2
phi = phi / phi.norm()
print("Probability: ", (b0.proj() * phi).norm()**2) # prob of |0> when measuring |\phi>: 1.0
print("Expected Value State: ", (M * phi.proj()).tr()) # expected value of M for |\phi>: 1.0
So how do we know whether a state is a pure state or a mixed state? One of the easiest ways is looking at its density matrix $\rho$. The state given by the density matrix $\rho$ is pure if and only if $\tr(\rho^2) = 1$. This also gives rise to the notion of the linear entropy of a state given by its density matrix $\rho$, defined as:
\[S_L(\rho) \doteq 1 - \tr(\rho^2),\]so that $\rho$ is pure if and only if $S_L(\rho) = 0$. Similarly we can define the von Neumann entropy of a state given by its density matrix $\rho$ as:
\[S(\rho) \doteq - \tr(\rho \ln \rho),\]where $\ln$ is the natural matrix logarithm (see wikipedia for more information). In case $\rho$ is expressed in terms of its eigenvectors, i.e., $\rho = \sum_{i = 0}^{N-1} \eta_i \ketbra{i}{i}$, the von Neumann entropy simply becomes the Shannon entropy of the eigenvalues, i.e.,
\[S(\rho) = - \sum_{i = 0}^{N-1} \eta_i \ln \eta_i.\]Similarly, we have $S(\rho) = 0$ if and only if $\rho$ is a pure state. In fact we can think of both the linear entropy as well as the von Neumann entropy as a measure of mixedness of the state. The latter notion we will also revisit in the context of the entropy of entanglements. For a maximally mixed state the linear entropy is $1 - 1/N$ and the von Neumann entropy is $\ln N$. The linear entropy is usually much easier to compute as it does not require a spectral decomposition and for measuring purity of a state it is often sufficient.
A note for those that have guessed it already: the linear entropy is to the von Neumann entropy what the total variation distance is to the Kullback-Leibler divergence, or the mean-variance approximation to the entropy function; simply a Taylor/Mercator series approximation.
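Both entropies take only a few lines in plain numpy (a sketch, not part of the original post; the eigenvalue cutoff implements the convention $0 \ln 0 = 0$):

```python
import numpy as np

def linear_entropy(rho):
    return 1 - np.trace(rho @ rho).real

def von_neumann_entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # convention: 0 * ln 0 = 0
    return -np.sum(evals * np.log(evals))

pure = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
mixed = np.eye(2) / 2                             # maximally mixed, N = 2
assert np.isclose(linear_entropy(pure), 0) and np.isclose(von_neumann_entropy(pure), 0)
assert np.isclose(linear_entropy(mixed), 1 - 1/2)         # 1 - 1/N
assert np.isclose(von_neumann_entropy(mixed), np.log(2))  # ln N
```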
Finally, we close this section with a question: Why is the outcome of the measurement of $\ket{\phi_1}$ under $M$ not itself a mixed state of the form \[\tag{measureMixed}\tilde \rho = \frac{1}{2} \ketbra{0}{0} + \frac{1}{2} \ketbra{1}{1}?\]
The Bloch Sphere
The Bloch sphere is mostly a reparametrization of a $2$-level quantum system, e.g., generated by the basis $\ket{0}$ and $\ket{1}$, that allows for easy visualization. Note that every state in that system corresponds to two complex numbers defined by their respective real and imaginary parts, hence $4$ reals. What we can do is reparametrize by fixing the global phase of the state (as the global phase is meaningless with regards to the measurement distribution), effectively eliminating one dimension and allowing for a representation in a three-dimensional ball: the Bloch sphere; the normalization constraint then places pure states on its surface while mixed states lie in its interior. I will keep things super compact here; the interested reader is referred to the wikipedia article for further reading.
The easiest way to convert the coordinates is by starting from the density matrix $\rho$. Then we obtain the Bloch sphere coordinates as follows:
\[\rho = \begin{pmatrix} \rho_{11} & \rho_{12} \\ \rho_{21} & \rho_{22} \end{pmatrix} \mapsto 2 \begin{pmatrix} \re(\rho_{21}) \\ \im(\rho_{21}) \\ \rho_{11} - \frac{1}{2} \end{pmatrix}.\]Note that since $\rho$ is Hermitian, we have that $\rho_{11} \in \RR$.
Figure 1. Bloch sphere. (left) layout of Bloch sphere (middle) orthogonal vectors are antiparallel on the Bloch sphere (right) pure states (example in green) have length $1$ and are on the surface, mixed states (example in orange) have length strictly less than $1$ and are interior points.
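The coordinate map above is straightforward to implement; a small numpy sketch (not part of the original post) confirming that a pure state lands on the surface and the maximally mixed state at the center:

```python
import numpy as np

def bloch_vector(rho):
    # rho -> 2 * (Re(rho_21), Im(rho_21), rho_11 - 1/2), entries 1-indexed as in the text
    return 2 * np.array([rho[1, 0].real, rho[1, 0].imag, rho[0, 0].real - 0.5])

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_pure = np.outer(plus, plus.conj())  # pure state |+><+|
rho_mixed = np.eye(2) / 2               # maximally mixed state
assert np.isclose(np.linalg.norm(bloch_vector(rho_pure)), 1.0)   # on the surface
assert np.isclose(np.linalg.norm(bloch_vector(rho_mixed)), 0.0)  # center of the ball
```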
Tensoring up
So far we have only considered a single particle or unipartite system. As the saying goes: “You need two points of reference to measure distance or speed” and by the same token, once we go from unipartite systems to bipartite (or more generally multipartite) systems, things get significantly more interesting, e.g., by allowing for entanglement, which is key to the expressive power of quantum computing. Multipartite systems are simply obtained by taking the tensor product of multiple unipartite systems. More specifically, suppose we have multiple unipartite systems and their associated Hilbert spaces $\mathcal H_1, \dots, \mathcal H_\ell$, then the space of the composite system $\mathcal H$ is given by their tensor product:
\[\mathcal H \doteq \bigotimes_{i = 1}^{\ell} \mathcal H_i,\]and an element in $\mathcal H$ can be written as $\ket{q_1} \otimes \dots \otimes \ket{q_\ell}$; similarly we can consider the tensor of density matrices $\rho_1 \otimes \dots \otimes \rho_\ell$ to capture mixed states in composite systems. For a quick refresher, the tensor product is basically like the outer product (i.e., we form tuples), however with the additional structural properties of ensuring homogeneity w.r.t. addition and scalar multiplication; see wikipedia for a recap. This homogeneity basically also determines how linear maps act on the space. We recall the most important rules below; for simplicity we formulate them for the tensor product of two spaces $\mathcal H_1 \otimes \mathcal H_2$ but they hold more generally with the obvious generalizations:
Useful rules for tensor products.
a. (Linearity w.r.t. “+”): $(\ket{\phi} + \ket{\psi}) \otimes \ket{\kappa} = \ket{\phi} \otimes \ket{\kappa} + \ket{\psi} \otimes \ket{\kappa}$.
b. (Linearity w.r.t. “·”): for $s \in \CC$, we have $\ket{s \phi} \otimes \ket{\kappa} = s (\ket{\phi} \otimes \ket{\kappa}) = \ket{\phi} \otimes \ket{s \kappa}$.
c. (Tensor of linear maps): $(A \otimes B) (\ket{\phi} \otimes \ket{\kappa}) = A\ket{\phi} \otimes B \ket{\kappa}$.
d. (Linear maps as concatenation:) $A \otimes B = (A \otimes I) \circ (I \otimes B) = (I \otimes B) \circ (A \otimes I)$.
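Since these rules are easy to get wrong in index-heavy computations, here is a quick numerical check (a sketch in Python with NumPy, where `np.kron` plays the role of the tensor product in the standard basis):

```python
import numpy as np

rng = np.random.default_rng(0)
phi, psi, kappa = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
A, B = rng.standard_normal((2, 2, 2))
I = np.eye(2)
s = 2 - 3j

# (a) linearity w.r.t. "+"
assert np.allclose(np.kron(phi + psi, kappa), np.kron(phi, kappa) + np.kron(psi, kappa))
# (b) scalars move freely across the tensor product
assert np.allclose(np.kron(s * phi, kappa), s * np.kron(phi, kappa))
# (c) a tensor of linear maps acts componentwise
assert np.allclose(np.kron(A, B) @ np.kron(phi, kappa), np.kron(A @ phi, B @ kappa))
# (d) A ⊗ B factors as a concatenation
assert np.allclose(np.kron(A, B), np.kron(A, I) @ np.kron(I, B))
```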
An important operator will be the partial trace, which basically applies the trace operator to only some subset of the tensor components. Skipping the formalism (see wikipedia for details), the partial trace w.r.t. $\mathcal H_1$ (in short: \(\ptr{\mathcal H_1}\)) is the unique linear operator such that for any two matrices $A: \mathcal H_1 \rightarrow \mathcal H_1$ and $B: \mathcal H_2 \rightarrow \mathcal H_2$ it holds
\[\ptr{\mathcal H_1} (A \otimes B) = \tr(A) B.\]This gives rise to the partial trace on any element $M \in \mathcal H_1 \otimes \mathcal H_2$. Computationally, the partial trace can be implemented by taking partial sums of coefficients along diagonals and it does not require an explicit (potentially non-existent) decomposition $M = A \otimes B$; see wikipedia for an explanation.
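The coefficient-summing mentioned above can be written as a one-liner with `np.einsum`; the following is a sketch (the function name `partial_trace` and its signature are my own choice):

```python
import numpy as np

def partial_trace(M, dim1, dim2, keep=0):
    """Trace out one tensor factor of a (dim1*dim2) x (dim1*dim2) matrix."""
    T = M.reshape(dim1, dim2, dim1, dim2)
    if keep == 0:
        return np.einsum('ijkj->ik', T)  # trace out H_2, keep H_1
    return np.einsum('ijil->jl', T)      # trace out H_1, keep H_2

A = np.array([[1, 2], [3, 4]], dtype=complex)
B = np.array([[0, 1], [1, 0]], dtype=complex)
# defining property: ptr_{H_1}(A ⊗ B) = tr(A) * B
assert np.allclose(partial_trace(np.kron(A, B), 2, 2, keep=1), np.trace(A) * B)
```

Note that the reshape-and-sum works for any matrix on the composite space; no decomposition $M = A \otimes B$ is needed.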
Now consider a density matrix $\rho$ on $\mathcal H_1 \otimes \mathcal H_2$. The partial trace of $\rho$ w.r.t. $\mathcal H_2$, denoted by $\rho_1$, is given by $\rho_1 \doteq \ptr{\mathcal H_2} (\rho)$ and $\rho_1$ is called the reduced density matrix of $\rho$ on system $\mathcal H_1$. This process is also referred to as “tracing out” (or averaging out) $\mathcal H_2$. The tracing out basically captures the situation where we have a composite system but are only aware of part of it, e.g., we only know about $\mathcal H_1$ but not $\mathcal H_2$. If now $M$ is a measurement on $\mathcal H_1$, then we essentially measure on the composite system with $M \otimes I$ and it holds with the above that
\[\tr(M \rho_1) = \tr((M \otimes I) \rho).\]In this sense $\rho_1$ is the “right state” as it generates the same measurement statistics on $\mathcal H_1$ as $\rho$ does on the composite system, provided we measure only the $\mathcal H_1$ part, i.e., we measure with matrices of the form $M \otimes I$.
For the sake of brevity, in the following we will often write $\ket{0}\ket{0}$ as a shorthand for \(\ket{0}_1 \otimes \ket{0}_2\) when the spaces etc. are clear from the context; the same applies to multipartite systems.
Entanglement
Finally, we come to entanglement, this obscure term that makes quantum mechanics and quantum computing so special. In the following we will (mostly) consider bipartite systems $\mathcal H_1 \otimes \mathcal H_2$, each generated by the basis $\ket{0}$ and $\ket{1}$, to simplify the exposition but everything holds also for arbitrary multipartite systems. Let us consider the following state (which is also referred to as a Bell state)
\[\ket{\phi} = \frac{1}{\sqrt{2}} (\ket{0}\otimes \ket{0} + \ket{1}\otimes \ket{1}).\]Let us start with a few simple observations: the density matrix of $\ket{\phi}$ is given by:
\[\rho = \ketbra{\phi}{\phi} = \begin{pmatrix} 1/2 & 0 & 0 & 1/2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1/2 & 0 & 0 & 1/2 \end{pmatrix},\]and moreover, we have
\[S_L(\rho) = 1 - \tr(\rho^2) = 0,\]i.e., $\rho$ is a pure state in the bipartite system. Now consider the measurement consisting of the two projective matrices $\ketbra{0}{0} \otimes I$ and $\ketbra{1}{1} \otimes I$. In the first case we end up with the post-measurement state
\[(\ketbra{0}{0} \otimes I) \ket{\phi} / \norm{(\ketbra{0}{0} \otimes I) \ket{\phi}} = \ket{0} \otimes \ket{0},\]and in the second case we end up with
\[(\ketbra{1}{1} \otimes I) \ket{\phi} / \norm{(\ketbra{1}{1} \otimes I) \ket{\phi}} = \ket{1} \otimes \ket{1},\]i.e., when measuring the first component of the bipartite system this might also collapse the second component and here via the entanglement the two components are forced to be the same. On the other hand if we would consider an alternative state (which is not entangled as we will see soon)
\[\ket{\mu} = \left(\frac{1}{\sqrt{2}} (\ket{0} + \ket{1})\right) \otimes \left(\frac{1}{\sqrt{2}} (\ket{0} + \ket{1})\right),\]and apply the same measurement we would obtain the post-measurement states
\[(\ketbra{0}{0} \otimes I) \ket{\mu} / \norm{(\ketbra{0}{0} \otimes I) \ket{\mu}} = \ket{0} \otimes 1/\sqrt{2} (\ket{0} + \ket{1}),\]and
\[(\ketbra{1}{1} \otimes I) \ket{\mu} / \norm{(\ketbra{1}{1} \otimes I) \ket{\mu}} = \ket{1} \otimes 1/\sqrt{2} (\ket{0} + \ket{1}),\]i.e., in this case the second component is “undisturbed” by the measurement on the first component.
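The two post-measurement computations above can be replayed numerically; here is a sketch in Python with NumPy (the helper `measure` is my own and simply implements the projective rule $P\ket{\cdot}/\norm{P\ket{\cdot}}$):

```python
import numpy as np

ket0, ket1 = np.eye(2)
P0 = np.kron(np.outer(ket0, ket0), np.eye(2))     # |0><0| ⊗ I

def measure(state, P):
    """Return the post-measurement state for outcome P and its probability."""
    post = P @ state
    p = np.vdot(post, post).real
    return post / np.sqrt(p), p

# entangled Bell state: the second component collapses as well
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
post, p = measure(bell, P0)
assert np.allclose(post, np.kron(ket0, ket0))     # collapsed to |0> ⊗ |0>

# product state: the second component is undisturbed
prod = np.kron(ket0 + ket1, ket0 + ket1) / 2
post, p = measure(prod, P0)
assert np.allclose(post, np.kron(ket0, (ket0 + ket1) / np.sqrt(2)))
```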
Now let us ask a seemingly innocent question: Can we write $\ket{\phi} = \ket{\psi} \otimes \ket{\kappa}$ with $\ket{\psi} \in \mathcal H_1$ and $\ket{\kappa} \in \mathcal H_2$? To this end, let us express
\[\ket{\psi} = \alpha_0 \ket{0} + \alpha_1 \ket{1} \qquad \text{and} \qquad \ket{\kappa} = \beta_0 \ket{0} + \beta_1 \ket{1}\]and do some basic linear algebra transformations
\[\begin{align*} \ket{\psi} \otimes \ket{\kappa} & = (\alpha_0 \ket{0} + \alpha_1 \ket{1}) \otimes (\beta_0 \ket{0} + \beta_1 \ket{1}) \\ & = \alpha_0 \beta_0 \ket{0} \otimes \ket{0} + \alpha_1 \beta_0 \ket{1} \otimes \ket{0} + \alpha_0 \beta_1 \ket{0} \otimes \ket{1} + \alpha_1 \beta_1 \ket{1} \otimes \ket{1}. \end{align*}\]Thus the coefficients have to be in product form in order to express $\ket{\phi} = \ket{\psi} \otimes \ket{\kappa}$. This, however, is not the case for $\ket{\phi}$: we would need $\alpha_0 \beta_0 = \alpha_1 \beta_1 = 1/\sqrt{2}$ while $\alpha_0 \beta_1 = \alpha_1 \beta_0 = 0$, and the latter forces at least one of the four coefficients to be $0$, a contradiction.
Definition: Separable and entangled state. A state $\ket{\phi}$ is called separable if it can be written as $\ket{\phi} = \ket{\psi} \otimes \ket{\kappa}$ with $\ket{\psi} \in \mathcal H_1$ and $\ket{\kappa} \in \mathcal H_2$. A state that is not separable is called entangled. The same definition extends to density matrices covering the mixed state case.
Note that a priori this has nothing to do with pure vs. mixed states and in fact all four combinations are possible: entangled-pure, unentangled-pure, entangled-mixed, and unentangled-mixed.
In particular, the above suggests that while $\ket{\phi}$ is a pure state in the composite system, there are no pure states in $\mathcal H_1$ and $\mathcal H_2$ that capture the individual components. This becomes evident when we trace out $\mathcal H_2$ and obtain the reduced density matrix $\rho_1$:
\[\rho_1 = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} = \frac{1}{2} \ketbra{0}{0} + \frac{1}{2} \ketbra{1}{1},\]i.e., a mixed state. Note that $\rho_1$ has maximum linear and von Neumann entropy.
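We can verify this numerically; the following sketch traces out $\mathcal H_2$ via `np.einsum`, computes both entropies of the reduced state, and checks the measurement-statistics identity from above:

```python
import numpy as np

ket0, ket1 = np.eye(2)
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
rho = np.outer(bell, bell.conj())

# trace out H_2: rho1[i,k] = sum_j rho[(i,j),(k,j)]
rho1 = np.einsum('ijkj->ik', rho.reshape(2, 2, 2, 2))

# linear entropy and von Neumann entropy of the reduced state
S_L = 1 - np.trace(rho1 @ rho1).real      # maximal for a qubit: 1/2
evals = np.linalg.eigvalsh(rho1)
S = -np.sum(evals * np.log2(evals))       # maximal for a qubit: 1 bit

# rho1 reproduces the statistics of measurements of the form M ⊗ I
M = np.outer(ket0, ket0)
assert np.isclose(np.trace(M @ rho1).real, np.trace(np.kron(M, np.eye(2)) @ rho).real)
```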
This is a good time to revisit our question (measureMixed) from above. We asked: Why is the outcome of the measurement […] not itself a mixed state of the form \(\frac{1}{2} \ketbra{0}{0} + \frac{1}{2} \ketbra{1}{1}?\)
The reason for this is a little subtle: In the case of (measureMixed), after the measurement it is decided which state we are in, and hence we are left not with a probability distribution but with a state (which arose from some probability distribution). On the other hand, when tracing out $\mathcal H_2$ above, we are left with (statistical) uncertainty about the part of the state in $\mathcal H_1$ and we must explicitly account for this uncertainty, which is precisely what the reduced density matrix after tracing out does. This is closely related to the totalitarian principle in quantum mechanics, which states “Everything not forbidden is compulsory.” Wikipedia explains this quite aptly:
The statement is in reference to a surprising feature of particle interactions: that any interaction that is not forbidden by a small number of simple conservation laws is not only allowed, but must be included in the sum over all “paths” that contribute to the outcome of the interaction. Hence if it is not forbidden, there is some probability amplitude for it to happen.
In some sense the totalitarian principle is the analog of the maximum entropy principle. In general, tracing out and/or measuring turns quantum mechanical uncertainty and quantum correlations (e.g., arising via entanglement) into statistical uncertainty.
How do we know whether a state is entangled?
In fact the above is no coincidence. A pure state $\ket{\phi}$ in the bipartite system $\mathcal H_1 \otimes \mathcal H_2$ is entangled if and only if the reduced density matrix $\rho_1$ is a mixed state, if and only if the von Neumann entropy $S(\rho_1)$ of the reduced density matrix $\rho_1$ is non-zero. In fact $S(\rho_1) = S(\rho_2)$, so that it does not matter which one of the two reduced density matrices we use. This entropy is also referred to as the entropy of entanglement, and if the entropy of entanglement is maximal, we say the state is maximally entangled. In this case the reduced density matrix is also a diagonal matrix and, as the entropy is computed from the reduced density matrices, it follows that $\rho_1$ and $\rho_2$ are maximally mixed.
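For a pure bipartite state this criterion is easy to evaluate: reshaping the coefficient vector into a matrix, the squared singular values are exactly the eigenvalues of both reduced density matrices. Here is a sketch (the helper name `entanglement_entropy` is mine):

```python
import numpy as np

def entanglement_entropy(state, dim1, dim2):
    """Entropy of entanglement of a pure bipartite state via the SVD
    of its coefficient matrix."""
    sv = np.linalg.svd(state.reshape(dim1, dim2), compute_uv=False)
    p = sv**2
    p = p[p > 1e-12]                 # drop zero Schmidt coefficients
    return -np.sum(p * np.log2(p))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # maximally entangled: entropy 1
prod = np.array([1, 1, 1, 1]) / 2            # separable: entropy 0
print(entanglement_entropy(bell, 2, 2), entanglement_entropy(prod, 2, 2))
```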
It is tempting to generalize this to the mixed state case; however, this is not easily possible. In fact, already deciding whether a mixed state in a bipartite system is entangled or not is NP-hard, by a reduction from KNAPSACK as shown in a relatively recent result [G03]. Moreover, for a mixed state in a bipartite system the entanglement entropy is no longer a measure of entanglement. As always, check out wikipedia for some background reading.
Bell’s theorem
We will finish this first post with a first fascinating result that demonstrates that something special happens when using entanglement: Bell’s theorem. As it is an umbrella for several different related insights and results, subject to various interpretations, I will completely skip the physical side of things; see [P21] for a more in-depth treatment, or as usual wikipedia is a great starting point. In a nutshell, Bell’s theorem demonstrates that quantum mechanics/computing can violate classical probability theory. The argument below is a later example from [NC02], which is more accessible than Bell’s original argument [B64].
Our setup is as follows. We have three parties: Alice, Bob, and Cliff. Alice and Bob are spatially very far away from each other. Alice and Bob each have two binary measurements. Alice has $A_0$ that measures some property $a_0$ and $A_1$ that measures some property $a_1$. Similarly for Bob, $B_0$ measures $b_0$ and $B_1$ measures $b_1$. The measurements output $\pm 1$, with $1$ if the measured particle carries the property and $-1$ if the property is absent; slightly abusing notation, let $a_0, a_1, b_0, b_1$ also denote the outcome of the respective measurement, which is ok as they are in one-to-one correspondence with the actual property.
Now Cliff prepares a pair of particles and sends particle $1$ to Alice and particle $2$ to Bob. Upon receiving their particles, Alice and Bob each pick one of their two measurements at random, e.g., by flipping a coin, and measure their particle. By doing so we obtain $4$ measurement combinations and we consider the following linear combination (note the minus sign for the last summand):
\[a_0 b_0 + a_1 b_0 + a_0 b_1 - a_1 b_1 = a_0 (b_0 + b_1) + a_1 (b_0 - b_1).\]Now since the outcomes of the measurements are $\pm 1$, either $b_0 = b_1$ and then the second term on the right-hand side vanishes, or $b_0 = -b_1$ and then the first term on the right-hand side vanishes; in either case the remaining term in brackets equals $\pm 2$, so that the right-hand side becomes $\pm 2$ and we obtain the valid inequality:
\[a_0 b_0 + a_1 b_0 + a_0 b_1 - a_1 b_1 \leq 2.\]Observe that the left-hand side cannot be measured with a single measurement as Alice and Bob have to pick one measurement each in a given trial. However, if we perform a large number of experiments (each time Cliff preparing a new state) then we also have
\[\mathbb E [a_0 b_0 + a_1 b_0 + a_0 b_1 - a_1 b_1] \leq 2,\]where $\mathbb E$ denotes the expectation and by linearity of expectation it follows:
\[\tag{CHSH} \mathbb E [a_0 b_0 ] + \mathbb E[a_1 b_0] + \mathbb E[a_0 b_1] - \mathbb E[a_1 b_1] \leq 2.\]This inequality is a so-called Bell inequality (one of many) and specifically the CHSH inequality; we will discuss these and the geometric properties etc in the next post.
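As a quick sanity check on (CHSH), we can brute-force all deterministic assignments of the four hidden properties — under the assumptions below, every run of the experiment is described by one such assignment:

```python
import itertools

# maximize a0*b0 + a1*b0 + a0*b1 - a1*b1 over all local realistic assignments
best = max(a0 * b0 + a1 * b0 + a0 * b1 - a1 * b1
           for a0, a1, b0, b1 in itertools.product([-1, 1], repeat=4))
print(best)   # the classical bound of 2 is attained but never exceeded
```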
Note that the argument above relies on two key assumptions (a) Realism: the properties of the particles exist irrespectively of whether they are observed/measured or not, this is referred to as realism, and (b) Locality: Alice’s choice of a measurement cannot influence Bob’s result and vice versa, which is often referred to as locality, i.e., if far enough away they do not interact/interfere with each other.
And now we will show that quantum mechanics can break this. We let Cliff prepare a bipartite quantum state of the form:
\[\ket{\phi} \doteq \frac{1}{\sqrt{2}} (\ket{0}\ket{1} - \ket{1}\ket{0})\]and then send one of the qubits to Alice and the other to Bob. Note that this is a pure state. Next we define Alice’s observables:
\[A_0 \doteq \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad \text{and} \qquad A_1 \doteq \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\]and Bob’s observables:
\[B_0 \doteq \frac{1}{\sqrt{2}} (-A_1 - A_0) \qquad \text{and} \qquad B_1 \doteq \frac{1}{\sqrt{2}} (A_1 - A_0).\]It is easy to see that $A_0, A_1, B_0$, and $B_1$ have eigenvalues $\pm 1$, which are thus the possible measurement outcomes. Let Alice and Bob pick their measurements uniformly at random. We then obtain the expected values of the products of outcomes
\[\begin{align*} \tr(A_0 \otimes B_0 \ketbra{\phi}{\phi}) = \frac{1}{\sqrt{2}} \qquad & \tr(A_0 \otimes B_1 \ketbra{\phi}{\phi}) = \frac{1}{\sqrt{2}} \\ \tr(A_1 \otimes B_0 \ketbra{\phi}{\phi}) = \frac{1}{\sqrt{2}} \qquad & \tr(A_1 \otimes B_1 \ketbra{\phi}{\phi}) = - \frac{1}{\sqrt{2}}, \end{align*}\]and in particular:
\[\tr(A_0 \otimes B_0 \ketbra{\phi}{\phi}) + \tr(A_0 \otimes B_1 \ketbra{\phi}{\phi}) + \tr(A_1 \otimes B_0 \ketbra{\phi}{\phi}) - \tr(A_1 \otimes B_1 \ketbra{\phi}{\phi}) = 2 \sqrt{2},\]which violates (CHSH). One might wonder where the specific observables come from; this will also be the subject of the next post. For now, however, observe that as the trace is linear we can combine the above observables into one:
\[\begin{align*} A_0 \otimes B_0 + A_0 \otimes B_1 + A_1 \otimes B_0 - A_1 \otimes B_1 & = A_0 \otimes (B_0 + B_1) + A_1 \otimes (B_0 - B_1) \\ & = \sqrt{2} \begin{pmatrix} -1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & -1 \end{pmatrix}. \end{align*}\]Note that so far we have not yet talked about the operations that we can perform on a state in order to carry out computations. This will be the subject of another post soon.
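The $2\sqrt{2}$ violation can also be verified numerically; the following sketch builds the state and observables from above and evaluates the four expectations (the helper `E` is my shorthand for $\tr((A \otimes B)\ketbra{\phi}{\phi})$):

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]])
X = np.array([[0, 1], [1, 0]])
A0, A1 = Z, X
B0 = (-X - Z) / np.sqrt(2)
B1 = (X - Z) / np.sqrt(2)

# the state |phi> = (|01> - |10>)/sqrt(2) prepared by Cliff
phi = np.array([0, 1, -1, 0]) / np.sqrt(2)
rho = np.outer(phi, phi)

E = lambda A, B: np.trace(np.kron(A, B) @ rho)
chsh = E(A0, B0) + E(A0, B1) + E(A1, B0) - E(A1, B1)
print(chsh)   # 2*sqrt(2) ≈ 2.828, beyond the classical bound of 2
```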
Acknowledgement
I would like to thank Omid Nohadani for the helpful discussions and clarifications of the physics perspective of things.
References
[M07] Mermin, N. D. (2007). Quantum computer science: an introduction. Cambridge University Press.
[dW19] De Wolf, R. (2019). Quantum computing: Lecture notes. arXiv preprint arXiv:1907.09415. pdf
[B14] Ballentine, L. E. (2014). Quantum mechanics: a modern development. World Scientific Publishing Company.
[P21] Preskill, J. (2021). Physics 219/Computer Science 219: Quantum Computation. web
[G03] Gurvits, L. (2003, June). Classical deterministic complexity of Edmonds’ problem and quantum entanglement. In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing (pp. 10-19). pdf
[B64] Bell, J. S. (1964). On the Einstein Podolsky Rosen paradox. Physics Physique Fizika, 1(3), 195. pdf
[NC02] Nielsen, M. A., & Chuang, I. (2002). Quantum computation and quantum information. pdf
Changelog
05/09/2022: Fixed several typos as pointed out by Zev Woodstock and Berkant Turan.
06/13/2022: Fixed several typos as pointed out by Felipe Serrano.