A short intro to Special Relativity

TL;DR: This is a brief intro to special relativity for non-physicists.

Introduction

Special relativity, and more broadly relativity theory, is one of those fascinating topics that many people wish to grasp at least once in their lives, particularly its fundamental principles and their derivation. In this post, much like my previous one on quantum computing, I aim to distinguish the mathematical foundations from the physical interpretations. This approach will help us understand what is purely mathematical and what is influenced by physical concepts, which are often open to interpretation. Finally, we will try to do so by arguing as much as possible from first principles.

Special relativity, introduced by Albert Einstein in 1905, transformed our understanding of space and time. This revolutionary theory built upon the work of earlier physicists like Galileo and Newton, introducing the profound idea that the laws of physics are identical for all non-accelerating observers and that the speed of light in a vacuum is constant, regardless of the motion of the light source or observer. Unlike general relativity, which Einstein published in 1915 and which addresses gravity and acceleration by describing how massive objects curve spacetime, special relativity focuses on objects moving at constant speeds in straight lines (inertial frames of reference). In this post, we will concentrate on the principles of special relativity.

I will try to keep things as simple as possible and focus on the core ideas and concepts. The main goal is to derive key implications of special relativity from the notion of an invariant interval, which can be considered a generalization of the Pythagorean theorem to four dimensions. There is an important twist, though: a subtle change in sign, as the underlying geometry is hyperbolic rather than Euclidean (as in the case of the Pythagorean theorem); we will soon see what this means.

Some basics

Inertial frame of reference. An inertial frame of reference is a frame in which an object that is subject to no net force is either at rest or moving at a constant velocity. In such a frame, the laws of physics take their simplest form, and no fictitious forces (such as centrifugal forces) are needed to explain the motion.

If not stated otherwise, when we talk about an observer, we mean an inertial observer, i.e., an observer at rest or moving at a constant velocity.

Note. There is no single, absolute inertial frame of reference. Each observer can choose their own frame of reference, and the laws of physics will appear the same in all inertial frames. This means there is no concept of absolute rest or absolute motion; all motion is relative to the observer’s frame of reference. This symmetry implies that if frame A is moving at speed $v$ relative to frame B, then frame B is also moving at speed $v$ relative to frame A (albeit in opposite direction), and the laws of physics will be identical in both frames. This is known as the principle of relativity.

The postulates of special relativity

The laws of physics are the same for all inertial frames of reference.
The speed of light in a vacuum is constant, ~~approximately~~ exactly 299,792,458 meters per second (I just learned that the speed of light now defines $1$m and not vice versa), and is independent of the motion of the light source or observer.

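In fact, the first postulate (together with mild assumptions such as the homogeneity and isotropy of space) already implies the existence of an invariant speed that is the same in all inertial frames; basically via the structure of the transformations that are compatible with it, which are of Lorentz type. What the second postulate really brings to the table is that this invariant speed is finite and equal to the speed of light $c$.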

Figure 1. Completely useless AI picture of the effects of relativity theory.

Lorentz transformation

In a nutshell, special relativity turns “time” into another dimension, so that together with “space” they form a new mathematical space called spacetime, which has four dimensions $(t, x, y, z)$ and follows a hyperbolic geometry. The Lorentz transformation is a set of linear equations that relate the space and time coordinates of two observers $(t, x, y, z)$ and $(t', x', y', z')$ moving at a constant velocity relative to each other. It ensures that the speed of light remains constant in all inertial frames of reference. The Lorentz transformation equations for an observer moving with velocity $v$ along the $x$-axis are given by:

\[\begin{align*} t' &= \gamma \left( t - \frac{vx}{c^2} \right), \\ x' &= \gamma (x - vt), \\ y' &= y, \\ z' &= z, \end{align*}\]

where $\gamma = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}}$ is the Lorentz factor. The inverse Lorentz transformation, which converts the coordinates from the moving frame back to the stationary frame, is obtained by solving for $t$ and $x$ (or, equivalently, by replacing $v$ with $-v$):

\[\begin{align*} t &= \gamma \left( t' + \frac{vx'}{c^2} \right), \\ x &= \gamma (x' + vt'), \\ y &= y', \\ z &= z'. \end{align*}\]

Remark. Here we use the simplified form as customary, where we only consider Lorentz boosts (i.e., no rotations) with movement along the $x$-axis only. This is without loss of generality, as the general case follows in the usual way. In the following, we will also often use this simplification for ease of exposition.

Remark. We include the Lorentz transformation here for completeness, but we will avoid it in the derivations below. Instead, we will primarily rely on the concept of the invariant interval in our subsequent exposition.
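
To make the transformation concrete, here is a minimal Python sketch of the boost along the $x$-axis and its inverse. The function names (`gamma`, `boost`, `inverse_boost`) and the choice of units with $c = 1$ in the example (time in years, distance in light-years) are my own illustration and not taken from any library.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def gamma(v, c=C):
    """Lorentz factor for a relative speed v with |v| < c."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def boost(t, x, v, c=C):
    """Coordinates (t', x') of the event (t, x) in a frame moving with velocity v along x."""
    g = gamma(v, c)
    return g * (t - v * x / c**2), g * (x - v * t)


def inverse_boost(t_p, x_p, v, c=C):
    """Inverse transformation, which is simply a boost with velocity -v."""
    return boost(t_p, x_p, -v, c)


# Round trip in units where c = 1: transform an event and transform it back.
t_p, x_p = boost(2.0, 1.0, 0.5, c=1.0)
t, x = inverse_boost(t_p, x_p, 0.5, c=1.0)
print("moving frame:    ", t_p, x_p)
print("back in original:", t, x)
```

The event $(2, 1)$ lies on the worldline $x = 0.5\,t$ of the moving observer, so its $x'$ coordinate comes out as zero, and the inverse boost recovers the original coordinates up to floating-point rounding.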

Minkowski Spacetime Diagram

A Minkowski spacetime diagram is a graphical representation of the four-dimensional spacetime used in special relativity. It helps visualize how different observers perceive the timing and positioning of events. Typically, time (denoted as $ct$, where $c$ is the speed of light and $t$ is time) is on the vertical axis and space (denoted as $x$) is on the horizontal axis, which reduces the four-dimensional spacetime to two dimensions, $ct$ and $x$. Each point on the diagram represents an event with a specific position in space and a specific moment in time.

The lines in our diagrams in the following have specific meanings:

Worldline (red): The path that an object takes through spacetime. For an object at rest, the worldline is a vertical line, as it moves through time but not space. For an object moving at a constant velocity, the worldline is a straight line with a slope determined by its speed.
Light cone (dashed lines): The set of all possible light paths that pass through a given event. It divides the diagram into regions that are causally connected to the event and those that are not. The light cone consists of two lines at 45-degree angles (since the vertical axis is $ct$, light covers one unit of $x$ per unit of $ct$, in either direction).
Timeline (green): I also added a “timeline”. This line (which can be tilted when the reference frame is moving) and its parallel lines connect simultaneous events, i.e., events that happen at the same time in the respective frame of reference.
Space-like and Time-like separations: The regions of the diagram where the interval is space-like or time-like are indicated by different colors; we will discuss what this means in more detail soon.

Our Minkowski diagrams visualize the original frame of reference (the stationary frame) in a grey grid and another moving frame of reference (the moving frame, which we also call the primed frame) in a dotted blue grid. The standard Minkowski diagram (both frames are at rest) is shown in Figure 2 and a configuration with a relative velocity of $v = 0.5c$ between the two frames is shown in Figure 3.

Figure 2. Standard Minkowski diagram. Here both the stationary frame (grey) and the moving frame (blue) coincide as the velocity $v$ is zero.

Figure 3. Minkowski diagram with moving frame at velocity $v = 0.5c$. We can see how the spacetime coordinates of the moving frame are “rotated” and “skewed” relative to the stationary frame. In particular, we can see the elongation of the segments on the light cone line on the top-right and the shortening of the segments on the light cone on the top-left.

Invariant Interval

Rather than trying to understand the Lorentz transformation, we will take a different route in this blog post, using the invariant interval between two events. The invariant interval is a fundamental concept in relativity and is a quantity that remains unchanged under Lorentz transformations (the interested reader might verify this by straightforward computation). It provides us with a way to compare (deltas of) positions and (deltas of) times between different observers. As mentioned earlier, this will be sufficient to explain most effects. One of the key advantages is that $v$ does not need to be specified but rather it will follow from the geometry of spacetime.

The (invariant) interval $\Delta s$ between two events is defined via

\[\Delta s^2 \doteq c^2 (t_1 - t_0)^2 - (x_1 - x_0)^2 - (y_1 - y_0)^2 - (z_1 - z_0)^2,\]

where $(t_0, x_0, y_0, z_0)$ and $(t_1, x_1, y_1, z_1)$ are the time and space coordinates of the two events, and $c$ is the speed of light. Equivalently and more concisely with $\Delta x = x_1 - x_0$, $\Delta y = y_1 - y_0$, $\Delta z = z_1 - z_0$, and $\Delta t = t_1 - t_0$:

\[\tag{InvariantInterval} \Delta s^2 \doteq c^2 \Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2.\]

The value $\Delta s^2$ does not depend on the observer’s motion and/or the inertial frame of reference, i.e., it is an invariant. In particular, when switching to a different inertial frame of reference the value $\Delta s^2$ remains the same, and only the space and time coordinates change. Before we look at properties of the invariant interval the following remark might be helpful.

Remark. The minus signs in the invariant interval are crucial and lead to a hyperbolic geometry rather than a Euclidean geometry, which is the foundation of special relativity. Moreover, they change the interpretation compared to the standard “Pythagorean” form $\Delta s^2 = c^2 \Delta t^2 + \Delta x^2 + \Delta y^2 + \Delta z^2$: if we fix the left-hand side to a constant, then increasing, e.g., $\Delta t$ means decreasing $\Delta x$, $\Delta y$, or $\Delta z$ to keep the left-hand side fixed, i.e., we would be trading time for space, and the set of solutions for a fixed $\Delta s^2$ is an ellipsoid. In the geometry induced by the Lorentz transformation and the associated invariant interval this is no longer the case. The tradeoff still holds among the space coordinates $\Delta x$, $\Delta y$, and $\Delta z$, but between space and time there is no tradeoff anymore: if $\Delta s^2$ is fixed, increasing the time delta requires increasing the space deltas as well (for large values roughly proportionally with factor $c$), “expanding space with time”, and the set of solutions for a fixed $\Delta s^2$ is a hyperboloid. With our simplification from before, assuming motion along the $x$-axis only and hence $\Delta y = 0$ and $\Delta z = 0$, we thus have $\Delta s^2 = c^2 \Delta t^2 - \Delta x^2$ for fixed $\Delta s^2$, which is a hyperbola, vs. $\Delta s^2 = c^2 \Delta t^2 + \Delta x^2$ for fixed $\Delta s^2$, which is an ellipse. Figure 4 illustrates the difference between the two geometries.

Figure 4. Hyperbola for fixed $\Delta s^2$ (expanding space with time) vs. ellipse for fixed $\Delta s^2$ (trading space for time)

Working with the invariant interval is often more convenient than working with the Lorentz transformation, while still allowing us to derive most (all?) key insights of special relativity from first principles. In particular, we do not have to explicitly choose frames of reference or velocities, etc.; instead, we work directly with spacetime coordinates and deltas and then exploit the invariance of the interval.
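
The invariance can also be checked numerically. The following sketch, under the assumption of motion along the $x$-axis and units with $c = 1$, boosts two events into several moving frames and recomputes the interval each time; the helper names `boost` and `interval_squared` are hypothetical and chosen for this post only.

```python
import math


def boost(t, x, v, c=1.0):
    """Lorentz boost of the event (t, x) into a frame moving with velocity v along x."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (t - v * x / c**2), g * (x - v * t)


def interval_squared(dt, dx, dy=0.0, dz=0.0, c=1.0):
    """Squared invariant interval for the given coordinate differences."""
    return (c * dt) ** 2 - dx**2 - dy**2 - dz**2


event_a = (0.0, 0.0)
event_b = (3.0, 1.0)  # in the original frame: dt = 3, dx = 1, so ds^2 = 8

values = []
for v in (0.0, 0.3, 0.6, 0.9):
    ta, xa = boost(*event_a, v)
    tb, xb = boost(*event_b, v)
    s2 = interval_squared(tb - ta, xb - xa)
    values.append(s2)
    print(f"v = {v}: dt' = {tb - ta:.4f}, dx' = {xb - xa:.4f}, ds^2 = {s2:.6f}")
```

The coordinate differences change from frame to frame, while the printed $\Delta s^2$ stays the same up to rounding.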

Properties of the invariant interval

So now let us consider some properties of the invariant interval. First of all, observe that $\Delta s^2$ can be negative, zero, or positive, each of which gives rise to different physical interpretations.

Time-like interval/separation: $\Delta s^2 > 0$. In this case the events are separated by more time than space. In particular, we have $\Delta s^2 = c^2 \Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2 > 0$, which implies $|\Delta t| > \frac{1}{c} \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}$, i.e., the two events are separated by more time than light needs to travel the spatial distance between them. Exploiting the invariance of the interval, we can choose a different frame of reference in which $\Delta x' = \Delta y' = \Delta z' = 0$ and obtain $\Delta t'$ in that frame of reference with

\[c^2 \Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2 = \Delta s^2 = c^2 \Delta t'^2,\]

which implies $\Delta t' = \frac{\Delta s}{c}$, i.e., in the new frame of reference the two events happen at the same spatial location but at different times. Observe that $\Delta t'$ is the minimal coordinate time separation over all frames: it is the time measured by a clock at rest in that frame, i.e., a clock that is present at both events, and in any other frame of reference (in which this clock is moving) the time separation is larger, i.e., $\Delta t \geq \Delta t'$, because $c^2 \Delta t^2 = \Delta s^2 + \Delta x^2 + \Delta y^2 + \Delta z^2 \geq \Delta s^2$. At the same time, somewhat counterintuitively, it is the maximum time that any clock traveling from the first event to the second can record: a clock that takes a detour moves at speed along the way and therefore runs slower. This time $\Delta t'$ is called the proper time $\tau$ between the two events, i.e., $\tau = \Delta t' = \sqrt{\frac{\Delta s^2}{c^2}} = \frac{\Delta s}{c}$, in contrast to the coordinate time $t$ as measured by an observer’s clock.

Light-like interval/separation: $\Delta s^2 = 0$. In this case the events are separated by exactly the distance that light travels in the given time. In particular, we have $\Delta s^2 = c^2 \Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2 = 0$, which implies $|\Delta t| = \frac{1}{c} \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}$, and moreover

\[c = \sqrt{\frac{\Delta x^2 + \Delta y^2 + \Delta z^2}{\Delta t^2}},\]

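i.e., since $\Delta s^2 = 0$ holds in every inertial frame, whatever travels at speed $c$ in one frame travels at speed $c$ in all of them: the speed of light is the same in all inertial frames of reference.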

Space-like interval/separation: $\Delta s^2 < 0$. In this case the events are separated by more space than time. In particular, we have $\Delta s^2 = c^2 \Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2 < 0$, which implies $c|\Delta t| < \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}$, i.e., the two events are separated by more space than light can travel in the time between the two events. Similar to the time-like case, we can pick a different frame of reference in which $\Delta t' = 0$ to obtain $\Delta x'$, $\Delta y'$, and $\Delta z'$ in that frame of reference with

\[c^2 \Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2 = \Delta s^2 = - \Delta x'^2 - \Delta y'^2 - \Delta z'^2,\]

which implies $\sqrt{\Delta x'^2 + \Delta y'^2 + \Delta z'^2} = \sqrt{-\Delta s^2}$, i.e., in the new frame of reference the two events happen simultaneously, albeit in different places. This is the minimum possible spatial separation for any observer, as $\Delta x^2 + \Delta y^2 + \Delta z^2 = c^2 \Delta t^2 - \Delta s^2 \geq -\Delta s^2$ in every frame. It is achieved in the frame in which the two events occur at the same time and is called the proper length $L$ (or proper distance) of the two events, i.e., $L = \sqrt{-\Delta s^2}$.

Causality. The invariant interval also allows us to determine whether events could be causally related.

If two events are separated by a time-like interval, there exists a frame of reference in which $\Delta x = \Delta y = \Delta z = 0$ and $\tau > 0$. Moreover, the sign of $\Delta t$ is the same in all frames of reference, so that all observers agree on the order of the events and the earlier event can causally influence the later one.

If, however, the events are space-like separated, no such agreement exists, and not all observers will agree on the order of the events: as we have seen, in this case $\Delta s^2 < 0$ and we can always pick a frame of reference where $\Delta t = 0$, so that the events happen simultaneously but in different places. This is the relativity of simultaneity, which states that whether two spatially separated events occur simultaneously is not absolute but depends on the observer’s frame of reference. Figure 5 illustrates the relativity of simultaneity.

Figure 5. Relativity of Simultaneity.

Note that this is the case for all space-like separated events. In particular, for one event to be able to cause the other, there would have to be a frame of reference in which both events happen at the same position; however, this is not possible as $\Delta s^2 < 0$.
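
The three cases can be summarized in a few lines of code. This is only a sketch: the function name `classify_interval`, the default $c = 1$, and the absolute tolerance `tol` used to detect the light-like case are assumptions made for illustration.

```python
import math


def classify_interval(dt, dx, dy=0.0, dz=0.0, c=1.0, tol=1e-12):
    """Return the kind of separation and the associated proper quantity.

    time-like  -> proper time  sqrt(ds^2) / c
    space-like -> proper length sqrt(-ds^2)
    light-like -> 0.0
    """
    s2 = (c * dt) ** 2 - dx**2 - dy**2 - dz**2
    if abs(s2) <= tol:  # crude absolute tolerance, fine for values of order 1
        return "light-like", 0.0
    if s2 > 0:
        return "time-like", math.sqrt(s2) / c
    return "space-like", math.sqrt(-s2)


print(classify_interval(dt=5.0, dx=3.0))  # more time than space
print(classify_interval(dt=3.0, dx=3.0))  # on the light cone
print(classify_interval(dt=3.0, dx=5.0))  # more space than time
```

Only in the first case can the earlier event influence the later one via a signal slower than light, and only in the second via light itself.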

First-order consequences: Time Dilation and Length Contraction

For simplicity, let us assume moving along the $x$-axis only in the following.

Time dilation. From the invariant interval we can derive the time dilation effect as follows. Suppose we have a time-like interval. Then its proper time $\tau$ is given by

\[\begin{align*} \tau = \frac{\Delta s}{c} &= \frac{1}{c} \sqrt{c^2\Delta t^2 - \Delta x^2} \\ & = \Delta t \sqrt{1-\frac{1}{c^2}\frac{\Delta x^2}{\Delta t^2}} \\ & = \Delta t \sqrt{1-\frac{v^2}{c^2}} = \frac{\Delta t}{\gamma},\\ \end{align*}\]

where $\gamma = \frac{1}{\sqrt{1-\frac{v^2}{c^2}}}$ is the Lorentz factor and $\Delta x = v \Delta t$ or equivalently $v = \frac{\Delta x}{\Delta t}$. As $\sqrt{1-\frac{v^2}{c^2}} \leq 1$ for $|v| < c$, with equality only for $v = 0$, we see that $\tau \leq \Delta t$, i.e., the proper time (the time experienced by an observer who is present at both events) is never larger than the coordinate time. This effect is known as time dilation.
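
As a quick numerical sanity check of this derivation, the sketch below computes the proper time once directly from the interval and once via $\gamma$. The function names and the example values ($\Delta t = 10$, $v = 0.8c$, units with $c = 1$) are my own choices.

```python
import math


def proper_time_from_interval(dt, dx, c=1.0):
    """Proper time tau = ds / c for a time-like interval along the x-axis."""
    return math.sqrt((c * dt) ** 2 - dx**2) / c


def proper_time_from_gamma(dt, v, c=1.0):
    """Proper time tau = dt / gamma for an observer moving at constant speed v."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return dt / g


dt = 10.0  # coordinate time, e.g. in years
v = 0.8    # speed as a fraction of c
tau_interval = proper_time_from_interval(dt, v * dt)
tau_gamma = proper_time_from_gamma(dt, v)
print(tau_interval, tau_gamma)
```

Both routes give the same value, namely $10 \cdot \sqrt{1 - 0.64} = 6$ up to rounding.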

Length contraction. Length contraction is the phenomenon that the length of an object is different for different observers. This effect is the symmetric version of time dilation but in space now and we use different events to derive it. To this end suppose we have a rod of length $L$ at rest in the unprimed frame $S$ and we see an observer in the primed frame $S'$ speeding past us. Let both origins of $S$ and $S'$ coincide when the observer in the primed frame passes the first end of the rod, so the first event is $(0,0)$ in both frames.

In the unprimed frame, the far end of the rod is at $x = L$ and we see that the speeding observer passes by it at $t = \frac{L}{v}$ and so his second event has spacetime coordinates $(\frac{L}{v}, L)$ and so the (invariant) interval is:

\[\Delta s^2 = c^2 \left(\frac{L}{v}\right)^2 - L^2\]

In the primed frame, the stationary observer (in that frame) sees the rod of some (to be determined) length $L'$ approaching him with speed $v$. The $x'$ coordinate of both events is zero (he is just waiting things out while the rod is passing by), and the time of the second event is $t' = \frac{L'}{v}$. So the associated (invariant) interval is:

\[\Delta s'^2 = c^2 \left(\frac{L'}{v}\right)^2\]

As both intervals are the same, i.e., $\Delta s^2 = \Delta s'^2$, it follows that

\[c^2 \left(\frac{L'}{v}\right)^2 = c^2 \left(\frac{L}{v}\right)^2 - L^2\]

and hence, via rearranging, we obtain

\[L' = L \sqrt{1 - \frac{v^2}{c^2}} = \frac{L}{\gamma}.\]

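Alternatively, we can derive the length contraction from the time dilation effect. With the same setup from above, the time between the two events is $\Delta t = \frac{L}{v}$ in the unprimed frame and $\Delta t' = \frac{L'}{v}$ in the primed frame, i.e., $L = v \Delta t$ and $L' = v \Delta t'$, and hence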

\[L' = L \frac{\Delta t'}{\Delta t} = \frac{L}{\gamma},\]

with $\Delta t' = \frac{\Delta t}{\gamma}$ from time dilation, since both events occur at $x' = 0$ and $\Delta t'$ is therefore the proper time.
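
The interval-based argument can be replayed numerically. In this sketch (hypothetical function names, units with $c = 1$) the contracted length is obtained by equating the two expressions for the interval and is then compared against $L / \gamma$.

```python
import math


def contracted_length_from_interval(L, v, c=1.0):
    """Solve c^2 (L'/v)^2 = c^2 (L/v)^2 - L^2 for L'."""
    # Interval in the rod's rest frame between the events (0, 0) and (L / v, L).
    s2 = (c * L / v) ** 2 - L**2
    # In the observer's frame both events have x' = 0, so s2 = (c * L' / v)^2.
    return v * math.sqrt(s2) / c


def contracted_length_from_gamma(L, v, c=1.0):
    """Closed form L' = L / gamma = L * sqrt(1 - v^2 / c^2)."""
    return L * math.sqrt(1.0 - (v / c) ** 2)


L_rest = 10.0  # rod length in its rest frame
v = 0.6        # speed as a fraction of c
L_interval = contracted_length_from_interval(L_rest, v)
L_gamma = contracted_length_from_gamma(L_rest, v)
print(L_interval, L_gamma)
```

For $v = 0.6c$ we have $\gamma = 1.25$, so both computations should give a length of about $8$.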

Reciprocity of time dilation and length contraction. What is important to understand here is that “who is moving” and “who is at rest” is relative and depends on the observer’s frame of reference, i.e., there is a symmetry between the two observers, and relative to their own frame of reference each observer will always find the other observer to experience time dilation and length contraction.

Note. Both time dilation and length contraction are at the core of the twin paradox as well as the interstellar travel question that we will address next.

Interstellar Travel

As the speed of light is the “absolute speed limit”, one may naively think that one could travel no further than say $100$ years times the speed of light and hence really exploring space is not possible. However, special relativity has a surprising consequence for interstellar travel: Assuming that we can travel close to the speed of light (without exceeding it), vast distances can be covered within one’s own lifetime. Note that the assumption of close-to-light-speed travel while not in conflict with current physics is still not possible with our current technology and there are apparently many challenges to make this possible. Anyways, suppose we can travel close to the speed of light. Our starting point is the invariant interval and for simplicity let us again consider the case of traveling along the $x$-axis only. Then we have

\[\Delta s^2 = c^2 \Delta t^2 - \Delta x^2.\]

Now suppose we can travel at a constant speed $v$ for a time $\Delta t$ and a distance $\Delta x = v \Delta t$ in space. This is a time-like interval due to our assumption that $v < c$ and, as before, the associated proper time $\tau$ (as experienced by the traveler) is given by

\[\tau = \frac{\Delta s}{c} = \frac{1}{c} \sqrt{c^2 \Delta t^2 - v^2 \Delta t^2} = \Delta t \sqrt{1-\frac{v^2}{c^2}} = \frac{\Delta t}{\gamma},\]

which is exactly the time dilation effect with the Lorentz factor $\gamma = \frac{1}{\sqrt{1-\frac{v^2}{c^2}}}$ we derived before. If we now assume $v = (1-\varepsilon)c$, then the equation above simplifies to

\[\tau = \Delta t \sqrt{1- (1-\varepsilon)^2} = \Delta t \sqrt{2\varepsilon - \varepsilon^2}.\]

Similarly, the distance covered from Earth if the traveler is moving with speed $v = (1-\varepsilon) c$ for proper time $\tau$ is given by

\[\Delta x = v \Delta t = (1-\varepsilon)c \Delta t = \frac{1-\varepsilon}{\sqrt{2\varepsilon - \varepsilon^2}} c \tau,\]

which is equivalent to

\[c \tau = \Delta x \frac{\sqrt{2\varepsilon - \varepsilon^2}}{1-\varepsilon},\]

which is again the length contraction effect we have seen before: in the traveler’s frame, the distance is contracted to $\frac{\Delta x}{\gamma}$ and covered at speed $v < c$. Measured as Earth-frame distance per unit of traveler time, however, it is as if the traveler had been traveling faster than light, with a “pseudo-speed” of $\frac{1-\varepsilon}{\sqrt{2\varepsilon - \varepsilon^2}} c = \gamma v$, which exceeds the speed of light as soon as $v > \frac{c}{\sqrt{2}}$; see Wikipedia for more on this.

The following graphic and table show the time dilation effect for different values of $\varepsilon$, the corresponding time elapsed on Earth $\Delta t$, and the distance covered within $1$ traveler year (ty).

$\varepsilon$	$\Delta t$ (years) on Earth	$\Delta x$ (light-years) within $1$ ty
0.5	1.1547	0.5774
0.2	1.6667	1.3333
0.1	2.2942	2.0648
0.05	3.2026	3.0424
0.001	22.3663	22.3439
0.0001	70.7124	70.7054

Figure 6. The effect of time dilation on interstellar travel, showing how close-to-light-speed travel allows vast distances to be covered within a traveler’s lifetime. Traveler is moving with speed $v = (1-\varepsilon)c$ for $1$ traveler year (ty), i.e., in his own time.
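
The numbers in such a table follow directly from $\Delta t = \frac{\tau}{\sqrt{2\varepsilon - \varepsilon^2}}$ and $\Delta x = (1-\varepsilon) c \Delta t$. A minimal sketch, with a function name of my own choosing and time in years, distance in light-years (so that $c = 1$):

```python
import math


def earth_time_and_distance(eps, tau=1.0):
    """Earth time (years) and distance (light-years) for v = (1 - eps) c and traveler time tau (years)."""
    dt = tau / math.sqrt(2.0 * eps - eps**2)  # time dilation: tau = dt * sqrt(2 eps - eps^2)
    dx = (1.0 - eps) * dt                     # distance covered at speed (1 - eps) c
    return dt, dx


for eps in (0.5, 0.2, 0.1, 0.05, 0.001, 0.0001):
    dt, dx = earth_time_and_distance(eps)
    print(f"{eps:g}\t{dt:.4f}\t{dx:.4f}")
```

Passing a different `tau` scales both columns linearly, e.g., `tau=40.0` for a forty-year trip in traveler time.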

Twin Paradox

From our interstellar travel example, we relatively easily end up with the twin paradox: dude’s gotta get home. Again, the interval invariance saves the day. So here is the setup:

Let us consider two twins, Alice and Bob. Alice travels at a constant speed $v = \frac{\sqrt{3}}{2} c \approx 0.866025 c$ (chosen so that the time dilation factor is $\gamma = 2$) from Earth to a distant star for $5$ years and back for another $5$ years, both measured in Bob’s (Earth’s) frame. Bob stays on Earth. We denote the respective events in Bob’s frame in spacetime coordinates as $(t,x)$, with $t$ in years.

Let us first consider Bob’s “trip”.

Bob’s initial event is $(0,0)$.
Bob waits for $5$ years until Alice starts her return journey. The event is $(5,0)$.
Bob waits for another $5$ years until Alice returns to Earth. The event is $(10,0)$.

The associated invariant intervals are $s_1^2 = c^2 5^2 = 25 c^2$ and $s_2^2 = c^2 5^2 = 25 c^2$ and hence the proper time for Bob is

\[\tau_B = \frac{s_1}{c} + \frac{s_2}{c} = 10 \text{ years}.\]

Now consider Alice’s trip.

Alice’s initial event is $(0,0)$.
Alice travels to the star. The event at arrival is $(5,5 \cdot \frac{\sqrt{3}}{2} c)$.
Alice travels back to Earth. The event is $(10, 0)$.

However, the interval arithmetic is quite different now:

\[s_1^2 = c^2 5^2 - (5 \cdot \frac{\sqrt{3}}{2} c)^2 = 25 c^2 - 25 c^2 \cdot 0.75 = 25 c^2 (1 - 0.75) = 6.25 c^2\]

and

\[s_2^2 = c^2 5^2 - (5 \cdot \frac{\sqrt{3}}{2} c)^2 = 25 c^2 - 25 c^2 \cdot 0.75 = 25 c^2 (1 - 0.75) = 6.25 c^2\]

hence the proper time for Alice is

\[\tau_A = \frac{s_1}{c} + \frac{s_2}{c} = \sqrt{6.25} + \sqrt{6.25} = 5 \sqrt{1 - 0.75} + 5 \sqrt{1 - 0.75} = 5 \cdot 0.5 + 5 \cdot 0.5 = 5 \text{ years}.\]

Thus Alice has aged only $5$ years while Bob has aged $10$ years. So how is this possible in light of the symmetry between Alice and Bob that we claimed before? The resolution is that on the outbound journey of Alice, both Alice and Bob are symmetric, but when Alice turns back, she undergoes acceleration, which breaks the symmetry and the reciprocity of time dilation and length contraction; check out [B] for a great explanation.
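
The bookkeeping above amounts to summing proper times over the straight segments of each worldline. Here is a small sketch of that computation; the function `proper_time_along` is a hypothetical helper, and units are years and light-years (so $c = 1$), with all events given in Bob's frame.

```python
import math


def proper_time_along(worldline, c=1.0):
    """Sum of proper times over the straight segments of a worldline given as (t, x) events."""
    total = 0.0
    for (t0, x0), (t1, x1) in zip(worldline, worldline[1:]):
        s2 = (c * (t1 - t0)) ** 2 - (x1 - x0) ** 2
        if s2 < 0:
            raise ValueError("segment is space-like; no clock can travel along it")
        total += math.sqrt(s2) / c
    return total


v = math.sqrt(3) / 2  # Alice's speed as a fraction of c, so that gamma = 2

bob = [(0, 0), (5, 0), (10, 0)]
alice = [(0, 0), (5, 5 * v), (10, 0)]

tau_bob = proper_time_along(bob)
tau_alice = proper_time_along(alice)
print(tau_bob, tau_alice)
```

Both worldlines connect the same two events $(0,0)$ and $(10,0)$, yet the kinked one accumulates less proper time, in line with the earlier observation that the straight worldline between two events maximizes proper time.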

Practical Speeds and Implications

Before switching gears, let us make a few more remarks on the implications of time dilation and length contraction for practical speeds that we typically observe. Also, note that we only consider time dilation and length contraction arising from special relativity; there are additional effects from general relativity (gravity and more generally acceleration induced) that we will not consider here.

Mode of Transportation	Speed in km/h	Speed as fraction of c	Time Dilation Factor	Length Contraction Factor
Car	100	9.26567e-8	~1.0	~1.0
Plane	900	8.3391e-7	~1.0	~1.0
Rocket	40,000	3.70627e-5	~1.0	~1.0
Satellite	28,800	2.66851e-5	~1.0	~1.0
Parker Solar Probe	600,000	5.5594e-4	~1.0	~1.0

Figure 7. Time dilation and length contraction factors for various modes of transportation, illustrating the negligible effects at practical speeds.

Put differently, for all practical purposes, the effects of time dilation and length contraction are negligible for our fastest modes of transportation. However, they do manifest and matter for ultra-precision computations, e.g., in GPS. If not accounted for, relativistic effects (special and general relativity combined) would make the clocks of GPS satellites drift by about $38\,\mu$s per day relative to clocks on the ground. As light travels roughly $30$cm per nanosecond, positioning with about $1$m accuracy requires timing errors of no more than a few nanoseconds.
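
To see how small the effect is, one has to be a little careful numerically: for everyday speeds $\frac{v^2}{c^2}$ is so tiny that evaluating $\frac{1}{\sqrt{1 - v^2/c^2}} - 1$ naively loses most digits to rounding. The sketch below therefore computes $\gamma - 1$ via `math.log1p` and `math.expm1`. The speeds are the ones from the table, reading the Parker Solar Probe entry as 600,000 km/h; the function name is my own.

```python
import math

C_KMH = 299_792_458 * 3.6  # speed of light in km/h


def gamma_minus_one(speed_kmh):
    """gamma - 1 for a given speed in km/h, evaluated in a numerically stable way."""
    beta2 = (speed_kmh / C_KMH) ** 2
    # gamma = (1 - beta^2)^(-1/2) = exp(-0.5 * log(1 - beta^2)); expm1/log1p
    # avoid the cancellation that 1 / sqrt(1 - beta2) - 1 suffers for tiny beta2.
    return math.expm1(-0.5 * math.log1p(-beta2))


speeds_kmh = {
    "Car": 100,
    "Plane": 900,
    "Rocket": 40_000,
    "Satellite": 28_800,
    "Parker Solar Probe": 600_000,
}

for name, kmh in speeds_kmh.items():
    print(f"{name}: v/c = {kmh / C_KMH:.5e}, gamma - 1 = {gamma_minus_one(kmh):.3e}")
```

Multiplying $\gamma - 1$ by the number of seconds in a day gives the special-relativistic clock offset per day for a clock moving at that speed.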

Einstein’s Famous Formula

While $E = mc^2$ is probably the most famous equation in physics and strongly associated with Einstein, its origin and derivation are not as clear-cut, and there is a discussion on whether Einstein’s original derivation was correct or not (see the longer discussion on Wikipedia). Also, there are a couple of subtleties in terms of what exactly is meant, and there are several similar but different equations and notions floating around. Let us start with what Einstein actually wrote in his 1905 paper [E2]:

In translation, the key sentence roughly reads: “If a body gives off the energy $L$ in the form of radiation, its mass diminishes by $L/V^2$” (Einstein used $L$ for the energy and $V$ for the speed of light). He essentially matched the known form of the kinetic energy, $\Delta K = \frac{1}{2} \Delta m v^2 = \Delta m \frac{v^2}{2}$, against the expression $\frac{L}{V^2}\frac{v^2}{2}$ that he obtained from his computations to conclude that $\Delta m = \frac{L}{V^2}$. Adjusting for notation, this corresponds to $\Delta m = \Delta E / c^2$ or equivalently $\Delta E = \Delta m c^2$. But what exactly does this mean, and what does it have to do with special relativity?

Usually written as $E = mc^2$ it can be interpreted in a couple of different ways:

Mass-Energy Equivalence at rest. $E_0 = m_0 c^2$: This is the famous equation that relates the energy of an object to its mass and the speed of light. The key is that $E_0$ and $m_0$ are the energy and mass at rest. This one was the object of study in [E2] and it is important to note that Einstein’s [E1] paper already contained the kinetic energy of a particle:

\[W = \mu V^2 \left\{ \frac{1}{\sqrt{1 - \left(\frac{v}{V}\right)^2}} - 1 \right\}.\]

Here $W$ is the kinetic energy, $\mu$ is the mass, and $V$ is again the speed of light. Note that the $-1$ in brackets appears because this is only the kinetic energy and not the total energy; we point this out to avoid confusion further below. In our language, this becomes $E_{kin} = (\gamma - 1) m_0 c^2$, where $\gamma$ is the Lorentz factor from above. If we use the definition of relativistic mass $m_{rel} = \gamma m_0$ and add the rest energy $m_0 c^2$, the equation becomes $E_{tot} = m_{rel} c^2$, which leads us to the next point.

Relativistic Mass-Energy Equivalence. $E_{tot} = m_{rel}c^2$: This is the equation that relates the total energy $E_{tot}$ of an object to its relativistic mass $m_{rel}$ and the speed of light $c$. Note that the notion of relativistic mass is somewhat controversial and has largely fallen out of use in modern physics; Einstein himself did not like it (see Wikipedia).

Energy-Momentum Relation. $(m_0 c^2)^2 = E_{tot}^2 - (pc)^2$: This is the energy-momentum relation which relates the rest mass $m_0$ of an object to its total energy $E_{tot}$ and momentum $p$. This one is a little different than the other two but will play a key role below.

Why One More Postulate?

While the two postulates of special relativity address the invariance of physical laws and the constancy of the speed of light, deriving mass-energy equivalence necessitates an additional postulate, as special relativity primarily deals with the relationships between space and time. There are many different ways to make the link with energy and mass, and some are more rigorous than others. I stress that, as a non-physicist, I will not make any claims regarding the “physical correctness” of the different derivations; however, the ones below are the most common ones I found out there and also, in some sense, the most natural ones, at least to me.

So in summary, it is the introduction of this additional postulate that makes the derivations “softer” and also led to criticism of the original derivation of Einstein by some of his contemporaries.

Derivation from the Invariant Interval

Sticking to the spirit of the invariant interval that we have used before, there is a similar invariance that connects energy, mass, and momentum.

Recall (InvariantInterval) from above:

\[\Delta s^2 = c^2 \Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2.\]

In fact, the energy-momentum relation (the third interpretation listed above) has the same structure, with the rest energy $m_0 c^2$ playing the role of the invariant, i.e.,

\[\tag{EnergyMomentumRelation} (m_0 c^2)^2 = E_{tot}^2 - (pc)^2.\]

Now, we can simply choose $p = 0$, so that $E_{tot} = E_0$ to obtain

\[E_0 = m_0 c^2.\]

In some sense, this is cheating as we already used the total energy to derive the rest energy and in particular how do we know (EnergyMomentumRelation) in the first place? While the full derivation is beyond the scope here, in the following I will provide two “more direct” derivations from first principles.
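
For what it is worth, the consistency of (EnergyMomentumRelation) with the other two interpretations can at least be checked numerically. The sketch below assumes the standard definition of relativistic momentum, $p = \gamma m_0 v$, which is not derived in this post, and uses units with $c = 1$; the function names are hypothetical.

```python
import math


def total_energy(m0, p, c=1.0):
    """Total energy from the energy-momentum relation E^2 = (m0 c^2)^2 + (p c)^2."""
    return math.sqrt((m0 * c**2) ** 2 + (p * c) ** 2)


def relativistic_momentum(m0, v, c=1.0):
    """Assumed standard definition p = gamma * m0 * v (not derived in this post)."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * m0 * v


m0, v = 2.0, 0.6  # rest mass and speed (as a fraction of c); gamma = 1.25
p = relativistic_momentum(m0, v)
E_tot = total_energy(m0, p)      # should equal gamma * m0 * c^2
E_rest = total_energy(m0, 0.0)   # p = 0 gives the rest energy m0 * c^2
print(p, E_tot, E_rest)
```

With these values, $p = 1.5$, $E_{tot} = \sqrt{4 + 2.25} = 2.5 = \gamma m_0 c^2$, and $E_0 = 2$, up to rounding.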

First Principles Derivation 1: Conservation of Momentum and Energy

This is essentially the original derivation of Einstein in [E2], which was later criticized by Max Planck and others, but it seems that it is generally accepted and believed to be correct today; we follow the exposition of Wikipedia here. Einstein imagined a body emitting two light pulses in opposite directions, carrying a total energy $E$. Before the emission, the energy of the body is $E_0$ and after the emission, it is $E_1$ in the body’s rest frame. When viewed from a moving frame, $E_0$ becomes $H_0$ and $E_1$ becomes $H_1$. In modern terms, Einstein found:

\[(H_0 - E_0) - (H_1 - E_1) = E \left( \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} - 1 \right).\]

He then suggested that $H - E$ can only differ from the kinetic energy, $K_0$ and $K_1$ respectively, by an additive constant, leading to:

\[K_0 - K_1 = E \left( \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} - 1 \right).\]

Now we can use the Taylor approximation of the factor on the right, which is just $\gamma - 1$ with the Lorentz factor $\gamma$. Ignoring terms of higher than third order in $\frac{v}{c}$, we get:

\[\Delta K = \frac{E}{c^2} \frac{v^2}{2} = \frac{1}{2} \Delta m_0 v^2,\]

where the right-hand side is the classical kinetic energy associated with a mass change $\Delta m_0$ at speed $v$. Doing the same matching as above gives $\Delta m_0 = \frac{E}{c^2}$ as claimed, which, with $\Delta E = E$ denoting the emitted energy, is equivalent to

\[\Delta E = \Delta m_0 c^2.\]
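
For completeness, the Taylor step used in this derivation can be spelled out. This is the standard binomial series, written with the shorthand $\beta = \frac{v}{c}$ (a symbol introduced here only for brevity):

\[\begin{align*} \frac{1}{\sqrt{1-\beta^2}} - 1 &= \frac{1}{2}\beta^2 + \frac{3}{8}\beta^4 + O(\beta^6), \\ E \left( \frac{1}{\sqrt{1-\beta^2}} - 1 \right) &\approx E \cdot \frac{1}{2}\frac{v^2}{c^2} = \frac{E}{c^2} \frac{v^2}{2}. \end{align*}\]

The first neglected term is of fourth order in $\frac{v}{c}$, which is why dropping everything beyond third order leaves exactly the $\frac{1}{2}\beta^2$ term.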

First Principles Derivation 2: Relativistic Kinetic Energy

The relativistic kinetic energy ($E_{kin}$) of an object is given by:

\[E_{kin} = (\gamma - 1)m_0 c^2\]

where $\gamma = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}}$ is the Lorentz factor from above. Now the total energy $E_{tot}$ is given by the sum of its rest energy and kinetic energy:

\[E_{tot} = m_0 c^2 + E_{kin} = \gamma m_0 c^2.\]

Using the same Taylor approximation as before, we get

\[\gamma \approx 1+ \frac{1}{2} \frac{v^2}{c^2}\]

and hence

\[E_{tot} \approx m_0 c^2 + \frac{1}{2} m_0 v^2,\]

where the second term on the right is the classical kinetic energy of the object, so that the remaining velocity-independent term is the rest energy and we recover $E_0 = m_0 c^2$.
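
How good is this approximation? The following sketch compares the relativistic kinetic energy $(\gamma - 1) m_0 c^2$ with the classical $\frac{1}{2} m_0 v^2$ for a few speeds, in units with $c = 1$ and $m_0 = 1$; function names and sample speeds are my own choices.

```python
import math


def kinetic_energy_relativistic(m0, v, c=1.0):
    """E_kin = (gamma - 1) * m0 * c^2."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return (g - 1.0) * m0 * c**2


def kinetic_energy_classical(m0, v):
    """Classical kinetic energy 0.5 * m0 * v^2."""
    return 0.5 * m0 * v**2


ratios = {}
for beta in (0.01, 0.1, 0.5, 0.9):
    rel = kinetic_energy_relativistic(1.0, beta)
    cls = kinetic_energy_classical(1.0, beta)
    ratios[beta] = rel / cls
    print(f"v = {beta}c: relativistic = {rel:.6e}, classical = {cls:.6e}, ratio = {rel / cls:.4f}")
```

At one percent of the speed of light the two agree to better than one part in a thousand, while at $0.9c$ the relativistic value is more than three times the classical one.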

Note. In all derivations, we explicitly used the subscript $0$ to denote the rest mass and energy. And also, all derivations are a little hand-wavy in terms of approximations and logical arguments.

Acknowledgement

Thanks to Sébastien Designolle for providing valuable references and insights.

References

[P] Prideout, J. (n.d.). Relativity. Retrieved from https://prideout.net/blog/relativity/

[M] Mao, L. (n.d.). Special Relativity. Retrieved from https://leimao.github.io/blog/Special-Relativity/

[OD] O’Dowd, M. (2017). The Twin Paradox in Special and General Relativity. Retrieved from https://www.physicsmatt.com/blog/2017/1/18/the-twin-paradox-in-special-and-general-relativity

[W] Wikipedia. (n.d.). Proper time - Example 1: The twin “paradox”. Retrieved from https://en.wikipedia.org/wiki/Proper_time#Example_1:The_twin%22paradox%22

[MIT] MIT Game Lab. (n.d.). A slower speed of light. Retrieved from https://gamelab.mit.edu/games/a-slower-speed-of-light/

[E1] Einstein, A. (1905). Zur Elektrodynamik bewegter Körper. Retrieved from https://doi.org/10.1002/andp.19053221004

[E2] Einstein, A. (1905). Ist die Trägheit eines Körpers von seinem Energieinhalt abhängig? Retrieved from https://doi.org/10.1002/andp.19053231314

[E3] Einstein, A. (1946). E = mc2: the most urgent problem of our time. Science Illustrated. Vol. 1, no. 1. Bonnier Publications International. pp. 16–17.

[B] Buckley, M. R. (2017). The Twin Paradox in Special and General Relativity. Retrieved from https://www.physicsmatt.com/blog/2017/1/18/the-twin-paradox-in-special-and-general-relativity