COVID‑19’s impact on musicians includes everything from canceled tours and gigs to lost teaching and studio opportunities. It’s also affected our ability to get together and rehearse or jam for fun. For many purposes, like work meetings or conversations with friends, “getting together” remotely is as simple as getting dressed and firing up a Zoom call. That’s not true for music‑making – it turns out that even small amounts of latency introduced by the network are a big problem for musicians trying to play together.
When shelter-in-place began, my social feeds were flooded with questions from musicians like “Why can I play online video games with 200 people, but I can’t play music with one or two friends?” As both a professional drummer and web developer, I have a unique perspective on why this is a problem and what latency does to music.
As networks get faster with newer hardware and better software, interacting with a mobile app through its API starts to feel natural and more real‑time. In fact, APIs often respond faster than we as humans can. Human response time is not as fast as we think, but our sense of timing is enhanced by our ability to compensate and predict.
Baseball is a great example. It takes about 300–400ms for a fastball to travel from the pitcher to the plate, roughly the same amount of time it takes to blink an eye. Batters pride themselves on their timing, and for good reason: they only have about 150ms to decide whether to swing. Any longer, and it’s too late to initiate the motion. Baseball players spend a lifetime learning how to interpret the ball’s early motion so they can predict where to swing.
Batting is not so different from a server listening for incoming data, processing it, and then returning a response. The batter knows the ball is coming, has multiple responses prepared (swing or don’t swing), and needs to choose the right one as quickly as possible. But responding to an API call within 30ms – which NGINX has previously defined as “real‑time” API performance – is roughly 10x faster than the 300ms or so it takes a fastball to reach the plate. In the context of a payment terminal, a transaction can be completed before the shopper finishes swiping his or her card.
The Effect of Latency in Music
The key difference between music and batting (or handling API calls) is that in music correct timing doesn’t fundamentally involve reacting. Music is not based on serial interactions like sending and receiving data packets – it’s played by generating sound at specified intervals of time in perfect synchrony.
The concept of playing music online with others is not new. There are audio‑only devices that minimize the inevitable latency added by digital encoding, transmission, and decoding. They don’t work perfectly, but they are better than video platforms, which have to encode, encrypt, and buffer the much more complex visual signal along with the audio. There are a few impressive systems designed from the top down specifically for music (take a look at LoLa for inspiration), but unfortunately nothing out there is perfect. When timing is off in music, even by milliseconds, it’s best described as a feeling…and an unsettling one.
Practiced musicians begin to feel discrepancies in time at latencies as low as 10ms. Recording studios reflect this: latency is monitored during sessions, and the maximum acceptable latency is 10–12ms, because higher latencies tend to distract musicians from their performance. Of course, musicians are not perfect and play notes in the wrong place all the time, but we also have the ability to adjust and compensate. Fluctuation is okay and actually makes certain kinds of music feel more natural. Classical music is known for a fluid approach to time, with speeding up and slowing down used for expressive purposes, whereas many popular styles of music play to a metronome and aim to be as precise as possible. In all kinds of music, the goal when playing with others is to land on the beats together, even when the timing is flexible. That’s really hard to do when the sound reaching you from the other musicians is delayed by more than a small amount.
To help illustrate what latency sounds like, I’ve put together a few examples. The clip used here comes from a group I play with in NYC called Up & Orange. This was the last rehearsal we had before my studio was closed as a non‑essential business.
Here’s the first version for context, without any added latency. The groove is nice and funky.
In the second clip, I’ve muted the original drum part and replaced it with a copy where latency against the beat established by the bass increases over time. You’ll probably notice starting around 30ms that the drums feel a little behind, by 50ms they sound sloppy, and at 70ms they feel outright sluggish.
Finally, the best way to hear how far the drums are actually moving from the established beat with each increase in latency is to play both the original drums and delayed drums. At 90ms, the doubling effect actually sounds cool, but by the end the groove is completely falling apart.
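The article doesn’t say how these clips were produced, but the underlying operation is simple: shift a copy of the drum track later in time and, for the final clip, mix it back with the original. Here is a minimal Python sketch of that idea on a mono buffer of float samples. The 44.1 kHz sample rate and all function names are assumptions for illustration, not a description of the actual editing process.

```python
# Hypothetical sketch: delay a mono audio buffer by some milliseconds and mix it
# with the original, approximating the "doubling" effect in the final clip.
# Assumes float samples in [-1.0, 1.0] and a 44.1 kHz sample rate.

SAMPLE_RATE = 44_100  # samples per second (assumed CD-quality rate)


def ms_to_samples(ms: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(ms * sample_rate / 1000)


def delay(signal: list[float], ms: float, sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Return a copy of `signal` shifted later by `ms`, padded at the start with silence."""
    n = ms_to_samples(ms, sample_rate)
    return [0.0] * n + signal


def mix_with_delayed(signal: list[float], ms: float, sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Sum the original and delayed copies at half gain each to avoid clipping."""
    delayed = delay(signal, ms, sample_rate)
    original = signal + [0.0] * (len(delayed) - len(signal))  # pad to equal length
    return [0.5 * a + 0.5 * b for a, b in zip(original, delayed)]
```

At 44.1 kHz, a 30ms delay works out to 1,323 samples; a gradually increasing delay like the one in the second clip would simply call `delay` with a larger value for each successive section.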
Did you also notice that at 10ms of latency it barely sounds different from no latency? That’s because sound takes time to travel even in ideal acoustic environments, so musicians are used to 5–10ms of latency – it actually feels natural.
Unlike data through an optical cable, sound travels through the air relatively slowly, and its speed is not fixed: it varies with conditions such as temperature and humidity. For the sake of providing a number, I am pulling from this chart showing latency in a generic room as sound travels from a speaker to a microphone placed at increasing distances. It’s not exact, but acoustic sound travels about 10 feet every 9ms.
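As a quick sanity check on that rule of thumb, the Python sketch below converts distance to acoustic delay in two ways: with the 10‑feet‑per‑9ms figure from the chart, and from an assumed speed of sound of 343 m/s (roughly dry air at about 20 °C). The constants and function names are illustrative assumptions.

```python
# Sketch: acoustic delay over a distance, using the article's rule of thumb
# (about 10 feet every 9 ms) and, for comparison, an assumed speed of sound.

MS_PER_FOOT = 9 / 10          # rule of thumb from the text: 10 ft ≈ 9 ms
SPEED_OF_SOUND_M_S = 343.0    # assumption: dry air at about 20 °C
FEET_PER_METER = 3.28084


def acoustic_latency_ms(feet: float) -> float:
    """Delay in ms for sound to travel `feet`, using the 10 ft / 9 ms rule of thumb."""
    return feet * MS_PER_FOOT


def acoustic_latency_ms_physical(feet: float) -> float:
    """The same delay, computed from the assumed speed of sound."""
    meters = feet / FEET_PER_METER
    return meters / SPEED_OF_SOUND_M_S * 1000


for distance in (5, 10, 20):
    print(f"{distance:>3} ft: {acoustic_latency_ms(distance):.1f} ms "
          f"(from speed of sound: {acoustic_latency_ms_physical(distance):.1f} ms)")
```

Both approaches land just under or at 9ms for 10 feet, which is why the rule of thumb is good enough for stage-sized distances.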
Playing with other musicians on a small stage feels great, and this explains why: at that distance, the natural latency is about 5–10ms. We perceive some amount of latency as real‑time.
As distance increases, so does latency (and other aspects of the sound change too). Anyone who has played a large stage can attest that it’s not always easy to hear the other players. In popular music, the most common solution is to use in‑ear headphones or monitors (speakers pointed toward each musician). Orchestras and acoustic bands use a conductor to dictate time and dynamics visually. All of these options allow musicians who would otherwise be distracted by latency to focus on keeping their performance synchronized with the ensemble.
For readers who are unfamiliar with how time works in music, time is measured in beats per minute (bpm), a value called the tempo. Sixty beats per minute equals 1 beat per second, 120 bpm equals 2 beats per second, and so on. If the tempo of a song is 120 bpm, there is a beat every 500ms (1 second being 1000ms).
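The conversion from tempo to beat length is a single division: 60 seconds, expressed as 60,000ms, split evenly among the beats in a minute. A minimal Python sketch (the function names are illustrative):

```python
def ms_per_beat(bpm: float) -> float:
    """Length of one beat in milliseconds at the given tempo."""
    return 60_000 / bpm  # 60 s × 1000 ms, divided among `bpm` beats


def bpm_from_ms(beat_ms: float) -> float:
    """Inverse: the tempo implied by a beat of the given length in milliseconds."""
    return 60_000 / beat_ms


print(ms_per_beat(60))   # 1000.0
print(ms_per_beat(120))  # 500.0
```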
On a large stage without monitors, being separated by 20 feet creates latency of 18ms, causing that 500ms/120bpm beat to feel like a longer 518ms beat (about 116bpm). Longer distances create the feeling that the other players are performing at a slower speed than you, even when they are not. The common term for this is “dragging,” and it feels terrible, like trying to lift your foot out of wet cement.
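Here is a short sketch of the arithmetic in this paragraph. It follows the paragraph’s simplification of adding the acoustic delay to the length of one beat and converting the result back to a tempo. The 0.9ms‑per‑foot constant is the rule of thumb from the distance chart above; the function name is hypothetical.

```python
# Sketch of the paragraph's arithmetic: add the acoustic delay to one beat's
# length, then convert that stretched beat back into a tempo.

MS_PER_FOOT = 0.9  # rule of thumb: roughly 10 ft of travel per 9 ms


def felt_tempo(bpm: float, distance_ft: float) -> float:
    """Tempo implied by one beat stretched by the delay over `distance_ft`."""
    beat_ms = 60_000 / bpm
    delay_ms = distance_ft * MS_PER_FOOT
    return 60_000 / (beat_ms + delay_ms)


print(round(felt_tempo(120, 20), 1))  # 115.8
```

Doubling the separation to 40 feet pushes the stretched beat to about 536ms, so the sense of dragging grows with every step away from the other players.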
To further complicate matters, each beat can be subdivided into smaller beats: a whole note can be divided into four quarter‑notes, and those quarter‑notes can be divided again into thirds, eighths, sixteenths, etc. While there might be one big beat every 500ms, that 500ms can contain several subdivisions with syncopated placement. At 120 bpm, the space between a sixteenth‑note and a sixteenth‑note triplet is only about 40ms.
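The 40ms figure can be checked directly, assuming the quarter note gets the beat and the tempo is the 120 bpm used earlier. A quarter note splits into four sixteenth notes or six sixteenth‑note triplets, so the first note after the beat lands at a different time in each case. A minimal Python sketch (the function name is illustrative):

```python
def subdivision_ms(bpm: float, notes_per_beat: int) -> float:
    """Spacing in ms between evenly spaced notes when one beat (a quarter note)
    is split into `notes_per_beat` parts."""
    return 60_000 / bpm / notes_per_beat


bpm = 120
sixteenth = subdivision_ms(bpm, 4)          # four sixteenth notes per quarter note
sixteenth_triplet = subdivision_ms(bpm, 6)  # six sixteenth-note triplets per quarter note

print(f"sixteenth note: {sixteenth:.1f} ms")
print(f"sixteenth-note triplet: {sixteenth_triplet:.1f} ms")
print(f"gap between the two: {sixteenth - sixteenth_triplet:.1f} ms")
```

This prints 125.0ms, 83.3ms, and a gap of 41.7ms: a single delay spike of that size can move a note from one rhythmic position to the other.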
Perhaps the most challenging example is a marching band. The average college marching band has about 220 musicians spread out across a football field. Players on the edges are literally separated by…well, a football field. For added difficulty, the distance between players constantly changes as they make different formations. Without a conductor, trying to perform in this kind of band would be impossible. For this reason, marching band musicians are generally instructed not to listen to each other for timing and to rely only on the conductor.
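To put the football field in numbers, the sketch below applies the same 10‑feet‑per‑9ms rule of thumb to a 300‑foot separation and expresses the delay in beats at 120 bpm. The 300‑foot figure is an assumption (the 100‑yard playing field, ignoring end zones), and the names are illustrative.

```python
MS_PER_FOOT = 0.9       # rule of thumb from earlier: ~10 ft per 9 ms
FIELD_LENGTH_FT = 300   # assumption: 100-yard playing field, end zones excluded


def beats_of_delay(distance_ft: float, bpm: float) -> float:
    """Acoustic delay over `distance_ft`, expressed as a fraction of one beat."""
    delay_ms = distance_ft * MS_PER_FOOT
    return delay_ms / (60_000 / bpm)


delay_ms = FIELD_LENGTH_FT * MS_PER_FOOT
print(f"{delay_ms:.0f} ms end to end")                                 # 270 ms end to end
print(f"{beats_of_delay(FIELD_LENGTH_FT, 120):.2f} beats at 120 bpm")  # 0.54 beats at 120 bpm
```

More than half a beat of delay between the players at opposite ends explains why listening to each other for timing simply doesn’t work on a field.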
Jamming online with multiple players, unfortunately, feels like being in different parts of a football field. Not only is the amount of delay different for each player, but the latencies can also fluctuate because of congestion and jitter on the public network.
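Jitter can be measured as well as felt. One widely used estimator is the interarrival jitter defined in RFC 3550, the RTP specification: it compares the transit times of consecutive packets and smooths the differences with a gain of 1/16. The Python below is a minimal sketch of that formula using millisecond timestamps (RTP itself uses media clock units); the timestamps in the usage example are made up for illustration.

```python
def interarrival_jitter(send_ms: list[float], recv_ms: list[float]) -> float:
    """Running jitter estimate in the style of RFC 3550 (RTP), section 6.4.1.

    `send_ms[i]` and `recv_ms[i]` are the send and arrival times (ms) of packet i.
    The two clocks need not be synchronized: only changes in transit time matter.
    """
    if len(send_ms) != len(recv_ms):
        raise ValueError("send and receive lists must be the same length")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        prev_transit = recv_ms[i - 1] - send_ms[i - 1]
        transit = recv_ms[i] - send_ms[i]
        d = abs(transit - prev_transit)
        jitter += (d - jitter) / 16  # exponential smoothing with gain 1/16
    return jitter


# Hypothetical packets sent every 10 ms.
sent = [0, 10, 20, 30, 40]
steady = [40, 50, 60, 70, 80]  # constant 40 ms transit
spiky = [40, 50, 90, 70, 80]   # one packet delayed an extra 30 ms

print(interarrival_jitter(sent, steady))  # 0.0
print(interarrival_jitter(sent, spiky))
```

A steady 40ms delay produces zero jitter, while a single 30ms spike leaves a nonzero estimate that decays only gradually, which mirrors how one late packet can throw off a whole phrase.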
A Network Is Only as Fast as Its Slowest Component
Networks will continue to improve, but we can also expect more traffic as streaming services use more bandwidth and more IoT devices become connected. Still, video calls will continue to feel pretty natural and payment terminals will feel instant. But music will never feel truly real‑time over a network.
According to mathematician and physicist Philippe Kahn, there is still one main challenge that prevents musicians from being able to achieve a real‑time experience: Einstein’s theory of relativity, which states that nothing can travel faster than the speed of light. In addition to mathematics, one of Philippe’s many passions is his life‑long practice of classical and jazz music.
As Philippe says, “No matter how efficient the network and equipment, latency is unavoidable. Therefore the problem of real‑time remote music performance comes down to ‘What is the acceptable latency?’ My personal opinion is that a consistent 10ms is a minimum to serve all musical styles. The less the better. But there is always going to be some latency. You can’t beat Einstein and the laws of physics, except in science fiction books where we travel in time, which is a lot of fun!”
Simply put: even for light, travel time is not zero. There is a latency of about 5ms per 1500km (about 930 miles). Even under perfect conditions, a signal traversing a network cannot go faster than this. While musicians expect some latency, it needs to be consistent and similar to the delay of sound on a small stage. In my opinion, the real killers are spikes or fluctuations between multiple latencies, especially because the gaps between subdivisions leave so little room for error. One network spike can change the meaning behind a note. Time in music is complex, and more than anything, it’s very specific. In my mind that’s a feature, not a bug. It keeps the emphasis on playing live and performing together. It reminds us that it’s important to practice, and that it’s a special thing to be revered when a musician is so developed that they can discern milliseconds.
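The 5ms‑per‑1500km figure follows directly from the speed of light in a vacuum. The sketch below reproduces it and, as an assumption for comparison, also applies a refractive index of 1.47, a typical value for glass fiber, where light travels roughly a third slower. The names are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.47       # assumption: typical value for silica fiber


def light_delay_ms(distance_km: float, refractive_index: float = 1.0) -> float:
    """One-way propagation delay in ms over `distance_km` in a medium with the given index."""
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_km_s * 1000


print(f"{light_delay_ms(1500):.1f} ms over 1500 km in a vacuum")
print(f"{light_delay_ms(1500, FIBER_REFRACTIVE_INDEX):.1f} ms over 1500 km in fiber")
```

These are propagation delays only; routing, queuing, encoding, and audio buffering all add to them, so real one‑way latency between distant players sits above even the fiber figure.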
Musicians will probably never get to experience truly acceptable latency when there are multiple players in different locations, at least not latency low enough to replace getting together in person. That doesn’t mean we can’t eventually take advantage of future network improvements to hop online and try to have a little fun.
Guest blogger Caleb Dolister is a web developer, musician and composer, and consultant. You can hear (and download!) recordings of his compositions at Daily Thumbprint Collection.