SPECIAL REPORT OF THE NEC VOICE AND SOUND ANALYSIS LABORATORY
SoundJack: The Unofficial Guide to Low Latency Online Music Making
current update 16 June 2020
By Ian Howell
Table of Contents
Introduction
What is SoundJack and What Else is Out There?
Cultural & Philosophical Issues
Requirements
Understanding the Installation
Optimizing Your Computer & Internet Connection
Computer
Internet Connection & Router Settings
Optimizing SoundJack’s Settings
Session Settings
Connection buffer settings
Calculating actual latency
Conclusions
Introduction
Hello. This page is an effort to gather in one place all the guidance and experience my collaborative team at and beyond NEC has accumulated regarding the SoundJack app. The demand for this information is high enough that we have decided to release it as we go. This page will be updated as we learn more; please check back frequently. There are almost certainly typos or instructions that could be clarified, and we would be happy to address either if you let us know about them. This guide is written as an addition to, not a replacement for, the video tutorials found on the tech tutorial page on SoundJack.eu.
I am aware that this document is long, but please take the time to sit with it. Low latency, real time distributed music making is as much an idea as it is the actual technology. You get the best results by understanding the problems and reacting with a set of tools. You will not be satisfied if you just mess around with these solutions, casually picking at the settings.
What follows is based on several weeks' worth of exploration, experimentation, and trial and error. No one associated with this research group is a professional network technician. But as methodical end users, we have discerned patterns that dependably return the best result. Much of this is due to the openness of SoundJack’s developer, Dr. Alexander Carôt, who has been very generous with his time.
This guide was facilitated by the contributions of:
Ian Howell, NEC, Kayla Gautereaux, NEC, Theodora Nesterova, NEC, Chelsea Whitaker, NEC, Nicholas Kitchen, NEC, Eric Engler, NEC, RPTS department at NEC led by Lisa Nigris, Gregory Ristow, Oberlin, Nicholas Perna, Mississippi College, Joshua Glasner, Clarke, and Chadley Ballantyne, Stetson
To understand how this video was made, please read Best Practices for High Quality, Technology-Enabled, Applied Music Teaching.
What is SoundJack and What Else is Out There?
SoundJack is a low latency audio/video communication platform created by Dr. Alexander Carôt. It exists within a marketplace of similar platforms, which all have different pricing structures, hardware and software requirements, and network architectures. At the extreme high end, you will find solutions like LoLa and Dante. These are site-specific installations with specific bandwidth, software, and hardware requirements. You are much more likely to find these installed on college campuses than in someone’s home. There are a number of companies sprouting up seeking to seamlessly implement this sort of technology for the masses. One that I am immediately aware of, which appears promising but currently has limitations, is Digital Stage. Solutions that currently work, even if they require varying degrees of technical know-how to set up or effectively tweak, include JackTrip, Jamkazam, Jamulus, and SoundJack. They are all free.
There are three network architecture models these platforms use. The first two fall into what can best be thought of as a hub and spoke model. In this situation a server receives audio streams from every participant and then that server routes all audio streams back to all participants. In one implementation, this takes place with a pre-established public server at some distance. In the other implementation, you turn your computer into such a server for a group of collaborators. This option should be particularly attractive to college network administrators, as they already have the network infrastructure and distributed ethernet ports available to connect students to one another while on campus. However, anybody can technically do this with any computer, so long as that computer is powerful and fast enough. The third option is to establish direct peer to peer communication between two or more computers. In this scenario, which is the fastest, every participant in a collaborative music making session establishes a direct connection to every other participant. The upside is this affords the absolute lowest latency of any option. The downside is that every peer to peer connection takes that much more bandwidth and processing power.
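To make the bandwidth and processing trade-off concrete, here is a minimal Python sketch that counts audio streams under each model. The function names are my own illustration and are not part of any of these platforms; the sketch assumes one outgoing stream per connection.

```python
def outgoing_streams_peer_to_peer(participants: int) -> int:
    """Peer to peer: each participant sends one stream to every other participant."""
    return max(participants - 1, 0)


def outgoing_streams_hub_and_spoke(participants: int) -> int:
    """Hub and spoke: each participant sends a single stream to the server."""
    return 1 if participants > 1 else 0


def total_peer_to_peer_links(participants: int) -> int:
    """Number of distinct direct connections needed to link everyone to everyone."""
    return participants * (participants - 1) // 2


for n in (2, 3, 4):
    print(
        f"{n} participants: "
        f"{outgoing_streams_peer_to_peer(n)} outgoing stream(s) each peer to peer, "
        f"{outgoing_streams_hub_and_spoke(n)} each via a server, "
        f"{total_peer_to_peer_links(n)} direct link(s) in total"
    )
```

With two people the models cost the same per participant. The peer to peer load grows with every added collaborator, while the server model keeps each participant at one outgoing stream and moves the work to the server.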
Jamulus currently offers only server-based solutions, although you have the ability to set your own computer as the server. The interface is clean and easy to understand. I personally think this solution would be best if one were dealing only with a local area network, like on a college campus. The downside with using a server-based solution is that the latency between any two people is going to be the latency between each of those people to the server added together. If everyone is close and the network is fast, that may be negligible anyway. Jamkazam advertises itself as a way for people to meet each other to play music. It is true peer to peer with audio and video, and I have seen very impressive demonstrations. I personally find the user interface to be a bit distracting and more than what I need to connect people who already know they want to work together. JackTrip offers or will soon offer multiple setups: local server-based, cloud server-based, and peer to peer. It is one of the oldest solutions, is actively in development, and I am very excited to see what that project releases. In its current form it is highly effective, but one of the more complicated solutions to use.
For me, SoundJack is the best compromise of features, flexibility, and complexity. One can create multiple peer to peer connections with much lower latency than any of the server-based options, one can turn a single computer into a hub to distribute audio among a group of participants, or soon (in beta) one will be able to set up a remote server. The interface is as complex as it needs to be given the flexibility of the controls. I like that the layout is designed for people in pre-organized groups to connect, with private “rooms” and tools to block communication from strangers. The availability of one way video (also in beta but functional) is attractive. My working group has spent the most time with it, and the guide that follows is specifically for this program. We have no financial interest in any of these solutions.
Cultural & Philosophical Issues
For musicians, lag (delay, latency, etc) is the single greatest challenge to spontaneous online music making. Digital information is bound by physical constraints. It literally travels. The farther away two people are, the longer it takes for information to arrive. In the early 19th century it would take years to deliver a message around the world by hand. The telegraph knocked that down to minutes. The telephone carried a voice and reduced that down to a few hundred milliseconds. A Zoom call can send audio and video in less time than that. But none of these technologies replicate standing in front of someone and singing.
Sound also takes time to travel. It is a wave phenomenon in a physical medium that travels about one foot per millisecond (in air). Really it is ~1.125 feet, but you will do fine if you imagine a foot per millisecond and then add another foot every eight milliseconds. So if we map the time it takes for sound to travel from one singer to another onto the travel time of digital information, we can begin to imagine what sort of physical spaces Zoom, FaceTime, and the like represent.
If Zoom has a 100ms one way delay, that means singing over Zoom is like singing with someone 112.5 ft away. This is why collaborating in real time over Zoom is such a terrible experience. It would be equally terrible separated by that real distance on a stage. In practice, musicians ignore sound and watch a conductor when separated by such distances. Unfortunately, online such distances cause the video to be delayed as well. Which brings us back to lag. Ugh… lag.
Super low latency platforms give performers the chance to take several steps toward one another in a virtual space. A stable, 20ms one way delay is like standing 22.5ft away from a collaborator. We frequently make music with colleagues 22.5ft away. And 22.5ft starts to seem more manageable when the other option is to make no live music at all. Our recent experiments using SoundJack suggest we can pull in even closer. Over home Fios connections separated by greater than 20 miles, we have achieved approximately 12ms one way delays. We have established stable connections of 27ms and 32ms one way between Boston and Oberlin, Ohio and Boulder, Colorado respectively. This starts to boggle the mind.
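The mapping from delay to virtual distance used above is simple arithmetic. A minimal Python sketch, assuming the ~1.125 feet per millisecond figure given earlier (the function name is my own):

```python
FEET_PER_MS = 1.125  # approximate speed of sound in air, as used in this guide


def equivalent_distance_ft(one_way_delay_ms: float) -> float:
    """Distance at which sound in air would take this long to arrive."""
    return one_way_delay_ms * FEET_PER_MS


# The one way delays mentioned in this section.
for delay_ms in (12, 20, 27, 32, 100):
    feet = equivalent_distance_ft(delay_ms)
    print(f"{delay_ms} ms one way is like standing {feet:.1f} ft apart")
```

By this measure, the 12ms connection is like standing 13.5 ft apart, and the 32ms connection between Boston and Boulder is like standing 36 ft apart.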
The question then becomes not what the technology can do, but what we can do with it. What follows is an orientation to explore best practices with SoundJack; to help you get the most out of it that you can, given the limitations of your hardware and internet connection. But there are deeper issues at play here, especially within academia.
The COVID-19 pandemic presents specific challenges to music making, especially for instruments like voices, woodwinds, and brass, which produce droplets and aerosols. By virtue of geographical location, some schools will be better positioned to provide high speed internet connections than others. Like any other resource, e.g. physical rehearsal and performance facilities, libraries, or audio technology, different schools are able to provide different levels of service. However, in the past century we have not seen such a sudden and uniform challenge to academia’s ability to provide safe spaces for making music together. Even accepting that different institutions provide different levels of quality, at this moment, as the fall of 2020 approaches, schools that offer real time online collaboration as an option will likely provide a radically different, and more nimble, level of service. This technology exists now. Whether one chooses SoundJack specifically or not, any school that fails to provide access to this kind of platform is making a choice to deprioritize live music making for this school year. Or at minimum is betting on the behavior of a virus that shows no signs of disappearing.
Requirements
Many new to SoundJack want to start tinkering without investing in any new equipment. I understand wanting to get your feet wet, but I need to underline that the following are requirements for a real-time experience. The program can run your system right at the edge of what it can do and any compromise is, well, compromising. Unfortunately for Windows users the options are a little more restrictive than for MacOS or Linux users. No solution exists for iOS or Android devices. To run SoundJack effectively, you will need:
An Ethernet cable connection directly between your computer and router LAN port. You will be able to connect via WiFi but the signal quality will be lower and the latency higher.
Headphones that are wired. Using Bluetooth headphones will increase overall latency of the sound getting to your ears. A speaker will cause an echo or feedback.
On MacOS: An external mic. On a MacOS computer you can use the built-in microphone with headphones to start, although Apple’s hardware adds a noticeable delay over an external audio interface. On a MacOS computer you can use a USB mic like a Blue Yeti. Ideally you will have an external audio interface (e.g. Focusrite Scarlett 2i2) and a wired microphone. MacOS routes all audio through the operating system using a driver called “Core Audio.” Using nicer equipment will get both better sound quality and less delay.
On a Windows computer: ideally an external audio interface and a wired mic. Unlike MacOS, Windows has a variety of audio drivers, typically: MME, WASAPI, and ASIO. Only ASIO will work with SoundJack. This means that a Blue Yeti (no support for ASIO) will not work out of the box. Your internal mic may work, but like the Mac internal audio options, will introduce additional delay. There is a solution, called ASIO4ALL, which converts your non ASIO mic output into an ASIO signal. But be warned, it is another step to add to a complicated process, and you need to both download ASIO4ALL and configure it properly. Also be warned that we currently have found a bug with the popular Focusrite Scarlett 2i2 and SoundJack specifically on Windows. We do not recommend this combination.
Set BOTH input and output in SoundJack to your audio interface. Do not use the headphone out on your computer.
I am hesitant to recommend any one specific audio interface, beyond advising that you (a) avoid Zoom recorders that claim to function as a USB interface and (b) avoid the Focusrite Scarlett 2i2 if you use Windows. Otherwise, any modern USB audio interface that advertises itself as “low latency” should be fine.
At least one party in your ensemble likely needs access to their router configuration page. This is complicated if you’ve never done it, but once you have, it is simple. It is conceivable that you can successfully connect without doing this, but be prepared in case. Router configuration is likely not necessary on a private network or campus intranet, but check with your network administrator.
Understanding the Installation
//////////
under construction
No matter what, please watch the entire sequence of tech tutorial videos on SoundJack.eu. 98% of the problems people have run into are related to not following the instructions.
For both MacOS and Windows, first go to SoundJack.eu on a computer and select “register.” It can be found on the upper right below the menu. Choose a username you are comfortable displaying to the world, as this is what people will see when you are on SoundJack. Choose a strong password not associated with another account. Follow the registration instructions and complete the registration via email.
Once you have registered, return to SoundJack.eu and log in. Go to the now visible downloads tab and select the file appropriate for your operating system.
MacOS:
You will download a dmg file. As of version 200616, when you open it you will see a blue folder, a shortcut to the applications folder, and a black circle icon that is the app. Drag the app icon to the blue folder and drop it in. This will copy the file to your real applications folder. If you have used a previous version of SoundJack on a Mac, note that there is no starter script anymore. You will launch SoundJack directly from the app icon in the applications folder.
Go to the applications folder. This is typically found in the left shortcut column of a finder window or in the dock. Double click the SJC 200616 icon. Depending on your security settings, MacOS will protest, throwing up a window that claims to have stopped the installation of an app from an unknown developer. If you have downloaded an app from a web browser rather than the App Store, you have experienced this. Go to system preferences—>security—> general tab. Toward the bottom you will see a note that MacOS blocked the installation and a button to allow the installation anyway. Install it.
///////
Optimizing Your Computer & Internet Connection
While SoundJack is a network transmission tool, and the problems it solves are network problems, the challenges the end user must deal with appear to be closely related to managing the computer CPU (Central Processing Unit) load. Currently tested computers have all had 8 GB or more of RAM.
In my mind there are two primary ways the sound quality can fail: (a) a garbled underwater sound and (b) clicks or pops. The former appears to be related to network issues, the latter appears to be related to processor power (spent either sending or receiving the data). Every adjustment that increases transmission speed costs computing power. It is possible to increase transmission speed beyond the network’s capacity to carry it (garbled underwater sound). It is possible to push the CPU so hard that it cannot keep up (clicks and pops). The first step then must be to optimize the computer itself, followed by understanding how to troubleshoot a poor connection based on the sound and CPU behavior.
Unless you can upgrade it, the CPU’s power is fixed. The number of cores in the CPU appears more important than the clock speed (expressed as a GHz value). So far, we have found that a 1.6GHz dual core i5 processor in a MacBook Air is incapable of utilizing SoundJack's fastest settings. A six core 2.2GHz i7 in a MacBook Pro has plenty of power. We do not yet know where the breakpoint is, but a 2.3GHz dual core i5 in a MacBook Pro seems sufficient for at least one connection at the most CPU intensive settings. Unfortunately this means the five year old dual core i5 MacBook Air commonly owned by current college students may not be powerful enough to ever use the fastest possible SoundJack settings. This does not mean it will not work at all, just that the quality may be lower, latency higher, and there will be fewer possible simultaneous connections and a smaller potential geographical reach. On a college campus network, connecting to others on campus, these CPU limitations would likely become acceptable.
Most consumer software is built to accommodate the strengths and weaknesses of a range of computers. We may have to wait longer for iMovie on a MacBook Air, but for the most part we can assume it will run. SoundJack is built on a different premise. SoundJack will let you break it, because slowing down for a slower computer is antithetical to its purpose. The art of a good SoundJack connection lies in understanding how close to that edge you can come given the specific circumstances of your connection. If you drive the computer too hard you will crash SoundJack.
Computer
Given that the CPU is likely fixed for your computer, the end user must relieve the computer of as many jobs as possible. Our suggestions are:
Make a new user. The easiest way to turn off all the ongoing, normally helpful background processes that have crept into your user is to make a clean user.
Download a light browser. [UPDATE: as of version 200616, Chromium browsers have a bug that causes pop up tooltip windows to stick and cover some controls. Until this is addressed, please use a non-Chromium-based browser like Safari.] We have experimented with Vivaldi and Brave, both of which decrease CPU overhead by 2-3% over Firefox or Safari. Midori and Opera are also lightweight browsers worth exploring (not all browsers work on all operating systems). As explained below, test the browsers using the CPU monitoring apps built into MacOS and Windows. You should use whatever browser runs the fastest on your specific computer.
Run NOTHING else, even background processes like Dropbox sync. SoundJack requires a single browser window and the SJC app to run. Literally run no other apps at the same time. Do not work on a Word document at the same time. Do not use Logic or Pro Tools to record SoundJack’s output. On MacOS, QuickTime Player for recording paired with Rogue Amoeba’s internal audio routing software, Loopback, appears to be a lightweight internal recording option if you need one. If you use Loopback to route audio virtually, close the app once the virtual device is set up. VB-Audio Software’s Virtual Audio Device is a similar program that can be used on Windows or Mac. Rogue Amoeba’s Audio Hijack (macOS) combines the loopback device and audio recorder into one program and appears to have the smallest CPU footprint.
Except… when getting to know your setup, you should run Activity Monitor—>CPU tab (MacOS) or Task Manager—>Performance tab (Windows). Until you understand how the browser and SoundJack impact your CPU, it is best to see the effect of your choices. Your target should be to keep the processor idle >=90%. This suggestion came straight from the developer and appears to be a good rule of thumb. If your CPU idle drops below 90%, you may start to experience audio dropouts in the form of clicks and pops. (see image from MacOS below)
For the adventurous, consider constructing a dedicated Linux OS Raspberry Pi. Several of my colleagues are working on this right now as a cost effective way to give students access. Watch this space.
For Windows users specifically, download the app LatencyMon. It will run tests to see if your system settings are too restrictive. You will almost certainly need to go into your power settings control panel and switch to a high performance option. This includes going to the advanced settings and setting both the minimum and maximum CPU management to 100%. (see image below)
Internet Connection & Router Settings
Log into your router settings by entering your Router Address (typically either 192.168.1.1 or 10.0.0.1, depending on your Internet service provider) into the address bar of an internet browser and enter your account information. As every router is different, the next few steps are the hardest to explain. The best case scenario is that you will figure out how to change the following settings by some combination of intuition and google. I cannot tell you exactly where in your router settings you will find these options.
Must: You must connect to your router with an Ethernet cable and turn off the WiFi on your computer. If you skip this step, you will likely not get good results. If you are waiting for the Ethernet cable and adapter to arrive, play away with only WiFi. But do not form any opinions about how well SoundJack works, as you’re using it incorrectly.
Should: Set a static IP address for your Ethernet connection. Every device that connects to your network is assigned an Internet Protocol (IP) Address. This is how your router knows what traffic is from or for your iPhone versus laptop. If you have logged into your router, click over to the advanced settings and find the local IP address list. This should include every device connected to your network. Select your computer, edit the entry, and make it a static IP address. This makes everything else that follows infinitely easier. If you are stuck, google how to do this with your router.
Should: Adjust your Quality of Service (QoS) settings on your router. QoS is a way of telling the router what data packet traffic to prioritize. Typically consumer routers prioritize media (movies, streaming music), and de-prioritize background processes (email, crash reports, etc). If you are able to adjust these QoS settings, most likely in the advanced page of your router settings, prioritize traffic to and from your computer’s specific static local IP address. This way your SoundJack data will be ahead of your roommate’s Netflix data.
Should (but may be ok without): Open UDP port 50050. Look for the “port forwarding” setting, usually found on the firewall page of your router settings. Depending on your router this setting will look different, but there are three basic values you must enter. (1) your computer’s static IP address, (2) the port type, which is UDP only. Not TCP. Not TCP/UDP. (3) the port number, which is 50050. There may be an option for incoming and outgoing as separate fields, or a single 50050 value for both. Make sure to click whatever button makes the port forwarding entry read as “active” in the list. This setting may take some time to come into effect. You may need to restart your computer and/or turn on/off your router.
There is an exception to the previous rule: In most cases, only one participant in SoundJack needs to have the port forwarding set up. It is conceivable, for example, that if the teacher is set up correctly, the student can skip this step. It is also conceivable that neither party needs to set up port forwarding. Almost all of the time this is true, but I want to caution you to verify on a case-by-case basis. Firewall settings vary from router to router. Maximum default firewall settings may prevent this from working.
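Because every router presents these fields differently, it can help to write down the three values and check them before you go hunting through menus. The following Python sketch is only a checklist in code form: the `PortForwardRule` record and the example address 192.168.1.23 are hypothetical, and nothing here talks to a real router.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class PortForwardRule:
    """Hypothetical record of one port forwarding entry as typed into a router."""
    local_ip: str   # your computer's static local IP address
    protocol: str   # the port type
    port: int       # the port number
    active: bool    # whether the router lists the entry as active


def find_problems(rule: PortForwardRule) -> List[str]:
    """Return a list of ways the entry differs from what this guide describes."""
    problems = []
    if not ipaddress.ip_address(rule.local_ip).is_private:
        problems.append("use your computer's local (private) IP address")
    if rule.protocol.strip().upper() != "UDP":
        problems.append("port type must be UDP only, not TCP or TCP/UDP")
    if rule.port != 50050:
        problems.append("port number must be 50050")
    if not rule.active:
        problems.append("entry is not marked active")
    return problems


# Example with a made-up local address; substitute your own static IP.
rule = PortForwardRule(local_ip="192.168.1.23", protocol="UDP", port=50050, active=True)
print(find_problems(rule))  # an empty list means the entry matches the guide
```

An entry saved as TCP/UDP is the mistake this check is mostly meant to catch, since many routers offer that as the default port type.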
Keep in mind that network traffic (on the shared internet infrastructure outside your house) will be heavier during the workday. If you have problems using the fastest connection settings, try after 6pm or on a weekend.
Optimizing SoundJack’s Settings
The following presumes that at least one party has configured their port forwarding correctly or that router defaults make that unnecessary, and that you are connecting two people.
SoundJack can be thought of as a complex system of balances that outputs a final number: the number of milliseconds between when my microphone responds to a sound and when your headphones reproduce that sound some distance away. The final product can be magical, as it is able to bring the one way latency (delay) below perceptual thresholds. Once the ear cannot tell there is a delay, I am not sure it much matters if it is a millisecond faster or slower. So please keep in mind the goal is not to reduce the delay to zero. Remember, it takes sound a millisecond to travel 1.125 feet anyway. Most of our in person conversations involve a delay of 3-10ms. Some delay is normal in music making. The goal is to make the number low enough that it does not negatively impact that music making. When this imperceptible delay is achieved, the most common reactions I have observed from musicians of the Zoom Video Conferencing Era are amazement, disbelief, and weeping.
SoundJack has a number of settings that are user adjustable. Some of these we set based on the needs of the session. Some of these we adjust to optimize the delay/processor load. I have had successful connections with someone on a cable connection at 1Mbps upload and had to throttle back someone on a much better plan. Once you understand how to manipulate the data buffers, you can make the best of what you have. We will discuss these settings in this order.
Please keep in mind that SoundJack is a versatile and powerful app. With a second computer and multiple multichannel sound cards, one could structure incredibly complex remote recording sessions. This guide focuses on helping you to understand how to set up a simple, low-latency, real-time connection between 2-4 people using one computer and one audio interface each. The goal is to enable real time collaboration. Anything else we might do with SoundJack is gravy.
And finally, this walk-through only describes peer to peer connections. You may explore the “send local audio + mix” option to imitate the local server option found on other platforms. This allows you to connect multiple slower computers to one faster computer (the faster computer sends the mix) as it saves those slower computers the processor power required for multiple peer to peer connections. The material that follows will help you discern whether or not you need to consider this option. Please be careful not to use this feature if connecting to the localhost or one of the mirrors, as you will hear feedback. This and other advanced features will be explained more thoroughly in the future.
Session Settings
Before beginning, close all apps and launch your CPU monitoring app (activity monitor’s CPU tab for macOS or task manager’s performance tab for windows 10). You want to make sure that you have minimal CPU use. Somewhere around 97-99% CPU free while running nothing would be best. MacOS shows the % of potential CPU power left idle. Windows shows the opposite: the % utilized.
Launch the SJC app as instructed (exe file in Windows, as of v616 click the SJC app in MacOS). Open SoundJack.eu in a single browser window; again we recommend Vivaldi or another light browser app rather than Chrome, Firefox, or Safari. Log in and go to the stage. You may get pop up information windows that generally may be dismissed unless they force an update.
If you ever run into issues where the stage thinks the SJC is not running in the background, log out of the website, close the browser and SJC app, relaunch the SJC app, relaunch the browser and log back in. If you actually crash the SJC app, you were most likely driving the CPU too hard. If this happens repeatedly, consider whether (1) you optimized your computer or (2) whether it is capable of running this app.
Check your CPU usage again. Opening SJC and a single window of Vivaldi shouldn’t bring your CPU load below ~94-96% idle (or above 4-6% utilized). Leave this window open and move it somewhere on your screen that allows you to keep an eye on it.
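Since MacOS reports idle and Windows reports utilization, it is easy to compare the wrong numbers. A minimal Python sketch of the conversion and of the 90% idle rule of thumb described earlier (the function names are my own; you still read the figures off Activity Monitor or Task Manager by eye):

```python
TARGET_IDLE_PERCENT = 90.0  # the developer's rule of thumb quoted in this guide


def idle_from_utilized(utilized_percent: float) -> float:
    """Convert a Windows style 'utilized' reading into a MacOS style 'idle' reading."""
    return 100.0 - utilized_percent


def has_headroom(idle_percent: float, target: float = TARGET_IDLE_PERCENT) -> bool:
    """True if the CPU is idle enough that clicks and pops are less likely."""
    return idle_percent >= target


# A Windows reading of 6% utilized is the same as a MacOS reading of 94% idle.
print(idle_from_utilized(6.0), has_headroom(idle_from_utilized(6.0)))

# A reading of 85% idle is below the target and suggests relieving the CPU.
print(has_headroom(85.0))
```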
PLUG YOUR MIC AND HEADPHONES INTO YOUR AUDIO INTERFACE. TURN ANY EXTERNAL SPEAKERS OFF. SOUNDJACK HAS NO ECHO OR FEEDBACK CANCELLATION.
As you look at the left column of SoundJack, called “Settings,” make or confirm the following adjustments. Many of them are the defaults. For now we will focus on only these settings. (1) From the first drop down select “expert settings.” This reveals the rest of what we will adjust. (2) Select the public stage. (3) select your local IP address in the IP connection drop down. Generally do not use the VPN option. (4) Select your audio interface from the input (arrow pointing in) and (5) output (arrow pointing out) drop downs. If you plugged in your audio device after launching the stage, refresh the page to find it in the list. Normally your audio interface will appear once in each list. If it appears twice, choose the top instance for input and the bottom instance for output.
Unless you are using a stereo microphone, (6) select “send channels 1”. Be aware there is a known bug (it may have been fixed by now) if one party sends one channel and the other stereo. Either both send mono or both send stereo. The rule of thumb is to never send more information than you need.
(7) Turn the green horizontal slider in the settings area all the way to the right. This is the volume for your own audio input loopback through your headphones. You could turn this all the way down and the other party would hear you fine. This is just for you. This fader, like the faders in the participants list to the right, is only subtractive. That means that all the way to the right will never distort unless you send in a distorted signal. The only reason to turn one down is if you have multiple streams coming in and you want to adjust their relative loudness. Otherwise, I recommend turning the fader all the way to the right for any active participant.
SoundJack offers a number of (8) audio compression codecs. After talking with the developer, I recommend using either Opus 96k or 192k. Even if your ISP can handle the Linear 768k setting, the Opus codec itself tries to manage the sort of audio dropouts that make clicks and pops.
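To get a rough feel for what these codec choices ask of your upload speed, here is a minimal Python sketch. It assumes that the numbers in the codec names (96k, 192k, 768k) are audio bitrates in kilobits per second, and that each peer to peer connection carries one outgoing stream at that rate. It ignores network packet overhead, so real usage will be somewhat higher. The function names are my own.

```python
def upload_needed_kbps(codec_kbps: int, outgoing_connections: int) -> int:
    """Approximate audio payload sent per second, ignoring packet overhead."""
    return codec_kbps * outgoing_connections


def fits_upload(codec_kbps: int, outgoing_connections: int, upload_kbps: int) -> bool:
    """True if the estimated audio payload fits within the available upload speed."""
    return upload_needed_kbps(codec_kbps, outgoing_connections) <= upload_kbps


UPLOAD_KBPS = 1000  # example: a 1Mbps upload plan

for codec_name, kbps in (("Opus 96k", 96), ("Opus 192k", 192), ("Linear 768k", 768)):
    for connections in (1, 2, 3):
        needed = upload_needed_kbps(kbps, connections)
        verdict = "fits" if fits_upload(kbps, connections, UPLOAD_KBPS) else "too much"
        print(f"{codec_name} x {connections} connection(s): ~{needed} kbps ({verdict})")
```

Under these assumptions, a single Linear 768k stream leaves little room on a 1Mbps upload, while Opus 96k leaves room for several connections.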
Leave everything else at its default for now.
To start a connection you click the play button to the far right of any given person’s entry in the participants’ list. Typically you will first connect to the localhost entry (which tests the connection between your computer and your router), followed by connecting to the closest mirror server. Both of these options bounce your sound back to you, which will give you a good sense of what the real world round trip delay is for the various buffer settings.
Connection buffer settings
There are three buffer settings that you will tweak to optimize most connections. They are the sample buffer, network buffer, and jitter buffer. Throughout this guide we will use the shorthand of sample/network/jitter. For example a sample buffer of 64, network buffer of 256, and jitter buffer of 4 can be quickly expressed 64/256/4. I will take each of these settings in turn and explain what they are and why you might adjust them.
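If you keep notes on which settings worked with which collaborator, the shorthand is easy to read back mechanically. A minimal Python sketch (the `BufferSettings` name and `parse_shorthand` function are my own illustration, not part of SoundJack):

```python
from typing import NamedTuple


class BufferSettings(NamedTuple):
    sample: int
    network: int
    jitter: int


def parse_shorthand(text: str) -> BufferSettings:
    """Turn a note like '64/256/4' into its three named values."""
    parts = text.strip().split("/")
    if len(parts) != 3:
        raise ValueError("expected three values written as sample/network/jitter")
    sample, network, jitter = (int(part) for part in parts)
    return BufferSettings(sample, network, jitter)


settings = parse_shorthand("64/256/4")
print(settings.sample, settings.network, settings.jitter)
```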
In digital audio, a buffer is a place that data can accumulate before the processor has to spend capacity moving it somewhere else or applying a process.
Sample Buffer
The sample buffer is the first of SoundJack’s buffers. Digital audio works by breaking an analog sound wave (continuous change) up into discrete moments in time that have a specific positive or negative pressure value (digital). A sample is basically one of those time/pressure values. So depending on how low or high you set the sample buffer, you accumulate more or fewer “pieces of the sound” before the processor moves them onto the next step of the program. You should set the sample buffer value as low as it can go without causing the audio quality to suffer. Even if it costs processor power, the lower the sample buffer the faster your audio interface transfers data, which directly lowers the latency contribution of the audio interface itself. Any modern low latency USB audio interface should be able to work at 64 samples. If the SoundJack developer adds 32 or 16 samples, try these options too.
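A buffer size in samples only becomes a time once you know how many samples pass per second. The following Python sketch assumes a sample rate of 48,000 samples per second; that figure is my assumption for illustration and is not stated in this guide, so check what your interface is actually running at.

```python
ASSUMED_SAMPLE_RATE_HZ = 48_000  # assumption for illustration, not from this guide


def buffer_duration_ms(samples: int, sample_rate_hz: int = ASSUMED_SAMPLE_RATE_HZ) -> float:
    """Time it takes to fill a buffer of this many samples, in milliseconds."""
    return samples / sample_rate_hz * 1000.0


for size in (16, 32, 64, 128, 256, 512):
    print(f"{size} samples take about {buffer_duration_ms(size):.2f} ms to fill")
```

At the assumed rate, 64 samples fill in roughly a millisecond and a third, and each halving of the buffer halves that wait. This is only the time to fill the buffer, not the full delay your audio interface adds.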
Note that at this point, the ubiquitous Zoom handheld recorders do not seem to work well at 64 samples. They are not true low latency audio interfaces and we do not recommend them for SoundJack. Beware if an interface advertises “low latency monitoring.” This means the device quickly routes your own audio signal back to your headphones, not that the transit time to and from the computer is low.
Next in line comes the loopback moment where your sound is returned from SoundJack through your audio interface’s headphone jack. If you make a short sound like a clap or snap, you’ll hear the delay increase as the sample buffer increases. Keep in mind this delay represents twice the delay of the sound as it moves on to the network buffer, because your audio interface adds some delay going in and going out.
Network Buffer
The network buffer accumulates samples from the sample buffer. If the sample buffer is 64 and the network buffer 128 (64/128), two complete sample buffers will fill the network buffer before the network buffer sends the data on. I would like for you to think of the network buffer in two ways. Or at least as a tool to solve two different problems. You want to send your sound as fast as your processor will allow. The lower your network buffer, the faster your sound moves. However, this also requires the processor to work faster. Additionally, the more audio streams you send and receive, the harder your CPU has to work. Assuming favorable network conditions (your actual connection to the other party is good), think of the network buffer as a relief valve for your CPU. If you have an older or slower computer, raising the network buffer to 256 may help to avoid dropouts, clicks, and pops, especially if you are receiving multiple streams. So while ideally we keep the network buffer as low as possible, the two potential reasons we would raise it are: (1) if we are hearing or transmitting (the other person hears them in our sound) clicks or pops, or (2) if the other person hears our sound interrupted by garbles. If the latter, they may be able to address this with their jitter buffer, but you may have to adjust the network buffer to 256 on your end. The more peer to peer connections you make, the higher the CPU load and the higher the likelihood of needing to revisit whether the network buffer still works at 128. So remember, unless there is garble in the sound you are transmitting, the network buffer is fundamentally a switch to relieve your CPU when it has exceeded 10% utilization (or dropped below 90% idle). This could happen because you have a single connection and a slow CPU. Or a fast CPU and multiple connections. Always use the computer metrics as a guide and your ear as the judge.
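To put rough numbers on this, here is a small sketch of how many sample buffers fill one network buffer and how much audio each buffer holds. The 48 kHz sample rate is an assumption on my part, as is the rule that the network buffer must be a whole multiple of the sample buffer; the function names are made up for illustration.

```python
ASSUMED_SAMPLE_RATE_HZ = 48_000  # assumption: adjust to your actual sample rate


def sample_buffers_per_network_buffer(sample_buffer: int, network_buffer: int) -> int:
    """How many complete sample buffers fill one network buffer."""
    if sample_buffer <= 0 or network_buffer % sample_buffer != 0:
        # Simplification for this sketch: only whole multiples are handled.
        raise ValueError("network buffer must be a whole multiple of the sample buffer")
    return network_buffer // sample_buffer


def buffer_duration_ms(samples: int, sample_rate_hz: int = ASSUMED_SAMPLE_RATE_HZ) -> float:
    """Length of audio, in milliseconds, held by a buffer of the given size."""
    return samples / sample_rate_hz * 1000.0


if __name__ == "__main__":
    for network_buffer in (128, 256, 512):
        count = sample_buffers_per_network_buffer(64, network_buffer)
        duration = buffer_duration_ms(network_buffer)
        print(f"64/{network_buffer}: {count} sample buffers, {duration:.2f} ms of audio")
```

Under the 48 kHz assumption, a network buffer of 128 holds about 2.7ms of audio and 256 holds about 5.3ms, which is the time you trade for the lighter CPU load.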
Jitter Buffer
The jitter buffer is last in line. It is currently not labeled (as of SJC200611); however, there are pop up tooltips that will name it if you mouse over. Note that there appears to be a bug with Chrome based browsers, including Vivaldi, that causes the tooltip on the jitter buffer to stick on the screen, covering whatever is below. You can solve this by using another browser (e.g. Safari or Firefox). You will find the jitter buffer in the entry line of the participant you have connected to. It is the rightmost numerical dropdown menu. It defaults to 4 and when connected will light up either green, flashing green and red, or red. Green is good. Red flashes mean connection issues that may or may not be audible. The more red, the worse the quality.
The jitter buffer is a little challenging to wrap your head around. If you connect to one of the mirrors, think of the jitter buffer as a final, incoming network buffer. So your sound goes through your sample buffer and network buffer, is flung out across the Internet to a server and returned to you. Before the returned audio plays back it is buffered by the jitter buffer. Why might we want to do this? The same two reasons we might adjust the outgoing network buffer. (1) If your CPU is overloaded and you hear clicks, raising this jitter buffer will decrease load. And (2) if there is some sort of delay in the transit of your data, raising this buffer ensures you don’t hear garbled sounds. It allows the incoming stream to be a little jumbled because the buffer accumulates the data faster than you play it back. There is a direct relationship between your network buffer and jitter buffer in this respect. A low network buffer may need a higher jitter buffer. A higher network buffer will allow you to lower the jitter buffer.
When you connect to another person, rather than to a mirror, the buffer relationships change. You will see a jitter buffer in your entry in the participants list. Ignore this; you will never interact with your own entry. You want to adjust the jitter buffer of the person you have connected with. When connected to the mirror, your network buffer has a relationship with the jitter buffer in the mirror’s participants’ list entry. When connected to another person, the jitter buffer you see in their entry in the participants list has a relationship with their network buffer, not yours. Similarly, they will adjust a jitter buffer on your entry in the participants list that they see, and that has a relationship with your network buffer.
So, if for whatever reason you raise your network buffer (perhaps your CPU load jumped because you added a third or fourth connection), the people you have connected with should take that as a chance to lower the jitter buffer they find in your participants list entry. Similarly, if the other person lowers their network buffer, you may need to raise the jitter buffer you see in their entry in the participants list. You can see the current network buffer settings for every person you have connected to in their entry in your participants’ list. It will be a number inside an icon that looks like a clear cylinder with a black top and bottom. From a pure latency perspective, it appears that 64/128/3 is a little faster than 64/256/1, but CPU management may force somebody to raise their network buffer to 256 regardless.
Examples
Here are a few common set ups and solutions that we have encountered so far. Ideally, all parties would set their buffers to 64/128/1 and the sound would be perfect. Keep in mind that if there is an issue with the audio quality, it will likely not be uniform. That is to say, these case studies explore how to adjust settings to fix the problem for the specific participant experiencing or causing the problem. Everyone on the same session will likely not use the same settings.
Case one: two participants, both on high-speed fiber connections, one with a fast computer, one with a slow computer. The primary loss of audio quality is likely going to be clicks and pops. As mentioned above, using the Opus 96 or 192 codec will help smooth over some of these interruptions. However, it is likely that the participant on the slower computer will want to set her network buffer to 256 to relieve pressure on her CPU. The participant on the faster computer should be able to set the buffers to 64/128/1 or 64/128/2. The participant on the slower computer will have to adjust her jitter buffer based on the quality of the connection. She may not be able to set it as low as one, but two or three should be fine. Again, she should watch the CPU load.
Case two: two fast computers where one participant is on a fiber connection and the other a cable connection. In our experiments so far, it does not really matter whether someone on a cable connection has a 1 Mb per second upload, a 5 Mb per second upload, or higher. Simply being on RCN, Comcast, or Xfinity exposes one to some level of ongoing network disruption and traffic congestion. The participant on the fiber connection will likely be able to set their network buffer to 128. The participant on the cable connection will likely need to raise their network buffer to 256 and raise the incoming jitter buffer. This will likely be a situation where clicks and pops will not be the issue that suggests this correction. Rather, the person with the cable connection will sound garbled or like they’re underwater. Once these network buffers are set, adjust the jitter buffers as low as possible. The person with cable receiving the signal sent with a network buffer of 128 may need a slightly higher jitter buffer, and the person with fiber receiving the signal sent with a network buffer of 256 will be able to set a lower jitter buffer.
Case three: both parties with slow computers on cable connections. In this case, I would recommend starting with a network buffer of 256 and a jitter buffer of four. Start to slowly step the jitter buffer down by increments of one on both sides. Watch for flashes of red, but also listen for either clicks or pops, or garbled sound. You will likely see red flashes before you actually hear an issue.
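The step-down routine in case three can be written out as a tiny loop. This is only a sketch of the procedure: the is_clean callback is hypothetical and stands in for your own eyes and ears (no red flashes, no clicks, pops, or garble at that setting), and the floor of 1 is taken from the 64/128/1 settings used elsewhere in this guide.

```python
from typing import Callable


def lowest_clean_jitter(start: int, is_clean: Callable[[int], bool], minimum: int = 1) -> int:
    """Step the jitter buffer down by one until the next step would sound bad.

    is_clean(setting) is a stand-in for human judgement: return True if the
    connection looks and sounds fine at that jitter buffer setting.
    """
    current = start
    while current > minimum and is_clean(current - 1):
        current -= 1
    return current


if __name__ == "__main__":
    # Pretend the connection starts to flash red below a jitter buffer of 2.
    print(lowest_clean_jitter(4, lambda setting: setting >= 2))
```

In practice each side does this by hand, one step at a time, and stops at the last setting that still looked and sounded good.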
Case four: any of the above cases, but adding additional peer to peer connections. Setting up multiple, stable connections can require a little bit of craft. If you have understood everything in this guide up to this point, you are prepared to troubleshoot any issues. If all parties have incredibly fast computers and fiber connections, you should be able to set the buffers to 64/128/1, and add multiple additional connections without changing those settings. Keep in mind that a trio really means every participant is connecting to two other people and a quartet means every participant is connecting to three other people. Each of these connections adds to the processor load. If your upload or download speeds are low, and especially if you have not set the quality of service (QoS) settings mentioned above and other people in your house are using the Internet, you may find that you need to lower the quality of the audio codec itself to 48. You will notice a drop in the quality though, so use this sparingly. In most cases, the best course of action is to establish a single stable connection first, then add the second or third while watching your CPU usage. If you find that there are challenges with additional connections, try to solve them as holistically as possible, keeping in mind how powerful each user's computer is, what type of connection they have, and what time of day it is (are all the doctors, lawyers, and teachers also using the internet?). Those with older computers may keep their network buffers at 256 and may need to keep all of their jitter buffers at a slightly higher number for the sake of their processor. Similarly, a participant with a cable connection may need to keep their network buffer at 256 just for the quality of the network they’re sending over.
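The trio and quartet arithmetic generalizes to any ensemble size when every participant connects directly to every other participant. A minimal sketch, with function names made up for illustration:

```python
def connections_per_participant(ensemble_size: int) -> int:
    """Peer to peer connections each person maintains when everyone connects to everyone."""
    if ensemble_size < 1:
        raise ValueError("ensemble size must be at least 1")
    return ensemble_size - 1


def total_links(ensemble_size: int) -> int:
    """Distinct participant pairs in the whole session."""
    return ensemble_size * (ensemble_size - 1) // 2


if __name__ == "__main__":
    for name, size in (("duo", 2), ("trio", 3), ("quartet", 4), ("quintet", 5)):
        print(f"{name}: {connections_per_participant(size)} connections each, "
              f"{total_links(size)} links in total")
```

The per-participant count is the one that matters for your own CPU, since each connection is a stream you both send and receive.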
Calculating actual latency
[This section updated 3 July 2020]
For each connection in the participants list you will see an estimated latency to the right. As a good rule of thumb, this indicates the one way network latency. Add the latency of your interface to this number, multiply the sum by 1.125 (sound travels roughly 1.125ft per millisecond), and you will have an estimated distance in feet for the equivalent acoustical delay. So long as the displayed estimate is at or below 14-20ms, you probably will be able to make music effectively. The more rhythmical the music, the better a lower number is.
The actual latency is a little more complicated to calculate [this section is revised as we have been able to run more tests with new equipment]. Several steps in the process increase the delay. Your distance to the microphone adds 1ms per 1.125ft. Your audio interface adds latency as it processes the sound and converts it from analog to digital. SoundJack has both processor overhead and one way network transit time, which are basically captured in the displayed estimate. And finally the receiving participant's audio interface introduces delay converting the signal from digital back to analog.
Most consumer USB audio interfaces that advertise as “low latency” have a roundtrip latency below ~10ms at 64 samples. So to calculate the actual latency, take your audio interface latency plus two times the network one way latency (displayed in SoundJack) plus the other participant’s audio interface latency and divide by two.
As an example: the Focusrite Scarlett 2i2 has a real world (mic to headphones) roundtrip latency of around 6ms at 64 samples. If both parties have this audio interface and the estimated SoundJack latency is 18ms, that is:
(6+18+18+6)/2=24ms
We had to go through all that to arrive back at the rule of thumb: the displayed latency plus the latency of one of the interfaces is a good one way estimate.
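The same arithmetic can be kept in a short script if you want to compare interfaces or connections. This is a sketch of the calculation described above, not anything SoundJack provides; the function names are made up, the interface figures are the roundtrip (mic to headphones) values, and the 1.125ft per millisecond figure is the one used above for microphone distance.

```python
FEET_PER_MS = 1.125  # distance sound travels in one millisecond, per the text above


def one_way_latency_ms(own_interface_roundtrip_ms: float,
                       network_one_way_ms: float,
                       other_interface_roundtrip_ms: float) -> float:
    """(own interface + 2 x displayed network latency + other interface) / 2."""
    total_roundtrip = (own_interface_roundtrip_ms
                       + 2 * network_one_way_ms
                       + other_interface_roundtrip_ms)
    return total_roundtrip / 2


def equivalent_distance_ft(latency_ms: float) -> float:
    """Acoustic distance, in feet, that would produce the same delay."""
    return latency_ms * FEET_PER_MS


if __name__ == "__main__":
    # Both parties on an interface with ~6ms roundtrip, SoundJack showing 18ms.
    latency = one_way_latency_ms(6, 18, 6)
    print(f"one way latency: {latency} ms")
    print(f"equivalent distance: {equivalent_distance_ft(latency)} ft")
```

For the example above this gives 24ms, roughly the delay you would hear standing 27ft from your partner in a room.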
Note that the way SoundJack appears to manage latency is a little more complex. As buffers rise, the estimated latency appears to be less accurate; the estimates appear to be under by more the higher the buffers are. However, in practice this does not matter as the overhead remains low at any settings that would produce fast enough connections to make synchronous music. Measuring actual latency is also complicated as both the localhost and mirror estimations appear to be closer to actual roundtrip latencies. We will continue to explore this. What matters most is what the connection feels like, so aim to get the numbers as low as possible, and use as fast an interface as you can, but fly by feel rather than the instruments.
If you want to dive deeper, consider this figure:
This is an impulse test carried out using the SoundJack east coast mirror with buffer settings of 64/512/4. A click was introduced into the mic with an external source. The initial click and the return of that click through headphones next to the mic were measured with a third device. Ta1 represents the initial click. Ta3 is when the click returned to the headphones after traveling to and from the east coast mirror. This represents the total travel time: the network round trip plus the latency of the audio device. (In this case only one audio device.)
The lower waveform was captured internally using Rogue Amoeba’s Loopback app. Ti1 is the moment the click reached the SoundJack loopback. Sorry, I know that I just used the same term twice to describe two different things. A reminder that the SoundJack loopback is when the data passes the sample buffer and is fed back to the headphones. This moment was captured internally via another app also called Loopback. Ta2 represents the moment that the looped-back (in the sense of SoundJack) audio reached the headphones. Notice again that there is a delay between Ti1 and Ta2 caused by the latency of the audio interface itself. Ti3 is the moment that the audio was returned to SoundJack (post jitter buffer). Ta3 is the moment that sound reached the headphones. Ti2 and Ti4 are echoes caused by the setup itself and can be disregarded.
Worth further study is the interaction of the various buffers and the accuracy of the estimated network latency. At low settings, e.g. 64/128/1, the estimates are almost perfect. At much higher settings, e.g. 64/512/4, there appears to be an unaccounted for additional latency. We have not explored this thoroughly (we have only taken measurements using the mirror servers) and will update this section as we are able. Within the range of settings from 64/128/1 to 64/256/3 the additional overhead does not appear to exceed around 4ms. So long as you avoid network buffers greater than 256 and jitter buffers greater than 3 or 4, you may disregard this.
Conclusions
I cannot think of a concise way to close this document, as the story of SoundJack and similar platforms is going to continue to unfold as more and more musicians start to use them. We will continue to update this document and cover more of the advanced features, both currently implemented and planned. However, for now, just go start to use it. See what you can do. Reestablish the ability to make music in real time in your daily life. Share your results. Inspire others to do the same. Go.