Could you share some measurements or other information on the latency of the hardware? For example, I’d expect the audio codec to add at least 1 ms to the audio chain.
I understand the 1 ms latency mentioned on the website is due to the audio engine only, right? Does it also account for the ping-pong buffering and, if so, what is the block length? I was wondering whether this is configurable by the user as well.
It is clear that CPU load and memory consumption depend on the hardware to which Elk OS is ported. However, could you please share some numbers or estimates of the resource consumption on the Elk Board?
The formula for the digital latency on the RPi3 is:
total_latency = (2 * BUFFER_SIZE + 4) / Fs
The system can run down to 16 samples on the RPi3, which at 48kHz gives you 0.75ms of round-trip digital latency. On other boards we are able to go down to 8 samples.
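As a quick sanity check of the formula above, here is a minimal Python sketch. The function name `round_trip_latency_ms` is made up for illustration and is not part of Elk or Sushi; it only encodes the RPi3 formula quoted above.

```python
def round_trip_latency_ms(buffer_size, sample_rate=48000):
    """Round-trip digital latency in milliseconds.

    Uses the RPi3 formula from this thread:
    total_latency = (2 * BUFFER_SIZE + 4) / Fs
    """
    return (2 * buffer_size + 4) / sample_rate * 1000.0


# 16 samples at 48 kHz, the smallest buffer quoted for the RPi3
print(f"{round_trip_latency_ms(16):.2f} ms")
```

Other boards may have a different constant term, so the 8-sample case is deliberately not computed here.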
The audio codec we put on the Elk Pi board is currently configured in a way that adds around 1 ms of extra I/O latency at 48 kHz. It’s possible to tune it to go lower, though we’re not sure right now exactly how much. Other codecs can go down to 0.3-0.5 ms, but usually there’s a tradeoff between latency and the quality of the reconstruction filters.
Regarding CPU usage etc., we want to prepare a nice blog post with a comparison between ALSA / PREEMPT_RT and Elk / Xenomai for the Raspberry Pi 3. But it will take us time to do a “fair comparison”, making sure that all the other variables are the same - it’s very easy to just run an unoptimized Raspbian and make Elk shine with an unfair comparison.
The 2 * BUFFER_SIZE I can understand, but I still wonder where the additional 4 samples come from. You mentioned the minimum block size. Is it possible to set it in Sushi, or does one need to rebuild the framework? What is the default value for it?
Looking forward to this blog post! It is surely interesting to have a comparison. However, maybe you have some absolute numbers for the standard Elk Board configuration? This would give us an idea of how many filters, delays, compressors, etc. could be used, as well as how many resources the audio engine alone takes.
The extra 4 samples come from some internal requirements for being able to run the driver with DMA transfer for best CPU efficiency and they are related to low-level stuff in the FIFOs of the serial audio controller of the Broadcom SOC. You can easily set up the board to run at 16, 32, 64 or 128 samples buffer size without rebuilding the image. The default is 64 samples.
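To put the selectable buffer sizes in perspective, the sketch below tabulates the round-trip digital latency for each of them at 48 kHz, using the RPi3 formula from earlier in the thread. All names are illustrative only. The "with codec" column simply adds the roughly 1 ms codec figure quoted above, which is an approximation rather than a measurement.

```python
SAMPLE_RATE = 48000            # Hz, the rate used throughout this thread
CODEC_LATENCY_MS = 1.0         # approximate codec I/O latency quoted above (assumption)
BUFFER_SIZES = (16, 32, 64, 128)  # sizes selectable without rebuilding the image


def digital_latency_ms(buffer_size, sample_rate=SAMPLE_RATE):
    """Round-trip digital latency in ms: (2 * BUFFER_SIZE + 4) / Fs."""
    return (2 * buffer_size + 4) / sample_rate * 1000.0


def latency_table():
    """Return (buffer_size, digital_ms, digital_plus_codec_ms) for each size."""
    rows = []
    for n in BUFFER_SIZES:
        digital = digital_latency_ms(n)
        rows.append((n, digital, digital + CODEC_LATENCY_MS))
    return rows


for n, digital, total in latency_table():
    print(f"{n:>4} samples: {digital:.2f} ms digital, ~{total:.2f} ms with codec")
```

With the default of 64 samples, the formula gives (2 * 64 + 4) / 48000 = 2.75 ms of digital latency before the codec is taken into account.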
That is a hard question to answer because it really depends on which filters, which delays, which compressors, etc. The audio engine alone takes 4% CPU (of a single core) to handle the 8 channels and the rest. To give you some other numbers, the MDA JX-10 synthesizer takes around 10% of a single core to run at full polyphony. Other synths are way more CPU expensive, and the same goes for FX; the performance can vary a lot.
We put a lot of effort into optimizing the toolchain and preparing a 64-bit distro, which gives you better performance on the RPi3. Getting 64-bit running on the Pi3 with audio was not easy indeed.