I think this is now going beyond the limits of my knowledge, as I don’t understand most of this hehe. Because the plugin still works fine, I’ll continue development for now without fixing this and maybe have a deeper look when/if it becomes a problem.
That said, looking at the stack trace it occurred to me that maybe the mode switches are happening because tracktion engine is trying to parallelize some tasks and the RT kernel does not support that. Do you think this makes any sense? Because if it does, I guess there’s some easy way to tell tracktion not to parallelize.
I was able to compile libusb as part of my project, pretty easily
Now doing some tests deploying my app on the Elk board. Things don’t work as expected so far: it seems to hang, but I need to investigate further before posting any specific errors. My app is trying to communicate with Push using USB MIDI ports and JUCE’s MidiInput::openDevice functions. I seem to recall you mentioning this could be a problem? (My app works fine when compiled on macOS and on Raspbian.)
Also another question I have is what part of the plugin is run in the RT thread and what part in the rest. Is it only JUCE’s PluginProcessor::processBlock that happens in RT kernel?
Anything that uses “normal” Posix threads for parallelization inside an RT thread will definitely cause mode switches. The solution is either to disable them as you suggested, to implement the parallelization using TWINE’s worker thread pool, or to use Xenomai’s Cobalt APIs to convert the Posix calls.
The issue might be there if a plugin opens up MIDI ports, which should be the host’s responsibility. I think you should be able to debug and test this using the AppImage of SUSHI for normal Linux, too.
Correct, that’s the only callback in a plugin that runs in the RT context.
Thanks, I’ll continue investigating and try to narrow down the issues, maybe opening new forum threads if needed. I’ll also publish code and instructions to run tracktion engine on ELK as soon as I get in a position to do so.
Thanks, I’ll let you know. I’m making some progress, but I’m stumbling across weird behaviours like ELK AudioOS hanging the 2nd time in a row I run my app, but working the first time (and stuff like that). Also there are some kernel mode switches (MSW) that need to be fixed. I set tracktion engine to use a single thread, and now the MSWs have gone down a lot, but there are still some I need to debug.
I think I’m getting closer but still more work to be done
Actually @Stefano, about this issue of AudioOS hanging the 2nd time I run sushi with my plugin, do you have any guess about what might be happening? Clearly my app is quitting and leaving some stuff in a weird state… Is there any command to “reset” the audio hardware that I could run as a test between sushi runs (might be related to Distorted/noisy audio)? Have you ever come across anything similar?
I’ll do that (maybe tonight if I get some time). I’m testing with 2 tracktion engine based projects, a simpler one and another one a bit more complex (not too much). It only happens with one of them. In any case I’ll do more tests and share the output of sudo dmesg. Thanks!
I tried sudo dmesg but I don’t get too far, because when ELK hangs I cannot run it anymore. If I have it running beforehand (in watch mode) I see nothing special. In any case, I see audio open and audio close messages where it makes sense:
[Apr28 17:09] audio_rtdm: audio_open. - here I run sushi
[ +4.664678] rtdm_event_wait failed - here I hit ctrl+c to stop sushi
[ +0.007131] audio_rtdm: audio_close.
[ +4.159085] audio_rtdm: audio_open. - here I run sushi for the second time
- here OS freezes and I have to power off
This is one issue, but I’m not super worried about this one for now. The other issue has to do with the MSWs. I can see the MSW count growing with watch -n 0.5 cat /proc/xenomai/sched/stat although the plugin works. But after a while it sometimes crashes/hangs the OS. I have the feeling this could be caused by too many MSWs? In any case @Stefano said MSWs should be completely avoided, so I continued my investigations. Here are some things I have learned so far:
Running the tracktion EngineInPluginDemo project (with a small modification of adding a background audio file), I get many MSW, at a rate of “hundreds” per second.
If I configure tracktion engine to do audio computations in a single thread, the number of MSW is reduced a lot, now grows at a rate of ~5 per second. I found out about this potential problem with
If I use gdb with catch signal SIGXCPU, the debugger seems to stop execution at different points. Here are the functions I see triggered in the backtrace:
There are probably more, but these are the different ones I came across. Originally there was one related to parallelisation (the one I reported above in this thread), but that was fixed by setting the number of threads to 1 for the tracktion engine.
I can find other time functions in JUCE codebase that call gettimeofday, but I don’t know if this is a problem.
I’m not sure how to continue fixing the MSWs. I guess the first step would be to replace the nanoseconds system call with something RT safe, but I don’t know how to do that using twine. Then if the juce::ScopedLocks are a problem, I don’t know what to do with them, maybe talk to the JUCE/tracktion engine guys.
Hmmm looks like a clean use of tracktion engine in ELK is not going to be easy at all…
Crashes of this kind are usually related to use of timer-related functions in the RT thread, from our experience. Having many MSWs will produce audio dropouts but shouldn’t crash the system.
the timer call here looks the most probable candidate for the crashes…
If called from a RT context it is absolutely an issue.
The function to call to get current time in nanoseconds with TWINE is twine::current_rt_time().
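As a rough sketch of how that call could be swapped in, the helper below returns the current time in nanoseconds, using twine::current_rt_time() when built for the board and std::chrono::steady_clock otherwise. The macro HYPOTHETICAL_USE_TWINE, the helper name current_time_ns, the include path, and the assumption that current_rt_time() returns a std::chrono duration are all illustrative and should be checked against the TWINE headers.

```cpp
#include <chrono>

// Assumption: HYPOTHETICAL_USE_TWINE is a made-up build flag, defined only
// when cross-compiling for the Elk board and linking against TWINE.
#ifdef HYPOTHETICAL_USE_TWINE
#include "twine/twine.h" // assumed include path
#endif

// Hypothetical helper to use instead of gettimeofday()/clock_gettime()
// in code that may run inside the RT callback.
inline std::chrono::nanoseconds current_time_ns()
{
#ifdef HYPOTHETICAL_USE_TWINE
    // Assumes the return value is a std::chrono duration.
    return std::chrono::duration_cast<std::chrono::nanoseconds>(twine::current_rt_time());
#else
    // Desktop fallback: a monotonic clock from the standard library.
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now().time_since_epoch());
#endif
}
```

Keeping the switch in one helper means the patched code can still be built and tested on a desktop machine.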
Locks are not possible in an RT thread, period. They shouldn’t be used in a desktop application either, but here the Tracktion team (who know all those issues very well, their latest ADC talks are the best reference ever for the topic) might have been using them between two RT threads having the same priority, which will be fine in a normal OS.
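For reference, a common lock-free alternative for handing data to the RT callback is a single-producer / single-consumer queue built on std::atomic. The sketch below is a minimal illustration, not what Tracktion or JUCE actually do; the names SpscQueue and process_block are hypothetical, and process_block only stands in for the body of a real processBlock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-producer / single-consumer queue: the non-RT thread pushes,
// the RT callback pops. Neither side blocks or takes a lock.
class SpscQueue
{
public:
    explicit SpscQueue(std::size_t capacity)
        : _buffer(capacity + 1) // one slot stays empty to tell "full" from "empty"
    {
        assert(capacity > 0);
    }

    // Call from the non-RT thread only. Returns false if the queue is full.
    bool push(float value)
    {
        const std::size_t head = _head.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % _buffer.size();
        if (next == _tail.load(std::memory_order_acquire))
            return false;
        _buffer[head] = value;
        _head.store(next, std::memory_order_release);
        return true;
    }

    // Call from the RT thread only. Returns false if the queue is empty.
    bool pop(float& value)
    {
        const std::size_t tail = _tail.load(std::memory_order_relaxed);
        if (tail == _head.load(std::memory_order_acquire))
            return false;
        value = _buffer[tail];
        _tail.store((tail + 1) % _buffer.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<float> _buffer; // allocated once, outside the RT thread
    std::atomic<std::size_t> _head{0};
    std::atomic<std::size_t> _tail{0};
};

// Hypothetical stand-in for a processBlock body: take the latest pending
// gain value (if any), then apply it to the samples. Returns the gain in use.
inline float process_block(SpscQueue& gain_changes, float current_gain,
                           float* samples, int num_samples)
{
    float pending = 0.0f;
    while (gain_changes.pop(pending))
        current_gain = pending;
    for (int i = 0; i < num_samples; ++i)
        samples[i] *= current_gain;
    return current_gain;
}
```

The queue's storage is allocated in the constructor, so it has to be created before audio processing starts, never inside the callback.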
Thanks @Stefano, I thought get_current_rt_time was a sort of replacement for clock_gettime (CLOCK_MONOTONIC, &t); (sorry I’m really noob in this hehe). Also I’ll talk to the tracktion guys for the locks thing. However, I double checked my text and saw I was wrong with the nanoseconds thing. It is not nanoseconds but nanosleep the function that I need to patch from juce:
Also, is there a way to run sushi in the ELK board and all audio ins/outs without using the RT kernel? Would be useful for testing. I imagine the downside would be non guaranteed performance because the RT kernel would not be there with priority, etc, but that would be completely fine for testing purposes.
Ok, we don’t have any sleep equivalent mapped in TWINE yet - typically RT threads never sleep but there is a use case for e.g. spinlocks.
If you want to implement it yourself quickly, take a look at the Xenomai / Cobalt headers that are included in TWINE’s implementation files and then simply replace nanosleep with __cobalt_nanosleep. That should do the job if you are already linking against TWINE, I think…
Not easily on the Elk Pi shield. There is an ALSA driver for that codec but you’d need to tweak it for our CPLD and other things.
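A minimal sketch of what such a patch could look like, so that the same source still builds on a desktop machine. The macro HYPOTHETICAL_USE_COBALT and the wrapper name rt_safe_nanosleep are made up for illustration, and the __cobalt_nanosleep prototype is assumed to mirror POSIX nanosleep; it should be checked against the Cobalt headers rather than taken from here.

```cpp
#include <time.h>

// Assumption: HYPOTHETICAL_USE_COBALT is a made-up build flag, defined only
// when cross-compiling for the Elk board and linking against Xenomai/TWINE.
#ifdef HYPOTHETICAL_USE_COBALT
// Assumed prototype, mirroring POSIX nanosleep. Prefer including the real
// Cobalt header over declaring it by hand.
extern "C" int __cobalt_nanosleep(const struct timespec* rqtp, struct timespec* rmtp);
#endif

// Hypothetical wrapper to call wherever the patched code used nanosleep().
inline int rt_safe_nanosleep(const struct timespec* request, struct timespec* remaining)
{
#ifdef HYPOTHETICAL_USE_COBALT
    return __cobalt_nanosleep(request, remaining);
#else
    // Desktop fallback: the regular POSIX call.
    return nanosleep(request, remaining);
#endif
}
```

With the flag undefined this behaves exactly like the original nanosleep call, which keeps the macOS and Raspbian builds unchanged.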
The easiest way to compare on the same Hardware Elk w/ Xenomai VS Linux ALSA w/ PREEMPT_RT could be to get a HiFiBerry shield, that we just supported and for which there are already ALSA / PREEMPT_RT distributions that can be set up relatively easily.
Thanks, I’ll try the nanosleep thing.
About testing without RT, I don’t think the HiFiBerry option would work because I’m interested in the multichannel audio i/o (and MIDI). In fact, I could test using Raspbian (and of course without the ELK Pi hat) and my app works perfectly, including communication with Ableton’s Push 2, drawing on screen, and tracktion engine as a plugin. But then of course I don’t have multichannel audio i/o. Let’s hope I can fix all the RT issues and have it working nicely on the ELK board, which is the goal.
Thanks for the super quick response @Stefano. And just to re-confirm, locks are bad and will cause MSW but should not make the app crash or anything like that right? Will just reduce “performance”. Because now I seem to be hitting some code using locks in the RT thread which seems to work fine (although I see MSW being triggered, at an acceptable rate probably).
I promise it will all be worth it in the end because I’ll make it work and others will also benefit from that
FYI, I’m still struggling with the same problems, looks like there are file locks which sushi does not like and well, I need to fix. Nevertheless I published my code so far and the app is usable. I made some scripts to facilitate cross-compilation on macOS which might be interesting as well. All is in the repo. https://github.com/ffont/octopush