In case the number of acronyms and technologies mentioned in this article makes you wonder, there actually is a core theme from which the others sprang: ASIO (the audio-related one from Steinberg, not the network-related Boost library).
Having a passion for demanding low-level C++ programming and high fidelity digital audio, fiddling with ASIO seemed like a good idea to test and improve my skills. This is not a purely theoretical article; there is an actual binary product of the following story, a working ASIO output plugin for WinAMP. Achieving that goal while trying not to just reinvent the wheel or repeat existing solutions but to do it better and/or differently makes the meat of this article and the source code behind it.
The sad reality of most libraries, that there is always ‘something wrong with them’ (be it a matter of objective flaws or subjective preferences), can actually turn out to be a ‘positive’ driving force in learning projects like this one. The most important ‘flaw’, which inspired this project and the wish to overcome it, was the official statement of the ASIO SDK that it does not support multiple (active) ASIO drivers. The second, or better, a ‘parallel’ one, is the ugliness and the bad/outdated/‘typical C-style’ design of most APIs ‘out there’ or, on the other hand, the bloat of many more modernly designed libraries that usually reek of the ‘oh who cares, today’s computers are fast enough’ mentality. I tried to test and prove that these two poles are not the only available approaches and that code that is both efficient AND follows modern design patterns and idioms (in other words, code that is relatively easy to read and handle for both the developer AND the hardware) is actually possible. For that reason you’ll see me ignoring Knuth, Hoare, and Sutter and committing the ‘root of all evil’ (premature optimization), with pleasure. But before you dive into it all, a ‘little’ background is in order…
In short, ASIO is an acronym for Audio Stream Input Output and is a technology developed and publicly released by Steinberg Media Technologies GmbH that tries to overcome inherent flaws in previous APIs for audio streaming that prevented the creation of low latency, sample accurate software applications for the personal computer (the sample accurate part of course being more important for simple playback applications like this one). This article supposes that the reader already has some knowledge of the API and the terms related to it.
The ‘effective’ side of the story
Give me my this
As mentioned in the introduction, one of the first things that I noticed in the SDK documentation was the ‘unfortunate’ support for only a single active device/driver. Looking closely, two culprits can be identified. The first is that the buffer switch callbacks lack the typical ‘void * pUserData’ ‘emulation’ of the this pointer; the other is the implementation of the SDK itself which, in typical C fashion, uses a global variable to store the pointer to the active driver’s COM interface.
The first problem is not directly solvable because it is part of the binary specification of the API, so a workaround must be used. As each driver/device creates its own thread in which the buffer switch callbacks run, the usual and obvious method is to use thread-local storage to store a this pointer (to an ASIO device instance). For this ‘thread local singleton’ pattern of sorts, the TLSObject<> class is used. That class only wraps the TLS implementation and the synchronization of the ‘set global, then set thread-local’ usage pattern; it does not solve the problem of when to set the TLS value. If you discard the option of checking, on every buffer switch callback, whether the TLS pointer is set (and setting it if it is not), you are left with two solutions. One is to use the ‘works for everything’ approach and hook the CreateThread import (from kernel32.dll) in the driver’s DLL. This is not perfectly bullet-proof because the driver’s DLL, theoretically, need not be a dedicated COM server that hosts only the driver, so you could intercept unwanted CreateThread calls. This method uses the ImportsHooker class (which wraps the API/imports hooking code) and sets the TLS value in a CreateThread pass-through helper function.
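To make the ‘thread local singleton’ idea more tangible, here is a minimal sketch of a callback without a user-data parameter that recovers its instance from thread-local storage. It is not the TLSObject<> class from the project: the Device class, its member names and the use of the standard C++11 thread_local keyword (instead of a wrapped Win32 TLS slot) are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for one ASIO device instance (not the project's class).
class Device {
public:
    // Binds this instance to the calling thread. In the approach described
    // above, this step would run inside the CreateThread pass-through helper,
    // i.e. on the driver's own thread before any callback fires.
    void bindToCurrentThread() { current_ = this; }

    // Has the shape of a callback that carries no user-data pointer:
    // the instance is looked up in thread-local storage instead.
    static void bufferSwitchCallback(long bufferIndex) {
        assert(current_ != nullptr);  // must be bound before the first call
        current_->onBufferSwitch(bufferIndex);
    }

    long lastBufferIndex() const { return lastBufferIndex_; }
    int callCount() const { return callCount_; }

private:
    // The 'real' per-instance handler.
    void onBufferSwitch(long bufferIndex) {
        lastBufferIndex_ = bufferIndex;
        ++callCount_;
    }

    static thread_local Device* current_;  // one pointer per thread
    long lastBufferIndex_ = -1;
    int callCount_ = 0;
};

thread_local Device* Device::current_ = nullptr;
```

Because every driver runs its callbacks on its own thread, each thread sees its own current_ value, so several Device instances can share the same static callback function without stepping on each other.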
The other solution is provided by the way ASIO callbacks are set/passed to the driver (you fill a struct of function pointers and pass it to the driver). The window of opportunity comes from the fact that some drivers do not copy this struct but only the pointer to the struct. This does cost the driver an additional pointer dereference on each callback call and us/’hosts’ the fuss of having to save the struct for the entire duration of the playback, but it also provides the possibility to switch/change the callbacks at runtime while the playback thread is active. This is used to start the playback with helper pass through callbacks that set the TLS value the first time they are invoked and then switch the callback pointers to point to the proper buffer switch functions. This method, of course, does not work for drivers that copy the ASIOCallbacks struct.
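The callback switching trick can be sketched roughly as follows. Everything here is a simplified assumption: Callbacks is a one-entry stand-in for the real ASIOCallbacks struct, PointerKeepingDriver imitates a driver that stores only the pointer to that struct, and the Device class with its pending_/current_ members is hypothetical. The synchronization needed around the ‘set global’ step when several devices start at once is left out.

```cpp
#include <cassert>

// Simplified, hypothetical stand-in for the ASIOCallbacks struct.
struct Callbacks {
    void (*bufferSwitch)(long bufferIndex);
};

// Imitates a driver that keeps the host's pointer instead of copying the struct.
struct PointerKeepingDriver {
    const Callbacks* callbacks = nullptr;
    void fireBufferSwitch(long bufferIndex) const { callbacks->bufferSwitch(bufferIndex); }
};

class Device {
public:
    Device() { callbacks_.bufferSwitch = &Device::firstCall; }  // start with the helper

    // Publishes this instance globally and hands the struct to the driver.
    // The struct must stay alive for the whole playback.
    void start(PointerKeepingDriver& driver) {
        pending_ = this;
        driver.callbacks = &callbacks_;
    }

    int switches() const { return switches_; }
    bool usingDirectCallback() const { return callbacks_.bufferSwitch == &Device::bufferSwitch; }

private:
    // Runs once, on the driver's thread: copy global -> thread-local,
    // then repoint the struct entry at the proper callback.
    static void firstCall(long bufferIndex) {
        assert(pending_ != nullptr);
        current_ = pending_;
        pending_ = nullptr;
        current_->callbacks_.bufferSwitch = &Device::bufferSwitch;
        bufferSwitch(bufferIndex);  // do not lose the first buffer
    }

    static void bufferSwitch(long /*bufferIndex*/) { ++current_->switches_; }

    static Device* pending_;               // the 'set global' step
    static thread_local Device* current_;  // the 'set thread-local' step
    Callbacks callbacks_{};
    int switches_ = 0;
};

Device* Device::pending_ = nullptr;
thread_local Device* Device::current_ = nullptr;
```

After the first invocation the helper is out of the picture, so the steady-state cost is just the thread-local lookup in the proper callback. With a driver that copies the struct, the reassignment would simply never be seen and the helper would run on every call, which is why this approach cannot be relied on in general.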
Only later in the development did the idea of thunks come to my attention and this will probably be the solution that will be implemented and used in the next release because it seems both simpler to use and more efficient (maybe not so compared to a native TLS implementation, but unfortunately a native TLS cannot be used anyway on pre-Vista Windows in DLL projects like this one).
The second problem, related to the ASIO SDK, is solvable by bypassing/discarding the ASIO SDK’s C wrappers around the IASIO COM interface, using the COM interface directly and/or wrapping it properly in a class. Because of other flaws in the original ASIO SDK, like the rather ugly driver list that only ‘reinvents the wheel’ (std::list), I decided to completely discard it and try to make a better one from scratch, using only enums and type definitions from the original SDK.
WA <-> ASIO interaction
Before going into the details and usage of the ‘ASIO++ SDK’, you should first look at the background of the rest of the Wasiona code. Most of the meat comes from the interaction with and between the WinAMP SDK (‘yet another ugly C API’) and the ASIO(++) SDK. As one might expect, the two do not just fit in nicely with each other but require a certain level of adaptation. The biggest problem comes from the fact that WinAMP uses the outdated system of polling. It (the input plugin) constantly polls the output plugin (‘us’) “How much can I write?” and then writes if the size returned by the CanWrite() callback is big enough or sleeps for an unspecified number of milliseconds if it’s not and polls again. This of course does not work properly with ASIO buffer switch callbacks (or any low latency use, for that matter) because they must return within latency-number-of-milliseconds and that can be as low as one millisecond, far shorter than the usual sleep period used by input plugins in their polling and writing loop.
Secondly, the WinAMP write callback uses the interleaved single-buffer format to transfer the samples whereas the ASIO API uses a separate buffer for each audio channel, thus requiring a deinterleaving process. And thirdly, ASIO devices/drivers have a fixed output sample type, so a sample type conversion must also take place (for example, 16-bit integer to 32-bit float); ‘exotic’ sample rates might also require resampling. The usual unhappy solution for all three problems is, of course, to introduce a new buffer between the two, doing all the necessary conversions in the WinAMP Write() callback and saving the completely processed data into the intermediate buffer, which minimizes the work needed in the buffer switch callbacks.
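As an illustration of the deinterleaving and sample type conversion steps, here is a small sketch for the 16-bit integer to 32-bit float case. The function name and the use of std::vector are assumptions made for brevity; code on the actual Write() path would more likely convert straight into preallocated channel buffers and use SIMD instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits interleaved 16-bit PCM (L R L R ... for stereo) into one float
// buffer per channel, scaling each sample into the range [-1.0, 1.0).
std::vector<std::vector<float>> deinterleaveToFloat(const std::int16_t* interleaved,
                                                    std::size_t frameCount,
                                                    std::size_t channelCount) {
    assert(channelCount > 0);
    std::vector<std::vector<float>> channels(channelCount,
                                             std::vector<float>(frameCount));
    for (std::size_t frame = 0; frame < frameCount; ++frame) {
        for (std::size_t ch = 0; ch < channelCount; ++ch) {
            // One frame holds one sample for every channel.
            channels[ch][frame] = interleaved[frame * channelCount + ch] / 32768.0f;
        }
    }
    return channels;
}
```

For a stereo input of { 16384, -32768, 0, 8192 } this yields { 0.5, 0.0 } for the left channel and { -1.0, 0.25 } for the right one.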
As in all audio applications, the intermediate buffer, of course, needs to be a circular buffer. The implementation chosen here (wrapped/provided by the MultiChannelBuffer class) does not use the two-pointer scheme (like DirectSound, for example) because that incurs additional if-then clauses/code paths and copying in two passes, which diminishes the potential benefits of using vector/SIMD instructions to process/copy the data. Instead, it relies on using a buffer size several times larger than the chunks being read/written and then copying/moving the remaining one or two ‘small’ chunks at the end of the buffer to the beginning of the buffer when the end of the buffer is reached. This trades memory for CPU time: the extra memcpy() call (per channel) should be negligible if it occurs rarely enough (that is, if the total buffer size is much larger than the chunks being read/written), and the benefit is that of always having a linear/‘unbroken’ buffer ‘ahead’. By adjusting the buffer size, therefore, you adjust the tradeoff between memory and CPU usage. A single physical buffer is used for all channel buffers (its size is the size of one channel buffer times the number of channels) to improve locality of reference.
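The ‘always linear’ idea can be sketched with a single channel and a single reader. This is not the MultiChannelBuffer class: the LinearBuffer name, its interface and the choice to move the unread remainder only when an incoming chunk no longer fits at the tail are assumptions. memmove() is used instead of memcpy() so that the sketch stays correct even when the buffer is too small for the two regions to be disjoint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Single-channel, single-reader sketch; unread data is always one unbroken run.
class LinearBuffer {
public:
    explicit LinearBuffer(std::size_t capacity) : storage_(capacity) {}

    std::size_t writableSize() const { return storage_.size() - writePos_; }
    std::size_t readableSize() const { return writePos_ - readPos_; }

    // Appends as many samples as fit contiguously; returns the count written.
    std::size_t write(const float* samples, std::size_t count) {
        if (count > writableSize()) moveRemainderToFront();
        const std::size_t n = count < writableSize() ? count : writableSize();
        if (n != 0) std::memcpy(storage_.data() + writePos_, samples, n * sizeof(float));
        writePos_ += n;
        return n;
    }

    // All unread samples, with no wrap-around to handle.
    const float* readPointer() const { return storage_.data() + readPos_; }

    void consume(std::size_t count) {
        assert(count <= readableSize());
        readPos_ += count;
    }

private:
    // The 'sporadic' copy: move the small unread tail to the beginning.
    void moveRemainderToFront() {
        const std::size_t remaining = readableSize();
        if (remaining != 0) {
            std::memmove(storage_.data(), storage_.data() + readPos_,
                         remaining * sizeof(float));
        }
        readPos_ = 0;
        writePos_ = remaining;
    }

    std::vector<float> storage_;
    std::size_t readPos_ = 0;
    std::size_t writePos_ = 0;
};
```

The reader never has to split a chunk in two, so the copy into the driver’s buffer stays a single straight pass. The larger the capacity is relative to the chunk size, the less often moveRemainderToFront() runs and the fewer samples it has to move.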
However, it turned out that this design (with a single pointer and a ‘sporadic’ memcpy() instead of two pointers) has a problem with WinAMP and its visualization system. It seems that there is some odd coupling between the visualization timing and the ‘timing’ of the stream of writes from the input plugin to the output plugin. In the described MultiChannelBuffer (MCB) design, this stream does not ‘flow’ at a constant pace. Instead, what happens is that the input plugin fills the MCB relatively quickly and then waits (as the MCB denies any further writing) until the data is consumed and the MCB does the move to the beginning, at which point the (almost) entire buffer becomes available again for the input plugin to fill, which is exactly what it does, again in a relatively short time. So, periods of close together writes alternate with idle periods and that seems to ‘confuse’ WA’s vis logic and it starts to stutter. Currently, a quick workaround has been added to the CanWrite() callback that does not allow any further writing if written data is more than 50 ms ahead of played data; this keeps the written time and output time values close enough so that visualization is smooth again.
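The workaround boils down to a single comparison. The helper below is only a sketch of that logic: its name, its parameters and the idea of tracking both positions in milliseconds are assumptions, not the plugin’s actual CanWrite() code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns how many bytes CanWrite() may report.
// freeBytes: what the intermediate buffer could accept right now.
// writtenMs / playedMs: stream positions in milliseconds.
// maxLeadMs: how far ahead of playback the input plugin is allowed to run.
std::size_t throttledCanWrite(std::size_t freeBytes,
                              std::uint64_t writtenMs,
                              std::uint64_t playedMs,
                              std::uint64_t maxLeadMs = 50) {
    assert(writtenMs >= playedMs);  // cannot have played more than was written
    return (writtenMs - playedMs > maxLeadMs) ? 0 : freeBytes;
}
```

Reporting zero makes the input plugin sleep and poll again, which spreads its writes out over time instead of letting it fill the whole buffer in one burst.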
Having multiple ASIO devices means that the buffer will be read from by more than one reader/consumer from different threads at different moments and in differently sized chunks. This requires special logic to keep track of the state of the buffer (the read part, the written part, and the unused part) and to solve issues with access from different threads. The problem of multiple readers is modelled with a MultiChannelBuffer::Reader member class through which the state/position of each reader is tracked (in a relation similar to that of an iterator and a container in the secure version of Dinkumware’s STL) so that each reader can access its unread data and the buffer can know which data has been read by all readers (and is safe to overwrite).
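The bookkeeping for multiple readers can be sketched like this. It is not the MultiChannelBuffer::Reader class: the ReadTracker name and interface are hypothetical, positions are absolute sample counts rather than offsets into real storage, and the thread synchronization that the real code needs is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tracks one write position and one read position per reader.
class ReadTracker {
public:
    using ReaderId = std::size_t;

    // A new reader starts at the current write position (nothing unread yet).
    ReaderId addReader() {
        readPositions_.push_back(writePosition_);
        return readPositions_.size() - 1;
    }

    void write(std::size_t count) { writePosition_ += count; }

    // Samples this particular reader has not consumed yet.
    std::size_t unread(ReaderId id) const { return writePosition_ - readPositions_.at(id); }

    void consume(ReaderId id, std::size_t count) {
        assert(count <= unread(id));
        readPositions_[id] += count;
    }

    // Everything before this position has been read by ALL readers
    // and is therefore safe to overwrite.
    std::size_t reclaimablePosition() const {
        if (readPositions_.empty()) return writePosition_;
        return *std::min_element(readPositions_.begin(), readPositions_.end());
    }

private:
    std::vector<std::size_t> readPositions_;
    std::size_t writePosition_ = 0;
};
```

The slowest reader determines how much of the buffer can be reused, so a device with a large buffer size holds data back for the others even if they consume in small, frequent chunks.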
There is also a part of every WinAMP plugin that, as such, is not directly connected to ASIO: the (configuration) GUI. While investigating lightweight/efficient solutions, I decided to give WTL a try. As it has the potential to be what MFC should have been, I had high hopes for it, but it did kind of disappoint me in the end. It looks as if ‘compatibility’/similarity to MFC was a higher priority than modern design, so it has dragged in some of the ugliness on the design level. It still uses ATL constructs instead of STL, non-encapsulated public member access, (leftover) pointers instead of references (and similar C-style legacy), predeclared global ‘module objects’, and it (as do, of course, most other libraries) makes certain non-configurable design-vs-efficiency choices that are not necessarily ‘good for everyone’ or needed for smaller projects like this one (ranging from things like ‘manually’ precreating property sheet pages and always switching code paths depending on the state of a window, even if that is compile-time knowledge to the developer, to bringing in SEH, ATL container code for various COM, window, ATL etc. data, virtual functions, critical sections, and custom memory allocators).
As is understandable/normal, it also has certain bugs that also affected this project. For instance, the AtlCreateSimpleToolBar() / CFrameWindowImplBase<>::CreateSimpleToolBarCtrl() function does not work correctly with vertical toolbars (it does not properly set the button wrap state) but the most important issue was to get the PropertySheet ‘control’ to work as a dialog box child (and afterwards to also make it use the XP style white background). Fixes for these problems as well as other GUI/WTL helpers and/or ‘nicer’ implementations can be found in the functions and classes in the GUI module. I will not clutter this text with the details as I think the code and its usage should be clear enough from the interface, comments and usage in the ConfigDlg module.
Finally, for the more important issues at least, there was the problem of volume control. ASIO drivers are not obliged to support the kAsioSetOutputGain ‘futureCall()’, so an alternate method must be used to change the volume. Windows provides the waveOut/MME and mixerAPI APIs, and waveOutSetVolume() seemed like the ‘default’ choice, but it turns out that some (ASIO) sound card drivers do not support the legacy MME API, so those functions have no effect with them (even the Windows volume control does not function properly). Using the mixerAPI to control the main volume in such cases can be the ‘alternate alternate’ solution. So, I included all three options (MME, mixerAPI, and ASIO), from which the user can choose which API to use to control volume and panning. Unfortunately, the mixerAPI is one of the ugliest APIs I have had to work with and I could not get the panning to work with it no matter what (if anyone has managed to do it, I will appreciate the help/input). The original idea was to allow configuring the volume setting method for each chosen device independently but, as it turned out, that is no easy task to accomplish. The name of an ASIO device/driver, in theory, does not have to resemble the MME device/driver name; an ASIO device does not have to have an MME driver at all, or it can have more than one; and it can even be an ‘ASIO emulator’ for more than one device simultaneously (like ASIO4All). It follows that there is no deterministic link that would allow deducing an ASIO driver’s corresponding MME driver to be used for volume control. For that reason the current, suboptimal and hopefully temporary, solution is that the waveOut and mixerAPI methods control the volume of (only) the default Windows audio device.
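For the MME route, volume and panning end up in the single 32-bit value that waveOutSetVolume() takes: the left channel level in the low-order word and the right channel level in the high-order word, with 0xFFFF meaning full volume. The helper below only sketches how such a value could be computed. Its name and the simple linear pan law (attenuating only the channel opposite to the pan direction) are assumptions, and the actual call into the Windows API is left out so that the example stays portable.

```cpp
#include <cassert>
#include <cstdint>

// Packs a master volume and a pan position into the layout expected by
// waveOutSetVolume(): low-order word = left, high-order word = right.
// volume: 0.0 (silence) .. 1.0 (full)
// pan:   -1.0 (full left) .. +1.0 (full right), 0.0 = centre
std::uint32_t packStereoVolume(double volume, double pan) {
    assert(volume >= 0.0 && volume <= 1.0);
    assert(pan >= -1.0 && pan <= 1.0);
    // Assumed pan law: only the channel opposite to the pan direction is attenuated.
    const double left = volume * (pan > 0.0 ? 1.0 - pan : 1.0);
    const double right = volume * (pan < 0.0 ? 1.0 + pan : 1.0);
    // Scale to 0..0xFFFF with rounding.
    const auto toWord = [](double level) {
        return static_cast<std::uint32_t>(level * 0xFFFF + 0.5);
    };
    return toWord(left) | (toWord(right) << 16);
}
```

On Windows the result would be passed as the second argument of waveOutSetVolume(); as noted above, with the current solution that affects only the default Windows audio device.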