This video demonstrates the current state of my real-time neural audio synthesis project. After training on hundreds of hours of MIDI/audio pairs, the model can generate high-quality audio with extremely low CPU utilisation, 0.5% on typical consumer hardware, with some optimisations still to be applied.

The model is a hybrid of DDSP and a subtractive synthesis method inspired by neural diffusion. Additionally, I've modified the loss calculation to use the fractional Fourier transform, which balances the time/frequency penalty at a reduced computational cost.

This is the first demonstration of its kind showing MIDI-controllable real-time neural audio synthesis of a polyphonic musical instrument.

Visualisation is performed using my work-in-progress game engine. From left to right:
- MIDI vector time series below and final audio mix above
- Piano showing the oscillators represented as individual strings
- A plot of the recurrent neural network states
- Noise (via diffusion) and oscillator parameters, with isolated audio waveforms above each plot

Version with voiceover for OpenAI Startup Fund application:

These results were achieved towards the end of 2020, but I've only just visualised and publicised them now. I'm actively seeking seed funding, business partners and collaborators, so do reach out if you're interested at @
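
For readers unfamiliar with DDSP-style synthesis, here is a minimal illustrative sketch of the general idea of combining a bank of harmonic oscillators with spectrally shaped noise. It is not the model shown in the video. The function names, the 16 kHz sample rate and the fixed parameter values are all assumptions made for illustration. The noise is shaped with a plain frequency-domain filter, which stands in for (and should not be read as) the diffusion-inspired method described above.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate in Hz (illustrative only)


def harmonic_oscillators(f0_hz, harmonic_amps, n_samples, sample_rate=SAMPLE_RATE):
    """Additive part: sum of sinusoids at integer multiples of f0."""
    t = np.arange(n_samples) / sample_rate
    k = np.arange(1, len(harmonic_amps) + 1)
    freqs = f0_hz * k
    # Silence any harmonic at or above Nyquist to avoid aliasing.
    amps = np.where(freqs < sample_rate / 2, harmonic_amps, 0.0)
    phases = 2 * np.pi * freqs[:, None] * t[None, :]
    return (amps[:, None] * np.sin(phases)).sum(axis=0)


def filtered_noise(band_gains, n_samples, rng):
    """Subtractive part: white noise shaped by a coarse magnitude response."""
    noise = rng.uniform(-1.0, 1.0, n_samples)
    spectrum = np.fft.rfft(noise)
    n_bins = spectrum.shape[0]
    # Interpolate the coarse band gains up to one gain per FFT bin.
    gains = np.interp(
        np.linspace(0.0, 1.0, n_bins),
        np.linspace(0.0, 1.0, len(band_gains)),
        band_gains,
    )
    return np.fft.irfft(spectrum * gains, n=n_samples)


def synthesize_note(f0_hz, harmonic_amps, band_gains, n_samples, seed=0):
    """Mix the harmonic and noise components for one static note."""
    rng = np.random.default_rng(seed)
    harmonic = harmonic_oscillators(f0_hz, harmonic_amps, n_samples)
    noise = filtered_noise(band_gains, n_samples, rng)
    return harmonic + noise


# One second of A4 with 1/k harmonic amplitudes and low-passed noise.
audio = synthesize_note(
    f0_hz=440.0,
    harmonic_amps=1.0 / np.arange(1, 9),
    band_gains=np.array([0.05, 0.02, 0.0, 0.0]),
    n_samples=SAMPLE_RATE,
)
```

In a DDSP-style system a neural network would typically predict the harmonic amplitudes and noise filter gains frame by frame from the control input (for example MIDI); here they are held constant to keep the sketch short.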