Anyone know open source models or papers for splitting audio (that contains human speech and other sounds) into two separate audio files, one containing human speech only and one containing other sounds?
I’ve been using Spleeter in my workflow for my own music! Not sure about other audio source separation models, though: https://github.com/deezer/spleeter
I’ve tried to find a good tool for this, to extract the human speech track for speech synthesis, but didn’t find anything that worked well.
Demucs v4 from Facebook Research was the open-source state of the art for this (last I checked).
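If it helps, here is a rough sketch of how a two-way split with Demucs could be scripted from Python. It assumes `demucs` is installed (`pip install demucs`) and on your PATH, and it relies on the `--two-stems=vocals` option, which writes a `vocals` stem and a `no_vocals` stem. The helper names (`build_demucs_command`, `expected_outputs`, `separate_speech`) are made up for illustration, and the output folder layout is my understanding of the default, so check it against your installed version. Note that Demucs is trained on music, so treating the "vocals" stem as "speech" is an approximation and may not hold up on noisy, non-musical recordings.

```python
import subprocess
from pathlib import Path


def build_demucs_command(audio_path, out_dir="separated", model="htdemucs"):
    """Build the argv for a two-stem Demucs run (vocals vs. everything else)."""
    return [
        "demucs",
        "--two-stems=vocals",  # keep "vocals", merge all other sources into "no_vocals"
        "-n", model,           # pretrained model name (assumed: htdemucs, the v4 model)
        "-o", str(out_dir),    # root folder for the separated stems
        str(audio_path),
    ]


def expected_outputs(audio_path, out_dir="separated", model="htdemucs"):
    """Paths the stems are assumed to land in: <out_dir>/<model>/<track>/<stem>.wav"""
    track_dir = Path(out_dir) / model / Path(audio_path).stem
    return track_dir / "vocals.wav", track_dir / "no_vocals.wav"


def separate_speech(audio_path, out_dir="separated", model="htdemucs"):
    """Run Demucs and return (speech_path, other_sounds_path)."""
    # check=True raises CalledProcessError if the demucs CLI exits with an error.
    subprocess.run(build_demucs_command(audio_path, out_dir, model), check=True)
    return expected_outputs(audio_path, out_dir, model)
```

Calling `separate_speech("clip.wav")` would then give you the two file paths the original question asked for, one with the voice and one with everything else.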