Anyone know of open source models or papers for splitting audio that contains human speech and other sounds into two separate audio files, one with only the human speech and one with everything else?

I’ve been using Spleeter in my workflow for my own music!

Not sure about other audio source separation models, though.

https://github.com/deezer/spleeter
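For anyone who wants to try it on speech, here is a rough sketch using Spleeter's Python API with the 2-stem model, which splits a file into vocals and accompaniment. Big caveat: as far as I know the pretrained models were trained on music, so treating the "vocals" stem as "human speech" is an assumption and results on non-music audio may vary. The helper names (`expected_stem_paths`, `split_speech`) and the file paths are made up for illustration, and the output layout assumes Spleeter's default filename format and WAV codec.

```python
from pathlib import Path


def expected_stem_paths(audio_path, out_dir):
    """Where the two stems should land, assuming Spleeter's default
    output layout (<out_dir>/<track name>/<stem>.wav)."""
    track_dir = Path(out_dir) / Path(audio_path).stem
    return track_dir / "vocals.wav", track_dir / "accompaniment.wav"


def split_speech(audio_path, out_dir="output"):
    """Split one file into a vocals stem and an accompaniment stem."""
    # Imported lazily so the path helper above works without Spleeter installed.
    from spleeter.separator import Separator

    # "spleeter:2stems" is the pretrained vocals / accompaniment model.
    separator = Separator("spleeter:2stems")
    separator.separate_to_file(str(audio_path), str(out_dir))
    return expected_stem_paths(audio_path, out_dir)
```

Calling `split_speech("interview.mp3")` (hypothetical file name) should download the pretrained model on first use and write the two stems under `output/interview/`.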

Have you tried AudioShake?

https://www.audioshake.ai

I’ve looked for a good tool for this, to pull a clean human speech track out for speech synthesis, but didn’t find anything that worked well.

https://speechbrain.github.io/

https://arxiv.org/abs/1611.06265

Demucs v4 from Facebook Research was the open source state of the art for this, last I checked.
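If it helps, a minimal sketch of driving the Demucs command line from Python for a two-way split. It uses the `--two-stems=vocals` option, which keeps the vocals and merges everything else into a single "no vocals" track, and assumes `htdemucs` as the model name. The function names and the `separated` output folder are just illustrative choices, and like most of these models Demucs is trained on music, so how well "vocals" maps to plain speech is something you would have to check on your own audio.

```python
import shutil
import subprocess
from pathlib import Path


def build_demucs_command(audio_path, out_dir="separated", model="htdemucs"):
    """Build the demucs CLI call for a vocals / everything-else split."""
    return [
        "demucs",
        "--two-stems=vocals",  # keep vocals, merge all other stems into one track
        "-n", model,           # pretrained model name (assumed: htdemucs)
        "-o", str(out_dir),    # folder where the separated tracks are written
        str(audio_path),
    ]


def separate(audio_path, out_dir="separated", model="htdemucs"):
    """Run demucs on one file and return the folder its stems should be in."""
    if shutil.which("demucs") is None:
        raise RuntimeError("demucs CLI not found; try `pip install demucs`")
    subprocess.run(build_demucs_command(audio_path, out_dir, model), check=True)
    # Assumed layout: <out_dir>/<model>/<track name>/
    return Path(out_dir) / model / Path(audio_path).stem
```

Building the command separately from running it makes it easy to print and inspect the exact call before committing to a long separation job.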
