Hanami is a live translator that captures any audio played through Windows speakers or picked up by microphones. It uses lightweight multiprocessing to process audio in chunks, taking about 3 to 5 seconds per chunk. The app uses soundcard to capture the audio signal, SpeechRecognition to convert the raw audio to text, and Selenium to drive the DeepL website instead of calling the DeepL API. The app has three modes: listening, red (no audio detected), and green (audio captured correctly). Its features include day/night mode, pinning, and a translation navigator.
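The per-chunk pipeline described above can be sketched in plain Python. This is a minimal, stdlib-only illustration under stated assumptions, not Hanami's actual code. The names `to_mono`, `float_to_pcm16`, `chunk_status` and `worker` are hypothetical. The RMS threshold used to choose between red and green is also an assumed rule; the app may instead flag red when recognition returns nothing.

```python
import math
import struct


def to_mono(frames):
    """Average each multi-channel frame (a sequence of floats) into one sample."""
    return [sum(frame) / len(frame) for frame in frames]


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes.

    Values outside the range are clipped first.
    """
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)


def chunk_status(samples, silence_rms=0.01):
    """Hypothetical status rule: 'green' if the chunk's RMS level exceeds
    silence_rms (an assumed threshold), otherwise 'red'."""
    if not samples:
        return "red"
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return "green" if rms > silence_rms else "red"


def worker(chunks, results):
    """Consume audio chunks until a None sentinel arrives.

    For each chunk, report its status and the size of the PCM payload
    that would be handed to speech recognition.
    """
    for frames in iter(chunks.get, None):
        mono = to_mono(frames)
        pcm = float_to_pcm16(mono)
        # Assumption: the real app would run speech-to-text on `pcm` here.
        results.put((chunk_status(mono), len(pcm)))
```

In a real setup, `frames` would come from soundcard's recorder, which returns floating-point samples with one column per channel. The PCM bytes could then be wrapped in `speech_recognition.AudioData(pcm, sample_rate, 2)` for recognition, since 16-bit samples are 2 bytes wide. `worker` could run in its own process as `multiprocessing.Process(target=worker, args=(chunks, results))` with two `multiprocessing.Queue` objects. Keeping capture and recognition in separate processes stops a slow recognition call from blocking the capture loop.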
# Speech to text