Whisper: Speak to Emacs and Have It Type Text for You
Install OpenAI Whisper
https://github.com/openai/whisper
pip install -U openai-whisper
Install ffmpeg
sudo apt update && sudo apt install ffmpeg
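Before moving on, it can help to confirm that both tools are on your PATH. The following is a minimal sketch: `check_tools` is a hypothetical helper written for this page, not part of Whisper or ffmpeg, and it assumes the pip package installed a `whisper` command.

```shell
#!/bin/sh
# check_tools is a hypothetical helper (an assumption for illustration).
# It prints each command that cannot be found on PATH and returns
# non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Check the two tools installed above.
check_tools whisper ffmpeg || echo "Install the missing tools before continuing."
```

If nothing is printed, both commands were found.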
Install whisper.el
https://github.com/natrys/whisper.el
I have installed mine in /home/red/Source/whisper.el
My (old) configuration
Please note: You may need to adjust the load path, recording device, and arguments below to match your system.

    ;; whisper configuration
    (use-package whisper
      :load-path "/home/red/Source/whisper.el"
      :bind ("M-s r" . whisper-run)
      :config
      (setq whisper-model "base"
            whisper-language "en"
            whisper-translate nil)
      (setq whisper-arecord-device "hw:2,0")
      (setq whisper-arecord-args '("-f" "cd" "-c" "1")))
My Latest Configuration
The following code calls the whisper.cpp binary directly from custom Elisp functions and records audio with sox. whisper.el is not required.
Install steps:
    # Clone and build whisper.cpp
    git clone https://github.com/ggerganov/whisper.cpp.git
    cd whisper.cpp
    make

    # Download model
    ./models/download-ggml-model.sh base.en

    # Verify sox is installed
    which sox  # Should show path
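It may be worth testing the recording and transcription pipeline by hand before wiring it into Emacs. The sketch below is an illustration, not part of whisper.cpp: `whisper_smoke_test`, `WHISPER_BIN`, and `WHISPER_MODEL` are hypothetical names, and the default paths assume the repository was cloned into `~/whisper.cpp` and that the build placed the binary at `build/bin/whisper-cli`, as the Elisp code on this page does.

```shell
#!/bin/sh
# Hypothetical smoke test. Paths are assumptions: whisper.cpp cloned into
# ~/whisper.cpp, binary at build/bin/whisper-cli, base.en model downloaded.
WHISPER_BIN="${WHISPER_BIN:-$HOME/whisper.cpp/build/bin/whisper-cli}"
WHISPER_MODEL="${WHISPER_MODEL:-$HOME/whisper.cpp/models/ggml-base.en.bin}"

whisper_smoke_test() {
    wav="${1:-/tmp/whisper-test.wav}"
    # Fail early, before recording, if a piece is missing.
    if [ ! -x "$WHISPER_BIN" ]; then
        echo "whisper-cli not found at $WHISPER_BIN"
        return 1
    fi
    if [ ! -f "$WHISPER_MODEL" ]; then
        echo "model not found at $WHISPER_MODEL"
        return 1
    fi
    if ! command -v sox >/dev/null 2>&1; then
        echo "sox not found"
        return 1
    fi
    # Record 5 seconds from the default input: 16 kHz, mono, 16-bit.
    sox -d -r 16000 -c 1 -b 16 "$wav" trim 0 5 || return 1
    # Transcribe with the same flags the Elisp code uses (-nt -np).
    "$WHISPER_BIN" -m "$WHISPER_MODEL" -f "$wav" -nt -np
}
```

After sourcing the file, run `whisper_smoke_test` and speak for five seconds. If the transcript prints, the same paths should work from Emacs.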
Add this code to your init.el:

    ;;; my-whisper.el --- My speech to text

    ;; Copyright (C) 2025 Raoul Comninos

    ;; Author: Raoul Comninos
    ;; Keywords: whisper, speech, speech-to-text

    ;;; Code:

    (require 'subr-x) ; provides `string-trim' on older Emacs versions

    (defun run-whisper-stt ()
      "Record audio and transcribe it using Whisper, inserting text at cursor position."
      (interactive)
      (let* ((original-buf (current-buffer))
             (original-point (point-marker)) ; Marker tracks position even if buffer changes
             (wav-file "/tmp/whisper-recording.wav")
             (temp-buf (generate-new-buffer " *Whisper Temp*")))
        ;; Start recording audio
        (start-process "record-audio" nil "/bin/sh" "-c"
                       (format "sox -d -r 16000 -c 1 -b 16 %s --no-show-progress 2>/dev/null" wav-file))
        ;; Inform user recording has started
        (message "Recording started. Press C-g to stop.")
        ;; Wait for user to stop (C-g)
        (condition-case nil
            (while t (sit-for 1))
          (quit (interrupt-process "record-audio")))
        ;; Run Whisper STT
        (let ((proc (start-process "whisper-stt" temp-buf "/bin/sh" "-c"
                                   (format "~/whisper.cpp/build/bin/whisper-cli -m ~/whisper.cpp/models/ggml-base.en.bin -f %s -nt -np 2>/dev/null" wav-file))))
          ;; Backquote splices the buffer and marker objects into the sentinel
          (set-process-sentinel
           proc
           `(lambda (proc event)
              (when (string= event "finished\n")
                (when (buffer-live-p ,temp-buf)
                  (let* ((output (string-trim
                                  (with-current-buffer ,temp-buf (buffer-string))))) ; Trim excess whitespace
                    (when (buffer-live-p ,original-buf)
                      (with-current-buffer ,original-buf
                        (goto-char ,original-point)
                        ;; Insert text with a single space after; point ends after it
                        (insert output " "))))
                  ;; Clean up temporary buffer
                  (kill-buffer ,temp-buf))))))))

    (global-set-key (kbd "C-c v") 'run-whisper-stt)
Use C-c v to start recording and C-g to stop.