Whisper: Speak to Emacs and Have It Type Text for You

Install OpenAI Whisper

https://github.com/openai/whisper

pip install -U openai-whisper

Install ffmpeg

sudo apt update && sudo apt install ffmpeg

Install whisper.el

https://github.com/natrys/whisper.el

I have installed mine in /home/red/Source/whisper.el

My (old) configuration

Please note: you may need to adjust the arguments below, in particular the recording device (hw:2,0 here), to match your own sound hardware.

;; whisper configuration
(use-package whisper
  :load-path "/home/red/Source/whisper.el"
  :bind ("M-s r" . whisper-run)
  :config
  (setq whisper-model "base"
        whisper-language "en"
        whisper-translate nil)
  (setq whisper-arecord-device "hw:2,0")
  (setq whisper-arecord-args '("-f" "cd" "-c" "1")))
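
To find the right value for the recording device on your machine, you can list the ALSA capture devices with arecord -l and turn each line into an hw:CARD,DEVICE name. The helper below is only a sketch: the function name to_hw_names is made up for this example, and it assumes the usual English "card N: ..., device M: ..." output of arecord -l.

```shell
# Convert `arecord -l` lines such as
#   card 2: Device [USB Audio Device], device 0: USB Audio [USB Audio]
# into ALSA device names such as hw:2,0.
# Lines that do not describe a card/device pair are ignored.
to_hw_names() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}
```

Run it as: arecord -l | to_hw_names, then put the name of your microphone into the device setting above.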

My Latest Configuration

The following code calls the whisper.cpp binary directly from a custom Elisp function and records audio with sox, so whisper.el is not required.

Install steps:

# Clone and build whisper.cpp (the Elisp code below expects it in ~/whisper.cpp)
git clone https://github.com/ggerganov/whisper.cpp.git ~/whisper.cpp
cd ~/whisper.cpp
make

# Download model
./models/download-ggml-model.sh base.en

# Verify sox is installed
which sox  # Should show a path; if it prints nothing, run: sudo apt install sox
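
Before wiring anything into Emacs, it can help to confirm that the three things the Elisp code depends on are in place: sox, the whisper-cli binary, and the base.en model. The function below is a minimal sketch of such a check. The name check_whisper_setup and its messages are invented for this example, and it assumes the ~/whisper.cpp layout used in the Elisp code unless you pass another directory.

```shell
# Report which prerequisites of the Elisp code are missing.
# Usage: check_whisper_setup [whisper.cpp directory]
# Returns 0 if everything was found, 1 otherwise.
check_whisper_setup() {
    dir="${1:-$HOME/whisper.cpp}"   # default matches the path in the Elisp code
    status=0
    if ! command -v sox >/dev/null 2>&1; then
        echo "missing: sox (not found on PATH)"
        status=1
    fi
    if [ ! -x "$dir/build/bin/whisper-cli" ]; then
        echo "missing: $dir/build/bin/whisper-cli"
        status=1
    fi
    if [ ! -f "$dir/models/ggml-base.en.bin" ]; then
        echo "missing: $dir/models/ggml-base.en.bin"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "ok: sox, whisper-cli and the base.en model were found"
    fi
    return "$status"
}
```

If the check passes, you can try the same two commands the Elisp code runs, by hand: record with sox -d -r 16000 -c 1 -b 16 /tmp/whisper-recording.wav (stop with Ctrl-C), then transcribe with ~/whisper.cpp/build/bin/whisper-cli -m ~/whisper.cpp/models/ggml-base.en.bin -f /tmp/whisper-recording.wav -nt -np.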

Add this code to your init.el:

;;; my-whisper.el --- My speech to text
;; Copyright (C) 2025  Raoul Comninos
;; Author: Raoul Comninos
;; Keywords: whisper, speech, speech-to-text
;;; Code:

(require 'subr-x)  ; provides `string-trim' on older Emacs versions

(defun run-whisper-stt ()
  "Record audio and transcribe it using Whisper, inserting text at cursor position."
  (interactive)
  (let* ((original-buf (current-buffer))
         (original-point (point-marker))  ; Marker tracks position even if buffer changes
         (wav-file "/tmp/whisper-recording.wav")
         (temp-buf (generate-new-buffer " *Whisper Temp*")))

    ;; Start recording audio
    (start-process "record-audio" nil "/bin/sh" "-c"
                   (format "sox -d -r 16000 -c 1 -b 16 %s --no-show-progress 2>/dev/null" wav-file))
    ;; Inform user recording has started
    (message "Recording started. Press C-g to stop.")
    ;; Wait for user to stop (C-g)
    (condition-case nil
        (while t (sit-for 1))
      (quit (interrupt-process "record-audio")))

    ;; Run Whisper STT
    (let ((proc (start-process "whisper-stt" temp-buf "/bin/sh" "-c"
                               (format "~/whisper.cpp/build/bin/whisper-cli -m ~/whisper.cpp/models/ggml-base.en.bin -f %s -nt -np 2>/dev/null"
                                       wav-file))))
      ;; Backquote splices the buffer and marker objects into the lambda,
      ;; so the sentinel works even without lexical-binding.
      (set-process-sentinel
       proc
       `(lambda (proc event)
          (when (and (string= event "finished\n")
                     (buffer-live-p ,temp-buf)
                     (buffer-live-p ,original-buf))
            ;; Trim excess whitespace from the transcription
            (let ((output (string-trim
                           (with-current-buffer ,temp-buf (buffer-string)))))
              (with-current-buffer ,original-buf
                (goto-char ,original-point)
                ;; Insert the text plus one space; point ends up after it
                (insert output " "))))
          ;; Clean up the temporary buffer once the process has exited,
          ;; whether or not the transcription succeeded
          (when (and (not (process-live-p proc))
                     (buffer-live-p ,temp-buf))
            (kill-buffer ,temp-buf)))))))

(global-set-key (kbd "C-c v") #'run-whisper-stt)

Use C-c v to start recording and C-g to stop. The transcription is inserted at the position where the cursor was when recording started.
