System-wide speakup

Is it possible to play audio 'globally' or system-wide in Linux? These are the notes and ramblings of someone trying to do that.

Background

The Linux kernel includes the Speakup screen reader. Unlike most desktop screen readers, Speakup will aid in reading the Linux virtual TTYs that you can switch to with Ctrl+Alt+F1 through Ctrl+Alt+F6 on most systems.

This functionality is vitally important for troubleshooting a Linux system and for running command line applications, which is unfortunately one of the more accessible experiences on Linux.

Speakup alone is not enough to implement a screen reader: it still needs a speech synthesizer to turn text and control signals into audio. Once upon a time, hardware synthesizers connected over serial did this job, but as computers became more capable, software synthesizers became viable.

Speakup supports software synthesizers by exposing a character device at /dev/softsynthu. A connector attaches to this device, implements the synthesizer protocol and handles playback of speech. In practice the connector is espeakup, which plays speech using eSpeak NG.
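To make the device side a little more concrete, here is a minimal sketch of how a connector might pull raw bytes from the softsynth device. It only shows the read loop and does not interpret the synthesizer protocol at all. The function name read_chunks and the chunk size are made up for illustration; only the device path comes from the text above.

```python
import os


def read_chunks(path="/dev/softsynthu", size=4096):
    """Yield raw byte chunks read from `path`.

    Hypothetical helper: it does not parse text or control signals,
    it just hands back whatever each read returns. The loop ends when
    a read returns no data, which in practice happens for a regular
    file standing in for the device (for example in a test).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, size)
            if not chunk:
                break
            yield chunk
    finally:
        os.close(fd)


# Possible use (assumes permission to open the device):
#   for chunk in read_chunks():
#       handle(chunk)   # `handle` is a placeholder for the real synthesizer logic
```

Opening the device usually needs elevated permissions, which is part of why the connector tends to run as root.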

The problem

Linux is a multi-user system, but hardware and other resources are not. There's only one sound card and only one Speakup device. So these resources have to be shared somehow.

Since around 2007, with the release of ConsoleKit, the Linux desktop has handled resource sharing using the idea of seats. To put it simply:

Devices get allocated to seats
Logging in gives you a session
Only one session can use a seat at a time
Switching sessions requires handing over devices to another session
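The four rules above can be sketched as a toy model. This is not how ConsoleKit or systemd implement seats; the Seat and Session classes and the switch_to method are hypothetical names used only to show the handover.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Session:
    """A login session; `devices` is what it may currently use."""
    user: str
    devices: Set[str] = field(default_factory=set)


@dataclass
class Seat:
    """A seat owns devices and has at most one active session."""
    devices: Set[str]
    active: Optional[Session] = None

    def switch_to(self, session: Session) -> None:
        # Hand over: the old session loses every device on this seat...
        if self.active is not None:
            self.active.devices.clear()
        # ...and the new session receives all of them.
        session.devices = set(self.devices)
        self.active = session


seat = Seat(devices={"sound card"})
root = Session("root")
user = Session("user")

seat.switch_to(root)   # root's session holds the sound card
seat.switch_to(user)   # now only the user's session does
```

In this model a process running in root's session has nothing to play through once the user's session becomes active.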

The Speakup device has no support for sharing, but the sound card does.

This leads to the following situation:

Linux boots
systemd gives root the current audio device
systemd starts espeakup as root
You can read the login prompt or login as root
You log in as your own user
systemd gives your user the current audio device
Your PulseAudio instance claims the device
The root espeakup can no longer speak
You can no longer use Speakup
You might have no sound at all if espeakup was still talking while PulseAudio was starting, because PulseAudio could not claim the device
You switch to another TTY using Ctrl+Alt+F2
systemd gives root the current audio device
Your PulseAudio instance frees the device
espeakup can talk again using the sound card

Because espeakup can only speak when root is using the current seat, it effectively becomes useless for anything other than logging in as root on your computer.

It's also important to note that PulseAudio stores settings for volume and outputs. You might be using Bluetooth headphones at low volume but then switch to a root TTY and have espeakup blare loudly out your desktop speakers.

It really makes me wonder why PulseAudio doesn't have a per-seat instance that lets you switch between sessions and preserves audio configuration but still swaps out which user's applications can use audio.

Solutions

TODO: ?

Run espeakup as your user

- udev rule

- linger

- root can't use it

- no early boot access

Running PulseAudio system-wide

- pulseaudio runs at boot and all is well

https://www.freedesktop.org/wiki/Software/PulseAudio/Documentation/User/SystemWide/

- security!!

TODO: does this even work?

- no early boot access

Ideas

TODO: ?

Locking PulseAudio after boot

TODO: ...

Sharing Speakup

- have a proxy that sits between speakup and espeak

- send messages to espeakup instances based on current active UID

- during a switch between instances, wait for the current instance to finish talking OR the stop talking control is sent. then start feeding the new instance data

- have a shim that blocks pulseaudio from starting until it has permission, but also don't consume the buffer

i do not like how i'm basically reinventing flow control but poorly

ok so it turns out i was WRONG: you can't share speakup protocol between multiple synths! the protocol is stateful! ie if you tell it to change voice and switch synth the voice change won't be applied. YAY
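a toy illustration of why the statefulness bites. this is NOT the real Speakup protocol: the ("voice", ...) and ("text", ...) tuples and the ToySynth class are made up, they just stand in for "a setting change" and "something to speak".

```python
class ToySynth:
    """Hypothetical synth that remembers the last voice it was told to use."""

    def __init__(self, name):
        self.name = name
        self.voice = "default"
        self.spoken = []  # list of (voice, text) pairs

    def handle(self, message):
        kind, value = message
        if kind == "voice":
            self.voice = value          # state lives inside this instance only
        elif kind == "text":
            self.spoken.append((self.voice, value))


root_synth = ToySynth("root")
user_synth = ToySynth("user")

# The proxy forwards the voice change and some text to the root instance...
root_synth.handle(("voice", "fast"))
root_synth.handle(("text", "login:"))

# ...then the active user changes and later messages go to the other instance.
user_synth.handle(("text", "welcome"))

# user_synth never saw the voice change, so it still speaks with "default".
```

a proxy could try to track and replay settings for each instance, but then it has to understand the whole protocol, which is exactly the reinventing i was trying to avoid.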

Sharing espeakup

so this makes sending PCM data from a root espeakup instance the only viable solution.

on top of that this also means i have to modify espeakup to handle some flow control AND output to a buffer instead of the sound card
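a rough sketch of what "flow control AND output to a buffer" could look like, written in Python rather than as a patch to espeakup. PcmBuffer and its methods are hypothetical names, and the chunk limit is arbitrary. the idea is a bounded queue: the synth side blocks when the consumer is not draining, and a flush drops whatever has not been played yet.

```python
import queue


class PcmBuffer:
    """Bounded buffer of PCM chunks between a synth and whoever plays them."""

    def __init__(self, max_chunks=8):
        self._q = queue.Queue(maxsize=max_chunks)

    def write(self, chunk, timeout=None):
        """Producer side. Blocks while the buffer is full (the flow control).

        Raises queue.Full if `timeout` seconds pass without free space.
        """
        self._q.put(chunk, timeout=timeout)

    def read(self, timeout=None):
        """Consumer side. Blocks until a chunk is available.

        Raises queue.Empty if `timeout` seconds pass without data.
        """
        return self._q.get(timeout=timeout)

    def flush(self):
        """Drop all pending chunks, e.g. on a stop-talking request.

        Returns the number of chunks that were discarded.
        """
        dropped = 0
        while True:
            try:
                self._q.get_nowait()
            except queue.Empty:
                return dropped
            dropped += 1
```

whether the consumer is a per-user process or something else is still open. this only shows the buffering side.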