Reason

Hardware Setup

  1. Install Raspberry Pi Imager from https://www.raspberrypi.com/software/
  2. Burn the image to an SD card
  3. Boot the Raspberry Pi
  4. Connect to the internet
  5. Open a terminal

Software Setup

# Update the system
sudo apt update && sudo apt upgrade -y

# Install dependencies
sudo apt install portaudio19-dev \
                  python3-poetry \
                  libsdl2-dev \
                  vim \
                  neovim \
                  libssl-dev \
                  liblzma-dev \
                  cmake -y

# Python dev deps
sudo apt install make \
          build-essential \
          libssl-dev \
          zlib1g-dev \
          libbz2-dev \
          libreadline-dev \
          libsqlite3-dev \
          wget \
          curl \
          llvm \
          libncursesw5-dev \
          xz-utils \
          tk-dev \
          libxml2-dev \
          libxmlsec1-dev \
          libffi-dev \
          jackd2 \
          qjackctl \
          screen \
          liblzma-dev -y

# Add the current user to the audio group so JACK works (takes effect after logging in again)
sudo usermod -aG audio $(whoami)

# Install pyenv
curl https://pyenv.run | bash

# Add the following to `~/.bashrc`:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init - bash)"' >> ~/.bashrc
source ~/.bashrc
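
The `echo ... >> ~/.bashrc` commands above append the same lines again each time they are run. A minimal sketch of an idempotent variant follows. The helper names `append_once` and `setup_pyenv_rc` are hypothetical and not part of pyenv, and the sketch assumes the target file ends with a newline.

```shell
#!/usr/bin/env bash

# append_once FILE LINE
# Append LINE to FILE only if an identical line is not already present.
# (Hypothetical helper, not part of pyenv.)
append_once() {
    local file="$1" line="$2"
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# setup_pyenv_rc [RC_FILE]
# Write the three pyenv lines from the steps above; defaults to ~/.bashrc.
setup_pyenv_rc() {
    local rc="${1:-$HOME/.bashrc}"
    append_once "$rc" 'export PYENV_ROOT="$HOME/.pyenv"'
    append_once "$rc" '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"'
    append_once "$rc" 'eval "$(pyenv init - bash)"'
}
```

Calling `setup_pyenv_rc` with no arguments updates `~/.bashrc`. Running it a second time leaves the file unchanged.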

Links

  • https://github.com/ggerganov/whisper.cpp/issues/10
  • https://github.com/ggerganov/whisper.cpp
  • https://ahmetoner.com/whisper-asr-webservice/run/
  • https://github.com/floneum/floneum/tree/main/interfaces/kalosm (Rust)

Speech Node

Setup from Ubuntu server:

sudo apt-get update && sudo apt-get upgrade -y
sudo ufw allow ssh

Reads from Speech Node

speech-node

A text-to-speech server for inclusion in AI pipelines

Links

  • https://github.com/edwko/OuteTTS/blob/main/docs/interface_v2_usage.md
  • https://github.com/edwko/OuteTTS
  • https://huggingface.co/hexgrad/Kokoro-82M
  • https://github.com/spotify/pedalboard/blob/master/examples/streaming_encode_mp3.py

Python Audio Libraries

  • https://github.com/librosa/librosa
  • https://github.com/bastibe/python-soundfile

Voices

  • https://publicdomainreview.org/collection/orson-welles-show-1941/
  • https://librivox.org/

Dependencies

Bark Specific

I only had to apply the patch linked here: https://github.com/suno-ai/bark/issues/626

Linux

Ubuntu

sudo apt update
sudo apt install libglslang-dev

Manjaro

sudo pacman -S ffmpeg glslang

# Check for version mismatch
find /usr -name "libglslang-default-resource-limits.so*"
# If version mismatch
sudo ln -s /usr/lib/libglslang-default-resource-limits.so.15 /usr/lib/libglslang-default-resource-limits.so.14

# Check for version mismatch
find /usr -name "libSPIRV.so*"
# If version mismatch

sudo ldconfig
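
The symlink step above can be wrapped so that it only acts when the wanted version is actually missing. This is a sketch under assumptions: `link_compat` is a hypothetical helper, and the version numbers 15 and 14 are simply the ones used in the example above, so check the `find` output for the versions on your system.

```shell
#!/usr/bin/env bash

# link_compat DIR LIB HAVE WANT
# If DIR/LIB.WANT is missing but DIR/LIB.HAVE exists, create a symlink
# DIR/LIB.WANT -> DIR/LIB.HAVE. (Hypothetical helper for illustration.)
link_compat() {
    local dir="$1" lib="$2" have="$3" want="$4"
    local src="$dir/$lib.$have" dst="$dir/$lib.$want"
    if [ -e "$dst" ]; then
        echo "ok: $dst already exists"
    elif [ -e "$src" ]; then
        ln -s "$src" "$dst" && echo "linked: $dst -> $src"
    else
        echo "missing: $src not found" >&2
        return 1
    fi
}
```

Run from a root shell, `link_compat /usr/lib libglslang-default-resource-limits.so 15 14` would reproduce the `ln -s` command above, followed by `ldconfig` as before.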

If NVIDIA is not working:

sudo modprobe -r nvidia_uvm
sudo modprobe nvidia_uvm
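
To confirm that the reload worked, one option is to check the kernel's module table. The sketch below assumes a Linux system where `/proc/modules` lists one module per line with the name in the first field. `module_loaded` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# module_loaded NAME [TABLE]
# Succeed if NAME appears as the first field of a line in TABLE
# (defaults to /proc/modules). Hypothetical helper for illustration.
module_loaded() {
    local name="$1" table="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$table"
}
```

For example, `module_loaded nvidia_uvm && echo "nvidia_uvm is loaded"` prints the message only when the module is present.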

macOS

brew install ffmpeg
brew install glslang

TTS Models

  • https://huggingface.co/hexgrad/Kokoro-82M#releases

Other Models

Upsampling

  • https://github.com/ming024/FastSpeech2?tab=readme-ov-file
  • https://rhasspy.github.io/piper-samples/

Headless Install on Raspbian

Note: on a headless install, poetry install may appear to hang. This happens because the OS prompts for the keyring password on the graphical desktop, which is not visible. To bypass the prompt, set the following environment variable first:

export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
# Then
poetry install

See https://github.com/python-poetry/poetry/issues/8623#issuecomment-1793624371 and https://github.com/explosion/spaCy/issues/6021

For spaCy, run:

BLIS_ARCH="generic" poetry add spacy

This ensures that blis, a spaCy dependency, can be built on ARM.
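
If the same setup script is shared between ARM boards and other machines, the override can be applied only where it is needed. This is a rough sketch: `blis_arch_for` is a hypothetical helper, and the list of machine names it matches (`arm*`, `aarch64`) is an assumption about what `uname -m` reports on ARM systems.

```shell
#!/usr/bin/env bash

# blis_arch_for MACHINE
# Print "generic" for ARM machine names and nothing otherwise.
# (Hypothetical helper; the matched names are an assumption.)
blis_arch_for() {
    case "$1" in
        arm*|aarch64) echo "generic" ;;
        *) echo "" ;;
    esac
}

# Possible usage, left commented out so that sourcing this file installs nothing:
# arch="$(blis_arch_for "$(uname -m)")"
# if [ -n "$arch" ]; then
#     BLIS_ARCH="$arch" poetry add spacy
# else
#     poetry add spacy
# fi
```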