Reason
Hardware Setup
- Install Raspberry https://www.raspberrypi.com/software/
- Burn the image to an SD card
- Boot the Raspberry Pi
- Connect to the internet
- Open a terminal
Software Setup
# Update the system
sudo apt update && sudo apt upgrade -y
# Install dependencies
sudo apt install portaudio19-dev \
python3-poetry \
libsdl2-dev \
vim \
neovim \
libssl-dev \
liblzma-dev \
cmake -y
# Python dev deps
sudo apt install make \
build-essential \
libssl-dev \
zlib1g-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
wget \
curl \
llvm \
libncursesw5-dev \
xz-utils \
tk-dev \
libxml2-dev \
libxmlsec1-dev \
libffi-dev \
jackd2 \
qjackctl \
screen \
liblzma-dev -y
# Ensures Jack works
sudo usermod -aG audio $(whoami)
curl https://pyenv.run | bash
# Add the following to `~/.bashrc`:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init - bash)"' >> ~/.bashrc
source ~/.bashrc
Add the following to
~/.bashrc
:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init - bash)"' >> ~/.bashrc
source ~/.bashrc
Links
- https://github.com/ggerganov/whisper.cpp/issues/10
- https://github.com/ggerganov/whisper.cpp
- https://ahmetoner.com/whisper-asr-webservice/run/
- https://github.com/floneum/floneum/tree/main/interfaces/kalosm (Rust)
Speech Node
Setup from Ubuntu server:
sudo apt-get update && sudo apt-get upgrade -y
sudo ufw allow ssh
Reads from Speech Node
speech-node
A text-to-speech server to inclusion in AI pipelines
Links
- https://github.com/edwko/OuteTTS/blob/main/docs/interface_v2_usage.md
- https://github.com/edwko/OuteTTS
- https://huggingface.co/hexgrad/Kokoro-82M
- https://github.com/spotify/pedalboard/blob/master/examples/streaming_encode_mp3.py
Python Audio Libraries
- https://github.com/librosa/librosa
- https:// github .com/bastibe/python-soundfile
Voices
- https://publicdomainreview.org/collection/orson-welles-show-1941/
- https://librivox.org/
Dependencies
Bark Specific
I had to onlly the patch linked here: - https://github.com/suno-ai/bark/issues/626
Linux
Ubuntu
sudo apt update
sudo apt install libglslang-dev
Manjaro
sudo pacman -S ffmpeg glslang
# Check for version mismatch
find /usr -name "libglslang-default-resource-limits.so*"
# If version mismatch
sudo ln -s /usr/lib/libglslang-default-resource-limits.so.15 /usr/lib/libglslang-default-resource-limits.so.14
# Check for version mismatch
find /usr -name "libSPIRV.so*"
# If version mismatch
sudo ldconfig
If NVIDIA is not working:
sudo modprobe -r nvidia_uvm
sudo modprobe nvidia_uvm
MacOS
brew install ffmpeg
brew install glslang
TTS Models
- https://huggingface.co/hexgrad/Kokoro-82M#releases
Other Models
Upsampling
- https://github.com/ming024/FastSpeech2?tab=readme-ov-file
- https://rhasspy.github.io/piper-samples/
Headless Install on Raspbian
Please note, when trying to run
poetry install
it may appear to hang in a headless install. This is caused by the OS prompting the user on the graphical desktop to enter the keyring password. To bypass it, run the following before:
export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
# Then
poetry install
See https://github.com/python-poetry/poetry/issues/8623#issuecomment-1793624371
and https://github.com/explosion/spaCy/issues/6021
* For
spacy
, one should run:
BLIS_ARCH="generic" poetry add spacy
This ensures
blis
, a
spacy
dependency, can be built on arm.