Compare commits

...

34 Commits

SHA1        Date                        CI                Message
(CI check for every commit: Build and Push Docker Image / build-and-push (push))
97733cf7b8  2026-04-28 13:28:35 -07:00  success, 44s      fix: use navidrome-v8 and pin to opti1 for migration
5c1fedd379  2026-04-28 13:14:58 -07:00  success, 43s      fix: use file-bind-mount for DB to allow local WAL files
bb18672bfc  2026-04-28 13:08:56 -07:00  success, 39s      fix: use local DataFolder with symlinks to LiteFS DB
48a005cfbc  2026-04-28 12:25:01 -07:00  success, 41s      fix: add auto-seeding from backup
94d8e290bf  2026-04-28 11:33:21 -07:00  success, 4m0s     fix(entrypoint): wait for DB file before bind mounting
3232d6568d  2026-04-27 14:24:16 -07:00  success, 47s      fix: use bind mount for DB to support SMB shares
1117fb178b  2026-04-27 14:14:11 -07:00  success, 46s      fix: use symlink for DB and move DataFolder to /shared_data to avoid LiteFS root write errors
e678120572  2026-04-27 14:07:08 -07:00  success, 43s      fix: revert to original data paths and add ND_ARTISTIMAGEFOLDER
92f9209dcd  2026-04-27 11:04:00 -07:00  success, 43s      fix(entrypoint): restore consul registration and cleanup logging
33b84be0a5  2026-04-27 10:36:54 -07:00  success, 41s      test(entrypoint): use local data folder and new DB name
45e40bf273  2026-04-27 10:23:30 -07:00  success, 40s      debug(entrypoint): add logging to check_primary
8acb098918  2026-04-27 10:19:36 -07:00  success, 41s      fix(litefs): increase consul lease TTL and lock-delay
dd413d1342  2026-04-27 10:15:49 -07:00  success, 41s      fix(cluster): use new litefs key and local volume, exclude odroid7
7ea127f9cb  2026-04-27 10:10:23 -07:00  success, 40s      test(entrypoint): disable consul registration to isolate leadership issue
9232aeccc5  2026-04-27 10:08:49 -07:00  success, 40s      test(entrypoint): use /data/navidrome.db to bypass LiteFS
0200afdc0f  2026-04-27 10:06:46 -07:00  success, 43s      test(entrypoint): use test.db to isolate issue
e0262dc88b  2026-04-27 10:00:45 -07:00  success, 43s      fix(litefs): disable proxy to avoid DB locks
107e37cb3e  2026-04-27 09:56:37 -07:00  success, 50s      fix(entrypoint): simplify DB connection string
5311f0069a  2026-04-27 09:33:34 -07:00  success, 44s      fix(entrypoint): use ND_DBPATH env var and remove set -e
af8ce0ef2b  2026-04-27 09:26:14 -07:00  success, 48s      fix(entrypoint): use /info instead of /status for LiteFS 0.5 status API
5f9e4d23fb  2026-04-27 08:57:19 -07:00  success, 48s      fix: use --dbpath CLI flag to isolate database on LiteFS mount
6e7c729c5e  2026-04-27 08:56:22 -07:00  cancelled         fix: use standard Navidrome variables to isolate DB on LiteFS and metadata on host volume
37f0dcb1e7  2026-04-27 08:54:55 -07:00  cancelled         fix: revert to robust manual leadership detection to prevent multiple Navidrome instances
402553a674  2026-04-27 08:52:41 -07:00  success, 41s      fix: move to native LiteFS leadership management with if-candidate: true
c04c00143e  2026-04-27 08:41:21 -07:00  success, 42s      fix: support both flat and nested LiteFS status JSON and add robust type checking
3e6a4d1704  2026-04-27 08:31:47 -07:00  success, 42s      fix: correct jq path for LiteFS 0.5 status API and add robust error handling
362f838f7c  2026-04-27 08:25:40 -07:00  success, 43s      fix: robust leadership detection via LiteFS API and resolve Navidrome deprecation warnings
a8e02ae063  2026-04-27 08:22:43 -07:00  success, 47s      fix: improve leadership detection using 'litefs status' to prevent redundant Consul registrations
538ee01b72  2026-04-27 08:15:22 -07:00  success, 43s      fix: add SQLite connection parameters to ND_DBPATH and wait for DB file
25885ea4f0  2026-04-27 08:13:37 -07:00  cancelled         fix: use ND_DBPATH to point to LiteFS database directly, avoiding symlink errors
a586d60682  2026-04-27 08:11:24 -07:00  cancelled         debug: add verbose logging and error checks to setup_data_dir
59f406d3b7  2026-04-27 08:04:06 -07:00  success, 4m5s     fix: relocate LiteFS mount to /litefs and use /data for persistent artwork
f08c715d75  2026-04-08 10:58:39 -07:00  success, 25s      fix(nomad): Move variable definition to top-level
8f1565b1af  2026-04-08 10:56:20 -07:00  success, 28s      fix(deploy): Replace failing setup-nomad action with manual install
5 changed files with 101 additions and 88 deletions

File 1 of 5 — CI workflow (filename not shown in this view):

@@ -23,9 +23,13 @@ jobs:
         uses: actions/checkout@v4
       - name: Setup Nomad CLI
-        uses: hashicorp/setup-nomad@v2
-        with:
-          version: '1.10.5'
+        run: |
+          NOMAD_VERSION="1.10.5"
+          curl -sSL https://releases.hashicorp.com/nomad/${NOMAD_VERSION}/nomad_${NOMAD_VERSION}_linux_amd64.zip -o nomad.zip
+          unzip nomad.zip
+          sudo mv nomad /usr/local/bin/
+          rm nomad.zip
+          nomad version
       - name: Set Container Version
         id: container_version

File 2 of 5 — Dockerfile:

@@ -18,6 +18,9 @@ RUN chmod +x /usr/local/bin/entrypoint.sh
 # Copy LiteFS configuration
 COPY litefs.yml /etc/litefs.yml
+# Create mount points and data directories
+RUN mkdir -p /litefs /data
 # LiteFS becomes the supervisor.
 # It will mount the FUSE fs and then execute the command defined in litefs.yml's exec section.

File 3 of 5 — entrypoint.sh:

@@ -1,14 +1,10 @@
 #!/bin/bash
-set -e
 # Configuration from environment
 SERVICE_NAME="navidrome"
 # Use Nomad allocation ID for a unique service ID
 SERVICE_ID="${SERVICE_NAME}-${NOMAD_ALLOC_ID:-$(hostname)}"
 PORT=4533
 CONSUL_HTTP_ADDR="${CONSUL_URL:-http://localhost:8500}"
 NODE_IP="${ADVERTISE_IP}"
-DB_LOCK_FILE="/data/.primary"
 NAVIDROME_PID=0
 # Tags for the Primary service (Traefik enabled)
@@ -16,29 +12,43 @@ PRIMARY_TAGS='["navidrome","web","traefik.enable=true","urlprefix-/navidrome","t
 # --- Helper Functions ---
-# Backup Database (Only on Primary)
-run_backup() {
-  local backup_dir="/shared_data/backup"
-  local timestamp=$(date +%Y%m%d_%H%M%S)
-  local backup_file="${backup_dir}/navidrome.db_${timestamp}.bak"
+# Check if this node is the LiteFS Primary
+check_primary() {
+  local status=$(curl -s http://localhost:20202/info || echo "{}")
+  local is_primary=$(echo "$status" | jq -r 'if type == "object" then (.isPrimary // false) else false end' 2>/dev/null || echo "false")
-  echo "Backing up database to ${backup_file}..."
-  mkdir -p "$backup_dir"
-  if litefs export -name navidrome.db "$backup_file"; then
-    echo "Backup successful."
-    # Keep only last 7 days
-    find "$backup_dir" -name "navidrome.db_*.bak" -mtime +7 -delete
-    echo "Old backups cleaned."
-  else
-    echo "ERROR: Backup failed!"
+  if [ "$is_primary" = "true" ]; then
+    return 0 # We are the primary
   fi
+  return 1 # We are a replica
 }
+
+# Wait for LiteFS to settle and determine its role
+wait_for_litefs() {
+  echo "Waiting for LiteFS to settle..."
+  local timeout=60
+  local count=0
+  while [ $count -lt $timeout ]; do
+    local status=$(curl -s http://localhost:20202/info || echo "null")
+    local is_primary_val=$(echo "$status" | jq -r 'if type == "object" then (.isPrimary // "null") else "null" end' 2>/dev/null || echo "null")
+    if [ "$is_primary_val" != "null" ]; then
+      local role="replica"
+      if [ "$is_primary_val" = "true" ]; then role="primary"; fi
+      echo "LiteFS initialized. Role: $role"
+      return 0
+    fi
+    sleep 2
+    count=$((count + 2))
+    echo -n "."
+  done
+  echo "ERROR: LiteFS failed to settle after ${timeout}s"
+  return 1
+}
+
 # Register Service with TTL Check
 register_service() {
-  echo "Promoted! Registering service ${SERVICE_ID}..."
-  # Convert bash list string to JSON array if needed, but PRIMARY_TAGS is already JSON-like
+  echo "Registering service ${SERVICE_ID} with Consul..."
   curl -s -X PUT "${CONSUL_HTTP_ADDR}/v1/agent/service/register" -d "{
     \"ID\": \"${SERVICE_ID}\",
     \"Name\": \"${SERVICE_NAME}\",
@@ -59,7 +69,7 @@ pass_ttl() {
 # Deregister Service
 deregister_service() {
-  echo "Demoted/Stopping. Deregistering service ${SERVICE_ID}..."
+  echo "Deregistering service ${SERVICE_ID} from Consul..."
   curl -s -X PUT "${CONSUL_HTTP_ADDR}/v1/agent/service/deregister/${SERVICE_ID}"
 }
@@ -68,11 +78,48 @@ start_app() {
   echo "Node is Primary. Starting Navidrome..."
   # Ensure shared directories exist
-  mkdir -p /shared_data/plugins /shared_data/cache /shared_data/backup
+  mkdir -p /shared_data/plugins /shared_data/cache /shared_data/backup /shared_data/artist_images /shared_data/artwork
+
+  # SEEDING LOGIC: If DB doesn't exist in cluster, restore from backup
+  if [ ! -f /data/navidrome.db ]; then
+    echo "Database /data/navidrome.db not found. Looking for backups to seed..."
+    local latest_backup=$(ls -t /shared_data/backup/navidrome.db_*.bak 2>/dev/null | head -n 1)
+    if [ -n "$latest_backup" ]; then
+      echo "Seeding from $latest_backup..."
+      litefs import -name navidrome.db "$latest_backup"
+    else
+      echo "No backups found. Navidrome will start with a fresh database."
+    fi
+  fi
+
+  # Wait for LiteFS to expose the DB file
+  echo "Waiting for /data/navidrome.db to be exposed by LiteFS..."
+  local db_timeout=30
+  local db_count=0
+  while [ ! -f /data/navidrome.db ] && [ $db_count -lt $db_timeout ]; do
+    sleep 1
+    db_count=$((db_count + 1))
+  done
+
+  # Setup local data folder with BIND MOUNT for the DB
+  # This allows SQLite to create -wal/-shm files in the local writable directory
+  # while the main DB file is managed by LiteFS.
+  rm -rf /local/navidrome_data
+  mkdir -p /local/navidrome_data
+  touch /local/navidrome_data/navidrome.db
+  mount --bind /data/navidrome.db /local/navidrome_data/navidrome.db
+
   # Configuration
+  export ND_DATAFOLDER="/local/navidrome_data"
   export ND_CACHEFOLDER="/shared_data/cache"
   export ND_BACKUP_PATH="/shared_data/backup"
   export ND_PLUGINS_FOLDER="/shared_data/plugins"
+  export ND_ARTISTIMAGEFOLDER="artist_images"
   /app/navidrome &
   NAVIDROME_PID=$!
-  echo "Navidrome started with PID ${NAVIDROME_PID}"
+  echo "Navidrome running (PID: $NAVIDROME_PID) with DataFolder at /local/navidrome_data (DB bind-mounted)"
 }
 # Stop Navidrome
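The seeding logic above picks the newest backup by modification time via `ls -t | head -n 1`. A self-contained demo of that selection, using a temp directory and fabricated backup file names (no LiteFS involved):

```shell
#!/bin/bash
# Demo of "newest backup wins": create two fake .bak files with different
# mtimes and select the most recent one, as the seeding logic does.
backup_dir=$(mktemp -d)
touch -t 202604260800 "$backup_dir/navidrome.db_20260426_080000.bak"
touch -t 202604280800 "$backup_dir/navidrome.db_20260428_080000.bak"

# ls -t sorts by mtime, newest first; head -n 1 takes the winner.
latest_backup=$(ls -t "$backup_dir"/navidrome.db_*.bak 2>/dev/null | head -n 1)
echo "${latest_backup##*/}"   # navidrome.db_20260428_080000.bak
```

Relying on mtime rather than the timestamp embedded in the filename is a pragmatic choice here; the two usually agree, but a restored or copied backup can have a fresher mtime than its name suggests.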
@@ -82,13 +129,13 @@ stop_app() {
     kill -SIGTERM "${NAVIDROME_PID}"
     wait "${NAVIDROME_PID}" 2>/dev/null || true
     NAVIDROME_PID=0
+    umount /local/navidrome_data/navidrome.db 2>/dev/null || true
   fi
 }
-# --- Signal Handling (The Safety Net) ---
-# If Nomad stops the container, we stop the app and deregister.
+# --- Cleanup ---
 cleanup() {
-  echo "Caught signal, shutting down..."
+  echo "Shutting down..."
   stop_app
   deregister_service
   exit 0
@@ -99,55 +146,23 @@ trap cleanup TERM INT
 # --- Main Loop ---
 echo "Starting Supervisor. Waiting for leadership settle..."
 echo "Node IP: $NODE_IP"
 echo "Consul: $CONSUL_HTTP_ADDR"
-# Small sleep to let LiteFS settle and leadership election complete
-sleep 5
-LAST_BACKUP_TIME=0
-BACKUP_INTERVAL=86400 # 24 hours
+wait_for_litefs || exit 1
 while true; do
-  # In LiteFS 0.5, .primary file exists ONLY on replicas.
-  if [ ! -f "$DB_LOCK_FILE" ]; then
+  if check_primary; then
     # === WE ARE PRIMARY ===
     # 1. If App is not running, start it and register
     if [ "${NAVIDROME_PID}" -eq 0 ] || ! kill -0 "${NAVIDROME_PID}" 2>/dev/null; then
       if [ "${NAVIDROME_PID}" -gt 0 ]; then
        echo "CRITICAL: Navidrome crashed! Restarting..."
       fi
       start_app
       register_service
     fi
     # 2. Maintain the heartbeat (TTL)
     pass_ttl
-    # 3. Handle periodic backup
-    CURRENT_TIME=$(date +%s)
-    if [ $((CURRENT_TIME - LAST_BACKUP_TIME)) -ge $BACKUP_INTERVAL ]; then
-      run_backup
-      LAST_BACKUP_TIME=$CURRENT_TIME
-    fi
   else
     # === WE ARE REPLICA ===
     # If App is running (we were just demoted), stop it
     if [ "${NAVIDROME_PID}" -gt 0 ]; then
       echo "Lost leadership. Demoting..."
       stop_app
       deregister_service
-      # Reset backup timer so the next primary can start fresh or we start fresh if promoted again
-      LAST_BACKUP_TIME=0
     fi
     # No service registration exists for replicas to keep Consul clean.
   fi
-  # Sleep short enough to update TTL (every 5s is safe for 15s TTL)
-  sleep 5 &
-  wait $! # Wait allows the 'trap' to interrupt the sleep instantly
+  sleep 10
 done
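The main loop is effectively a two-state supervisor: promote (start app, register) on the replica-to-primary edge, demote (stop app, deregister) on the reverse edge, and do nothing while the role is stable. A dry run of just that decision logic, with the LiteFS role scripted instead of queried (the role sequence is illustrative):

```shell
#!/bin/bash
# Scripted role sequence standing in for repeated check_primary calls.
roles="replica primary primary replica primary"
APP_RUNNING=0
transitions=""

for role in $roles; do
  if [ "$role" = "primary" ]; then
    # Promote only if the app is not already running.
    if [ "$APP_RUNNING" -eq 0 ]; then APP_RUNNING=1; transitions="$transitions start"; fi
  else
    # Demote only if the app is currently running.
    if [ "$APP_RUNNING" -eq 1 ]; then APP_RUNNING=0; transitions="$transitions stop"; fi
  fi
done

echo "${transitions# }"   # start stop start
```

Five polled roles produce only three actions: the loop is edge-triggered, so steady-state iterations cost nothing beyond the TTL heartbeat.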

File 4 of 5 — litefs.yml:

@@ -8,29 +8,19 @@ data:
 # Use Consul for leader election
 lease:
   type: "consul"
   candidate: true
   promote: true
   advertise-url: "http://${ADVERTISE_IP}:20202"
   consul:
     url: "${CONSUL_URL}"
-    key: "litefs/navidrome"
+    key: "litefs/navidrome-v8"
+    ttl: "30s"
+    lock-delay: "5s"
 # Internal HTTP API for replication
 http:
   addr: "0.0.0.0:20202"
-# The HTTP Proxy routes traffic to handle write-forwarding
-proxy:
-  addr: ":8080"
-  target: "localhost:4533"
-  db: "navidrome.db"
-  passthrough:
-    - "*.js"
-    - "*.css"
-    - "*.png"
-    - "*.jpg"
-    - "*.jpeg"
-    - "*.gif"
-    - "*.svg"
 # Commands to run only on the primary node.
 exec:
   - cmd: "/usr/local/bin/entrypoint.sh"
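For reference, the lease section that results from applying this diff reads roughly as follows (assembled from the lines above; surrounding keys unchanged, comments mine):

```yaml
# Consul-backed leader election; a fresh key forces a clean election namespace.
lease:
  type: "consul"
  candidate: true                               # this node may become primary
  promote: true                                 # take over automatically when eligible
  advertise-url: "http://${ADVERTISE_IP}:20202"
  consul:
    url: "${CONSUL_URL}"
    key: "litefs/navidrome-v8"
    ttl: "30s"                                  # lease lifetime between renewals
    lock-delay: "5s"                            # grace period before a new primary can acquire
```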

File 5 of 5 — Nomad job spec (filename not shown in this view):

@@ -1,12 +1,12 @@
+variable "container_sha" {
+  type    = string
+  default = "045fc6e82b9ecb6bebc1f095f62498935df70bbf"
+}
+
 job "navidrome-litefs" {
   datacenters = ["dc1"]
   type        = "service"
-  variable "container_sha" {
-    type    = string
-    default = "045fc6e82b9ecb6bebc1f095f62498935df70bbf"
-  }
   constraint {
     attribute = "${attr.kernel.name}"
     value     = "linux"
@@ -63,10 +63,11 @@ job "navidrome-litefs" {
         PORT = "8080" # Internal proxy port (unused but kept)
         # Navidrome Config
-        ND_DATAFOLDER     = "/data"
+        ND_DATAFOLDER        = "/shared_data"
         ND_PLUGINS_FOLDER = "/shared_data/plugins"
-        ND_CACHEFOLDER    = "/shared_data/cache"
-        ND_BACKUP_PATH    = "/shared_data/backup"
+        ND_CACHEFOLDER       = "/shared_data/cache"
+        ND_BACKUP_PATH       = "/shared_data/backup"
+        ND_ARTISTIMAGEFOLDER = "artist_images"
         ND_BACKUPSCHEDULE = ""
         ND_SCANSCHEDULE = "0"