
Specification: Implement TTL Heartbeat Service Registration (implement_ttl_heartbeat)

Overview

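Replace the current "register and forget" Consul registration logic with a "TTL Heartbeat" pattern, so that only the active Primary node is registered in Consul. The registration is removed immediately when failover demotes the node or Nomad stops the allocation. If the process instead crashes or is killed without cleanup, the missed heartbeats turn its TTL check critical, which hides it from healthy-service queries, and the local Consul agent can then deregister it via DeregisterCriticalServiceAfter.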
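The pattern relies on three calls against the local Consul agent's HTTP API: register the service with an embedded TTL check, mark that check as passing, and deregister. A minimal Python sketch of those requests follows (the real entrypoint.sh would issue the same PUTs with curl). It assumes the agent is reachable at CONSUL_HTTP_ADDR (given with an http:// scheme) or at http://127.0.0.1:8500, and that no ACL token is required; the function names are hypothetical.

```python
import json
import os
import urllib.request

# Assumption: CONSUL_HTTP_ADDR, if set, includes the scheme (e.g. http://10.0.0.1:8500).
CONSUL_ADDR = os.environ.get("CONSUL_HTTP_ADDR", "http://127.0.0.1:8500")


def _put(path, body=None):
    """Build a PUT request for the local Consul agent API (no ACL token)."""
    data = None if body is None else json.dumps(body).encode()
    req = urllib.request.Request(CONSUL_ADDR + path, data=data, method="PUT")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def register_request(service_def):
    # Registers the service together with the TTL check embedded in service_def.
    return _put("/v1/agent/service/register", service_def)


def pass_ttl_request(service_id):
    # A single check embedded in a service definition gets the ID "service:<service_id>".
    return _put("/v1/agent/check/pass/service:" + service_id)


def deregister_request(service_id):
    # Removes the service and its check from the agent immediately.
    return _put("/v1/agent/service/deregister/" + service_id)


def send(req, timeout=5):
    """Perform the call; raises urllib.error.URLError (or HTTPError) on failure."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```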

Functional Requirements

  • Supervisor Script (entrypoint.sh):
    • Refactor to implement the "Self-Registration" pattern with TTL checks.
    • Leadership Detection: Monitor /data/.primary (LiteFS 0.5), which LiteFS creates only on replica nodes.
      • Primary: file absent. Start Navidrome and register the service with a TTL check.
      • Replica: file present. Stop Navidrome and deregister the service.
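    • Heartbeat: While Primary, periodically (e.g., every 5-10s) PUT to Consul's /v1/agent/check/pass/<check-id> endpoint to keep the TTL check passing. The interval must stay well below the TTL; with a 15s TTL, 5s leaves room for one slow or failed call.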
    • Signal Handling: Trap SIGTERM/SIGINT to gracefully stop Navidrome and deregister immediately.
  • Docker Image:
    • Ensure curl and jq are installed (prerequisites for the script).
  • Nomad Configuration:
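    • Confirm that NOMAD_IP_http and NOMAD_PORT_http are available to the task. Nomad sets these automatically for a port labeled "http" in the group's network block, so this is a verification step rather than a change.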
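The supervisor behaviour above can be sketched as a polling loop. This is Python rather than shell purely for illustration, and it assumes the LiteFS mount is /data, a 5s poll/heartbeat interval, and injected hooks (start_app, stop_app, register, heartbeat, deregister) that are hypothetical placeholders for the Navidrome process handling and the curl calls.

```python
import os
import signal
import time


class Supervisor:
    """Leadership-driven supervisor loop (sketch; all hook names are hypothetical).

    start_app/stop_app manage the Navidrome process; register/heartbeat/deregister
    wrap the Consul agent calls and are expected to handle their own errors.
    """

    def __init__(self, primary_file, start_app, stop_app,
                 register, heartbeat, deregister, interval=5.0):
        self.primary_file = primary_file
        self.start_app, self.stop_app = start_app, stop_app
        self.register, self.heartbeat = register, heartbeat
        self.deregister = deregister
        self.interval = interval
        self.is_leader = False
        self.stopping = False

    def primary_now(self):
        # LiteFS creates <mount>/.primary only on replicas, so absence means primary.
        return not os.path.exists(self.primary_file)

    def step(self):
        """One iteration: react to a role change, then heartbeat if primary."""
        primary = self.primary_now()
        if primary and not self.is_leader:
            self.start_app()
            self.register()
            self.is_leader = True
        elif not primary and self.is_leader:
            self.deregister()  # drop the Consul entry before Navidrome goes away
            self.stop_app()
            self.is_leader = False
        if self.is_leader:
            self.heartbeat()  # keeps the TTL check passing

    def shutdown(self, *_signal_args):
        """Signal handler: only sets a flag; the loop then calls cleanup()."""
        self.stopping = True

    def cleanup(self):
        # Graceful stop path: deregister immediately, then stop Navidrome.
        if self.is_leader:
            self.deregister()
            self.stop_app()
            self.is_leader = False

    def _sleep(self):
        # Sleep in short slices so a pending shutdown is noticed within ~0.2s.
        deadline = time.monotonic() + self.interval
        while not self.stopping and time.monotonic() < deadline:
            time.sleep(min(0.2, self.interval))

    def run(self):
        # Must be called from the main thread (a requirement of signal.signal).
        signal.signal(signal.SIGTERM, self.shutdown)
        signal.signal(signal.SIGINT, self.shutdown)
        while not self.stopping:
            self.step()
            self._sleep()
        self.cleanup()
```

On demotion and on shutdown the sketch deregisters before stopping Navidrome, so Consul never points at a process that is already going away. The handler only sets a flag and the loop sleeps in short slices, so cleanup starts quickly; that matters because Nomad sends SIGKILL once the task's kill_timeout (5s by default) expires.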

Non-Functional Requirements

  • Resilience: The script must tolerate Consul being unavailable, retrying failed calls without exiting or crashing the supervisor loop.
  • Cleanliness: No "ghost" services (stale entries left behind by stopped or demoted nodes). Replicas must not appear in the service catalog.
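A sketch of the retry behaviour, assuming capped exponential backoff (one reasonable choice; the spec only says "retries"). Every failure is logged and swallowed, so an unreachable Consul agent never terminates the loop. The helper name and its defaults are hypothetical.

```python
import time


def with_retries(action, attempts=5, base_delay=1.0, max_delay=10.0,
                 sleep=time.sleep, log=print):
    """Call action() with capped exponential backoff and never raise.

    Returns True on success and False once all attempts have failed, so the
    caller (the supervisor loop) simply tries again on its next iteration.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            action()
            return True
        except Exception as exc:  # e.g. urllib.error.URLError when the agent is down
            log(f"consul call failed (attempt {attempt}/{attempts}): {exc}")
            if attempt < attempts:
                sleep(delay)
                delay = min(delay * 2, max_delay)
    return False
```

For heartbeats, keep the retry budget shorter than the TTL: with the defaults above, five attempts sleep 1 + 2 + 4 + 8 = 15s in total, which alone would consume a 15s TTL, so something like attempts=2 fits better there.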

Acceptance Criteria

  • Navidrome runs ONLY on the Primary.
  • Only ONE navidrome service is registered in Consul (pointing to the Primary).
  • Stopping the Primary allocation results in immediate deregistration (via trap).
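  • Hard-killing the Primary allocation (so the trap never runs) turns its TTL check critical once the TTL expires (approx. 15s), which removes it from healthy-service queries and DNS; Consul then deregisters the entry via DeregisterCriticalServiceAfter (minimum 1 minute).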
  • Replicas do not register any service.
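The "only ONE navidrome service" and "replicas do not register" criteria can be spot-checked from outside by asking Consul's health endpoint which navidrome instances exist. A hedged Python sketch, assuming the agent is at http://127.0.0.1:8500; the helper names are hypothetical.

```python
import json
import urllib.request


def fetch_instances(service="navidrome", consul_addr="http://127.0.0.1:8500",
                    passing_only=True):
    """GET /v1/health/service/<service>; passing=true keeps only healthy entries."""
    url = f"{consul_addr}/v1/health/service/{service}"
    if passing_only:
        url += "?passing=true"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def instance_summary(entries):
    """Reduce health entries to (service ID, address, port) tuples."""
    return [(e["Service"]["ID"], e["Service"]["Address"], e["Service"]["Port"])
            for e in entries]


def check_single_primary(entries):
    """Raise unless exactly one navidrome instance is present; return it."""
    instances = instance_summary(entries)
    if len(instances) != 1:
        raise AssertionError(
            f"expected exactly 1 instance, found {len(instances)}: {instances}")
    return instances[0]
```

Calling fetch_instances(passing_only=False) also lists critical entries, which is how a ghost registration left by a hard kill would show up until it is reaped.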

Implementation Details

  • Script Name: Keep the name entrypoint.sh for consistency with the litefs.yml configuration and refactor only its contents.
  • Service ID: Use navidrome-${NOMAD_ALLOC_ID} to ensure uniqueness and traceability.
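A sketch of the registration body built from the Nomad environment. The 15s TTL matches the acceptance criteria; DeregisterCriticalServiceAfter="1m" (Consul's minimum), the check name, and the service_definition helper itself are assumptions.

```python
import os


def service_definition(env=None, name="navidrome", ttl="15s",
                       deregister_after="1m"):
    """Build the /v1/agent/service/register body for this allocation (sketch).

    NOMAD_ALLOC_ID, NOMAD_IP_http and NOMAD_PORT_http are set by Nomad when the
    group declares a port labelled "http".
    """
    env = os.environ if env is None else env
    return {
        # navidrome-<alloc id>: unique per allocation and traceable in Nomad.
        "ID": f"{name}-{env['NOMAD_ALLOC_ID']}",
        "Name": name,
        "Address": env["NOMAD_IP_http"],
        # If the port is mapped with `to`, NOMAD_HOST_PORT_http is the host-side port.
        "Port": int(env["NOMAD_PORT_http"]),
        "Check": {
            "Name": f"{name} primary heartbeat",  # hypothetical check name
            "TTL": ttl,  # turns critical if no pass arrives within this window
            # Reap the entry once it has been critical this long (minimum 1m).
            "DeregisterCriticalServiceAfter": deregister_after,
        },
    }
```

Because the body embeds exactly one check, Consul assigns it the ID service:navidrome-<alloc id>, which is the ID the heartbeat must pass. When Docker port mapping is in use, it is worth verifying whether NOMAD_PORT_http or NOMAD_HOST_PORT_http is the port clients should reach.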