Specification: Implement TTL Heartbeat Service Registration (implement_ttl_heartbeat)
Overview
Replace the current "register and forget" Consul registration logic with a robust "TTL Heartbeat" pattern. This ensures that only the active Primary node is registered in Consul, and service entries are automatically removed (deregistered) if the node crashes, failover occurs, or Nomad stops the allocation.
Functional Requirements
- Supervisor Script (`entrypoint.sh`):
  - Refactor to implement the "Self-Registration" pattern with TTL checks.
  - Leadership Detection: Monitor `/data/.primary` (LiteFS 0.5).
    - Primary: Absence of file. Start Navidrome, register service with TTL.
    - Replica: Presence of file. Stop Navidrome, deregister service.
  - Heartbeat: Periodically (e.g., every 5-10s) PUT to Consul to pass the TTL check while Primary.
  - Signal Handling: Trap `SIGTERM`/`SIGINT` to gracefully stop Navidrome and deregister immediately.
- Docker Image:
  - Ensure `curl` and `jq` are installed (prerequisites for the script).
- Nomad Configuration:
  - Ensure `NOMAD_IP_http` and `NOMAD_PORT_http` are accessible to the task (standard, but verifying).
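The supervisor behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the final `entrypoint.sh`: the Consul agent address, the Navidrome binary path, the variable names, and the 15s TTL default are all assumptions, and the registration payload is deliberately minimal (address and port are omitted here). The Consul agent endpoints used are the standard service register, check pass, and service deregister routes.

```shell
#!/bin/sh
# Sketch only: paths, variable names and defaults below are assumptions.

PRIMARY_FILE="${PRIMARY_FILE:-/data/.primary}"        # present on replicas, absent on the Primary
CONSUL_ADDR="${CONSUL_ADDR:-http://127.0.0.1:8500}"   # assumed local Consul agent
NAVIDROME_BIN="${NAVIDROME_BIN:-/app/navidrome}"      # assumed binary location
SERVICE_ID="navidrome-${NOMAD_ALLOC_ID:-local}"
HEARTBEAT_SECS="${HEARTBEAT_SECS:-5}"
TTL="${TTL:-15s}"
nd_pid=""

# Absence of the marker file means this node is the Primary.
current_role() {
    if [ -e "$PRIMARY_FILE" ]; then echo replica; else echo primary; fi
}

# Map (role, navidrome running?) to the next action.
next_action() {
    case "$1:$2" in
        primary:no)  echo start ;;
        primary:yes) echo heartbeat ;;
        replica:yes) echo stop ;;
        *)           echo idle ;;
    esac
}

# PUT helper: -f turns HTTP errors into a non-zero exit, --max-time bounds a hung agent.
consul_put() {
    if [ -n "${2:-}" ]; then
        curl -sf --max-time 3 -X PUT --data "$2" "$CONSUL_ADDR$1" >/dev/null
    else
        curl -sf --max-time 3 -X PUT "$CONSUL_ADDR$1" >/dev/null
    fi
}

register() {
    consul_put /v1/agent/service/register \
        "{\"ID\":\"$SERVICE_ID\",\"Name\":\"navidrome\",\"Check\":{\"TTL\":\"$TTL\"}}"
}
# A service's single inline check gets the ID "service:<service id>" by default.
heartbeat()  { consul_put "/v1/agent/check/pass/service:$SERVICE_ID"; }
deregister() { consul_put "/v1/agent/service/deregister/$SERVICE_ID"; }

# kill -0 probes whether the recorded PID still exists.
navidrome_running() {
    if [ -n "$nd_pid" ] && kill -0 "$nd_pid" 2>/dev/null; then echo yes; else echo no; fi
}

stop_navidrome() {
    [ -n "$nd_pid" ] || return 0
    kill -TERM "$nd_pid" 2>/dev/null
    wait "$nd_pid" 2>/dev/null
    nd_pid=""
}

supervise() {
    # On SIGTERM/SIGINT: stop Navidrome, deregister, then exit.
    trap 'stop_navidrome; deregister; exit 0' TERM INT
    while :; do
        case "$(next_action "$(current_role)" "$(navidrome_running)")" in
            start)
                "$NAVIDROME_BIN" &
                nd_pid=$!
                register || true ;;
            # If the pass fails (e.g. the agent lost the registration), try registering again.
            heartbeat) heartbeat || register || true ;;
            stop)      stop_navidrome; deregister || true ;;
        esac
        # Sleep in the background so a signal interrupts the wait promptly.
        sleep "$HEARTBEAT_SECS" &
        wait $!
    done
}
```

The block only defines functions; a real script would end with a call to `supervise`. Keeping the role check and the action decision in small pure functions makes the state transitions easy to test without a running Consul agent.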
Non-Functional Requirements
- Resilience: The script must handle Consul unavailability gracefully (retries) without crashing the application loop.
- Cleanliness: No "ghost" services. Replicas must not appear in the service catalog.
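One way to meet the resilience requirement is a small bounded-retry wrapper around each Consul call, so that a failed request is reported to the caller instead of terminating the loop. The helper name `retry` and its argument order are illustrative assumptions, not part of the existing script.

```shell
#!/bin/sh
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 on the first success and 1 if every attempt failed; it never exits the caller.
# Note: POSIX sh has no local variables, so attempts/delay/n are globals.
retry() {
    attempts="$1"; delay="$2"; shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then return 0; fi
        # Skip the sleep after the final attempt.
        [ "$n" -lt "$attempts" ] && sleep "$delay"
        n=$((n + 1))
    done
    return 1
}
```

A call such as `retry 3 1 curl -sf -X PUT "$url" || echo "Consul unreachable, will retry on next tick" >&2` would then log the failure and let the supervisor loop continue.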
Acceptance Criteria
- Navidrome runs ONLY on the Primary.
- Only ONE `navidrome` service is registered in Consul (pointing to the Primary).
- Stopping the Primary allocation results in immediate deregistration (via trap).
- Hard killing the Primary allocation results in deregistration after TTL expires (approx 15s).
- Replicas do not register any service.
Implementation Details
- Script Name: We will stick with `entrypoint.sh` for consistency with `litefs.yml` configuration, refactoring its content.
- Service ID: Use `navidrome-${NOMAD_ALLOC_ID}` to ensure uniqueness and traceability.
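As a rough sketch of how the Service ID could feed into registration, the helpers below derive the service ID, the matching check ID, and a registration payload from the Nomad variables. The function names and the 15s TTL default are assumptions. The `DeregisterCriticalServiceAfter` field is also an assumption added here: as far as we understand, Consul only marks a TTL check critical when it expires, and removes the service entry once this timeout has elapsed (Consul documents a minimum of one minute), so it is worth confirming against the hard-kill acceptance criterion.

```shell
#!/bin/sh
# Assumes Nomad has set NOMAD_ALLOC_ID, NOMAD_IP_http and NOMAD_PORT_http.

service_id() { printf 'navidrome-%s\n' "$NOMAD_ALLOC_ID"; }

# Consul names a service's single inline check "service:<service id>" by default;
# this is the ID the heartbeat PUT has to target.
check_id() { printf 'service:%s\n' "$(service_id)"; }

# printf is enough here because an allocation ID, IP and port contain no characters
# that need JSON escaping; jq would be the safer choice for arbitrary values.
registration_payload() {
    printf '{"ID":"%s","Name":"navidrome","Address":"%s","Port":%s,"Check":{"TTL":"%s","DeregisterCriticalServiceAfter":"1m"}}\n' \
        "$(service_id)" "$NOMAD_IP_http" "$NOMAD_PORT_http" "${TTL:-15s}"
}
```

The payload would be sent with a PUT to the agent's `/v1/agent/service/register` endpoint, and the heartbeat would then PUT to `/v1/agent/check/pass/` followed by the value of `check_id`.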