From 9b6159a40c6da0dbfe88070f811b72f9b93f3906 Mon Sep 17 00:00:00 2001
From: sstent
Date: Mon, 9 Feb 2026 06:06:48 -0800
Subject: [PATCH] chore(conductor): Archive track 'implement_ttl_heartbeat'

---
 .../implement_ttl_heartbeat_20260208/index.md |  5 +++
 .../metadata.json                             |  8 +++++
 .../implement_ttl_heartbeat_20260208/plan.md  | 22 +++++++++++++
 .../implement_ttl_heartbeat_20260208/spec.md  | 32 +++++++++++++++++++
 4 files changed, 67 insertions(+)
 create mode 100644 conductor/archive/implement_ttl_heartbeat_20260208/index.md
 create mode 100644 conductor/archive/implement_ttl_heartbeat_20260208/metadata.json
 create mode 100644 conductor/archive/implement_ttl_heartbeat_20260208/plan.md
 create mode 100644 conductor/archive/implement_ttl_heartbeat_20260208/spec.md

diff --git a/conductor/archive/implement_ttl_heartbeat_20260208/index.md b/conductor/archive/implement_ttl_heartbeat_20260208/index.md
new file mode 100644
index 0000000..08f2366
--- /dev/null
+++ b/conductor/archive/implement_ttl_heartbeat_20260208/index.md
@@ -0,0 +1,5 @@
+# Track implement_ttl_heartbeat_20260208 Context
+
+- [Specification](./spec.md)
+- [Implementation Plan](./plan.md)
+- [Metadata](./metadata.json)
diff --git a/conductor/archive/implement_ttl_heartbeat_20260208/metadata.json b/conductor/archive/implement_ttl_heartbeat_20260208/metadata.json
new file mode 100644
index 0000000..8b86308
--- /dev/null
+++ b/conductor/archive/implement_ttl_heartbeat_20260208/metadata.json
@@ -0,0 +1,8 @@
+{
+  "track_id": "implement_ttl_heartbeat_20260208",
+  "type": "enhancement",
+  "status": "new",
+  "created_at": "2026-02-08T19:00:00Z",
+  "updated_at": "2026-02-08T19:00:00Z",
+  "description": "Implement TTL Heartbeat architecture for robust Consul service registration and cleaner failure handling."
+}
diff --git a/conductor/archive/implement_ttl_heartbeat_20260208/plan.md b/conductor/archive/implement_ttl_heartbeat_20260208/plan.md
new file mode 100644
index 0000000..55f5aab
--- /dev/null
+++ b/conductor/archive/implement_ttl_heartbeat_20260208/plan.md
@@ -0,0 +1,22 @@
+# Plan: Implement TTL Heartbeat Service Registration (`implement_ttl_heartbeat`)
+
+## Phase 1: Container Environment Preparation [x] [checkpoint: 51b8fce]
+- [x] Task: Update `Dockerfile` to install `curl` and `jq` (f7fe258)
+- [x] Task: Verify `litefs.yml` points to `entrypoint.sh` (should already be correct) (verified)
+- [x] Task: Conductor - User Manual Verification 'Phase 1: Container Environment Preparation' (Protocol in workflow.md)
+
+## Phase 2: Script Implementation [x] [checkpoint: 139016f]
+- [x] Task: Refactor `entrypoint.sh` with the TTL Heartbeat logic (d977301)
+    - [x] Implement `register_service` with TTL check definition
+    - [x] Implement `pass_ttl` loop
+    - [x] Implement robust `stop_app` and signal trapping
+    - [x] Ensure correct Primary/Replica detection logic (LiteFS 0.5: Primary = No `.primary` file)
+- [x] Task: Conductor - User Manual Verification 'Phase 2: Script Implementation' (Protocol in workflow.md)
+
+## Phase 3: Deployment and Verification [ ]
+- [~] Task: Commit changes and push to Gitea to trigger build
+- [ ] Task: Monitor Gitea build completion
+- [ ] Task: Deploy updated Nomad job (forcing update if necessary)
+- [ ] Task: Verify "Clean" state in Consul (only one primary registered)
+- [ ] Task: Verify Failover/Stop behavior (immediate deregistration vs TTL expiry)
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Deployment and Verification' (Protocol in workflow.md)
diff --git a/conductor/archive/implement_ttl_heartbeat_20260208/spec.md b/conductor/archive/implement_ttl_heartbeat_20260208/spec.md
new file mode 100644
index 0000000..13bf3ae
--- /dev/null
+++ b/conductor/archive/implement_ttl_heartbeat_20260208/spec.md
@@ -0,0 +1,32 @@
+# Specification: Implement TTL Heartbeat Service Registration (`implement_ttl_heartbeat`)
+
+## Overview
+Replace the current "register and forget" Consul registration logic with a robust "TTL Heartbeat" pattern. This ensures that only the active Primary node is registered in Consul, and service entries are automatically removed (deregistered) if the node crashes, failover occurs, or Nomad stops the allocation.
+
+## Functional Requirements
+- **Supervisor Script (`entrypoint.sh`):**
+    - Refactor to implement the "Self-Registration" pattern with TTL checks.
+    - **Leadership Detection:** Monitor `/data/.primary` (LiteFS 0.5).
+        - **Primary:** Absence of file. Start Navidrome, register service with TTL.
+        - **Replica:** Presence of file. Stop Navidrome, deregister service.
+    - **Heartbeat:** Periodically (e.g., every 5-10s) PUT to Consul to pass the TTL check while Primary.
+    - **Signal Handling:** Trap `SIGTERM`/`SIGINT` to gracefully stop Navidrome and deregister immediately.
+- **Docker Image:**
+    - Ensure `curl` and `jq` are installed (prerequisites for the script).
+- **Nomad Configuration:**
+    - Ensure `NOMAD_IP_http` and `NOMAD_PORT_http` are accessible to the task (standard, but verifying).
+
+## Non-Functional Requirements
+- **Resilience:** The script must handle Consul unavailability gracefully (retries) without crashing the application loop.
+- **Cleanliness:** No "ghost" services. Replicas must not appear in the service catalog.
+
+## Acceptance Criteria
+- [ ] Navidrome runs ONLY on the Primary.
+- [ ] Only ONE `navidrome` service is registered in Consul (pointing to the Primary).
+- [ ] Stopping the Primary allocation results in immediate deregistration (via trap).
+- [ ] Hard killing the Primary allocation results in deregistration after TTL expires (approx 15s).
+- [ ] Replicas do not register any service.
+
+## Implementation Details
+- **Script Name:** We will stick with `entrypoint.sh` for consistency with `litefs.yml` configuration, refactoring its content.
+- **Service ID:** Use `navidrome-${NOMAD_ALLOC_ID}` to ensure uniqueness and traceability.
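
Reviewer sketch (not part of the patch): the supervisor loop that spec.md describes could look roughly like the following. The real `entrypoint.sh` is a shell script using `curl` and `jq` and is not included in this commit, so this Python version only illustrates the state transitions and the Consul agent endpoints involved. The Consul address, the `1m` reaping window, and the helper names (`next_action`, `step`, `consul_put`) are assumptions made for illustration; the spec itself only fixes the `navidrome-${NOMAD_ALLOC_ID}` service ID, the roughly 15s TTL, and the "Primary = no `.primary` file" rule.

```python
import json
import os
import urllib.request

CONSUL = "http://127.0.0.1:8500"  # assumption: a local Consul agent


def is_primary(data_dir="/data"):
    """Per the spec (LiteFS 0.5): the node is Primary when .primary is absent."""
    return not os.path.exists(os.path.join(data_dir, ".primary"))


def service_id(alloc_id):
    """Service ID scheme taken from the spec: navidrome-${NOMAD_ALLOC_ID}."""
    return f"navidrome-{alloc_id}"


def registration_payload(alloc_id, address, port, ttl="15s"):
    """Body for PUT /v1/agent/service/register with a single TTL check."""
    return {
        "ID": service_id(alloc_id),
        "Name": "navidrome",
        "Address": address,
        "Port": int(port),
        # Assumption: an expired TTL only marks the check critical; this
        # field asks Consul to reap the service afterwards.
        "Check": {"TTL": ttl, "DeregisterCriticalServiceAfter": "1m"},
    }


def next_action(primary, registered):
    """Decide what one supervisor tick should do."""
    if primary:
        return "heartbeat" if registered else "register"
    return "deregister" if registered else "idle"


def consul_put(path, body=None, base=CONSUL):
    """PUT to the Consul agent; return False instead of raising on failure."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        base + path, data=data, method="PUT",
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=2):
            return True
    except OSError:  # covers URLError, HTTPError, timeouts
        return False


def step(registered, alloc_id, address, port, data_dir="/data", put=consul_put):
    """Run one tick and return the new 'registered' state."""
    action = next_action(is_primary(data_dir), registered)
    sid = service_id(alloc_id)
    if action == "register":
        return put("/v1/agent/service/register",
                   registration_payload(alloc_id, address, port))
    if action == "heartbeat":
        # A service's single check gets the default ID "service:<service-id>".
        put(f"/v1/agent/check/pass/service:{sid}")
        return True  # stay registered; a failed PUT is retried next tick
    if action == "deregister":
        return not put(f"/v1/agent/service/deregister/{sid}")
    return False
```

A supervisor would call `step` every 5-10 seconds with `NOMAD_ALLOC_ID`, `NOMAD_IP_http` and `NOMAD_PORT_http`, and would issue the same deregister call from its `SIGTERM`/`SIGINT` handler. Because `consul_put` returns `False` rather than raising, a temporarily unreachable Consul agent leaves the loop running, which is the resilience requirement in the spec.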