chore(conductor): Cleanup tracked files for archived tracks
@@ -1,5 +0,0 @@
# Track cluster_status_python_20260208 Context

- [Specification](./spec.md)
- [Implementation Plan](./plan.md)
- [Metadata](./metadata.json)

@@ -1,8 +0,0 @@
{
  "track_id": "cluster_status_python_20260208",
  "type": "feature",
  "status": "new",
  "created_at": "2026-02-08T15:00:00Z",
  "updated_at": "2026-02-08T15:00:00Z",
  "description": "create a script that runs on my local system (i don't run consul locally) that: - check consul services are registered correctly - displays the expected state (who is primary, what replicas exist) - show basic litefs status info for each node"
}

@@ -1,31 +0,0 @@
# Plan: Cluster Status Script (`cluster_status_python`)

## Phase 1: Environment and Project Structure [x] [checkpoint: e71d5e2]

- [x] Task: Initialize Python project structure (venv, requirements.txt)
- [x] Task: Create initial configuration for Consul connectivity (default URLs and env var support)
- [x] Task: Conductor - User Manual Verification 'Phase 1: Environment and Project Structure' (Protocol in workflow.md)

## Phase 2: Core Data Fetching [x] [checkpoint: 90ffed5]

- [x] Task: Implement Consul API client to fetch `navidrome` and `replica-navidrome` services
    - [x] Write tests for fetching services from Consul (mocking the API)
    - [x] Implement service discovery logic
- [x] Task: Implement LiteFS HTTP API client to fetch node status (see the fetch sketch after this list)
    - [x] Write tests for fetching LiteFS status (mocking the API)
    - [x] Implement logic to query `:20202/status` for each discovered node
- [x] Task: Conductor - User Manual Verification 'Phase 2: Core Data Fetching' (Protocol in workflow.md)
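
The fetch layer above boils down to two small HTTP clients. Below is a minimal sketch, assuming `requests` is available, that discovery uses Consul's standard `/v1/health/service/<name>` endpoint, and that the LiteFS endpoint on `:20202/status` returns JSON; the default URL and the shape of the LiteFS response are placeholders, not confirmed details. Type hints assume Python 3.10+.

```python
import os

import requests

# Hypothetical default; the real hardcoded URL is not recorded here.
DEFAULT_CONSUL_URL = os.environ.get("CONSUL_HTTP_ADDR", "http://consul.example:8500")


def fetch_services(consul_url: str = DEFAULT_CONSUL_URL) -> dict:
    """Fetch health entries for the primary and replica services from Consul."""
    services = {}
    for name in ("navidrome", "replica-navidrome"):
        # Consul's HTTP API returns node, service, and check data
        # for every registered instance of the named service.
        resp = requests.get(f"{consul_url}/v1/health/service/{name}", timeout=5)
        resp.raise_for_status()
        services[name] = resp.json()
    return services


def fetch_litefs_status(node_address: str) -> dict | None:
    """Query the LiteFS HTTP API on one node; return None if unreachable."""
    try:
        resp = requests.get(f"http://{node_address}:20202/status", timeout=5)
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        return None
```
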
## Phase 3: Data Processing and Formatting [x] [checkpoint: 20d99be]

- [x] Task: Implement data aggregation logic
    - [x] Write tests for aggregating Consul and LiteFS data into a single cluster state object
    - [x] Implement logic to calculate overall cluster health and role assignment
- [x] Task: Implement CLI output formatting (Table and Color) (see the formatting sketch after this list)
    - [x] Write tests for table formatting and color-coding logic
    - [x] Implement `tabulate`-based output with a health summary
- [x] Task: Conductor - User Manual Verification 'Phase 3: Data Processing and Formatting' (Protocol in workflow.md)
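
As a rough illustration of the table-and-color tasks above, here is a sketch using `tabulate` and plain ANSI escape codes. The keys on each `node` dict (`role`, `uptime`, and so on) are assumptions about what the aggregation step produces, not a confirmed schema.

```python
from tabulate import tabulate

GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"


def colorize(text: str, healthy: bool) -> str:
    """Wrap text in a green or red ANSI color code."""
    return f"{GREEN if healthy else RED}{text}{RESET}"


def render(nodes: list[dict]) -> str:
    """Render a health summary line followed by a per-node table."""
    rows = [
        [
            colorize(node.get("name", "?"), node.get("healthy", False)),
            node.get("role", "unknown"),       # Primary vs. Replica
            node.get("uptime", "-"),           # LiteFS process uptime
            node.get("advertise_url", "-"),
            node.get("lag", "-"),              # replication lag (Replicas only)
        ]
        for node in nodes
    ]
    healthy_count = sum(1 for n in nodes if n.get("healthy"))
    summary = colorize(
        f"Cluster health: {healthy_count}/{len(nodes)} nodes passing",
        healthy_count == len(nodes),
    )
    return summary + "\n\n" + tabulate(
        rows, headers=["Node", "Role", "Uptime", "Advertise URL", "Lag"]
    )
```
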
## Phase 4: CLI Interface and Final Polishing [x]

- [x] Task: Implement command-line arguments (argparse) (see the CLI sketch at the end of this plan)
    - [x] Write tests for CLI argument parsing (Consul URL overrides, etc.)
    - [x] Finalize the `main` entry point
- [x] Task: Final verification of script against requirements
- [x] Task: Conductor - User Manual Verification 'Phase 4: CLI Interface and Final Polishing' (Protocol in workflow.md)
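
To close out Phase 4, a plausible entry point might look like the following; the precedence (flag, then `CONSUL_HTTP_ADDR`, then a hardcoded default) mirrors the spec, and the default URL is again a placeholder.

```python
import argparse
import os

DEFAULT_CONSUL_URL = "http://consul.example:8500"  # hypothetical placeholder


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Show the status of the Navidrome LiteFS/Consul cluster."
    )
    parser.add_argument(
        "--consul-url",
        default=os.environ.get("CONSUL_HTTP_ADDR", DEFAULT_CONSUL_URL),
        help="Consul base URL (flag overrides CONSUL_HTTP_ADDR, which overrides the default)",
    )
    return parser.parse_args(argv)


def main():
    args = parse_args()
    # Wiring of the earlier sketches would go here, e.g.:
    #   services = fetch_services(args.consul_url)
    #   print(render(aggregate(services)))
    print(f"Using Consul at {args.consul_url}")


if __name__ == "__main__":
    main()
```
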
@@ -1,40 +0,0 @@
# Specification: Cluster Status Script (`cluster_status_python`)

## Overview

Create a Python-based CLI script to be run on a local system (outside the cluster) to monitor the health and status of the Navidrome LiteFS/Consul cluster. This tool will bridge the gap for local monitoring without needing a local Consul instance.

## Functional Requirements

- **Consul Connectivity:**
    - Connect to a remote Consul instance.
    - Default to a hardcoded URL with support for overrides via command-line arguments (e.g., `--consul-url`) or environment variables (`CONSUL_HTTP_ADDR`).
    - Assume no Consul authentication token is required.
- **Service Discovery:**
    - Query Consul for the `navidrome` (Primary) and `replica-navidrome` (Replica) services.
    - Verify that services are registered correctly and health checks are passing.
- **Status Reporting:**
    - Display a text-based table summarizing the state of all nodes in the cluster.
    - Color-code the output for quick health assessment.
    - Include a summary section at the top indicating overall cluster health.
- **Node-Level Details:**
    - Role identification (Primary vs. Replica).
    - Uptime of the LiteFS process.
    - Advertise URL for each node.
    - Replication lag (for Replicas).
    - Write-forwarding proxy target (for Replicas).

## Non-Functional Requirements

- **Language:** Python 3.x.
- **Dependencies:** Use standard libraries or common packages like `requests` for API calls and `tabulate` for table formatting (see the `requirements.txt` sketch below).
- **Portability:** Must run on Linux (the user's OS) without requiring local Consul or Nomad binaries.
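
Given the two named packages, the project's `requirements.txt` would plausibly be just the following (versions left unpinned here, as none are recorded):

```
requests
tabulate
```
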
## Acceptance Criteria

- [ ] Script successfully retrieves the service list from remote Consul.
- [ ] Script correctly identifies the current Primary node based on Consul tags/service names.
- [ ] Script queries the LiteFS HTTP API (`:20202/status`) on each node to gather internal metrics.
- [ ] Output is formatted as a clear, readable text table.
- [ ] Overrides for the Consul URL are functional.

## Out of Scope

- Direct interaction with the Nomad API (Consul is the source of truth for this script).
- Database-level inspection (SQL queries).
- Remote log tailing.

@@ -1,5 +0,0 @@
# Track fix_routing_20260207 Context

- [Specification](./spec.md)
- [Implementation Plan](./plan.md)
- [Metadata](./metadata.json)

@@ -1,8 +0,0 @@
{
  "track_id": "fix_routing_20260207",
  "type": "bug",
  "status": "new",
  "created_at": "2026-02-07T17:36:00Z",
  "updated_at": "2026-02-07T17:36:00Z",
  "description": "fix routing - use litefs to register the navidrome service with consul. the service should point to the master and avoid the litefs proxy (it breaks navidrome)"
}

@@ -1,25 +0,0 @@
# Implementation Plan: Direct Primary Routing for Navidrome-LiteFS

This plan outlines the steps to reconfigure the Navidrome-LiteFS cluster to bypass the LiteFS write-forwarding proxy and use direct primary-node routing for improved reliability and performance.

## Phase 1: Infrastructure Configuration Update [checkpoint: 5a57902]

In this phase, we will modify the Nomad job and LiteFS configuration to support direct port access and primary-aware health checks (see the service-stanza sketch after this list).

- [x] Task: Update `navidrome-litefs-v2.nomad` to point the service directly to the Navidrome port
    - [x] Modify the `service` block to use port 4533 instead of the dynamic mapped port.
    - [x] Replace the HTTP health check with a script check running `litefs is-primary`.
- [x] Task: Update `litefs.yml` to ensure consistent internal API binding (if needed)
- [x] Task: Conductor - User Manual Verification 'Infrastructure Configuration Update' (Protocol in workflow.md)
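
A sketch of what the reconfigured `service` stanza in `navidrome-litefs-v2.nomad` could look like. The port label, check name, and intervals are illustrative, and it assumes the group's `network` block declares a static port for Navidrome; only the doc-stated pieces (port 4533, a script check running `litefs is-primary`) are confirmed.

```hcl
# Illustrative reconfiguration; assumes the group's network block
# declares: port "navidrome" { static = 4533 }
service {
  name = "navidrome"
  port = "navidrome"  # host port 4533, bypassing the LiteFS proxy on 8080

  # All existing Traefik tags are retained unchanged (omitted here).

  check {
    type     = "script"              # script checks run inside the task
    name     = "litefs-is-primary"
    command  = "litefs"
    args     = ["is-primary"]        # exit 0 = primary (passing); non-zero = replica (critical)
    interval = "10s"
    timeout  = "5s"
  }
}
```
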
## Phase 2: Deployment and Validation

In this phase, we will deploy the changes and verify that the cluster correctly handles primary election and routing (a verification sketch follows this list).

- [x] Task: Deploy updated Nomad job
    - [x] Execute `nomad job run navidrome-litefs-v2.nomad`.
- [x] Task: Verify Consul health status
    - [x] Confirm that only the LiteFS primary node is marked as `passing`.
    - [x] Confirm that replica nodes are marked as `critical`.
- [x] Task: Verify Ingress Routing
    - [x] Confirm Traefik correctly routes traffic only to the primary node.
    - [x] Verify that Navidrome is accessible and functional.
- [x] Task: Conductor - User Manual Verification 'Deployment and Validation' (Protocol in workflow.md)
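
The Consul-side checks can be spot-checked from a workstation. A minimal sketch with `requests`, assuming the standard Consul `/v1/health/service/navidrome` endpoint:

```python
import requests


def primary_only(consul_url: str) -> bool:
    """Return True if exactly one navidrome instance is passing in Consul."""
    resp = requests.get(f"{consul_url}/v1/health/service/navidrome", timeout=5)
    resp.raise_for_status()
    passing = 0
    total = 0
    for entry in resp.json():
        total += 1
        # An instance counts as passing only if every one of its checks passes.
        if all(c.get("Status") == "passing" for c in entry.get("Checks", [])):
            passing += 1
    print(f"{passing} passing / {total - passing} not passing")
    return passing == 1
```
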
@@ -1,26 +0,0 @@
# Specification: Direct Primary Routing for Navidrome-LiteFS

## Overview

This track aims to fix routing issues caused by the LiteFS proxy. We will reconfigure the Nomad service registration to point directly to the Navidrome process (port 4533) on the primary node, bypassing the LiteFS write-forwarding proxy (port 8080). To ensure Traefik only routes traffic to the node capable of writes, we will implement a "Primary-only" health check.

## Functional Requirements

- **Direct Port Mapping:** Update the Nomad `service` block to use the host port `4533` directly instead of the LiteFS proxy port.
- **Primary-Aware Health Check:** Replace the standard HTTP health check with a script check.
    - **Check Logic:** The script will execute `litefs is-primary`.
        - If the node is the primary, the command exits with `0` (Passing).
        - If the node is a replica, the command exits with a non-zero code (Critical).
- **Service Tags:** Retain all existing Traefik tags so ingress routing continues to work.

## Non-Functional Requirements

- **Failover Reliability:** In the event of a leader election, the old primary must become unhealthy and the new primary must become healthy in Consul, allowing Traefik to update its backends automatically.
- **Minimal Latency:** Bypassing the proxy eliminates the extra network hop for reads and avoids potential compatibility issues with Navidrome's connection handling.

## Acceptance Criteria

- [ ] Consul reports the service as `passing` only on the node currently holding the LiteFS primary lease.
- [ ] Consul reports the service as `critical` on all replica nodes.
- [ ] Traefik correctly routes traffic to the primary node.
- [ ] Navidrome is accessible and functions correctly without the LiteFS proxy intermediary.

## Out of Scope

- Modifying Navidrome internal logic.
- Implementing an external health-check responder.