chore(conductor): Archive track 'cluster_status_python'

2026-02-08 07:34:21 -08:00
parent c7e7c9fd7b
commit f367f93768
5 changed files with 84 additions and 3 deletions

@@ -0,0 +1,5 @@
# Track cluster_status_python_20260208 Context
- [Specification](./spec.md)
- [Implementation Plan](./plan.md)
- [Metadata](./metadata.json)

@@ -0,0 +1,8 @@
{
"track_id": "cluster_status_python_20260208",
"type": "feature",
"status": "new",
"created_at": "2026-02-08T15:00:00Z",
"updated_at": "2026-02-08T15:00:00Z",
"description": "create a script that runs on my local system (i don't run consul locally) that: - check consul services are registered correctly - displays the expected state (who is primary, what replicas exist) - show basic litefs status info for each node"
}

@@ -0,0 +1,31 @@
# Plan: Cluster Status Script (`cluster_status_python`)
## Phase 1: Environment and Project Structure [x] [checkpoint: e71d5e2]
- [x] Task: Initialize Python project structure (venv, requirements.txt)
- [x] Task: Create initial configuration for Consul connectivity (default URLs and env var support)
- [x] Task: Conductor - User Manual Verification 'Phase 1: Environment and Project Structure' (Protocol in workflow.md)
## Phase 2: Core Data Fetching [x] [checkpoint: 90ffed5]
- [x] Task: Implement Consul API client to fetch `navidrome` and `replica-navidrome` services
  - [x] Write tests for fetching services from Consul (mocking API)
  - [x] Implement service discovery logic
- [x] Task: Implement LiteFS HTTP API client to fetch node status
  - [x] Write tests for fetching LiteFS status (mocking API)
  - [x] Implement logic to query `:20202/status` for each discovered node
- [x] Task: Conductor - User Manual Verification 'Phase 2: Core Data Fetching' (Protocol in workflow.md)
## Phase 3: Data Processing and Formatting [x] [checkpoint: 20d99be]
- [x] Task: Implement data aggregation logic
  - [x] Write tests for aggregating Consul and LiteFS data into a single cluster state object
  - [x] Implement logic to calculate overall cluster health and role assignment
- [x] Task: Implement CLI output formatting (Table and Color)
  - [x] Write tests for table formatting and color-coding logic
  - [x] Implement `tabulate`-based output with a health summary
- [x] Task: Conductor - User Manual Verification 'Phase 3: Data Processing and Formatting' (Protocol in workflow.md)
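
The aggregation and health-summary tasks in Phase 3 could look roughly like the sketch below. The `NodeState` fields, the three health labels, and the rules behind them are assumptions made for illustration; the plan does not define them. The table is rendered with plain string padding as a dependency-free stand-in for the `tabulate` output named in the plan.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical merged view of one node (Consul + LiteFS)."""
    name: str
    role: str        # "primary" or "replica"
    passing: bool    # all Consul health checks passing
    litefs_ok: bool  # the LiteFS status request succeeded


def cluster_health(nodes):
    """Assumed rules: exactly one working primary is required;
    any failing replica downgrades the cluster to DEGRADED."""
    primaries = [n for n in nodes if n.role == "primary"]
    working = [n for n in primaries if n.passing and n.litefs_ok]
    replicas = [n for n in nodes if n.role == "replica"]
    if len(primaries) != 1 or len(working) != 1:
        return "CRITICAL"
    if all(n.passing and n.litefs_ok for n in replicas):
        return "HEALTHY"
    return "DEGRADED"


def render(nodes):
    """Return the summary line followed by a padded text table."""
    header = ("NODE", "ROLE", "CONSUL", "LITEFS")
    rows = [header] + [
        (
            n.name,
            n.role,
            "passing" if n.passing else "failing",
            "ok" if n.litefs_ok else "unreachable",
        )
        for n in nodes
    ]
    # Pad every column to the width of its longest cell.
    widths = [max(len(r[i]) for r in rows) for i in range(len(header))]
    lines = [f"Cluster health: {cluster_health(nodes)}"]
    for row in rows:
        lines.append("  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip())
    return "\n".join(lines)
```

Keeping `cluster_health` separate from `render` lets the health rules be tested without touching output formatting, which matches the split between the two Phase 3 tasks.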
## Phase 4: CLI Interface and Final Polishing [x]
- [x] Task: Implement command-line arguments (argparse)
  - [x] Write tests for CLI argument parsing (Consul URL overrides, etc.)
  - [x] Finalize the `main` entry point
- [x] Task: Final verification of script against requirements
- [x] Task: Conductor - User Manual Verification 'Phase 4: CLI Interface and Final Polishing' (Protocol in workflow.md)

@@ -0,0 +1,40 @@
# Specification: Cluster Status Script (`cluster_status_python`)
## Overview
Create a Python-based CLI script to be run on a local system (outside the cluster) to monitor the health and status of the Navidrome LiteFS/Consul cluster. The tool queries a remote Consul instance directly, so the cluster can be monitored from a machine that does not run Consul itself.
## Functional Requirements
- **Consul Connectivity:**
  - Connect to a remote Consul instance.
  - Default to a hardcoded URL with support for overrides via command-line arguments (e.g., `--consul-url`) or environment variables (`CONSUL_HTTP_ADDR`).
  - Assume no Consul authentication token is required.
- **Service Discovery:**
  - Query Consul for the `navidrome` (Primary) and `replica-navidrome` (Replica) services.
  - Verify that services are registered correctly and health checks are passing.
- **Status Reporting:**
  - Display a text-based table summarizing the state of all nodes in the cluster.
  - Color-coded output for quick health assessment.
  - Include a summary section at the top indicating overall cluster health.
- **Node-Level Details:**
  - Role identification (Primary vs. Replica).
  - Uptime of the LiteFS process.
  - Advertise URL for each node.
  - Replication Lag (for Replicas).
  - Write-forwarding proxy target (for Replicas).
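
As an illustration of the Consul Connectivity requirement, here is a minimal sketch of how the URL could be resolved. The precedence (command-line flag, then `CONSUL_HTTP_ADDR`, then the default) is an assumption, since the spec lists the override mechanisms without ordering them. `DEFAULT_CONSUL_URL` is a hypothetical placeholder because the real default is not stated here, and the function names are illustrative.

```python
import argparse
import os

# Hypothetical placeholder: the spec does not state the real default URL.
DEFAULT_CONSUL_URL = "http://consul.example.internal:8500"


def normalize_url(addr):
    """Accept either a full URL or a bare host:port value."""
    addr = addr.strip().rstrip("/")
    if "://" not in addr:
        # CONSUL_HTTP_ADDR is often set as host:port without a scheme.
        addr = "http://" + addr
    return addr


def resolve_consul_url(argv=None, env=None):
    """Assumed precedence: --consul-url, then CONSUL_HTTP_ADDR, then default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Cluster status (sketch)")
    parser.add_argument("--consul-url", default=None,
                        help="Base URL of the remote Consul instance")
    args = parser.parse_args(argv)
    raw = args.consul_url or env.get("CONSUL_HTTP_ADDR") or DEFAULT_CONSUL_URL
    return normalize_url(raw)
```

Passing `argv` and `env` explicitly keeps the function testable without touching `sys.argv` or the real environment; calling `resolve_consul_url()` with no arguments uses both as usual.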
## Non-Functional Requirements
- **Language:** Python 3.x.
- **Dependencies:** Use standard libraries or common packages like `requests` for API calls and `tabulate` for table formatting.
- **Portability:** Must run on Linux (user's OS) without requiring local Consul or Nomad binaries.
## Acceptance Criteria
- [ ] Script successfully retrieves service list from remote Consul.
- [ ] Script correctly identifies the current Primary node based on Consul tags/service names.
- [ ] Script queries the LiteFS HTTP API (`:20202/status`) on each node to gather internal metrics.
- [ ] Output is formatted as a clear, readable text table.
- [ ] Overrides for Consul URL are functional.
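
The first three criteria could be exercised with something like the sketch below. It reads Consul's `/v1/health/service/<name>` endpoint and derives each node's role from the service name, as the spec describes. The function names, the returned dictionary layout, and the choice of `urllib` over `requests` are assumptions made to keep the example dependency-free. The shape of the LiteFS `:20202/status` response is not given in this spec, so the sketch only builds the URL and leaves the payload opaque.

```python
import json
import urllib.request

# Role is derived from the Consul service name, as stated in the spec.
ROLE_BY_SERVICE = {"navidrome": "primary", "replica-navidrome": "replica"}


def http_get_json(url, timeout=5.0):
    """Fetch a URL and decode the JSON body (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def discover_nodes(consul_url, fetch=http_get_json):
    """Return one dict per registered service instance."""
    nodes = []
    for service, role in ROLE_BY_SERVICE.items():
        entries = fetch(f"{consul_url}/v1/health/service/{service}")
        for entry in entries:
            checks = entry.get("Checks", [])
            # Fall back to the node address when the service has none set.
            address = entry["Service"].get("Address") or entry["Node"]["Address"]
            nodes.append({
                "node": entry["Node"]["Node"],
                "address": address,
                "role": role,
                # Healthy only if at least one check exists and all pass.
                "healthy": bool(checks)
                and all(c.get("Status") == "passing" for c in checks),
            })
    return nodes


def litefs_status_url(address, port=20202):
    """Build the LiteFS status URL named in the acceptance criteria."""
    return f"http://{address}:{port}/status"
```

The `fetch` parameter exists so tests can substitute a fake that returns canned Consul responses, in line with the "mocking API" tasks in the plan.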
## Out of Scope
- Direct interaction with Nomad API (Consul is the source of truth for this script).
- Database-level inspection (SQL queries).
- Remote log tailing.