conductor(checkpoint): Checkpoint end of Phase 1
conductor/tracks/cluster_status_python_20260208/spec.md
# Specification: Cluster Status Script (`cluster_status_python`)

## Overview

Create a Python CLI script that runs on a local system (outside the cluster) and reports the health and status of the Navidrome LiteFS/Consul cluster. The tool enables monitoring from a local machine without requiring a local Consul instance.

## Functional Requirements

- **Consul Connectivity:**
  - Connect to a remote Consul instance.
  - Default to a hardcoded URL, overridable via a command-line argument (e.g., `--consul-url`) or the `CONSUL_HTTP_ADDR` environment variable.
  - Assume no Consul authentication token is required.
- **Service Discovery:**
  - Query Consul for the `navidrome` (Primary) and `replica-navidrome` (Replica) services.
  - Verify that the services are registered correctly and that their health checks are passing.
- **Status Reporting:**
  - Display a text-based table summarizing the state of all nodes in the cluster.
  - Use color-coded output for quick health assessment.
  - Include a summary section at the top indicating overall cluster health.
- **Node-Level Details:**
  - Role identification (Primary vs. Replica).
  - Uptime of the LiteFS process.
  - Advertise URL of each node.
  - Replication lag (Replicas only).
  - Write-forwarding proxy target (Replicas only).
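The connectivity requirements can be sketched with the standard library alone (`argparse`, `os`). This is a minimal sketch under stated assumptions. The spec does not name the hardcoded default, so `DEFAULT_CONSUL_URL` below is a hypothetical placeholder. The precedence order (flag, then `CONSUL_HTTP_ADDR`, then default) is also an assumption, because the spec does not fix it. `CONSUL_HTTP_ADDR` is often set as a bare `host:port`, so the sketch prepends `http://` when no scheme is present.

```python
import argparse
import os

# Hypothetical placeholder: the spec calls for a hardcoded default but does not name it.
DEFAULT_CONSUL_URL = "http://consul.example.internal:8500"


def normalize_url(addr: str) -> str:
    """Strip a trailing slash and add http:// if the address has no scheme."""
    addr = addr.strip().rstrip("/")
    if "://" not in addr:
        addr = "http://" + addr
    return addr


def resolve_consul_url(argv=None, environ=None) -> str:
    """Pick the Consul URL. Assumed precedence: --consul-url, CONSUL_HTTP_ADDR, default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Navidrome LiteFS/Consul cluster status")
    parser.add_argument("--consul-url", help="Base URL of the remote Consul HTTP API")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    if args.consul_url:
        return normalize_url(args.consul_url)
    env_addr = environ.get("CONSUL_HTTP_ADDR")
    if env_addr:
        return normalize_url(env_addr)
    return DEFAULT_CONSUL_URL
```

The script passes `argv` and `environ` explicitly only so the override logic can be tested without touching the real process environment. A script entry point would call `resolve_consul_url()` with no arguments.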
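Service discovery could use Consul's health endpoint, `GET /v1/health/service/<name>`. That endpoint returns each registered instance together with its node and its check results. The sketch below uses only the standard library and sends no ACL token, as the spec allows. It maps the two service names to roles and flags any instance whose checks are not all `passing`. The row layout and the function names are illustrative assumptions, not an existing module.

```python
import json
import urllib.parse
import urllib.request

# Service names from the spec, mapped to the role each one implies.
SERVICE_ROLES = {"navidrome": "Primary", "replica-navidrome": "Replica"}


def fetch_service_health(consul_url: str, service: str, timeout: float = 5.0) -> list:
    """GET /v1/health/service/<service> from Consul (no X-Consul-Token header)."""
    url = f"{consul_url}/v1/health/service/{urllib.parse.quote(service)}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_entries(service: str, entries: list) -> list:
    """Reduce Consul health entries to one row per registered instance."""
    rows = []
    for entry in entries:
        svc = entry.get("Service", {})
        node = entry.get("Node", {})
        checks = entry.get("Checks", [])
        failing = [c.get("Name", "?") for c in checks if c.get("Status") != "passing"]
        rows.append({
            "node": node.get("Node", "?"),
            # An empty service address means Consul expects the node address to be used.
            "address": svc.get("Address") or node.get("Address", ""),
            "role": SERVICE_ROLES.get(service, "Unknown"),
            # Assumption: an instance with no checks at all is not considered verified.
            "healthy": bool(checks) and not failing,
            "failing_checks": failing,
        })
    return rows


def discover(consul_url: str) -> list:
    """Collect rows for both services from a live Consul instance."""
    rows = []
    for service in SERVICE_ROLES:
        rows.extend(summarize_entries(service, fetch_service_health(consul_url, service)))
    return rows
```

The query deliberately omits Consul's `?passing` filter. The goal is to report failing instances, not hide them.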
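The report could take this shape: a summary line on top, then one aligned row per node, with the health column colored via ANSI escape codes. The `NodeStatus` fields mirror the node-level details listed in the requirements. The rule for a healthy cluster (exactly one Primary and no failing checks) is an assumption for illustration. This sketch sticks to the standard library; `tabulate` could replace the hand-rolled column alignment.

```python
from dataclasses import dataclass
from typing import List, Optional

GREEN, YELLOW, RED, RESET = "\033[32m", "\033[33m", "\033[31m", "\033[0m"
HEADERS = ["Node", "Role", "Health", "Uptime", "Advertise URL", "Lag", "Proxy Target"]


@dataclass
class NodeStatus:
    node: str
    role: str                           # "Primary" or "Replica"
    healthy: bool                       # all Consul checks passing
    uptime: Optional[str] = None        # LiteFS process uptime
    advertise_url: Optional[str] = None
    lag: Optional[str] = None           # replicas only
    proxy_target: Optional[str] = None  # replicas only


def paint(text: str, code: str, color: bool) -> str:
    """Wrap text in an ANSI color code, or return it unchanged when color is off."""
    return f"{code}{text}{RESET}" if color else text


def cluster_summary(nodes: List[NodeStatus], color: bool = True) -> str:
    """Assumed rule: healthy means exactly one Primary and every node passing."""
    primaries = sum(n.role == "Primary" for n in nodes)
    failing = sum(not n.healthy for n in nodes)
    if primaries != 1:
        return paint(f"DEGRADED: expected 1 primary, found {primaries}", RED, color)
    if failing:
        return paint(f"WARNING: {failing} node(s) failing checks", YELLOW, color)
    return paint(f"HEALTHY: 1 primary, {len(nodes) - 1} replica(s)", GREEN, color)


def render_report(nodes: List[NodeStatus], color: bool = True) -> str:
    rows = []
    for n in nodes:
        is_replica = n.role == "Replica"
        rows.append([
            n.node, n.role, "passing" if n.healthy else "FAILING",
            n.uptime or "-", n.advertise_url or "-",
            (n.lag or "-") if is_replica else "n/a",
            (n.proxy_target or "-") if is_replica else "n/a",
        ])
    # Measure widths on plain text so ANSI codes don't skew the alignment.
    widths = [max(len(cell) for cell in col) for col in zip(HEADERS, *rows)]
    lines = [cluster_summary(nodes, color), ""]
    lines.append("  ".join(h.ljust(w) for h, w in zip(HEADERS, widths)).rstrip())
    lines.append("  ".join("-" * w for w in widths))
    for row, n in zip(rows, nodes):
        cells = [c.ljust(w) for c, w in zip(row, widths)]
        # Pad first, then color, so the escape codes add no visible width.
        cells[2] = paint(cells[2], GREEN if n.healthy else RED, color)
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)
```

When output is piped rather than shown in a terminal, `render_report(nodes, color=sys.stdout.isatty())` keeps escape codes out of the file.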
## Non-Functional Requirements

- **Language:** Python 3.x.
- **Dependencies:** Use the standard library or common packages such as `requests` (API calls) and `tabulate` (table formatting).
- **Portability:** Must run on Linux (the user's OS) without requiring local Consul or Nomad binaries.
## Acceptance Criteria

- [ ] The script successfully retrieves the service list from the remote Consul instance.
- [ ] The script correctly identifies the current Primary node from Consul service names/tags.
- [ ] The script queries the LiteFS HTTP API (`:20202/status`) on each node to gather internal metrics.
- [ ] Output is formatted as a clear, readable text table.
- [ ] Overrides for the Consul URL (command-line argument and environment variable) work as expected.
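Per-node metrics could be gathered by calling each node's LiteFS HTTP API on port 20202. This spec does not document the fields that `/status` returns. The sketch therefore returns the decoded JSON unchanged and leaves extraction of uptime, lag, and proxy target to the caller. `http_get_bytes`, `fetch_litefs_status`, and `collect_statuses` are hypothetical helpers. The injectable `get` parameter exists only so the function can be exercised without a live cluster.

```python
import json
import urllib.request

LITEFS_PORT = 20202  # LiteFS HTTP API port named in the spec


def litefs_status_url(address: str, port: int = LITEFS_PORT) -> str:
    """Build the /status URL for a bare IP address or hostname."""
    # Bracket IPv6 literals so the port separator stays unambiguous.
    host = f"[{address}]" if ":" in address and not address.startswith("[") else address
    return f"http://{host}:{port}/status"


def http_get_bytes(url: str, timeout: float) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_litefs_status(address: str, timeout: float = 3.0, get=http_get_bytes) -> dict:
    """Return the node's decoded /status JSON, or {"error": ...} on failure."""
    url = litefs_status_url(address)
    try:
        return json.loads(get(url, timeout))
    except (OSError, ValueError) as exc:
        # URLError and timeouts subclass OSError; malformed JSON raises ValueError.
        # Returning an error entry keeps one unreachable node from aborting the report.
        return {"error": f"{url}: {exc}"}


def collect_statuses(addresses, timeout: float = 3.0) -> dict:
    """Query every node and key the results by address."""
    return {addr: fetch_litefs_status(addr, timeout) for addr in addresses}
```

The addresses would come from the Consul service rows. A node that fails to answer then appears in the table with its error message instead of being silently dropped.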
## Out of Scope

- Direct interaction with the Nomad API (Consul is the source of truth for this script).
- Database-level inspection (SQL queries).
- Remote log tailing.