

Specification: Cluster Status Script (cluster_status_python)

Overview

Create a Python CLI script that runs on a local system (outside the cluster) and reports the health and status of the Navidrome LiteFS/Consul cluster. The tool enables monitoring from a workstation without requiring a local Consul instance.

Functional Requirements

  • Consul Connectivity:
    • Connect to a remote Consul instance.
    • Default to a hardcoded URL that can be overridden via a command-line argument (e.g., --consul-url) or the CONSUL_HTTP_ADDR environment variable.
    • Assume no Consul authentication token is required.
  • Service Discovery:
    • Query Consul for the navidrome (Primary) and replica-navidrome (Replica) services.
    • Verify that services are registered correctly and health checks are passing.
  • Status Reporting:
    • Display a text-based table summarizing the state of all nodes in the cluster.
    • Use color-coded output for quick health assessment.
    • Include a summary section at the top indicating overall cluster health.
  • Node-Level Details:
    • Role identification (Primary vs. Replica).
    • Uptime of the LiteFS process.
    • Advertise URL for each node.
    • Replication lag (for Replicas).
    • Write-forwarding proxy target (for Replicas).
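A minimal sketch of the connectivity and discovery requirements, using only the standard library. It assumes the override precedence is --consul-url, then CONSUL_HTTP_ADDR, then the hardcoded default; the default URL below is a hypothetical placeholder, and all function names are illustrative. Roles come from the service name, as the acceptance criteria describe, and health data comes from Consul's /v1/health/service/<name> endpoint, which returns one entry per instance with Node, Service, and Checks fields.

```python
import argparse
import json
import os
import urllib.request

# Hypothetical placeholder: the spec only says "a hardcoded URL".
DEFAULT_CONSUL_URL = "http://consul.example.internal:8500"

# Service names from this spec, mapped to the role each one implies.
SERVICE_ROLES = {"navidrome": "primary", "replica-navidrome": "replica"}


def resolve_consul_url(argv=None, env=None):
    """Assumed precedence: --consul-url, then CONSUL_HTTP_ADDR, then the default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Navidrome LiteFS cluster status")
    parser.add_argument("--consul-url", help="Consul HTTP API base URL")
    args = parser.parse_args(argv)
    url = args.consul_url or env.get("CONSUL_HTTP_ADDR") or DEFAULT_CONSUL_URL
    # CONSUL_HTTP_ADDR is commonly set as host:port without a scheme.
    if "://" not in url:
        url = "http://" + url
    return url.rstrip("/")


def fetch_service_health(consul_url, service, timeout=5.0):
    """Return the JSON list from Consul's /v1/health/service/<service> endpoint."""
    url = f"{consul_url}/v1/health/service/{service}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def parse_health_entries(service, entries):
    """Flatten Consul health entries into one record per service instance."""
    records = []
    for entry in entries:
        node = entry.get("Node", {})
        svc = entry.get("Service", {})
        checks = entry.get("Checks", [])
        records.append({
            "node": node.get("Node", "?"),
            # An empty Service.Address means the instance uses its node's address.
            "address": svc.get("Address") or node.get("Address", ""),
            "role": SERVICE_ROLES.get(service, "unknown"),
            # all() is True for an empty list, so an instance with no checks counts as passing.
            "passing": all(c.get("Status") == "passing" for c in checks),
        })
    return records


def discover_cluster(consul_url):
    """Query both services and return the combined node records."""
    records = []
    for service in SERVICE_ROLES:
        records.extend(parse_health_entries(service, fetch_service_health(consul_url, service)))
    return records
```

For example, discover_cluster(resolve_consul_url()) would return one record per registered instance of either service, tagged with its role and an aggregate check status.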

Non-Functional Requirements

  • Language: Python 3.x.
  • Dependencies: Use the standard library or common packages such as requests (API calls) and tabulate (table formatting).
  • Portability: Must run on Linux (the user's OS) without requiring local Consul or Nomad binaries.

Acceptance Criteria

  • Script successfully retrieves service list from remote Consul.
  • Script correctly identifies the current Primary node based on Consul tags/service names.
  • Script queries the LiteFS HTTP API (:20202/status) on each node to gather internal metrics.
  • Output is formatted as a clear, readable text table.
  • Overrides for Consul URL are functional.
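A hedged sketch of the per-node LiteFS query from the acceptance criteria. This spec does not describe the JSON returned by :20202/status, so the function hands back the decoded payload unchanged and leaves field extraction to the caller; format_duration is a hypothetical helper for displaying LiteFS uptime, assuming the payload exposes it in seconds. Errors are returned instead of raised so that one unreachable node is reported rather than aborting the run.

```python
import http.client
import json
import urllib.request

LITEFS_PORT = 20202  # port named in this spec's acceptance criteria


def fetch_litefs_status(address, port=LITEFS_PORT, timeout=2.0):
    """GET http://<address>:<port>/status and return (ok, payload_or_error).

    The payload is returned as decoded JSON without interpretation, since its
    fields are not specified here.
    """
    url = f"http://{address}:{port}/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, json.load(resp)
    # URLError, HTTPError, and timeouts are OSError subclasses; bad JSON is a ValueError.
    except (OSError, ValueError, http.client.HTTPException) as exc:
        return False, str(exc)


def format_duration(seconds):
    """Hypothetical uptime formatter: 184559 seconds -> '2d 03h 15m'."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    return f"{days}d {hours:02d}h {rem // 60:02d}m"
```

In the main loop, each node record discovered through Consul could be paired with fetch_litefs_status(record["address"]); a False result can then be shown as an "unreachable" cell instead of a crash.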

Out of Scope

  • Direct interaction with Nomad API (Consul is the source of truth for this script).
  • Database-level inspection (SQL queries).
  • Remote log tailing.