navidrome-litefs/conductor/archive/update_monitor_discovery_20260208/plan.md


Plan: Update Monitor Discovery Logic (update_monitor_discovery)

Phase 1: Nomad Discovery Enhancement [x] [checkpoint: 353683e]

  • Task: Update nomad_client.py to fetch job allocations with IPs (353683e)
    • Write tests for parsing allocation IPs from nomad job status or nomad alloc status
    • Implement get_job_allocations(job_id) returning a list of dicts (id, node, ip)
  • Task: Conductor - User Manual Verification 'Phase 1: Nomad Discovery Enhancement' (Protocol in workflow.md)
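
A minimal sketch of the Phase 1 parsing step, keeping it as a pure function so it can be unit-tested against captured JSON. It makes three assumptions:

- The input is already-decoded Nomad allocation JSON. Full allocation objects are needed, because the job-level allocation list may only return abbreviated stubs.
- The address lives under `AllocatedResources.Shared.Networks[].IP`.
- The helper names `parse_allocation` and `parse_allocations` are hypothetical.

The real `nomad_client.py` may instead parse CLI output from `nomad job status` or `nomad alloc status`.

```python
from __future__ import annotations

from typing import Any, Optional


def parse_allocation(alloc: dict[str, Any]) -> dict[str, Optional[str]]:
    """Reduce one decoded Nomad allocation to the (id, node, ip) shape."""
    # Assumption: the address lives under AllocatedResources.Shared.Networks[].IP.
    # Missing levels are tolerated so an allocation without a network yields ip=None.
    shared = (alloc.get("AllocatedResources") or {}).get("Shared") or {}
    networks = shared.get("Networks") or []
    ip = next((net.get("IP") for net in networks if net.get("IP")), None)
    return {"id": alloc.get("ID"), "node": alloc.get("NodeName"), "ip": ip}


def parse_allocations(allocs: list[dict[str, Any]]) -> list[dict[str, Optional[str]]]:
    """Hypothetical helper behind get_job_allocations: one dict per allocation."""
    return [parse_allocation(a) for a in allocs]
```

With this split, `get_job_allocations(job_id)` only has to fetch the JSON and delegate to `parse_allocations`, which keeps the network-facing code thin.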

Phase 2: Aggregator Refactor [x] [checkpoint: 655a9b2]

  • Task: Refactor cluster_aggregator.py to drive discovery via Nomad (655a9b2)
    • Update get_cluster_status to call nomad_client.get_job_allocations first
    • Update the loop to iterate over allocations, supplementing each with LiteFS and Consul data
  • Task: Update consul_client.py to fetch all services once and allow lookup by IP/ID (655a9b2)
  • Task: Update tests for the new discovery flow (655a9b2)
  • Task: Conductor - User Manual Verification 'Phase 2: Aggregator Refactor' (Protocol in workflow.md)
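
A rough sketch of the Nomad-driven flow from Phase 2. Allocations define the node list, and Consul services are fetched once and indexed, so each allocation costs a dictionary lookup rather than a Consul call. Several things here are assumptions:

- Each Consul service has already been normalized by the client into a dict with `id`, `address` and `status` keys.
- The Consul service ID embeds the Nomad allocation ID; this is used only as the fallback match.
- LiteFS state is supplied through a hypothetical `litefs_status(ip)` callable.

The output keys (`consul_status`, `litefs_role`, `litefs_connected`) are illustrative.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


def build_service_index(services: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Index the single Consul fetch by service address."""
    return {s["address"]: s for s in services if s.get("address")}


def find_service(alloc: dict[str, Any], by_ip: dict[str, dict[str, Any]],
                 services: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Look a service up by IP first, then fall back to matching on allocation ID."""
    svc = by_ip.get(alloc.get("ip") or "")
    if svc is None and alloc.get("id"):
        # Assumption: the Consul service ID contains the Nomad allocation ID.
        svc = next((s for s in services if alloc["id"] in s.get("id", "")), None)
    return svc


def get_cluster_status(allocations: list[dict[str, Any]], services: list[dict[str, Any]],
                       litefs_status: Callable[[str], dict[str, Any]]) -> list[dict[str, Any]]:
    by_ip = build_service_index(services)
    nodes = []
    for alloc in allocations:  # Nomad drives discovery: one row per allocation
        svc = find_service(alloc, by_ip, services)
        litefs = litefs_status(alloc["ip"]) if alloc.get("ip") else {}
        nodes.append({
            **alloc,
            "consul_status": svc.get("status") if svc else None,  # None for replicas without a check
            "litefs_role": litefs.get("role"),
            "litefs_connected": bool(litefs.get("connected")),
        })
    return nodes
```

Passing `litefs_status` in as a callable also makes the discovery flow easy to test with stubs instead of live LiteFS endpoints.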

Phase 3: UI and Health Logic [x] [checkpoint: 21e9c3d]

  • Task: Update output_formatter.py for "Standby" nodes (21e9c3d)
    • Update table formatting to handle missing Consul status for replicas
  • Task: Update Cluster Health calculation (21e9c3d)
    • "Healthy" = 1 Primary (Consul passing) + N Replicas (LiteFS connected)
  • [~] Task: Extract uptime from Nomad and internal state (txid, checksum) from LiteFS
  • [~] Task: Update aggregator and formatter to display detailed database info
  • Task: Final verification run (21e9c3d)
  • Task: Conductor - User Manual Verification 'Phase 3: Final Verification' (Protocol in workflow.md)
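
One possible encoding of the Phase 3 rules, as a sketch rather than the actual `output_formatter.py` logic. It assumes each node is a dict with `litefs_role`, `litefs_connected` and `consul_status` keys.

- **Health rule:** "Healthy" means exactly one primary whose Consul check is passing, and every other node is a LiteFS replica reporting itself connected.
- **Display rule:** a replica with no Consul status is shown as "Standby".
- **Hypothetical labels:** the "Unhealthy" result and the "-" placeholder are not defined in the plan.

```python
from __future__ import annotations

from typing import Any


def display_status(node: dict[str, Any]) -> str:
    """Table cell for the Consul column."""
    if node.get("consul_status"):
        return node["consul_status"]
    # Replicas are not expected to carry a Consul check, so label them instead of leaving a blank.
    return "Standby" if node.get("litefs_role") == "replica" else "-"


def cluster_health(nodes: list[dict[str, Any]]) -> str:
    """Healthy = 1 primary (Consul passing) + N replicas (LiteFS connected)."""
    primaries = [n for n in nodes if n.get("litefs_role") == "primary"]
    others = [n for n in nodes if n.get("litefs_role") != "primary"]
    primary_ok = len(primaries) == 1 and primaries[0].get("consul_status") == "passing"
    # Every non-primary node must be a replica that reports itself connected.
    replicas_ok = all(n.get("litefs_role") == "replica" and n.get("litefs_connected") for n in others)
    return "Healthy" if primary_ok and replicas_ok else "Unhealthy"
```

Treating any non-primary node that is not a connected replica as a failure means an allocation whose LiteFS state could not be read also marks the cluster unhealthy. Such nodes are not silently skipped.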