# Plan: Cluster Diagnosis and Script Enhancement (`diagnose_and_enhance`)

## Phase 1: Enhanced Diagnostics (Consul) [x] [checkpoint: a686c5b]

- [x] Task: Update `consul_client.py` to fetch detailed health check output
    - [x] Write tests for fetching `Output` field from Consul checks
    - [x] Implement logic to extract and store the `Output` (error message)
- [x] Task: Update aggregator and formatter to display Consul errors
    - [x] Update aggregation logic to include `consul_error`
    - [x] Update table formatter to indicate an error (maybe a flag or color)
    - [x] Add a "Diagnostics" section to the output to print full error details
- [x] Task: Conductor - User Manual Verification 'Phase 1: Enhanced Diagnostics (Consul)' (Protocol in workflow.md)

## Phase 2: Nomad Integration and Logs [x] [checkpoint: 6d77729]

- [x] Task: Implement `nomad_client.py` wrapper
    - [x] Write tests for `get_allocation_logs`, `get_node_status`, and `restart_allocation` (mocking subprocess)
    - [x] Implement `subprocess.run(["nomad", ...])` logic to fetch logs and restart allocations
- [x] Task: Integrate Nomad logs into diagnosis
    - [x] Update aggregator to call Nomad client for critical nodes
    - [x] Update "Diagnostics" section to display the last 20 lines of stderr
- [x] Task: Conductor - User Manual Verification 'Phase 2: Nomad Integration and Logs' (Protocol in workflow.md)

## Phase 3: Advanced LiteFS Status [ ]

- [x] Task: Implement `litefs_status` via `nomad alloc exec`
    - [x] Write tests for executing remote commands via Nomad
    - [x] Update `litefs_client.py` to fallback to `nomad alloc exec` if HTTP fails
    - [x] Parse `litefs status` output (text/json) to extract uptime and replication lag
- [x] Task: Final Polish and Diagnosis Run
    - [x] Ensure all pieces work together
    - [x] Run the script to diagnose `odroid8`
- [~] Task: Conductor - User Manual Verification 'Phase 3: Advanced LiteFS Status' (Protocol in workflow.md)
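
The Phase 1 and Phase 2 tasks could be sketched roughly as below. This is a minimal illustration, not the actual contents of `consul_client.py` or `nomad_client.py`: the helper names `extract_consul_errors` and `build_logs_command` and the injectable `runner` parameter are assumptions made for the example. It also assumes the checks are dicts shaped like Consul's health API responses (`Name`, `Status`, `Output`) and that each allocation runs a single task, so `nomad alloc logs` needs no task name.

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical type alias: anything that runs an argv list and returns a
# CompletedProcess. Injecting it lets tests avoid calling the real CLI.
Runner = Callable[[List[str]], subprocess.CompletedProcess]


def extract_consul_errors(checks: List[dict]) -> Dict[str, str]:
    """Map check name -> Output text for every check that is not passing.

    Assumes each dict carries the `Name`, `Status`, and `Output` fields
    found in Consul health check responses.
    """
    return {
        check.get("Name", "unknown"): check.get("Output", "").strip()
        for check in checks
        if check.get("Status") != "passing"
    }


def build_logs_command(alloc_id: str, lines: int = 20) -> List[str]:
    """Build the argv that tails an allocation's stderr.

    Assumes a single-task allocation; a multi-task allocation would also
    need the task name appended.
    """
    return ["nomad", "alloc", "logs", "-stderr", "-tail", "-n", str(lines), alloc_id]


def _default_runner(argv: List[str]) -> subprocess.CompletedProcess:
    # Capture output as text and bound the call so a hung CLI cannot stall
    # the whole diagnosis run (the 30 s timeout is an arbitrary choice).
    return subprocess.run(argv, capture_output=True, text=True, timeout=30)


def get_allocation_logs(
    alloc_id: str, lines: int = 20, runner: Runner = _default_runner
) -> str:
    """Return up to the last `lines` lines of stderr logs, or a short error note."""
    result = runner(build_logs_command(alloc_id, lines))
    if result.returncode != 0:
        return f"(could not fetch logs: {result.stderr.strip()})"
    # Trim again locally in case the CLI returned more than requested.
    return "\n".join(result.stdout.splitlines()[-lines:])
```

Passing a fake `runner` in tests is one way to cover the "mocking subprocess" item without patching `subprocess.run` globally; the aggregator would call `get_allocation_logs` only for nodes whose Consul checks are not passing.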