From 22ec8a5cc01870f3a05e260f4d219c4d9bda5fed Mon Sep 17 00:00:00 2001
From: sstent
Date: Sun, 8 Feb 2026 07:55:12 -0800
Subject: [PATCH] conductor(plan): Mark phase 'Phase 2: Nomad Integration and Logs' as complete

---
 .../diagnose_and_enhance_20260208/plan.md     | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 conductor/tracks/diagnose_and_enhance_20260208/plan.md

diff --git a/conductor/tracks/diagnose_and_enhance_20260208/plan.md b/conductor/tracks/diagnose_and_enhance_20260208/plan.md
new file mode 100644
index 0000000..909b58b
--- /dev/null
+++ b/conductor/tracks/diagnose_and_enhance_20260208/plan.md
@@ -0,0 +1,30 @@
+# Plan: Cluster Diagnosis and Script Enhancement (`diagnose_and_enhance`)
+
+## Phase 1: Enhanced Diagnostics (Consul) [x] [checkpoint: a686c5b]
+- [x] Task: Update `consul_client.py` to fetch detailed health check output
+    - [x] Write tests for fetching `Output` field from Consul checks
+    - [x] Implement logic to extract and store the `Output` (error message)
+- [x] Task: Update aggregator and formatter to display Consul errors
+    - [x] Update aggregation logic to include `consul_error`
+    - [x] Update table formatter to indicate an error (maybe a flag or color)
+    - [x] Add a "Diagnostics" section to the output to print full error details
+- [x] Task: Conductor - User Manual Verification 'Phase 1: Enhanced Diagnostics (Consul)' (Protocol in workflow.md)
+
+## Phase 2: Nomad Integration and Logs [x] [checkpoint: 6d77729]
+- [x] Task: Implement `nomad_client.py` wrapper
+    - [x] Write tests for `get_allocation_logs`, `get_node_status`, and `restart_allocation` (mocking subprocess)
+    - [x] Implement `subprocess.run(["nomad", ...])` logic to fetch logs and restart allocations
+- [x] Task: Integrate Nomad logs into diagnosis
+    - [x] Update aggregator to call Nomad client for critical nodes
+    - [x] Update "Diagnostics" section to display the last 20 lines of stderr
+- [x] Task: Conductor - User Manual Verification 'Phase 2: Nomad Integration and Logs' (Protocol in workflow.md)
+
+## Phase 3: Advanced LiteFS Status [ ]
+- [ ] Task: Implement `litefs_status` via `nomad alloc exec`
+    - [ ] Write tests for executing remote commands via Nomad
+    - [ ] Update `litefs_client.py` to fallback to `nomad alloc exec` if HTTP fails
+    - [ ] Parse `litefs status` output (text/json) to extract uptime and replication lag
+- [ ] Task: Final Polish and Diagnosis Run
+    - [ ] Ensure all pieces work together
+    - [ ] Run the script to diagnose `odroid8`
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Advanced LiteFS Status' (Protocol in workflow.md)
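
Reviewer note: Phase 2 of the plan describes wrapping the `nomad` CLI with `subprocess.run` and showing the last 20 lines of stderr in the "Diagnostics" section. The patch does not include `nomad_client.py`, so the following is only a minimal sketch of what such a wrapper might look like. The function names `get_allocation_logs` and `restart_allocation` come from the plan, but their signatures, the `runner` parameter, the `tail_lines` helper, and the choice to return an empty list on failure are assumptions made for illustration.

```python
import subprocess
from typing import Callable, List


def tail_lines(text: str, n: int = 20) -> List[str]:
    """Return the last n lines of text (fewer if the text is shorter)."""
    if n <= 0:
        # Guard: text.splitlines()[-0:] would return every line.
        return []
    return text.splitlines()[-n:]


def get_allocation_logs(
    alloc_id: str,
    lines: int = 20,
    runner: Callable = subprocess.run,  # hypothetical hook so tests can mock subprocess
) -> List[str]:
    """Fetch the tail of an allocation's stderr log via the nomad CLI."""
    cmd = ["nomad", "alloc", "logs", "-stderr", "-tail", "-n", str(lines), alloc_id]
    try:
        result = runner(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # The nomad binary is not on PATH; treat as "no logs available".
        return []
    if result.returncode != 0:
        return []
    # Assumption: the CLI writes the requested log stream to its own stdout.
    return tail_lines(result.stdout, lines)


def restart_allocation(alloc_id: str, runner: Callable = subprocess.run) -> bool:
    """Restart an allocation; return True if the CLI exited with status 0."""
    try:
        result = runner(
            ["nomad", "alloc", "restart", alloc_id],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return False
    return result.returncode == 0
```

Passing the runner in as a parameter keeps the "mocking subprocess" tests from the plan simple: a test can supply a fake callable that records the command and returns a canned result, without patching the `subprocess` module.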