conductor(plan): Mark phase 'Phase 2: Nomad Integration and Logs' as complete
30
conductor/tracks/diagnose_and_enhance_20260208/plan.md
Normal file
@@ -0,0 +1,30 @@
# Plan: Cluster Diagnosis and Script Enhancement (`diagnose_and_enhance`)

## Phase 1: Enhanced Diagnostics (Consul) [x] [checkpoint: a686c5b]
- [x] Task: Update `consul_client.py` to fetch detailed health check output
    - [x] Write tests for fetching `Output` field from Consul checks
    - [x] Implement logic to extract and store the `Output` (error message)
- [x] Task: Update aggregator and formatter to display Consul errors
    - [x] Update aggregation logic to include `consul_error`
    - [x] Update table formatter to indicate an error (maybe a flag or color)
    - [x] Add a "Diagnostics" section to the output to print full error details
- [x] Task: Conductor - User Manual Verification 'Phase 1: Enhanced Diagnostics (Consul)' (Protocol in workflow.md)
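Note on Phase 1: a minimal sketch of how the `Output` extraction could look. It assumes the checks arrive as the parsed JSON list returned by Consul's health endpoints, where each entry carries `Node`, `Name`, `CheckID`, `Status`, and `Output`. The function name `extract_check_errors` and the sample payload are hypothetical and are not taken from `consul_client.py`.

```python
from typing import Any


def extract_check_errors(checks: list[dict[str, Any]]) -> dict[str, str]:
    """Map each node name to the combined Output of its non-passing checks.

    Hypothetical helper: `checks` is assumed to be the decoded JSON list
    from a Consul health endpoint.
    """
    errors: dict[str, list[str]] = {}
    for check in checks:
        if check.get("Status") == "passing":
            continue  # healthy checks contribute no error text
        output = (check.get("Output") or "").strip()
        label = check.get("Name") or check.get("CheckID") or "unknown check"
        node = check.get("Node", "unknown")
        errors.setdefault(node, []).append(f"{label}: {output or 'no output'}")
    # One string per node, ready to be stored as something like `consul_error`.
    return {node: "\n".join(messages) for node, messages in errors.items()}


if __name__ == "__main__":
    # Made-up sample payload, for illustration only.
    sample = [
        {"Node": "odroid8", "Name": "litefs", "Status": "critical",
         "Output": "connection refused\n"},
        {"Node": "odroid7", "Name": "litefs", "Status": "passing", "Output": "ok"},
    ]
    print(extract_check_errors(sample))
```

Keeping the extraction as a pure function over already-decoded data means the tests need no running Consul agent.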

## Phase 2: Nomad Integration and Logs [x] [checkpoint: 6d77729]
- [x] Task: Implement `nomad_client.py` wrapper
    - [x] Write tests for `get_allocation_logs`, `get_node_status`, and `restart_allocation` (mocking subprocess)
    - [x] Implement `subprocess.run(["nomad", ...])` logic to fetch logs and restart allocations
- [x] Task: Integrate Nomad logs into diagnosis
    - [x] Update aggregator to call Nomad client for critical nodes
    - [x] Update "Diagnostics" section to display the last 20 lines of stderr
- [x] Task: Conductor - User Manual Verification 'Phase 2: Nomad Integration and Logs' (Protocol in workflow.md)
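Note on Phase 2: a rough sketch of a `get_allocation_logs` wrapper around `nomad alloc logs`, limited to the last 20 lines of stderr. The signature, the injectable `run` parameter, the 30-second timeout, and the error strings are assumptions made for illustration; the actual `nomad_client.py` may differ.

```python
import subprocess
from typing import Callable

# Any callable with the same shape as subprocess.run can be injected for tests.
Runner = Callable[..., subprocess.CompletedProcess]


def get_allocation_logs(alloc_id: str, lines: int = 20,
                        run: Runner = subprocess.run) -> str:
    """Return the last `lines` lines of an allocation's stderr log.

    Sketch only: failures are reported as text instead of being raised, so a
    diagnostics report can still be rendered when Nomad is unreachable.
    """
    cmd = ["nomad", "alloc", "logs", "-stderr", "-tail", "-n", str(lines), alloc_id]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"could not run nomad: {exc}"
    if result.returncode != 0:
        # The CLI's own error message arrives on the process's stderr.
        return f"nomad exited with {result.returncode}: {result.stderr.strip()}"
    # The requested log stream is written to the process's stdout; trim again
    # defensively in case more than `lines` lines come back.
    return "\n".join(result.stdout.splitlines()[-lines:])
```

Passing the runner as a parameter is one way to satisfy the "mocking subprocess" test requirement without patching module globals; `unittest.mock.patch` on `subprocess.run` would work as well.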

## Phase 3: Advanced LiteFS Status [ ]
- [ ] Task: Implement `litefs_status` via `nomad alloc exec`
    - [ ] Write tests for executing remote commands via Nomad
    - [ ] Update `litefs_client.py` to fallback to `nomad alloc exec` if HTTP fails
    - [ ] Parse `litefs status` output (text/json) to extract uptime and replication lag
- [ ] Task: Final Polish and Diagnosis Run
    - [ ] Ensure all pieces work together
    - [ ] Run the script to diagnose `odroid8`
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Advanced LiteFS Status' (Protocol in workflow.md)
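Note on Phase 3: one possible shape for the HTTP-then-exec fallback, sketched under several assumptions. The names `exec_in_alloc` and `get_litefs_status`, the `fetch_http` callable, and the returned dictionary layout are all hypothetical. The remote command `litefs status` is taken from the plan as written and has not been verified here, and no output field names are assumed, so extracting uptime and replication lag is left out.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def exec_in_alloc(alloc_id: str, command: Sequence[str],
                  run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Run a command inside an allocation via `nomad alloc exec`; return its stdout."""
    result = run(["nomad", "alloc", "exec", alloc_id, *command],
                 capture_output=True, text=True, timeout=30)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or f"exit code {result.returncode}")
    return result.stdout


def get_litefs_status(alloc_id: str,
                      fetch_http: Callable[[], Any],
                      exec_remote: Callable[[str, Sequence[str]], str] = exec_in_alloc) -> dict:
    """Try the HTTP path first, then fall back to executing inside the allocation."""
    try:
        return {"source": "http", "data": fetch_http()}
    except Exception as exc:  # any HTTP failure triggers the fallback
        http_error = str(exc)
    try:
        # Command name as given in the plan; treat it as an assumption.
        raw = exec_remote(alloc_id, ["litefs", "status"])
    except Exception as exc:
        return {"source": "none", "error": f"http: {http_error}; exec: {exc}"}
    try:
        data = json.loads(raw)  # JSON output, if that is what the tool emits
    except json.JSONDecodeError:
        data = {"raw": raw.strip()}  # plain text is kept as-is for later parsing
    return {"source": "nomad_exec", "data": data}
```

Recording which path produced the data (`source`) lets the "Diagnostics" section show that the HTTP endpoint was unreachable even when the fallback succeeds.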