# Specification: Fix Odroid8 and Script Robustness (`fix_odroid8_and_script`)

## Overview

Address the "critical" loop on node `odroid8`, which is caused by a LiteFS Cluster ID mismatch. Also improve the `cluster_status` script's error handling when the Nomad CLI is unavailable or misconfigured.

## Functional Requirements

- **Node Recovery (`odroid8`):**
  - Identify the specific LiteFS data directory on `odroid8` (usually `/var/lib/litefs` inside the container, mapped to a host path).
  - Guide the user through stopping the allocation and wiping the LiteFS metadata and data to resolve the Consul lease conflict.
- **Script Robustness:**
  - Update `nomad_client.py` to handle `subprocess` failures more gracefully.
  - If a `nomad` command fails, the script should not print a traceback or a confusing "non-zero exit status" message to the primary output.
  - Instead, it should log the error to `stderr`, continue running, and mark the affected fields (such as logs or the full ID) as "Nomad Error".
  - Add a clear warning to the output when Nomad connectivity is lost, suggesting that the user verify `NOMAD_ADDR`.

## Non-Functional Requirements

- **Reliability:** The script should remain functional even if one of its underlying tools (the Nomad CLI) is broken.
- **Ease of Use:** Provide clear, copy-pasteable commands for the manual cleanup process.

## Acceptance Criteria

- [ ] The `odroid8` node successfully joins the cluster and shows as `passing` in Consul.
- [ ] The `cluster_status` script runs without error even if the `nomad` binary is missing or cannot connect to the server, showing fallback info instead.
- [ ] The script provides a helpful message when `nomad` commands fail.

## Out of Scope

- Fixing the Navidrome database path issue (this will be handled in a separate track once the cluster is stable).
- Automating the host-level cleanup (manual guidance only).
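As an illustration of the Script Robustness requirements, here is a minimal Python sketch of how `nomad_client.py` could wrap the Nomad CLI. The existing structure of `nomad_client.py` is not shown in this spec, so `run_nomad`, `NomadResult`, `connectivity_warning`, `NOMAD_ERROR`, and the 10-second default timeout are all hypothetical. The idea is that every failure mode (missing binary, timeout, non-zero exit) is caught, reported on `stderr` as a single line, and turned into a "Nomad Error" placeholder instead of a traceback.

```python
import os
import subprocess
import sys
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence

# Hypothetical placeholder shown in the report for any field that needed Nomad.
NOMAD_ERROR = "Nomad Error"


@dataclass
class NomadResult:
    ok: bool
    output: str                  # stdout on success, NOMAD_ERROR on failure
    error: Optional[str] = None  # short reason on failure, None on success


def run_nomad(args: Sequence[str], binary: str = "nomad", timeout: float = 10.0) -> NomadResult:
    """Run a Nomad CLI command; never raises, logs failures to stderr only."""
    cmd = [binary, *args]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        reason = f"'{binary}' not found on PATH"
    except subprocess.TimeoutExpired:
        reason = f"'{binary} {' '.join(args)}' timed out after {timeout}s"
    except OSError as exc:  # e.g. the binary exists but is not executable
        reason = f"could not run '{binary}': {exc}"
    else:
        if proc.returncode == 0:
            return NomadResult(ok=True, output=proc.stdout)
        # Report the CLI's first stderr line instead of a CalledProcessError traceback.
        lines = proc.stderr.strip().splitlines()
        reason = lines[0] if lines else f"exited with status {proc.returncode}"
    print(f"[nomad_client] {reason}", file=sys.stderr)
    return NomadResult(ok=False, output=NOMAD_ERROR, error=reason)


def connectivity_warning(results: Iterable[NomadResult]) -> Optional[str]:
    """Return a one-line warning for the main output if any Nomad call failed."""
    if all(r.ok for r in results):
        return None
    addr = os.environ.get("NOMAD_ADDR", "<unset>")
    return (f"WARNING: Nomad commands failed; information may be incomplete. "
            f"Check that NOMAD_ADDR (currently {addr}) points at a reachable server.")
```

In `cluster_status`, each field that depends on Nomad would then render `result.output` directly. For example, `run_nomad(["alloc", "logs", alloc_id])` (with `alloc_id` coming from the script's own data) yields either the logs or "Nomad Error". The script would collect the results and print `connectivity_warning(results)` once, when it is not `None`. Returning a result object rather than raising keeps the report loop free of `try`/`except` blocks at every call site.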