2025-11-10 06:56:40 -08:00
parent 45af5f1e3d
commit 7975ca5dda
12 changed files with 44637 additions and 91 deletions

CONSUL_PERSISTENCE.md Normal file

@@ -0,0 +1,101 @@
# Consul Persistence for Connection Monitor
This document describes the Consul-based state persistence implementation for the connection monitoring script.
## Overview
The connection monitor now supports state persistence using Consul's KV store. This allows the script to resume from its previous state if restarted, maintaining continuity of remediation processes and connection state tracking.
## Configuration
### Consul Server
- **URL**: `http://consul.service.dc1.consul:8500` (default; configurable via the `consul_url` constructor parameter)
- **Authentication**: None required (no ACL tokens)
- **Key Structure**: All state is stored under `qbitcheck/connection_monitor/`
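The `python-consul` client takes host, port, and scheme as separate arguments rather than a single URL, so a URL like the one above has to be split first. A minimal sketch, assuming the script builds its client with `consul.Consul(...)`; the helper name `consul_kwargs_from_url` is hypothetical and not taken from the script:

```python
from urllib.parse import urlparse


def consul_kwargs_from_url(url: str) -> dict:
    """Split a Consul URL into host/port/scheme keyword arguments."""
    parsed = urlparse(url)
    if not parsed.hostname:
        raise ValueError(f"not a valid Consul URL: {url!r}")
    return {
        "host": parsed.hostname,
        "port": parsed.port or 8500,  # 8500 is Consul's default HTTP port
        "scheme": parsed.scheme or "http",
    }


kwargs = consul_kwargs_from_url("http://consul.service.dc1.consul:8500")
# With python-consul installed: client = consul.Consul(**kwargs)
```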
### State Data Persisted
The following state variables are persisted to Consul:
#### Connection State (`state/`)
- `connection_state`: Current connection state ('stable' or 'unstable')
- `last_state_change_time`: Timestamp of last state transition
- `consecutive_failures`: Count of consecutive connection failures
- `consecutive_stable_checks`: Count of consecutive stable checks
#### Remediation State (`remediation/`)
- `state`: Current remediation phase (None, 'stopping_torrents', 'restarting_nomad', 'waiting_for_stability', 'restarting_torrents')
- `start_time`: When remediation process started
- `stabilization_checks`: Count of stabilization checks during remediation
#### Stability Tracking (`stability/`)
- `start_time`: When stability timer started (for 1-hour requirement)
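To make the key layout concrete, the sketch below flattens an in-memory state into KV pairs under the documented prefix. It assumes one key per variable and JSON-encoded values; the actual script may store the data differently (for example, one JSON document per group). The helper `flatten_state` and the timestamp values are illustrative only.

```python
import json

PREFIX = "qbitcheck/connection_monitor/"


def flatten_state(state: dict) -> dict:
    """Turn {group: {name: value}} into {full_key: json_string}."""
    pairs = {}
    for group, values in state.items():
        for name, value in values.items():
            pairs[f"{PREFIX}{group}/{name}"] = json.dumps(value)
    return pairs


# Illustrative snapshot; the timestamps are made-up epoch seconds.
example_state = {
    "state": {
        "connection_state": "unstable",
        "last_state_change_time": 1762786600.0,
        "consecutive_failures": 3,
        "consecutive_stable_checks": 0,
    },
    "remediation": {
        "state": "stopping_torrents",
        "start_time": 1762786600.0,
        "stabilization_checks": 0,
    },
    "stability": {"start_time": None},
}

pairs = flatten_state(example_state)
for key in sorted(pairs):
    print(key, "=", pairs[key])
```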
## Implementation Details
### State Persistence Points
State is automatically saved to Consul at these critical points:
1. **Connection State Transitions**:
   - When transitioning from stable to unstable (`on_stable_to_unstable`)
   - When transitioning from unstable to stable (`on_unstable_to_stable`)
2. **Remediation Process**:
   - When remediation starts (`start_remediation`)
   - After each remediation state transition:
     - Stopping torrents → Restarting Nomad
     - Restarting Nomad → Waiting for stability
     - Waiting for stability → Restarting torrents
   - When remediation completes successfully
   - When remediation fails or times out
   - On unexpected errors during remediation
3. **Stability Tracking**:
   - When 1-hour stability requirement is met
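The "save after each remediation transition" pattern listed above can be sketched as a small state machine that calls a persistence callback on every phase change. This is an illustration only: the phase order comes from the list above, while `advance` and the `save` callback are hypothetical names, and the real script presumably gates each transition on its own conditions.

```python
from typing import Callable, Optional

PHASES = [
    "stopping_torrents",
    "restarting_nomad",
    "waiting_for_stability",
    "restarting_torrents",
]


def advance(phase: Optional[str],
            save: Callable[[Optional[str]], None]) -> Optional[str]:
    """Move to the next remediation phase and persist the new value."""
    if phase is None:
        next_phase = PHASES[0]  # remediation starts
    else:
        i = PHASES.index(phase)
        # After the last phase, remediation is complete (None).
        next_phase = PHASES[i + 1] if i + 1 < len(PHASES) else None
    save(next_phase)  # persist immediately after the transition
    return next_phase


saved = []  # stands in for writes to Consul
phase = None
for _ in range(5):
    phase = advance(phase, saved.append)
```

Because the callback runs inside `advance`, a restart after any step finds the phase that was most recently entered.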
### Error Handling
- If Consul is unavailable, the script keeps monitoring without state persistence (graceful degradation)
- Consul connection errors are logged but don't interrupt monitoring
- If state cannot be loaded, the monitor initializes with default values
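A minimal sketch of this behaviour, assuming a client whose `kv.get(key)` returns an `(index, data)` tuple and whose `kv.put(key, value)` returns a boolean, as `python-consul` does. The helper names, the JSON encoding, and the `UnreachableKV` stand-in are assumptions for illustration, not the script's actual code.

```python
import json
import logging

logger = logging.getLogger("connection_monitor")


def load_value(kv, key: str, default):
    """Return the stored value for key, or default if it cannot be loaded."""
    try:
        _index, data = kv.get(key)
        if data is None or data.get("Value") is None:
            return default  # key missing or empty
        return json.loads(data["Value"])
    except Exception as exc:  # connection errors, bad JSON, etc.
        logger.warning("Could not load %s from Consul: %s", key, exc)
        return default


def save_value(kv, key: str, value) -> bool:
    """Try to store value; log and report failure instead of raising."""
    try:
        return bool(kv.put(key, json.dumps(value)))
    except Exception as exc:
        logger.warning("Could not save %s to Consul: %s", key, exc)
        return False


class UnreachableKV:
    """Stand-in for a KV client whose Consul server is down."""

    def get(self, key):
        raise ConnectionError("consul unreachable")

    def put(self, key, value):
        raise ConnectionError("consul unreachable")


key = "qbitcheck/connection_monitor/state/consecutive_failures"
failures = load_value(UnreachableKV(), key, default=0)
saved_ok = save_value(UnreachableKV(), key, 3)
```

Both helpers swallow the error after logging it, so the monitoring loop that calls them is never interrupted by a Consul outage.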
## Usage
### Basic Usage
```python
monitor = ConnectionMonitor(
    qbittorrent_url='http://sp.service.dc1.consul:8080',
    nomad_url='http://192.168.4.36:4646',
    tracker_name='your_tracker_name',
    consul_url='http://consul.service.dc1.consul:8500',  # Optional; this value is the default
)
```
### Without Consul
If the `python-consul` package is not installed, state persistence is automatically disabled with a warning message.
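One common way to get this behaviour is a guarded import. The sketch below is an assumption about how such a check could look, not a copy of the script; `CONSUL_AVAILABLE` and `make_client` are hypothetical names.

```python
import logging

logger = logging.getLogger("connection_monitor")

try:
    import consul  # provided by the python-consul package
    CONSUL_AVAILABLE = True
except ImportError:
    consul = None
    CONSUL_AVAILABLE = False
    logger.warning("python-consul is not installed; state persistence is disabled")


def make_client(host: str, port: int = 8500):
    """Return a Consul client, or None when persistence is disabled."""
    if not CONSUL_AVAILABLE:
        return None
    return consul.Consul(host=host, port=port)
```

Callers can then treat a `None` client as "persistence off" and skip every save and load.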
## Dependencies
Add to requirements.txt:
```
python-consul>=1.1.0
```
Install with:
```bash
pip install -r requirements.txt
```
## Benefits
1. **State Continuity**: Script can be restarted without losing track of ongoing remediation processes
2. **Crash Recovery**: Persisted state survives process restarts and system reboots because it is stored in Consul rather than in the script's memory
3. **Monitoring**: External systems can monitor the state via Consul
4. **Debugging**: The most recently persisted state can be inspected in Consul when troubleshooting
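For external monitoring, Consul's KV HTTP API (`GET /v1/kv/<key>`, optionally with `?recurse=true`) returns a JSON array in which each entry's `Value` is base64-encoded. The sketch below decodes such a response body. The sample payload is constructed locally for illustration, and the one-key-per-variable layout it shows is an assumption about how the script stores its state.

```python
import base64
import json


def decode_kv_response(body: str) -> dict:
    """Map each key in a Consul KV HTTP response to its decoded text value."""
    decoded = {}
    for entry in json.loads(body):
        raw = entry.get("Value")  # base64 string, or null for an empty value
        decoded[entry["Key"]] = (
            base64.b64decode(raw).decode("utf-8") if raw is not None else None
        )
    return decoded


# Illustrative body, shaped like the response to
# GET /v1/kv/qbitcheck/connection_monitor/?recurse=true
sample_body = json.dumps([
    {
        "Key": "qbitcheck/connection_monitor/state/connection_state",
        "Value": base64.b64encode(b"unstable").decode("ascii"),
    }
])

states = decode_kv_response(sample_body)
```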
## Limitations
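- Persistence requires a reachable Consul server; while Consul is unavailable the monitor keeps running but no state is saved
- State is written only at the transition points listed above, so values that change between transitions may not be reflected in Consul after a crash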
- No built-in state expiration or cleanup (manual Consul management required)