mirror of
https://github.com/sstent/consul-monitor.git
synced 2025-12-06 08:01:58 +00:00
318 lines
9.5 KiB
Markdown
# Consul Service Monitor - Design Document

## Overview

A web-based dashboard application that monitors and visualizes the health status of services registered in HashiCorp Consul. The application provides near-real-time monitoring, driven by periodic polling, together with historical health tracking.

## Architecture

### High-Level Components

1. **Web Frontend** - Interactive dashboard displaying service status
2. **Backend API** - REST API for data retrieval and configuration
3. **Data Collection Service** - Background service polling Consul for health data
4. **SQLite Database** - Historical health check data storage
5. **Consul Integration** - Service discovery and health check monitoring

### Technology Stack

- **Frontend**: HTML5, CSS3, JavaScript (with Chart.js for visualizations)
- **Backend**: Python 3.9+ with Flask
- **Database**: SQLite (ephemeral storage)
- **Service Discovery**: HashiCorp Consul (consul.service.dc1.consul)
- **Updates**: Periodic polling (no WebSockets needed)

## Functional Requirements

### Core Features

#### 1. Service List Display

- Display all services registered in Consul
- Show service name, ID, and tags
- Provide clickable links to service URLs
- Support sorting and filtering

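Sorting and filtering can happen in the browser or in the backend. As a minimal backend-side sketch, assuming each service is a dict carrying the `name`, `id`, and `tags` fields listed above plus a hypothetical aggregated `status` field, a helper (`filter_and_sort` is an illustrative name) could look like this:

```python
from typing import Iterable

# Lower rank sorts first, so unhealthy services float to the top (an assumed ordering).
STATUS_RANK = {"critical": 0, "warning": 1, "passing": 2}


def filter_and_sort(services: Iterable[dict], query: str = "", by: str = "name") -> list[dict]:
    """Keep services whose name, ID, or any tag contains `query` (case-insensitive),
    then sort by name or by status severity."""
    q = query.strip().lower()

    def matches(svc: dict) -> bool:
        fields = [svc["name"], svc["id"], *svc.get("tags", [])]
        return not q or any(q in field.lower() for field in fields)

    def status_key(svc: dict) -> tuple:
        # Unknown or missing statuses sort last; ties are broken by name.
        return (STATUS_RANK.get(svc.get("status"), len(STATUS_RANK)), svc["name"].lower())

    def name_key(svc: dict) -> str:
        return svc["name"].lower()

    if by not in ("name", "status"):
        raise ValueError(f"unsupported sort key: {by!r}")
    key = status_key if by == "status" else name_key
    return sorted((svc for svc in services if matches(svc)), key=key)


services = [
    {"id": "web-api-1", "name": "web-api", "tags": ["http"], "status": "passing"},
    {"id": "database-1", "name": "database", "tags": ["postgres"], "status": "critical"},
    {"id": "cache-1", "name": "cache-service", "tags": ["redis"], "status": "warning"},
]
print([s["name"] for s in filter_and_sort(services, by="status")])
# ['database', 'cache-service', 'web-api']
print([s["name"] for s in filter_and_sort(services, query="REDIS")])
# ['cache-service']
```

Pushing unhealthy services to the top is one possible default; plain alphabetical order has the advantage of keeping rows stable between refreshes.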
#### 2. Health Status Visualization

- **Current Status Indicator**
  - Green icon: All health checks passing
  - Red icon: One or more health checks critical (failing)
  - Yellow icon: One or more health checks in the `warning` state, none critical
- **Historical Status Chart**
  - Mini bar chart showing 24-hour health history
  - Time-based visualization, aggregated into buckets of the configured history granularity (default: 15 minutes)
  - Color-coded status representation

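The indicator rules above boil down to taking the worst status among a service's checks. A minimal sketch, assuming Consul's `passing`/`warning`/`critical` check states; the function names are hypothetical, and treating an unknown status as `critical` and a service with no checks as `passing` are assumptions rather than rules stated in this design:

```python
from typing import Iterable

# Severity order: a higher number is worse.
SEVERITY = {"passing": 0, "warning": 1, "critical": 2}
INDICATOR = {"passing": "green", "warning": "yellow", "critical": "red"}


def overall_status(check_statuses: Iterable[str]) -> str:
    """Return the worst status among a service's checks.

    Unknown status strings count as 'critical' and an empty list as 'passing'
    (both assumptions).
    """
    worst = "passing"
    for status in check_statuses:
        if status not in SEVERITY:
            return "critical"
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return worst


def indicator_color(check_statuses: Iterable[str]) -> str:
    """Map the aggregated status to the icon color used on the dashboard."""
    return INDICATOR[overall_status(check_statuses)]


print(indicator_color(["passing", "warning"]))   # yellow
print(indicator_color(["passing", "critical"]))  # red
print(indicator_color(["passing", "passing"]))   # green
```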
#### 3. Auto-refresh Functionality

- Toggle switch to enable/disable auto-refresh
- Configurable refresh interval (30s, 1m, 2m, 5m, 10m)
- Visual indicator when auto-refresh is active
- Manual refresh button

#### 4. Configuration Management

- Session-based storage of user preferences (no persistence needed)
- Configurable history granularity (5m, 15m, 30m, 1h); default: 15 minutes

## Database Schema

### Tables

```sql
-- Services table
CREATE TABLE services (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    address TEXT,
    port INTEGER,
    tags TEXT,  -- JSON array
    meta TEXT,  -- JSON object
    first_seen DATETIME DEFAULT CURRENT_TIMESTAMP,
    last_seen DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Health checks table
-- Note: SQLite only enforces FOREIGN KEY constraints on a connection
-- after PRAGMA foreign_keys = ON has been executed.
CREATE TABLE health_checks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    service_id TEXT NOT NULL,
    check_id TEXT NOT NULL,
    check_name TEXT,
    status TEXT NOT NULL,  -- 'passing', 'warning', 'critical'
    output TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,  -- UTC, 'YYYY-MM-DD HH:MM:SS'
    FOREIGN KEY (service_id) REFERENCES services (id)
);

-- Configuration table (session-based, optional for defaults)
CREATE TABLE config (
    key TEXT PRIMARY KEY,
    value TEXT NOT NULL,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Service URLs are generated using pattern: http://{service_name}.service.dc1.consul:{port}

-- Indexes for performance
CREATE INDEX idx_health_checks_service_timestamp ON health_checks (service_id, timestamp);
CREATE INDEX idx_health_checks_timestamp ON health_checks (timestamp);
```

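To show how the schema supports the 24-hour retention described under Data Collection Service, here is a minimal sketch using Python's built-in `sqlite3` module. It creates a trimmed copy of two of the tables above, records one poll result with an upsert that refreshes `last_seen`, and purges samples older than 24 hours. The function names are hypothetical. The purge compares `timestamp` as text, which works because `CURRENT_TIMESTAMP` and `datetime('now', ...)` both produce UTC strings in the same `YYYY-MM-DD HH:MM:SS` format.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS services (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    first_seen DATETIME DEFAULT CURRENT_TIMESTAMP,
    last_seen DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS health_checks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    service_id TEXT NOT NULL,
    check_id TEXT NOT NULL,
    status TEXT NOT NULL,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (service_id) REFERENCES services (id)
);
"""


def record_check(conn: sqlite3.Connection, service_id: str, name: str,
                 check_id: str, status: str) -> None:
    """Upsert the service row (refreshing last_seen) and append one check sample."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO services (id, name) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name, last_seen = CURRENT_TIMESTAMP",
            (service_id, name),
        )
        conn.execute(
            "INSERT INTO health_checks (service_id, check_id, status) VALUES (?, ?, ?)",
            (service_id, check_id, status),
        )


def purge_old_checks(conn: sqlite3.Connection, hours: int = 24) -> int:
    """Delete samples older than `hours`; return the number of rows removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM health_checks WHERE timestamp < datetime('now', ?)",
            (f"-{hours} hours",),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
record_check(conn, "web-api-1", "web-api", "service:web-api-1", "passing")
# Back-date one extra sample so the purge has something to remove.
conn.execute(
    "INSERT INTO health_checks (service_id, check_id, status, timestamp) "
    "VALUES ('web-api-1', 'service:web-api-1', 'critical', datetime('now', '-25 hours'))"
)
print(purge_old_checks(conn))  # 1
```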
## API Design

### REST Endpoints

```
# Flask routes

GET /
  - Serves main dashboard HTML page

GET /api/services
  - Returns list of all services with current health status
  - Generated URLs: http://{service_name}.service.dc1.consul:{port}
  - Response: Array of service objects with health summary

GET /api/services/<service_id>/history
  - Returns historical health data for charts
  - Query params: ?granularity=15 (minutes: 5, 15, 30, 60)
  - Response: Time-series data for Chart.js

POST /api/config
  - Updates session configuration
  - Body: { "autoRefresh": true, "refreshInterval": 60, "historyGranularity": 15 }

GET /api/config
  - Returns current session configuration
```

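The URL pattern and the `granularity` query parameter are simple enough to centralize in two helpers, which also covers the input-validation point under Security Considerations. A minimal sketch with hypothetical names; the service-name rule (a DNS label: letters, digits, and inner dashes, at most 63 characters) is an assumption, not a Consul requirement:

```python
import re

ALLOWED_GRANULARITY = (5, 15, 30, 60)  # minutes
DEFAULT_GRANULARITY = 15
# Assumed naming rule: a single DNS label.
SERVICE_NAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def service_url(service_name: str, port: int) -> str:
    """Build http://{service_name}.service.dc1.consul:{port} after validating inputs."""
    if not SERVICE_NAME_RE.fullmatch(service_name):
        raise ValueError(f"invalid service name: {service_name!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return f"http://{service_name}.service.dc1.consul:{port}"


def parse_granularity(raw) -> int:
    """Parse ?granularity=..., falling back to the default for missing or bad values."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_GRANULARITY
    return value if value in ALLOWED_GRANULARITY else DEFAULT_GRANULARITY


print(service_url("web-api", 8080))  # http://web-api.service.dc1.consul:8080
print(parse_granularity("30"), parse_granularity(None), parse_granularity("7"))  # 30 15 15
```

Falling back to the default keeps the dashboard rendering on a bad query string; answering with HTTP 400 instead would be an equally reasonable choice.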
## Data Collection Service

### Polling Strategy

```yaml
Consul Polling:
  - Interval: 60 seconds
  - Consul Address: consul.service.dc1.consul:8500
  - Endpoints:
      - /v1/catalog/services (service discovery)  # whole datacenter; /v1/agent/services lists only the queried agent's services
      - /v1/health/service/{service} (health checks per service name)
  - No authentication required
  - Error handling: Log errors, continue polling
  - Expected services: 30-40 services

Data Retention:
  - Keep detailed data for 24 hours only (ephemeral storage)
  - No long-term aggregation needed
  - Database recreated on container restart
```

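One poll cycle and the "log errors, continue polling" loop can be sketched with the standard library alone. This is a sketch under assumptions: it reads the service list from `/v1/catalog/services`, which covers the whole datacenter (whereas `/v1/agent/services` only lists services registered with the agent being queried); the function names are hypothetical; and the HTTP fetcher is injectable so the logic can be exercised without a live Consul.

```python
import json
import logging
import threading
import urllib.parse
import urllib.request
from typing import Callable

log = logging.getLogger("consul-poller")
CONSUL_URL = "http://consul.service.dc1.consul:8500"  # matches the Consul Address above


def http_get_json(url: str):
    """GET a Consul endpoint and decode the JSON body (no ACL token, per this design)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def poll_once(fetch: Callable[[str], object] = http_get_json, base: str = CONSUL_URL) -> list:
    """Return one flat record per (service instance, check) for the whole datacenter."""
    rows = []
    # /v1/catalog/services returns {"service-name": ["tag", ...], ...}
    for name in fetch(f"{base}/v1/catalog/services"):
        path = urllib.parse.quote(name, safe="")
        # Each entry has "Node", "Service" and "Checks"; Checks includes node-level
        # checks such as serfHealth as well as the service's own checks.
        for entry in fetch(f"{base}/v1/health/service/{path}"):
            svc = entry["Service"]
            for check in entry["Checks"]:
                rows.append({
                    "service_id": svc["ID"],
                    "service_name": svc["Service"],
                    "check_id": check["CheckID"],
                    "check_name": check.get("Name"),
                    "status": check["Status"],
                    "output": check.get("Output", ""),
                })
    return rows


def run_poller(handle_rows: Callable[[list], None], stop: threading.Event,
               interval: float = 60, fetch: Callable[[str], object] = http_get_json) -> None:
    """Poll every `interval` seconds until `stop` is set; log failures and keep polling."""
    while not stop.is_set():
        try:
            handle_rows(poll_once(fetch))
        except Exception:  # deliberately broad: one failed poll must not end the loop
            log.exception("Consul poll failed; retrying in %s s", interval)
        stop.wait(interval)
```

In the application, `handle_rows` would write to the `health_checks` table, and the loop would run in a daemon thread, or be replaced by an APScheduler interval job as the dependency list suggests.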
### Health Check Processing

1. **Data Collection**
   - Poll the Consul API for the service list
   - For each service, fetch health check status
   - Store raw health check data with timestamps
2. **Status Aggregation**
   - Service-level status: worst status among all checks (critical > warning > passing)
   - Historical aggregation: count of passing/warning/critical samples per time window
3. **Change Detection**
   - Compare current status with previous poll
   - Trigger notifications/updates on status changes
   - Record service registration/deregistration events

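The historical aggregation in step 2 and the comparison in step 3 can both be sketched in plain Python. `bucket_counts` groups stored samples into fixed windows (the configured history granularity) and counts each status per window; `detect_changes` diffs two consecutive polls. Both names and the input shapes (`(datetime, status)` tuples and `{service_id: status}` dicts) are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def bucket_counts(samples, granularity_minutes: int = 15) -> dict:
    """Group (timestamp, status) samples into fixed windows and count statuses.

    Timestamps must be timezone-aware datetimes; windows are aligned to UTC
    multiples of the granularity (e.g. 12:00, 12:15, ... for 15 minutes).
    """
    width = granularity_minutes * 60
    buckets = defaultdict(Counter)
    for ts, status in samples:
        epoch = int(ts.timestamp())
        window_start = datetime.fromtimestamp(epoch - epoch % width, tz=timezone.utc)
        buckets[window_start][status] += 1
    return dict(sorted(buckets.items()))


def detect_changes(previous: dict, current: dict) -> list[tuple]:
    """Diff two {service_id: status} snapshots from consecutive polls."""
    events = []
    for sid in sorted(current.keys() - previous.keys()):
        events.append(("registered", sid, current[sid]))
    for sid in sorted(previous.keys() - current.keys()):
        events.append(("deregistered", sid, previous[sid]))
    for sid in sorted(current.keys() & previous.keys()):
        if current[sid] != previous[sid]:
            events.append(("status_changed", sid, f"{previous[sid]} -> {current[sid]}"))
    return events


t0 = datetime(2025, 1, 1, 12, 3, tzinfo=timezone.utc)
samples = [(t0, "passing"), (t0.replace(minute=14), "critical"), (t0.replace(minute=16), "passing")]
for start, counts in bucket_counts(samples).items():
    print(start.strftime("%H:%M"), dict(counts))
# 12:00 {'passing': 1, 'critical': 1}
# 12:15 {'passing': 1}

print(detect_changes({"a": "passing", "b": "passing"}, {"b": "critical", "c": "passing"}))
# [('registered', 'c', 'passing'), ('deregistered', 'a', 'passing'), ('status_changed', 'b', 'passing -> critical')]
```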
## Frontend Design

### Main Dashboard Layout

```
┌─────────────────────────────────────────────────┐
│ Consul Service Monitor                [⚙️] [🔄] │
├─────────────────────────────────────────────────┤
│ Auto-refresh: [ON/OFF]   Interval: [1m ▼]       │
│ History granularity: [15m ▼]                    │
├─────────────────┬────────┬──────────┬───────────┤
│ Service Name    │ Status │ URL      │ History   │
├─────────────────┼────────┼──────────┼───────────┤
│ web-api         │ 🟢     │ [link]   │ ▆▆█▆█▆▆   │
│ database        │ 🔴     │ [link]   │ █▆▆▄▂▂▄   │
│ cache-service   │ 🟢     │ [link]   │ ████████  │
└─────────────────┴────────┴──────────┴───────────┘
```

### Interactive Elements

- **Status Icons**: Visual indicators only (no detailed popup needed)
- **History Charts**: Chart.js mini bar charts with 24-hour data
- **Service Links**: URLs generated as `http://{service_name}.service.dc1.consul:{port}`
- **Desktop-optimized**: No mobile responsive design required

### Updates

- Periodic AJAX polling for updates
- Configurable refresh intervals (30s, 1m, 2m, 5m, 10m)
- Visual loading indicators during refresh

## Configuration Management

### User Settings (Session-based)

```json
{
  "autoRefresh": {
    "enabled": false,
    "interval": 60,
    "options": [30, 60, 120, 300, 600]
  },
  "display": {
    "historyGranularity": 15,
    "granularityOptions": [5, 15, 30, 60]
  }
}
```

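Because `POST /api/config` accepts user input, a validator that enforces the option lists above keeps the session clean. A minimal sketch with a hypothetical function name, assuming the flat request body shown under API Design (`autoRefresh`, `refreshInterval`, `historyGranularity`) and choosing to reject bad values rather than coerce them:

```python
from typing import Optional

REFRESH_OPTIONS = (30, 60, 120, 300, 600)  # seconds
GRANULARITY_OPTIONS = (5, 15, 30, 60)      # minutes
DEFAULTS = {"autoRefresh": False, "refreshInterval": 60, "historyGranularity": 15}


def validate_config(body: dict, current: Optional[dict] = None) -> dict:
    """Merge a partial POST /api/config body into the current settings.

    Raises ValueError on unknown keys or values outside the allowed options.
    """
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    unknown = set(body) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    merged = dict(DEFAULTS if current is None else current)  # copy; never mutate inputs
    if "autoRefresh" in body:
        if not isinstance(body["autoRefresh"], bool):
            raise ValueError("autoRefresh must be true or false")
        merged["autoRefresh"] = body["autoRefresh"]
    for key, options in (("refreshInterval", REFRESH_OPTIONS),
                         ("historyGranularity", GRANULARITY_OPTIONS)):
        if key in body:
            value = body[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or value not in options:
                raise ValueError(f"{key} must be one of {options}")
            merged[key] = value
    return merged
```

In a Flask view this might be used as `session["config"] = validate_config(request.get_json(), session.get("config"))`, answering HTTP 400 when it raises `ValueError`. The nested User Settings structure above would need a small mapping to and from this flat shape.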
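### System Configuration

```python
# Flask configuration; values fall back to the defaults listed under Environment Variables
import os

CONSUL_HOST = os.environ.get("CONSUL_HOST", "consul.service.dc1.consul")
CONSUL_PORT = int(os.environ.get("CONSUL_PORT", "8500"))
# Ephemeral SQLite. Each sqlite3.connect(":memory:") opens a separate private database,
# so the poller thread and request handlers must share one connection to see the same data.
DATABASE_PATH = ":memory:"
POLL_INTERVAL = int(os.environ.get("POLL_INTERVAL", "60"))  # seconds
MAX_SERVICES = 50  # Safety limit
```
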
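Every `sqlite3.connect(":memory:")` call opens its own private database, so a background poller and the Flask request handlers would not see each other's data if each opened its own connection. One way around this, sketched below under the assumption of a single-process deployment, is to share one connection guarded by a lock; the `SharedDatabase` class is hypothetical. Pointing `DATABASE_PATH` at a file on a tmpfs volume is an alternative that keeps the data ephemeral while allowing a connection per thread.

```python
import sqlite3
import threading


class SharedDatabase:
    """One process-wide SQLite connection, serialized with a lock."""

    def __init__(self, path: str = ":memory:") -> None:
        # check_same_thread=False lets the poller thread and request threads
        # use the same connection; the lock below prevents concurrent use.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()

    def execute(self, sql: str, params: tuple = ()) -> list:
        # The connection's context manager commits on success, rolls back on error.
        with self._lock, self._conn:
            return self._conn.execute(sql, params).fetchall()


db = SharedDatabase()
db.execute("CREATE TABLE t (x INTEGER)")
writer = threading.Thread(target=db.execute, args=("INSERT INTO t VALUES (?)", (1,)))
writer.start()
writer.join()
print(db.execute("SELECT x FROM t"))  # [(1,)]
```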
## Deployment Considerations

### Docker Deployment

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Run as an unprivileged user (see Security Considerations)
RUN useradd --create-home appuser
USER appuser

# Expose port
EXPOSE 5000

# Set environment variables
# (FLASK_ENV is deprecated since Flask 2.2 and ignored by Flask 2.3, so it is not set)
ENV FLASK_APP=app.py
ENV CONSUL_HOST=consul.service.dc1.consul
ENV CONSUL_PORT=8500

# Health check: python:3.11-slim does not ship curl, so use Python's urllib instead
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health', timeout=5)" || exit 1

# Note: `flask run` starts Flask's built-in development server
CMD ["python", "-m", "flask", "run", "--host=0.0.0.0"]
```

### Python Dependencies (requirements.txt)

```
Flask==2.3.3
requests==2.31.0
APScheduler==3.10.4  # For background polling
# sqlite3 is part of the Python standard library; no entry is needed
```

### Environment Variables

- `CONSUL_HOST`: Consul server hostname (default: consul.service.dc1.consul)
- `CONSUL_PORT`: Consul server port (default: 8500)
- `FLASK_PORT`: Web server port (default: 5000); note that `flask run` reads `FLASK_RUN_PORT` rather than `FLASK_PORT`
- `POLL_INTERVAL`: Health check polling interval in seconds (default: 60)

### Health Checks

The application should expose its own health and metrics endpoints:

- `GET /health`: Returns application health status
- `GET /metrics`: Prometheus-style metrics (optional)

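A possible shape for these two endpoints, sketched as plain functions whose results a Flask view could return (Flask accepts a `(dict, status)` tuple for JSON, and the metrics string would be served as `text/plain`). The staleness rule (unhealthy if no poll has succeeded within three poll intervals) and the metric name are assumptions; the output follows the Prometheus text exposition format (`# HELP`, `# TYPE`, then `name{label="value"} number`).

```python
import time
from typing import Mapping, Optional


def health_payload(last_success: Optional[float], poll_interval: float = 60,
                   now: Optional[float] = None) -> tuple:
    """Return (JSON-serializable body, HTTP status code) for GET /health.

    Assumed rule: unhealthy (503) if no poll has succeeded yet, or if the last
    success is older than three poll intervals.
    """
    now = time.time() if now is None else now
    healthy = last_success is not None and now - last_success <= 3 * poll_interval
    body = {"status": "ok" if healthy else "unhealthy", "last_successful_poll": last_success}
    return body, (200 if healthy else 503)


def render_metrics(status_counts: Mapping[str, int]) -> str:
    """Render per-status service counts in the Prometheus text exposition format."""
    lines = [
        "# HELP consul_monitor_services Services grouped by aggregated health status.",
        "# TYPE consul_monitor_services gauge",
    ]
    for status in sorted(status_counts):
        lines.append(f'consul_monitor_services{{status="{status}"}} {status_counts[status]}')
    return "\n".join(lines) + "\n"


print(render_metrics({"passing": 3, "critical": 1}), end="")
# # HELP consul_monitor_services Services grouped by aggregated health status.
# # TYPE consul_monitor_services gauge
# consul_monitor_services{status="critical"} 1
# consul_monitor_services{status="passing"} 3
```

Because `/health` returns 503 when polling stalls, the container `HEALTHCHECK` would also start failing in that case, which surfaces a stuck poller rather than only a dead web process.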
## Security Considerations

1. **Consul Access**: No authentication required in the target environment
2. **Database**: Ephemeral SQLite in container memory
3. **Web Interface**: Open dashboard, no authentication needed
4. **Input Validation**: Sanitize service names and configuration inputs
5. **Container Security**: Run as non-root user in container

## Future Enhancements

- **Alerting**: Email/Slack notifications on service failures
- **Service Filtering**: Search and filter capabilities for larger service lists
- **Service Details**: Detailed health check information popup/modal
- **Themes**: Dark/light mode toggle
- **Export**: Export health data as CSV/JSON
- **Custom Time Ranges**: Configurable history periods beyond 24 hours

## Development Phases

### Phase 1: Core Functionality

- Basic Consul integration
- SQLite database setup
- Simple web interface
- Manual refresh capability

### Phase 2: Real-time Features

- Auto-refresh functionality
- Periodic AJAX polling for updates (WebSockets are not needed)
- Historical data visualization
- Session-based configuration storage

### Phase 3: Enhanced UX

- Responsive design
- Advanced filtering
- Performance optimizations
- Error handling improvements

### Phase 4: Production Ready

- Docker deployment
- Security hardening
- Monitoring and logging
- Documentation and testing