phase 1 - error on refresh

2025-08-09 19:48:49 -07:00
parent 20bda0cee0
commit a75ab4acce
3 changed files with 207 additions and 471 deletions

@@ -1,477 +1,221 @@
# Phase 1 Implementation Plan - Consul Service Monitor
# Phase 1 Implementation Plan - Remaining Tasks
## Overview
Implement the core functionality for a Flask-based Consul service monitoring dashboard. This phase focuses on basic Consul integration, SQLite database setup, and a simple web interface with manual refresh capability.
## Current Status: ~90% Complete ✅
## Project Structure
Create the following directory structure:
```
consul-monitor/
├── app.py # Main Flask application
├── consul_client.py # Consul API integration
├── database.py # SQLite database operations
├── requirements.txt # Python dependencies
├── templates/
│ └── index.html # Main dashboard template
├── static/
│ ├── css/
│ │ └── style.css # Dashboard styles
│ └── js/
│ └── app.js # Frontend JavaScript with Alpine.js
└── Dockerfile # Container configuration
```
The codebase is well structured, with working error handling and core functionality. Two issues remain: a blocking Alpine.js integration problem and a minor missing-favicon 404.
## Dependencies (requirements.txt)
```
Flask==2.3.3
requests==2.31.0
```
## 🚨 Critical Issues to Fix
## Database Implementation (database.py)
### 1. Alpine.js Integration Problem (PRIORITY 1)
### Database Schema
Implement exactly these SQLite tables:
**Problem**: Alpine.js is not recognizing the `serviceMonitor` component, causing all frontend functionality to fail.
```sql
-- Services table
CREATE TABLE IF NOT EXISTS services (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
address TEXT,
port INTEGER,
tags TEXT, -- Store as JSON string
meta TEXT, -- Store as JSON string
first_seen DATETIME DEFAULT CURRENT_TIMESTAMP,
last_seen DATETIME DEFAULT CURRENT_TIMESTAMP
);
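**Root Cause**: Script loading order. Both scripts use `defer`, so they run in document order: Alpine.js runs first and initializes before `app.js` has defined `serviceMonitor`, so `x-data="serviceMonitor()"` cannot be resolved.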
-- Health checks table
CREATE TABLE IF NOT EXISTS health_checks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
service_id TEXT NOT NULL,
check_name TEXT,
status TEXT NOT NULL, -- 'passing', 'warning', 'critical'
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (service_id) REFERENCES services (id)
);
**Solution**: Fix the script loading order in `templates/index.html`
-- Indexes for performance
CREATE INDEX IF NOT EXISTS idx_health_checks_service_timestamp
ON health_checks (service_id, timestamp);
```
### Database Functions
Create these specific functions in database.py:
1. **`init_database()`**: Initialize SQLite database with the above schema
2. **`upsert_service(service_data)`**: Insert or update service record
- Parameters: dictionary with id, name, address, port, tags (as JSON string), meta (as JSON string)
- Update last_seen timestamp on existing records
3. **`insert_health_check(service_id, check_name, status)`**: Insert health check record
4. **`get_all_services_with_health()`**: Return all services with their latest health status
- Join services table with latest health_checks record per service
- Return list of dictionaries with service details + current health status
5. **`get_service_history(service_id, hours=24)`**: Get health history for specific service
6. **`is_database_available()`**: Test database connectivity
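
A minimal sketch of how the upsert and the "latest health status per service" query could look with the standard `sqlite3` module. It is an illustration under stated assumptions, not the project's actual `database.py`: the functions take an explicit `conn` argument (the plan's signatures do not), the connection is assumed to use `sqlite3.Row` as its row factory, "latest" is taken to mean the highest autoincrement `id`, and the `ON CONFLICT` upsert syntax assumes SQLite 3.24 or newer.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS services (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    address TEXT,
    port INTEGER,
    tags TEXT,
    meta TEXT,
    first_seen DATETIME DEFAULT CURRENT_TIMESTAMP,
    last_seen DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS health_checks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    service_id TEXT NOT NULL,
    check_name TEXT,
    status TEXT NOT NULL,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (service_id) REFERENCES services (id)
);
"""


def init_database(conn):
    """Create the tables if they do not exist yet."""
    conn.executescript(SCHEMA)


def upsert_service(conn, service):
    """Insert a service, or refresh it and bump last_seen if it already exists."""
    conn.execute(
        """
        INSERT INTO services (id, name, address, port, tags, meta)
        VALUES (:id, :name, :address, :port, :tags, :meta)
        ON CONFLICT(id) DO UPDATE SET
            name = excluded.name,
            address = excluded.address,
            port = excluded.port,
            tags = excluded.tags,
            meta = excluded.meta,
            last_seen = CURRENT_TIMESTAMP
        """,
        service,
    )


def insert_health_check(conn, service_id, check_name, status):
    conn.execute(
        "INSERT INTO health_checks (service_id, check_name, status) VALUES (?, ?, ?)",
        (service_id, check_name, status),
    )


def get_all_services_with_health(conn):
    """Return every service joined with its most recent health check, if any."""
    rows = conn.execute(
        """
        SELECT s.id, s.name, s.address, s.port, s.tags,
               COALESCE(h.status, 'unknown') AS current_status,
               h.timestamp AS last_check
        FROM services s
        LEFT JOIN health_checks h
          ON h.id = (SELECT MAX(h2.id) FROM health_checks h2
                     WHERE h2.service_id = s.id)
        ORDER BY s.name
        """
    ).fetchall()
    # tags are stored as a JSON string, so decode them for the caller
    return [{**dict(row), "tags": json.loads(row["tags"] or "[]")} for row in rows]
```

The `LEFT JOIN` keeps services that have no health checks at all; `COALESCE` then reports them as `unknown` rather than dropping them from the result.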
## Consul Client Implementation (consul_client.py)
### Configuration
Set these constants:
```python
CONSUL_HOST = "consul.service.dc1.consul"
CONSUL_PORT = 8500
CONSUL_BASE_URL = f"http://{CONSUL_HOST}:{CONSUL_PORT}"
```
### Consul Functions
Implement these specific functions:
1. **`get_consul_services()`**:
- Call `/v1/agent/services` endpoint
- Return dictionary of services or raise exception on failure
- Handle HTTP errors and connection timeouts
2. **`get_service_health(service_name)`**:
- Call `/v1/health/service/{service_name}` endpoint
- Parse health check results
- Return list of health checks with check_name and status
- Handle cases where service has no health checks
3. **`is_consul_available()`**:
- Test connection to Consul
- Return True/False boolean
4. **`fetch_all_service_data()`**:
- Orchestrate calls to get_consul_services() and get_service_health()
- Return combined service and health data
- Handle partial failures gracefully
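
The parsing step of `get_service_health()` can be sketched without any network access. The helper name `parse_health_checks` is hypothetical; it assumes the usual shape of a `/v1/health/service/{service_name}` response (a list of entries, each with a `Service` object and a `Checks` list whose items carry `Name`, `Status`, and `ServiceID`). Dropping node-level checks, which have an empty `ServiceID`, is a design choice of this sketch rather than something the plan specifies.

```python
def parse_health_checks(entries):
    """Flatten a health-endpoint response into check_name/status dicts."""
    checks = []
    for entry in entries or []:
        service_id = (entry.get("Service") or {}).get("ID")
        for check in entry.get("Checks") or []:
            # Node-level checks (for example the serf health check) are not
            # tied to a service instance, so this sketch skips them.
            if check.get("ServiceID") != service_id:
                continue
            checks.append({
                "service_id": service_id,
                "check_name": check.get("Name", ""),
                "status": check.get("Status", "unknown"),
            })
    return checks
```

A service with no checks of its own yields an empty list, which the caller can map to the `unknown` status.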
## Flask Application (app.py)
### Application Configuration
```python
from flask import Flask, render_template, jsonify
import sqlite3
import json
from datetime import datetime
```
### Flask Routes
Implement exactly these routes:
1. **`GET /`**:
- Render main dashboard using index.html template
- Pass initial service data to template
- Handle database/consul errors gracefully
2. **`GET /api/services`**:
- Return JSON array of all services with current health status
- Include generated URLs using pattern: `http://{service_name}.service.dc1.consul:{port}`
- Response format:
```json
{
"status": "success|error",
"consul_available": true|false,
"services": [
{
"id": "service-id",
"name": "service-name",
"address": "10.0.0.1",
"port": 8080,
"url": "http://service-name.service.dc1.consul:8080",
"tags": ["tag1", "tag2"],
"current_status": "passing|warning|critical|unknown",
"last_check": "2024-01-01T12:00:00"
}
],
"error": "error message if any"
}
```
3. **`GET /health`**:
- Return application health status
- Test both database and Consul connectivity
- Response format:
```json
{
"status": "healthy|unhealthy",
"consul": "connected|disconnected",
"database": "available|unavailable",
"timestamp": "2024-01-01T12:00:00"
}
```
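
The URL pattern and the `/health` response format above can be expressed as two small pure helpers. The names `build_service_url` and `build_health_payload` are hypothetical, and treating the application as `healthy` only when both Consul and the database are reachable is an assumption; the plan does not say how the two are combined.

```python
from datetime import datetime, timezone

CONSUL_DOMAIN = "service.dc1.consul"  # taken from the URL pattern in the plan


def build_service_url(name, port):
    """Build the dashboard link for a service from its name and port."""
    return f"http://{name}.{CONSUL_DOMAIN}:{port}"


def build_health_payload(consul_ok, database_ok, now=None):
    """Assemble the /health response body from two connectivity flags."""
    now = now or datetime.now(timezone.utc)
    return {
        # Assumption: both dependencies must be up to count as healthy.
        "status": "healthy" if consul_ok and database_ok else "unhealthy",
        "consul": "connected" if consul_ok else "disconnected",
        "database": "available" if database_ok else "unavailable",
        # Matches the second-resolution, zone-less format shown in the plan.
        "timestamp": now.replace(microsecond=0, tzinfo=None).isoformat(),
    }
```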
### Data Flow Logic
Implement this exact flow in the `/api/services` endpoint:
1. Try to fetch fresh data from Consul using `fetch_all_service_data()`
2. If successful:
- Update database with new service and health data
- Return fresh data with `consul_available: true`
3. If Consul fails:
- Retrieve cached data from database using `get_all_services_with_health()`
- Return cached data with `consul_available: false` and error message
4. If both fail:
- Return error response with empty services array
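
The fallback flow above can be sketched as a function that receives its three dependencies as callables, which keeps it testable without Flask, Consul, or SQLite. The name `load_services` and its parameters are hypothetical. Two details are assumptions: cached data is returned with `status: "success"` (the plan only fixes `consul_available` and the error message for that case), and a failed database write does not prevent fresh data from being returned.

```python
import logging

logger = logging.getLogger(__name__)


def load_services(fetch_fresh, store, load_cached):
    """Return the /api/services response body, falling back to cached data."""
    try:
        services = fetch_fresh()
    except Exception as consul_error:
        logger.warning("Consul fetch failed: %s", consul_error)
        try:
            cached = load_cached()
        except Exception as db_error:
            # Step 4: neither source is available.
            return {
                "status": "error",
                "consul_available": False,
                "services": [],
                "error": f"Consul: {consul_error}; database: {db_error}",
            }
        # Step 3: serve cached data and report why it is not fresh.
        return {
            "status": "success",
            "consul_available": False,
            "services": cached,
            "error": str(consul_error),
        }

    # Step 2: persist the fresh data, then return it.
    try:
        store(services)
    except Exception as db_error:
        logger.warning("Could not cache services: %s", db_error)
    return {
        "status": "success",
        "consul_available": True,
        "services": services,
        "error": None,
    }
```

In the Flask route, the three arguments would be `fetch_all_service_data`, a function that writes the result to the database, and `get_all_services_with_health`.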
## Frontend Implementation
### HTML Template (templates/index.html)
Create dashboard with this structure:
**Current code (problematic):**
```html
<!DOCTYPE html>
<html>
<head>
<title>Consul Service Monitor</title>
<link rel="stylesheet" href="{{ url_for('static', filename='css/style.css') }}">
<script src="https://unpkg.com/alpinejs@3.x.x/dist/cdn.min.js" defer></script>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body x-data="serviceMonitor()">
<div class="header">
<h1>Consul Service Monitor</h1>
<div class="controls">
<button @click="refreshServices()" :disabled="loading">
<span x-show="!loading">🔄 Refresh</span>
<span x-show="loading">Loading...</span>
</button>
</div>
</div>
<div x-show="error" class="error-banner" x-text="error"></div>
<div x-show="!consulAvailable" class="warning-banner">
⚠️ Consul connection failed - showing cached data
</div>
<div class="services-container">
<table class="services-table">
<thead>
<tr>
<th>Service Name</th>
<th>Status</th>
<th>URL</th>
<th>Tags</th>
</tr>
</thead>
<tbody>
<template x-for="service in services" :key="service.id">
<tr>
<td x-text="service.name"></td>
<td>
<span class="status-icon"
:class="getStatusClass(service.current_status)"
x-text="getStatusEmoji(service.current_status)">
</span>
</td>
<td>
<a :href="service.url" target="_blank" x-text="service.url"></a>
</td>
<td>
<template x-for="tag in service.tags">
<span class="tag" x-text="tag"></span>
</template>
</td>
</tr>
</template>
</tbody>
</table>
<div x-show="services.length === 0 && !loading" class="no-services">
No services found
</div>
</div>
</body>
</html>
<script src="https://unpkg.com/alpinejs@3.x.x/dist/cdn.min.js" defer></script>
<script src="{{ url_for('static', filename='js/app.js') }}" defer></script>
```
### Alpine.js JavaScript (static/js/app.js)
```javascript
function serviceMonitor() {
return {
services: [],
loading: false,
error: null,
consulAvailable: true,
init() {
this.refreshServices();
},
async refreshServices() {
this.loading = true;
this.error = null;
try {
const response = await fetch('/api/services');
const data = await response.json();
if (data.status === 'success') {
this.services = data.services;
this.consulAvailable = data.consul_available;
} else {
this.error = data.error || 'Failed to fetch services';
this.services = data.services || [];
this.consulAvailable = data.consul_available;
}
} catch (err) {
this.error = 'Network error: ' + err.message;
this.services = [];
this.consulAvailable = false;
} finally {
this.loading = false;
}
},
getStatusClass(status) {
return {
'status-passing': status === 'passing',
'status-warning': status === 'warning',
'status-critical': status === 'critical',
'status-unknown': !status || status === 'unknown'
};
},
getStatusEmoji(status) {
switch(status) {
case 'passing': return '🟢';
case 'warning': return '🟡';
case 'critical': return '🔴';
default: return '⚪';
}
}
**Fixed code:**
```html
<!-- Load the component script first so serviceMonitor is defined before Alpine.js starts -->
<script src="{{ url_for('static', filename='js/app.js') }}" defer></script>
<!-- Alpine 3 starts automatically once its deferred script runs; no manual Alpine.start() call is needed -->
<script src="https://unpkg.com/alpinejs@3.x.x/dist/cdn.min.js" defer></script>
```
### CSS Styling (static/css/style.css)
Implement these specific styles:
```css
/* Basic reset and layout */
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: Arial, sans-serif; background: #f5f5f5; }
**Alternative simpler fix** (modify `static/js/app.js`; it must still be loaded before Alpine.js so that the `alpine:init` listener is registered in time):
```javascript
// Add this at the top of app.js
document.addEventListener('alpine:init', () => {
Alpine.data('serviceMonitor', serviceMonitor);
});
/* Header */
.header {
background: white;
padding: 1rem 2rem;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
display: flex;
justify-content: space-between;
align-items: center;
}
/* Alert banners */
.error-banner, .warning-banner {
padding: 0.75rem 2rem;
margin: 0;
font-weight: bold;
}
.error-banner { background: #fee; color: #c33; }
.warning-banner { background: #fff3cd; color: #856404; }
/* Services table */
.services-container { padding: 2rem; }
.services-table {
width: 100%;
background: white;
border-radius: 8px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
border-collapse: collapse;
}
.services-table th, .services-table td {
padding: 1rem;
text-align: left;
border-bottom: 1px solid #eee;
}
.services-table th { background: #f8f9fa; font-weight: bold; }
/* Status indicators */
.status-icon { font-size: 1.2rem; }
.status-passing { color: #28a745; }
.status-warning { color: #ffc107; }
.status-critical { color: #dc3545; }
.status-unknown { color: #6c757d; }
/* Tags */
.tag {
display: inline-block;
background: #e9ecef;
padding: 0.25rem 0.5rem;
border-radius: 4px;
font-size: 0.875rem;
margin-right: 0.5rem;
}
/* Buttons */
button {
background: #007bff;
color: white;
border: none;
padding: 0.5rem 1rem;
border-radius: 4px;
cursor: pointer;
}
button:hover { background: #0056b3; }
button:disabled { background: #6c757d; cursor: not-allowed; }
// Remove the existing registration code
```
## Error Handling Requirements
### 2. Missing Favicon (Minor)
### Consul Connection Errors
- Catch `requests.exceptions.ConnectionError` and `requests.exceptions.Timeout`
- Log errors but continue serving cached data
- Display connection status in UI
**Problem**: 404 error for `/favicon.ico`
### Database Errors
- Handle SQLite database lock errors
- Graceful degradation when database is unavailable
- Return appropriate HTTP status codes
### Data Validation
- Validate service data structure from Consul API
- Handle missing or malformed service records
- Default to 'unknown' status for services without health checks
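
A minimal sketch of the validation rules above. The helper names are hypothetical, and the input is assumed to look like one value from the `/v1/agent/services` response (keys such as `ID`, `Service`, `Address`, `Port`, `Tags`, `Meta`). Reporting the worst status when a service has several checks is an assumption; the plan only requires `unknown` when there are none.

```python
import json

# Higher number means more severe; used to pick the worst status.
SEVERITY = {"passing": 0, "warning": 1, "critical": 2}


def overall_status(checks):
    """Collapse a list of check dicts into one status, 'unknown' if none apply."""
    known = [
        check.get("status")
        for check in checks
        if isinstance(check, dict) and check.get("status") in SEVERITY
    ]
    if not known:
        return "unknown"
    return max(known, key=SEVERITY.__getitem__)


def normalize_service(raw):
    """Return a cleaned service dict, or None if the record is unusable."""
    if not isinstance(raw, dict) or not raw.get("ID") or not raw.get("Service"):
        return None
    port = raw.get("Port")
    return {
        "id": raw["ID"],
        "name": raw["Service"],
        "address": raw.get("Address") or "",
        "port": port if isinstance(port, int) else 0,
        # Stored as JSON strings, matching the database schema.
        "tags": json.dumps(raw.get("Tags") or []),
        "meta": json.dumps(raw.get("Meta") or {}),
    }
```

Records for which `normalize_service` returns `None` can be logged and skipped, so one malformed entry does not break the whole refresh.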
## Testing Checklist
Before considering Phase 1 complete, verify:
1. **Database Operations**:
- [ ] Database tables created correctly
- [ ] Services can be inserted/updated
- [ ] Health checks are stored with timestamps
- [ ] Queries return expected data structure
2. **Consul Integration**:
- [ ] Can fetch service list from Consul
- [ ] Can fetch health status for each service
- [ ] Handles Consul connection failures gracefully
- [ ] Service URLs generated correctly
3. **Web Interface**:
- [ ] Dashboard loads without errors
- [ ] Services displayed in table format
- [ ] Status icons show correct colors
- [ ] Refresh button updates data via AJAX
- [ ] Error messages display when appropriate
4. **Error Scenarios**:
- [ ] App starts when Consul is unavailable
- [ ] Shows cached data when Consul fails
- [ ] Displays appropriate error messages
- [ ] Recovers when Consul comes back online
## Docker Configuration (Dockerfile)
```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Create non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
# Expose port
EXPOSE 5000
# Environment variables
ENV FLASK_APP=app.py
ENV FLASK_ENV=production
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:5000/health', timeout=5).raise_for_status()" || exit 1
CMD ["python", "-m", "flask", "run", "--host=0.0.0.0"]
**Solution**: Add favicon route to `app.py`:
```python
@app.route('/favicon.ico')
def favicon():
return '', 204 # No Content response
```
## Implementation Order
Follow this exact sequence:
## 📋 Remaining Implementation Tasks
1. Create project structure and requirements.txt
2. Implement database.py with all functions and test database operations
3. Implement consul_client.py and test Consul connectivity
4. Create basic Flask app.py with health endpoint
5. Add /api/services endpoint with full error handling
6. Create HTML template with Alpine.js integration
7. Add CSS styling for professional appearance
8. Test complete workflow: Consul → Database → API → Frontend
9. Create Dockerfile and test containerized deployment
10. Verify all error scenarios work as expected
### Task 1: Fix Alpine.js Integration
- [ ] **Option A**: Update HTML template script loading order (recommended)
- [ ] **Option B**: Modify JavaScript to use `alpine:init` event
- [ ] Test that all Alpine.js directives work correctly
- [ ] Verify refresh button functionality
- [ ] Confirm error/warning banners display properly
## Success Criteria
Phase 1 is complete when:
- Application starts successfully in Docker container
- Dashboard displays list of services from Consul
- Manual refresh button updates service data
- Application gracefully handles Consul outages
- All services show correct health status with colored indicators
- Generated service URLs follow the specified pattern
- Error messages display appropriately in the UI
### Task 2: Add Favicon Handler
- [ ] Add favicon route to prevent 404 errors
- [ ] Optionally add actual favicon file to static directory
### Task 3: Final Testing Checklist
- [ ] **Frontend Functionality**:
- [ ] Dashboard loads without Alpine.js errors
- [ ] Refresh button works and shows loading state
- [ ] Services display in table with proper status icons
- [ ] Error/warning banners show when appropriate
- [ ] Service URLs are clickable and correct
- [ ] **Backend Integration**:
- [ ] `/api/services` returns proper JSON response
- [ ] Consul unavailable scenario shows cached data
- [ ] Health endpoint returns correct status
- [ ] Database operations work correctly
- [ ] **Error Scenarios**:
- [ ] App starts when Consul is down
- [ ] Graceful fallback to cached data
- [ ] Proper error messages in UI
- [ ] Recovery when Consul comes online
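
The "returns proper JSON response" item can be checked mechanically. The sketch below validates an already-decoded `/api/services` body against the response format described in the plan; the function name and the choice of required keys are assumptions (`last_check` is left optional here because a service without checks may not have one).

```python
REQUIRED_SERVICE_KEYS = {"id", "name", "address", "port", "url", "tags", "current_status"}
VALID_STATUSES = {"passing", "warning", "critical", "unknown"}


def validate_services_payload(payload):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if payload.get("status") not in {"success", "error"}:
        problems.append("status must be 'success' or 'error'")
    if not isinstance(payload.get("consul_available"), bool):
        problems.append("consul_available must be a boolean")

    services = payload.get("services")
    if not isinstance(services, list):
        problems.append("services must be a list")
        return problems

    for index, service in enumerate(services):
        missing = REQUIRED_SERVICE_KEYS - set(service)
        if missing:
            problems.append(f"services[{index}] is missing {sorted(missing)}")
        if service.get("current_status") not in VALID_STATUSES:
            problems.append(f"services[{index}] has an invalid current_status")
    return problems
```

During manual testing, the body returned by the endpoint can be decoded with `json.loads` and passed to this function; any returned strings point at the fields that deviate from the documented format.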
## 🔧 Specific Code Changes Needed
### File: `templates/index.html`
**Replace the script section with:**
```html
<script src="https://unpkg.com/alpinejs@3.x.x/dist/cdn.min.js" defer></script>
<script>
document.addEventListener('alpine:init', () => {
Alpine.data('serviceMonitor', () => ({
services: [],
loading: false,
error: null,
consulAvailable: true,
init() {
this.refreshServices();
},
async refreshServices() {
this.loading = true;
this.error = null;
try {
const response = await fetch('/api/services');
const data = await response.json();
if (data.status === 'success') {
this.services = data.services;
this.consulAvailable = data.consul_available;
} else {
this.error = data.error || 'Failed to fetch services';
this.services = data.services || [];
this.consulAvailable = data.consul_available;
}
} catch (err) {
this.error = 'Network error: ' + err.message;
this.services = [];
this.consulAvailable = false;
} finally {
this.loading = false;
}
},
getStatusClass(status) {
return {
'status-passing': status === 'passing',
'status-warning': status === 'warning',
'status-critical': status === 'critical',
'status-unknown': !status || status === 'unknown'
};
},
getStatusEmoji(status) {
switch(status) {
case 'passing': return '🟢';
case 'warning': return '🟡';
case 'critical': return '🔴';
default: return '⚪';
}
}
}));
});
</script>
```
### File: `app.py`
**Add favicon route:**
```python
@app.route('/favicon.ico')
def favicon():
return '', 204
```
## 🎯 Success Criteria (Updated)
Phase 1 will be complete when:
- [x] Application starts successfully in Docker container
- [x] Backend API endpoints return correct data
- [x] Database operations work correctly
- [x] Consul integration handles failures gracefully
- [ ] **Dashboard displays without Alpine.js errors** ⚠️
- [ ] **Manual refresh button updates service data** ⚠️
- [x] Application gracefully handles Consul outages
- [x] Services show correct health status structure
- [x] Generated service URLs follow specified pattern
- [x] Error handling works for all scenarios
## 🚀 Quick Fix Implementation
### Immediate Action Plan (30 minutes):
1. **Fix Alpine.js (15 minutes)**:
- Replace script section in `index.html` with the code above
- Remove the separate `app.js` file (inline the code)
- Test the dashboard loads without errors
2. **Add favicon handler (5 minutes)**:
- Add favicon route to `app.py`
- Restart application
3. **Test complete workflow (10 minutes)**:
- Verify dashboard loads
- Test refresh button
- Check error scenarios
- Confirm all functionality works
## 📊 Implementation Progress
| Component | Status | Notes |
|-----------|--------|--------|
| Database Layer | ✅ Complete | All functions implemented correctly |
| Consul Client | ✅ Complete | Proper error handling included |
| Flask Application | ✅ Complete | All routes working, minor favicon fix needed |
| HTML Template | ⚠️ 95% Complete | Alpine.js integration issue only |
| CSS Styling | ✅ Complete | Professional appearance achieved |
| JavaScript Logic | ⚠️ Integration Issue | Code is correct, loading order problem |
| Docker Setup | ✅ Complete | Production-ready configuration |
| Error Handling | ✅ Complete | Comprehensive error scenarios covered |
## 🎉 Conclusion
The implementation is excellent and nearly complete! The Alpine.js integration issue is the only significant blocker preventing full functionality. Once fixed, Phase 1 will be fully operational and ready for Phase 2 enhancements.
**Estimated time to completion: 30 minutes**