TimescaleDB Monitoring and Alerting
This guide covers comprehensive monitoring and alerting strategies for TimescaleDB in production environments.
Overview
Effective monitoring is crucial for maintaining TimescaleDB performance and ensuring long-term system health. The PRS system includes automated monitoring for:
- System health - Database availability and performance
- Storage usage - Disk space and growth trends
- Compression efficiency - Data compression effectiveness
- Query performance - Response times and slow queries
- Backup freshness - Data protection verification
Key Metrics to Monitor
1. Database Health Metrics
| Metric |
Threshold |
Alert Level |
Description |
| Database Availability |
Down |
Critical |
Database connection status |
| Extension Status |
Disabled |
Critical |
TimescaleDB extension availability |
| Hypertable Count |
< Expected |
Warning |
Number of hypertables created |
2. Storage Metrics
| Metric |
Threshold |
Alert Level |
Description |
| Disk Usage |
> 80% |
Critical |
Overall disk space utilization |
| Database Size |
> 40 GB |
Warning |
Total database size |
| Growth Rate |
> 200% expected |
Warning |
Daily/monthly growth rate |
3. Compression Metrics
| Metric |
Threshold |
Alert Level |
Description |
| Compression Ratio |
< 60% |
Warning |
Overall compression effectiveness |
| Uncompressed Chunks |
> 7 days old |
Warning |
Chunks eligible for compression |
| Compression Policies |
Missing |
Critical |
Compression policy configuration |
| Metric |
Threshold |
Alert Level |
Description |
| Average Query Time |
> 5000ms |
Warning |
Query performance degradation |
| Slow Query Count |
> 10/hour |
Warning |
Number of slow queries |
| Connection Count |
> 25 |
Warning |
Database connection usage |
1. Health Monitoring Script
The timescaledb-alerts.sh script provides comprehensive health monitoring:
| Bash |
|---|
| # Run health monitoring
./scripts/timescaledb-alerts.sh
# Output includes:
# - Disk usage check
# - Database size monitoring
# - Compression efficiency
# - Query performance
# - Backup freshness
|
Features:
- Automated thresholds - Configurable alert levels
- Exit codes - 0 for success, 1 for alerts
- Syslog integration - Alerts logged to system logs
- Comprehensive reporting - Detailed status information
2. Maintenance Status Commands
| Bash |
|---|
| # Check compression status
./scripts/deploy-ec2.sh timescaledb-maintenance status
# View storage usage report
./scripts/deploy-ec2.sh timescaledb-maintenance storage
# Check TimescaleDB status
./scripts/deploy-ec2.sh timescaledb-status
|
3. Database Queries for Monitoring
Hypertable Status
| SQL |
|---|
| -- View all hypertables and their status
SELECT
hypertable_name,
num_chunks,
pg_size_pretty(pg_total_relation_size('public.' || hypertable_name)) as size
FROM timescaledb_information.hypertables
ORDER BY pg_total_relation_size('public.' || hypertable_name) DESC;
|
Compression Status
| SQL |
|---|
| -- Check compression efficiency
SELECT
h.hypertable_name,
COUNT(c.chunk_name) as total_chunks,
COUNT(CASE WHEN c.is_compressed THEN 1 END) as compressed_chunks,
ROUND(
(COUNT(CASE WHEN c.is_compressed THEN 1 END)::float / COUNT(c.chunk_name)::float) * 100,
2
) as compression_ratio
FROM timescaledb_information.hypertables h
LEFT JOIN timescaledb_information.chunks c ON h.hypertable_name = c.hypertable_name
GROUP BY h.hypertable_name
ORDER BY compression_ratio DESC;
|
Storage Growth
| SQL |
|---|
| -- Monitor database size
SELECT
'Total Database Size' as metric,
pg_size_pretty(pg_database_size(current_database())) as size;
-- Check largest tables
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
|
Automated Monitoring Setup
1. Cron Job Configuration
Add to production server crontab (crontab -e):
| Bash |
|---|
| # Health monitoring every 30 minutes
*/30 * * * * cd /home/ubuntu/prs-prod/prs-production-deployment/ec2-graviton-setup && ./scripts/timescaledb-alerts.sh >> /var/log/timescaledb-alerts.log 2>&1
# Daily status report at 6 AM
0 6 * * * cd /home/ubuntu/prs-prod/prs-production-deployment/ec2-graviton-setup && ./scripts/timescaledb-maintenance.sh status >> /var/log/timescaledb-status.log 2>&1
# Weekly storage report on Mondays at 7 AM
0 7 * * 1 cd /home/ubuntu/prs-prod/prs-production-deployment/ec2-graviton-setup && ./scripts/timescaledb-maintenance.sh storage >> /var/log/timescaledb-storage.log 2>&1
|
2. Log Management
Create log rotation configuration (/etc/logrotate.d/timescaledb):
| Bash |
|---|
| /var/log/timescaledb-*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 644 ubuntu ubuntu
}
|
3. System Integration
Systemd Service Monitoring
| Bash |
|---|
| # Check if TimescaleDB container is running
systemctl status docker
# Monitor container health
docker ps | grep timescale
docker stats prs-ec2-postgres-timescale --no-stream
|
Resource Monitoring
| Bash |
|---|
| # Monitor system resources
./scripts/deploy-ec2.sh monitor
# Check memory usage
free -h
# Check disk usage
df -h
|
Alert Configuration
1. Alert Thresholds
The monitoring system uses these configurable thresholds:
| Bash |
|---|
| # In timescaledb-alerts.sh
DISK_USAGE_THRESHOLD=80 # Disk usage percentage
COMPRESSION_RATIO_THRESHOLD=60 # Compression efficiency percentage
DB_SIZE_THRESHOLD_GB=40 # Database size in GB
QUERY_TIME_THRESHOLD_MS=5000 # Query time in milliseconds
|
2. Alert Actions
When alerts are triggered:
- Log to syslog - All alerts logged with
timescaledb-alert tag
- Exit with error code - Script exits with code 1
- Detailed reporting - Specific issue information provided
- Recommended actions - Suggested remediation steps
3. Integration with External Systems
Email Notifications
| Bash |
|---|
| # Add to timescaledb-alerts.sh
send_email_alert() {
local message="$1"
echo "$message" | mail -s "TimescaleDB Alert - PRS Production" admin@your-domain.com
}
|
Slack Integration
| Bash |
|---|
| # Add to timescaledb-alerts.sh
send_slack_alert() {
local message="$1"
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$message\"}" \
YOUR_SLACK_WEBHOOK_URL
}
|
Enable pg_stat_statements for query monitoring:
| SQL |
|---|
| -- Enable extension (if not already enabled)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- View slow queries
SELECT
query,
calls,
total_time,
mean_time,
rows
FROM pg_stat_statements
WHERE mean_time > 1000 -- Queries taking more than 1 second
ORDER BY mean_time DESC
LIMIT 10;
|
2. Connection Monitoring
| SQL |
|---|
| -- Monitor active connections
SELECT
count(*) as total_connections,
count(*) FILTER (WHERE state = 'active') as active_connections,
count(*) FILTER (WHERE state = 'idle') as idle_connections
FROM pg_stat_activity;
-- Check connection limits
SELECT
setting as max_connections,
(SELECT count(*) FROM pg_stat_activity) as current_connections
FROM pg_settings
WHERE name = 'max_connections';
|
3. Lock Monitoring
| SQL |
|---|
| -- Check for blocking queries
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
|
Monitoring Dashboard
Key Metrics to Display
- System Health
- Database availability status
- TimescaleDB extension status
-
Container health
-
Storage Metrics
- Total database size
- Disk usage percentage
-
Growth rate trends
-
Compression Metrics
- Overall compression ratio
- Compression policy status
-
Storage savings
-
Performance Metrics
- Average query response time
- Active connections
- Slow query count
Sample Monitoring Commands
| Bash |
|---|
| # Create a simple monitoring script
#!/bin/bash
echo "=== TimescaleDB Health Dashboard ==="
echo "Date: $(date)"
echo ""
# System status
echo "System Status:"
./scripts/timescaledb-alerts.sh > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo " ✅ All systems healthy"
else
echo " ⚠️ Alerts detected - check logs"
fi
# Storage status
echo ""
echo "Storage Status:"
./scripts/timescaledb-maintenance.sh storage | grep -E "(Total Database Size|compression_ratio)"
# Performance status
echo ""
echo "Performance Status:"
./scripts/deploy-ec2.sh timescaledb-status | grep -E "(hypertables|TimescaleDB)"
|
Comprehensive monitoring ensures your TimescaleDB system remains healthy and performant for years of operation!