Skip to content

TimescaleDB Monitoring and Alerting

This guide covers comprehensive monitoring and alerting strategies for TimescaleDB in production environments.

Overview

Effective monitoring is crucial for maintaining TimescaleDB performance and ensuring long-term system health. The PRS system includes automated monitoring for:

  • System health - Database availability and performance
  • Storage usage - Disk space and growth trends
  • Compression efficiency - Data compression effectiveness
  • Query performance - Response times and slow queries
  • Backup freshness - Data protection verification

Key Metrics to Monitor

1. Database Health Metrics

Metric Threshold Alert Level Description
Database Availability Down Critical Database connection status
Extension Status Disabled Critical TimescaleDB extension availability
Hypertable Count < Expected Warning Number of hypertables created

2. Storage Metrics

Metric Threshold Alert Level Description
Disk Usage > 80% Critical Overall disk space utilization
Database Size > 40 GB Warning Total database size
Growth Rate > 200% expected Warning Daily/monthly growth rate

3. Compression Metrics

Metric Threshold Alert Level Description
Compression Ratio < 60% Warning Overall compression effectiveness
Uncompressed Chunks > 7 days old Warning Chunks eligible for compression
Compression Policies Missing Critical Compression policy configuration

4. Performance Metrics

Metric Threshold Alert Level Description
Average Query Time > 5000ms Warning Query performance degradation
Slow Query Count > 10/hour Warning Number of slow queries
Connection Count > 25 Warning Database connection usage

Monitoring Tools

1. Health Monitoring Script

The timescaledb-alerts.sh script provides comprehensive health monitoring:

Bash
1
2
3
4
5
6
7
8
9
# Run health monitoring
./scripts/timescaledb-alerts.sh

# Output includes:
# - Disk usage check
# - Database size monitoring
# - Compression efficiency
# - Query performance
# - Backup freshness

Features: - Automated thresholds - Configurable alert levels - Exit codes - 0 for success, 1 for alerts - Syslog integration - Alerts logged to system logs - Comprehensive reporting - Detailed status information

2. Maintenance Status Commands

Bash
1
2
3
4
5
6
7
8
# Check compression status
./scripts/deploy-ec2.sh timescaledb-maintenance status

# View storage usage report
./scripts/deploy-ec2.sh timescaledb-maintenance storage

# Check TimescaleDB status
./scripts/deploy-ec2.sh timescaledb-status

3. Database Queries for Monitoring

Hypertable Status

SQL
1
2
3
4
5
6
7
-- View all hypertables and their status
SELECT
    hypertable_name,
    num_chunks,
    pg_size_pretty(pg_total_relation_size('public.' || hypertable_name)) as size
FROM timescaledb_information.hypertables
ORDER BY pg_total_relation_size('public.' || hypertable_name) DESC;

Compression Status

SQL
-- Check compression efficiency
SELECT
    h.hypertable_name,
    COUNT(c.chunk_name) as total_chunks,
    COUNT(CASE WHEN c.is_compressed THEN 1 END) as compressed_chunks,
    ROUND(
        (COUNT(CASE WHEN c.is_compressed THEN 1 END)::float / COUNT(c.chunk_name)::float) * 100,
        2
    ) as compression_ratio
FROM timescaledb_information.hypertables h
LEFT JOIN timescaledb_information.chunks c ON h.hypertable_name = c.hypertable_name
GROUP BY h.hypertable_name
ORDER BY compression_ratio DESC;

Storage Growth

SQL
-- Monitor database size
SELECT
    'Total Database Size' as metric,
    pg_size_pretty(pg_database_size(current_database())) as size;

-- Check largest tables
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;

Automated Monitoring Setup

1. Cron Job Configuration

Add to production server crontab (crontab -e):

Bash
1
2
3
4
5
6
7
8
# Health monitoring every 30 minutes
*/30 * * * * cd /home/ubuntu/prs-prod/prs-production-deployment/ec2-graviton-setup && ./scripts/timescaledb-alerts.sh >> /var/log/timescaledb-alerts.log 2>&1

# Daily status report at 6 AM
0 6 * * * cd /home/ubuntu/prs-prod/prs-production-deployment/ec2-graviton-setup && ./scripts/timescaledb-maintenance.sh status >> /var/log/timescaledb-status.log 2>&1

# Weekly storage report on Mondays at 7 AM
0 7 * * 1 cd /home/ubuntu/prs-prod/prs-production-deployment/ec2-graviton-setup && ./scripts/timescaledb-maintenance.sh storage >> /var/log/timescaledb-storage.log 2>&1

2. Log Management

Create log rotation configuration (/etc/logrotate.d/timescaledb):

Bash
1
2
3
4
5
6
7
8
9
/var/log/timescaledb-*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 644 ubuntu ubuntu
}

3. System Integration

Systemd Service Monitoring

Bash
1
2
3
4
5
6
# Check if TimescaleDB container is running
systemctl status docker

# Monitor container health
docker ps | grep timescale
docker stats prs-ec2-postgres-timescale --no-stream

Resource Monitoring

Bash
1
2
3
4
5
6
7
8
# Monitor system resources
./scripts/deploy-ec2.sh monitor

# Check memory usage
free -h

# Check disk usage
df -h

Alert Configuration

1. Alert Thresholds

The monitoring system uses these configurable thresholds:

Bash
1
2
3
4
5
# In timescaledb-alerts.sh
DISK_USAGE_THRESHOLD=80          # Disk usage percentage
COMPRESSION_RATIO_THRESHOLD=60   # Compression efficiency percentage
DB_SIZE_THRESHOLD_GB=40          # Database size in GB
QUERY_TIME_THRESHOLD_MS=5000     # Query time in milliseconds

2. Alert Actions

When alerts are triggered:

  1. Log to syslog - All alerts logged with timescaledb-alert tag
  2. Exit with error code - Script exits with code 1
  3. Detailed reporting - Specific issue information provided
  4. Recommended actions - Suggested remediation steps

3. Integration with External Systems

Email Notifications

Bash
1
2
3
4
5
# Add to timescaledb-alerts.sh
send_email_alert() {
    local message="$1"
    echo "$message" | mail -s "TimescaleDB Alert - PRS Production" admin@your-domain.com
}

Slack Integration

Bash
1
2
3
4
5
6
7
# Add to timescaledb-alerts.sh
send_slack_alert() {
    local message="$1"
    curl -X POST -H 'Content-type: application/json' \
        --data "{\"text\":\"$message\"}" \
        YOUR_SLACK_WEBHOOK_URL
}

Performance Monitoring

1. Query Performance Tracking

Enable pg_stat_statements for query monitoring:

SQL
-- Enable extension (if not already enabled)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- View slow queries
SELECT
    query,
    calls,
    total_time,
    mean_time,
    rows
FROM pg_stat_statements
WHERE mean_time > 1000  -- Queries taking more than 1 second
ORDER BY mean_time DESC
LIMIT 10;

2. Connection Monitoring

SQL
-- Monitor active connections
SELECT
    count(*) as total_connections,
    count(*) FILTER (WHERE state = 'active') as active_connections,
    count(*) FILTER (WHERE state = 'idle') as idle_connections
FROM pg_stat_activity;

-- Check connection limits
SELECT
    setting as max_connections,
    (SELECT count(*) FROM pg_stat_activity) as current_connections
FROM pg_settings
WHERE name = 'max_connections';

3. Lock Monitoring

SQL
-- Check for blocking queries
SELECT
    blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_statement,
    blocking_activity.query AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

Monitoring Dashboard

Key Metrics to Display

  1. System Health
  2. Database availability status
  3. TimescaleDB extension status
  4. Container health

  5. Storage Metrics

  6. Total database size
  7. Disk usage percentage
  8. Growth rate trends

  9. Compression Metrics

  10. Overall compression ratio
  11. Compression policy status
  12. Storage savings

  13. Performance Metrics

  14. Average query response time
  15. Active connections
  16. Slow query count

Sample Monitoring Commands

Bash
# Create a simple monitoring script
#!/bin/bash
echo "=== TimescaleDB Health Dashboard ==="
echo "Date: $(date)"
echo ""

# System status
echo "System Status:"
./scripts/timescaledb-alerts.sh > /dev/null 2>&1
if [ $? -eq 0 ]; then
    echo "  ✅ All systems healthy"
else
    echo "  ⚠️  Alerts detected - check logs"
fi

# Storage status
echo ""
echo "Storage Status:"
./scripts/timescaledb-maintenance.sh storage | grep -E "(Total Database Size|compression_ratio)"

# Performance status
echo ""
echo "Performance Status:"
./scripts/deploy-ec2.sh timescaledb-status | grep -E "(hypertables|TimescaleDB)"

Comprehensive monitoring ensures your TimescaleDB system remains healthy and performant for years of operation!