Alert Configuration
Create intelligent alerts that notify the right people at the right time, with built-in noise reduction and escalation policies.
Creating Alert Rules
Alert rules define when and how your team gets notified about database issues. DB24x7 supports metric-based alerts, anomaly-based alerts, and composite conditions.
Basic Alert Rule Structure
{
  "name": "High CPU Usage Alert",
  "description": "Alert when CPU usage exceeds threshold",
  "database": "prod-api-db-01",
  "enabled": true,
  "conditions": {
    "metric": "cpu_usage",
    "operator": "greater_than",
    "threshold": 85,
    "duration": "5m",
    "evaluation_interval": "1m"
  },
  "severity": "warning",
  "channels": ["slack", "pagerduty"],
  "metadata": {
    "tags": ["performance", "production"],
    "runbook_url": "https://wiki.example.com/high-cpu"
  }
}

Metric-Based Alerts
Trigger alerts when metrics cross defined thresholds.
- CPU, memory, and disk I/O usage
- Query response time
- Connection count
- Replication lag
- Transaction rate
- Error rate
Anomaly-Based Alerts
Use ML to detect unusual patterns without fixed thresholds.
- Sudden traffic spikes
- Query pattern changes
- Unusual error patterns
- Unexpected resource usage
- Abnormal data growth
- Latency anomalies
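Because anomaly rules learn a baseline instead of comparing against a fixed number, their configuration typically replaces the threshold with a detector setting. The sketch below is illustrative only: field names such as type, sensitivity, and baseline_window are assumptions about how such a rule might be expressed, not a documented schema.

{
  "name": "Unusual Query Volume",
  "database": "prod-api-db-01",
  "enabled": true,
  "conditions": {
    "type": "anomaly",
    "metric": "queries_per_second",
    "sensitivity": "medium",
    "baseline_window": "7d"
  },
  "severity": "warning",
  "channels": ["slack"]
}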
Threshold Configuration
Fine-tune alert sensitivity with multiple threshold levels and time-based conditions to avoid false positives.
Multi-Level Thresholds
{
  "name": "Query Performance Degradation",
  "conditions": [
    {
      "severity": "info",
      "metric": "avg_query_time",
      "operator": "greater_than",
      "threshold": 100,
      "duration": "10m"
    },
    {
      "severity": "warning",
      "metric": "avg_query_time",
      "operator": "greater_than",
      "threshold": 250,
      "duration": "5m"
    },
    {
      "severity": "critical",
      "metric": "avg_query_time",
      "operator": "greater_than",
      "threshold": 500,
      "duration": "2m"
    }
  ],
  "auto_escalate": true
}

Composite Conditions
Combine multiple conditions to create more accurate alerts that reduce false positives.
{
  "name": "Database Under Heavy Load",
  "conditions": {
    "operator": "AND",
    "rules": [
      {
        "metric": "cpu_usage",
        "operator": "greater_than",
        "threshold": 80,
        "duration": "5m"
      },
      {
        "metric": "active_connections",
        "operator": "greater_than",
        "threshold": 100,
        "duration": "5m"
      },
      {
        "metric": "avg_query_time",
        "operator": "greater_than",
        "threshold": 200,
        "duration": "5m"
      }
    ]
  }
}

Duration Windows
Require conditions to be true for a specified duration before alerting (e.g., 5m, 15m, 1h).
Evaluation Interval
How frequently to check conditions. Balance between responsiveness and resource usage.
Recovery Threshold
Define when alerts auto-resolve, preventing flapping between states.
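Taken together, these three settings might appear in a single condition block as sketched below. The duration and evaluation_interval fields match the basic rule structure shown earlier; recovery_threshold is an assumed name for the auto-resolve setting and is used here for illustration only.

{
  "conditions": {
    "metric": "cpu_usage",
    "operator": "greater_than",
    "threshold": 85,
    "duration": "15m",
    "evaluation_interval": "1m",
    "recovery_threshold": 70
  }
}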
Alert Channels
Send alerts to your team through multiple channels based on severity and team preferences.
Slack Integration
Send rich, interactive alerts to Slack channels with one-click actions.
{
  "type": "slack",
  "webhook_url": "https://hooks.slack.com/...",
  "channel": "#database-alerts",
  "mention_users": ["@dba-team"],
  "include_chart": true,
  "thread_replies": true
}

PagerDuty Integration
Create incidents with on-call rotation for critical alerts.
{
  "type": "pagerduty",
  "integration_key": "abc123...",
  "severity_mapping": {
    "critical": "critical",
    "warning": "warning",
    "info": "info"
  },
  "auto_resolve": true
}

Email Notifications
Send formatted email alerts with detailed context and charts.
{
  "type": "email",
  "recipients": [
    "dba-team@example.com",
    "ops@example.com"
  ],
  "cc": ["oncall@example.com"],
  "include_graphs": true,
  "html_format": true
}

Custom Webhooks
Integrate with any system using custom webhook endpoints.
{
  "type": "webhook",
  "url": "https://api.company.com/alerts",
  "method": "POST",
  "headers": {
    "Authorization": "Bearer token..."
  },
  "retry_policy": {
    "max_attempts": 3,
    "backoff": "exponential"
  }
}

Channel Routing by Severity
{
  "routing_rules": [
    {
      "severity": "info",
      "channels": ["email"]
    },
    {
      "severity": "warning",
      "channels": ["slack", "email"]
    },
    {
      "severity": "critical",
      "channels": ["pagerduty", "slack", "email"],
      "escalate_after": "15m"
    }
  ]
}

Alert Grouping and Noise Reduction
Intelligent grouping and deduplication prevent alert fatigue and help teams focus on what matters.
Alert Grouping
Combine related alerts into a single notification to reduce noise.
Group By Options:
- Database: Group all alerts from the same database
- Alert Rule: Combine repeated triggers of the same rule
- Severity: Group alerts by severity level
- Tags: Custom grouping using alert tags
- Time Window: Group alerts within a time period (e.g., 5 minutes)
{
  "grouping": {
    "enabled": true,
    "group_by": ["database", "severity"],
    "group_window": "5m",
    "max_group_size": 10
  }
}

Alert Suppression
Temporarily silence alerts during maintenance windows or known issues.
{
  "suppression_rules": [
    {
      "name": "Weekend Maintenance",
      "schedule": {
        "days": ["saturday", "sunday"],
        "time": "02:00-06:00",
        "timezone": "America/New_York"
      },
      "suppress_alerts": ["info", "warning"]
    },
    {
      "name": "Known Issue - Replica Lag",
      "conditions": {
        "database": "prod-replica-02",
        "alert_name": "High Replication Lag"
      },
      "expires_at": "2026-02-15T00:00:00Z",
      "reason": "Hardware replacement scheduled"
    }
  ]
}

Rate Limiting
Limit notification frequency to prevent alert storms.
{
  "rate_limiting": {
    "max_alerts_per_hour": 10,
    "max_alerts_per_day": 50,
    "backoff_strategy": "exponential",
    "summary_after_limit": true
  }
}

Escalation Policies
Ensure critical alerts get attention by automatically escalating to senior team members or different channels if not acknowledged.
Multi-Level Escalation
{
  "escalation_policy": {
    "name": "Critical Database Alerts",
    "levels": [
      {
        "level": 1,
        "notify": ["@on-call-dba"],
        "channels": ["slack", "pagerduty"],
        "wait_time": "15m"
      },
      {
        "level": 2,
        "notify": ["@senior-dba", "@ops-lead"],
        "channels": ["pagerduty", "phone"],
        "wait_time": "15m"
      },
      {
        "level": 3,
        "notify": ["@engineering-director"],
        "channels": ["pagerduty", "phone", "sms"],
        "wait_time": "30m"
      }
    ],
    "repeat_final_level": true,
    "repeat_interval": "30m"
  }
}

Time-Based Escalation
Escalate based on how long an alert has been active without resolution.
{
  "time_based_escalation": {
    "rules": [
      {
        "after": "30m",
        "action": "increase_severity",
        "from": "warning",
        "to": "critical"
      },
      {
        "after": "1h",
        "action": "notify_additional",
        "recipients": ["@senior-leadership"]
      },
      {
        "after": "2h",
        "action": "create_incident",
        "incident_severity": "sev-1"
      }
    ]
  }
}

Acknowledgment Tracking
Track who acknowledged alerts and when. Escalate if not acknowledged within the specified time window.
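A minimal sketch of what an acknowledgment policy could look like, assuming field names such as timeout and escalate_on_timeout (illustrative, not a documented schema):

{
  "acknowledgment": {
    "required": true,
    "timeout": "15m",
    "escalate_on_timeout": true
  }
}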
On-Call Rotation
Integrate with PagerDuty or custom on-call schedules to route alerts to the right person based on current shift.
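As an illustrative sketch only (the field names below are assumptions, not a documented schema), shift-aware routing might be expressed as:

{
  "on_call": {
    "provider": "pagerduty",
    "schedule_id": "...",
    "fallback_notify": ["@dba-team"]
  }
}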
Alert Configuration Best Practices
Start Conservative
Begin with higher thresholds and longer duration windows. Tune down as you understand normal baseline behavior to avoid alert fatigue.
Use Alert Severity Appropriately
Reserve "Critical" for issues requiring immediate action. Overuse leads to alert fatigue and missed important notifications.
Include Runbook Links
Every alert should include a link to documentation or runbook with troubleshooting steps and context.
Review and Refine Regularly
Schedule monthly reviews of alert effectiveness. Disable noisy alerts and create new ones based on recent incidents.