
Alert Configuration

Create intelligent alerts that notify the right people at the right time, with built-in noise reduction and escalation policies.

Creating Alert Rules

Alert rules define when and how your team gets notified about database issues. DB24x7 supports metric-based alerts, anomaly-based alerts, and composite conditions.

Basic Alert Rule Structure

{
  "name": "High CPU Usage Alert",
  "description": "Alert when CPU usage exceeds threshold",
  "database": "prod-api-db-01",
  "enabled": true,

  "conditions": {
    "metric": "cpu_usage",
    "operator": "greater_than",
    "threshold": 85,
    "duration": "5m",
    "evaluation_interval": "1m"
  },

  "severity": "warning",
  "channels": ["slack", "pagerduty"],

  "metadata": {
    "tags": ["performance", "production"],
    "runbook_url": "https://wiki.example.com/high-cpu"
  }
}

Metric-Based Alerts

Trigger alerts when metrics cross defined thresholds.

  • CPU, Memory, Disk I/O usage
  • Query response time
  • Connection count
  • Replication lag
  • Transaction rate
  • Error rate
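
For example, a rule that fires on sustained replication lag could be written like the sketch below. It follows the same structure as the basic alert rule above; the replication_lag metric identifier and the threshold value (seconds of lag) are illustrative assumptions, so adjust them to the metric names exposed for your database engine.

{
  "name": "Replication Lag Alert",
  "description": "Alert when a replica falls too far behind the primary",
  "database": "prod-replica-02",
  "enabled": true,

  "conditions": {
    "metric": "replication_lag",
    "operator": "greater_than",
    "threshold": 30,
    "duration": "5m",
    "evaluation_interval": "1m"
  },

  "severity": "warning",
  "channels": ["slack"],

  "metadata": {
    "tags": ["replication", "production"],
    "runbook_url": "https://wiki.example.com/replication-lag"
  }
}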

Anomaly-Based Alerts

Use ML to detect unusual patterns without fixed thresholds.

  • Sudden traffic spikes
  • Query pattern changes
  • Unusual error patterns
  • Unexpected resource usage
  • Abnormal data growth
  • Latency anomalies
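
A minimal sketch of an anomaly-based rule is shown below. The "type": "anomaly" and "sensitivity" fields are illustrative assumptions about the schema, since anomaly rules learn a baseline rather than using a fixed threshold; the other fields mirror the metric-based examples above.

{
  "name": "Unusual Query Latency",
  "description": "Detect latency patterns that deviate from the learned baseline",
  "database": "prod-api-db-01",
  "enabled": true,

  "conditions": {
    "type": "anomaly",
    "metric": "avg_query_time",
    "sensitivity": "medium",
    "duration": "10m",
    "evaluation_interval": "1m"
  },

  "severity": "warning",
  "channels": ["slack"]
}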

Threshold Configuration

Fine-tune alert sensitivity with multiple threshold levels and time-based conditions to avoid false positives.

Multi-Level Thresholds

{
  "name": "Query Performance Degradation",
  "conditions": [
    {
      "severity": "info",
      "metric": "avg_query_time",
      "operator": "greater_than",
      "threshold": 100,
      "duration": "10m"
    },
    {
      "severity": "warning",
      "metric": "avg_query_time",
      "operator": "greater_than",
      "threshold": 250,
      "duration": "5m"
    },
    {
      "severity": "critical",
      "metric": "avg_query_time",
      "operator": "greater_than",
      "threshold": 500,
      "duration": "2m"
    }
  ],
  "auto_escalate": true
}

Composite Conditions

Combine multiple conditions to create more accurate alerts that reduce false positives.

{
  "name": "Database Under Heavy Load",
  "conditions": {
    "operator": "AND",
    "rules": [
      {
        "metric": "cpu_usage",
        "operator": "greater_than",
        "threshold": 80,
        "duration": "5m"
      },
      {
        "metric": "active_connections",
        "operator": "greater_than",
        "threshold": 100,
        "duration": "5m"
      },
      {
        "metric": "avg_query_time",
        "operator": "greater_than",
        "threshold": 200,
        "duration": "5m"
      }
    ]
  }
}

Duration Windows

Require conditions to be true for a specified duration before alerting (e.g., 5m, 15m, 1h).

Evaluation Interval

How frequently to check conditions. Balance between responsiveness and resource usage.

Recovery Threshold

Define when alerts auto-resolve, preventing flapping between states.
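
Putting the three settings together, a conditions block might look like the sketch below. The duration and evaluation_interval fields match those used in the examples above; recovery_threshold and recovery_duration are illustrative names for the auto-resolve settings, not a confirmed schema.

{
  "conditions": {
    "metric": "cpu_usage",
    "operator": "greater_than",
    "threshold": 85,
    "duration": "5m",
    "evaluation_interval": "1m",
    "recovery_threshold": 70,
    "recovery_duration": "10m"
  }
}

The gap between the alert threshold (85%) and the recovery threshold (70%) is what prevents flapping: the alert only resolves once usage has clearly returned to normal and stayed there for the recovery duration.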

Alert Channels

Send alerts to your team through multiple channels based on severity and team preferences.

Slack Integration

Send rich, interactive alerts to Slack channels with one-click actions.

{ "type": "slack", "webhook_url": "https://hooks.slack.com/...", "channel": "#database-alerts", "mention_users": ["@dba-team"], "include_chart": true, "thread_replies": true }

PagerDuty Integration

Create incidents with on-call rotation for critical alerts.

{ "type": "pagerduty", "integration_key": "abc123...", "severity_mapping": { "critical": "critical", "warning": "warning", "info": "info" }, "auto_resolve": true }

Email Notifications

Send formatted email alerts with detailed context and charts.

{ "type": "email", "recipients": [ "[email protected]", "[email protected]" ], "cc": ["[email protected]"], "include_graphs": true, "html_format": true }

Custom Webhooks

Integrate with any system using custom webhook endpoints.

{ "type": "webhook", "url": "https://api.company.com/alerts", "method": "POST", "headers": { "Authorization": "Bearer token..." }, "retry_policy": { "max_attempts": 3, "backoff": "exponential" } }

Channel Routing by Severity

{
  "routing_rules": [
    {
      "severity": "info",
      "channels": ["email"]
    },
    {
      "severity": "warning",
      "channels": ["slack", "email"]
    },
    {
      "severity": "critical",
      "channels": ["pagerduty", "slack", "email"],
      "escalate_after": "15m"
    }
  ]
}

Alert Grouping and Noise Reduction

Intelligent grouping and deduplication prevent alert fatigue and help teams focus on what matters.

Alert Grouping

Combine related alerts into a single notification to reduce noise.

Group By Options:

  • Database: Group all alerts from the same database
  • Alert Rule: Combine repeated triggers of the same rule
  • Severity: Group alerts by severity level
  • Tags: Custom grouping using alert tags
  • Time Window: Group alerts within a time period (e.g., 5 minutes)

{
  "grouping": {
    "enabled": true,
    "group_by": ["database", "severity"],
    "group_window": "5m",
    "max_group_size": 10
  }
}

Alert Suppression

Temporarily silence alerts during maintenance windows or known issues.

{
  "suppression_rules": [
    {
      "name": "Weekend Maintenance",
      "schedule": {
        "days": ["saturday", "sunday"],
        "time": "02:00-06:00",
        "timezone": "America/New_York"
      },
      "suppress_alerts": ["info", "warning"]
    },
    {
      "name": "Known Issue - Replica Lag",
      "conditions": {
        "database": "prod-replica-02",
        "alert_name": "High Replication Lag"
      },
      "expires_at": "2026-02-15T00:00:00Z",
      "reason": "Hardware replacement scheduled"
    }
  ]
}

Rate Limiting

Limit notification frequency to prevent alert storms.

{
  "rate_limiting": {
    "max_alerts_per_hour": 10,
    "max_alerts_per_day": 50,
    "backoff_strategy": "exponential",
    "summary_after_limit": true
  }
}

Escalation Policies

Ensure critical alerts get attention by automatically escalating to senior team members or different channels if not acknowledged.

Multi-Level Escalation

{
  "escalation_policy": {
    "name": "Critical Database Alerts",
    "levels": [
      {
        "level": 1,
        "notify": ["@on-call-dba"],
        "channels": ["slack", "pagerduty"],
        "wait_time": "15m"
      },
      {
        "level": 2,
        "notify": ["@senior-dba", "@ops-lead"],
        "channels": ["pagerduty", "phone"],
        "wait_time": "15m"
      },
      {
        "level": 3,
        "notify": ["@engineering-director"],
        "channels": ["pagerduty", "phone", "sms"],
        "wait_time": "30m"
      }
    ],
    "repeat_final_level": true,
    "repeat_interval": "30m"
  }
}

Time-Based Escalation

Escalate based on how long an alert has been active without resolution.

{
  "time_based_escalation": {
    "rules": [
      {
        "after": "30m",
        "action": "increase_severity",
        "from": "warning",
        "to": "critical"
      },
      {
        "after": "1h",
        "action": "notify_additional",
        "recipients": ["@senior-leadership"]
      },
      {
        "after": "2h",
        "action": "create_incident",
        "incident_severity": "sev-1"
      }
    ]
  }
}

Acknowledgment Tracking

Track who acknowledged alerts and when. Escalate if not acknowledged within the specified time window.
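
As a rough sketch (the field names are assumptions, not a confirmed schema), acknowledgment requirements could be attached to an escalation policy like this:

{
  "acknowledgment": {
    "required": true,
    "timeout": "15m",
    "escalate_on_timeout": true,
    "log_acknowledgments": true
  }
}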

On-Call Rotation

Integrate with PagerDuty or custom on-call schedules to route alerts to the right person based on current shift.
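
A hedged example of routing critical alerts to whoever is currently on call is shown below. The integration_key matches the PagerDuty channel example above; the schedule_id and fallback fields are illustrative placeholders for your own on-call schedule and backup recipients.

{
  "on_call_routing": {
    "provider": "pagerduty",
    "integration_key": "abc123...",
    "schedule_id": "PXXXXXX",
    "fallback": ["@dba-team"]
  }
}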

Alert Configuration Best Practices

Start Conservative

Begin with higher thresholds and longer duration windows, then tighten them as you learn your databases' normal baseline behavior. This keeps early alerts meaningful and avoids alert fatigue.

Use Alert Severity Appropriately

Reserve "Critical" for issues requiring immediate action. Overuse leads to alert fatigue and missed important notifications.

Include Runbook Links

Every alert should include a link to documentation or a runbook with troubleshooting steps and context, such as the runbook_url field shown in the metadata example above.

Review and Refine Regularly

Schedule monthly reviews of alert effectiveness. Disable noisy alerts and create new ones based on recent incidents.
