AI Anomaly Detection

Proactively identify unusual database behavior with machine learning before it impacts your users.


Overview

Anomaly Detection uses advanced machine learning algorithms to continuously monitor your database metrics and identify unusual patterns that deviate from normal behavior. By learning your database's unique patterns over time, it can detect issues before they escalate into critical problems.

  • Self-Learning: adapts to your unique database patterns and workload characteristics
  • Early Warning: detects issues minutes to hours before they impact users
  • Precision Alerts: reduces noise with intelligent filtering and context-aware alerting

How AI Detects Anomalies

1. Baseline Learning

The system establishes baseline behavior by analyzing historical data across multiple dimensions:

  • Query execution times and patterns
  • Resource utilization (CPU, memory, I/O)
  • Connection counts and distribution
  • Error rates and types
  • Time-based patterns (hourly, daily, weekly cycles)
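
For illustration, here is a minimal sketch of how a seasonal baseline of this kind could be built. The hour-of-week bucketing and helper names are assumptions made for the example, not a description of DB24x7's internal model.

```python
# Sketch only: build a per-metric baseline keyed by hour of the week, which
# captures hourly, daily, and weekly cycles. Bucketing choices are illustrative.
from collections import defaultdict
from statistics import mean, stdev

def build_hour_of_week_baseline(samples):
    """samples: list of (timestamp: datetime, value: float) for one metric."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts.weekday() * 24 + ts.hour].append(value)   # buckets 0-167
    return {
        bucket: {"mean": mean(vals), "std": stdev(vals) if len(vals) > 1 else 0.0}
        for bucket, vals in buckets.items()
    }
```
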
2. Real-Time Monitoring

Continuously compares current metrics against learned baselines using statistical methods and ML models. Monitors hundreds of metrics simultaneously to detect subtle deviations.
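
As a rough sketch of that comparison, a z-score against the learned baseline is one common statistical method. The function below assumes the baseline structure from the previous example; it is not DB24x7's actual scoring code.

```python
# Sketch: score a live observation against the learned hour-of-week baseline.
def deviation_sigma(value, ts, baseline):
    bucket = baseline.get(ts.weekday() * 24 + ts.hour)
    if not bucket or bucket["std"] == 0:
        return 0.0                                    # no history yet for this time slot
    return (value - bucket["mean"]) / bucket["std"]   # deviation expressed in sigmas
```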

3. Pattern Recognition

Uses multiple algorithms to identify different types of anomalies:

  • Sudden Spikes/Drops: rapid changes in metrics
  • Gradual Trends: slow degradation over time
  • Cyclical Violations: breaking expected patterns
  • Correlation Breaks: unusual metric relationships
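
A toy classifier along these lines is sketched below. It covers three of the four shapes (a robust cyclical-violation check needs a fuller seasonal model), and the thresholds and window length are placeholders, not product values.

```python
# Sketch only: distinguish anomaly shapes from a window of recent values.
import numpy as np

def classify_anomaly(window, sigma_now, sigma_partner):
    """window: recent values of the metric; sigma_now: deviation of the newest
    point; sigma_partner: deviation of a metric that normally tracks this one."""
    slope = np.polyfit(np.arange(len(window)), window, 1)[0]
    drift_sigma = slope * len(window) / (np.std(window) + 1e-9)  # total drift in sigmas
    if abs(sigma_now) < 2:
        return "normal"
    if abs(sigma_partner) < 1:
        return "correlation break"      # this metric moved, its usual partner did not
    if abs(drift_sigma) >= 2:
        return "gradual trend"          # steady drift across the whole window
    return "sudden spike/drop"          # isolated jump at the newest points
```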

4. Severity Assessment

Evaluates the severity and potential impact of detected anomalies based on deviation magnitude, affected resources, and historical patterns. Assigns confidence scores to reduce false positives.
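
The thresholds in the Alert Configuration section below make this mapping concrete. A minimal sketch, assuming the default 85% minimum-confidence setting:

```python
# Sketch: map deviation magnitude and confidence to the documented severity levels.
def assess_severity(sigma, confidence):
    if confidence < 0.85:       # default Minimum Confidence; lower-confidence anomalies are dropped
        return None
    magnitude = abs(sigma)
    if magnitude >= 10:
        return "critical"
    if magnitude >= 5:
        return "high"
    if magnitude >= 3:
        return "medium"
    if magnitude >= 2:
        return "low"
    return None                 # under 2 sigma is treated as normal variation
```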

5. Alert & Context

Generates alerts with rich context including visualization of the anomaly, historical comparison, potential causes, and recommended actions for investigation.

Types of Anomalies Detected

Performance Anomalies

  • Query execution time spikes (e.g., 10x slower than baseline)
  • Sudden increase in slow queries
  • Throughput degradation
  • Lock wait time increases
  • Replication lag anomalies

Resource Anomalies

  • Unexpected CPU utilization spikes
  • Memory consumption abnormalities
  • Disk I/O pattern changes
  • Network traffic anomalies
  • Cache hit rate drops

Connection Anomalies

  • Unusual connection count spikes
  • Connection pool exhaustion patterns
  • Failed connection rate increases
  • Long-running transaction buildup
  • Idle connection accumulation

Error Anomalies

  • Error rate spikes (queries, connections)
  • New error types appearing
  • Deadlock frequency changes
  • Timeout pattern deviations
  • Transaction rollback increases

Example: Query Performance Anomaly Detected

HIGH SEVERITY

Detection Details

Metric: Avg Query Time
Normal Range: 45-120ms
Current Value: 1,850ms
Deviation: +1,542% (15.4σ)
Confidence: 98.7%
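
The percentage figure follows from the current value and the baseline mean. A quick back-of-the-envelope check (the baseline mean is not shown in the alert, so it is inferred here from the reported percentage):

```python
# Rough check of the card above; 112.7 ms is back-solved from the +1,542% figure.
baseline_mean_ms = 112.7
current_ms = 1850

deviation_pct = (current_ms - baseline_mean_ms) / baseline_mean_ms * 100
print(f"{deviation_pct:+.0f}%")        # +1541%, matching the alert within rounding

# The 15.4 sigma figure divides the same difference by the baseline standard
# deviation instead of the mean, which is also not shown in the alert.
```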

Recommended Actions

  • Check recent schema changes on users table
  • Review queries using full table scan
  • Verify idx_users_email index is present
  • Analyze table statistics freshness

Alert Configuration

Severity Levels

  • Critical (deviation ≥ 10σ): immediate action required; service impact imminent
  • High (deviation 5-10σ): urgent investigation needed; potential user impact
  • Medium (deviation 3-5σ): notable deviation; monitor and investigate
  • Low (deviation 2-3σ): minor anomaly; informational only

Notification Channels

  • Email: detailed anomaly reports with visualizations
  • Slack: real-time alerts to team channels
  • PagerDuty: critical alerts with on-call escalation
  • Webhooks: custom integrations with your tools (see the sketch below)
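
A minimal webhook consumer might look like the sketch below. The payload fields (severity, metric, database) are assumptions for illustration; consult your webhook configuration for the actual schema DB24x7 sends.

```python
# Sketch of a webhook receiver; payload field names are assumed, not the
# documented DB24x7 schema.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class AnomalyWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        if event.get("severity") in ("critical", "high"):
            # Route to your own escalation logic here
            print(f"Escalating: {event.get('metric')} on {event.get('database')}")
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), AnomalyWebhook).serve_forever()
```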

Customizable Alert Rules

  • Sensitivity Tuning: adjust detection sensitivity per metric or database
  • Quiet Hours: suppress non-critical alerts during specific time windows
  • Metric Selection: choose which metrics to monitor for anomalies
  • Escalation Policies: define severity-based routing and escalation paths

Reducing False Positives

DB24x7's anomaly detection is designed to minimize false positives while maintaining high detection accuracy. Here's how:

Contextual Awareness

The system understands expected variations:

  • Weekly and daily patterns
  • Known maintenance windows
  • Scheduled batch jobs
  • Deployment events
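
As a simple illustration of the idea (how DB24x7 represents these events internally is not documented here), registered windows can simply be excluded from anomaly scoring:

```python
# Sketch: skip anomaly scoring during registered maintenance or deployment windows.
from datetime import datetime

known_event_windows = [
    (datetime(2024, 6, 2, 1, 0), datetime(2024, 6, 2, 3, 0)),   # example maintenance window
]

def should_score(ts):
    return not any(start <= ts <= end for start, end in known_event_windows)
```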

Multi-Signal Validation

Correlates multiple signals before alerting:

  • Cross-metric correlation
  • Historical pattern matching
  • Confidence scoring
  • Duration thresholds

Adaptive Learning

Continuously improves detection accuracy:

  • Learns from feedback
  • Adjusts to workload changes
  • Seasonal pattern recognition
  • Growth trend awareness

Smart Filtering

Intelligent alert suppression:

  • Duplicate event grouping
  • Flapping detection
  • Minimum duration rules
  • Impact-based prioritization
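
A minimum-duration rule is the simplest of these to sketch; the window count below is a placeholder, not a product default.

```python
# Sketch: only surface an anomaly once it has persisted for several
# consecutive evaluation windows, which also damps flapping.
from collections import deque

class MinDurationFilter:
    def __init__(self, required_windows=3):
        self.recent = deque(maxlen=required_windows)

    def should_alert(self, is_anomalous):
        self.recent.append(is_anomalous)
        return len(self.recent) == self.recent.maxlen and all(self.recent)
```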

Feedback Loop

Mark alerts as "False Positive" or "Expected Behavior" to train the model. The system learns from your feedback and reduces similar alerts in the future, improving accuracy over time.

Training the Model

Initial Training Period

When you first enable anomaly detection, DB24x7 requires a training period to learn your database's normal behavior patterns:

1. Days 1-7: Learning Mode

Collects baseline data without generating alerts. Observes patterns across all time periods.

2. Days 8-14: Cautious Detection

Begins detecting obvious anomalies with high confidence thresholds. Alerts on critical issues only.

3. Days 15-30: Full Detection

Operates at full sensitivity with established baselines. All severity levels active.

4. Day 30+: Continuous Refinement

Adapts to evolving patterns and workload changes. Improves based on feedback.
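
Restated as a simple lookup, with the day boundaries taken from the schedule above (the mode names are just labels for this sketch):

```python
def detection_mode(days_since_enabled):
    if days_since_enabled <= 7:
        return "learning"      # baselines only, no alerts
    if days_since_enabled <= 14:
        return "cautious"      # high-confidence, critical-only alerts
    if days_since_enabled <= 30:
        return "full"          # all severity levels active
    return "refinement"        # continuous adaptation from feedback
```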

Best Practices

  • Enable during stable, normal operations (avoid deployment windows)
  • Include at least one full business cycle (week) in training
  • Document known events (maintenance, migrations) to exclude
  • Provide feedback on initial alerts to calibrate sensitivity

Things to Avoid

  • Starting training during major deployments or migrations
  • Changing detection sensitivity too frequently
  • Ignoring feedback prompts on anomaly alerts
  • Disabling detection during unusual but valid traffic patterns

Pro Tip: If your workload patterns change significantly (new feature launch, traffic surge), consider retraining the model to adapt to the new normal. Navigate to Settings → Anomaly Detection → Retrain Model.

Configuration Options

Setting | Description | Default
Detection Sensitivity | Low, Medium, or High; controls σ thresholds | Medium
Training Period | Days of data to use for baseline learning | 14 days
Minimum Confidence | Confidence threshold to trigger alerts (0-100%) | 85%
Alert Frequency | Maximum alerts per hour for the same anomaly type | 1 per hour
Monitored Metrics | Select specific metrics to monitor or use all | All metrics
Quiet Hours | Time windows to suppress non-critical alerts | None
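
For reference, here are the defaults above expressed as a single configuration object. This is purely illustrative: DB24x7 manages these settings through its UI, and the key names below are not an actual API.

```python
# Hypothetical shape only; key names are assumptions, values are the documented defaults.
anomaly_detection_defaults = {
    "detection_sensitivity": "medium",     # low | medium | high (controls sigma thresholds)
    "training_period_days": 14,
    "minimum_confidence": 0.85,            # 85%
    "max_alerts_per_hour_per_type": 1,
    "monitored_metrics": "all",
    "quiet_hours": [],                     # e.g. [("22:00", "06:00")] for non-critical suppression
}
```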

Detect Issues Before They Impact Users

Enable AI-powered anomaly detection and sleep better at night