AI Anomaly Detection
Proactively identify unusual database behavior with machine learning before it impacts your users.
Overview
Anomaly Detection uses advanced machine learning algorithms to continuously monitor your database metrics and identify unusual patterns that deviate from normal behavior. By learning your database's unique patterns over time, it can detect issues before they escalate into critical problems.
Self-Learning
Adapts to your unique database patterns and workload characteristics
Early Warning
Detect issues minutes to hours before they impact users
Precision Alerts
Reduce noise with intelligent filtering and context-aware alerting
How AI Detects Anomalies
Baseline Learning
The system establishes baseline behavior by analyzing historical data across multiple dimensions:
- Query execution times and patterns
- Resource utilization (CPU, memory, I/O)
- Connection counts and distribution
- Error rates and types
- Time-based patterns (hourly, daily, weekly cycles)
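The baseline step above can be sketched as simple per-hour-of-week statistics. This is a minimal illustration, not DB24x7's actual model; the function name and the single-metric shape are assumptions, and a production system would track many metrics with more robust statistics.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical sketch: learn a per-hour-of-week baseline (mean and
# standard deviation) for one metric from historical samples, so that
# hourly/daily/weekly cycles are captured naturally.
def learn_baseline(samples):
    """samples: list of (hour_of_week, value) pairs, hour_of_week in 0..167."""
    buckets = defaultdict(list)
    for hour, value in samples:
        buckets[hour].append(value)
    baseline = {}
    for hour, values in buckets.items():
        baseline[hour] = {
            "mean": mean(values),
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
    return baseline

# Example: weekday-evening query latency (ms) clusters around 40 ms
history = [(18, v) for v in (38, 41, 40, 42, 39)]
print(learn_baseline(history)[18]["mean"])  # → 40
```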
Real-Time Monitoring
Continuously compares current metrics against learned baselines using statistical methods and ML models. Monitors hundreds of metrics simultaneously to detect subtle deviations.
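One common statistical method for this comparison is a z-score test against the learned baseline. The sketch below assumes that approach for illustration; the threshold `k` and function signature are hypothetical, not the product's actual internals.

```python
# Hypothetical sketch: flag a sample as anomalous when it deviates from
# the learned baseline by more than k standard deviations (z-score test).
def is_anomalous(value, baseline_mean, baseline_std, k=3.0):
    if baseline_std == 0:
        # No observed variance: any change from the baseline is notable.
        return value != baseline_mean
    z = abs(value - baseline_mean) / baseline_std
    return z > k

print(is_anomalous(400, baseline_mean=40, baseline_std=5))  # latency spike → True
print(is_anomalous(42, baseline_mean=40, baseline_std=5))   # normal jitter → False
```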
Pattern Recognition
Uses multiple algorithms to identify different types of anomalies:
Sudden Spikes/Drops
Rapid changes in metrics
Gradual Trends
Slow degradation over time
Cyclical Violations
Breaking expected patterns
Correlation Breaks
Unusual metric relationships
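To make the first two pattern types concrete, here is a toy classifier that separates a sudden spike from a gradual trend in a window of recent values. It is a sketch under simple assumptions (ratio test for spikes, least-squares slope for trends); real detectors would use dedicated models per pattern type.

```python
# Hypothetical sketch: classify a metric window as a sudden spike/drop,
# a gradual trend, or normal. Thresholds are illustrative.
def classify(window, spike_ratio=3.0, trend_slope=0.1):
    """window: ordered list of recent metric values (len >= 3)."""
    prior = window[:-1]
    avg = sum(prior) / len(prior)
    # Sudden spike/drop: last value far from the prior-window average.
    if avg > 0 and (window[-1] / avg >= spike_ratio
                    or window[-1] / avg <= 1 / spike_ratio):
        return "sudden"
    # Gradual trend: consistent per-step drift (least-squares slope),
    # expressed relative to the window's mean level.
    n = len(window)
    xbar = (n - 1) / 2
    ybar = sum(window) / n
    slope = (sum((i - xbar) * (y - ybar) for i, y in enumerate(window))
             / sum((i - xbar) ** 2 for i in range(n)))
    if abs(slope) / max(ybar, 1e-9) > trend_slope:
        return "gradual"
    return "normal"

print(classify([10, 10, 11, 10, 40]))      # → sudden
print(classify([10, 12, 14, 16, 18, 20]))  # → gradual
```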
Severity Assessment
Evaluates the severity and potential impact of detected anomalies based on deviation magnitude, affected resources, and historical patterns. Assigns confidence scores to reduce false positives.
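The severity step can be illustrated by combining the three inputs the paragraph names into a single score. The weighting formula and cutoffs below are assumptions chosen for clarity, not the product's actual scoring model.

```python
# Hypothetical sketch: combine deviation magnitude (z-score), resource
# criticality, and historical rarity into a 0-100 confidence score,
# then map the score to a severity level.
def score_anomaly(z_score, resource_weight, rarity):
    """rarity: 0..1, how unusual a similar deviation is in history."""
    confidence = min(100, int(z_score * 10 * resource_weight * rarity))
    if confidence >= 90:
        severity = "critical"
    elif confidence >= 70:
        severity = "high"
    elif confidence >= 50:
        severity = "medium"
    else:
        severity = "low"
    return confidence, severity

print(score_anomaly(z_score=8, resource_weight=1.5, rarity=0.9))  # → (100, 'critical')
```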
Alert & Context
Generates alerts with rich context including visualization of the anomaly, historical comparison, potential causes, and recommended actions for investigation.
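An alert with this kind of context might look like the payload below. All field names and values are illustrative, not DB24x7's actual alert schema.

```python
import json

# Hypothetical sketch of an anomaly alert payload with rich context.
alert = {
    "metric": "query_latency_ms",
    "severity": "high",
    "confidence": 92,
    "observed": 410,
    "baseline_mean": 40,
    "deviation": "10.3x baseline",
    "possible_causes": ["missing index", "stale statistics"],
    "recommended_actions": ["review slow query log",
                            "check recent schema changes"],
}
print(json.dumps(alert, indent=2))
```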
Types of Anomalies Detected
Performance Anomalies
- Query execution time spikes (e.g., 10x slower than baseline)
- Sudden increase in slow queries
- Throughput degradation
- Lock wait time increases
- Replication lag anomalies
Resource Anomalies
- Unexpected CPU utilization spikes
- Memory consumption abnormalities
- Disk I/O pattern changes
- Network traffic anomalies
- Cache hit rate drops
Connection Anomalies
- Unusual connection count spikes
- Connection pool exhaustion patterns
- Failed connection rate increases
- Long-running transaction buildup
- Idle connection accumulation
Error Anomalies
- Error rate spikes (queries, connections)
- New error types appearing
- Deadlock frequency changes
- Timeout pattern deviations
- Transaction rollback increases
Example: Query Performance Anomaly Detected
Detection Details: queries against the users table begin running roughly 10x slower than their baseline, consistent with a full table scan replacing the expected index lookup.
Recommended Actions
- Check recent schema changes on the users table
- Review queries performing full table scans
- Verify the idx_users_email index is present
- Check whether table statistics are stale
Alert Configuration
Severity Levels
Critical
Immediate action required - service impact imminent
High
Urgent investigation needed - potential user impact
Medium
Notable deviation - monitor and investigate
Low
Minor anomaly - informational only
Notification Channels
Email
Detailed anomaly reports with visualizations
Slack
Real-time alerts to team channels
PagerDuty
Critical alerts with on-call escalation
Webhooks
Custom integrations with your tools
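A custom webhook integration typically receives the alert payload and routes it by severity. The sketch below assumes a JSON body with `metric` and `severity` fields; the field names and routing rules are illustrative.

```python
import json

# Hypothetical sketch: a minimal webhook handler that routes incoming
# anomaly alerts by severity. Payload fields are illustrative.
def handle_webhook(body: str) -> str:
    alert = json.loads(body)
    if alert.get("severity") in ("critical", "high"):
        return f"page on-call: {alert['metric']}"
    return f"log only: {alert['metric']}"

print(handle_webhook('{"metric": "cpu_util", "severity": "critical"}'))
# → page on-call: cpu_util
```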
Customizable Alert Rules
Sensitivity Tuning
Adjust detection sensitivity per metric or database
Quiet Hours
Suppress non-critical alerts during specific time windows
Metric Selection
Choose which metrics to monitor for anomalies
Escalation Policies
Define severity-based routing and escalation paths
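The quiet-hours rule above can be sketched as a simple gate that always delivers critical alerts but suppresses everything else inside the configured window. The window boundaries and function name are assumptions for illustration.

```python
from datetime import time

# Hypothetical sketch: suppress non-critical alerts during a configured
# quiet-hours window (here 01:00-06:00; values are illustrative).
QUIET_START, QUIET_END = time(1, 0), time(6, 0)

def should_notify(severity: str, now: time) -> bool:
    in_quiet_hours = QUIET_START <= now < QUIET_END
    return severity == "critical" or not in_quiet_hours

print(should_notify("medium", time(3, 30)))    # → False (suppressed)
print(should_notify("critical", time(3, 30)))  # → True  (always delivered)
```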
Reducing False Positives
DB24x7's anomaly detection is designed to minimize false positives while maintaining high detection accuracy. Here's how:
Contextual Awareness
The system understands expected variations:
- Weekly and daily patterns
- Known maintenance windows
- Scheduled batch jobs
- Deployment events
Multi-Signal Validation
Correlates multiple signals before alerting:
- Cross-metric correlation
- Historical pattern matching
- Confidence scoring
- Duration thresholds
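The validation steps above can be sketched as a gate that only alerts when enough signals clear both a confidence and a duration threshold. The thresholds and signal structure are illustrative assumptions.

```python
# Hypothetical sketch: require multiple independent signals to agree
# (confidence and duration thresholds) before raising an alert.
def validate(signals, min_confidence=0.85, min_agreeing=2, min_duration_s=60):
    """signals: list of dicts with 'confidence' and 'duration_s' keys."""
    agreeing = [
        s for s in signals
        if s["confidence"] >= min_confidence and s["duration_s"] >= min_duration_s
    ]
    return len(agreeing) >= min_agreeing

signals = [
    {"name": "latency_spike", "confidence": 0.93, "duration_s": 120},
    {"name": "cpu_spike", "confidence": 0.88, "duration_s": 90},
    {"name": "cache_miss", "confidence": 0.60, "duration_s": 30},
]
print(validate(signals))  # → True (two signals agree)
```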
Adaptive Learning
Continuously improves detection accuracy:
- Learns from feedback
- Adjusts to workload changes
- Seasonal pattern recognition
- Growth trend awareness
Smart Filtering
Intelligent alert suppression:
- Duplicate event grouping
- Flapping detection
- Minimum duration rules
- Impact-based prioritization
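Flapping detection and duplicate grouping can be sketched together: count state transitions per metric and suppress alerts for metrics that toggle too often. The transition threshold and event shape are illustrative assumptions.

```python
# Hypothetical sketch: group events per metric and suppress alerts for
# metrics that flap (toggle rapidly between anomalous and normal).
def filter_alerts(events, flap_threshold=4):
    """events: ordered (metric, state) pairs, state 'anomalous'/'normal'."""
    transitions = {}
    last_state = {}
    for metric, state in events:
        if metric in last_state and last_state[metric] != state:
            transitions[metric] = transitions.get(metric, 0) + 1
        last_state[metric] = state
    alerts = []
    for metric, state in last_state.items():
        if state != "anomalous":
            continue  # only currently-anomalous metrics can alert
        if transitions.get(metric, 0) >= flap_threshold:
            alerts.append((metric, "flapping: suppressed"))
        else:
            alerts.append((metric, "alert"))
    return alerts

events = [("io_wait", s) for s in
          ["anomalous", "normal", "anomalous", "normal", "anomalous"]]
print(filter_alerts(events))  # → [('io_wait', 'flapping: suppressed')]
```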
Feedback Loop
Mark alerts as "False Positive" or "Expected Behavior" to train the model. The system learns from your feedback and reduces similar alerts in the future, improving accuracy over time.
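One simple way such a feedback loop could work is to raise the alert threshold for anomaly signatures that operators repeatedly mark as false positives. The class, step sizes, and signature name below are hypothetical, not the product's actual learning mechanism.

```python
# Hypothetical sketch: nudge per-signature alert thresholds based on
# operator feedback, so repeated false positives raise the bar.
class FeedbackModel:
    def __init__(self, base_threshold=0.85):
        self.thresholds = {}
        self.base = base_threshold

    def record_feedback(self, signature, false_positive):
        t = self.thresholds.get(signature, self.base)
        if false_positive:
            # Raise the threshold: require more confidence next time.
            self.thresholds[signature] = min(0.99, t + 0.03)
        else:
            # Confirmed anomaly: drift back toward the base threshold.
            self.thresholds[signature] = max(self.base, t - 0.01)

    def threshold(self, signature):
        return self.thresholds.get(signature, self.base)

m = FeedbackModel()
for _ in range(3):
    m.record_feedback("nightly_batch_cpu", false_positive=True)
print(round(m.threshold("nightly_batch_cpu"), 2))  # → 0.94
```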
Training the Model
Initial Training Period
When you first enable anomaly detection, DB24x7 requires a training period to learn your database's normal behavior patterns:
Days 1-7: Learning Mode
Collects baseline data without generating alerts. Observes patterns across all time periods.
Days 8-14: Cautious Detection
Begins detecting obvious anomalies with high confidence thresholds. Alerts on critical issues only.
Days 15-30: Full Detection
Operates at full sensitivity with established baselines. All severity levels active.
Day 30+: Continuous Refinement
Adapts to evolving patterns and workload changes. Improves based on feedback.
Best Practices
- Enable during stable, normal operations (avoid deployment windows)
- Include at least one full business cycle (week) in training
- Document known events (maintenance, migrations) to exclude
- Provide feedback on initial alerts to calibrate sensitivity
Things to Avoid
- Starting training during major deployments or migrations
- Changing detection sensitivity too frequently
- Ignoring feedback prompts on anomaly alerts
- Disabling detection during unusual but valid traffic patterns
Pro Tip: If your workload patterns change significantly (new feature launch, traffic surge), consider retraining the model to adapt to the new normal. Navigate to Settings → Anomaly Detection → Retrain Model.
Configuration Options
| Setting | Description | Default |
|---|---|---|
| Detection Sensitivity | Low, Medium, or High - controls σ thresholds | Medium |
| Training Period | Days of data to use for baseline learning | 14 days |
| Minimum Confidence | Confidence threshold to trigger alerts (0-100%) | 85% |
| Alert Frequency | Maximum alerts per hour for the same anomaly type | 1 per hour |
| Monitored Metrics | Select specific metrics to monitor or use all | All metrics |
| Quiet Hours | Time windows to suppress non-critical alerts | None |
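The settings table above might map to a configuration like the following. The keys, value formats, and the dict representation itself are illustrative assumptions, not DB24x7's actual configuration format.

```python
# Hypothetical sketch mirroring the settings table; keys are illustrative.
config = {
    "detection_sensitivity": "medium",   # low | medium | high (σ thresholds)
    "training_period_days": 14,
    "minimum_confidence": 0.85,          # 85%
    "max_alerts_per_hour": 1,            # per anomaly type
    "monitored_metrics": "all",
    "quiet_hours": None,                 # e.g. {"start": "01:00", "end": "06:00"}
}
assert 0 <= config["minimum_confidence"] <= 1
```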
Detect Issues Before They Impact Users
Enable AI-powered anomaly detection and sleep better at night