Objective
Build an unsupervised anomaly detection model using Isolation Forest to identify "noisy neighbors" (tenants whose excessive resource consumption degrades performance for co-located tenants) in a shared cloud environment, by analyzing network traffic patterns and flagging outliers.
Business Value
- Performance Isolation: Proactively identify tenants causing performance degradation before they impact other customers
- SLA Protection: Prevent noisy neighbors from violating service level agreements of co-located tenants
- Resource Management: Enable targeted resource throttling, migration, or workload balancing
- Cost Optimization: Optimize resource allocation and prevent over-provisioning due to performance issues
- Customer Experience: Maintain consistent performance and reliability across multi-tenant environments
Core Libraries
- scikit-learn: Isolation Forest for unsupervised anomaly detection and data preprocessing
- pandas: Multi-tenant traffic data manipulation and time-series analysis
- numpy: Numerical computations and statistical operations
- matplotlib/seaborn: Traffic pattern visualization and anomaly detection results
- time: Performance measurement and monitoring
Dataset
- Source: Synthetically Generated (realistic multi-tenant network traffic patterns)
- Size: 20,000 samples (20 tenants × 1,000 time steps)
- Features: Packets per second, bytes per second, average packet size, network utilization
- Anomalies: Primary and secondary noisy neighbor events with varying intensity and duration
- Type: Unsupervised anomaly detection with ground truth for evaluation
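Given the event windows used in the generation code below (time steps 400-599 for the primary and 750-799 for the secondary noisy neighbor), the expected anomaly fraction follows from simple arithmetic and motivates the contamination rate chosen in Step 5:
# Back-of-the-envelope anomaly rate implied by the injected events
anomalous_rows = 200 + 50            # primary window + secondary window
total_rows = 20 * 1000               # tenants x time steps
print(anomalous_rows / total_rows)   # 0.0125, i.e. 1.25% of all rows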
Step-by-Step Guide
1. Environment Setup
# Create virtual environment
python -m venv noisy_neighbors_env
source noisy_neighbors_env/bin/activate  # On Windows: noisy_neighbors_env\Scripts\activate
# Install required packages
pip install pandas numpy scikit-learn matplotlib seaborn
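If you would rather install from requirements.txt (as the Running the Project section assumes), a minimal file consistent with the imports used below might look like the following; leaving versions unpinned and including jupyter for the notebook are assumptions, not project requirements:
# requirements.txt (a minimal sketch)
pandas
numpy
scikit-learn
matplotlib
seaborn
jupyter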
2. Multi-tenant Traffic Data Generation
# Generate realistic multi-tenant network traffic patterns
import pandas as pd
import numpy as np

# Fix the random seed so the synthetic data is reproducible
np.random.seed(42)

# Simulation parameters
num_tenants = 20
time_steps = 1000
tenants = [f'tenant_{i+1}' for i in range(num_tenants)]
noisy_neighbor_tenant = 'tenant_5'
secondary_noisy_tenant = 'tenant_15'
data = []
for t in range(time_steps):
    for tenant in tenants:
        is_noisy = False
        # Define normal behavior patterns with tenant-specific baselines
        # (exact matches are required; substring checks like "'tenant_1' in tenant"
        # would also match tenant_10 through tenant_19)
        if tenant in ('tenant_1', 'tenant_2'):
            base_pps = max(0, np.random.normal(2000, 500))  # Low activity
            base_bps = base_pps * np.random.normal(250, 30)
        elif tenant in ('tenant_19', 'tenant_20'):
            base_pps = max(0, np.random.normal(8000, 1200))  # High activity
            base_bps = base_pps * np.random.normal(400, 60)
        else:
            base_pps = max(0, np.random.normal(5000, 1000))  # Normal activity
            base_bps = base_pps * np.random.normal(300, 50)
        # Add time-based patterns (daily cycles)
        time_factor = 1 + 0.3 * np.sin(2 * np.pi * t / 100)
        base_pps *= time_factor
        base_bps *= time_factor
        # Simulate noisy neighbor events
        if tenant == noisy_neighbor_tenant and 400 <= t < 600:
            base_pps *= np.random.uniform(5, 10)  # 5-10x spike
            base_bps *= np.random.uniform(5, 10)
            is_noisy = True
        if tenant == secondary_noisy_tenant and 750 <= t < 800:
            base_pps *= np.random.uniform(8, 15)  # Intense spike
            base_bps *= np.random.uniform(8, 15)
            is_noisy = True
        # Calculate derived metrics
        avg_packet_size = base_bps / max(base_pps, 1)
        network_utilization = min(base_bps / 1000000, 100)  # %, effectively treating 100 MB/s as full
        data.append([t, tenant, base_pps, base_bps, avg_packet_size,
                     network_utilization, is_noisy])
df = pd.DataFrame(data, columns=['timestamp', 'tenant_id', 'packets_per_second',
                                 'bytes_per_second', 'avg_packet_size',
                                 'network_utilization', 'is_truly_noisy'])
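A quick sanity check (a minimal sketch; the expected values follow directly from the simulation parameters) confirms the frame matches the dataset description:
# Verify dataset size and the injected anomaly rate
print(df.shape)                       # expected: (20000, 7)
print(df['is_truly_noisy'].sum())     # expected: 250 noisy rows (200 + 50)
print(df['tenant_id'].nunique())      # expected: 20 tenants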
3. Data Exploration and Pattern Analysis
import matplotlib.pyplot as plt
import seaborn as sns
# Visualize traffic patterns
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
# Time series visualization
pivot_data = df.pivot(index='timestamp', columns='tenant_id', values='packets_per_second')
axes[0,0].plot(pivot_data.index, pivot_data[noisy_neighbor_tenant],
               color='red', linewidth=2, label='Primary Noisy Neighbor')
axes[0,0].plot(pivot_data.index, pivot_data[secondary_noisy_tenant],
               color='orange', linewidth=2, label='Secondary Noisy Neighbor')
axes[0,0].set_title('Packets per Second Over Time')
axes[0,0].legend()
# Distribution comparison
normal_data = df[~df['is_truly_noisy']]
noisy_data = df[df['is_truly_noisy']]
axes[0,1].hist(normal_data['packets_per_second'], bins=50, alpha=0.7,
               label='Normal', density=True)
axes[0,1].hist(noisy_data['packets_per_second'], bins=30, alpha=0.7,
               label='Noisy Neighbor', density=True)
axes[0,1].set_title('Traffic Distribution Comparison')
axes[0,1].legend()
# Traffic correlation analysis
axes[1,0].scatter(normal_data['packets_per_second'], normal_data['bytes_per_second'],
                  alpha=0.6, label='Normal', s=20)
axes[1,0].scatter(noisy_data['packets_per_second'], noisy_data['bytes_per_second'],
                  alpha=0.8, label='Noisy Neighbor', s=30, color='red')
axes[1,0].set_title('Packets vs Bytes Correlation')
axes[1,0].legend()
axes[1,1].axis('off')  # Fourth panel intentionally left empty
plt.tight_layout()
plt.show()
# Tenant behavior summary
tenant_summary = df.groupby('tenant_id').agg({
    'packets_per_second': ['mean', 'max', 'std'],
    'is_truly_noisy': 'sum'
}).round(2)
print("Tenant Behavior Summary:")
print(tenant_summary.head(10))
4. Feature Engineering and Data Preprocessing
from sklearn.preprocessing import StandardScaler
# Prepare features for anomaly detection
feature_cols = ['packets_per_second', 'bytes_per_second', 'avg_packet_size', 'network_utilization']
X = df[feature_cols].copy()
# Handle any NaN or infinite values
X = X.replace([np.inf, -np.inf], np.nan)
X = X.fillna(X.median())
# Scale features for better model performance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Store ground truth labels for evaluation
y_true = df['is_truly_noisy'].values
print(f"Feature matrix shape: {X_scaled.shape}")
print(f"Ground truth anomalies: {np.sum(y_true)} ({np.mean(y_true)*100:.2f}%)")
5. Isolation Forest Model Training
from sklearn.ensemble import IsolationForest
# Reference contamination rate from ground truth (in production this would be
# estimated from historical data, since labels are unavailable)
expected_contamination = np.mean(y_true)
print(f"True anomaly rate: {expected_contamination*100:.2f}%")
contamination_rate = 0.015  # Slightly above the true ~1.25% rate, trading precision for recall
# Configure and train Isolation Forest
model = IsolationForest(
    n_estimators=200,
    contamination=contamination_rate,
    random_state=42,
    n_jobs=-1
)
print(f"Training Isolation Forest with {contamination_rate*100:.1f}% expected contamination...")
model.fit(X_scaled)
# Make predictions and calculate anomaly scores
y_pred_raw = model.predict(X_scaled)
y_pred = (y_pred_raw == -1).astype(int)  # Convert to binary
anomaly_scores = model.decision_function(X_scaled)
print(f"Predicted anomalies: {np.sum(y_pred)} ({np.mean(y_pred)*100:.2f}%)")
print(f"Actual anomalies: {np.sum(y_true)} ({np.mean(y_true)*100:.2f}%)")
6. Model Evaluation and Performance Analysis
from sklearn.metrics import classification_report, confusion_matrix, precision_recall_fscore_support
# Calculate performance metrics
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='binary', pos_label=1
)
print("Performance Metrics:")
print(f"• Precision: {precision:.3f}")
print(f"• Recall: {recall:.3f}")
print(f"• F1-Score: {f1:.3f}")
# Detailed classification report
print("\nDetailed Classification Report:")
print(classification_report(y_true, y_pred, target_names=['Normal', 'Noisy Neighbor']))
# Confusion matrix analysis
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
specificity = tn / (tn + fp)
false_positive_rate = fp / (fp + tn)
print(f"\nAdditional Metrics:")
print(f"• Accuracy: {accuracy:.3f}")
print(f"• Specificity: {specificity:.3f}")
print(f"• False Positive Rate: {false_positive_rate:.3f}")
# Visualize confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Normal', 'Noisy Neighbor'],
            yticklabels=['Normal', 'Noisy Neighbor'])
plt.title('Confusion Matrix for Noisy Neighbor Detection')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
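The metrics above describe a single operating point fixed by the contamination rate. Since decision_function provides a continuous score, the precision/recall trade-off can be examined across all thresholds; a short sketch using standard scikit-learn utilities (the score is negated so that larger means more anomalous):
from sklearn.metrics import precision_recall_curve, roc_auc_score, average_precision_score
# decision_function is higher for inliers, so negate it to get an anomaly score
prec, rec, thresholds = precision_recall_curve(y_true, -anomaly_scores)
print(f"ROC AUC: {roc_auc_score(y_true, -anomaly_scores):.3f}")
print(f"Average precision: {average_precision_score(y_true, -anomaly_scores):.3f}")
plt.figure(figsize=(8, 5))
plt.plot(rec, prec)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Trade-off Across Score Thresholds')
plt.grid(True, alpha=0.3)
plt.show()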
7. Tenant-wise Analysis and Investigation
# Analyze detection performance by tenant
df_analysis = df.copy()
df_analysis['predicted_anomaly'] = y_pred
df_analysis['anomaly_score'] = anomaly_scores
tenant_analysis = []
for tenant in tenants:
    tenant_data = df_analysis[df_analysis['tenant_id'] == tenant]
    true_anomalies = tenant_data['is_truly_noisy'].sum()
    detected_anomalies = tenant_data['predicted_anomaly'].sum()
    if true_anomalies > 0:
        true_positives = ((tenant_data['is_truly_noisy'] == 1) &
                          (tenant_data['predicted_anomaly'] == 1)).sum()
        tenant_recall = true_positives / true_anomalies
        tenant_precision = true_positives / detected_anomalies if detected_anomalies > 0 else 0
    else:
        # No true anomalies: recall is undefined, and any detection is a false positive
        tenant_recall = np.nan
        tenant_precision = 0 if detected_anomalies > 0 else np.nan
    tenant_analysis.append({
        'tenant_id': tenant,
        'true_anomalies': true_anomalies,
        'detected_anomalies': detected_anomalies,
        'recall': tenant_recall,
        'precision': tenant_precision
    })
tenant_df = pd.DataFrame(tenant_analysis)
print("Tenant Detection Performance:")
print(tenant_df[tenant_df['true_anomalies'] > 0])  # Focus on noisy tenants
# Visualize anomaly timeline for noisy neighbors
noisy_tenant_data = df_analysis[df_analysis['tenant_id'] == noisy_neighbor_tenant]
plt.figure(figsize=(14, 6))
plt.plot(noisy_tenant_data['timestamp'], noisy_tenant_data['packets_per_second'],
         color='blue', alpha=0.7, label='Traffic')
# Highlight detections
true_anomalies = noisy_tenant_data[noisy_tenant_data['is_truly_noisy'] == 1]
plt.scatter(true_anomalies['timestamp'], true_anomalies['packets_per_second'],
            color='red', s=50, label='True Anomalies', zorder=5)
detected_anomalies = noisy_tenant_data[noisy_tenant_data['predicted_anomaly'] == 1]
plt.scatter(detected_anomalies['timestamp'], detected_anomalies['packets_per_second'],
            color='orange', s=30, marker='x', label='Detected Anomalies', zorder=5)
plt.title(f'Anomaly Detection Timeline for {noisy_neighbor_tenant}')
plt.xlabel('Time Step')
plt.ylabel('Packets per Second')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
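For an operational view, it is also useful to know how quickly the first alert would fire after an event begins. A small sketch computing detection delay for the primary event window (steps 400-599, per the generation code in Step 2):
# Detection delay: first flagged time step inside the primary event window
event_start, event_end = 400, 600
window = noisy_tenant_data[(noisy_tenant_data['timestamp'] >= event_start) &
                           (noisy_tenant_data['timestamp'] < event_end)]
flagged = window[window['predicted_anomaly'] == 1]
if len(flagged) > 0:
    delay = flagged['timestamp'].min() - event_start
    print(f"First detection {delay} step(s) after event onset")
else:
    print("Primary event window was never flagged")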
Success Criteria
- High Recall (>80%): Detect most noisy neighbor events to prevent performance degradation
- Balanced Precision (>70%): Minimize false alarms to avoid alert fatigue
- Low False Positive Rate (<10%): Maintain operational efficiency with reliable alerts
- Tenant Isolation: Successfully identify specific problematic tenants
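These targets can be checked directly against the metrics computed in Step 6 (a minimal sketch; the thresholds mirror the criteria above):
# Programmatic check of the success criteria
criteria = {
    'Recall > 80%': recall > 0.80,
    'Precision > 70%': precision > 0.70,
    'False Positive Rate < 10%': false_positive_rate < 0.10,
}
for name, passed in criteria.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")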
Next Steps & Extensions
1. Real-time Deployment: Integrate with cloud monitoring platforms for live anomaly detection (a minimal scoring sketch follows this list)
2. Multi-dimensional Analysis: Include CPU, memory, disk I/O, and network bandwidth metrics
3. Automated Response: Implement automatic resource throttling or tenant migration
4. Adaptive Thresholds: Use dynamic contamination rates based on historical patterns
5. Root Cause Analysis: Identify specific applications or processes causing noisy behavior
6. Predictive Alerts: Forecast potential noisy neighbor events before they impact performance
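As referenced in item 1, here is a minimal sketch of how the trained artifacts could score a fresh batch of per-tenant measurements; the joblib persistence step and the sample values are illustrative assumptions:
import joblib
# Persist the fitted artifacts once; reload them in the scoring service
joblib.dump(scaler, 'scaler.joblib')
joblib.dump(model, 'isolation_forest.joblib')

def score_batch(new_df):
    """Return 1 for rows flagged as suspected noisy neighbors."""
    scaled = scaler.transform(new_df[feature_cols])
    return (model.predict(scaled) == -1).astype(int)

# Hypothetical incoming measurements for two tenants
incoming = pd.DataFrame({
    'packets_per_second': [5200, 48000],
    'bytes_per_second': [1.5e6, 2.1e7],
    'avg_packet_size': [290, 437],
    'network_utilization': [1.5, 21.0],
})
print(score_batch(incoming))  # the second row should score as anomalous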
File Structure
034_Noisy_Neighbors_Detection_Cloud/
├── readme.md
├── noisy_neighbors_detection_cloud.ipynb
├── requirements.txt
└── data/
└── (Generated multi-tenant traffic data)
Running the Project
1. Install required dependencies from requirements.txt
2. Execute the Jupyter notebook step by step
3. Analyze multi-tenant traffic patterns and baseline behavior
4. Train Isolation Forest model for unsupervised anomaly detection
5. Evaluate detection performance and investigate tenant-specific results
6. Deploy model for real-time noisy neighbor monitoring
This project demonstrates how unsupervised machine learning can address a core multi-tenancy challenge in cloud environments: automatically detecting performance-impacting tenants while keeping false alarms low enough for day-to-day operations.