Project 034: Detecting Noisy Neighbors in a Multi-tenant Cloud Environment

Multi-tenant Analytics & Anomaly Detection

Objective

Build an unsupervised anomaly detection model using Isolation Forest to identify "noisy neighbors" (high-resource-consuming tenants) in a shared cloud environment by analyzing network traffic patterns and flagging outliers that degrade performance for other tenants.

Business Value

- Performance Isolation: Proactively identify tenants causing performance degradation before they impact other customers

- SLA Protection: Prevent noisy neighbors from violating service level agreements of co-located tenants

- Resource Management: Enable targeted resource throttling, migration, or workload balancing

- Cost Optimization: Optimize resource allocation and prevent over-provisioning due to performance issues

- Customer Experience: Maintain consistent performance and reliability across multi-tenant environments

Core Libraries

- scikit-learn: Isolation Forest for unsupervised anomaly detection and data preprocessing

- pandas: Multi-tenant traffic data manipulation and time-series analysis

- numpy: Numerical computations and statistical operations

- matplotlib/seaborn: Visualization of traffic patterns and anomaly detection results

- time: Performance measurement and monitoring

Dataset

- Source: Synthetically Generated (realistic multi-tenant network traffic patterns)

- Size: 20,000 samples (20 tenants × 1,000 time steps)

- Features: Packets per second, bytes per second, average packet size, network utilization

- Anomalies: Primary and secondary noisy neighbor events with varying intensity and duration

- Type: Unsupervised anomaly detection with ground truth for evaluation

Step-by-Step Guide

1. Environment Setup

# Create virtual environment
python -m venv noisy_neighbors_env
source noisy_neighbors_env/bin/activate  # On Windows: noisy_neighbors_env\Scripts\activate

# Install required packages
pip install pandas numpy scikit-learn matplotlib seaborn
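If you prefer a pinned environment, requirements.txt (referenced under Files Structure below) might look like the following; the version bounds are illustrative assumptions, not tested constraints:

# requirements.txt (illustrative version pins, adjust to your environment)
pandas>=1.5
numpy>=1.23
scikit-learn>=1.2
matplotlib>=3.6
seaborn>=0.12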

2. Multi-tenant Traffic Data Generation

# Generate realistic multi-tenant network traffic patterns
import pandas as pd
import numpy as np

# Simulation parameters
num_tenants = 20
time_steps = 1000
tenants = [f'tenant_{i+1}' for i in range(num_tenants)]
noisy_neighbor_tenant = 'tenant_5'
secondary_noisy_tenant = 'tenant_15'

data = []
for t in range(time_steps):
    for tenant in tenants:
        is_noisy = False

        # Define normal behavior patterns with tenant-specific baselines
        # (exact matches, so e.g. tenant_15 is not caught by a 'tenant_1' substring test)
        if tenant in ('tenant_1', 'tenant_2'):
            base_pps = max(0, np.random.normal(2000, 500))   # Low activity
            base_bps = base_pps * np.random.normal(250, 30)
        elif tenant in ('tenant_19', 'tenant_20'):
            base_pps = max(0, np.random.normal(8000, 1200))  # High activity
            base_bps = base_pps * np.random.normal(400, 60)
        else:
            base_pps = max(0, np.random.normal(5000, 1000))  # Normal activity
            base_bps = base_pps * np.random.normal(300, 50)

        # Add time-based patterns (daily cycles)
        time_factor = 1 + 0.3 * np.sin(2 * np.pi * t / 100)
        base_pps *= time_factor
        base_bps *= time_factor

        # Simulate noisy neighbor events
        if tenant == noisy_neighbor_tenant and 400 <= t < 600:
            base_pps *= np.random.uniform(5, 10)   # 5-10x spike
            base_bps *= np.random.uniform(5, 10)
            is_noisy = True
        if tenant == secondary_noisy_tenant and 750 <= t < 800:
            base_pps *= np.random.uniform(8, 15)   # Short, intense spike
            base_bps *= np.random.uniform(8, 15)
            is_noisy = True

        # Calculate derived metrics
        avg_packet_size = base_bps / max(base_pps, 1)
        network_utilization = min(base_bps / 1_000_000, 100)

        data.append([t, tenant, base_pps, base_bps, avg_packet_size,
                     network_utilization, is_noisy])

df = pd.DataFrame(data, columns=['timestamp', 'tenant_id', 'packets_per_second',
                                 'bytes_per_second', 'avg_packet_size',
                                 'network_utilization', 'is_truly_noisy'])
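With these parameters, tenant_5 is noisy for 200 time steps and tenant_15 for 50, so 250 of the 20,000 rows (1.25%) carry the anomaly label. A quick sanity check on the generated frame:

print(f"Shape: {df.shape}")                                # Expected: (20000, 7)
print(f"Labeled anomalies: {df['is_truly_noisy'].sum()}")  # Expected: 250
print(df.head())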

3. Data Exploration and Pattern Analysis

import matplotlib.pyplot as plt
import seaborn as sns

# Visualize traffic patterns
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Time series visualization
pivot_data = df.pivot(index='timestamp', columns='tenant_id', values='packets_per_second')
axes[0, 0].plot(pivot_data.index, pivot_data[noisy_neighbor_tenant],
                color='red', linewidth=2, label='Primary Noisy Neighbor')
axes[0, 0].plot(pivot_data.index, pivot_data[secondary_noisy_tenant],
                color='orange', linewidth=2, label='Secondary Noisy Neighbor')
axes[0, 0].set_title('Packets per Second Over Time')
axes[0, 0].legend()

# Distribution comparison
normal_data = df[df['is_truly_noisy'] == False]
noisy_data = df[df['is_truly_noisy'] == True]
axes[0, 1].hist(normal_data['packets_per_second'], bins=50, alpha=0.7,
                label='Normal', density=True)
axes[0, 1].hist(noisy_data['packets_per_second'], bins=30, alpha=0.7,
                label='Noisy Neighbor', density=True)
axes[0, 1].set_title('Traffic Distribution Comparison')
axes[0, 1].legend()

# Traffic correlation analysis (the fourth panel, axes[1, 1], is left empty)
axes[1, 0].scatter(normal_data['packets_per_second'], normal_data['bytes_per_second'],
                   alpha=0.6, label='Normal', s=20)
axes[1, 0].scatter(noisy_data['packets_per_second'], noisy_data['bytes_per_second'],
                   alpha=0.8, label='Noisy Neighbor', s=30, color='red')
axes[1, 0].set_title('Packets vs Bytes Correlation')
axes[1, 0].legend()

plt.tight_layout()
plt.show()

# Tenant behavior summary
tenant_summary = df.groupby('tenant_id').agg({
    'packets_per_second': ['mean', 'max', 'std'],
    'is_truly_noisy': 'sum'
}).round(2)
print("Tenant Behavior Summary:")
print(tenant_summary.head(10))

4. Feature Engineering and Data Preprocessing

from sklearn.preprocessing import StandardScaler

# Prepare features for anomaly detection
feature_cols = ['packets_per_second', 'bytes_per_second', 'avg_packet_size', 'network_utilization']
X = df[feature_cols].copy()

# Handle any NaN or infinite values
X = X.replace([np.inf, -np.inf], np.nan)
X = X.fillna(X.median())

# Scale features for better model performance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Store ground truth labels for evaluation
y_true = df['is_truly_noisy'].values

print(f"Feature matrix shape: {X_scaled.shape}")
print(f"Ground truth anomalies: {np.sum(y_true)} ({np.mean(y_true)*100:.2f}%)")

5. Isolation Forest Model Training

from sklearn.ensemble import IsolationForest

# Calculate the expected contamination rate from the ground truth
expected_contamination = np.mean(y_true)
contamination_rate = 0.015  # Slightly above the true rate (~1.25%) to favor recall

# Configure and train Isolation Forest
model = IsolationForest(
    n_estimators=200,
    contamination=contamination_rate,
    random_state=42,
    n_jobs=-1
)
print(f"Training Isolation Forest with {contamination_rate*100:.1f}% expected contamination...")
model.fit(X_scaled)

# Make predictions and calculate anomaly scores
y_pred_raw = model.predict(X_scaled)                 # +1 = normal, -1 = anomaly
y_pred = (y_pred_raw == -1).astype(int)              # Convert to binary labels
anomaly_scores = model.decision_function(X_scaled)   # Lower score = more anomalous

print(f"Predicted anomalies: {np.sum(y_pred)} ({np.mean(y_pred)*100:.2f}%)")
print(f"Actual anomalies: {np.sum(y_true)} ({np.mean(y_true)*100:.2f}%)")

6. Model Evaluation and Performance Analysis

from sklearn.metrics import classification_report, confusion_matrix, precision_recall_fscore_support

# Calculate performance metrics for the anomaly class
# (with average='binary' the returned support is None, so it is discarded)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='binary', pos_label=1
)
print("Performance Metrics:")
print(f"• Precision: {precision:.3f}")
print(f"• Recall: {recall:.3f}")
print(f"• F1-Score: {f1:.3f}")

# Detailed classification report
print("\nDetailed Classification Report:")
print(classification_report(y_true, y_pred, target_names=['Normal', 'Noisy Neighbor']))

# Confusion matrix analysis
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
specificity = tn / (tn + fp)
false_positive_rate = fp / (fp + tn)

print("\nAdditional Metrics:")
print(f"• Accuracy: {accuracy:.3f}")
print(f"• Specificity: {specificity:.3f}")
print(f"• False Positive Rate: {false_positive_rate:.3f}")

# Visualize confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Normal', 'Noisy Neighbor'],
            yticklabels=['Normal', 'Noisy Neighbor'])
plt.title('Confusion Matrix for Noisy Neighbor Detection')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
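The binary predictions above depend on the contamination setting chosen in Step 5. To judge the ranking quality of the model independently of that threshold, you can sweep thresholds over the continuous anomaly scores (negated, since lower decision_function values mean more anomalous):

from sklearn.metrics import precision_recall_curve, average_precision_score

prec, rec, thresholds = precision_recall_curve(y_true, -anomaly_scores)
ap = average_precision_score(y_true, -anomaly_scores)

plt.figure(figsize=(8, 6))
plt.plot(rec, prec)
plt.title(f'Precision-Recall Curve (AP = {ap:.3f})')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.grid(True, alpha=0.3)
plt.show()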

7. Tenant-wise Analysis and Investigation

# Analyze detection performance by tenant
df_analysis = df.copy()
df_analysis['predicted_anomaly'] = y_pred
df_analysis['anomaly_score'] = anomaly_scores

tenant_analysis = []
for tenant in tenants:
    tenant_data = df_analysis[df_analysis['tenant_id'] == tenant]
    true_anomalies = tenant_data['is_truly_noisy'].sum()
    detected_anomalies = tenant_data['predicted_anomaly'].sum()
    if true_anomalies > 0:
        true_positives = ((tenant_data['is_truly_noisy'] == 1) &
                          (tenant_data['predicted_anomaly'] == 1)).sum()
        tenant_recall = true_positives / true_anomalies
        tenant_precision = true_positives / detected_anomalies if detected_anomalies > 0 else 0
    else:
        # No true anomalies: recall is undefined, and any detection is a false positive
        tenant_recall = np.nan
        tenant_precision = 0 if detected_anomalies > 0 else np.nan
    tenant_analysis.append({
        'tenant_id': tenant,
        'true_anomalies': true_anomalies,
        'detected_anomalies': detected_anomalies,
        'recall': tenant_recall,
        'precision': tenant_precision
    })

tenant_df = pd.DataFrame(tenant_analysis)
print("Tenant Detection Performance:")
print(tenant_df[tenant_df['true_anomalies'] > 0])  # Focus on the noisy tenants

# Visualize anomaly timeline for the primary noisy neighbor
noisy_tenant_data = df_analysis[df_analysis['tenant_id'] == noisy_neighbor_tenant]
plt.figure(figsize=(14, 6))
plt.plot(noisy_tenant_data['timestamp'], noisy_tenant_data['packets_per_second'],
         color='blue', alpha=0.7, label='Traffic')

# Highlight true vs detected anomalies
true_points = noisy_tenant_data[noisy_tenant_data['is_truly_noisy'] == 1]
plt.scatter(true_points['timestamp'], true_points['packets_per_second'],
            color='red', s=50, label='True Anomalies', zorder=5)
detected_points = noisy_tenant_data[noisy_tenant_data['predicted_anomaly'] == 1]
plt.scatter(detected_points['timestamp'], detected_points['packets_per_second'],
            color='orange', s=30, marker='x', label='Detected Anomalies', zorder=5)

plt.title(f'Anomaly Detection Timeline for {noisy_neighbor_tenant}')
plt.xlabel('Time Step')
plt.ylabel('Packets per Second')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
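For operations, sustained bursts matter more than isolated flagged samples. A minimal sketch that collapses per-sample detections into contiguous events per tenant (the gap tolerance of one time step is an assumption):

# Collapse consecutive detections into events per tenant
detections = df_analysis[df_analysis['predicted_anomaly'] == 1]
for tenant, grp in detections.groupby('tenant_id'):
    ts = grp['timestamp'].sort_values().to_numpy()
    # A new event starts wherever the gap between detections exceeds 1 step
    event_id = (np.diff(ts, prepend=ts[0]) > 1).cumsum()
    events = pd.Series(ts).groupby(event_id).agg(['min', 'max'])
    print(f"{tenant}: {len(events)} event(s)")
    print(events.rename(columns={'min': 'start', 'max': 'end'}))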

Success Criteria

- High Recall (>80%): Detect most noisy neighbor events to prevent performance degradation

- Balanced Precision (>70%): Minimize false alarms to avoid alert fatigue

- Low False Positive Rate (<10%): Maintain operational efficiency with reliable alerts

- Tenant Isolation: Successfully identify specific problematic tenants
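These criteria can be checked directly against the metrics computed in Step 6; a small sketch:

# Check the success criteria against the Step 6 metrics
criteria = {
    'Recall > 0.80': recall > 0.80,
    'Precision > 0.70': precision > 0.70,
    'False Positive Rate < 0.10': false_positive_rate < 0.10,
}
for name, passed in criteria.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")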

Next Steps & Extensions

1. Real-time Deployment: Integrate with cloud monitoring platforms for live anomaly detection

2. Multi-dimensional Analysis: Include CPU, memory, disk I/O, and network bandwidth metrics

3. Automated Response: Implement automatic resource throttling or tenant migration

4. Adaptive Thresholds: Use dynamic contamination rates based on historical patterns

5. Root Cause Analysis: Identify specific applications or processes causing noisy behavior

6. Predictive Alerts: Forecast potential noisy neighbor events before they impact performance
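As a starting point for item 4, one simple approach is to re-estimate contamination from a trailing window of recent detections and refit periodically. A hedged sketch, where the window length and the clipping bounds are assumptions:

# Sketch: adaptive contamination from a trailing window of detections
recent = df_analysis[df_analysis['timestamp'] >= df_analysis['timestamp'].max() - 200]
adaptive_rate = float(np.clip(recent['predicted_anomaly'].mean(), 0.005, 0.05))
adaptive_model = IsolationForest(n_estimators=200, contamination=adaptive_rate,
                                 random_state=42, n_jobs=-1).fit(X_scaled)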

Files Structure

034_Noisy_Neighbors_Detection_Cloud/
├── readme.md
├── noisy_neighbors_detection_cloud.ipynb
├── requirements.txt
└── data/
    └── (Generated multi-tenant traffic data)

Running the Project

1. Install required dependencies from requirements.txt

2. Execute the Jupyter notebook step by step

3. Analyze multi-tenant traffic patterns and baseline behavior

4. Train Isolation Forest model for unsupervised anomaly detection

5. Evaluate detection performance and investigate tenant-specific results

6. Deploy the model for real-time noisy neighbor monitoring (a minimal scoring-loop sketch follows)
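A minimal sketch of step 6, scoring incoming metrics in batches with the trained model and scaler from Steps 4-5. Here fetch_live_metrics is a hypothetical placeholder for your monitoring API, and the polling interval is an assumption:

import time

def score_batch(model, scaler, batch_df, feature_cols):
    """Score a batch of live metrics; returns a boolean anomaly mask."""
    X_live = scaler.transform(batch_df[feature_cols])
    return model.predict(X_live) == -1

# while True:
#     batch = fetch_live_metrics()    # hypothetical: pull current per-tenant metrics
#     flags = score_batch(model, scaler, batch, feature_cols)
#     for tenant in batch.loc[flags, 'tenant_id'].unique():
#         print(f"ALERT: possible noisy neighbor {tenant}")
#     time.sleep(60)                  # polling interval is an assumption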

This project demonstrates how unsupervised machine learning can address a core multi-tenancy challenge in cloud environments: automatically detecting performance-impacting tenants while keeping false alarms low enough for day-to-day operations.