
Performance Monitor Agent

The Performance Monitor Agent tracks system performance metrics, identifies bottlenecks, and provides optimization recommendations.

📋 Overview

| Property | Value |
|----------|-------|
| Module | `src.agents.monitoring.performance_monitor_agent` |
| Class | `PerformanceMonitorAgent` |
| Author | UIP Team |
| Version | 1.0.0 |

🎯 Purpose

The Performance Monitor Agent provides:

  • Real-time performance tracking for all system components
  • Resource utilization monitoring (CPU, memory, I/O)
  • Latency analysis for API endpoints and database queries
  • Bottleneck identification and optimization recommendations
  • Historical trend analysis for capacity planning

📊 Metrics Collected

System Metrics

| Metric | Unit | Description |
|--------|------|-------------|
| `cpu_usage` | % | CPU utilization |
| `memory_usage` | MB | Memory consumption |
| `disk_io` | MB/s | Disk read/write rate |
| `network_io` | MB/s | Network throughput |
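`disk_io` and `network_io` are rates (MB/s). A common way to derive such a rate is to sample a cumulative byte counter once per collection interval and divide the change by the elapsed time. The sketch below assumes that approach; the function name `rate_mb_per_s` and the use of decimal megabytes (1 MB = 1,000,000 bytes) are illustrative choices, not taken from the agent's code.

```python
def rate_mb_per_s(prev_bytes: int, curr_bytes: int, elapsed_s: float) -> float:
    """Convert two samples of a cumulative byte counter into MB/s.

    Assumes decimal megabytes (1 MB = 1_000_000 bytes). A counter reset
    (curr < prev, e.g. after a restart) is treated as zero traffic for
    the interval instead of producing a negative rate.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:  # counter reset
        delta = 0
    return delta / 1_000_000 / elapsed_s


# Two samples of a byte counter taken 10 s apart
print(rate_mb_per_s(50_000_000, 75_000_000, 10.0))  # 2.5
```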

Application Metrics

| Metric | Unit | Description |
|--------|------|-------------|
| `request_latency` | ms | API response time |
| `request_throughput` | req/s | Requests per second |
| `error_rate` | % | Percentage of failed requests |
| `active_connections` | count | Concurrent connections |
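`request_throughput` and `error_rate` can both be computed from the requests observed during a time window. A minimal sketch, assuming each request is recorded as its HTTP status code and that "failed" means a 5xx response; the agent's own definition of a failed request may differ, and `summarize_window` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    throughput_rps: float  # requests per second
    error_rate_pct: float  # failed requests as a percentage of all requests


def summarize_window(status_codes: list[int], window_s: float) -> WindowStats:
    """Summarize the requests seen during a window of window_s seconds.

    Assumption: a request counts as failed when its status code is 5xx.
    """
    total = len(status_codes)
    failed = sum(1 for code in status_codes if 500 <= code <= 599)
    throughput = total / window_s
    error_rate = failed * 100 / total if total else 0.0
    return WindowStats(throughput, error_rate)


stats = summarize_window([200] * 95 + [503] * 5, window_s=10.0)
print(stats.throughput_rps, stats.error_rate_pct)  # 10.0 5.0
```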

Agent Metrics

| Metric | Unit | Description |
|--------|------|-------------|
| `agent_execution_time` | ms | Agent processing time |
| `entities_processed` | count | Entities per execution |
| `queue_depth` | count | Pending operations |
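The three agent metrics describe a single agent run: how long it took, how many entities it handled, and how much work is still waiting. The sketch below assumes the agent pulls work from a standard `queue.Queue` in batches; `run_agent_once`, the `process` callback, and the batch size are hypothetical.

```python
import queue
import time
from typing import Any, Callable, Dict


def run_agent_once(work: queue.Queue, process: Callable[[Any], None],
                   batch_size: int = 100) -> Dict[str, float]:
    """Process up to batch_size queued items and return metrics for the run."""
    start = time.perf_counter()
    processed = 0
    while processed < batch_size:
        try:
            item = work.get_nowait()
        except queue.Empty:
            break
        process(item)
        processed += 1
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {
        "agent_execution_time": elapsed_ms,  # ms
        "entities_processed": processed,     # count handled in this run
        "queue_depth": work.qsize(),         # count still pending
    }


pending: queue.Queue = queue.Queue()
for i in range(250):
    pending.put(i)
print(run_agent_once(pending, process=lambda item: None))
```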

🔧 Architecture

```text
┌───────────────────────────────────────────────┐
│           Performance Monitor Agent           │
├───────────────────────────────────────────────┤
│                                               │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  │
│  │  System   │  │    App    │  │   Agent   │  │
│  │  Metrics  │  │  Metrics  │  │  Metrics  │  │
│  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘  │
│        │              │              │        │
│        └──────────────┼──────────────┘        │
│                       ▼                       │
│               ┌───────────────┐               │
│               │  Aggregator   │               │
│               └───────┬───────┘               │
│                       │                       │
│        ┌──────────────┼──────────────┐        │
│        ▼              ▼              ▼        │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  │
│  │Time-Series│  │ Prometheus│  │   Alert   │  │
│  │    DB     │  │  Export   │  │  Engine   │  │
│  └───────────┘  └───────────┘  └───────────┘  │
│                                               │
└───────────────────────────────────────────────┘
```
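The architecture is a fan-in/fan-out pipeline: three collectors feed one aggregator, which forwards each merged snapshot to every output. A minimal sketch of that shape, in which collectors and sinks are plain callables standing in for the real System/App/Agent collectors and the Time-Series DB, Prometheus, and Alert Engine outputs (the `Aggregator` class and all names here are illustrative):

```python
from typing import Callable, Dict, List

Snapshot = Dict[str, float]
Collector = Callable[[], Snapshot]
Sink = Callable[[Snapshot], None]


class Aggregator:
    """Merge collector outputs into one snapshot and fan it out to sinks."""

    def __init__(self, collectors: List[Collector], sinks: List[Sink]) -> None:
        self.collectors = collectors
        self.sinks = sinks

    def collect_once(self) -> Snapshot:
        snapshot: Snapshot = {}
        for collect in self.collectors:
            snapshot.update(collect())  # later collectors win on key clashes
        for sink in self.sinks:
            sink(snapshot)
        return snapshot


# Stand-ins for the System / App / Agent metric collectors
def system() -> Snapshot:
    return {"cpu_usage": 42.0}


def app() -> Snapshot:
    return {"request_latency": 120.0}


def agent() -> Snapshot:
    return {"queue_depth": 3.0}


exported: List[Snapshot] = []  # stand-in for a time-series store
agg = Aggregator([system, app, agent], sinks=[exported.append, print])
agg.collect_once()
```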

🚀 Usage

Basic Monitoring

```python
from src.agents.monitoring.performance_monitor_agent import PerformanceMonitorAgent

monitor = PerformanceMonitorAgent()

# Start monitoring
monitor.start()

# Get current metrics
metrics = monitor.get_metrics()
print(f"CPU Usage: {metrics['cpu_usage']}%")
print(f"Memory Usage: {metrics['memory_usage']}MB")
print(f"Request Latency: {metrics['avg_latency']}ms")
```

Track Specific Operations

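```python
# Track API endpoint performance. `await` is only valid inside an
# async function, so the call is wrapped in one.
async def list_cameras():
    with monitor.track("api.cameras.list"):
        cameras = await get_cameras()
    return cameras


# Track a database query (neo4j and cypher stand for your Neo4j
# client and query string)
with monitor.track("db.neo4j.query"):
    results = neo4j.query(cypher)

# Get timing statistics
stats = monitor.get_stats("api.cameras.list")
print(f"Avg: {stats['avg_ms']}ms, P95: {stats['p95_ms']}ms")
```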
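A `track` context manager of this kind typically measures the wall-clock time spent in its block and stores it under the given name, and `get_stats` then summarizes the stored samples. The `TimingTracker` class below is a simplified stand-in, not the agent's implementation: it keeps every sample in memory and computes P95 with the nearest-rank method, whereas a production monitor would more likely use a bounded window or a histogram. The `record` method is a hypothetical helper for feeding in synthetic durations.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List


class TimingTracker:
    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    @contextmanager
    def track(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration in ms even if the block raised
            self._samples[name].append((time.perf_counter() - start) * 1000)

    def record(self, name: str, duration_ms: float) -> None:
        self._samples[name].append(duration_ms)

    def get_stats(self, name: str) -> Dict[str, float]:
        samples = sorted(self._samples[name])
        if not samples:
            raise KeyError(f"no samples recorded for {name!r}")
        rank = math.ceil(95 * len(samples) / 100)  # nearest-rank P95, 1-based
        return {
            "count": len(samples),
            "avg_ms": sum(samples) / len(samples),
            "p95_ms": samples[rank - 1],
        }


tracker = TimingTracker()
for ms in range(1, 101):  # synthetic durations of 1..100 ms
    tracker.record("api.cameras.list", float(ms))
print(tracker.get_stats("api.cameras.list"))
# {'count': 100, 'avg_ms': 50.5, 'p95_ms': 95.0}
```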

Custom Metrics

```python
# Register a custom metric
monitor.register_metric(
    name="active_websockets",
    type="gauge",
    description="Number of active WebSocket connections",
)

# Update the metric
monitor.set_metric("active_websockets", 42)

# Increment a counter
monitor.increment_metric("requests_total")
```
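The two metric types behave differently: a gauge holds the latest value and may go up or down, while a counter only ever increases. The in-memory `MetricRegistry` below illustrates those semantics using the same method names as the example; it is a hypothetical sketch, not the agent's code. The `value` accessor is an added helper for reading metrics back.

```python
from typing import Dict


class MetricRegistry:
    def __init__(self) -> None:
        self._types: Dict[str, str] = {}
        self._descriptions: Dict[str, str] = {}
        self._values: Dict[str, float] = {}

    # `type` mirrors the keyword used in the example above
    def register_metric(self, name: str, type: str, description: str = "") -> None:
        if type not in ("gauge", "counter"):
            raise ValueError(f"unsupported metric type: {type}")
        self._types[name] = type
        self._descriptions[name] = description
        self._values[name] = 0.0

    def set_metric(self, name: str, value: float) -> None:
        # Gauges may be set to any value, up or down
        if self._types[name] != "gauge":
            raise TypeError(f"{name} is not a gauge")
        self._values[name] = value

    def increment_metric(self, name: str, amount: float = 1.0) -> None:
        # Counters are monotonic: only non-negative increments are allowed
        if self._types[name] != "counter":
            raise TypeError(f"{name} is not a counter")
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self._values[name] += amount

    def value(self, name: str) -> float:
        return self._values[name]


registry = MetricRegistry()
registry.register_metric("active_websockets", type="gauge")
registry.register_metric("requests_total", type="counter")
registry.set_metric("active_websockets", 42)
registry.increment_metric("requests_total")
registry.increment_metric("requests_total")
print(registry.value("active_websockets"), registry.value("requests_total"))  # 42 2.0
```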

⚙️ Configuration

```yaml
# config/performance_monitor_config.yaml
performance_monitor:
  enabled: true
  collection_interval_seconds: 10

  # System metrics to collect
  system_metrics:
    - cpu_usage
    - memory_usage
    - disk_io
    - network_io

  # Application metrics
  app_metrics:
    track_endpoints: true
    track_database_queries: true
    track_agent_execution: true

  # Alerting thresholds
  thresholds:
    cpu_warning: 70          # % CPU
    cpu_critical: 90         # % CPU
    memory_warning: 80
    memory_critical: 95
    latency_warning_ms: 500
    latency_critical_ms: 2000

  # Export configuration
  export:
    prometheus:
      enabled: true
      port: 9091
    timeseries_db:
      enabled: true
      url: http://localhost:8086
      database: uip_metrics
```
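Each entry under `thresholds` defines a warning and a critical level. One straightforward reading, assumed here, is that a value at or above the critical level is critical, at or above the warning level is a warning, and anything lower is OK; whether the agent compares with `>` or `>=` is not specified. The `THRESHOLDS` dict carries the YAML values, regrouped by metric for readability, so no YAML parser is needed; `classify` is a hypothetical helper.

```python
from typing import Dict

# Values from performance_monitor.thresholds, regrouped by metric
THRESHOLDS: Dict[str, Dict[str, float]] = {
    "cpu": {"warning": 70, "critical": 90},
    "memory": {"warning": 80, "critical": 95},
    "latency_ms": {"warning": 500, "critical": 2000},
}


def classify(metric: str, value: float) -> str:
    """Return 'critical', 'warning', or 'ok' for a metric value."""
    levels = THRESHOLDS[metric]
    if value >= levels["critical"]:
        return "critical"
    if value >= levels["warning"]:
        return "warning"
    return "ok"


print(classify("cpu", 75))           # warning
print(classify("latency_ms", 2500))  # critical
print(classify("memory", 40))        # ok
```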

📈 Dashboard Integration

Grafana Queries

```promql
# Average request latency
rate(request_latency_sum[5m]) / rate(request_latency_count[5m])

# CPU usage by service
avg(cpu_usage) by (service)

# Request throughput
sum(rate(requests_total[1m])) by (endpoint)

# Error rate
sum(rate(requests_failed_total[5m])) / sum(rate(requests_total[5m])) * 100
```
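Why the average-latency query works: `request_latency_sum` and `request_latency_count` are cumulative, so `rate()` over a window is, roughly, each counter's increase divided by the window length. The window length cancels in the ratio, leaving total latency added divided by requests added, which is the mean latency of requests in that window. A worked example with made-up counter samples five minutes apart, ignoring the extrapolation and counter-reset handling that Prometheus applies:

```python
WINDOW_S = 300  # the [5m] range

# Cumulative counter values at the start and end of the window (illustrative)
sum_start, sum_end = 1_200_000.0, 1_245_000.0  # total latency in ms
count_start, count_end = 10_000, 10_300        # number of requests

rate_sum = (sum_end - sum_start) / WINDOW_S        # ms of latency per second
rate_count = (count_end - count_start) / WINDOW_S  # requests per second
avg_latency_ms = rate_sum / rate_count

print(avg_latency_ms)  # 150.0, i.e. 45,000 ms spread over 300 requests
```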

Sample Dashboard

```json
{
  "title": "UIP Performance Dashboard",
  "panels": [
    {
      "title": "Request Latency (P95)",
      "type": "graph",
      "targets": [
        {
          "expr": "histogram_quantile(0.95, sum(rate(request_latency_bucket[5m])) by (le))"
        }
      ]
    },
    {
      "title": "Throughput",
      "type": "stat",
      "targets": [
        {
          "expr": "sum(rate(requests_total[1m]))"
        }
      ]
    }
  ]
}
```
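`histogram_quantile` estimates a quantile from cumulative bucket counts (the `le` buckets of `request_latency_bucket`). It finds the bucket that contains the target rank and interpolates linearly inside it, which assumes observations are spread evenly within each bucket. The function below is a simplified re-implementation of that idea for a single histogram, for intuition only. The bucket bounds and counts are made up, and some edge cases Prometheus handles (non-monotonic buckets, buckets with non-positive bounds) are omitted. Applying `rate()` first does not change the interpolation, because every bucket is scaled by the same window length.

```python
import bisect
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with +Inf, like
    Prometheus `le` buckets.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    bounds = [b for b, _ in buckets]
    counts = [c for _, c in buckets]
    total = counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank
    i = bisect.bisect_left(counts, rank)
    if i == len(buckets) - 1:
        # Rank falls in the +Inf bucket: return the highest finite bound
        return bounds[-2]
    start = bounds[i - 1] if i > 0 else 0.0
    prev = counts[i - 1] if i > 0 else 0.0
    in_bucket = counts[i] - prev
    # Linear interpolation inside bucket i
    return start + (bounds[i] - start) * (rank - prev) / in_bucket


buckets = [(100, 50), (250, 80), (500, 95), (1000, 100), (math.inf, 100)]
print(histogram_quantile(0.5, buckets))   # 100.0
print(histogram_quantile(0.95, buckets))  # 500.0
```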

🛡️ Performance Alerts

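```python
import gc
import logging

logger = logging.getLogger(__name__)


def notify_ops(message: str) -> None:
    # Placeholder: forward to your paging or chat integration
    logger.warning(message)

# Warn when average API latency exceeds 1000 ms
monitor.add_alert(
    name="high_latency",
    condition=lambda m: m['avg_latency'] > 1000,
    severity="warning",
    action=lambda: notify_ops("High API latency detected"),
)

# memory_usage is reported in MB, so compare it with an MB budget.
# MEMORY_LIMIT_MB is an example value; set it to the memory available.
MEMORY_LIMIT_MB = 8192
monitor.add_alert(
    name="memory_critical",
    condition=lambda m: m['memory_usage'] > 0.95 * MEMORY_LIMIT_MB,
    severity="critical",
    action=lambda: gc.collect(),  # force a garbage-collection pass
)
```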
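An alert engine behind `add_alert` has to evaluate every condition against each new metrics snapshot and run the matching action. The `AlertEngine` below is a sketch under one assumed policy: an alert fires once when its condition becomes true and re-arms only after the condition clears, so a sustained breach does not call `notify_ops` on every collection cycle. The agent may use different firing semantics; the class and its `evaluate` method are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Alert:
    name: str
    condition: Callable[[Metrics], bool]
    severity: str
    action: Callable[[], None]
    firing: bool = False


@dataclass
class AlertEngine:
    alerts: List[Alert] = field(default_factory=list)

    def add_alert(self, name: str, condition: Callable[[Metrics], bool],
                  severity: str, action: Callable[[], None]) -> None:
        self.alerts.append(Alert(name, condition, severity, action))

    def evaluate(self, metrics: Metrics) -> List[str]:
        """Check all alerts; return the names of alerts that fired this cycle."""
        fired = []
        for alert in self.alerts:
            breached = bool(alert.condition(metrics))
            if breached and not alert.firing:
                alert.action()
                fired.append(alert.name)
            alert.firing = breached  # re-arm once the condition clears
        return fired


calls: List[str] = []
engine = AlertEngine()
engine.add_alert("high_latency", lambda m: m["avg_latency"] > 1000,
                 "warning", lambda: calls.append("high_latency"))

print(engine.evaluate({"avg_latency": 1500}))  # ['high_latency']
print(engine.evaluate({"avg_latency": 1600}))  # [] (still firing, not repeated)
print(engine.evaluate({"avg_latency": 200}))   # [] (cleared)
print(engine.evaluate({"avg_latency": 1200}))  # ['high_latency'] (fires again)
```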

See the complete agents reference for all available agents.