Automated Server Health Monitor
📉 The Challenge
Managing 2 Data Centers and 3 Remote Sites involves checking disk space, RAM, and CPU load on over 50 servers daily.
- Manual Time: 45 minutes/day
- Risk: Human error (missing a full disk warning) leading to downtime.
💡 The Solution
I built a custom Python Automation Agent that runs as a cron job.
Key Features
- Agentless Architecture: Uses
Paramikoto SSH into servers without installing agents on them. - Smart Thresholds: Only alerts if usage > 90% (Disk) or > 95% (RAM).
- Instant Alerts: Sends a formatted webhook payload to the IT Operations Slack channel.
🛠 Tech Stack
- Language: Python 3.9
- Libraries:
paramiko,requests,dotenv - Deployment: Docker Container on a management node
🚀 Results
| Metric | Before | After |
|---|---|---|
| Check Time | 45 Mins | 30 Seconds |
| Coverage | Random/Daily | Every 15 Mins |
| Incidents | Reactive | Proactive |
Comments