📉 The Challenge

Managing 2 Data Centers and 3 Remote Sites involves checking disk space, RAM, and CPU load on over 50 servers daily.

  • Manual Time: 45 minutes/day
  • Risk: Human error (missing a full disk warning) leading to downtime.

💡 The Solution

I built a custom Python Automation Agent that runs as a cron job.

Key Features

  1. Agentless Architecture: Uses Paramiko to SSH into servers without installing agents on them.
  2. Smart Thresholds: Only alerts if usage > 90% (Disk) or > 95% (RAM).
  3. Instant Alerts: Sends a formatted webhook payload to the IT Operations Slack channel.

🛠 Tech Stack

  • Language: Python 3.9
  • Libraries: paramiko, requests, dotenv
  • Deployment: Docker Container on a management node

🚀 Results

MetricBeforeAfter
Check Time45 Mins30 Seconds
CoverageRandom/DailyEvery 15 Mins
IncidentsReactiveProactive