MCP Servers for DevOps: Complete Guide to Model Context Protocol in 2026

Remember when connecting Claude to your Kubernetes cluster meant writing 3,000 lines of custom API wrappers? Dark times.
Then Anthropic dropped Model Context Protocol (MCP) in late 2024, and suddenly the whole game changed. Now Claude can kubectl your clusters, Cursor can terraform your infrastructure, and AI agents can actually ship code to production instead of just suggesting it.
But here's the catch: most teams are doing MCP wrong. They're uploading AWS credentials to SaaS platforms, giving third-party clouds cluster-admin access, and basically recreating the security nightmare that recently got 30 organizations breached.
This guide is different. We're going to show you how to run MCP servers on YOUR infrastructure, where credentials never leave your network and jailbreaks can't escape your security boundaries.
By the end, you'll know how to give AI agents secure access to Kubernetes, Terraform, AWS, GitHub, and 30+ other DevOps tools without sending your keys to San Francisco.
What is MCP? (And Why Should DevOps Care?)
MCP is basically USB for AI. Before USB, every device needed its own proprietary connector. Before MCP, every AI agent needed custom integrations for every tool.
Now? You write one MCP server for Kubernetes, and it works with Claude, Cursor, Windsurf, or whatever AI assistant you're using. No rebuilding integrations when you switch models.
The Technical Bits (Explained Like You're Explaining to Your CEO)
MCP has three pieces:
- MCP Client: The AI app (Claude Desktop, Cursor, etc.) that wants to do stuff
- MCP Server: Lightweight program that exposes tools, data, and prompts
- Transport: How they talk - usually stdio (local) or HTTP (remote)
For DevOps, stdio is the move. It runs locally, credentials never hit the network, and there's nothing to expose to the internet.
Why Self-Hosted MCP Servers Are the Only Sane Option
Last week, Chinese hackers jailbroke Claude to infiltrate 30 organizations. The AI did 80-90% of the attack autonomously. They didn't hack Anthropic - they just asked Claude nicely.
Now imagine if those hackers had targeted a company using a SaaS AI agent with full AWS credentials. One clever prompt and your production environment becomes a crypto mining operation.
Self-hosted MCP changes the threat model entirely:
- Credentials stay in YOUR Kubernetes cluster, protected by YOUR RBAC
- Jailbreaks are contained by network policies - the agent can't reach beyond its pod
- Audit logs go to YOUR SIEM, not some vendor's S3 bucket
- Compliance teams actually sleep at night
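To make the containment point concrete, a default-deny egress policy like the following keeps a jailbroken agent from reaching anything but DNS and the cluster API. This is a hypothetical sketch: the `ai-agents` namespace, the `app: mcp-agent` label, and the API server CIDR are all illustrative.

```yaml
# Hypothetical example: restrict egress from the agent pod to DNS and
# the Kubernetes API server. Names, labels, and CIDR are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-agent-egress
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app: mcp-agent
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53            # DNS resolution
    - to:
        - ipBlock:
            cidr: 10.0.0.1/32 # cluster API server address (illustrative)
      ports:
        - protocol: TCP
          port: 443
```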
Actual quote from a fintech CISO: "I can explain a breach in my infrastructure to the board. I cannot explain why I gave our production keys to a startup."
15 MCP Servers Every DevOps Team Should Know
There are 5,000+ MCP servers on GitHub now. Most are demos. Here are the ones actually running in production:
Kubernetes & Containers
kubectl-mcp-server - Ask Claude "why is nginx crashing?" and it checks pod events, logs, resource limits, node pressure, and recent deployments. Like having an SRE on speed dial.
k8m (multi-cluster) - Manages dev, staging, and prod from one interface. 50+ built-in tools for logs, metrics, debugging. Used by teams running 10+ clusters.
Docker MCP - Container image operations, registry management, local builds. "Build and push this to ECR" actually works.
Infrastructure as Code
Terraform MCP - "Apply terraform for staging" is all you need. It handles plan, shows what'll change, waits for approval (if configured), then applies. State management included.
Ansible MCP - Run playbooks via natural language. "Back up all prod databases" executes the playbook, shows output, alerts if anything fails.
Pulumi MCP - For teams that write infrastructure in actual code. TypeScript/Python/Go support.
Cloud Providers
AWS MCP (Official) - Lambda, ECS, EKS, S3, EC2, RDS... everything. "Show me EC2 instances over $500/month unused for 30 days" = instant FinOps.
GCP MCP - Compute Engine, Cloud Run, GKE. Works with service accounts and workload identity.
Azure DevOps MCP (Microsoft Official) - Work items, PRs, builds, test plans. The first AI integration Microsoft shipped that doesn't suck.
CI/CD & Git
GitHub MCP - Create PRs, review code, manage issues. "Open a PR for the cost-optimization branch" actually creates a PR with a proper description.
GitLab MCP - Same but for self-hosted GitLab. Merge request automation, pipeline triggers, security scans.
Jenkins MCP - Trigger builds, check job status, fetch logs. "Did the prod deployment finish?" gets a real answer.
Monitoring & Observability
Prometheus MCP - "Show CPU usage for payment-api over the last hour" returns actual graphs. Query language? Who needs it.
Datadog MCP - Search logs, create monitors, analyze traces. Works with existing Datadog auth.
Grafana MCP - Generate dashboards from natural language. "Create a dashboard for RDS performance" builds the actual dashboard.
Let's Build an MCP Server (Actually Simple Version)
Forget the hello-world demos. Let's build something useful: an MCP server that gives Claude access to your EC2 instances.
Install the SDK
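The original install snippet isn't shown here, so assuming the official MCP Python SDK on PyPI (`mcp`, with the CLI extras) plus `boto3` for the AWS calls, it comes down to one line:

```shell
# Official MCP Python SDK (with CLI extras) plus boto3 for AWS API calls
pip install "mcp[cli]" boto3
```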
Write the Server (ec2_mcp.py)
Hook It Up to Claude Desktop
Edit `~/Library/Application Support/Claude/claude_desktop_config.json`:
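A minimal config along these lines works; the path to `ec2_mcp.py` and the AWS profile name are illustrative:

```json
{
  "mcpServers": {
    "ec2": {
      "command": "/usr/bin/python3",
      "args": ["/Users/you/mcp/ec2_mcp.py"],
      "env": {
        "AWS_PROFILE": "default"
      }
    }
  }
}
```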
Restart Claude Desktop. Now try: "List all EC2 instances in us-west-2"
Claude calls your MCP server → server runs boto3 with YOUR credentials → results come back. Your AWS keys never left your machine.
Real Story: How MCP Saved On-Call at 2 AM
Here's what actually happened last Tuesday:
2:17 AM: PagerDuty fires. The `payments` namespace is crash-looping.
Without MCP (the old way):
- Wake up, grab laptop
- SSH to jump box (find which one has kubectl...)
- `kubectl get pods -n payments` - 3 pods crashing
- `kubectl describe pod payment-processor-abc123` - scroll through events
- `kubectl logs payment-processor-abc123` - OOMKilled errors
- Check deployment yaml for memory limits
- Realize limit is 256Mi, usage is 400Mi
- Update deployment, apply, watch rollout
Time: 25 minutes. Back to sleep at 2:42 AM, exhausted.
With MCP (the new way):
Open Claude Desktop:
"Debug the payments namespace. Find crash-looping pods and tell me why."
Claude (via k8s MCP server):
- Lists pods in payments namespace
- Identifies 3 crash-looping pods
- Fetches pod events and logs
- Analyzes: "OOMKilled - memory limit 256Mi exceeded"
- Checks actual usage: averaging 380Mi
- Recommends: "Increase memory limit to 512Mi"
Time: 45 seconds. You review the fix, apply it, back to sleep at 2:19 AM.
That's the difference. Same problem. 45 seconds vs 25 minutes.
Security: MCP vs SaaS AI Agents
Let's be real about the security trade-offs:
| Feature | Self-Hosted MCP | SaaS AI Agent |
|---|---|---|
| Where credentials live | Your Kubernetes secrets | Vendor's database |
| Where code runs | Your infrastructure | Vendor's cloud |
| Network exposure | Zero (stdio transport) | HTTPS to vendor APIs |
| Jailbreak blast radius | Limited to pod RBAC | Full AWS access |
| Audit logs | Your SIEM/Splunk/DataDog | Vendor logs (maybe) |
| Compliance | SOC2/HIPAA friendly | Requires vendor BAA |
| Data residency | Your region/VPC | Vendor's region |
The key insight: With self-hosted MCP, a jailbreak can only access what YOUR security policies allow. With SaaS, a jailbreak gets everything you uploaded.
Production Deployment Patterns
Pattern 1: Developer Laptop (Getting Started)
MCP servers run on your MacBook. Credentials from `~/.kube/config` and `~/.aws/credentials`. Perfect for testing.
Pro: Easy setup, zero infra.
Con: Everyone has different configs. No shared context.
Pattern 2: Bastion Host (Production Teams)
MCP servers run on a jump box. Team connects via SSH. Shared audit logs, centralized RBAC.
Pro: Shared context, proper logging, one config to rule them all.
Con: Need to maintain the bastion.
Pattern 3: Kubernetes Sidecar (Enterprise)
Each team gets a pod with MCP servers. Network policies enforce boundaries. Service accounts limit what agents can do.
Pro: Multi-tenant, scales horizontally, fine-grained security.
Con: More complex to set up.
Best Practices (Don't Skip These)
1. Start Read-Only
Give your MCP servers the LEAST permissions possible. If the agent only reads K8s pods, don't give it cluster-admin.
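For a Kubernetes-backed agent, that translates into a read-only Role along these lines (namespace and names are hypothetical):

```yaml
# Hypothetical read-only Role for an agent that only inspects pods.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: mcp-agent-readonly
  namespace: payments
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "events"]
    verbs: ["get", "list", "watch"]   # no create/update/delete
```

Bind it to the agent's service account and nothing else; widen the verbs only after you trust the workflow.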
2. Log Everything
Every tool call should hit your audit system:
- Who triggered it
- Tool name + arguments
- Timestamp + outcome
- Any errors
When (not if) something breaks, you'll need that audit trail.
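One lightweight way to capture those fields is a decorator wrapped around every tool function. This is a sketch under assumptions: the `mcp.audit` logger name and the stand-in `list_pods` tool are invented for illustration.

```python
import functools
import json
import logging
import time

audit = logging.getLogger("mcp.audit")  # route this logger to your SIEM shipper


def audited(tool):
    """Wrap an MCP tool so every call logs name, arguments, timing, and outcome."""
    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        start = time.time()
        record = {"tool": tool.__name__, "args": args, "kwargs": kwargs}
        try:
            result = tool(*args, **kwargs)
            record["outcome"] = "ok"
            return result
        except Exception as exc:
            record["outcome"] = "error"
            record["error"] = repr(exc)
            raise
        finally:
            record["duration_s"] = round(time.time() - start, 3)
            audit.info(json.dumps(record, default=str))
    return wrapper


@audited
def list_pods(namespace: str) -> list[str]:
    # Stand-in for a real Kubernetes API call.
    return [f"{namespace}-pod-1", f"{namespace}-pod-2"]
```

Who triggered the call is whatever identity your MCP client runs as; include it in `record` once you have it in scope.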
3. Use stdio for Sensitive Stuff
For production operations (terraform apply, database migrations), use stdio transport. No network exposure = smaller attack surface.
4. Version Control Your MCP Configs
Store MCP server configs in Git alongside your infra code. Makes them reproducible and PR-reviewable.
Common Issues (And How to Fix Them)
"MCP server won't connect"
- Use absolute paths: `"command": "/usr/bin/python3"` not just `python3`
- Check permissions: `chmod +x mcp_server.py`
- Test manually: run `python3 mcp_server.py` - a stdio server should start and wait silently on stdin without crashing
"Permission denied when calling tools"
- Check AWS creds: `aws sts get-caller-identity`
- Verify K8s context: `kubectl config current-context`
- RBAC permissions: Does the service account actually have access?
"Tools are slow"
- Cache frequently accessed data (cluster state, etc.)
- Use async/await for I/O operations
- Add timeouts so tools don't hang forever
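The three fixes above fit in a few lines: a TTL cache in front of the backend call, `async` I/O, and a hard timeout. The TTL and timeout values here are arbitrary placeholders, and `fetch_cluster_state` is a stand-in for a real API call.

```python
import asyncio
import time

_cache: dict[tuple, tuple[float, object]] = {}
CACHE_TTL = 30.0      # seconds; arbitrary
TOOL_TIMEOUT = 10.0   # seconds; arbitrary


async def cached_tool_call(key: tuple, fetch, ttl: float = CACHE_TTL):
    """Return a cached result if still fresh, else await fetch() with a timeout."""
    now = time.monotonic()
    hit = _cache.get(key)
    if hit and now - hit[0] < ttl:
        return hit[1]
    # Timeout so a hung backend can't stall the whole agent session.
    result = await asyncio.wait_for(fetch(), timeout=TOOL_TIMEOUT)
    _cache[key] = (now, result)
    return result


async def demo():
    async def fetch_cluster_state():
        await asyncio.sleep(0.01)  # stand-in for a slow API call
        return {"nodes": 3}

    first = await cached_tool_call(("cluster",), fetch_cluster_state)
    second = await cached_tool_call(("cluster",), fetch_cluster_state)  # from cache
    return first, second
```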
CloudShip Station: MCP Runtime That Actually Works
Look, you can cobble together MCP servers yourself. Install them one by one, write config files, debug permissions, manage credentials...
Or you can use [Station](https://cloudshipai.com/station) and skip the pain:
- 30+ pre-built MCP servers - K8s, Terraform, AWS, GitHub, Datadog, all configured and tested
- 30-second install - `npx @cloudship/station install` and you're running
- Multi-cluster support - Manage dev/staging/prod from one agent
- Built-in audit logging - Every tool call logged with context
- Self-hosted - Runs on YOUR infra, credentials stay local
- Open source - 311+ GitHub stars, MIT license
We built Station because we were tired of spending 3 hours configuring MCP servers every time we wanted to add a new tool. Now it takes 30 seconds.
What's Next for MCP in DevOps
MCP launched in late 2024, and already:
- AWS shipped official MCP servers for Lambda, ECS, and EKS
- Microsoft integrated it into Azure DevOps (actually works, surprisingly)
- Kubernetes SIG is exploring MCP for AI-assisted cluster ops
- HashiCorp added MCP to Terraform Cloud (beta)
In the next year, expect:
- Every major vendor ships MCP servers (monitoring, CI/CD, cloud providers)
- Multi-agent coordination (Terraform + K8s + Datadog working together)
- Policy enforcement layers (AI actions validated before execution)
- MCP gateways (centralized auth, rate limiting, audit logs)
The teams shipping AI agents to production RIGHT NOW are all self-hosting. Not some of them. All of them.
Bottom Line
MCP is the missing piece that makes AI agents practical for DevOps:
- Security - Credentials stay local, jailbreaks are contained
- Standardization - Write once, use everywhere
- Auditability - Full visibility into agent actions
- Flexibility - Works with K8s, Terraform, AWS, GitHub, 30+ tools
You can build custom MCP servers (we showed you how). Or use a runtime like [Station](https://cloudshipai.com/station) to skip the setup and start shipping.
Either way, if you're giving AI agents access to production, self-hosting isn't optional. It's the only sane way to do it.
Ready to get started?
- Install Station - 30+ MCP servers, 30-second setup
- Browse awesome-devops-mcp-servers - Full list of DevOps MCPs
- Read the official MCP docs - Deep dive into the protocol
---
Questions about MCP in your infrastructure? [Hit us up](https://cloudshipai.com/contact) or try [Station](https://cloudshipai.com/station) for free.