Understanding how all the pieces fit together in the MCP ecosystem
⭐ Star this repository to support this work
By the end of this chapter, you'll understand how your AI assistant talks to your Kubernetes cluster through an MCP server. Your MCP server acts as a translator between AI requests and Kubernetes operations.
Your AI assistant is smart but doesn't know your infrastructure. The MCP server bridges this gap, turning AI questions like "What pods are running?" into actual Kubernetes API calls that return real data.
- 2.1 The Complete Flow
- 2.2 MCP Protocol: The Foundation
- 2.3 Understanding the Client-Server Side
- 2.4 Resources: Making Your Infrastructure Discoverable
- 2.5 Tools: Where Actions Happen
- 2.6 Error Handling: When Things Go Wrong
- 2.7 Protocol Compliance: Following the Rules
- 2.8 Debugging MCP Communications
- 2.9 Hands-On Lab: Exploring MCP in Action
- 2.10 Real-World Architecture Patterns
- 2.11 What's Next?
Here's what happens when you ask an AI to scale your application:
```mermaid
graph TB
  subgraph "Your Chat Interface"
    A["You: 'Scale my web app to 5 replicas'"]
    B["AI Assistant: Claude/ChatGPT"]
  end
  subgraph "MCP Layer"
    C[MCP Client]
    D[Your Custom MCP Server]
  end
  subgraph "Infrastructure"
    E[Kubernetes API]
    F[Your Cluster]
  end
  A --> B
  B --> C
  C --> D
  D --> E
  E --> F
  F --> E
  E --> D
  D --> C
  C --> B
  B --> A
```
The flow:
- You ask: "Scale my web app to 5 replicas"
- AI decides: "I need the scale tool with these parameters"
- MCP Client: Formats the request using MCP protocol
- Your MCP Server: Validates request, calls Kubernetes API
- Kubernetes: Performs the scaling operation
- Response flows back: Success confirmation returns through the chain
Once set up, the AI handles complex requests like "Show me all failing pods and their logs" without custom code for each question.
MCP uses JSON-RPC 2.0 for communication. It's a standard way for programs to exchange JSON messages.
Initialization Messages
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "resources": {},
      "tools": {}
    }
  }
}
```

Resource Messages (discovering what's available)
```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "resources/list",
  "params": {}
}
```

Tool Messages (taking actions)
```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "scale_deployment",
    "arguments": {
      "namespace": "default",
      "name": "web-app",
      "replicas": 5
    }
  }
}
```

Every message follows this pattern, making it predictable and debuggable.
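Because every message shares the same envelope, building one programmatically is trivial. Here's a minimal Python sketch (the `make_request` helper is ours, not part of any MCP SDK):

```python
import json

def make_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request envelope as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": params or {},
    }

# The same tools/call request as the example above
req = make_request(3, "tools/call", {
    "name": "scale_deployment",
    "arguments": {"namespace": "default", "name": "web-app", "replicas": 5},
})
print(json.dumps(req))
```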
When an AI assistant wants to interact with your Kubernetes cluster, there's a specific sequence of messages. Understanding this flow helps you debug issues and build better MCP servers.
Every MCP session starts with a handshake where both sides declare their capabilities:
```mermaid
sequenceDiagram
  participant AI as AI Assistant
  participant MC as MCP Client
  participant MS as MCP Server
  Note over AI,MS: Session Startup
  AI->>MC: I want to manage Kubernetes
  MC->>MS: initialize
  MS->>MC: Here's what I can do: resources, tools
  MC->>AI: Server ready with these capabilities
  Note over AI,MS: Normal Operation
  AI->>MC: Show me all pods
  MC->>MS: resources/list (type: pod)
  MS->>MC: [pod1, pod2, pod3...]
  MC->>AI: Here are your pods
```
When your MCP server advertises capabilities:
Resources = Things you can query
- "What pods are running?"
- "Show me deployment details"
- "List all services"
Tools = Actions you can take
- "Scale this deployment to 3 replicas"
- "Restart all pods"
- "Create a new secret"
Your Kubernetes MCP server needs both capabilities since you want the AI to discover information and take actions.
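An initialize response advertising both capability types might look like the following Python sketch (the `serverInfo` name and version are placeholders, not a required value):

```python
import json

def initialize_response(req_id):
    """Sketch of an initialize result advertising both capability types."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "result": {
            "protocolVersion": "2024-11-05",
            "capabilities": {
                "resources": {},  # the AI can discover cluster state
                "tools": {},      # the AI can take actions
            },
            "serverInfo": {"name": "k8s-mcp-server", "version": "0.1.0"},
        },
    }

print(json.dumps(initialize_response(1), indent=2))
```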
MCP messages can be carried over different transports. The three options you'll encounter most often:
```mermaid
graph LR
  subgraph "Transport Options"
    A[stdio - Simple pipes]
    B[HTTP - Web requests]
    C[WebSocket - Real-time]
  end
  subgraph "Best for..."
    D[Local development<br/>Simple scripts]
    E[Production services<br/>Load balancing]
    F[Real-time updates<br/>Live monitoring]
  end
  A --> D
  B --> E
  C --> F
```
We'll start with stdio since it's simple and works well with VS Code and GitHub Copilot. Later chapters will cover HTTP for production deployments.
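The stdio transport is just newline-delimited JSON-RPC over stdin/stdout. A minimal Python sketch of the server-side read loop (the `handle` function here only answers `ping`; a real server would route every MCP method):

```python
import json
import sys

def serve_stdio(handle):
    """Minimal stdio transport loop: one JSON-RPC message per line."""
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        reply = handle(msg)
        if reply is not None:  # notifications get no response
            sys.stdout.write(json.dumps(reply) + "\n")
            sys.stdout.flush()

def handle(msg):
    """Toy dispatcher: respond to ping, ignore everything else."""
    if msg.get("method") == "ping":
        return {"jsonrpc": "2.0", "id": msg["id"], "result": {}}
    return None
```

Calling `serve_stdio(handle)` would block reading stdin, which is exactly how an MCP client launches and drives a stdio server.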
Resources are like a catalog of things the AI can ask about. For Kubernetes, this means pods, services, deployments, and everything else in your cluster.
Here's what happens when an AI asks "What deployments are running in the production namespace?":
```mermaid
sequenceDiagram
  participant AI as AI Assistant
  participant MC as MCP Client
  participant MS as MCP Server
  participant K8s as Kubernetes API
  AI->>MC: Show me production deployments
  MC->>MS: resources/list
  Note over MS: Filter for deployments<br/>in production namespace
  MS->>K8s: GET /apis/apps/v1/namespaces/production/deployments
  K8s->>MS: [deployment list]
  MS->>MC: MCP resource list response
  MC->>AI: Formatted deployment information
```
Your MCP server needs to present Kubernetes objects in a way that's useful for AI:
```json
{
  "uri": "k8s://pod/production/web-app-7d8f9c-xyz123",
  "name": "web-app-7d8f9c-xyz123",
  "description": "Pod: web-app-7d8f9c-xyz123 in production namespace",
  "mimeType": "application/json"
}
```

Why this structure works:
- URI: Unique identifier the AI can reference later
- Name: Human-readable identifier
- Description: Context that helps the AI understand what this is
- MimeType: Tells the client how to interpret the data
Design resource descriptions for AI understanding. Instead of just "nginx-deployment", use "nginx-deployment (3/3 replicas ready) in production namespace". The AI can now understand health and context without additional queries.
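A Python sketch of this pattern, mirroring the `k8s://` URI scheme used above (the helper name is ours):

```python
def describe_deployment(name, namespace, ready, desired):
    """Pack health and context into the resource description so the AI
    doesn't need a follow-up query to understand what it's looking at."""
    return {
        "uri": f"k8s://deployment/{namespace}/{name}",
        "name": name,
        "description": f"{name} ({ready}/{desired} replicas ready) in {namespace} namespace",
        "mimeType": "application/json",
    }

print(describe_deployment("nginx-deployment", "production", 3, 3)["description"])
# nginx-deployment (3/3 replicas ready) in production namespace
```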
Tools let the AI actually do things in your infrastructure. Each tool is a function the AI can call with parameters.
Good MCP tools follow these rules:
- Clear purpose: Each tool does one thing well
- Safe defaults: Tools should be hard to use dangerously
- Rich feedback: Always tell the AI what actually happened
Here's what happens when the AI decides to scale a deployment:
```mermaid
sequenceDiagram
  participant AI as AI Assistant
  participant MC as MCP Client
  participant MS as MCP Server
  participant K8s as Kubernetes API
  AI->>MC: Scale web-app to 5 replicas
  MC->>MS: tools/call "scale_deployment"
  Note over MS: Validate parameters<br/>Check permissions<br/>Verify deployment exists
  MS->>K8s: PATCH deployment/web-app scale=5
  K8s->>MS: Scale operation result
  MS->>MC: Success + current status
  MC->>AI: "Scaled web-app to 5 replicas. All pods are ready."
```
Your tools need to define what parameters they accept. This example balances flexibility with safety:
```json
{
  "name": "scale_deployment",
  "description": "Scale a Kubernetes deployment to specified replica count",
  "inputSchema": {
    "type": "object",
    "properties": {
      "namespace": {
        "type": "string",
        "description": "Kubernetes namespace",
        "default": "default"
      },
      "name": {
        "type": "string",
        "description": "Deployment name"
      },
      "replicas": {
        "type": "integer",
        "minimum": 0,
        "maximum": 50,
        "description": "Target replica count"
      }
    },
    "required": ["name", "replicas"]
  }
}
```

Design decisions:
- Default namespace: Reduces errors for simple operations
- Replica limits: Prevents accidental massive scaling
- Required fields: Forces the AI to provide essential information
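A schema alone isn't enough; the server should enforce the same rules before touching the cluster. A Python sketch that mirrors the schema above (the function name is ours):

```python
def validate_scale_args(args):
    """Server-side validation mirroring the inputSchema:
    apply the namespace default, enforce replica bounds."""
    errors = []
    namespace = args.get("namespace", "default")  # safe default
    name = args.get("name")
    replicas = args.get("replicas")
    if not name:
        errors.append("'name' is required")
    if not isinstance(replicas, int):
        errors.append("'replicas' must be an integer")
    elif not 0 <= replicas <= 50:
        errors.append("'replicas' must be between 0 and 50")
    return (namespace, name, replicas), errors

_, errs = validate_scale_args({"name": "web-app", "replicas": 500})
print(errs)  # ["'replicas' must be between 0 and 50"]
```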
In production, things break. Your MCP server needs to handle failures gracefully and give the AI useful information to help users.
```mermaid
graph TD
  A[MCP Request] --> B{Validation}
  B -->|Invalid| C[Client Error 400]
  B -->|Valid| D{Authentication}
  D -->|Failed| E[Auth Error 401]
  D -->|Success| F{Kubernetes API}
  F -->|API Error| G[Server Error 500]
  F -->|Not Found| H[Not Found 404]
  F -->|Success| I[Success Response]
  C --> J[AI gets clear error message]
  E --> J
  G --> J
  H --> J
  I --> K[AI proceeds with task]
```
Instead of generic errors, provide context the AI can use:
Bad:

```json
{"error": "Failed to scale deployment"}
```

Good:
```json
{
  "error": {
    "code": -32000,
    "message": "Cannot scale deployment 'web-app' in namespace 'production'",
    "data": {
      "reason": "deployment not found",
      "suggestions": [
        "Check if deployment name is correct",
        "Verify you have access to 'production' namespace",
        "List available deployments with 'kubectl get deployments -n production'"
      ]
    }
  }
}
```

The AI can now understand what went wrong and suggest fixes to the user.
When your Kubernetes cluster has issues, your MCP server shouldn't crash. Handle partial failures:
```mermaid
flowchart TD
  A[AI requests pod list] --> B{Can connect to K8s?}
  B -->|No| C[Return cached data + warning]
  B -->|Yes| D{API responds?}
  D -->|Timeout| E[Return partial results]
  D -->|Success| F[Return full results]
  C --> G[AI knows data might be stale]
  E --> H[AI knows some data missing]
  F --> I[AI has complete picture]
```
This way, the AI can still help users even when the cluster is having problems.
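A Python sketch of the cached-fallback branch from the flowchart (the `fetch_live` callable stands in for a real Kubernetes client; all names are illustrative):

```python
import time

_cache = {"pods": [], "updated_at": 0.0}

def list_pods(fetch_live):
    """Degrade gracefully: on cluster trouble, return cached data and
    say so, instead of failing outright."""
    try:
        pods = fetch_live()
        _cache.update(pods=pods, updated_at=time.time())
        return {"pods": pods, "stale": False}
    except Exception as exc:
        age = time.time() - _cache["updated_at"]
        return {
            "pods": _cache["pods"],
            "stale": True,
            "warning": f"cluster unreachable ({exc}); data is {age:.0f}s old",
        }
```

The `stale` flag and `warning` text give the AI exactly what it needs to tell the user the data might be out of date.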
MCP has specific requirements for message formats and sequences. Following these ensures your server works reliably with any MCP client.
Every MCP server must handle this sequence:
```mermaid
sequenceDiagram
  participant C as MCP Client
  participant S as MCP Server
  Note over C,S: Initialization (Required)
  C->>S: initialize
  S->>C: initialize response + capabilities
  C->>S: initialized (notification)
  Note over C,S: Normal Operations
  C->>S: resources/list
  S->>C: resource list
  C->>S: tools/call
  S->>C: tool result
  Note over C,S: Cleanup (Optional)
  C->>S: shutdown (notification)
```
Critical points:
- Always respond to `initialize` with your capabilities
- The `initialized` notification confirms the handshake
- `shutdown` is optional but good practice
Your responses must follow JSON-RPC 2.0 format exactly:
```json
{
  "jsonrpc": "2.0",
  "id": 123,
  "result": {
    "resources": [...]
  }
}
```

Common mistakes:
- Missing `jsonrpc` field
- Wrong `id` (must match the request)
- Using `result` and `error` in the same response
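A tiny self-check before each response leaves the server catches all three of these mistakes; a Python sketch (the helper name is ours):

```python
def check_response(request_id, resp):
    """Flag the common JSON-RPC 2.0 compliance mistakes."""
    problems = []
    if resp.get("jsonrpc") != "2.0":
        problems.append("missing or wrong 'jsonrpc' field")
    if resp.get("id") != request_id:
        problems.append("'id' does not match the request")
    if "result" in resp and "error" in resp:
        problems.append("'result' and 'error' are mutually exclusive")
    return problems

print(check_response(123, {"jsonrpc": "2.0", "id": 123, "result": {}}))  # []
```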
When things don't work, you need to see what's happening in the protocol layer.
Log every message to debug issues:
```mermaid
graph LR
  A[Incoming Message] --> B[Log Request]
  B --> C[Process]
  C --> D[Log Response]
  D --> E[Send Response]
  F[Log Files] --> G[Debug Issues]
  B --> F
  D --> F
```
Problem: AI says "Server not responding"
Debug: Check if your server sends the initialization response correctly
Problem: AI can't see your tools
Debug: Verify your capabilities include {"tools": {}} in the initialize response
Problem: Tool calls fail silently
Debug: Make sure you're returning proper JSON-RPC responses with matching IDs
Test your server before connecting to AI clients:
```shell
# Send initialize message
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05"}}' | your-mcp-server

# Should get back: initialize response + capabilities
```

This catches most protocol compliance issues early.
Let's analyze real MCP communications and build a simple client to understand the protocol from both sides.
```shell
# Create lab directory
mkdir mcp-architecture-lab
cd mcp-architecture-lab

# Install test MCP server
npm install @modelcontextprotocol/server-filesystem
```

Watch real MCP messages:
```shell
# Start filesystem MCP server with debug logging
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{}}}' | \
  npx @modelcontextprotocol/server-filesystem /tmp > mcp-session.log 2>&1 &

# Analyze the output
cat mcp-session.log
```

You'll see:
- Initialize request → Initialize response with capabilities
- Resource structure for files and directories
- JSON-RPC message format in practice
Create a basic MCP client:
```go
// mcp-client.go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os/exec"
)

type JsonRPCRequest struct {
	JsonRPC string      `json:"jsonrpc"`
	ID      int         `json:"id"`
	Method  string      `json:"method"`
	Params  interface{} `json:"params"`
}

type JsonRPCResponse struct {
	JsonRPC string      `json:"jsonrpc"`
	ID      int         `json:"id"`
	Result  interface{} `json:"result,omitempty"`
	Error   interface{} `json:"error,omitempty"`
}

func main() {
	// Start the filesystem MCP server as a child process
	cmd := exec.Command("npx", "@modelcontextprotocol/server-filesystem", "/tmp")
	stdin, _ := cmd.StdinPipe()
	stdout, _ := cmd.StdoutPipe()
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// Send initialize request
	initReq := JsonRPCRequest{
		JsonRPC: "2.0",
		ID:      1,
		Method:  "initialize",
		Params: map[string]interface{}{
			"protocolVersion": "2024-11-05",
			"capabilities":    map[string]interface{}{},
		},
	}
	reqBytes, _ := json.Marshal(initReq)
	fmt.Fprintln(stdin, string(reqBytes))

	// Read the response (one JSON message per line)
	scanner := bufio.NewScanner(stdout)
	if scanner.Scan() {
		var resp JsonRPCResponse
		json.Unmarshal(scanner.Bytes(), &resp)
		fmt.Printf("Server capabilities: %+v\n", resp.Result)
	}
}
```

Run this client and observe:
- How initialization works from the client side
- What capabilities the server advertises
- The exact JSON-RPC message format
Create a debugging proxy that sits between client and server:
```shell
# Create debug-proxy.py
cat > debug-proxy.py << 'EOF'
#!/usr/bin/env python3
import json
import sys
import subprocess
import threading

def log_message(direction, message):
    try:
        parsed = json.loads(message.strip())
        print(f"[{direction}] {json.dumps(parsed, indent=2)}", file=sys.stderr)
    except json.JSONDecodeError:
        print(f"[{direction}] {message.strip()}", file=sys.stderr)

def forward_messages(source, destination, direction):
    for line in source:
        log_message(direction, line)
        destination.write(line)
        destination.flush()

# Start the target MCP server
server = subprocess.Popen(
    ['npx', '@modelcontextprotocol/server-filesystem', '/tmp'],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

# Forward stdin to server, stdout from server
threading.Thread(target=forward_messages,
                 args=(sys.stdin, server.stdin, "CLIENT->SERVER")).start()
threading.Thread(target=forward_messages,
                 args=(server.stdout, sys.stdout, "SERVER->CLIENT")).start()
server.wait()
EOF
chmod +x debug-proxy.py
```

Use this proxy to see all messages:
```shell
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05"}}' | \
  ./debug-proxy.py
```

After completing these labs, you understand:
- Message Structure: How JSON-RPC wraps MCP concepts
- Capability Negotiation: How client and server agree on features
- Error Patterns: What happens when things go wrong
- Debugging Techniques: How to trace protocol issues
Now that you understand the protocol, let's look at how successful teams structure their MCP implementations.
```mermaid
graph TB
  subgraph "MCP Layer"
    A[MCP Protocol Handler]
    B[Message Router]
    C[Input Validation]
  end
  subgraph "Business Logic"
    D[Kubernetes Operations]
    E[Resource Discovery]
    F[Security Checks]
  end
  subgraph "Infrastructure"
    G[K8s Client Library]
    H[Configuration Manager]
    I[Logging & Metrics]
  end
  A --> B --> C
  C --> D
  C --> E
  C --> F
  D --> G
  E --> G
  F --> H
  G --> I
```
Why this works:
- Clean separation between protocol and business logic
- Easy to test each layer independently
- Simple to add new capabilities
Think in terms of Kubernetes resources instead of API endpoints:
```mermaid
graph LR
  subgraph "Resource Handlers"
    A[Pod Handler]
    B[Deployment Handler]
    C[Service Handler]
    D[ConfigMap Handler]
  end
  subgraph "Common Operations"
    E[List Resources]
    F[Get Resource Details]
    G[Update Resource]
    H[Delete Resource]
  end
  A --> E
  A --> F
  A --> G
  A --> H
  B --> E
  B --> F
  B --> G
  B --> H
```
Each handler knows how to work with one type of Kubernetes resource, making the code predictable and maintainable.
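A Python sketch of the handler pattern (class names are illustrative; the real implementation in later chapters is in Go):

```python
class ResourceHandler:
    """Common operations every handler implements; one subclass per
    Kubernetes resource type."""
    kind = "unknown"

    def list(self, namespace):
        raise NotImplementedError

    def get(self, namespace, name):
        raise NotImplementedError

class PodHandler(ResourceHandler):
    kind = "pod"

    def list(self, namespace):
        # Real code would call the Kubernetes API here
        return [f"k8s://pod/{namespace}/example"]

# A registry maps each resource kind to its handler
HANDLERS = {h.kind: h for h in (PodHandler(),)}

print(HANDLERS["pod"].list("production"))
```

The registry keeps routing flat: `resources/list` dispatches to `HANDLERS[kind]` instead of a growing if/else chain.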
You now understand how MCP works under the hood. In the next chapter, we'll put this knowledge to work by building a real Kubernetes MCP server in Go.
Key concepts to remember:
- MCP is JSON-RPC: Simple request/response over various transports
- Resources are discoverable: They let the AI explore what's available
- Tools are actionable: They let the AI make changes
- Error handling matters: Good errors help the AI help users
- Protocol compliance is critical: Follow the rules for reliable operation
The architecture patterns we covered will guide our implementation in Chapter 3, where we'll build a production-ready MCP server that follows these best practices.
Next: Chapter 3: Building an MCP Server with Go & Kubernetes
Consider sponsoring this work or check out my book "PromptOps: From YAML to AI" to help create more AI-powered DevOps resources.