MCP Tool Integration as Systems Thinking (Part 3): System Behavior & Policies
In Part 1, we built architectural foundations. In Part 2, we designed for resilience. Now we address system-wide behavior: how tools are discovered, how errors are handled consistently, how performance emerges, and how tool selection becomes strategic.
Policy beats improvisation at scale.
Series Navigation
- Part 1: Foundation & Architecture
- Part 2: Resilience & Runtime Behavior
- Part 3: System Behavior & Policies (this article)
- Part 4: Advanced Patterns & Production
Tool Discovery Is a Governance Problem
As systems grow, the question shifts from "how do we call tools?" to "which tools should exist at all?"
Dynamic discovery and registration enable flexibility, but they also require governance. A tool registry becomes a source of truth, not just a convenience.
Effective registries capture intent:
- What the tool does
- What guarantees it provides
- How expensive or slow it is
- What permissions it requires
This metadata later enables smarter routing, better fallbacks, and informed deprecation decisions.
Tool Metadata Structure
Tool Metadata Schema:
┌────────────────────────────────────────────────────┐
│ IDENTIFICATION                                     │
│ • toolId: unique identifier                        │
│ • name: human-readable name                        │
│ • version: semantic version                        │
│ • description: purpose and capabilities            │
├────────────────────────────────────────────────────┤
│ CAPABILITIES                                       │
│ • capabilities: ['search', 'realtime-data']        │
│ • tags: ['production-ready', 'external']           │
├────────────────────────────────────────────────────┤
│ PERFORMANCE CHARACTERISTICS                        │
│ • estimatedLatency: fast|medium|slow               │
│   (fast<100ms, medium<1s, slow>1s)                 │
│ • rateLimit: {requests: 100, period: '1m'}         │
│ • costPerCall: 0.001 USD                           │
├────────────────────────────────────────────────────┤
│ RELIABILITY GUARANTEES                             │
│ • sla: '99.5%'                                     │
│ • retryable: true                                  │
│ • idempotent: true                                 │
├────────────────────────────────────────────────────┤
│ SECURITY REQUIREMENTS                              │
│ • requiredPermissions: ['network.external']        │
│ • dataClassification: public|internal|sensitive    │
│ • piiHandling: none|anonymize|encrypt              │
├────────────────────────────────────────────────────┤
│ INPUT/OUTPUT SCHEMA                                │
│ • input: type definitions + validation rules       │
│ • output: expected structure                       │
├────────────────────────────────────────────────────┤
│ OPERATIONAL                                        │
│ • fallbacks: ['cached_search', 'wiki_search']      │
│ • healthCheckEndpoint: URL                         │
└────────────────────────────────────────────────────┘
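To make the schema concrete, here is one illustrative registry entry written as a plain Python dict. The tool name and every value are invented for the example; a real registry would validate entries against the schema before accepting them.

```python
# Illustrative registry entry mirroring the metadata schema above.
# All values (tool name, SLA, fallbacks, cost) are invented examples.
web_search_tool = {
    "toolId": "web_search",
    "name": "Web Search",
    "version": "1.2.0",
    "description": "Full-text web search with freshness ranking",
    "capabilities": ["search", "realtime-data"],
    "tags": ["production-ready", "external"],
    "estimatedLatency": "medium",   # fast < 100ms, medium < 1s, slow > 1s
    "rateLimit": {"requests": 100, "period": "1m"},
    "costPerCall": 0.001,           # USD
    "sla": "99.5%",
    "retryable": True,
    "idempotent": True,
    "requiredPermissions": ["network.external"],
    "dataClassification": "public",
    "piiHandling": "none",
    "fallbacks": ["cached_search", "wiki_search"],
}
```

Keeping entries as declarative data like this is what makes the routing and fallback logic later in this article possible: the router never needs to know about any specific tool, only about the metadata fields.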
Tool Discovery Algorithm
FUNCTION discoverTools(sourcePath):
    toolDefinitions = scanDirectory(sourcePath)
    FOR EACH definition IN toolDefinitions:
        TRY:
            validateToolMetadata(definition)
            registry.register(definition)
            LOG_INFO("Tool discovered", {
                toolId: definition.toolId,
                version: definition.version,
                capabilities: definition.capabilities
            })
        CATCH error:
            LOG_ERROR("Tool registration failed", {
                toolId: definition.toolId,
                error: error
            })

FUNCTION findToolsByCapability(capability):
    RETURN registry.query({
        where: {
            capabilities CONTAINS capability,
            tags CONTAINS 'production-ready'
        },
        orderBy: 'reliability.sla' DESC
    })
FUNCTION routeToOptimalTool(intent, constraints):
    candidates = findToolsByCapability(intent.capability)
    // Filter by constraints
    IF constraints.maxLatency:
        candidates = filter(candidates, latency < constraints.maxLatency)
    IF constraints.maxCost:
        candidates = filter(candidates, cost < constraints.maxCost)
    IF constraints.requiredSLA:
        candidates = filter(candidates, sla >= constraints.requiredSLA)
    IF candidates IS EMPTY:
        RETURN null // No tool satisfies the constraints; caller must fall back
    // Score and rank
    scored = scoreTools(candidates, intent.priority)
    RETURN scored[0] // Best match
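A minimal, self-contained sketch of this routing logic over an in-memory tool list; the two tools, their latency and cost numbers, and the flat constraint parameters are all invented for illustration:

```python
# Sketch: capability-based routing over an in-memory registry.
# TOOLS and all numbers are invented; a real registry would be queried.
TOOLS = [
    {"toolId": "web_search", "capabilities": ["search"],
     "tags": ["production-ready"], "latencyMs": 400,
     "costPerCall": 0.001, "sla": 99.5},
    {"toolId": "cached_search", "capabilities": ["search"],
     "tags": ["production-ready"], "latencyMs": 50,
     "costPerCall": 0.0, "sla": 99.9},
]

def find_tools_by_capability(capability):
    # Only production-ready tools qualify; best SLA first.
    matches = [t for t in TOOLS
               if capability in t["capabilities"]
               and "production-ready" in t["tags"]]
    return sorted(matches, key=lambda t: t["sla"], reverse=True)

def route_to_optimal_tool(capability, max_latency_ms=None,
                          max_cost=None, required_sla=None):
    candidates = find_tools_by_capability(capability)
    if max_latency_ms is not None:
        candidates = [t for t in candidates if t["latencyMs"] < max_latency_ms]
    if max_cost is not None:
        candidates = [t for t in candidates if t["costPerCall"] < max_cost]
    if required_sla is not None:
        candidates = [t for t in candidates if t["sla"] >= required_sla]
    return candidates[0] if candidates else None

print(route_to_optimal_tool("search", max_latency_ms=100)["toolId"])  # cached_search
```

Note how every filter reads a metadata field from the registry entry: the routing policy stays generic, and adding a tool never requires touching the router.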
Governance Questions
Before adding a tool:
- Does this capability already exist?
- What’s the cost per invocation?
- What’s the expected failure rate?
- Who owns maintenance?
- What’s the deprecation plan?
Before removing a tool:
- What depends on it?
- What’s the migration path?
- Are there usage analytics?
- What’s the communication plan?
Error Handling Is a Policy Decision
Error handling should not be improvised at call sites. It should be a policy applied consistently across the system.
That policy answers questions like:
- Which errors trigger retries, and how often
- Which errors alert humans
- Which errors are safe to surface to agents
- When a tool should be disabled automatically
When these rules are centralized, the system behaves coherently under stress. When they aren’t, behavior becomes unpredictable and hard to trust.
Error Classification Decision Tree
graph TD
A[Error Occurred] --> B{Error Type?}
B -->|Timeout/Network| C[TRANSIENT]
B -->|HTTP 429| D[RATE_LIMIT]
B -->|HTTP 401/403| E[AUTHENTICATION]
B -->|HTTP 4xx| F[VALIDATION]
B -->|HTTP 5xx| C
B -->|Unknown| G[UNKNOWN]
C --> H{Circuit open?}
H -->|Yes| I[FAIL_FAST +<br/>Use Fallback]
H -->|No| J{Retries<br/>exhausted?}
J -->|Yes| K[FAIL +<br/>Use Fallback]
J -->|No| L[RETRY +<br/>Exponential Backoff]
D --> M[RETRY +<br/>Linear Backoff<br/>Honor retry-after]
E --> N{Credentials<br/>refreshed?}
N -->|No| O[Refresh +<br/>RETRY once]
N -->|Yes| P[FAIL +<br/>Alert Operator<br/>Disable Tool]
F --> Q[FAIL +<br/>Surface to Agent<br/>Validation Error]
G --> R[FAIL +<br/>Alert Operator]
Error Handling Algorithm
FUNCTION handleError(error, context):
    classification = classifyError(error)
    SWITCH classification.category:
        CASE TRANSIENT:
            RETURN handleTransient(error, context, classification)
        CASE AUTHENTICATION:
            RETURN handleAuth(error, context)
        CASE RATE_LIMIT:
            RETURN handleRateLimit(error, context, classification)
        CASE VALIDATION:
            RETURN {action: FAIL, surfaceToAgent: true, guidance: error.message}
        DEFAULT:
            RETURN {action: FAIL, alertOperator: true}

FUNCTION classifyError(error):
    IF error.type IN [TIMEOUT, CONNECTION_REFUSED]:
        RETURN {category: TRANSIENT, retryable: true, maxRetries: 3, backoff: EXPONENTIAL}
    IF error.statusCode == 429:
        RETURN {category: RATE_LIMIT, retryable: true, delayMs: error.retryAfter OR 60000}
    IF error.statusCode IN [401, 403]:
        RETURN {category: AUTHENTICATION, retryable: false, alertOperator: true}
    IF error.statusCode >= 500:
        RETURN {category: TRANSIENT, retryable: true, maxRetries: 3}
    IF error.statusCode >= 400:
        RETURN {category: VALIDATION, retryable: false, surfaceToAgent: true}
    RETURN {category: UNKNOWN, retryable: false, alertOperator: true}
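The classification rules translate almost line-for-line into runnable code. In this sketch, errors are modeled as plain dicts; a real integration would inspect exception types and response objects instead:

```python
# Minimal classifier mirroring the decision tree above.
# Errors are modeled as dicts with optional "type" / "statusCode" fields.
TIMEOUT, CONNECTION_REFUSED = "timeout", "connection_refused"

def classify_error(error):
    if error.get("type") in (TIMEOUT, CONNECTION_REFUSED):
        return {"category": "TRANSIENT", "retryable": True,
                "maxRetries": 3, "backoff": "exponential"}
    status = error.get("statusCode", 0)
    if status == 429:
        # Honor the server's retry-after hint when present.
        return {"category": "RATE_LIMIT", "retryable": True,
                "delayMs": error.get("retryAfter") or 60000}
    if status in (401, 403):
        return {"category": "AUTHENTICATION", "retryable": False,
                "alertOperator": True}
    if status >= 500:
        return {"category": "TRANSIENT", "retryable": True, "maxRetries": 3}
    if status >= 400:
        return {"category": "VALIDATION", "retryable": False,
                "surfaceToAgent": True}
    return {"category": "UNKNOWN", "retryable": False, "alertOperator": True}
```

Because classification is pure data-in, data-out, it is trivial to unit test exhaustively, which is exactly what you want for the one function every error in the system flows through.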
FUNCTION handleTransient(error, context, classification):
    IF isCircuitOpen(context.toolId):
        RETURN {action: FAIL_FAST, useFallback: true}
    IF context.retryCount >= classification.maxRetries:
        recordFailure(context.toolId)
        RETURN {action: FAIL, useFallback: true}
    delayMs = 2^(context.retryCount) * 1000 // Exponential backoff
    RETURN {action: RETRY, delayMs: delayMs}
FUNCTION handleAuth(error, context):
    IF NOT context.credentialsRefreshed:
        refreshCredentials(context.toolId)
        context.credentialsRefreshed = true // Ensure we only retry once
        RETURN {action: RETRY, delayMs: 0}
    alertOperator(severity=HIGH, toolId=context.toolId)
    RETURN {action: FAIL, disableTool: true}
// Circuit Breaker Pattern
FUNCTION recordFailure(toolId):
    breaker = circuitBreakers.get(toolId)
    breaker.failures += 1
    breaker.lastFailure = NOW()
    IF breaker.failures >= THRESHOLD:
        breaker.state = OPEN
        LOG_ERROR("Circuit breaker opened", {toolId})
        // Auto-reset after timeout
        scheduleTask(after=CIRCUIT_TIMEOUT, action=() => {
            breaker.state = HALF_OPEN
            breaker.failures = 0
        })
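The breaker itself is small enough to show in full. This sketch injects a clock instead of scheduling a reset task, which makes the OPEN to HALF_OPEN transition deterministic in tests; the threshold and timeout values are illustrative:

```python
# Minimal circuit breaker sketch. Threshold/timeout values are illustrative;
# the clock is injectable so the timeout path can be tested without sleeping.
import time

class CircuitBreaker:
    def __init__(self, threshold=5, timeout_s=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.timeout_s = timeout_s
        self.clock = clock
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def record_success(self):
        # Any success fully closes the breaker in this simplified model.
        self.failures = 0
        self.state = "CLOSED"

    def allow_request(self):
        if self.state == "OPEN":
            # After the timeout, let one probe request through (half-open).
            if self.clock() - self.opened_at >= self.timeout_s:
                self.state = "HALF_OPEN"
                return True
            return False
        return True
```

A production version would also cap how many probes run concurrently in HALF_OPEN (the `halfOpenRequests` knob in the policy config below this section).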
Policy Configuration Example
errorHandling:
  retryPolicy:
    maxAttempts: 3
    backoffStrategy: exponential
    baseDelayMs: 1000
  circuitBreaker:
    failureThreshold: 5
    timeoutMs: 30000
    halfOpenRequests: 3
  authentication:
    autoRefresh: true
    maxRefreshAttempts: 1
    alertOnFailure: true
  validation:
    surfaceToAgent: true
    includeFieldErrors: true
  rateLimiting:
    honorRetryAfter: true
    defaultBackoffMs: 60000
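As a sanity check on the retry policy, the full schedule of delays can be derived directly from the config values. The config dict below mirrors the YAML above; loading it from a file is omitted:

```python
# Sketch: deriving the retry schedule from the policy config above.
# The dict mirrors the YAML; YAML parsing is omitted.
error_handling = {
    "retryPolicy": {
        "maxAttempts": 3,
        "backoffStrategy": "exponential",
        "baseDelayMs": 1000,
    },
}

def retry_delays_ms(policy):
    base = policy["baseDelayMs"]
    if policy["backoffStrategy"] == "exponential":
        # Attempt n waits base * 2^n, matching the handleTransient formula.
        return [base * 2**attempt for attempt in range(policy["maxAttempts"])]
    return [base] * policy["maxAttempts"]  # fixed-delay fallback

print(retry_delays_ms(error_handling["retryPolicy"]))  # [1000, 2000, 4000]
```

Centralizing the formula next to the config keeps the documented policy and the runtime behavior from drifting apart.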
Performance Emerges From Architecture
In multi-tool environments, performance is not the result of fast tools alone. It emerges from how tools are composed, cached, and orchestrated.
Small inefficiencies multiply when:
- Tools are called redundantly
- Connections are not reused
- Results are not cached
- Orchestration is overly sequential
Good performance engineering focuses less on micro-optimizations and more on flow: minimizing unnecessary work and making latency predictable.
Performance Optimization Patterns
flowchart LR
A[Tool Request] --> B{In-flight<br/>request?}
B -->|Yes| C[Wait for<br/>existing]
B -->|No| D{Cached?}
D -->|Yes & Fresh| E[Return cached]
D -->|No/Expired| F[Get pooled<br/>connection]
F --> G[Execute]
G --> H[Cache result]
H --> I[Release connection]
I --> J[Return result]
C --> J
E --> J
Key Patterns
1. REQUEST DEDUPLICATION
Problem: Same request called multiple times simultaneously
Solution: Track in-flight requests, share result
cacheKey = hash(toolId + normalizedInput)
IF inflightRequests.contains(cacheKey):
    RETURN AWAIT inflightRequests.get(cacheKey)
promise = executeActual(toolId, input)
inflightRequests.set(cacheKey, promise)
TRY:
    result = AWAIT promise
    RETURN result
FINALLY:
    inflightRequests.remove(cacheKey)
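With asyncio, the same idea looks like this: concurrent callers with the same cache key await a single shared task. `execute_actual` is a stand-in for the real tool call, and the call counter exists only to demonstrate the deduplication:

```python
# Sketch of request deduplication: concurrent identical requests share one
# in-flight task. execute_actual stands in for the real tool invocation.
import asyncio, hashlib, json

inflight = {}
calls = {"count": 0}

async def execute_actual(tool_id, payload):
    calls["count"] += 1
    await asyncio.sleep(0.01)  # simulate network I/O
    return {"tool": tool_id, "echo": payload}

def cache_key(tool_id, payload):
    normalized = json.dumps(payload, sort_keys=True)
    return tool_id + ":" + hashlib.sha256(normalized.encode()).hexdigest()

async def execute_deduped(tool_id, payload):
    key = cache_key(tool_id, payload)
    if key in inflight:
        return await inflight[key]  # piggyback on the in-flight call
    task = asyncio.create_task(execute_actual(tool_id, payload))
    inflight[key] = task
    try:
        return await task
    finally:
        inflight.pop(key, None)  # later identical requests execute fresh

async def main():
    return await asyncio.gather(
        *[execute_deduped("search", {"q": "mcp"}) for _ in range(5)])

results = asyncio.run(main())
print(calls["count"])  # 1: five concurrent callers, one real execution
```

Awaiting the same task from several coroutines is safe in asyncio; the task caches its result, so late arrivals still get the answer after the creator has cleaned up the map.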
2. CONNECTION POOLING
Problem: Creating connections is expensive
Solution: Reuse idle connections
pool = {connections: [], maxSize: 10, activeCount: 0}
IF pool.hasIdleConnection():
    RETURN pool.pop()
ELSE IF pool.activeCount < pool.maxSize:
    pool.activeCount++
    RETURN createNewConnection()
ELSE:
    WAIT_FOR availableConnection()
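A minimal synchronous pool following the same decision order: reuse an idle connection, create a new one while under the cap, otherwise block until a release. `factory` stands in for real connection setup; a production pool would add health checks and acquire timeouts:

```python
# Minimal connection pool sketch. `factory` stands in for real connection
# setup; production pools also need health checks and acquire timeouts.
import queue

class ConnectionPool:
    def __init__(self, factory, max_size=10):
        self.factory = factory
        self.max_size = max_size
        self.idle = queue.Queue()
        self.active_count = 0

    def acquire(self):
        try:
            return self.idle.get_nowait()   # 1) reuse an idle connection
        except queue.Empty:
            pass
        if self.active_count < self.max_size:
            self.active_count += 1
            return self.factory()           # 2) create, up to max_size
        return self.idle.get()              # 3) block until one is released

    def release(self, conn):
        self.idle.put(conn)

pool = ConnectionPool(factory=lambda: object(), max_size=2)
a = pool.acquire()
b = pool.acquire()
pool.release(a)
c = pool.acquire()   # reuses `a` instead of creating a third connection
print(c is a)        # True
```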
3. SMART CACHING (TTL by tool characteristics)
Problem: One-size-fits-all caching is inefficient
Solution: Adaptive TTL based on metadata
FUNCTION getCacheTTL(toolId):
    metadata = registry.getMetadata(toolId)
    IF metadata.latency == 'slow':
        RETURN 5_minutes // Expensive, cache longer
    ELSE IF 'realtime-data' IN metadata.capabilities:
        RETURN 30_seconds // Fresh data needed
    ELSE:
        RETURN 1_minute // Default
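The same rule as runnable code, with the registry lookup stubbed by a dict; the tool names are invented:

```python
# Adaptive cache TTL driven by tool metadata. The registry is stubbed
# with a dict; tool names and metadata are invented examples.
REGISTRY = {
    "slow_report": {"latency": "slow", "capabilities": []},
    "live_prices": {"latency": "fast", "capabilities": ["realtime-data"]},
    "wiki_search": {"latency": "medium", "capabilities": ["search"]},
}

def cache_ttl_seconds(tool_id):
    meta = REGISTRY[tool_id]
    if meta["latency"] == "slow":
        return 5 * 60   # expensive call: cache longer
    if "realtime-data" in meta["capabilities"]:
        return 30       # freshness matters: cache briefly
    return 60           # default
```

This is another payoff of the metadata registry from earlier: cache policy is derived from declared tool characteristics rather than hard-coded per call site.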
4. PARALLEL EXECUTION WITH CONCURRENCY LIMITS
Problem: Unlimited parallelism overwhelms system
Solution: Sliding window concurrency control
FUNCTION executeParallel(tasks[], maxConcurrency=5):
    results = []
    executing = []
    FOR EACH task IN tasks:
        promise = execute(task)
        results.APPEND(promise)
        executing.APPEND(promise)
        IF length(executing) >= maxConcurrency:
            completed = AWAIT any(executing) // Wait for one to complete
            executing.remove(completed)
    RETURN AWAIT all(results)
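The sliding-window control above can also be expressed with a semaphore, which asyncio provides directly. This sketch records peak concurrency to show the cap holds; the demo tasks are invented:

```python
# Concurrency-limited parallel execution via a semaphore; peak concurrency
# is tracked to verify the cap. Demo tasks are invented examples.
import asyncio

async def execute_parallel(tasks, max_concurrency=5):
    sem = asyncio.Semaphore(max_concurrency)
    stats = {"running": 0, "peak": 0}

    async def run(task):
        async with sem:  # at most max_concurrency inside this block
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
            result = await task()
            stats["running"] -= 1
            return result

    # gather preserves input order in its results.
    results = await asyncio.gather(*(run(t) for t in tasks))
    return results, stats["peak"]

async def demo():
    async def work(i):
        await asyncio.sleep(0.01)
        return i * 2
    tasks = [lambda i=i: work(i) for i in range(10)]
    return await execute_parallel(tasks, max_concurrency=3)

results, peak = asyncio.run(demo())
```

The semaphore version avoids the bookkeeping of tracking which promise completed, at the cost of not starting task N+1 the instant a slot frees mid-await; for tool-call workloads the difference is negligible.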
5. CACHE KEY NORMALIZATION
Problem: Same input, different key (order, formatting)
Solution: Normalize before hashing
FUNCTION getCacheKey(toolId, input):
    // Sort keys for consistent ordering
    normalized = stringify(input, sortKeys=true)
    hash = hashFunction(normalized)
    RETURN toolId + ":" + hash
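A runnable version of the normalization rule, using sorted JSON keys and SHA-256:

```python
# Normalize input (sorted keys, compact separators) before hashing so that
# logically identical payloads always map to the same cache key.
import hashlib, json

def get_cache_key(tool_id, payload):
    normalized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(normalized.encode()).hexdigest()
    return f"{tool_id}:{digest}"

k1 = get_cache_key("search", {"q": "mcp", "limit": 10})
k2 = get_cache_key("search", {"limit": 10, "q": "mcp"})
print(k1 == k2)  # True: key order no longer matters
```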
Performance Metrics to Track
• Cache Hit Rate: hits / (hits + misses)
  Target: >80% for cacheable operations
• Connection Pool Utilization: active / maxSize
  Target: 60-80% (headroom for spikes)
• Deduplication Rate: deduplicated / totalRequests
  Indicates redundant call patterns
• P50/P95/P99 Latency: Response time percentiles
  Watch for bimodal distributions
• Throughput: Requests/second sustained
  Should scale with concurrency
Tool Selection Is an Exercise in Restraint
One of the most mature signals in an MCP system is not how many tools it uses, but how many it deliberately leaves out.
Tool selection is where strategy shows up:
- Community tools are excellent defaults for standard capabilities
- Custom tools make sense when differentiation matters
- Redundancy should exist for resilience, not indecision
Every tool added increases operational surface area. Thoughtful systems earn complexity deliberately.
Tool Evaluation Scorecard
Before adding a tool, evaluate:
Need (0-10 points)
- Does this solve a real user problem?
- Is there an existing tool that could work?
- What’s the cost of not having this?
Quality (0-10 points)
- What’s the documented SLA?
- How well-maintained is it?
- Are there test cases or examples?
Operational Cost (0-10 points, inverted)
- How complex is integration?
- What dependencies does it add?
- What’s the monitoring burden?
Strategic Fit (0-10 points)
- Does this align with platform direction?
- Will this still matter in 6 months?
- Does this enable future capabilities?
Threshold: Require 30+ points to add a tool.
Tool Deprecation Signals
Remove tools when:
- Usage drops below 1% of total tool calls for 30 days
- Better alternatives exist with higher satisfaction
- Maintenance cost exceeds value delivered
- Strategy shifts away from the capability
Policy Checklist
Your MCP system demonstrates good governance when:
- Tool metadata is comprehensive and up-to-date
- Discovery is automated with validation
- Error handling follows consistent policies
- Circuit breakers protect against cascading failures
- Performance patterns are applied uniformly
- Tool selection has clear criteria
- Deprecation has a defined process
- Operators can query tool health programmatically
Coming Next: Part 4 — Advanced Patterns & Production
In the final part, we’ll explore:
- Tool composition and orchestration patterns
- Security as structural design
- Testing for failure at scale
- Production readiness principles
Continue to Part 4: Advanced Patterns & Production
Reflection
Policies scale where improvisation fails. By centralizing decisions about discovery, errors, performance, and selection, you create systems that behave predictably under stress.
The best systems make governance invisible to users but obvious to operators.