Building an Enterprise Multi-Agent AI System with Microsoft Teams Integration
A Production-Ready Architecture Using Azure AI Foundry, Model Context Protocol, and Agent Orchestration
Introduction
This article documents the design and implementation of a production-ready enterprise AI assistant built on a multi-agent architecture. The system combines three specialized AI agents, each with distinct capabilities:
- Foundry Agent - RAG-based knowledge retrieval using Azure AI Foundry
- Local Agent - Custom Python tools for operational tasks
- MCP Agent - Data analysis via Model Context Protocol server
The key insight driving this architecture is that no single agent can do everything well. By decomposing capabilities across specialized agents and implementing intelligent routing, we achieve both flexibility and reliability at enterprise scale.
This guide demonstrates how to build a multi-agent system that:
- Routes user queries to specialized agents automatically
- Manages conversation memory across agent switches
- Integrates with Microsoft Teams for enterprise deployment
- Learns about users over time with AI-powered memory
- Executes custom Python tools, RAG queries, and data analysis seamlessly
All code and implementation examples are available in the source repository: GDEP-IT-HELPDESK
Architecture Overview
┌─────────────────┐
│ Microsoft Teams │
│ (End Users) │
└────────┬────────┘
│ Messages
▼
┌─────────────────────────────────────┐
│ Azure Bot Service │
│ (Bot Framework Integration) │
└────────────────┬────────────────────┘
│ HTTPS Webhook
▼
┌────────────────────────────────────────────────────────────────────────────┐
│ FastAPI Backend (Port 8000) │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Teams Bot Handler │ │
│ │ • User Authentication (Entra ID extraction) │ │
│ │ • Thread Persistence Management │ │
│ │ • Context Injection │ │
│ │ • AI Memory Provider (learns about users) │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Hybrid Intent Router │ │
│ │ 1. Keyword Matching (~5ms, 80% of queries) │ │
│ │ 2. LLM Classification (~150ms, 20% ambiguous queries) │ │
│ │ 3. Priority: MCP → Local → Foundry → Default (Local) │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────┼─────────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ MCP AGENT │ │ LOCAL AGENT │ │ FOUNDRY AGENT │ │
│ │ (Data Analyst) │ │ (Operations) │ │ (Knowledge/RAG) │ │
│ ├─────────────────┤ ├─────────────────┤ ├─────────────────┤ │
│ │ MCPStreamable │ │ Custom Python │ │ AzureAIAgent │ │
│ │ HTTPTool │ │ Tools │ │ Client │ │
│ │ │ │ │ │ │ │
│ │ • Data Catalog │ │ • User Lookup │ │ • File Search │ │
│ │ • SQL Queries │ │ • Password │ │ • Vector Store │ │
│ │ • File Queries │ │ Reset │ │ • RAG Retrieval │ │
│ │ • Reports │ │ • Tickets │ │ • Policies/Docs │ │
│ │ │ │ • Phone Orders │ │ │ │
│ │ │ │ • AI Memory │ │ │ │
│ └────────┬────────┘ └─────────────────┘ └─────────────────┘ │
└───────────┼────────────────────────────────────────────────────────────────┘
│ HTTP (RFC 1918 private network)
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ MCP Server (Separate Container) │
│ FastMCP Framework - Port 8001 │
├─────────────────────────────────────────────────────────────────────────────┤
│ 9 Data Analysis Tools: │
│ 1. get_data_catalog - List all datasets with metadata │
│ 2. get_connection_info - Check data source connectivity │
│ 3. get_files_list - Available CSV/Parquet files │
│ 4. get_database_tables - SQL Server tables with row counts │
│ 5. get_schema - File schemas (Polars) │
│ 6. get_schema_db - Database table schemas │
│ 7. execute_polars_sql - Query CSV/Excel files │
│ 8. execute_database_query - Query SQL Server │
│ 9. get_user_info - User context lookup │
└─────────────────────────────────────────────────────────────────────────────┘
The Multi-Agent Paradigm
Why Multiple Agents?
A single monolithic agent faces several challenges at enterprise scale:
- Token Limits: Stuffing all capabilities into one agent's system prompt eats into the context window
- Specialization: Different tasks require different models, tools, and configurations
- Maintainability: Modular agents are easier to test, debug, and update independently
- Cost Optimization: Route simple queries to cheaper models, complex ones to capable models
The Orchestrator Pattern
Our architecture uses an orchestrator pattern where a central handler:
- Receives the user message
- Determines intent via hybrid routing
- Routes to the appropriate specialized agent
- Manages thread persistence across agents
- Returns the response to the user
# Simplified orchestrator flow
async def handle_message(user_message: str, context: TurnContext):
    # 1. Extract user identity from Teams
    user_identifier = extract_entra_id(context)

    # 2. Route to appropriate agent
    agent_type = route_to_agent(user_message)  # "mcp" | "local" | "foundry"

    # 3. Load/create conversation thread
    thread = await load_or_create_thread(conversation_id)

    # 4. Execute with appropriate agent
    if agent_type == "mcp":
        response = await mcp_agent.run(user_message, thread=thread)
    elif agent_type == "foundry":
        response = await foundry_agent.run(user_message)  # Stateless
    else:
        response = await local_agent.run(user_message, thread=thread)

    # 5. Save thread and return response
    await save_thread(thread)
    return response
Agent #1: Azure AI Foundry Agent (RAG/Knowledge Base)
What is Azure AI Foundry?
Azure AI Foundry (formerly Azure AI Studio) provides a managed platform for building AI agents with:
- Hosted Agents: Pre-configured agent instances with unique Agent IDs
- File Search: Vector store for RAG over uploaded documents
- Code Interpreter: Execute code in a sandboxed environment
- Function Calling: Define custom tool schemas
Integration Pattern
The Foundry Agent is integrated using the AzureAIAgentClient from the Microsoft Agent Framework SDK:
from azure.identity import ClientSecretCredential
from agent_framework.azure import AzureAIAgentClient

# Service Principal authentication
credential = ClientSecretCredential(
    tenant_id=AZURE_TENANT_ID,
    client_id=AZURE_CLIENT_ID,
    client_secret=AZURE_CLIENT_SECRET
)

# Create agent client
foundry_client = AzureAIAgentClient(
    credential=credential,
    project_endpoint="https://your-project.services.ai.azure.com/...",
    agent_id="asst_xxxxxxxxxxxx",  # Pre-configured agent ID
    model_deployment_name="gpt-4.1"
)
Use Cases for Foundry Agent
- Policy Questions: "What is the password policy?"
- Procedure Lookups: "How do I request new hardware?"
- Documentation Search: "Explain the approval process"
Key Insight: Stateless by Design
The Foundry Agent is intentionally stateless - each query is independent. This works well for knowledge base queries where context from previous questions rarely matters.
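A minimal sketch of what a stateless knowledge-base query can look like, assuming the create_agent()/run() pattern used elsewhere in this article; the helper name and the result.text access are illustrative, not the exact production code:

# Sketch (assumption): wrap the pre-configured Foundry agent and query it statelessly
foundry_agent = foundry_client.create_agent(
    name="IT-Helpdesk-Knowledge",
    instructions="Answer IT policy and procedure questions from the document store."
)

async def ask_knowledge_base(question: str) -> str:
    # No thread is passed, so every call is an independent RAG query
    result = await foundry_agent.run(question)
    return result.text  # assumes the run result exposes its text this way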
Agent #2: Local Agent (Custom Python Tools)
Why a Local Agent?
While Azure AI Foundry is powerful, it has limitations for custom tool execution:
- No Direct Code Execution: Foundry tools are schema-based, not Python functions
- No Database Access: Can't directly connect to SQL Server or APIs
- No Real-Time Integration: Can't call external services like SAP or ServiceDesk
Implementation
The Local Agent uses AzureOpenAIChatClient with Python function tools:
from agent_framework.azure import AzureOpenAIChatClient

agent = AzureOpenAIChatClient(
    endpoint=AZURE_OPENAI_ENDPOINT,
    deployment_name="gpt-4.1",
    api_key=AZURE_OPENAI_API_KEY,
    api_version="2024-05-01-preview"
).create_agent(
    instructions="""You are an IT helpdesk assistant...""",
    name="IT-Helpdesk-Local",
    tools=[
        get_user_info,         # SQL database lookup
        reset_sap_password,    # SAP RFC integration
        create_ticket,         # ServiceDesk Plus API
        check_ticket_status,   # ServiceDesk Plus API
        get_my_open_tickets,   # ServiceDesk Plus API
        request_phone_order    # Workflow trigger
    ]
)
Tool Definition Pattern
Tools are defined as Python functions with type hints and docstrings:
def reset_sap_password(
    sap_account_name: str,
    user_identifier: str,
    identifier_type: str = "entra_id"
) -> str:
    """
    Reset SAP password for user account.

    ALWAYS call this tool for password reset requests.
    The tool handles all permission checks internally.

    Args:
        sap_account_name: SAP account to reset
        user_identifier: Entra ID of requesting user
        identifier_type: Type of identifier (entra_id or email)

    Returns:
        Formatted result with new temporary password
    """
    # Implementation connects to SAP via RFC
    # Validates permissions, resets password, returns result
    ...
Tool Execution Flow
- LLM receives user message + tool definitions
- LLM generates a function call with arguments
- Agent Framework executes the Python function
- Function result is injected back into conversation
- LLM generates the final response incorporating the tool output (see the conceptual sketch below)
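The Agent Framework runs this loop for us; the following is only a conceptual sketch of what happens under the hood, written against the plain OpenAI chat-completions API. The dispatch helper and the loop itself are illustrative, not the SDK's actual internals:

# Conceptual sketch of the function-calling loop (client is an async OpenAI/Azure OpenAI client)
import json

async def run_tool_loop(client, messages: list, tools: list) -> str:
    while True:
        reply = await client.chat.completions.create(
            model="gpt-4.1", messages=messages, tools=tools
        )
        msg = reply.choices[0].message
        if not msg.tool_calls:
            return msg.content                      # final response, no more tools needed
        messages.append(msg)                        # keep the function call in history
        for call in msg.tool_calls:
            # dispatch() is a hypothetical helper that maps tool names to Python functions
            result = dispatch(call.function.name, json.loads(call.function.arguments))
            messages.append({                       # inject the tool result back
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(result),
            })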
Agent #3: MCP Agent (Data Analysis)
What is Model Context Protocol (MCP)?
MCP is an open protocol (developed by Anthropic) that standardizes how AI applications connect to external data sources and tools. Key concepts:
- MCP Server: Hosts tools and exposes them via HTTP/SSE
- MCP Client: Consumes tools from one or more MCP servers
- Tool Discovery: Client can query server for available tools
- Standardized Format: JSON-RPC style requests/responses
For a comprehensive guide on building MCP servers for data analysis, see our Data Analysis with LLM via MCP Server series - Part 1.
Why MCP for Data Analysis?
Data analysis tools have unique requirements:
- Schema Discovery: Need to explore available datasets
- Query Flexibility: Execute arbitrary SQL against files/databases
- Separation of Concerns: Data layer managed independently from bot
Integration via MCPStreamableHTTPTool
from agent_framework import MCPStreamableHTTPTool

# Create MCP tool that connects to remote server
mcp_tool = MCPStreamableHTTPTool(
    name="data-analyst",
    url="http://mcp-server:8001/mcp?function=it",
    description="Data analysis tools - employee data, reports, SQL queries"
)

# Create agent with MCP tools
mcp_agent = AzureOpenAIChatClient(...).create_agent(
    instructions="""You are a data analyst assistant.

    CRITICAL FIRST STEP:
    1. ALWAYS call get_data_catalog FIRST before any query
    2. This shows available datasets with query_tool field
    3. Use the appropriate execution tool based on query_tool
    """,
    name="Data-Analyst",
    tools=mcp_tool  # Single tool that exposes 9 MCP capabilities
)
MCP Server Architecture
The MCP server runs as a separate container and exposes tools such as:
| Tool | Purpose |
|---|---|
| get_data_catalog | List all datasets with metadata (CALL FIRST) |
| get_files_list | List CSV/Parquet files |
| get_database_tables | List SQL tables with row counts |
| get_schema | Get file column schema |
| get_schema_db | Get database table schema |
| execute_polars_sql | Query files using Polars |
| execute_database_query | Query SQL Server |
Full implementation details available in: GDEP-MCP-ANALYST Repository
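For orientation, here is a minimal sketch of how one of these tools might be registered on the server side, assuming the FastMCP 2.x API; the data path is illustrative and the real implementation lives in the GDEP-MCP-ANALYST repository:

from pathlib import Path
from fastmcp import FastMCP

DATA_DIR = Path("/data")  # illustrative mount point for CSV/Parquet files

mcp = FastMCP("data-analyst")

@mcp.tool()
def get_files_list() -> list[str]:
    """List available CSV/Parquet files that can be queried with execute_polars_sql."""
    return sorted(p.name for p in DATA_DIR.iterdir() if p.suffix in {".csv", ".parquet"})

if __name__ == "__main__":
    # Streamable HTTP transport so the bot's MCPStreamableHTTPTool can connect on port 8001
    mcp.run(transport="streamable-http", host="0.0.0.0", port=8001)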
Network Architecture
The MCP server runs on a private RFC 1918 network:
┌─────────────────────┐ ┌─────────────────────┐
│ Bot Container │ HTTP │ MCP Container │
│ (Port 8000) │ ───────► │ (Port 8001) │
│ │ │ │
│ 10.27.6.4 │ No Auth │ 10.27.6.5 │
│ │ │ │
└─────────────────────┘ └─────────────────────┘
No authentication is needed because both containers are on the same Azure virtual network with no public exposure.
Hybrid Intent Routing: Keywords + LLM Classification
The Problem with Pure Keyword Matching
Simple keyword matching fails for ambiguous queries:
| Query | Expected | Keyword Match |
|---|---|---|
| "How many employees in IT?" | MCP | ✅ "employees" |
| "I need some numbers" | MCP | ❌ No keyword |
| "Show me the data" | MCP | ✅ "data" |
| "What's the current situation?" | ??? | ❌ Ambiguous |
The Hybrid Solution
We combine fast keyword matching with LLM fallback:
def route_to_agent(user_message: str) -> str:
    message_lower = user_message.lower()

    # Step 1: Check MCP keywords (HIGHEST PRIORITY) - ~5ms
    mcp_keywords = ["data", "analyze", "employees", "bill", "report", ...]
    for keyword in mcp_keywords:
        if keyword in message_lower:
            return "mcp"

    # Step 2: Check operational keywords - ~5ms
    operational_keywords = ["reset password", "create ticket", "hello", ...]
    for keyword in operational_keywords:
        if keyword in message_lower:
            return "local"

    # Step 3: Check knowledge keywords - ~5ms
    knowledge_keywords = ["policy", "procedure", "how to", ...]
    for keyword in knowledge_keywords:
        if keyword in message_lower:
            return "foundry"

    # Step 4: No match - use LLM classification - ~150ms
    intent = classify_with_llm(user_message)
    if intent == "data_analysis":
        return "mcp"
    elif intent == "operational_task":
        return "local"
    elif intent == "knowledge_base":
        return "foundry"

    # Step 5: Default fallback
    return "local"
LLM Intent Classifier
For ambiguous queries, we use GPT-4.1-mini (fast, cheap):
from openai import AzureOpenAI

class IntentClassifier:
    def __init__(self):
        # Reuses the same Azure OpenAI settings as the agents above
        self.client = AzureOpenAI(
            azure_endpoint=AZURE_OPENAI_ENDPOINT,
            api_key=AZURE_OPENAI_API_KEY,
            api_version="2024-05-01-preview"
        )
        self.system_prompt = """You are an intent classifier.

Classify into exactly ONE category:
1. data_analysis - Data, numbers, reports, employee info
2. operational_task - Password resets, tickets, actions
3. knowledge_base - Policies, procedures, how-to questions

Respond with ONLY ONE WORD: data_analysis OR operational_task OR knowledge_base"""

    def classify(self, message: str) -> str:
        response = self.client.chat.completions.create(
            model="gpt-4.1-mini",
            messages=[
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": message}
            ],
            temperature=0.3,  # Consistent classification
            max_tokens=10,    # One word response
            timeout=5.0       # Quick timeout
        )
        return response.choices[0].message.content.strip().lower()
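The route_to_agent function above calls a classify_with_llm helper. A hedged sketch of that glue, including the graceful degradation mentioned in the design decisions below (fall back to the default LOCAL agent if the classifier errors or times out); the module-level _classifier name is illustrative:

_classifier = IntentClassifier()

def classify_with_llm(message: str) -> str:
    try:
        return _classifier.classify(message)
    except Exception:
        # LLM unavailable or timed out: degrade gracefully to the default agent
        return "operational_task"  # maps to "local" in route_to_agent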
Performance Metrics
| Routing Method | Latency | Accuracy | Cost |
|---|---|---|---|
| Keyword Matching | ~5ms | 80% | Free |
| LLM Classification | ~150ms | 99% | ~$0.0001/query |
| Hybrid | ~15ms avg | 93% | ~$0.02/100 queries |
Thread Persistence & Conversation Memory
The Challenge
Users expect continuity across messages:
User: "What is our December Azure bill?"
Bot: "The December 2025 Azure bill is $56,940.03"
User: "Compare that to November" ← Bot needs to remember "December bill"
Solution: Serialized Thread Storage
We serialize the entire conversation thread to Azure Blob Storage:
import json
import os

class ThreadStorage:
    def __init__(self, conversation_id: str, user_identifier: str):
        self.base_path = f"/mnt/azure/helpdesk/threads/{conversation_id}"
        os.makedirs(self.base_path, exist_ok=True)

    def save_thread(self, thread_id: str, thread_data: dict):
        file_path = f"{self.base_path}/thread_{thread_id}.json"
        with open(file_path, 'w') as f:
            json.dump(thread_data, f)

    def load_thread(self, thread_id: str) -> dict:
        file_path = f"{self.base_path}/thread_{thread_id}.json"
        with open(file_path, 'r') as f:
            return json.load(f)
Thread Serialization with Agent Framework
The Microsoft Agent Framework provides built-in serialization:
# After agent response
serialized = await thread.serialize()
thread_storage.save_thread(thread_id, serialized)

# Before next message
thread_data = thread_storage.load_thread(active_thread_id)

# Convert dicts back to ChatMessage objects
from agent_framework import ChatMessage

if 'chat_message_store_state' in thread_data:
    store = thread_data['chat_message_store_state']
    store['messages'] = [
        ChatMessage.from_dict(msg) if isinstance(msg, dict) else msg
        for msg in store['messages']
    ]

# Restore thread with history
thread = await agent.deserialize_thread(thread_data)
Sliding Window: Cost Optimization
Long conversations can consume excessive tokens. We apply a sliding window:
def apply_sliding_window(thread, max_messages: int = 20):
    """Keep only last N messages for LLM context, but save full history."""
    original_messages = thread._chat_message_store.messages
    original_count = len(original_messages)

    if original_count > max_messages:
        # Store full history for later restoration
        full_history = list(original_messages)
        # Send only recent context to LLM
        thread._chat_message_store.messages = original_messages[-max_messages:]
        # After LLM response, restore full history + new messages
        # ... (see full implementation)

    return thread
Cost savings example:
- Thread with 300 messages: ~60,000 tokens = ~$1.80/request
- After sliding window (20 messages): ~4,000 tokens = ~$0.12/request
- 93% cost reduction
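These figures can be sanity-checked with a quick back-of-envelope calculation; the ~200 tokens per message and the per-token price below are illustrative assumptions, not official pricing:

TOKENS_PER_MESSAGE = 200            # assumption: average message size
PRICE_PER_1K_INPUT_TOKENS = 0.03    # assumption: illustrative input price

full_cost = 300 * TOKENS_PER_MESSAGE / 1000 * PRICE_PER_1K_INPUT_TOKENS     # ~$1.80
windowed_cost = 20 * TOKENS_PER_MESSAGE / 1000 * PRICE_PER_1K_INPUT_TOKENS  # ~$0.12
savings = 1 - windowed_cost / full_cost                                      # ~0.93

print(f"${full_cost:.2f} vs ${windowed_cost:.2f} per request ({savings:.0%} reduction)")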
Agent-Specific Context Filtering
The Cross-Agent Hallucination Problem
When agents share conversation history, a dangerous issue emerges:
1. User: "What is December Azure bill?"
2. MCP Agent: [calls execute_database_query] → "$56,940.03"
3. User: "Reset my password" ← Routes to LOCAL agent
4. LOCAL Agent sees: execute_database_query in history
5. LOCAL Agent doesn't have this tool
6. LOCAL Agent HALLUCINATES random tool calls! ❌
Solution: Filter Before Sending to LLM
Before sending context to each agent, filter out tool calls from other agents:
def filter_context_by_agent(thread, target_agent: str):
    """Remove cross-agent tool calls to prevent hallucinations."""
    MCP_TOOLS = {
        'get_data_catalog', 'execute_database_query',
        'execute_polars_sql', 'get_schema', ...
    }
    LOCAL_TOOLS = {
        'get_user_info', 'reset_sap_password',
        'create_ticket', 'check_ticket_status', ...
    }

    filtered_messages = []
    for msg in thread.messages:
        if msg.role == 'user':
            # ALWAYS keep user messages
            filtered_messages.append(msg)
        elif msg.role == 'assistant':
            # Check for cross-agent tool calls
            has_foreign_tools = False
            for content in msg.contents:
                if content.type == 'function_call':
                    tool_name = content.name
                    if target_agent == 'local' and tool_name in MCP_TOOLS:
                        has_foreign_tools = True
                    elif target_agent == 'mcp' and tool_name in LOCAL_TOOLS:
                        has_foreign_tools = True
            if not has_foreign_tools:
                filtered_messages.append(msg)

    thread.messages = filtered_messages
    return thread
Execution Flow with Filtering
1. Load full thread (50 messages)
2. Apply sliding window (keep last 20)
3. Apply agent-specific filter (remove cross-agent tools)
4. Send clean context to LLM
5. Get response
6. Restore full history + add new messages
7. Save complete thread to storage
The AI Memory System: Learning About Users
Beyond Session Memory
Traditional chatbots only remember within a session. Our AI Memory system learns across all conversations:
Day 1: "Hi, I'm John from the Finance department"
Bot: [Learns: name="John", department="Finance"]
Day 2: "Reset my SAP password"
Bot: "Hi John! I'll reset your SAP password right away."
[Knows user without asking]
Implementation: ContextProvider Pattern
The Microsoft Agent Framework provides a ContextProvider interface:
import json

from agent_framework import ContextProvider, Context
from openai import AsyncAzureOpenAI

class AIMemoryProvider(ContextProvider):
    """AI-powered long-term memory that learns about users."""

    def __init__(self, user_identifier: str, ai_client: AsyncAzureOpenAI):
        self.user_identifier = user_identifier
        self.ai_client = ai_client
        self.profile_file = f"/mnt/azure/helpdesk/memory/{user_identifier}/profile.json"
        self._load_profile()

    async def invoking(self, messages, **kwargs) -> Context:
        """Called BEFORE agent processes request. Inject user profile."""
        if self.user_profile:
            profile_text = "\n".join([f"- {k}: {v}" for k, v in self.user_profile.items()])
            return Context(instructions=f"""
[USER PROFILE - LONG-TERM MEMORY]:
{profile_text}

Reference this naturally when relevant. Greet the user by name!
""")
        return Context()

    async def invoked(self, request_messages, response_messages, **kwargs):
        """Called AFTER conversation. Extract new information."""
        user_message = extract_last_user_message(request_messages)

        # Use AI to extract facts worth remembering
        extraction_prompt = f"""
Analyze this message and extract personal info worth remembering:
"{user_message}"

Current profile: {self.user_profile}

Extract ONLY factual info about the user:
- Personal: name, department, role, location
- Technical: systems they use, common issues
- Preferences: communication style

Return as JSON: {{"key": "value"}}
If nothing to remember, return: {{}}
"""
        response = await self.ai_client.chat.completions.create(
            model="gpt-4.1",
            messages=[{"role": "user", "content": extraction_prompt}],
            temperature=0.1,  # Consistent extraction
            max_tokens=200
        )

        extracted = json.loads(response.choices[0].message.content)
        if extracted:
            self.user_profile.update(extracted)
            self._save_profile()
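The _load_profile and _save_profile helpers referenced above are not shown in the article; a minimal sketch of what they might look like, following the profile_file convention from __init__ (error handling and formatting are assumptions):

import json
import os

    def _load_profile(self):
        # Missing or corrupted file simply means we know nothing about the user yet
        try:
            with open(self.profile_file, 'r') as f:
                self.user_profile = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            self.user_profile = {}

    def _save_profile(self):
        os.makedirs(os.path.dirname(self.profile_file), exist_ok=True)
        with open(self.profile_file, 'w') as f:
            json.dump(self.user_profile, f, indent=2)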
What Gets Remembered
The AI decides what's important. Examples:
| User Says | AI Learns |
|---|---|
| "I'm John Smith from Finance" | name: "John Smith", department: "Finance" |
| "I use SAP every day" | primary_systems: "SAP" |
| "This laptop issue again" | recurring_issue: "laptop problems" |
| "Reset my password" | {} (no personal info, just request) |
Microsoft Teams as the UI Layer
Why Teams?
- Enterprise Integration: Already deployed in most organizations
- Identity Built-in: Entra ID (Azure AD) authentication automatic
- No App to Install: Works in existing Teams client
- Mobile + Desktop: Unified experience across devices
Bot Framework Integration
from botbuilder.core import ActivityHandler, MessageFactory, TurnContext
from botbuilder.schema import Activity, ActivityTypes

class ITHelpdeskBot(ActivityHandler):
    async def on_message_activity(self, turn_context: TurnContext):
        # Extract message
        user_message = turn_context.activity.text

        # Extract Entra ID (automatically provided by Teams)
        entra_id = turn_context.activity.from_property.aad_object_id

        # Send typing indicator
        await turn_context.send_activity(Activity(type=ActivityTypes.typing))

        # Route and process (as shown earlier)
        response = await self.process_message(user_message, entra_id)

        # Send response (supports Markdown)
        await turn_context.send_activity(MessageFactory.text(response))
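The architecture diagram shows Azure Bot Service posting to an HTTPS webhook on the FastAPI backend. That wiring is not shown in the article; a minimal sketch of what it might look like with the classic BotFrameworkAdapter, where the credentials and the /api/messages route are assumptions:

from fastapi import FastAPI, Request, Response
from botbuilder.core import BotFrameworkAdapter, BotFrameworkAdapterSettings
from botbuilder.schema import Activity

app = FastAPI()

# Assumed environment-provided bot credentials
settings = BotFrameworkAdapterSettings(app_id=MICROSOFT_APP_ID, app_password=MICROSOFT_APP_PASSWORD)
adapter = BotFrameworkAdapter(settings)
bot = ITHelpdeskBot()

@app.post("/api/messages")
async def messages(req: Request) -> Response:
    # Azure Bot Service delivers each Teams message as a Bot Framework Activity
    activity = Activity().deserialize(await req.json())
    auth_header = req.headers.get("Authorization", "")
    await adapter.process_activity(activity, auth_header, bot.on_turn)
    return Response(status_code=201)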
Authentication Flow
┌─────────────────────────────────────────────────────────────────┐
│ 1. User sends message in Teams │
│ ↓ │
│ 2. Teams adds Entra ID (aad_object_id) to activity │
│ ↓ │
│ 3. Azure Bot Service routes to webhook │
│ ↓ │
│ 4. FastAPI receives with full user identity │
│ ↓ │
│ 5. Bot uses Entra ID for: │
│ - Database lookup (user's department, email, etc.) │
│ - Permission checks (can they reset others' passwords?) │
│ - Thread storage (per-user conversation history) │
│ - AI Memory (per-user learned preferences) │
└─────────────────────────────────────────────────────────────────┘
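As an illustration of step 5, a hedged sketch of a permission check keyed on the Entra ID; lookup_user_by_entra_id, the column names, and the role value are hypothetical:

def can_reset_password(entra_id: str, target_sap_account: str) -> bool:
    """Only allow resets for the caller's own SAP account or by helpdesk admins."""
    user = lookup_user_by_entra_id(entra_id)  # hypothetical SQL lookup helper
    if user is None:
        return False
    return user["sap_account"] == target_sap_account or user["role"] == "helpdesk_admin"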
Putting It All Together
Complete Request Flow
User sends "How many employees in IT?" via Teams
┌─────────────────────────────────────────────────────────────────┐
│ 1. AUTHENTICATION │
│ - Extract Entra ID from Teams activity │
│ - Look up user in SQL database │
│ - Initialize AI Memory for this user │
└──────────────────────────┬──────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ 2. INTENT ROUTING │
│ - Check keywords: "employees" → MCP_KEYWORDS │
│ - Route decision: "mcp" (Data Analyst) │
└──────────────────────────┬──────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ 3. THREAD MANAGEMENT │
│ - Load conversation thread from Azure Blob │
│ - Apply sliding window (keep last 20 messages) │
│ - Apply agent filter (remove LOCAL agent tool calls) │
└──────────────────────────┬──────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ 4. MCP AGENT EXECUTION │
│ - Create MCP agent with data analysis tools │
│ - Agent calls: get_data_catalog → execute_database_query │
│ - Query: SELECT COUNT(*) FROM employees WHERE dept = 'IT' │
│ - Result: 42 employees │
└──────────────────────────┬──────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ 5. RESPONSE & PERSISTENCE │
│ - Restore full thread history │
│ - Add new messages (user + assistant + tool calls) │
│ - Serialize and save to Azure Blob │
│ - Log request for audit │
└──────────────────────────┬──────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ 6. TEAMS RESPONSE │
│ - Send Markdown-formatted response │
│ - "There are **42 employees** in the IT department." │
└─────────────────────────────────────────────────────────────────┘
Key Architectural Decisions
1. Shared Thread Storage (Option B: Filtered Context)
Decision: All agents share the same thread storage, but filter before sending to LLM.
Why:
- Users expect context to persist across agent switches
- "Compare this to November" works after an MCP query
- Filtering prevents cross-agent hallucinations
- Full history preserved for audit/compliance

Rejected Alternatives:
- Option A (Separate Threads): Lost context across agent switches
- Option C (No Filtering): Caused LLM hallucinations
2. Hybrid Intent Routing
Decision: Keywords first, LLM fallback for ambiguous queries.
Why:
- Keywords handle 80% of queries instantly (~5ms)
- LLM provides 99% accuracy for edge cases (~150ms)
- Graceful degradation if LLM unavailable
- Cost-effective (~$0.02 per 100 ambiguous queries)
3. Foundry Agent for Knowledge, Local Agent for Operations
Decision: Separate RAG/knowledge capabilities from custom tool execution.
Why:
- Foundry excels at document retrieval (built-in vector store)
- Local agent excels at real-time integrations (SAP, ServiceDesk, SQL)
- Different tool paradigms (schema-based vs Python functions)
- Independent scaling and updates
4. MCP for Data Analysis
Decision: Separate MCP server for data queries.
Why:
- Data schema changes independently from bot logic
- Same MCP server can serve multiple clients (bot, Claude Desktop, etc.)
- SQL injection prevention in isolated container
- Easy to add new data sources without redeploying the bot
Lessons Learned
1. LLMs Will Hallucinate If They See Unavailable Tools
When a LOCAL agent sees MCP's execute_database_query in conversation history, it will try to call it even though it doesn't have access. Always filter context before sending to each agent.
2. Thread Serialization Requires Exact Import Paths
# ✅ CORRECT - preserves conversation memory
from agent_framework import ChatMessage
# ❌ WRONG - silently breaks thread deserialization
from agent_framework.messages import ChatMessage
This caused hours of debugging. The import path affects how objects are serialized/deserialized.
3. Cached Thread History Can Poison Conversations
If the bot once generated a bad response (e.g., "You don't have permission"), that response stays in thread history. On subsequent messages, the LLM sees this pattern and repeats it. Solution: Clear thread storage when debugging, or implement thread expiration.
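A hedged sketch of the expiration approach, assuming the one-file-per-thread layout from ThreadStorage above; the seven-day window and helper names are illustrative:

import os
import time

MAX_THREAD_AGE_SECONDS = 7 * 24 * 3600  # assumption: expire threads after one week

def is_thread_expired(thread_file: str) -> bool:
    """Treat a persisted thread as stale once its file has not been modified for a week."""
    return (time.time() - os.path.getmtime(thread_file)) > MAX_THREAD_AGE_SECONDS

def load_thread_if_fresh(storage: "ThreadStorage", thread_id: str) -> dict | None:
    path = f"{storage.base_path}/thread_{thread_id}.json"
    if not os.path.exists(path) or is_thread_expired(path):
        return None  # start a fresh conversation instead of replaying stale history
    return storage.load_thread(thread_id)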
4. Tool Docstrings Are System Prompts
The LLM reads tool docstrings to decide when/how to call tools. Vague docstrings cause wrong tool calls. Be explicit:
# ❌ Vague
def reset_password(account):
    """Reset user password."""

# ✅ Explicit
def reset_password(account):
    """
    Reset SAP password for account.

    ALWAYS call this tool for password requests.
    DO NOT check permissions - the tool handles this internally.
    Pass the account name EXACTLY as the user specifies.
    """
5. Emojis and Formatting Get Lost Without Strong Instructions
LLMs like to "clean up" responses. If your tool returns formatted output with emojis, the LLM may summarize or rephrase it. Add explicit instructions:
Tool output = YOUR output. No summarizing. No rephrasing.
Copy the tool response CHARACTER FOR CHARACTER.
Include EVERY emoji. Include EVERY markdown element.
Conclusion
Building a production-ready multi-agent system requires careful consideration of:
- Agent Specialization: Each agent has clear responsibilities
- Intelligent Routing: Fast keywords + accurate LLM fallback
- Context Management: Shared storage with agent-specific filtering
- Memory Systems: Session threads + long-term AI memory
- Enterprise Integration: Teams + Entra ID + Azure services
The result is a system that feels like a single intelligent assistant while leveraging the strengths of multiple specialized agents behind the scenes.
For a deep dive into building the MCP data analysis component, see our comprehensive guide: Data Analysis with LLM via MCP Server - Part 1
All implementation code and deployment details are available at:
- Multi-Agent System: GDEP-IT-HELPDESK
- MCP Data Server: GDEP-MCP-ANALYST
References and Credits
- Microsoft Agent Framework SDK
- Model Context Protocol (MCP)
- Azure AI Foundry Documentation
- Bot Framework SDK for Python
- FastMCP Framework
- Polars DataFrame Library
This article documents a production implementation. Architecture patterns and code examples are based on real-world enterprise deployment.