
blog: MCP Security — Why Your AI Agent Tool Calls Need a Firewall#899

Open
aymenhmaidiwastaken wants to merge 3 commits into microsoft:main from aymenhmaidiwastaken:blog/mcp-security-firewall

Conversation

@aymenhmaidiwastaken

Closes #848

Drafted the MCP security blog post covering the threat landscape around AI agent tool calls — tool poisoning, rug-pull attacks, cross-server data leakage, and over-permissioned tools with concrete attack scenarios.

Includes six practical recommendations: tool allowlisting, definition fingerprinting, argument boundary enforcement, human-in-the-loop for sensitive ops, runtime monitoring, and trust domain isolation.

Happy to revise based on feedback!

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 7, 2026
@github-actions

github-actions bot commented Apr 7, 2026

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, your code passes the style checks (ruff check), and you have signed the CLA.
See our Contributing Guide.

@github-actions github-actions bot added agent-mesh agent-mesh package size/M Medium PR (< 200 lines) labels Apr 7, 2026
@github-actions

github-actions bot commented Apr 7, 2026

🤖 AI Agent: contributor-guide

Hi @aymenhmaidiwastaken! 👋

Welcome to the Agent Governance Toolkit community, and thank you for contributing your time and expertise! 🎉 Your blog post draft is incredibly thoughtful and well-researched — it's clear you've put a lot of effort into breaking down complex security concepts into actionable advice. Let's dive into the review!


🌟 What You Did Well

  1. Clarity and Structure: Your blog post is exceptionally well-organized. The "Threat Surface" and "Practical Recommendations" sections are easy to follow and provide a logical flow from problem to solution.
  2. Actionable Advice: The six recommendations are practical, detailed, and immediately useful for readers. The inclusion of YAML examples and JSON schemas is a great touch to make the concepts tangible.
  3. Real-World Scenarios: The attack scenarios you described are both realistic and compelling. They help illustrate the risks in a way that will resonate with practitioners.
  4. Community Alignment: Your post aligns perfectly with the goals of this repository — promoting secure and responsible agent governance. The tie-ins to OWASP and the MCP Trust Guide are excellent.

🛠 Suggestions for Improvement

Here are a few areas where we can refine your contribution to align with project conventions and ensure maximum impact:

1. File Placement

  • Blog posts in this repository are typically placed under packages/{name}/docs/blog/. You've done this correctly by placing the file in packages/agent-mesh/docs/blog/. ✅
  • However, could you also add a test case to ensure the blog post renders correctly in our documentation pipeline? Tests for this package should go in packages/agent-mesh/tests/. You can create a simple test to verify the file's presence and formatting.

2. Linting

  • We use ruff for linting with a focus on E, F, and W error codes. While your blog post is Markdown and won't be linted directly, make sure any Python code snippets (like the MCP Security Scanner link) adhere to PEP 8 standards. If you include runnable Python examples in the future, running ruff locally will help catch issues early.

3. Commit Message

  • We follow the Conventional Commits standard for commit messages. Your commit message should start with a prefix like docs: to indicate the type of change. For example:
    docs: add MCP security blog post on tool call firewalls
    
  • This helps maintainers quickly understand the purpose of your changes and ensures consistent commit history.

4. Security-Sensitive Content

  • Since this blog post discusses security-sensitive topics, it will receive extra scrutiny. You've done a great job referencing OWASP and providing concrete examples, but it would be helpful to link directly to the OWASP Top 10 for LLMs for readers who want to dive deeper.

5. Cross-Referencing Internal Resources

  • You’ve already linked to the MCP Trust Guide and the MCP Security Scanner. Great job! To make this even more robust, consider adding a link to our CONTRIBUTING.md file for readers who might want to contribute to the toolkit after reading your post.


✅ Next Steps

  1. Address the feedback above:
    • Add a test case for the blog post in packages/agent-mesh/tests/.
    • Ensure your commit message follows the docs: prefix convention.
    • Optionally, add a link to the OWASP Top 10 for LLMs.
  2. Push your changes to this branch. Once updated, our CI/CD pipeline will automatically re-run checks.
  3. Let us know if you have any questions or need clarification on any of the feedback!

Once you've made these updates, we'll review your PR again and work towards merging it. Thank you for helping us make the Agent Governance Toolkit even better! 🚀

Looking forward to your updates! 😊


@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Feedback on Pull Request: blog: MCP Security — Why Your AI Agent Tool Calls Need a Firewall

🔴 CRITICAL

  1. Tool Description Injection Vulnerability
    The blog correctly highlights the risk of tool poisoning via description injection but does not explicitly recommend sanitizing tool descriptions before they are consumed by the agent. This is a critical omission because malicious descriptions can bypass LLM safeguards.
    Actionable Recommendation: Add explicit guidance to sanitize tool descriptions for hidden instructions or malicious payloads before they are presented to the agent. This could include stripping non-visible characters, detecting prompt injection patterns, and validating descriptions against a whitelist of allowed patterns.

  2. Cross-Server Data Leakage
    While the blog mentions the risk of cross-server data leakage, it does not provide concrete implementation details for tracking data provenance across tool calls. Without this, the recommendation for isolating MCP server trust domains lacks actionable guidance.
    Actionable Recommendation: Include technical details on how to implement data provenance tracking, such as tagging data with metadata about its origin and enforcing policies based on these tags.
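For illustration, the description-sanitization step called for above can be sketched with the standard library alone. The pattern list here is a deliberately small, illustrative sample, not a complete defense, and a production scanner would combine it with an injection-detection model:

```python
import re
import unicodedata

# Illustrative patterns: imperative directives aimed at the model,
# and instructions hidden inside HTML comments.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\b(ignore|disregard)\b.{0,40}\b(instructions|rules)\b", re.I),
    re.compile(r"\byou must\b|\balways call\b|\bbefore (using|calling)\b", re.I),
    re.compile(r"<!--.*?-->", re.S),  # hidden-comment payloads
]

def scan_description(text: str) -> list[str]:
    """Return a list of findings; an empty list means the description looks clean."""
    findings = []
    # Invisible format-control characters (zero-width spaces, bidi overrides, ...)
    if any(unicodedata.category(ch) == "Cf" for ch in text):
        findings.append("invisible-unicode")
    for pat in SUSPICIOUS_PATTERNS:
        if pat.search(text):
            findings.append(f"pattern:{pat.pattern[:30]}")
    return findings
```

A governance layer would run this over every tool definition at discovery time and quarantine any tool whose description produces findings.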

🟡 WARNING

  1. Backward Compatibility of Tool Fingerprinting
    The recommendation to fingerprint tool definitions and block tools with changed definitions could lead to breaking changes in production environments. If an MCP server updates a tool description or schema for legitimate reasons (e.g., bug fixes or feature enhancements), agents may fail to function unless the fingerprints are updated.
    Actionable Recommendation: Suggest implementing a staged approval process for fingerprint changes, where updates are flagged but not immediately blocked. This allows operators to review and approve legitimate changes without disrupting production.
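One possible shape for that staged approval process, sketched in Python. The `check_tool` helper and the pinned/pending stores are hypothetical; the point is that a changed fingerprint is flagged for review rather than hard-blocked:

```python
import hashlib
import json

def fingerprint(tool_def: dict) -> str:
    # Canonical JSON so key ordering does not change the hash.
    return hashlib.sha256(json.dumps(tool_def, sort_keys=True).encode()).hexdigest()

def check_tool(tool_def: dict, pinned: dict[str, str], pending: dict[str, str]) -> str:
    """Return 'allow', 'pending-review', or 'unknown' instead of blocking outright."""
    name = tool_def["name"]
    fp = fingerprint(tool_def)
    if pinned.get(name) == fp:
        return "allow"
    if name in pinned:
        # Definition changed: queue for operator review, keep production running.
        pending[name] = fp
        return "pending-review"
    return "unknown"
```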

💡 SUGGESTIONS

  1. Expand Human-in-the-Loop Guidance
    The blog mentions human approval for sensitive operations but does not specify how this could be implemented in practice.
    Actionable Recommendation: Provide examples of how to integrate human-in-the-loop mechanisms, such as using a webhook to trigger approval workflows in tools like Slack or Microsoft Teams.

  2. Runtime Monitoring Details
    The recommendation for runtime monitoring is high-level and does not specify what tools or frameworks could be used to implement anomaly detection.
    Actionable Recommendation: Suggest specific technologies or libraries (e.g., OpenTelemetry for tracing, Elasticsearch for log analysis) that can be used to implement runtime monitoring.

  3. OWASP Agentic Top 10 Mapping
    While the blog references ASI01 (Prompt Injection), it could benefit from mapping the other threats (rug-pull attacks, data leakage, over-permissioned tools) to relevant OWASP Agentic Top 10 categories.
    Actionable Recommendation: Expand the OWASP mapping to include ASI02 (Supply Chain Vulnerabilities) for rug-pull attacks and ASI03 (Data Leakage) for cross-server data leakage.

  4. Tool Allowlist Implementation
    The YAML example for tool allowlisting is helpful but lacks details on how this policy would be enforced programmatically.
    Actionable Recommendation: Provide a code snippet or pseudocode demonstrating how the allowlist can be integrated into the agent's runtime logic.

  5. Clarify "Excessive Data Volume" Detection
    The blog mentions scanning arguments for excessive data volume but does not define thresholds or criteria for what constitutes "excessive."
    Actionable Recommendation: Add guidance on setting thresholds based on tool schema expectations, such as maximum string lengths or array sizes.

  6. Link to MCP Trust Guide and Security Scanner
    The blog links to the MCP Trust Guide and Security Scanner but does not summarize their functionality or relevance to the recommendations.
    Actionable Recommendation: Briefly describe what these resources provide and how they can help implement the defenses outlined in the blog.
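As a sketch of the allowlist enforcement asked for in suggestion 4 (the policy structure and tool names here are made up for illustration), a default-deny check at call time might look like:

```python
# Hypothetical in-memory mirror of the YAML allowlist policy:
# {agent_role: {"allow": [...], "deny": [...]}}
ALLOWLIST = {
    "support-agent": {
        "allow": ["search_kb", "create_ticket"],
        "deny": ["execute_command"],
    },
}

def is_tool_allowed(agent_role: str, tool_name: str) -> bool:
    policy = ALLOWLIST.get(agent_role)
    if policy is None:
        return False                      # default-deny unknown roles
    if tool_name in policy["deny"]:
        return False                      # explicit deny always wins
    return tool_name in policy["allow"]   # default-deny unlisted tools
```

The agent runtime would call `is_tool_allowed` before dispatching any tool call and refuse calls that return `False`.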

General Observations

  • The blog is well-written and provides a clear overview of the MCP threat landscape. It effectively communicates the urgency of securing tool calls and offers practical recommendations.
  • The inclusion of real-world attack scenarios is excellent and helps illustrate the risks.
  • The blog aligns well with the goals of the repository and contributes valuable insights to the community.

Final Recommendation

Merge the pull request after addressing the critical issues and warnings. Consider incorporating the suggestions to further enhance the blog's utility and actionable guidance.

@github-actions

github-actions bot commented Apr 7, 2026

🤖 AI Agent: security-scanner — Security Review of PR: "MCP Security — Why Your AI Agent Tool Calls Need a Firewall"


This PR introduces a blog post discussing the security implications of the Model Context Protocol (MCP) and provides practical defenses against various attack vectors. While the PR is primarily documentation, it includes code snippets and recommendations that could influence how users implement security measures. Below is a security review based on the specified criteria:


1. Prompt Injection Defense Bypass

Rating: 🔴 CRITICAL
Attack Vector: The blog post highlights a significant vulnerability in MCP: tool descriptions can be used as a vector for prompt injection attacks. An attacker could embed hidden instructions in the description field of a tool, which the LLM would interpret as trusted context. This could lead to unauthorized tool calls, data exfiltration, or other malicious behavior.
Fix: The blog post provides a solid set of recommendations for mitigating this risk, including scanning tool descriptions for imperative directives, cross-tool references, invisible Unicode, encoded payloads, and hidden instructions in comments. However, the regex-based approach in the provided code snippet is insufficient for comprehensive detection. A more robust solution would involve integrating a dedicated prompt injection detection library or model, such as OpenAI's moderation API or a custom fine-tuned LLM.


2. Policy Engine Circumvention

Rating: 🟠 HIGH
Attack Vector: The blog post identifies the risk of "rug-pull attacks," where an MCP server can modify tool definitions after they have been approved. This could allow attackers to bypass policy checks by changing the behavior of a previously approved tool.
Fix: The recommendation to use fingerprinting for tool definitions is a strong mitigation. However, the blog post should emphasize the importance of securing the storage of these fingerprints (e.g., using a secure database with access controls) and ensuring that the fingerprinting process itself is tamper-proof. Additionally, the blog could suggest periodic audits of tool definitions to detect unauthorized changes.


3. Trust Chain Weaknesses

Rating: 🟡 MEDIUM
Attack Vector: The blog does not explicitly address trust chain weaknesses, such as the lack of cryptographic validation for MCP server identities or tool definitions. This could allow attackers to impersonate trusted servers or tools.
Fix: The blog should recommend implementing cryptographic signing for tool definitions and server identities, using standards like SPIFFE/SVID or Ed25519. This would ensure that only trusted servers can publish tool definitions and that these definitions cannot be tampered with.


4. Credential Exposure

Rating: 🟠 HIGH
Attack Vector: The blog highlights the risk of sensitive data (e.g., credentials, PII) being exfiltrated through tool arguments. This is particularly concerning in multi-server deployments where data from one server could be passed to another without proper boundary enforcement.
Fix: The proposed defenses, such as argument boundary enforcement, PII detection, and provenance tracking, are strong. However, the blog should also emphasize the importance of encrypting sensitive data at rest and in transit, as well as securely managing API keys and other credentials.


5. Sandbox Escape

Rating: 🔵 LOW
Attack Vector: The blog does not directly address sandbox escape vulnerabilities. However, the focus on runtime monitoring and argument validation indirectly mitigates some risks associated with executing untrusted code or commands.
Fix: While not the focus of this blog, it would be beneficial to mention the importance of running MCP servers and tools in isolated environments (e.g., containers with strict resource limits) to prevent potential sandbox escapes.


6. Deserialization Attacks

Rating: 🟡 MEDIUM
Attack Vector: The blog does not explicitly discuss the risk of deserialization attacks, which could occur if an MCP server or tool processes untrusted serialized data. This is a common attack vector in systems that handle JSON, YAML, or other serialized formats.
Fix: The blog should include a recommendation to validate and sanitize all deserialized data, using libraries that enforce strict schema validation (e.g., pydantic or jsonschema).
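Where pulling in `pydantic` or `jsonschema` is not an option, even a small hand-rolled strict check that rejects unknown keys goes a long way. This is an illustrative sketch, not a substitute for a real schema validator:

```python
def validate_strict(data: dict, schema: dict[str, type]) -> dict:
    """Reject unknown keys, missing keys, and wrong types before a payload
    reaches a tool. `schema` maps field names to expected Python types."""
    unknown = set(data) - set(schema)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    for key, expected in schema.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], expected):
            raise ValueError(f"{key}: expected {expected.__name__}")
    return data
```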


7. Race Conditions

Rating: 🟡 MEDIUM
Attack Vector: The blog mentions the risk of tool definitions changing between sessions or calls, which could be exploited in a race condition. For example, an attacker could modify a tool's schema after it has been approved but before it is used.
Fix: The fingerprinting mechanism is a good start, but the blog should also recommend implementing atomic operations for fetching and validating tool definitions to prevent time-of-check-to-time-of-use (TOCTOU) vulnerabilities.
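One way to close the TOCTOU window is to fetch the definition exactly once and both validate and invoke against that single snapshot. A hypothetical sketch (the `fetch_definition` and `invoke` callables stand in for real MCP client calls):

```python
import hashlib
import json

def fingerprint(tool_def: dict) -> str:
    return hashlib.sha256(json.dumps(tool_def, sort_keys=True).encode()).hexdigest()

def call_with_pinned_definition(fetch_definition, pinned_fp: str, invoke):
    """Fetch the definition once, validate it, and invoke against that same
    snapshot, so the server cannot swap the definition between check and use."""
    snapshot = fetch_definition()  # single fetch: check and use share one object
    if fingerprint(snapshot) != pinned_fp:
        raise PermissionError("tool definition changed since approval")
    return invoke(snapshot)
```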


8. Supply Chain

Rating: 🔴 CRITICAL
Attack Vector: The blog identifies the risk of supply chain attacks, such as tool impersonation via typosquatting or malicious updates to third-party MCP servers. These attacks could introduce compromised tools into an agent's environment.
Fix: The blog should explicitly recommend verifying the integrity and authenticity of MCP servers and tool definitions. This could include using a registry of trusted MCP servers, implementing cryptographic signing, and conducting regular audits of third-party dependencies.


Additional Observations:

  1. Code Snippets: The provided code snippets are generally well-written and align with best practices. However, the blog should include disclaimers that these are simplified examples and may require additional hardening for production use.
  2. Documentation Scope: While the blog is comprehensive, it could benefit from a more explicit discussion of how these defenses integrate with the Agent Governance Toolkit. For example, it could link to specific modules or features in the toolkit that implement these defenses.
  3. Dependency Management: The blog does not discuss the risk of dependency confusion or typosquatting in the context of MCP server implementations. This could be an important addition to the "Supply Chain" section.

Summary of Findings:

  1. Prompt injection defense bypass: 🔴 CRITICAL

    • Fix: Use robust prompt injection detection mechanisms beyond regex.
  2. Policy engine circumvention: 🟠 HIGH

    • Fix: Emphasize secure storage of fingerprints and periodic audits.
  3. Trust chain weaknesses: 🟡 MEDIUM

    • Fix: Recommend cryptographic signing for tool definitions and server identities.
  4. Credential exposure: 🟠 HIGH

    • Fix: Emphasize encryption and secure credential management.
  5. Sandbox escape: 🔵 LOW

    • Fix: Mention containerization and resource isolation for MCP servers.
  6. Deserialization attacks: 🟡 MEDIUM

    • Fix: Recommend strict schema validation for deserialized data.
  7. Race conditions: 🟡 MEDIUM

    • Fix: Suggest atomic operations for fetching and validating tool definitions.
  8. Supply chain: 🔴 CRITICAL

    • Fix: Recommend trusted server registries, cryptographic signing, and dependency audits.

Recommendations:

  1. Enhance Prompt Injection Defenses: Integrate advanced prompt injection detection tools or models.
  2. Strengthen Trust Chains: Include cryptographic validation for MCP servers and tool definitions.
  3. Expand Supply Chain Security: Address dependency-related risks and emphasize the importance of trusted server registries.
  4. Improve Documentation: Provide more explicit links to relevant modules in the Agent Governance Toolkit for implementing the recommended defenses.

This blog post is a valuable addition to the repository, but it should incorporate the above recommendations to ensure comprehensive security coverage.

Member

@imran-siddique imran-siddique left a comment


Thanks for writing about MCP security @aymenhmaidiwastaken! Great content. Two items before we can merge:

  1. Sign the CLA — the license/cla check is still pending. Follow the bot instructions.
  2. Publish the blog externally — per issue #848, the deliverable is a published post on Dev.to/Medium/Hashnode. Please publish the article there, then update the COMMUNITY.md link to point to the published URL instead of the in-repo path.

The content quality is excellent — looking forward to getting this merged once published!

@aymenhmaidiwastaken
Author

@microsoft-github-policy-service agree

@aymenhmaidiwastaken
Author

Thanks for the review @imran-siddique! Really appreciate the feedback.

I'll work on both items:

  1. CLA — just signed it above
  2. Publishing externally — I'll publish the article on Dev.to and update the COMMUNITY.md link to point there instead of the in-repo path. Will push the update once it's live.

Also, the AI code reviewer raised some solid points — I'll incorporate the critical ones (sanitizing tool descriptions, data provenance tracking details) and the OWASP Agentic Top 10 mapping before publishing. Should make the article stronger.

Will update the PR shortly!

@imran-siddique
Member

Great, thanks @aymenhmaidiwastaken! Take your time with the publishing. Once the blog is live and CLA is signed, ping us and we'll merge right away.

@github-actions github-actions bot added the size/L Large PR (< 500 lines) label Apr 8, 2026
@aymenhmaidiwastaken
Author

Updated the blog post with all the reviewer feedback incorporated:

  • Added tool description sanitization guidance with a scan_description() implementation covering imperative directives, cross-tool references, invisible Unicode, encoded payloads, and HTML comments
  • Added data provenance tracking with a ProvenanceTracker class and check_boundary() enforcement
  • Mapped all threats to OWASP Agentic Top 10 (ASI01 for tool poisoning, ASI02 for rug-pull attacks, ASI03 for data leakage)
  • Added concrete human-in-the-loop implementation with Slack/Teams webhook approval and YAML policy config
  • Added specific monitoring stack recommendations (OpenTelemetry for tracing, Elasticsearch/Loki for logs, Grafana/Datadog/PagerDuty for alerting) with span attributes
  • Defined thresholds for excessive data volume (5KB warn, 20KB block, 10x median always block)
  • Added descriptions of the MCP Trust Guide's four governance layers and the Security Scanner's capabilities

I'll publish this on Dev.to and update the COMMUNITY.md link once it's live. Working on that now.


@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review Feedback for Pull Request: MCP Security Blog Post

🔴 CRITICAL: Security Concerns

  1. Tool Description Sanitization Pipeline:

    • The sanitization pipeline proposed in the blog post relies heavily on regex-based matching for detecting malicious patterns in tool descriptions. While regex can catch obvious cases, it is insufficient for detecting sophisticated prompt injection attacks that leverage advanced obfuscation techniques. Consider integrating a more robust NLP-based classifier trained on adversarial examples to detect hidden instructions in tool descriptions.
  2. Provenance Tracker Implementation:

    • The ProvenanceTracker implementation uses SHA-256 for content hashing, which is insufficient for detecting partial matches or modified data. Attackers can easily bypass this by slightly altering the data. Consider using fuzzy hashing techniques like ssdeep or MinHash for more robust content similarity detection.
  3. Cross-Domain Policies:

    • The example policy for cross-domain data flow allows exceptions for specific tools like translate_text. This introduces a potential attack vector where malicious actors could exploit the exception to exfiltrate sensitive data. Ensure that exceptions are tightly scoped and include additional safeguards such as content classification, size limits, and PII detection.
  4. Human-in-the-Loop Approval:

    • The webhook-based approval mechanism lacks authentication and authorization checks. An attacker could potentially spoof approval requests or responses. Ensure that the webhook endpoint is secured using cryptographic signatures, and validate responses using a secure mechanism (e.g., HMAC or JWT).
  5. Telemetry Logging:

    • While the blog post recommends logging tool calls with full arguments, this approach may inadvertently log sensitive data (e.g., PII, credentials). Ensure that sensitive data is redacted or encrypted before being logged to prevent accidental exposure.
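Signing the approval webhook is straightforward with the standard library's `hmac` module. A minimal sketch of the signing and verification pair (the shared secret is a placeholder and would come from a secret store in practice):

```python
import hashlib
import hmac

SHARED_SECRET = b"replace-with-a-real-secret"  # placeholder, never hardcode

def sign_approval(payload: bytes) -> str:
    """HMAC-SHA256 signature attached to an outgoing approval request."""
    return hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()

def verify_approval(payload: bytes, signature: str) -> bool:
    """Verify an approval response; compare_digest resists timing attacks."""
    return hmac.compare_digest(sign_approval(payload), signature)
```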

🟡 WARNING: Potential Breaking Changes

  1. Tool Allowlisting:

    • The proposed allowlist mechanism introduces a breaking change to how agents interact with MCP servers. If implemented, agents will no longer be able to dynamically discover tools, which could impact existing workflows. Ensure that this change is documented and communicated clearly to users, along with migration guidance.
  2. Fingerprinting Tool Definitions:

    • The fingerprinting mechanism requires MCP servers to maintain consistent tool definitions across sessions. Any server-side changes to tool definitions will now result in blocked tool calls, which could disrupt production systems. Provide a fallback mechanism or alerting system to handle such cases gracefully.

💡 SUGGESTIONS: Improvements

  1. Structured Telemetry:

    • The blog post recommends using OpenTelemetry for monitoring tool calls. Extend this recommendation to include distributed tracing across MCP servers to track data flow between trust domains. This will provide better visibility into cross-server interactions.
  2. Runtime Argument Validation:

    • The argument boundary enforcement mechanism could benefit from integrating a dedicated library for sensitive data detection, such as Microsoft Presidio or Pydantic validators. This would improve accuracy and reduce false positives.
  3. Sandboxing Tool Execution:

    • The blog post does not address sandboxing for tools that execute code (e.g., execute_command). Consider recommending the use of containerization (e.g., Docker, Firecracker) or syscall filtering (e.g., seccomp) to isolate tool execution environments.
  4. Backward Compatibility:

    • For organizations with existing MCP deployments, provide a migration guide for implementing the proposed defenses incrementally. This could include a phased rollout plan for allowlisting, fingerprinting, and provenance tracking.
  5. OWASP Agentic Top 10 Mapping:

    • The blog post provides a good mapping to OWASP Agentic Top 10 categories but could benefit from a more detailed explanation of how each recommendation mitigates specific risks. Consider adding a table summarizing the defenses against each OWASP category.

Summary

This blog post provides a comprehensive overview of the security challenges in MCP-based AI agent deployments and proposes actionable defenses. However, the implementation details for some defenses (e.g., sanitization, provenance tracking) require refinement to address sophisticated attack vectors. Additionally, the proposed changes may introduce breaking changes for existing deployments, which should be carefully documented and communicated.

Recommendations

  • Address the critical security concerns flagged above, particularly around sanitization, provenance tracking, and webhook approval mechanisms.
  • Provide detailed migration guidance for users adopting the proposed defenses.
  • Consider extending the blog post to include sandboxing recommendations and distributed tracing for cross-server interactions.

This blog post is a valuable addition to the repository's documentation and aligns well with the project's focus on security and governance.

@aymenhmaidiwastaken
Author

Both items addressed:

  1. CLA — signed
  2. Published externally — article is live on Dev.to: https://dev.to/aymenhmaidi/mcp-security-why-your-ai-agents-tool-calls-need-a-firewall-3h48

Updated COMMUNITY.md to point to the published URL instead of the in-repo path. Ready for re-review!


@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review Summary

This blog post provides a comprehensive overview of the security risks associated with the Model Context Protocol (MCP) and offers actionable recommendations for mitigating these risks. The post is well-written, informative, and aligns with the goals of the repository. However, there are areas that could benefit from additional clarity, technical refinement, and alignment with best practices.


🔴 CRITICAL: Security Issues

  1. Tool Description Sanitization Pipeline:

    • The regex-based sanitization approach is prone to false negatives and may miss sophisticated prompt injection attempts. For example, adversaries could use obfuscated or encoded payloads that bypass simple regex checks.
    • Recommendation: Integrate a more robust NLP-based classifier trained on a dataset of malicious and benign tool descriptions. Consider leveraging pre-trained models for detecting adversarial instructions.
  2. Provenance Tracking Implementation:

    • The current implementation of ProvenanceTracker relies on exact SHA-256 hash matching, which is brittle and prone to false negatives when data is slightly modified (e.g., whitespace changes, re-encoding). This could allow attackers to bypass provenance checks.
    • Recommendation: Replace exact hash matching with content fingerprinting techniques, such as rolling hashes or MinHash, to improve resilience against minor modifications.
  3. Cross-Domain Data Leakage:

    • While the blog mentions the importance of provenance tracking and trust domain isolation, the example implementation does not address how to handle nested or derived data (e.g., data transformations or aggregations). This could lead to leakage of sensitive information across trust boundaries.
    • Recommendation: Implement recursive provenance tracking for derived data. For example, if data from Server A is transformed and used in a tool call to Server B, the governance layer should still enforce the original trust boundary.
  4. Human-in-the-Loop Approval:

    • The webhook-based approval mechanism assumes that the human operator can make an informed decision based on the provided arguments. However, sensitive data (e.g., PII or credentials) may still be exposed in the approval request itself.
    • Recommendation: Redact sensitive data from the approval request before sending it to the human operator. Use a secure channel for approvals and ensure that the request payload is encrypted.
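As a lightweight alternative to exact hashes, character-shingle Jaccard similarity tolerates whitespace and re-encoding changes that defeat SHA-256 matching. A small illustrative sketch (a real deployment might reach for `ssdeep` or MinHash instead, as suggested above):

```python
def shingles(text: str, k: int = 5) -> set[str]:
    """Overlapping character k-grams of a normalized string."""
    norm = " ".join(text.lower().split())  # collapse whitespace, lowercase
    return {norm[i:i + k] for i in range(max(1, len(norm) - k + 1))}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of shingle sets; 1.0 means near-identical content."""
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)
```

A provenance check could then treat any outbound argument whose similarity to tracked sensitive data exceeds a threshold (say 0.8) as a match, even after minor edits.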

🟡 WARNING: Potential Breaking Changes

  1. Tool Fingerprinting:

    • Introducing fingerprinting for tool definitions may break backward compatibility for existing deployments that dynamically discover tools without validation. This could lead to blocked tool calls in production environments.
    • Recommendation: Provide a migration guide for existing users, including steps to generate fingerprints for currently approved tools and handle discrepancies during runtime.
  2. Argument Boundary Enforcement:

    • Enforcing strict thresholds for argument sizes and patterns may cause legitimate tool calls to be blocked, especially in edge cases where larger payloads are expected (e.g., processing large documents).
    • Recommendation: Allow configurable thresholds and provide detailed logs for blocked calls to help operators fine-tune policies.
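Configurable thresholds might look like the following sketch, reusing the 5 KB warn / 20 KB block levels the author adopted earlier in the thread (the function and its defaults are illustrative):

```python
import json

def check_argument_size(args: dict,
                        warn_bytes: int = 5_000,
                        block_bytes: int = 20_000) -> str:
    """Classify a tool-call payload by serialized size. Thresholds are
    per-call parameters so operators can raise them for tools that
    legitimately handle large payloads."""
    size = len(json.dumps(args).encode())
    if size >= block_bytes:
        return "block"
    if size >= warn_bytes:
        return "warn"
    return "allow"
```

Blocked and warned calls would then be logged with the measured size so operators can tune the thresholds per tool.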

💡 Suggestions for Improvement

  1. OWASP Agentic Top 10 Mapping:

    • The blog maps threats to OWASP Agentic Top 10 categories but does not provide direct links to the OWASP documentation. Adding links would improve accessibility and credibility.
  2. Code Examples:

    • The code examples are helpful but could benefit from additional comments explaining key decisions and trade-offs. For example, the provenance tracker could include comments about why certain fields (e.g., sensitivity) are chosen.
  3. Telemetry Recommendations:

    • The OpenTelemetry setup is a good starting point, but consider adding examples of how to integrate with popular observability platforms like Datadog or Prometheus. This would make it easier for users to adopt the recommendations.
  4. Real-World Case Studies:

    • The blog would be even more impactful if it included real-world case studies or examples of MCP-related security incidents. This would help readers understand the urgency of implementing the recommended defenses.
  5. Tool Allowlist YAML Example:

    • The YAML example for tool allowlisting is clear but could include comments explaining the rationale behind each rule. For instance, why certain tools are denied for specific agent roles.
  6. Markdown Formatting:

    • Consider adding a table of contents at the beginning of the blog post for easier navigation, especially given its length.

Final Recommendation

Merge the pull request after addressing the critical security issues and warnings. The blog post is a valuable addition to the repository's documentation and provides actionable insights for securing MCP-based agent deployments.

@imran-siddique
Member

Review: APPROVE (pending rebase)

Diff reviewed — 2 files: COMMUNITY.md entry + new blog post at packages/agent-mesh/docs/blog/mcp-security-firewall.md (404 lines). Closes #848.

Content review: Excellent, technically sound blog covering MCP threat surface — tool poisoning (ASI01), rug-pull attacks (ASI02), cross-server data leakage (ASI03), over-permissioned tools. Includes 6 practical defenses with concrete code examples and YAML configs. Well-structured with real attack scenarios.

Security checklist:

  • No eval(), exec(), pickle, shell=True, innerHTML in code examples
  • Code examples use standard libraries only (hashlib, json, re, httpx, asyncio)
  • Slack webhook URL is a placeholder (https://hooks.slack.com/workflows/T.../A.../...) — not a real secret
  • No hardcoded API keys or tokens
  • Scope matches description — blog post + COMMUNITY.md entry only
  • Additions > 0 (405 additions)

Before merging, please:

  1. Rebase your branch — there are merge conflicts with current main (mergeable_state: dirty)
  2. Fork workflows need approval in the Actions tab (Maintainer approval gate)

Great contribution! 🎉

imran-siddique
imran-siddique previously approved these changes Apr 8, 2026
Member

@imran-siddique imran-siddique left a comment


Content review passed — excellent MCP security blog post. Approving.

Author: please rebase to resolve merge conflicts, then this can merge.

@aymenhmaidiwastaken
Author

Rebased on latest main — merge conflict in COMMUNITY.md resolved. Should be good to merge now!


@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Feedback on Pull Request: MCP Security Blog Post


🔴 CRITICAL: Security Concerns

  1. Tool Description Sanitization Pipeline

    • The sanitization pipeline for tool descriptions is a good start, but regex-based detection alone is insufficient for robust security. Attackers can craft adversarial descriptions that bypass simple regex patterns. Consider integrating a machine learning-based prompt injection classifier trained on adversarial examples to complement the regex checks.
  2. Provenance Tracking Implementation

    • The ProvenanceTracker implementation uses SHA-256 for exact content matching. This approach fails to detect partial data leakage (e.g., substring matches or modified data). Use rolling hashes or MinHash for approximate matching to detect partial overlaps and ensure robust provenance tracking.
  3. Cross-Server Data Leakage

    • The blog mentions provenance tracking but does not address cryptographic integrity verification for cross-server data flows. Without cryptographic signatures, provenance tags can be tampered with. Consider signing provenance tags using Ed25519 or similar cryptographic methods to ensure integrity.
  4. Human-in-the-Loop Approval

    • The human approval mechanism relies on external communication tools like Slack or Teams. If these tools are compromised, attackers could spoof approval responses. Implement cryptographic signatures for approval responses to ensure authenticity.
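For item 4, a stdlib-only sketch of signed approval responses; the shared secret, message shape, and field names are assumptions for illustration (a production system would load the secret from a vault, not a literal):

```python
import hashlib
import hmac
import json
import time

# Hypothetical shared secret, provisioned out of band to the approval bot.
APPROVAL_SECRET = b"replace-with-secret-from-a-vault"

def sign_approval(call_id: str, decision: str, secret: bytes = APPROVAL_SECRET) -> dict:
    """Produce an approval message with an HMAC over its canonical JSON form."""
    payload = {"call_id": call_id, "decision": decision, "ts": int(time.time())}
    body = json.dumps(payload, sort_keys=True).encode()
    payload["sig"] = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return payload

def verify_approval(message: dict, secret: bytes = APPROVAL_SECRET, max_age: int = 300) -> bool:
    """Reject tampered or stale approvals; constant-time compare avoids timing leaks."""
    claimed = message.get("sig", "")
    body = json.dumps({k: v for k, v in message.items() if k != "sig"},
                      sort_keys=True).encode()
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(claimed, expected) and time.time() - message["ts"] < max_age
```

With this shape, a compromised Slack or Teams channel can relay approvals but cannot forge or alter them, and the timestamp bounds replay of old approvals.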

🟡 WARNING: Potential Breaking Changes

  1. Tool Fingerprinting

    • Introducing tool fingerprinting as a runtime check may break existing integrations if servers dynamically update tool definitions. Ensure backward compatibility by allowing a grace period for server updates or providing a migration path for existing deployments.
  2. Argument Boundary Enforcement

    • Enforcing strict thresholds for argument sizes and patterns could lead to false positives in legitimate use cases. Provide clear documentation and configuration options for users to customize thresholds based on their specific needs.

💡 Suggestions for Improvement

  1. Telemetry and Monitoring

    • The OpenTelemetry integration is a strong addition. Enhance it by including error codes or reasons for blocked tool calls in the telemetry data. This will help operators diagnose issues faster.
  2. Trust Domain Isolation

    • The trust domain isolation mechanism is well-designed but could benefit from more granular policies. For example, allow specific tools to cross domains only under certain conditions (e.g., time-based restrictions or user-specific overrides).
  3. Documentation

    • The blog post references the MCP Trust Guide and MCP Security Scanner but does not provide direct links to their GitHub pages or installation instructions. Add these links for easier access.
  4. Code Examples

    • The code snippets are helpful but could be expanded with unit tests or examples of expected input/output. This would make it easier for readers to understand how to implement the solutions.
  5. OWASP Agentic Top 10 Mapping

    • The blog does a great job of mapping threats to OWASP Agentic Top 10 categories. Consider adding a summary table that lists each threat, its corresponding OWASP category, and the recommended defense.
  6. Community Engagement

    • Encourage community contributions by adding a call-to-action for readers to share their own security practices or contribute to the MCP Security Scanner module.

Overall Assessment

This blog post is a comprehensive and well-written piece that addresses critical security concerns in MCP-based agent deployments. It provides actionable recommendations and practical code examples, making it highly valuable for the community. However, there are critical areas that need stronger defenses, especially around cryptographic integrity and adversarial detection.


Recommended Actions

  1. Integrate cryptographic integrity checks for provenance tags and human approvals.
  2. Enhance the sanitization pipeline with machine learning-based classifiers.
  3. Expand documentation with direct links and installation instructions for referenced tools.
  4. Provide a migration path for existing deployments to adapt to new security features.

Let me know if you need further clarification or additional feedback!

@imran-siddique imran-siddique enabled auto-merge (squash) April 8, 2026 23:50
@imran-siddique
Member

@aymenhmaidiwastaken This PR has merge conflicts. The base branch has moved forward. Please rebase onto latest main to resolve conflicts. You can run: git fetch upstream && git rebase upstream/main then force-push. Once rebased, we will merge promptly.

aymenhmaidiwastaken and others added 2 commits April 11, 2026 19:01
…Calls

Addresses microsoft#848. Covers MCP threat surface (tool poisoning, rug-pull attacks,
cross-server data leakage, over-permissioned tools), real attack scenarios,
and practical defenses including tool allowlisting, definition fingerprinting,
argument boundary enforcement, and runtime monitoring.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
auto-merge was automatically disabled April 11, 2026 18:01

Head branch was pushed to by a user without write access

@aymenhmaidiwastaken aymenhmaidiwastaken force-pushed the blog/mcp-security-firewall branch from 45d97f4 to fd75d6e Compare April 11, 2026 18:01
@aymenhmaidiwastaken
Author

Rebased onto latest main — conflict in COMMUNITY.md resolved. Ready to merge!


@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review Feedback for Pull Request: MCP Security Blog Post

This blog post provides a comprehensive overview of the security challenges surrounding MCP (Model Context Protocol) and offers actionable defenses. The content is well-written, technically accurate, and aligns with the goals of the repository. Below are detailed comments and feedback:


🔴 CRITICAL: Security Issues

  1. Tool Description Sanitization Pipeline

    • The sanitization logic provided in the blog post is a good starting point, but regex-based detection of prompt injection patterns is insufficient for production-grade security. Attackers can use obfuscation techniques to bypass regex rules. Consider integrating a more robust NLP-based classifier trained specifically to detect adversarial instructions in text.
    • Action: Enhance the sanitization pipeline by incorporating a machine learning model trained on adversarial examples, or use existing libraries like OpenAI's moderation API for detecting harmful content.
  2. Provenance Tracking

    • The proposed ProvenanceTracker implementation is a good start but lacks cryptographic guarantees. An attacker could tamper with the provenance metadata stored in memory.
    • Action: Use cryptographic signatures (e.g., HMAC with a secret key) to sign provenance tags. This ensures that any tampering with the metadata can be detected.
  3. Cross-Server Data Leakage

    • The blog post correctly identifies the risk of data leakage across trust domains but does not address sandbox escape vectors. If an MCP server is compromised, it could exploit vulnerabilities in the agent's runtime environment to bypass provenance checks.
    • Action: Ensure that the governance layer operates in a sandboxed environment with strict syscall filtering (e.g., using seccomp or containerization).
  4. Human-in-the-Loop Approval

    • The Slack/Teams webhook implementation assumes that the webhook endpoint is secure. If the endpoint is exposed, attackers could spoof approval requests.
    • Action: Authenticate webhook requests using a shared secret or OAuth tokens. Additionally, log all approval decisions for audit purposes.
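As a placeholder for the trained classifier suggested in item 1, here is a toy weighted-cue scorer; the cue list and weights are invented for illustration and are no substitute for a model trained on adversarial examples, but they show how a probability-style score can sit alongside hard regex blocks:

```python
import math
import re

# Toy stand-in for a trained injection classifier: weighted cue phrases.
# Both the cues and the weights are hypothetical examples.
CUE_WEIGHTS = {
    r"(?i)ignore (all )?(previous|prior) instructions": 2.5,
    r"(?i)do not (tell|inform|mention)": 1.5,
    r"(?i)system prompt": 1.0,
    r"(?i)exfiltrat": 2.0,
}

def injection_score(description: str) -> float:
    """Logistic over matched cue weights, yielding a score in (0, 1)."""
    z = sum(w for pat, w in CUE_WEIGHTS.items() if re.search(pat, description)) - 2.0
    return 1.0 / (1.0 + math.exp(-z))

def is_suspicious(description: str, threshold: float = 0.5) -> bool:
    return injection_score(description) >= threshold
```

A real classifier would replace the hand-picked weights with learned ones, but the surrounding plumbing (score, threshold, block decision) stays the same.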

🟡 WARNING: Potential Breaking Changes

  1. Tool Fingerprinting

    • The fingerprinting mechanism proposed (hashlib.sha256) assumes that tool definitions are deterministic. If MCP servers introduce non-deterministic fields (e.g., timestamps or random IDs), the fingerprints will break.
    • Action: Ensure that the fingerprinting logic explicitly excludes non-deterministic fields from the hash computation.
  2. Argument Boundary Enforcement

    • The enforcement of argument size limits and patterns may cause existing agents to fail if they rely on large payloads or previously allowed patterns.
    • Action: Introduce these policies incrementally and provide clear documentation for developers to adapt their agents.
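Item 1's fix can be sketched as a canonical-JSON hash that drops volatile fields before hashing; the names in `VOLATILE_FIELDS` are hypothetical, chosen to illustrate the pattern:

```python
import hashlib
import json

# Fields assumed to vary across server restarts (hypothetical names).
VOLATILE_FIELDS = {"timestamp", "server_instance_id", "request_id"}

def fingerprint_tool(definition: dict) -> str:
    """Hash only the stable parts of a tool definition, in canonical key order."""
    stable = {k: v for k, v in definition.items() if k not in VOLATILE_FIELDS}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Sorting keys and fixing separators matters as much as excluding volatile fields: two semantically identical definitions serialized with different key orders would otherwise produce different fingerprints.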

💡 SUGGESTIONS: Improvements

  1. Telemetry with OpenTelemetry

    • The OpenTelemetry instrumentation is well-designed, but consider adding support for distributed tracing across MCP servers. This would allow operators to trace the flow of data across multiple servers and identify bottlenecks or suspicious patterns.
    • Action: Use OpenTelemetry's SpanContext to propagate trace IDs across tool calls.
  2. Least Privilege for Tools

    • The blog post suggests maintaining an allowlist for tools, but this approach can become cumbersome for large-scale deployments with hundreds of agents and tools.
    • Action: Implement role-based access control (RBAC) for MCP servers, where each agent role is assigned specific permissions dynamically.
  3. Data Provenance Granularity

    • The current provenance tracker operates at the level of entire tool outputs. For better security, consider tracking provenance at a finer granularity (e.g., individual fields within a JSON response).
    • Action: Use structured metadata to tag individual fields with their provenance.
  4. Runtime Monitoring

    • The blog post mentions monitoring unusual tool call sequences but does not specify how to define "unusual." Consider using anomaly detection algorithms (e.g., Isolation Forest or DBSCAN) to identify deviations from normal behavior.
    • Action: Integrate an anomaly detection library into the monitoring pipeline.
  5. OWASP Agentic Top 10 Mapping

    • The blog post maps threats to OWASP Agentic Top 10 categories but does not provide a complete mapping. For example, ASI04 (Over-Permissioned Agents) is mentioned but not explicitly mapped.
    • Action: Expand the mapping to cover all relevant OWASP Agentic Top 10 categories.
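Before reaching for Isolation Forest or DBSCAN, even a stdlib transition-frequency baseline gives "unusual" a concrete definition; this sketch (illustrative, not the toolkit's monitoring API) flags any tool-call transition rarely or never seen in training histories:

```python
from collections import Counter

def train_bigrams(histories: list[list[str]]) -> Counter:
    """Count tool-call transitions (bigrams) across known-good call histories."""
    counts: Counter = Counter()
    for seq in histories:
        counts.update(zip(seq, seq[1:]))
    return counts

def is_unusual(sequence: list[str], counts: Counter, min_count: int = 2) -> bool:
    """Flag a sequence containing any transition seen fewer than min_count times."""
    return any(counts[pair] < min_count for pair in zip(sequence, sequence[1:]))
```

The read-then-exfiltrate pattern (`read_file` followed by `send_email`) stands out precisely because it never occurs in benign histories, which is what heavier anomaly-detection models formalize.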

Overall Assessment

The blog post is a valuable addition to the repository and provides actionable insights into MCP security. However, some critical areas require enhancement to ensure robust defenses against adversarial attacks. Addressing the flagged issues will significantly improve the security posture of MCP-connected agents.


Suggested Next Steps

  1. Address the 🔴 CRITICAL issues immediately, especially around sanitization, provenance tracking, and webhook security.
  2. Plan for incremental rollout of 🟡 WARNING changes to avoid breaking existing deployments.
  3. Implement 💡 SUGGESTIONS to enhance the usability and scalability of the proposed defenses.

Once these changes are made, the blog post will serve as an authoritative resource for MCP security best practices.

@imran-siddique imran-siddique enabled auto-merge (squash) April 11, 2026 19:11
Fix broken relative links in blog post (wrong directory depth),
add code/username terms to cspell dictionary, and exclude
stackoverflow.com from lychee (returns 404 to automated crawlers).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
auto-merge was automatically disabled April 11, 2026 19:21

Head branch was pushed to by a user without write access


@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review Summary

This blog post provides an in-depth analysis of the security challenges associated with the Model Context Protocol (MCP) and outlines practical defenses to mitigate risks. The post is well-written, technically accurate, and aligns with the repository's focus on agent governance and security. It also maps the described threats to the OWASP Agentic Top 10, which is highly relevant for the audience of this repository.

However, there are a few areas that could benefit from clarification, additional detail, or minor adjustments to improve the overall quality and accuracy of the post.


🔴 CRITICAL

  1. Tool Poisoning via Description Injection:

    • The blog post correctly identifies the risk of hidden instructions in tool descriptions but does not address the possibility of adversaries using obfuscated language or indirect references that evade regex-based detection. For example, adversaries could use synonyms or paraphrasing to avoid detection by simple keyword-based checks.
    • Actionable Recommendation: Enhance the sanitization pipeline with an NLP-based classifier trained to detect suspicious patterns and semantic intent in tool descriptions. This could be implemented using pre-trained models fine-tuned for prompt injection detection.
  2. Cross-Server Data Leakage:

    • The proposed ProvenanceTracker implementation relies on exact string matching using SHA-256 hashes, which may fail to detect partial matches or modified data. This could allow attackers to bypass provenance checks by slightly altering the data.
    • Actionable Recommendation: Replace exact-match SHA-256 with content fingerprinting techniques like MinHash or rolling hashes to detect partial matches and approximate similarity.
  3. Over-Permissioned Tools:

    • The blog post highlights the lack of scoping in MCP but does not address the risk of sandbox escape vectors. For example, an attacker could exploit over-permissioned tools to execute arbitrary code or access sensitive files.
    • Actionable Recommendation: Implement runtime sandboxing for tools that interact with the filesystem or execute code. Use containerization (e.g., Docker) or syscall filtering (e.g., seccomp) to enforce strict boundaries.
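A minimal MinHash sketch over word shingles (stdlib only; the shingle size and signature length are arbitrary choices for illustration) shows how approximate matching catches near-duplicates that an exact SHA-256 match misses:

```python
import hashlib

def shingles(text: str, k: int = 4) -> set[str]:
    """Overlapping k-word shingles of the text."""
    words = text.split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def minhash(text: str, num_hashes: int = 64) -> list[int]:
    """Signature: for each seed, the minimum salted hash over all shingles."""
    grams = shingles(text)
    return [
        min(int.from_bytes(hashlib.sha256(f"{seed}:{g}".encode()).digest()[:8], "big")
            for g in grams)
        for seed in range(num_hashes)
    ]

def similarity(a: list[int], b: list[int]) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Lightly edited leaked data still shares most shingles with the original, so its signature overlaps heavily, whereas an exact hash comparison drops to zero on a one-word change.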

🟡 WARNING

  1. Backward Compatibility:
    • The proposed defenses, such as tool allowlisting and fingerprinting, introduce runtime checks that could block previously approved tools if their definitions change. This may break existing deployments without clear migration paths.
    • Actionable Recommendation: Provide a backward compatibility mode or a migration guide for organizations adopting these defenses. For example, allow tools to operate in "legacy mode" with warnings instead of immediate blocking.

💡 SUGGESTIONS

  1. Telemetry Implementation:

    • The OpenTelemetry-based monitoring setup is a strong recommendation, but the example code could be expanded to include error handling and retries for trace exports.
    • Actionable Recommendation: Add retry logic and exception handling to the OTLPSpanExporter setup to ensure resilience in case of network issues.
  2. Human-in-the-Loop Approval:

    • The webhook-based approval mechanism is practical but assumes that human operators are always available within the timeout period. This could lead to operational bottlenecks.
    • Actionable Recommendation: Implement fallback mechanisms for time-sensitive operations, such as escalating approval requests to a secondary channel or using pre-approved policies for specific scenarios.
  3. Tool Description Sanitization:

    • The regex-based sanitization pipeline is a good starting point, but it could benefit from additional checks for overly permissive schemas (e.g., tools that accept arbitrary JSON objects as parameters).
    • Actionable Recommendation: Extend the sanitization pipeline to flag schemas with wildcard types or overly generic parameter definitions.
  4. Blog Formatting:

    • The blog post is lengthy and dense, which may overwhelm readers. Breaking it into sections with collapsible headers or a table of contents could improve readability.
    • Actionable Recommendation: Add a table of contents at the beginning and use collapsible sections for detailed technical content.
  5. Resources Section:

    • The resources section mentions the MCP Trust Guide and MCP Security Scanner but does not provide direct links to these resources.
    • Actionable Recommendation: Include clickable hyperlinks to the mentioned resources for easier navigation.
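Suggestion 3's schema check can be sketched as a small linter over JSON Schema fragments; the heuristics below are illustrative, not exhaustive, and the warning strings are invented for this example:

```python
def schema_warnings(tool_name: str, schema: dict) -> list[str]:
    """Flag parameter schemas loose enough to smuggle arbitrary payloads."""
    warnings = []
    for name, spec in schema.get("properties", {}).items():
        if "type" not in spec:
            warnings.append(f"{tool_name}.{name}: no type constraint")
        elif spec["type"] == "object" and "properties" not in spec:
            warnings.append(f"{tool_name}.{name}: accepts arbitrary JSON objects")
        if spec.get("additionalProperties") is True:
            warnings.append(f"{tool_name}.{name}: additionalProperties allows unchecked keys")
    return warnings
```

Running this over every tool definition at registration time turns "overly permissive schema" from a review-time judgment call into an automated gate.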

Final Thoughts

This blog post is a valuable addition to the repository, offering actionable insights into MCP security. Addressing the flagged critical issues will significantly enhance its accuracy and effectiveness. Additionally, implementing the suggestions will improve usability and readability for the target audience.

@imran-siddique imran-siddique enabled auto-merge (squash) April 11, 2026 21:48
@aymenhmaidiwastaken
Author

@imran-siddique CI fixes pushed in 51f7ca9:

  • Spell check — fixed (added code terms and usernames to .cspell-repo-terms.txt)
  • Blog post links — fixed (relative paths had the wrong directory depth, resolving to packages/packages/... and a missing file)

Remaining failing checks are on the maintainer side:

  • test-policies (required) — waiting for status to be reported
  • Maintainer approval gate — needs fork workflow approval in the Actions tab
  • link-check (not required) — pre-existing broken SO link in COMMUNITY.md (stackoverflow.com/questions/tagged/agent-governance-toolkit returns 404). I added an exclusion to .lychee.toml but the markdown-link-check workflow doesn't seem to pick up the config. Happy to fix differently if you prefer.

Auto-merge is enabled and your approval is in — should be ready to go once the above are handled.



Development

Successfully merging this pull request may close these issues.

📝 Blog Post: MCP Security — Why Your AI Agent's Tool Calls Need a Firewall
