
Fluent MCP: LLM-Embedded Workflows & Effortless Server Deployment

Fluent MCP: Effortlessly build and deploy MCP servers with embedded LLM reasoning, streamlining intelligent workflows for developers and data scientists.


About Fluent MCP

What is Fluent MCP?

Fluent MCP is a modern framework designed to simplify the creation and management of Model Context Protocol (MCP) servers with built-in AI capabilities. It enables developers to build systems where large language models (LLMs) collaborate through a two-tier architecture: an embedded LLM handles complex reasoning internally, while a consuming LLM interacts with the server externally. This separation enables secure tool isolation, efficient token usage, and scalable reasoning workflows.

At its core, Fluent MCP introduces a "reasoning sandwich" pattern: external tools expose simple interfaces, while embedded tools and LLMs handle the intricate logic behind the scenes. This architecture reduces costs, hides complexity behind simple interfaces, and enables self-improving AI systems by offloading heavy computation to specialized models.
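
In code, the sandwich is only a few lines; the following is a condensed sketch of the Quick Start example shown later on this page:

from fluent_mcp.core.tool_registry import register_external_tool
from fluent_mcp.core.llm_client import run_embedded_reasoning

@register_external_tool()  # top layer: the simple interface a consuming LLM sees
async def answer(question: str) -> dict:
    # Middle layer: the embedded LLM reasons with its internal tools.
    result = await run_embedded_reasoning(
        system_prompt="Use your internal tools to research the question.",
        user_prompt=question,
    )
    # Bottom layer: only a clean, structured result leaves the server.
    return {"answer": result["content"]}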

How to Use Fluent MCP

Getting started with Fluent MCP involves three key steps:

  1. Installation: Install via pip install fluent_mcp or clone the repository for development.
  2. Server Setup: Use the CLI or Python API to scaffold a new server structure, defining embedded and external tools.
  3. Integration: Register tools, configure LLM providers, and run the server to expose capabilities via the MCP protocol.

For example, defining an external tool that leverages embedded reasoning requires:

  • Declaring embedded tools (e.g., database search) that remain internal.
  • Creating external tools that call run_embedded_reasoning() with system prompts.
  • Configuring prompts with tool definitions via frontmatter for precise control.

Fluent MCP Features

Key Features of Fluent MCP

Reasoning Offloading

Transfer complex multi-step tasks from consuming LLMs (e.g., Claude) to cheaper, specialized embedded models.

Tool Separation

Isolate internal tools from external access, reducing security risks while maintaining robust functionality.

Embedded Reasoning

Execute tool-driven workflows internally, exposing only clean, structured outputs to external systems.

Prompt Management

Load prompts with tool definitions from files, ensuring consistency and security across workflows.

Use Cases of Fluent MCP

Fluent MCP excels in scenarios where cost efficiency and complexity management are critical:

  • Cost-Optimized AI: Deploy low-cost embedded models for heavy lifting while using premium LLMs for user-facing tasks.
  • Secure Workflows: Keep sensitive operations (e.g., data analysis) internal while exposing safe, simplified interfaces.
  • Self-Improving Systems: Enable LLMs to iteratively refine their own tool use through embedded reasoning.
  • Multi-Model Collaboration: Integrate tools from various providers (e.g., Ollama, OpenAI) into a unified workflow.

Explore the example implementations in the repository's examples directory.

Fluent MCP FAQ

FAQ about Fluent MCP

How do I start developing with Fluent MCP?

Begin by scaffolding a server with fluent-mcp new, then follow the Getting Started guide.

Can I use multiple LLM providers?

Yes. Configure providers like Ollama or OpenAI in server settings, and register tools to work across models.
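
For example, a hypothetical OpenAI configuration would mirror the Ollama config dict from the Quick Start below; the key names come from that example, while the OpenAI-specific values are assumptions:

import os

# Hypothetical OpenAI variant of the Ollama config shown in the Quick Start.
openai_config = {
    "provider": "openai",
    "model": "gpt-4",
    "api_key": os.environ.get("OPENAI_API_KEY", ""),  # read from the environment, never hard-code
}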

How is error handling managed?

Fluent MCP includes robust error capture for LLM responses and tool executions, with logging and retries configurable via the API.
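
The README does not document the specific exception types or retry knobs, so the defensive wrapper below is a minimal sketch using a broad except; substitute fluent_mcp's own error classes where applicable:

import logging
from fluent_mcp.core.llm_client import run_embedded_reasoning

async def safe_research(question: str) -> dict:
    try:
        result = await run_embedded_reasoning(
            system_prompt="You are a careful research assistant.",
            user_prompt=question,
        )
        return {"answer": result["content"]}
    except Exception as exc:  # assumption: replace with fluent_mcp's specific exceptions
        logging.exception("Embedded reasoning failed")
        return {"answer": "", "error": str(exc)}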

Is self-updating supported?

Yes. The architecture allows LLMs to propose and register new tools autonomously, as detailed in Self-Improving Systems docs.
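
A minimal sketch of what runtime registration could look like, assuming the register_embedded_tool decorator from the Quick Start can also be applied as a plain function call:

from fluent_mcp.core.tool_registry import register_embedded_tool

def word_count(text: str) -> int:
    """Count words in a text (imagine this body was proposed by the embedded LLM)."""
    return len(text.split())

# The decorator from the Quick Start, applied programmatically at runtime.
word_count = register_embedded_tool()(word_count)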


Fluent MCP

A modern framework for building Model Context Protocol (MCP) servers with intelligent reasoning capabilities.

License: MIT

Overview

Fluent MCP is a toolkit for scaffolding and managing MCP servers with a focus on AI integration. It provides a structured approach to building servers that can perform embedded reasoning with language models, register and execute tools, and manage prompts and configurations.

The framework is designed to be extensible, allowing LLMs to build and register their own tools, supporting the development of self-improving AI systems.

Core Architecture Pattern

Fluent MCP implements a powerful architectural pattern that fundamentally changes how AI systems interact:

Two-Tier LLM Architecture

  • Embedded LLM: An internal LLM that performs complex reasoning and multi-step tasks
  • Consuming LLM: The external LLM (like Claude) that interacts with your MCP server

Tool Separation

  • Embedded Tools: Internal tools that are ONLY available to the embedded LLM, not exposed externally
  • External Tools: Tools that are exposed to consuming LLMs through the MCP protocol

Reasoning Offloading

  • Complex multi-step reasoning is offloaded from the consuming LLM to the embedded LLM
  • External tools can leverage embedded reasoning internally while presenting a simple interface
  • This creates a "reasoning sandwich" where complex logic happens in the middle layer

Benefits

  • Token Efficiency: Consuming LLMs use fewer tokens by offloading reasoning to embedded LLMs
  • Cost Reduction: Smaller, specialized models can handle specific reasoning tasks at lower cost (a back-of-the-envelope comparison follows below)
  • Complexity Hiding: Complex multi-step processes are hidden behind simple interfaces
  • Separation of Concerns: Clear boundaries between what's exposed and what's internal
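
To make the cost argument concrete, here is a back-of-the-envelope calculation; every price and token count below is an illustrative assumption, not a measured figure:

# Hypothetical per-token prices and token counts, purely for illustration.
premium_rate = 15.00 / 1_000_000   # assumed $/token for a consuming LLM
embedded_rate = 0.50 / 1_000_000   # assumed $/token for a small embedded model

reasoning_tokens = 5_000   # multi-step tool-calling trace, handled internally
summary_tokens = 300       # clean result returned to the consuming LLM

without_offload = (reasoning_tokens + summary_tokens) * premium_rate
with_offload = reasoning_tokens * embedded_rate + summary_tokens * premium_rate
print(f"per request: ${without_offload:.4f} vs ${with_offload:.4f} with offloading")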

[Figure: Fluent MCP Architecture]

Features

  • Reasoning Offloading: Offload complex reasoning from consuming LLMs to embedded LLMs for token and cost efficiency
  • Tool Separation: Clear distinction between embedded tools (internal) and external tools (exposed)
  • Server Scaffolding: Generate new MCP server projects with the proper structure
  • LLM Integration: Seamlessly connect to language models from different providers
  • Tool Registry: Register both embedded tools (used internally) and external tools (exposed to consuming LLMs)
  • Embedded Reasoning: Run reasoning with LLMs and execute their tool calls
  • Prompt Management: Load and manage prompts from files with support for tool definitions in frontmatter
  • Error Handling: Robust error handling for LLM integration and tool execution

Installation

pip install fluent_mcp

For development:

# Clone the repository
git clone https://github.com/yourusername/fluent_mcp.git
cd fluent_mcp

# Install in development mode with extra dependencies
pip install -e ".[dev]"

Quick Start

Creating a New Server

Use the CLI to scaffold a new server:

fluent-mcp new my_server

Or create a server programmatically:

from fluent_mcp import scaffold_server

scaffold_server(
    output_dir=".",
    server_name="my_server",
    description="My MCP server with AI capabilities"
)

Implementing the Core Architecture Pattern

from fluent_mcp.core.tool_registry import register_embedded_tool, register_external_tool
from fluent_mcp.core.llm_client import run_embedded_reasoning

# 1. Define embedded tools (ONLY available to the embedded LLM)
@register_embedded_tool()
def search_database(query: str) -> list:
    """Search the database for information (only used internally)."""
    # Implementation...
    return ["result1", "result2"]

@register_embedded_tool()
def analyze_data(data: list) -> dict:
    """Analyze data and extract insights (only used internally)."""
    # Implementation...
    return {"key_insight": "finding", "confidence": 0.95}

# 2. Define an external tool that leverages embedded reasoning
@register_external_tool()
async def research_question(question: str) -> dict:
    """
    Research a question and provide a comprehensive answer.
    
    This external tool is exposed to consuming LLMs but internally
    uses embedded reasoning with access to embedded tools.
    """
    # Define system prompt for embedded reasoning
    system_prompt = """
    You are a research assistant with access to internal tools:
    - search_database: Search for information
    - analyze_data: Analyze and extract insights
    
    Use these tools to thoroughly research the question.
    """
    
    # Run embedded reasoning (this is where the magic happens)
    result = await run_embedded_reasoning(
        system_prompt=system_prompt,
        user_prompt=f"Research this question: {question}"
    )
    
    # Return a clean, structured response to the consuming LLM
    return {
        "answer": result["content"],
        "confidence": 0.9,
        "sources": ["source1", "source2"]
    }

Running a Server with the Architecture Pattern

from fluent_mcp import create_mcp_server
from my_tools import search_database, analyze_data, research_question

# Create and run MCP server
server = create_mcp_server(
    server_name="my_server",
    # Embedded tools (ONLY available to the embedded LLM)
    embedded_tools=[search_database, analyze_data],
    # External tools (exposed to consuming LLMs)
    external_tools=[research_question],
    config={
        "provider": "ollama",
        "model": "llama2",
        "base_url": "http://localhost:11434",
        "api_key": "ollama"
    }
)

server.run()
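
Before wiring up a consuming client, the external tool can be smoke-tested directly in-process; this sketch assumes the embedded LLM provider above is reachable and that my_tools is the module from the previous example:

import asyncio
from my_tools import research_question

# Call the external tool directly, bypassing the MCP transport,
# to verify that embedded reasoning is wired up end to end.
result = asyncio.run(research_question("What is the reasoning sandwich pattern?"))
print(result["answer"])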

Using Prompts with Tool Definitions

Fluent MCP supports defining which tools are available to a prompt directly in the prompt's frontmatter:

---
name: math_tools
description: A prompt that uses math-related tools
model: gpt-4
temperature: 0.3
tools:
  - add_numbers
  - multiply_numbers
---
You are a math assistant that can perform calculations.
Use the available tools to help solve math problems.

When using this prompt with embedded reasoning, only the specified tools will be available:

from fluent_mcp.core.llm_client import run_embedded_reasoning

# Get a prompt with tool definitions
math_prompt = server.get_prompt("math_tools")

# Run embedded reasoning with only the tools defined in the prompt
result = await run_embedded_reasoning(
    system_prompt=math_prompt["template"],
    user_prompt="What is 5 + 3?",
    prompt=math_prompt  # Pass the prompt to use its tool definitions
)

This approach allows for more precise control over which tools are available to different prompts, improving security and reducing the chance of unintended tool usage.
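
The add_numbers and multiply_numbers tools referenced in the frontmatter above are ordinary embedded tools; a minimal sketch of their registration:

from fluent_mcp.core.tool_registry import register_embedded_tool

@register_embedded_tool()
def add_numbers(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

@register_embedded_tool()
def multiply_numbers(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b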

Documentation

For more detailed documentation, see the docs directory in the repository.

Examples

Check out the examples directory for complete working examples.

Development

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
flake8
black .
isort .

License

MIT
