Pdf2md: Effortless Conversion & Clean Formatting for Writers/Dev - MCP Implementation

Pdf2md converts PDFs to clean, formatted Markdown, preserving code blocks, lists, and tables, so writers and developers can streamline their workflows and save time.

About Pdf2md

What is Pdf2md: Effortless Conversion & Clean Formatting for Writers/Dev?

Pdf2md is a high-performance PDF-to-Markdown conversion tool built on the MCP framework. It uses the MinerU API for structured content extraction and is aimed at writers and developers who need to transform documents. The service supports batch processing of local files and URLs, and produces clean Markdown suitable for further editing or for use with AI clients such as Claude Desktop.

How to use Pdf2md: Effortless Conversion & Clean Formatting for Writers/Dev?

Implementing Pdf2md involves three core steps:

  1. Setup: Clone the repository, create a Python virtual environment, and install dependencies using uv. Configure API credentials in a .env file or directly in the Claude Desktop settings.
  2. Execution: Run the server via uv run pdf2md and use the MCP tools convert_pdf_url or convert_pdf_file to process files.
  3. Integration: Configure Claude Desktop with the server path and environment variables to automate PDF-to-Markdown pipelines.

Pdf2md Features

Key Features of Pdf2md: Effortless Conversion & Clean Formatting for Writers/Dev

  • Smart Format Preservation: Retains headers, lists, and tables in Markdown while eliminating redundant formatting noise.
  • Batch Processing: Convert multiple PDFs in a single batch, from local files or from URL links.
  • OCR Enhancements: Optional optical character recognition improves accuracy for scanned or low-quality PDFs.
  • Developer-Friendly: Built-in MCP compatibility allows direct integration with AI clients like Claude Desktop for automated workflows.

Use Cases of Pdf2md: Effortless Conversion & Clean Formatting for Writers/Dev

Common scenarios include:

  • Academic researchers converting PDF articles into editable Markdown notes
  • Developers preprocessing documentation for natural language processing tasks
  • Content creators migrating legacy PDFs into version-controlled repositories
  • Technical writers standardizing multi-source PDF content into consistent formats

Pdf2md FAQ

FAQ from Pdf2md: Effortless Conversion & Clean Formatting for Writers/Dev

Q: How do I obtain MinerU API access?
Follow the approval process at MinerU's official site. Registered users can apply for testing credentials through the API management portal.

Q: Can I customize output formatting?
While core structure is automated, developers can modify the Python package's template logic for specialized formatting needs.

Q: What OS platforms are supported?
Officially tested on Linux/macOS and Windows. The documentation provides virtual environment setup commands (using uv) for each platform.

Q: How is OCR applied during conversion?
Enable OCR through API parameters for documents containing scanned text or images of text. This incurs additional processing time but improves recognition accuracy.

MCP-PDF2MD

MCP-PDF2MD Service

An MCP-based high-performance PDF to Markdown conversion service powered by MinerU API, supporting batch processing for local files and URL links with structured output.

Key Features

  • Format Conversion: Convert PDF files to structured Markdown format.
  • Multiple Sources: Process local PDF files and URL links.
  • Intelligent Processing: Automatically select the best processing method.
  • Batch Processing: Support for multiple file batch conversion, allowing for efficient processing of large volumes of PDF files.
  • OCR Support: Optional OCR to improve recognition rate.
  • MCP Integration: Seamless integration with LLM clients like Claude Desktop.
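
The "Multiple Sources" and "Batch Processing" features imply that each input has to be routed to the right tool. The sketch below shows one way a client-side script might do that, under stated assumptions: the helper names choose_tool, collect_pdfs, and group_sources are hypothetical and not part of the project, and the rule "http/https means URL, anything else means local file" is an assumption, not the server's documented selection logic.

```python
from pathlib import Path
from urllib.parse import urlparse


def choose_tool(source: str) -> str:
    """Return the MCP tool name that matches a source string (assumed rule)."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "convert_pdf_url"
    # Anything else, including Windows paths such as C:\docs\a.pdf,
    # is treated as a local file.
    return "convert_pdf_file"


def collect_pdfs(directory: str) -> list[str]:
    """List the .pdf files directly inside a directory, sorted by name."""
    return sorted(str(p) for p in Path(directory).glob("*.pdf"))


def group_sources(sources: list[str]) -> dict[str, list[str]]:
    """Group a mixed list of URLs and paths by the tool that would handle them."""
    groups: dict[str, list[str]] = {"convert_pdf_url": [], "convert_pdf_file": []}
    for source in sources:
        groups[choose_tool(source)].append(source)
    return groups


batch = group_sources(["https://example.com/paper.pdf", "/data/report.pdf"])
print(batch)
```

Grouping first keeps URL inputs and local files in separate batches, which matches the two tools the server exposes.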

System Requirements

  • Software: Python 3.10+

Quick Start

  1. Clone the repository and enter the directory:

    git clone https://github.com/FutureUnreal/mcp-pdf2md.git
    cd mcp-pdf2md

  2. Create a virtual environment and install dependencies:

Linux/macOS:

    uv venv
    source .venv/bin/activate
    uv pip install -e .

Windows:

    uv venv
    .venv\Scripts\activate
    uv pip install -e .

  3. Configure environment variables:

Create a .env file in the project root directory and set the following environment variables:

    MINERU_API_BASE=https://mineru.net/api/v4/extract/task
    MINERU_BATCH_API=https://mineru.net/api/v4/extract/task/batch
    MINERU_BATCH_RESULTS_API=https://mineru.net/api/v4/extract-results/batch
    MINERU_API_KEY=Bearer your_api_key_here

  4. Start the service:

    uv run pdf2md
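
Before starting the service it can help to confirm that the .env file contains every variable listed in the configuration step. The following is a minimal sketch using only the standard library. The helper names parse_env_text and missing_keys are hypothetical, the parser handles only simple KEY=VALUE lines, and treating all four variables as mandatory is an assumption: the project may supply defaults for some of them.

```python
# Variables named in the Quick Start; whether each one is strictly
# required by the server is an assumption.
LISTED_KEYS = (
    "MINERU_API_BASE",
    "MINERU_BATCH_API",
    "MINERU_BATCH_RESULTS_API",
    "MINERU_API_KEY",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the listed variables that are absent or empty."""
    return [key for key in LISTED_KEYS if not values.get(key)]


sample = """
MINERU_API_BASE=https://mineru.net/api/v4/extract/task
MINERU_API_KEY=Bearer your_api_key_here
"""
print(missing_keys(parse_env_text(sample)))
```

To check a real file, pass the result of Path(".env").read_text() from pathlib to parse_env_text instead of the sample string.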

Command Line Arguments

The server accepts command line arguments, including:

  • --output-dir : Directory where converted files are written (optional)

Claude Desktop Configuration

Add the following configuration in Claude Desktop:

Windows (replace both paths and the API key with your own values; the --output-dir argument and its path are optional):

{
    "mcpServers": {
        "pdf2md": {
            "command": "uv",
            "args": [
                "--directory",
                "C:\\path\\to\\mcp-pdf2md",
                "run",
                "pdf2md",
                "--output-dir",
                "C:\\path\\to\\output"
            ],
            "env": {
                "MINERU_API_KEY": "Bearer your_api_key_here"
            }
        }
    }
}

Linux/macOS (replace both paths and the API key with your own values; the --output-dir argument and its path are optional):

{
    "mcpServers": {
        "pdf2md": {
            "command": "uv",
            "args": [
                "--directory",
                "/path/to/mcp-pdf2md",
                "run",
                "pdf2md",
                "--output-dir",
                "/path/to/output"
            ],
            "env": {
                "MINERU_API_KEY": "Bearer your_api_key_here"
            }
        }
    }
}

Note about API Key Configuration: You can set the API key in two ways:

  1. In the .env file within the project directory (recommended for development)
  2. In the Claude Desktop configuration as shown above (recommended for regular use)

If you set the API key in both places, the one in the Claude Desktop configuration will take precedence.

MCP Tools

The server provides the following MCP tools:

  • convert_pdf_url : Convert a PDF at a URL to Markdown
  • convert_pdf_file : Convert a local PDF file to Markdown
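
MCP clients invoke tools through a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request for the tools listed above so the message shape is visible. The argument names "url" and "file_path" are assumptions, since this document does not list the tools' parameters, and tool_call_request is a hypothetical helper. In practice a client such as Claude Desktop constructs and sends these messages for you.

```python
import json
from itertools import count

# Simple increasing request ids; any unique id is valid in JSON-RPC 2.0.
_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# "url" is an assumed argument name, not confirmed by the project.
request = tool_call_request(
    "convert_pdf_url",
    {"url": "https://example.com/paper.pdf"},
)
print(json.dumps(request, indent=2))
```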

Getting MinerU API Key

This project relies on the MinerU API for PDF content extraction. To obtain an API key:

  1. Visit the official MinerU website and register for an account
  2. After logging in, apply for API testing qualification
  3. Once your application is approved, you can access the API Management page
  4. Generate your API key following the instructions provided
  5. Copy the generated API key
  6. Use this string as the value for MINERU_API_KEY

Note that access to the MinerU API is currently in a testing phase and requires approval from the MinerU team. Approval may take some time, so plan accordingly.

License

MIT License - see the LICENSE file for details.

Credits

This project is based on the API from MinerU.
