ElevenLabs Scribe MCP Server
A Model Control Protocol (MCP) server implementation for ElevenLabs' Scribe speech-to-text API, providing real-time transcription capabilities with advanced context management and bidirectional streaming.
Features
- Real-time Transcription: Stream audio directly from your microphone and get instant transcriptions
- File-based Transcription: Upload audio files for batch processing
- MCP Protocol Support: Full implementation of the Model Control Protocol for better context management
- WebSocket Support: Real-time bidirectional communication
- Context Management: Maintain conversation context for improved transcription accuracy
- Multiple Audio Formats: Support for various audio formats with automatic conversion
- Language Detection: Automatic language detection and confidence scoring
- Event Detection: Identify speech and non-speech audio events
Installation
- Clone the repository:
git clone https://github.com/aromanstatue/MCP-Elevenlab-Scribe-ASR.git
cd MCP-Elevenlab-Scribe-ASR
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -e .
- Create a .env file with your ElevenLabs API key:
ELEVENLABS_API_KEY=your-api-key-here
Usage
Starting the Server
python -m elevenlabs_scribe_mcp_server.main
The server listens on port 8000 by default. If that port is already in use, it starts on the next available port.
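The "next available port" behavior can be pictured with the minimal sketch below. It is not taken from the server's source: the function name `find_available_port`, the `max_tries` limit, and the use of 127.0.0.1 are assumptions made for illustration.

```python
import socket


def find_available_port(start: int = 8000, host: str = "127.0.0.1", max_tries: int = 50) -> int:
    """Return the first port at or above `start` that can be bound on `host`.

    Hypothetical helper: the real server may select its port differently.
    """
    last = min(start + max_tries, 65536)  # never probe past the valid port range
    for port in range(start, last):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken, try the next one
            return port
    raise RuntimeError(f"no free port found in range {start}-{last - 1}")
```

Calling `find_available_port()` returns 8000 when it is free and a higher number otherwise. Note that the port is only probed, not reserved, so another process could still claim it before the server binds.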
Using the Example Client
- File Transcription:
python examples/client_example.py --file path/to/audio.wav
- Microphone Transcription:
python examples/client_example.py --mic
API Endpoints
- REST API:
  - POST /transcribe: Upload an audio file for transcription
  - GET /health: Health check endpoint
- WebSocket API:
  - ws://localhost:8000/ws/transcribe: Real-time audio transcription
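As a rough illustration of the REST endpoints, the sketch below builds a multipart upload for POST /transcribe and probes GET /health using only the standard library. The form field name `file` and the helper names are assumptions; this README does not specify the request or response schema, so check main.py for the actual contract.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:8000"  # default port mentioned in this README


def build_transcribe_request(audio_path: str, field_name: str = "file") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST /transcribe request.

    ASSUMPTION: the server reads the upload from a form field named "file".
    """
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{BASE_URL}/transcribe",
        data=head + path.read_bytes() + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def check_health(timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{BASE_URL}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection errors and urllib.error.URLError
        return False
```

With the server running, `urllib.request.urlopen(build_transcribe_request("path/to/audio.wav"))` would send the upload. The shape of the response body is not documented here, so inspect it before parsing.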
MCP Protocol
The server implements the Model Control Protocol (MCP) with the following message types:
- INIT: Initialize a new transcription session
- START: Begin audio streaming
- AUDIO: Send audio data
- TRANSCRIPTION: Receive transcription results
- ERROR: Error messages
- STOP: End audio streaming
- DONE: Complete session
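The message types above suggest a simple session lifecycle. The sketch below models them as an enum and checks the order of messages a client sends. Only the type names come from this README: the JSON envelope (`type` and `data` keys), the base64 audio encoding, and the transition table are assumptions, so consult mcp/types.py and mcp/protocol.py for the real definitions.

```python
import base64
import json
from enum import Enum


class MessageType(str, Enum):
    INIT = "INIT"
    START = "START"
    AUDIO = "AUDIO"
    TRANSCRIPTION = "TRANSCRIPTION"
    ERROR = "ERROR"
    STOP = "STOP"
    DONE = "DONE"


# ASSUMPTION: which client message may follow which (None = nothing sent yet).
CLIENT_ORDER = {
    None: {MessageType.INIT},
    MessageType.INIT: {MessageType.START},
    MessageType.START: {MessageType.AUDIO, MessageType.STOP},
    MessageType.AUDIO: {MessageType.AUDIO, MessageType.STOP},
    MessageType.STOP: set(),
}


def encode_message(msg_type: MessageType, payload: dict = None) -> str:
    """Serialize a message using a hypothetical {"type", "data"} JSON envelope."""
    return json.dumps({"type": msg_type.value, "data": payload or {}})


def audio_message(chunk: bytes) -> str:
    """Wrap a raw audio chunk as a base64-encoded AUDIO message (assumed encoding)."""
    return encode_message(MessageType.AUDIO, {"audio": base64.b64encode(chunk).decode("ascii")})


def is_valid_client_sequence(types) -> bool:
    """Check that client messages follow INIT, START, zero or more AUDIO, STOP."""
    previous = None
    for current in types:
        if current not in CLIENT_ORDER.get(previous, set()):
            return False
        previous = current
    return previous is MessageType.STOP
```

In this reading, the client sends INIT, START, AUDIO, and STOP, while TRANSCRIPTION, ERROR, and DONE arrive from the server. That split is inferred from the descriptions and may not match the implementation exactly.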
Development
Running Tests
pytest tests/
Project Structure
elevenlabs-scribe-mcp-server/
├── elevenlabs_scribe_mcp_server/
│   ├── __init__.py
│   ├── main.py                # FastAPI server
│   └── mcp/
│       ├── __init__.py
│       ├── protocol.py        # MCP protocol handler
│       ├── types.py           # Protocol types
│       └── elevenlabs.py      # ElevenLabs implementation
├── examples/
│   └── client_example.py      # Example client
├── tests/
│   └── test_transcribe.py     # Test suite
├── pyproject.toml             # Project metadata
└── README.md
Requirements
- Python 3.8+
- FastAPI
- Uvicorn
- PyAudio (for microphone support)
- aiohttp
- python-dotenv
- pydantic
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
MIT License - see LICENSE file for details.
Acknowledgments
- ElevenLabs for their excellent Scribe API
- FastAPI for the modern web framework
- The Python community for the amazing tools and libraries