A high-performance LLM inference server with OpenAI-compatible API.
```bash
# Install dependencies
pip install -r requirements.txt

# Copy the environment file
cp .env.example .env

# Edit .env with your settings
```

```bash
# Start the server
python -m uvicorn src.main:app --host 0.0.0.0 --port 8000

# Or use the start script
./scripts/start_server.sh
```
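Once the server is running, it can be exercised with any OpenAI-style client. A minimal sketch using only the standard library is shown below; the base URL matches the start command above, while the model name is a placeholder you should replace with one reported by the models endpoint:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # host/port from the uvicorn command above

def build_chat_payload(messages, model="my-model"):
    """Build the JSON body for POST /api/v1/chat/completions.

    The field names follow the OpenAI chat-completions request shape;
    "my-model" is a placeholder, not a model shipped with the server.
    """
    return {"model": model, "messages": messages}

def chat(messages, model="my-model"):
    """Send a chat-completion request and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/chat/completions",
        data=json.dumps(build_chat_payload(messages, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example call (requires a running server):
# chat([{"role": "user", "content": "Hello!"}])
```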
- `GET /health` - Health check
- `POST /api/v1/chat/completions` - Chat completions
- `POST /api/v1/completions` - Text completions
- `POST /api/v1/embeddings` - Create embeddings
- `GET /api/v1/models` - List models

```bash
# Build and run with Docker
cd docker
docker-compose up -d
```
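The embeddings endpoint listed above can be called the same way. The sketch below assumes the OpenAI-style request/response shape (`model`, `input`, and a `data` list of `embedding` vectors); the model name is a placeholder. A small cosine-similarity helper shows a typical use of the returned vectors:

```python
import json
import math
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local deployment

def build_embeddings_payload(texts, model="embedding-model"):
    """Request body for POST /api/v1/embeddings (OpenAI-style shape, assumed)."""
    return {"model": model, "input": texts}

def embed(texts, model="embedding-model"):
    """Send the request and return one embedding vector per input text."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/embeddings",
        data=json.dumps(build_embeddings_payload(texts, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]

def cosine_similarity(a, b):
    """Compare two embedding vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
```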
See `.env.example` for all configuration options.