## 🍋 Lemonade: Local LLMs with GPU and NPU acceleration
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs.
Apps like [n8n](https://n8n.io/integrations/lemonade-model/), [VS Code Copilot](https://marketplace.visualstudio.com/items?itemName=lemonade-sdk.lemonade-sdk), [Morphik](https://www.morphik.ai/docs/local-inference#lemonade), and many more use Lemonade to seamlessly run LLMs on any PC.
## Getting Started
1. **Install**: [Windows](https://lemonade-server.ai/install_options.html#windows) · [Linux](https://lemonade-server.ai/install_options.html#linux) · [Docker](https://lemonade-server.ai/install_options.html#docker) · [Source](https://lemonade-server.ai/install_options.html)
2. **Get Models**: Browse and download with the [Model Manager](#model-library)
3. **Chat**: Try models with the built-in chat interface
4. **Mobile**: Take your lemonade to go: [iOS](https://apps.apple.com/us/app/lemonade-mobile/id6757372210) · Android (soon) · [Source](https://github.com/lemonade-sdk/lemonade-mobile)
5. **Connect**: Use Lemonade with your favorite apps.

Want your app featured here? Reach out on [Discord](https://discord.gg/5xXzkMu8Zk), file a [GitHub issue](https://github.com/lemonade-sdk/lemonade/issues), or email [lemonade@amd.com](mailto:lemonade@amd.com).
## Using the CLI
To run and chat with Gemma 3:
```bash
lemonade-server run Gemma-3-4b-it-GGUF
```
To install models ahead of time, use the `pull` command:
```bash
lemonade-server pull Gemma-3-4b-it-GGUF
```
To list all available models, use the `list` command:
```bash
lemonade-server list
```
> **Tip**: When running GGUF models, you can select a llama.cpp backend with the `--llamacpp` flag, e.g. `lemonade-server run Gemma-3-4b-it-GGUF --llamacpp vulkan` (or `--llamacpp rocm`).
## Model Library
Lemonade supports **GGUF**, **FLM**, and **ONNX** models across CPU, GPU, and NPU (see [supported configurations](#supported-configurations)).
Use `lemonade-server pull` or the built-in **Model Manager** to download models. You can also import custom GGUF/ONNX models from Hugging Face.
**[Browse all built-in models →](https://lemonade-server.ai/docs/server/server_models/)**
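Because Lemonade Server speaks the OpenAI API (see the integration section below), you can also query the installed models programmatically. The following is a minimal sketch assuming the server implements the standard OpenAI-style `/models` listing endpoint; the model ID in the comment is illustrative.

```python
from openai import OpenAI

# Point the standard OpenAI client at a locally running Lemonade Server.
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade",  # required by the client, but unused by the server
)

# List the models the server currently knows about.
# Assumption: Lemonade implements the OpenAI-style GET /models endpoint.
for model in client.models.list():
    print(model.id)  # e.g. "Gemma-3-4b-it-GGUF"
```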
## Image Generation
Lemonade supports image generation using Stable Diffusion models via [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp).
```bash
# Pull an image generation model
lemonade-server pull SD-Turbo
# Start the server
lemonade-server serve
```
Available models: **SD-Turbo** (fast, 4-step), **SDXL-Turbo**, **SD-1.5**, **SDXL-Base-1.0**
> See `examples/api_image_generation.py` for complete examples.
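For a quick sense of the API, here is a minimal sketch of generating an image from Python. It assumes Lemonade exposes an OpenAI-style images endpoint with base64 output; the exact endpoint and parameters may differ, so treat `examples/api_image_generation.py` as the authoritative reference.

```python
import base64
from openai import OpenAI

# Point the standard OpenAI client at a locally running Lemonade Server.
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade",  # required by the client, but unused by the server
)

# Request one image from the SD-Turbo model pulled above.
# Assumption: the server supports an OpenAI-style images/generations endpoint;
# see examples/api_image_generation.py for the supported parameters.
result = client.images.generate(
    model="SD-Turbo",
    prompt="a glass of lemonade on a sunny porch",
    n=1,
    size="512x512",
    response_format="b64_json",  # assumption: server returns base64 image data
)

# Decode and save the image.
with open("lemonade.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```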
## Supported Configurations
Lemonade supports the following configurations and makes it easy to switch between them at runtime.
| Hardware | Engine: OGA | Engine: llamacpp | Engine: FLM | Windows | Linux |
|----------|-------------|------------------|-------------|---------|-------|
| **🧠 CPU** | All platforms | All platforms | — | ✅ | ✅ |
| **🎮 GPU** | — | Vulkan: All platforms<br>ROCm: Selected AMD platforms*<br>Metal: Apple Silicon | — | ✅ | ✅ |
| **🤖 NPU** | AMD Ryzen™ AI 300 series | — | Ryzen™ AI 300 series | ✅ | — |
\* Supported AMD ROCm platforms:

| Architecture | Platform Support | GPU Models |
|--------------|------------------|------------|
| gfx1151 (STX Halo) | Windows, Ubuntu | Ryzen AI MAX+ Pro 395 |
| gfx120X (RDNA4) | Windows, Ubuntu | Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT |
| gfx110X (RDNA3) | Windows, Ubuntu | Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT |
## Project Roadmap
| Under Development | Under Consideration | Recently Completed |
|---------------------------------------------------|------------------------------------------------|------------------------------------------|
| macOS | vLLM support | Image generation (stable-diffusion.cpp) |
| Apps marketplace | Text to speech | General speech-to-text support (whisper.cpp) |
| lemonade-eval CLI | MLX support | ROCm support for Ryzen AI 360-375 (Strix) APUs |
| | ryzenai-server dedicated repo | Lemonade desktop app |
| | Enhanced custom model support | |
## Integrate Lemonade Server with Your Application
You can use any OpenAI-compatible client library by configuring it to use `http://localhost:8000/api/v1` as the base URL. The table below lists official and popular OpenAI clients for different languages; pick whichever fits your stack.
| Python | C++ | Java | C# | Node.js | Go | Ruby | Rust | PHP |
|--------|-----|------|----|---------|----|-------|------|-----|
| [openai-python](https://github.com/openai/openai-python) | [openai-cpp](https://github.com/olrea/openai-cpp) | [openai-java](https://github.com/openai/openai-java) | [openai-dotnet](https://github.com/openai/openai-dotnet) | [openai-node](https://github.com/openai/openai-node) | [go-openai](https://github.com/sashabaranov/go-openai) | [ruby-openai](https://github.com/alexrudall/ruby-openai) | [async-openai](https://github.com/64bit/async-openai) | [openai-php](https://github.com/openai-php/client) |
### Python Client Example
```python
from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade",  # required by the client, but unused by the server
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

# Print the response
print(completion.choices[0].message.content)
```
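Streaming works with the same client setup. This sketch assumes Lemonade honors the standard `stream=True` flag on chat completions, as most OpenAI-compatible servers do.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="lemonade")

# Stream tokens as they are generated instead of waiting for the full reply.
# Assumption: the server implements the standard OpenAI streaming protocol.
stream = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",
    messages=[{"role": "user", "content": "Write a haiku about lemons."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```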
For more detailed integration instructions, see the [Integration Guide](./docs/server/server_integration.md).
## FAQ
To read our frequently asked questions, see our [FAQ Guide](./docs/faq.md).
## Contributing
We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our [contribution guide](./docs/contribute.md).
New contributors can find beginner-friendly issues tagged with "Good First Issue" to get started.
## Maintainers
This is a community project maintained by @amd-pworfolk @bitgamma @danielholanda @jeremyfowers @Geramy @ramkrishna2910 @siavashhub @sofiageo @vgodsoe, and sponsored by AMD. You can reach us by filing an [issue](https://github.com/lemonade-sdk/lemonade/issues), emailing [lemonade@amd.com](mailto:lemonade@amd.com), or joining our [Discord](https://discord.gg/5xXzkMu8Zk).
## License and Attribution
This project is:
- Built with C++ (server) and Python (SDK) with ❤️ for the open source community,
- Standing on the shoulders of great tools from:
- [ggml/llama.cpp](https://github.com/ggml-org/llama.cpp)
- [OnnxRuntime GenAI](https://github.com/microsoft/onnxruntime-genai)
- [Hugging Face Hub](https://github.com/huggingface/huggingface_hub)
- [OpenAI API](https://github.com/openai/openai-python)
- [IRON/MLIR-AIE](https://github.com/Xilinx/mlir-aie)
- and more...
- Accelerated by mentorship from the OCV Catalyst program.
- Licensed under the [Apache 2.0 License](https://github.com/lemonade-sdk/lemonade/blob/main/LICENSE).
- Portions of the project are licensed as described in [NOTICE.md](./NOTICE.md).