## 🍋 Lemonade: Local LLMs with GPU and NPU acceleration



Download | Documentation | Discord

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Apps like [n8n](https://n8n.io/integrations/lemonade-model/), [VS Code Copilot](https://marketplace.visualstudio.com/items?itemName=lemonade-sdk.lemonade-sdk), [Morphik](https://www.morphik.ai/docs/local-inference#lemonade), and many more use Lemonade to seamlessly run LLMs on any PC.

## Getting Started

1. **Install**: [Windows](https://lemonade-server.ai/install_options.html#windows) · [Linux](https://lemonade-server.ai/install_options.html#linux) · [Docker](https://lemonade-server.ai/install_options.html#docker) · [Source](https://lemonade-server.ai/install_options.html)
2. **Get Models**: Browse and download with the [Model Manager](#model-library)
3. **Chat**: Try models with the built-in chat interface
4. **Mobile**: Take your lemonade to go: [iOS](https://apps.apple.com/us/app/lemonade-mobile/id6757372210) · Android (soon) · [Source](https://github.com/lemonade-sdk/lemonade-mobile)
5. **Connect**: Use Lemonade with your favorite apps:

Open WebUI · n8n · Gaia · Infinity Arcade · Continue · GitHub Copilot · OpenHands · Dify · Deep Tutor · Iterate.ai

Want your app featured here? Discord · GitHub Issue · Email

## Using the CLI

To run and chat with Gemma 3:

```
lemonade-server run Gemma-3-4b-it-GGUF
```

To install models ahead of time, use the `pull` command:

```
lemonade-server pull Gemma-3-4b-it-GGUF
```

To see all available models, use the `list` command:

```
lemonade-server list
```

> **Tip**: You can use `--llamacpp vulkan/rocm` to select a backend when running GGUF models.

## Model Library

Lemonade supports **GGUF**, **FLM**, and **ONNX** models across CPU, GPU, and NPU (see [supported configurations](#supported-configurations)). Use `lemonade-server pull` or the built-in **Model Manager** to download models. You can also import custom GGUF/ONNX models from Hugging Face.

**[Browse all built-in models →](https://lemonade-server.ai/docs/server/server_models/)**
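You can also query the installed models programmatically. Here is a minimal sketch using the `openai` Python client, assuming Lemonade Server exposes the standard OpenAI-style `/models` route at its `/api/v1` base URL:

```python
from openai import OpenAI

# Point the OpenAI client at the local Lemonade Server.
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade",  # required by the client library but unused by the server
)

# List the models the server currently offers.
for model in client.models.list():
    print(model.id)
```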
## Image Generation

Lemonade supports image generation using Stable Diffusion models via [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp).

```bash
# Pull an image generation model
lemonade-server pull SD-Turbo

# Start the server
lemonade-server serve
```

Available models: **SD-Turbo** (fast, 4-step), **SDXL-Turbo**, **SD-1.5**, **SDXL-Base-1.0**

> See `examples/api_image_generation.py` for complete examples.
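As a rough illustration of what a client call might look like, here is a sketch using the `openai` Python client. It assumes the server exposes an OpenAI-style `images/generations` route; the authoritative request shape is in `examples/api_image_generation.py`, and the output filename is illustrative:

```python
import base64

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade",  # required by the client library but unused by the server
)

# NOTE: assumes an OpenAI-style images endpoint; see
# examples/api_image_generation.py for the authoritative usage.
result = client.images.generate(
    model="SD-Turbo",
    prompt="a glass of lemonade on a sunny porch, watercolor",
    response_format="b64_json",
)

# Decode the base64-encoded image and save it to disk.
with open("lemonade.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```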
## Supported Configurations

Lemonade supports the following configurations, while also making it easy to switch between them at runtime.

| Hardware | Engine: OGA | Engine: llamacpp | Engine: FLM | Windows | Linux |
|----------|-------------|------------------|-------------|---------|-------|
| **🧠 CPU** | All platforms | All platforms | — | ✅ | ✅ |
| **🎮 GPU** | — | Vulkan: All platforms<br>ROCm: Selected AMD platforms*<br>Metal: Apple Silicon | — | ✅ | ✅ |
| **🤖 NPU** | AMD Ryzen™ AI 300 series | — | Ryzen™ AI 300 series | ✅ | — |
\* Supported AMD ROCm platforms:

| Architecture | Platform Support | GPU Models |
|--------------|------------------|------------|
| gfx1151 (STX Halo) | Windows, Ubuntu | Ryzen AI MAX+ Pro 395 |
| gfx120X (RDNA4) | Windows, Ubuntu | Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT |
| gfx110X (RDNA3) | Windows, Ubuntu | Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT |
## Project Roadmap

| Under Development | Under Consideration | Recently Completed |
|-------------------|---------------------|--------------------|
| macOS | vLLM support | Image generation (stable-diffusion.cpp) |
| Apps marketplace | Text to speech | General speech-to-text support (whisper.cpp) |
| lemonade-eval CLI | MLX support | ROCm support for Ryzen AI 360-375 (Strix) APUs |
| | ryzenai-server dedicated repo | Lemonade desktop app |
| | Enhanced custom model support | |

## Integrate Lemonade Server with Your Application

You can use any OpenAI-compatible client library by configuring it to use `http://localhost:8000/api/v1` as the base URL. The table below lists official and popular OpenAI clients in different languages; pick whichever fits your stack.

| Python | C++ | Java | C# | Node.js | Go | Ruby | Rust | PHP |
|--------|-----|------|----|---------|----|------|------|-----|
| [openai-python](https://github.com/openai/openai-python) | [openai-cpp](https://github.com/olrea/openai-cpp) | [openai-java](https://github.com/openai/openai-java) | [openai-dotnet](https://github.com/openai/openai-dotnet) | [openai-node](https://github.com/openai/openai-node) | [go-openai](https://github.com/sashabaranov/go-openai) | [ruby-openai](https://github.com/alexrudall/ruby-openai) | [async-openai](https://github.com/64bit/async-openai) | [openai-php](https://github.com/openai-php/client) |

### Python Client Example

```python
from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade"  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Print the response
print(completion.choices[0].message.content)
```

For more detailed integration instructions, see the [Integration Guide](./docs/server/server_integration.md).
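For chat UIs it is often nicer to print tokens as they are generated rather than waiting for the full response. Here is a minimal sketch of the same request with streaming enabled, assuming the server supports the standard OpenAI `stream=True` option:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade",  # required but unused
)

# Request a streamed chat completion; tokens arrive as incremental chunks.
stream = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,  # assumes OpenAI-style streaming is supported
)

# Print each token fragment as it arrives.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```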
## FAQ

To read our frequently asked questions, see our [FAQ Guide](./docs/faq.md).

## Contributing

We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our [contribution guide](./docs/contribute.md). New contributors can find beginner-friendly issues tagged with "Good First Issue" to get started.

## Maintainers

This is a community project maintained by @amd-pworfolk @bitgamma @danielholanda @jeremyfowers @Geramy @ramkrishna2910 @siavashhub @sofiageo @vgodsoe, and sponsored by AMD. You can reach us by filing an [issue](https://github.com/lemonade-sdk/lemonade/issues), emailing [lemonade@amd.com](mailto:lemonade@amd.com), or joining our [Discord](https://discord.gg/5xXzkMu8Zk).

## License and Attribution

This project is:

- Built with C++ (server) and Python (SDK) with ❤️ for the open source community,
- Standing on the shoulders of great tools from:
  - [ggml/llama.cpp](https://github.com/ggml-org/llama.cpp)
  - [OnnxRuntime GenAI](https://github.com/microsoft/onnxruntime-genai)
  - [Hugging Face Hub](https://github.com/huggingface/huggingface_hub)
  - [OpenAI API](https://github.com/openai/openai-python)
  - [IRON/MLIR-AIE](https://github.com/Xilinx/mlir-aie)
  - and more...
- Accelerated by mentorship from the OCV Catalyst program,
- Licensed under the [Apache 2.0 License](https://github.com/lemonade-sdk/lemonade/blob/main/LICENSE).

Portions of the project are licensed as described in [NOTICE.md](./NOTICE.md).