The easiest, smallest and fastest local LLM runtime and API server.

bash <(curl -sSfL 'https://code.flows.network/webhook/iwYN1SdN3AmPgR5ao5Gt/run-llm.sh')
Get Started
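
Once the installer has the API server running, any OpenAI-compatible client can talk to it over HTTP. Below is a minimal sketch in Rust using the reqwest and serde_json crates; the port (8080), the /v1/chat/completions path, and the "default" model alias are assumptions that depend on how the server was started, so adjust them to your setup.

```rust
// Minimal sketch: query a locally running LlamaEdge API server.
// Assumes the server listens on http://localhost:8080 and the model is
// registered under the alias "default" -- both depend on your setup.
// Cargo.toml: reqwest = { version = "0.12", features = ["blocking", "json"] }
//             serde_json = "1"

use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // OpenAI-style chat completion request body.
    let body = json!({
        "model": "default",
        "messages": [
            { "role": "system", "content": "You are a helpful assistant." },
            { "role": "user", "content": "What is WasmEdge?" }
        ]
    });

    let resp: serde_json::Value = client
        .post("http://localhost:8080/v1/chat/completions")
        .json(&body)
        .send()?
        .json()?;

    // Print the assistant's reply from the first choice.
    println!("{}", resp["choices"][0]["message"]["content"].as_str().unwrap_or(""));
    Ok(())
}
```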

Powered by WasmEdge and Rust.

Lightweight

The total dependency footprint of LlamaEdge is 30MB, vs 5GB for a typical Python stack.

Very fast

Automagically uses the device’s local hardware and software acceleration.

Cross-platform LLM agents and web services in Rust or JavaScript

Write once, run anywhere, including on GPUs

Create an LLM web service on a MacBook, then deploy it on an NVIDIA device.

Native to the heterogeneous edge

Orchestrate and move an LLM app across CPUs, GPUs and NPUs.
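
That portability comes from compiling the inference app to Wasm and letting WasmEdge's WASI-NN plugin bind to whatever backend the device offers. As a rough sketch, assuming the wasmedge-wasi-nn bindings and a model preloaded by the runtime under the alias "default", such an app looks like this in Rust:

```rust
// Sketch of a portable inference app, compiled to Wasm and run under
// WasmEdge with its WASI-NN GGML plugin. The "default" model alias and
// the output buffer size are assumptions; adapt them to your deployment.

use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding};

fn main() {
    // Load the model preloaded by the runtime; AUTO lets the runtime pick
    // the CPU, GPU, or NPU backend available on this device.
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load the preloaded model");
    let mut ctx = graph
        .init_execution_context()
        .expect("failed to create an execution context");

    // Feed the prompt as a UTF-8 byte tensor and run inference.
    let prompt = "What is the capital of France?";
    ctx.set_input(0, wasmedge_wasi_nn::TensorType::U8, &[1], prompt.as_bytes())
        .expect("failed to set input");
    ctx.compute().expect("inference failed");

    // Read back the generated text.
    let mut out = vec![0u8; 4096];
    let n = ctx.get_output(0, &mut out).expect("failed to read output");
    println!("{}", String::from_utf8_lossy(&out[..n]));
}
```

The same .wasm file runs unchanged on a MacBook, an NVIDIA box, or an ARM edge device; the runtime, not the app, decides which accelerator to use.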

2~4MB: inference app
30MB: total dependency
1000+: Llama2-series models
100%: native speed

FAQs

Learn more about LlamaEdge

Q: Why can’t I just use the OpenAI API?

A: OpenAI is great! However, when it comes to privacy, data security, and cost, self-hosted open-source LLMs are the better choice.
Q: Why can’t I just start an OpenAI-compatible API server myself?

A: You can, and that is exactly what LlamaEdge is. The difference is footprint and portability: the whole stack is about 30MB, it runs at native speed, and the same Wasm app moves unchanged across CPUs, GPUs, and NPUs.
Q: Why can’t I use Python to run the LLM?

A: You can, but a typical Python inference stack pulls in around 5GB of dependencies, while LlamaEdge's total footprint is 30MB. You still get native speed with automatic hardware acceleration.
Q: Why can’t I just use native (C/C++ compiled) inference engines?

A: Native engines are fast, but their binaries are tied to a specific OS and hardware target. A LlamaEdge app is compiled once to Wasm and then runs anywhere WasmEdge runs, across CPUs, GPUs, and NPUs, while still delivering native speed.