Twoody Private LLM

Choose, install and control your open-source models.

Private LLM lets Twoody Server route requests to a model you control: MLX on a Mac, Ollama, llama.cpp, vLLM, TGI, or an explicitly configured cloud provider.

MLX on Mac

Twoody Mac installs the MLX stack, downloads the model weights, and exposes a compatible local server.

Replaceable providers

OpenAI-compatible providers let you switch runtimes without rewriting the product experience.
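To illustrate why compatibility makes runtimes interchangeable, here is a minimal sketch: an OpenAI-style chat request has the same shape for every backend, so switching from Ollama to vLLM means changing only the base URL and model name. The URLs and model names below are illustrative examples, not Twoody's actual configuration.

```python
import json

def build_chat_request(base_url: str, model: str, prompt: str):
    """Return the (url, json_body) pair for an OpenAI-style chat call."""
    url = f"{base_url}/v1/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

# Same payload shape, different runtimes -- only URL and model name differ:
ollama = build_chat_request("http://localhost:11434", "qwen2.5:7b", "Hi")
vllm = build_chat_request("http://localhost:8000", "Qwen/Qwen2.5-7B", "Hi")
```

Because the request builder never changes, the product experience on top of it does not have to change either.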

Model per use case

Fast, coding, reasoning, long documents: the right model depends on the task.

How it works

01

Detect

Twoody detects connected machines and their capabilities.

02

Install

The user starts model download from the app.

03

Select

The model becomes the active provider for the chosen mode.

04

Observe

RAM, latency and tok/s show whether the machine keeps up.

Important details

Machine guide

  • 24-32 GB RAM: trials, solo use, 3B-8B models.
  • 48-64 GB RAM: comfortable use, Qwen 8B/14B, Qwen Coder 14B.
  • 128 GB+: team use, Qwen 32B, headroom for context and documents.
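A rough sketch of where these RAM tiers come from: weight memory is approximately parameter count times bytes per weight at the chosen quantization, plus headroom for KV cache, context, and the OS. The function and figures below are back-of-envelope approximations, not Twoody's sizing logic.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate model weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# An 8B model at 4-bit quantization needs roughly 4 GB for weights alone,
# a 32B model roughly 16 GB -- before any KV cache or document context.
print(weight_memory_gb(8, 4))   # → 4.0
print(weight_memory_gb(32, 4))  # → 16.0
```

This is why the 24-32 GB tier tops out around 8B models, while 32B models and long-document context call for the 128 GB+ tier.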

Benchmark wording

  • Tok/s are rough figures measured through MLX.
  • Speed depends on machine, context, quantization and load.
  • Quality depends on model, prompt, tools and documents.
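For context on what a rough tok/s figure means, here is a minimal sketch of how decode throughput is typically computed: count generated tokens and divide by wall-clock generation time. Real runtimes report this directly; this helper only shows the arithmetic.

```python
def tokens_per_second(token_count: int, started: float, finished: float) -> float:
    """Wall-clock decode throughput. Excludes prompt processing when
    `started` is taken at the first generated token."""
    elapsed = finished - started
    return token_count / elapsed if elapsed > 0 else 0.0

# Example: 256 tokens generated over 8 seconds of wall time.
print(tokens_per_second(256, 0.0, 8.0))  # → 32.0
```

Because the divisor is wall-clock time, anything that slows the machine (context length, quantization, concurrent load) lowers the figure, which is why the numbers above are presented as rough.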

FAQ

Is it only local?

No. Private LLM foregrounds local mode, but Twoody Server can also route to an explicitly configured cloud provider.

Who chooses the model?

The user or an admin, depending on context. The website should show remote install and model selection.