TGI runtime

Text Generation Inference for serving LLMs in a private stack.

TGI (Text Generation Inference) serves models from the Hugging Face ecosystem. Twoody can use it as a provider while keeping apps, documents and permissions in Twoody Server.

See Private LLM

See Twoody Server

TGI as inference server

TGI serves the model. Twoody Server handles routing, documents and the user experience.

HF model

Compatible model.

TGI

Inference serving.

Twoody Server

Provider and context.

Twoody experience

Apps, voice, documents.

What it does

Dedicated serving

TGI is one option for deploying text-generation models.

Controlled infrastructure

It fits a self-hosted setup or a private hosting provider.

Usage layer

Twoody connects TGI to apps, documents, tools and permissions.

How it works

Choose

Choose a compatible model and the infrastructure to run it on.

Deploy

Run TGI as an inference endpoint.

Connect

Register TGI as a provider in Twoody Server.

Govern

Keep permissions, documents and logs on the Twoody side.
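Once TGI is deployed, whatever sits in front of it talks to it over plain HTTP. The sketch below shows what a call to TGI's `/generate` route can look like, using only the Python standard library. The address `http://tgi.internal:8080` and the helper names are assumptions for illustration; how Twoody Server issues this call internally is not described here.

```python
import json
import urllib.request

# Assumption: TGI is reachable at this private address; replace with your own.
TGI_BASE_URL = "http://tgi.internal:8080"


def build_generate_request(prompt: str, max_new_tokens: int = 64,
                           base_url: str = TGI_BASE_URL) -> urllib.request.Request:
    """Build a POST request for TGI's /generate route."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generate_response(body: bytes) -> str:
    """Extract the generated text from a /generate JSON response."""
    return json.loads(body)["generated_text"]


def generate(prompt: str) -> str:
    """Send the prompt to TGI and return the completion (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=30) as resp:
        return parse_generate_response(resp.read())
```

Building the request and parsing the response are kept separate from the network call so each piece can be checked without a running server.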

Technical details

Model IDs

TGI is a good fit when Hugging Face models, versions and artifacts must stay declared on the infrastructure side.
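One way to keep the declaration honest is to compare what the running server reports against what the infrastructure config says. TGI's `/info` route returns a JSON document that includes `model_id` and `model_sha`; the sketch below checks such a payload against declared values. The declared model ID and the function name are hypothetical.

```python
from typing import List, Optional

# Hypothetical value: what the infra team declares in its deployment config.
DECLARED_MODEL_ID = "example-org/example-model"


def check_served_model(info: dict, declared_id: str,
                       declared_sha: Optional[str] = None) -> List[str]:
    """Return the mismatches between a /info payload and the declaration."""
    problems = []
    served_id = info.get("model_id")
    if served_id != declared_id:
        problems.append(f"model_id is {served_id!r}, expected {declared_id!r}")
    # The revision is only checked when the declaration pins one.
    served_sha = info.get("model_sha")
    if declared_sha is not None and served_sha != declared_sha:
        problems.append(f"model_sha is {served_sha!r}, expected {declared_sha!r}")
    return problems
```

An empty list means the endpoint serves what was declared; anything else can be surfaced in a deployment check before the provider is used.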

Private endpoint

Twoody Server routes to TGI as an explicit provider and keeps auth, documents, tools and logs on the product side.
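To make "explicit provider" concrete, here is a minimal sketch of the idea: the caller names the provider, permissions are checked before anything is resolved, and an unknown name fails instead of falling back to a default. Everything in it (the `Provider` type, the registry, the `llm:use` permission string, the address) is hypothetical and does not reflect Twoody Server's actual API.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Provider:
    name: str
    base_url: str


# Hypothetical registry: provider names mapped to private endpoints.
PROVIDERS = {
    "tgi-private": Provider("tgi-private", "http://tgi.internal:8080"),
}


def route(provider_name: str, user_permissions: Set[str],
          required: str = "llm:use") -> str:
    """Check permissions first, then resolve the named provider's base URL."""
    if required not in user_permissions:
        raise PermissionError(f"missing permission: {required!r}")
    try:
        return PROVIDERS[provider_name].base_url
    except KeyError:
        # No silent fallback: an unknown provider is an error.
        raise LookupError(f"unknown provider: {provider_name!r}") from None
```

In this shape the inference server never sees user identities or permissions; it only receives requests that have already passed the product-side checks.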

Throughput

The metrics that matter are concurrency, streaming behavior, latency, tokens per second (tok/s), error rate and infrastructure saturation.
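Several of these metrics can be derived from a simple per-request log. The sketch below assumes a hypothetical `RequestRecord` with start time, end time, output token count and a success flag, and computes latency percentiles, tok/s, error rate and average concurrency. Streaming metrics such as time to first token would need an extra timestamp and are left out.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RequestRecord:
    start_s: float      # request start time, in seconds
    end_s: float        # request end time, in seconds
    output_tokens: int  # tokens generated for this request
    ok: bool            # False if the request failed


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records: List[RequestRecord]) -> dict:
    """Summarize a non-empty batch of request records."""
    latencies = [r.end_s - r.start_s for r in records]
    # Observation window: first request start to last request end.
    window = max(r.end_s for r in records) - min(r.start_s for r in records)
    ok_tokens = sum(r.output_tokens for r in records if r.ok)
    return {
        "p50_latency_s": percentile(latencies, 50),
        "p95_latency_s": percentile(latencies, 95),
        "tokens_per_s": ok_tokens / window if window > 0 else 0.0,
        "error_rate": sum(not r.ok for r in records) / len(records),
        # Time-averaged number of requests in flight over the window.
        "avg_concurrency": sum(latencies) / window if window > 0 else 0.0,
    }
```

Watching p95 latency and tok/s as average concurrency rises is one simple way to see where the infrastructure starts to saturate.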

FAQ

Is TGI cloud-only?

No. It can be deployed on infrastructure you control, according to your operational choices.

Why Twoody on top of TGI?

TGI serves models. Twoody adds the user experience, documents, the tunnel and permissions.

Official sources

Hugging Face TGI docs

huggingface/text-generation-inference (GitHub)

Related pages

vLLM runtime
Local LLM Server
Twoody Private LLM
Runtime guide
Ollama runtime
MLX runtime
llama.cpp runtime