You're viewing documentation for the legacy environment (Kubernetes API), which will be permanently shut down on April 13. For the new multi-zone environment, see our current documentation.

Think Models

IMPORTANT NOTE: Think Models is currently in public beta and may contain bugs, incomplete features, or undergo significant changes based on user feedback before the final release.

Model

The specification of the model to run: typically a Hugging Face handle, such as mistralai/Ministral-8B-Instruct-2410.

Modelinstance

A dedicated instance of a model, created by a user. It is associated with a name and user-specified options (e.g. size, arguments, ...).

Modelinstance size

The set of resources allocated to a specific instance, for example 1-b200-27c-240g. The most important resource is the GPU, which is assigned statically and in isolation: each GPU can be used by at most one model instance.
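As a sketch, a size name like the one above can be read as GPU count, GPU type, vCPU count, and memory. Note this decoding is an assumption inferred from the single example in the text, not a documented naming specification:

```python
# Hypothetical sketch: decoding an instance-size name such as "1-b200-27c-240g".
# The assumed scheme is <gpu_count>-<gpu_type>-<vcpus>c-<memory>g; this is an
# inference from the example above, not an official format.
from dataclasses import dataclass


@dataclass
class InstanceSize:
    gpu_count: int   # number of dedicated GPUs
    gpu_type: str    # GPU model, e.g. "b200"
    vcpus: int       # vCPU count (suffix "c")
    memory_gb: int   # memory in GB (suffix "g")


def parse_size(name: str) -> InstanceSize:
    """Split a size name into its four assumed components."""
    count, gpu, cpus, mem = name.split("-")
    return InstanceSize(
        gpu_count=int(count),
        gpu_type=gpu,
        vcpus=int(cpus.rstrip("c")),
        memory_gb=int(mem.rstrip("g")),
    )
```

Under this reading, `parse_size("1-b200-27c-240g")` yields one b200 GPU, 27 vCPUs, and 240 GB of memory.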

Shared Models

Shared Models are managed by evroc. With Shared Models you are up and running in no time with reliable, performant inference. All Shared Models expose OpenAI API-compatible endpoints.
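Because the endpoints follow the OpenAI API shape, any OpenAI-compatible client can call them. A minimal sketch using only the standard library is shown below; the base URL is a placeholder, not a real evroc endpoint, and you would substitute your own endpoint and API key:

```python
# Hypothetical sketch: calling a Shared Model via an OpenAI-compatible
# /chat/completions endpoint. BASE_URL is a placeholder, not a real endpoint.
import json
import urllib.request

BASE_URL = "https://inference.example.invalid/v1"  # placeholder endpoint


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str, api_key: str) -> str:
    """POST the payload and return the assistant's reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI response shape: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

The same payload works with the official `openai` Python client by pointing its `base_url` at the endpoint instead of api.openai.com.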