# Kubernetes API

## Packages

- kubeai.org/v1

## kubeai.org/v1

Package v1 contains API Schema definitions for the kubeai v1 API group.

### Resource Types

- Model
#### Adapter

Appears in:
- ModelSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name must be a lowercase string with no spaces. | | MaxLength: 63 <br /> Pattern: `^[a-z0-9-]+$` <br /> Required: {} |
| `url` _string_ | | | |
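For illustration, here is a sketch of how an Adapter entry might appear on a Model (the model name, adapter name, and URLs are hypothetical):

```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b                           # hypothetical model name
spec:
  engine: VLLM
  features: [TextGeneration]
  url: hf://meta-llama/Llama-3.1-8B-Instruct   # illustrative URL
  adapters:
    - name: my-lora                            # lowercase, matches ^[a-z0-9-]+$, max 63 chars
      url: hf://example-org/my-lora-adapter    # illustrative URL
```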
#### File

File represents a file to be mounted in the model pod.

Appears in:
- ModelSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `path` _string_ | Path where the file should be mounted in the pod. Must be an absolute path. | | MaxLength: 1024 <br /> Required: {} |
| `content` _string_ | Content of the file to be mounted. Will be injected into a ConfigMap and mounted in the model Pods. | | MaxLength: 100000 <br /> Required: {} |
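A sketch of a File entry in a Model spec (the path and content are hypothetical):

```yaml
spec:
  files:
    - path: /config/prompt-template.txt   # must be absolute, <= 1024 chars
      content: |                          # injected via a ConfigMap, <= 100000 chars
        You are a helpful assistant.
```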
#### LoadBalancing

Appears in:
- ModelSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `strategy` _LoadBalancingStrategy_ | | LeastLoad | Enum: [LeastLoad PrefixHash] <br /> Optional: {} |
| `prefixHash` _PrefixHash_ | | { } | Optional: {} |
#### LoadBalancingStrategy

Underlying type: string

Validation:
- Enum: [LeastLoad PrefixHash]

Appears in:
- LoadBalancing

| Field | Description |
| --- | --- |
| LeastLoad | |
| PrefixHash | |
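As a sketch, selecting a load-balancing strategy on a Model looks like this (the default is noted in the comment):

```yaml
spec:
  loadBalancing:
    strategy: PrefixHash   # defaults to LeastLoad when omitted
```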
#### Model

Model resources define the ML models that will be served by KubeAI.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `kubeai.org/v1` | | |
| `kind` _string_ | `Model` | | |
| `metadata` _ObjectMeta_ | Refer to the Kubernetes API documentation for the fields of `metadata`. | | |
| `spec` _ModelSpec_ | | | |
| `status` _ModelStatus_ | | | |
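Putting the required fields together, a minimal Model manifest might look like the following sketch (the model name, URL, and resource profile name are illustrative and must match what exists in your system config):

```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: faster-whisper-medium                 # hypothetical name
spec:
  engine: FasterWhisper
  features: [SpeechToText]
  url: hf://Systran/faster-whisper-medium     # illustrative URL
  resourceProfile: cpu:1                      # must be defined in the system config
```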
#### ModelFeature

Underlying type: string

Validation:
- Enum: [TextGeneration TextEmbedding SpeechToText]

Appears in:
- ModelSpec
#### ModelSpec

ModelSpec defines the desired state of Model.

Appears in:
- Model

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `url` _string_ | URL of the model to be served. Currently the following formats are supported: <br /> For VLLM, FasterWhisper, Infinity engines: `hf://<repo>/<model>`, `pvc://<pvcName>`, `pvc://<pvcName>/<pvcSubpath>`, `gs://<bucket>/<path>`, `oss://<bucket>/<path>`, `s3://<bucket>/<path>` <br /> For OLlama engine: `ollama://<model>` | | Required: {} |
| `adapters` _Adapter array_ | | | |
| `features` _ModelFeature array_ | Features that the model supports. Dictates the APIs that are available for the model. | | Enum: [TextGeneration TextEmbedding SpeechToText] |
| `engine` _string_ | Engine to be used for the server process. | | Enum: [OLlama VLLM FasterWhisper Infinity] <br /> Required: {} |
| `resourceProfile` _string_ | ResourceProfile required to serve the model. Use the format `<resource-profile-name>:<count>`. Example: `nvidia-gpu-l4:2` - 2x NVIDIA L4 GPUs. Must be a valid ResourceProfile defined in the system config. | | |
| `cacheProfile` _string_ | CacheProfile to be used for caching model artifacts. Must be a valid CacheProfile defined in the system config. | | |
| `image` _string_ | Image to be used for the server process. Will be set from ResourceProfile + Engine if not specified. | | |
| `args` _string array_ | Args to be added to the server process. | | |
| `env` _object (keys: string, values: string)_ | Env variables to be added to the server process. | | |
| `envFrom` _EnvFromSource array_ | Env variables to be added to the server process from Secret or ConfigMap. | | |
| `replicas` _integer_ | Replicas is the number of Pod replicas that should be actively serving the model. KubeAI will manage this field unless AutoscalingDisabled is set to true. | | |
| `minReplicas` _integer_ | MinReplicas is the minimum number of Pod replicas that the model can scale down to. Note: 0 is a valid value. | | Minimum: 0 <br /> Optional: {} |
| `maxReplicas` _integer_ | MaxReplicas is the maximum number of Pod replicas that the model can scale up to. An empty value means no limit. | | Minimum: 1 |
| `autoscalingDisabled` _boolean_ | AutoscalingDisabled will stop the controller from managing the replicas for the Model. When disabled, metrics will not be collected on server Pods. | | |
| `targetRequests` _integer_ | TargetRequests is the average number of active requests that the autoscaler will try to maintain on model server Pods. | 100 | Minimum: 1 |
| `scaleDownDelaySeconds` _integer_ | ScaleDownDelay is the minimum time before a deployment is scaled down after the autoscaling algorithm determines that it should be scaled down. | 30 | |
| `owner` _string_ | Owner of the model. Used solely to populate the owner field in the OpenAI /v1/models endpoint. DEPRECATED. | | Optional: {} |
| `loadBalancing` _LoadBalancing_ | LoadBalancing configuration for the model. If not specified, a default is used based on the engine and request. | { } | |
| `files` _File array_ | Files to be mounted in the model Pods. | | MaxItems: 10 |
| `priorityClassName` _string_ | PriorityClassName sets the priority class for all pods created for this model. If specified, the PriorityClass must exist before the model is created. This is useful for implementing priority and preemption for models. | | Optional: {} |
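The autoscaling-related fields above combine as in this sketch (the URL, profile name, and values are illustrative):

```yaml
spec:
  engine: VLLM
  features: [TextGeneration]
  url: hf://meta-llama/Llama-3.1-8B-Instruct   # illustrative URL
  resourceProfile: nvidia-gpu-l4:2             # 2x NVIDIA L4 GPUs
  minReplicas: 0              # 0 allows scale-to-zero
  maxReplicas: 4
  targetRequests: 100         # average active requests per Pod the autoscaler maintains
  scaleDownDelaySeconds: 30
```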
#### ModelStatus

ModelStatus defines the observed state of Model.

Appears in:
- Model

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _ModelStatusReplicas_ | | | |
| `cache` _ModelStatusCache_ | | | |
#### ModelStatusCache

Appears in:
- ModelStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `loaded` _boolean_ | | | |
#### ModelStatusReplicas

Appears in:
- ModelStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `all` _integer_ | | | |
| `ready` _integer_ | | | |
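For reference, the observed status on a served Model might look like this sketch (values are illustrative; the controller sets these fields, and the field semantics in the comments are an assumption):

```yaml
status:
  replicas:
    all: 2       # total Pod replicas
    ready: 2     # replicas ready to serve
  cache:
    loaded: true
```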
#### PrefixHash

Appears in:
- LoadBalancing

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `meanLoadFactor` _integer_ | MeanLoadFactor is the maximum percentage of the mean load (across all endpoints in the hash ring) that any single endpoint's load is allowed to reach. Defaults to 125%, a widely accepted value for the Consistent Hashing with Bounded Loads algorithm. | 125 | Minimum: 100 <br /> Optional: {} |
| `replication` _integer_ | Replication is the number of replicas of each endpoint on the hash ring. Higher values result in a more even distribution of load but decrease lookup performance. | 256 | Optional: {} |
| `prefixCharLength` _integer_ | PrefixCharLength is the number of characters to count when building the prefix to hash. | 100 | Optional: {} |
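The PrefixHash parameters can be tuned together, as in this sketch (the values shown are simply the documented defaults):

```yaml
spec:
  loadBalancing:
    strategy: PrefixHash
    prefixHash:
      meanLoadFactor: 125    # endpoint load capped at 125% of the mean
      replication: 256       # virtual nodes per endpoint on the hash ring
      prefixCharLength: 100  # number of request-prefix characters hashed
```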