# Kubernetes API

## Packages

- kubeai.org/v1

## kubeai.org/v1

Package v1 contains API Schema definitions for the kubeai v1 API group.

### Resource Types

- Model
#### Adapter

Appears in:
- ModelSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name must be a lowercase string with no spaces. | | MaxLength: 63 <br /> Pattern: `^[a-z0-9-]+$` <br /> Required: {} |
| `url` _string_ | | | |
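For illustration, here is a sketch of how an Adapter entry might appear on a Model (the model name, adapter name, and URLs are hypothetical):

```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b                           # hypothetical model name
spec:
  engine: VLLM
  features: [TextGeneration]
  url: hf://meta-llama/Llama-3.1-8B-Instruct   # illustrative URL
  adapters:
    - name: my-lora                            # lowercase, matches ^[a-z0-9-]+$, max 63 chars
      url: hf://example-org/my-lora-adapter    # illustrative URL
```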
#### File

File represents a file to be mounted in the model pod.

Appears in:
- ModelSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `path` _string_ | Path where the file should be mounted in the pod. Must be an absolute path. | | MaxLength: 1024 <br /> Required: {} |
| `content` _string_ | Content of the file to be mounted. Will be injected into a ConfigMap and mounted in the model Pods. | | MaxLength: 100000 <br /> Required: {} |
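A sketch of a File entry in a Model spec (the path and content are hypothetical):

```yaml
spec:
  files:
    - path: /config/prompt-template.txt   # must be absolute, <= 1024 chars
      content: |                          # injected via a ConfigMap, <= 100000 chars
        You are a helpful assistant.
```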
#### LoadBalancing

Appears in:
- ModelSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `strategy` _LoadBalancingStrategy_ | | LeastLoad | Enum: [LeastLoad PrefixHash] <br /> Optional: {} |
| `prefixHash` _PrefixHash_ | | { } | Optional: {} |
#### LoadBalancingStrategy

Underlying type: string

Validation:
- Enum: [LeastLoad PrefixHash]

Appears in:
- LoadBalancing

| Field | Description |
| --- | --- |
| LeastLoad | |
| PrefixHash | |
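As a sketch, selecting a load-balancing strategy on a Model looks like this (the default is noted in the comment):

```yaml
spec:
  loadBalancing:
    strategy: PrefixHash   # defaults to LeastLoad when omitted
```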
#### Model

Model resources define the ML models that will be served by KubeAI.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `kubeai.org/v1` | | |
| `kind` _string_ | `Model` | | |
| `metadata` _ObjectMeta_ | Refer to the Kubernetes API documentation for the fields of `metadata`. | | |
| `spec` _ModelSpec_ | | | |
| `status` _ModelStatus_ | | | |
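Putting the required fields together, a minimal Model manifest might look like the following sketch (the model name, URL, and resource profile name are illustrative and must match what exists in your system config):

```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: faster-whisper-medium                 # hypothetical name
spec:
  engine: FasterWhisper
  features: [SpeechToText]
  url: hf://Systran/faster-whisper-medium     # illustrative URL
  resourceProfile: cpu:1                      # must be defined in the system config
```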
#### ModelFeature

Underlying type: string

Validation:
- Enum: [TextGeneration TextEmbedding SpeechToText]

Appears in:
- ModelSpec
#### ModelSpec

ModelSpec defines the desired state of Model.

Appears in:
- Model

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `url` _string_ | URL of the model to be served. Currently the following formats are supported: <br /> For VLLM, FasterWhisper, Infinity engines: `hf://<repo>/<model>`, `pvc://<pvcName>`, `pvc://<pvcName>/<pvcSubpath>`, `gs://<bucket>/<path>`, `oss://<bucket>/<path>`, `s3://<bucket>/<path>` <br /> For OLlama engine: `ollama://<model>` | | Required: {} |
| `adapters` _Adapter array_ | | | |
| `features` _ModelFeature array_ | Features that the model supports. Dictates the APIs that are available for the model. | | Enum: [TextGeneration TextEmbedding SpeechToText] |
| `engine` _string_ | Engine to be used for the server process. | | Enum: [OLlama VLLM FasterWhisper Infinity] <br /> Required: {} |
| `resourceProfile` _string_ | ResourceProfile required to serve the model. Use the format `<resource-profile-name>:<count>`. Example: `nvidia-gpu-l4:2` - 2x NVIDIA L4 GPUs. Must be a valid ResourceProfile defined in the system config. | | |
| `cacheProfile` _string_ | CacheProfile to be used for caching model artifacts. Must be a valid CacheProfile defined in the system config. | | |
| `image` _string_ | Image to be used for the server process. Will be set from ResourceProfile + Engine if not specified. | | |
| `args` _string array_ | Args to be added to the server process. | | |
| `env` _object (keys: string, values: string)_ | Env variables to be added to the server process. | | |
| `envFrom` _EnvFromSource array_ | Env variables to be added to the server process from Secret or ConfigMap. | | |
| `replicas` _integer_ | Replicas is the number of Pod replicas that should be actively serving the model. KubeAI will manage this field unless AutoscalingDisabled is set to true. | | |
| `minReplicas` _integer_ | MinReplicas is the minimum number of Pod replicas that the model can scale down to. Note: 0 is a valid value. | | Minimum: 0 <br /> Optional: {} |
| `maxReplicas` _integer_ | MaxReplicas is the maximum number of Pod replicas that the model can scale up to. An empty value means no limit. | | Minimum: 1 |
| `autoscalingDisabled` _boolean_ | AutoscalingDisabled will stop the controller from managing the replicas for the Model. When disabled, metrics will not be collected on server Pods. | | |
| `targetRequests` _integer_ | TargetRequests is the average number of active requests that the autoscaler will try to maintain on model server Pods. | 100 | Minimum: 1 |
| `scaleDownDelaySeconds` _integer_ | ScaleDownDelay is the minimum time before a deployment is scaled down after the autoscaling algorithm determines that it should be scaled down. | 30 | |
| `owner` _string_ | Owner of the model. Used solely to populate the owner field in the OpenAI /v1/models endpoint. DEPRECATED. | | Optional: {} |
| `loadBalancing` _LoadBalancing_ | LoadBalancing configuration for the model. If not specified, a default is used based on the engine and request. | { } | |
| `files` _File array_ | Files to be mounted in the model Pods. | | MaxItems: 10 |
| `priorityClassName` _string_ | PriorityClassName sets the priority class for all pods created for this model. If specified, the PriorityClass must exist before the model is created. This is useful for implementing priority and preemption for models. | | Optional: {} |
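The autoscaling-related fields above combine as in this sketch (the URL, profile name, and values are illustrative):

```yaml
spec:
  engine: VLLM
  features: [TextGeneration]
  url: hf://meta-llama/Llama-3.1-8B-Instruct   # illustrative URL
  resourceProfile: nvidia-gpu-l4:2             # 2x NVIDIA L4 GPUs
  minReplicas: 0              # 0 allows scale-to-zero
  maxReplicas: 4
  targetRequests: 100         # average active requests per Pod the autoscaler maintains
  scaleDownDelaySeconds: 30
```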
#### ModelStatus

ModelStatus defines the observed state of Model.

Appears in:
- Model

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _ModelStatusReplicas_ | | | |
| `cache` _ModelStatusCache_ | | | |
#### ModelStatusCache

Appears in:
- ModelStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `loaded` _boolean_ | | | |
#### ModelStatusReplicas

Appears in:
- ModelStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `all` _integer_ | | | |
| `ready` _integer_ | | | |
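For reference, the observed status on a served Model might look like this sketch (values are illustrative; the controller sets these fields, and the field semantics in the comments are an assumption):

```yaml
status:
  replicas:
    all: 2       # total Pod replicas
    ready: 2     # replicas ready to serve
  cache:
    loaded: true
```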
#### PrefixHash

Appears in:
- LoadBalancing

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `meanLoadFactor` _integer_ | MeanLoadFactor is the maximum percentage of the mean load (across all endpoints in the hash ring) that any single endpoint's load is allowed to reach. Defaults to 125%, a widely accepted value for the Consistent Hashing with Bounded Loads algorithm. | 125 | Minimum: 100 <br /> Optional: {} |
| `replication` _integer_ | Replication is the number of replicas of each endpoint on the hash ring. Higher values result in a more even distribution of load but decrease lookup performance. | 256 | Optional: {} |
| `prefixCharLength` _integer_ | PrefixCharLength is the number of characters to count when building the prefix to hash. | 100 | Optional: {} |
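The PrefixHash parameters can be tuned together, as in this sketch (the values shown are simply the documented defaults):

```yaml
spec:
  loadBalancing:
    strategy: PrefixHash
    prefixHash:
      meanLoadFactor: 125    # endpoint load capped at 125% of the mean
      replication: 256       # virtual nodes per endpoint on the hash ring
      prefixCharLength: 100  # number of request-prefix characters hashed
```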