Configure reranking models

KubeAI supports reranking models through two engines: Infinity and vLLM. vLLM is recommended when running on GPUs.

Install the BAAI/bge-reranker-base model using Infinity

Create a file named kubeai-rerank-models.yaml with the following content:

catalog:
  bge-rerank-base-cpu:
    enabled: true
    features: ["Reranking"]
    owner: baai
    url: "hf://BAAI/bge-reranker-base"
    engine: Infinity
    # engine: VLLM  # for GPU, swap in this engine together with the GPU resourceProfile below
    resourceProfile: cpu:1
    # resourceProfile: nvidia-gpu-l4:1
    minReplicas: 1
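If you prefer to generate the values file from a script, for example to switch between the CPU and GPU variants, the sketch below builds the same catalog entry. `rerank_catalog` and `write_values` are hypothetical helpers, and the `bge-rerank-base-gpu` entry name is an assumption for illustration only. The file is written as JSON, which Helm accepts as YAML.

```python
import json
from pathlib import Path


def rerank_catalog(gpu: bool = False) -> dict:
    """Build KubeAI catalog values for bge-reranker-base (hypothetical helper)."""
    entry = {
        "enabled": True,
        "features": ["Reranking"],
        "owner": "baai",
        "url": "hf://BAAI/bge-reranker-base",
        # Engine and resource profile are swapped together, mirroring the
        # commented-out lines in the YAML above.
        "engine": "VLLM" if gpu else "Infinity",
        "resourceProfile": "nvidia-gpu-l4:1" if gpu else "cpu:1",
        "minReplicas": 1,
    }
    # Assumed name for the GPU entry; the CPU name matches the YAML above.
    name = "bge-rerank-base-gpu" if gpu else "bge-rerank-base-cpu"
    return {"catalog": {name: entry}}


def write_values(path: str, gpu: bool = False) -> None:
    # JSON is valid YAML, so Helm can read this file with -f.
    Path(path).write_text(json.dumps(rerank_catalog(gpu), indent=2))
```

Calling `write_values("./kubeai-rerank-models.yaml")` produces a file equivalent to the CPU configuration above.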

Install the model catalog with the kubeai/models Helm chart:

helm install kubeai-models kubeai/models -f ./kubeai-rerank-models.yaml
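To run the same step from a script, the sketch below wraps the `helm` CLI with `subprocess`. It assumes `helm` is on your PATH and that the `kubeai` chart repository has already been added. `helm_install_cmd` and `run_helm` are hypothetical helper names. The optional `upgrade=True` form uses `helm upgrade --install`, which also succeeds when the release already exists. A plain `helm install` fails in that case.

```python
import shutil
import subprocess
from typing import List


def helm_install_cmd(values_file: str, release: str = "kubeai-models",
                     chart: str = "kubeai/models", upgrade: bool = False) -> List[str]:
    """Build the helm argv (hypothetical helper)."""
    # "upgrade --install" creates the release if missing, otherwise upgrades it.
    verb = ["upgrade", "--install"] if upgrade else ["install"]
    return ["helm", *verb, release, chart, "-f", values_file]


def run_helm(values_file: str, upgrade: bool = False) -> None:
    # Fail early with a clear message if the helm CLI is not installed.
    if shutil.which("helm") is None:
        raise RuntimeError("helm CLI not found on PATH")
    # check=True raises CalledProcessError if helm exits non-zero.
    subprocess.run(helm_install_cmd(values_file, upgrade=upgrade), check=True)
```

For example, `run_helm("./kubeai-rerank-models.yaml", upgrade=True)` applies the values file whether or not the release exists yet.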

Once the model's pod is ready, you can call the rerank endpoint. The request's "model" field must match the catalog entry name:

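import requests

# Assumes the KubeAI API is reachable on localhost:8000 (for example via kubectl port-forward).
resp = requests.post(
    "http://localhost:8000/vllm/v1/rerank",
    json={
        "model": "bge-rerank-base-cpu",  # catalog entry name from the values file
        "query": "Which document talks about apples?",
        "documents": ["An apple a day keeps the doctor away", "Oranges are tasty"],
    },
    timeout=60,  # avoid hanging forever if the model is still starting
)
resp.raise_for_status()  # raise on HTTP errors instead of printing an error body
print(resp.json())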
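To turn the raw response into a ranked list, you can sort by score. The sketch below assumes the response follows the Cohere/Jina-style rerank schema used by vLLM and Infinity. That schema has a `results` list whose items carry the original document `index` and a `relevance_score`. Check this against the JSON printed above before relying on it. `rank_documents` is a hypothetical helper.

```python
from __future__ import annotations


def rank_documents(response: dict, documents: list[str]) -> list[tuple[str, float]]:
    """Return (document, score) pairs, most relevant first (hypothetical helper).

    Assumes response["results"] items contain "index" and "relevance_score".
    """
    results = response.get("results", [])
    # Sort explicitly rather than trusting the server's ordering.
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    # Map each result back to the text that was sent in the request.
    return [(documents[r["index"]], r["relevance_score"]) for r in ordered]
```

Pass the same `documents` list you sent in the request, e.g. `rank_documents(resp.json(), documents)`. With the query above, the apple sentence should come first.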