LLM context size support in Ollama Helm-chart

TL;DR

The Ollama Helm chart (version 1.5.0+) now supports creating custom LLM configurations via Modelfiles during deployment. Here is an example of a Llama3.1 deployment with a 16384-token context size:

cat <<EOF | helm upgrade ollama ollama-helm/ollama --install --create-namespace -n ollama --version 1.5.0 -f -
persistentVolume:
  enabled: true
  size: 50Gi
ollama:
  gpu:
    enabled: true
  models:
    pull:
      - llama3.1:8b
    create:
      - name: llama3.1-ctx16384
        template: |
          FROM llama3.1:8b
          PARAMETER num_ctx 16384
    run:
      - llama3.1-ctx16384
extraEnv:
  - name: OLLAMA_KEEP_ALIVE
    value: 24h
EOF

Long read

The Ollama Helm-chart provides a convenient way to deploy self-hosted Large Language Models (LLMs) on Kubernetes. You can deploy models from the ollama model library using the pull and run configurations:

cat <<EOF | helm upgrade ollama ollama-helm/ollama --install --create-namespace -n ollama --version 1.5.0 -f -
persistentVolume:
  enabled: true
  size: 50Gi
ollama:
  gpu:
    enabled: true
  models:
    pull:
      - llama3.1:8b
    run:
      - llama3.1:8b
extraEnv:
  - name: OLLAMA_KEEP_ALIVE
    value: 24h
EOF
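Once the release is up, you can reach the Ollama API from your workstation by port-forwarding the chart's service. This is a minimal sketch; the service name ollama and port 11434 are assumed to be the chart defaults:

# Forward the in-cluster Ollama service to localhost (service name/port assumed)
kubectl -n ollama port-forward svc/ollama 11434:11434 &

# List the models that the chart pulled
curl http://localhost:11434/api/tags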

Ollama also supports pulling models directly from Hugging Face:

cat <<EOF | helm upgrade ollama ollama-helm/ollama --install --create-namespace -n ollama --version 1.5.0 -f -
persistentVolume:
  enabled: true
  size: 50Gi
ollama:
  gpu:
    enabled: true
  models:
    pull:
      - hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
    run:
      - hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
extraEnv:
  - name: OLLAMA_KEEP_ALIVE
    value: 24h
EOF

In practice, we usually need to change the parameters of deployed models, but the Ollama Helm-chart did not support configuring models at the deployment stage. For example, to change the context size, which is 2048 tokens by default, you previously had to:

  1. Use the Ollama REST API:

     curl http://localhost:11434/api/generate -d '{
       "model": "llama3.2",
       "prompt": "Why is the sky blue?",
       "options": {
         "num_ctx": 4096
       }
     }'

  2. Utilize any Ollama REST API client, such as Open-WebUI
  3. Utilize any Ollama client library

All of the options above require additional steps after deployment, but we would like to define custom settings during the deployment phase. Fortunately, Ollama supports Modelfiles, which can be used to solve the issue.
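For reference, this is what the manual Modelfile workflow looks like on a running Ollama instance. It is a sketch that mirrors the model and parameter values used in the examples above:

# Write a Modelfile that derives a new model from llama3.1:8b with a larger context window
cat <<EOF > Modelfile
FROM llama3.1:8b
PARAMETER num_ctx 16384
EOF

# Register the derived model under a new name
ollama create llama3.1-ctx16384 -f Modelfile

The Helm-chart feature described below automates exactly this step during deployment, so no manual post-install commands are needed.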

I created an issue in the Ollama Helm-chart repository with a proposal for this improvement and implemented the functionality in my personal Helm repository, so you can deploy a custom Ollama model this way:

⚠️
Warning

The example below uses my personal Helm-chart implementation at https://olegsmetanin.github.io/helm-charts/; this is not the official Ollama Helm-chart!

$ helm repo add olegsmetanin https://olegsmetanin.github.io/helm-charts

$ helm upgrade ollama olegsmetanin/ollama \
    --install --create-namespace -n ollama --version 0.1.0 \
    --set 'persistentVolume.enabled=true' \
    --set 'persistentVolume.size=20Gi' \
    --set 'ollama.gpu.enabled=false' \
    --set 'ollama.models.pull[0]=llama3.1:8b' \
    --set 'ollama.models.create[0].name=llama3.1-ctx16384' \
    --set 'ollama.models.create[0].model=FROM llama3.1:8b\\nPARAMETER num_ctx 16384' \
    --set 'ollama.models.run[0]=llama3.1-ctx16384' \
    --set 'extraEnv[0].name=OLLAMA_KEEP_ALIVE' \
    --set 'extraEnv[0].value=24h'

On 2025-02-14 my proposal was implemented by the Ollama Helm-chart maintainers in the official repository. Now we can deploy Llama3.1 with a 16384-token context size using the following example:

cat <<EOF | helm upgrade ollama ollama-helm/ollama --install --create-namespace -n ollama --version 1.5.0 -f -
persistentVolume:
  enabled: true
  size: 50Gi
ollama:
  gpu:
    enabled: true
  models:
    pull:
      - llama3.1:8b
    create:
      - name: llama3.1-ctx16384
        template: |
          FROM llama3.1:8b
          PARAMETER num_ctx 16384
    run:
      - llama3.1-ctx16384
extraEnv:
  - name: OLLAMA_KEEP_ALIVE
    value: 24h
EOF
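After the release is deployed, you can check that the derived model exists and that the context parameter was applied. This is a sketch that assumes the chart created a Deployment named ollama in the ollama namespace:

# List the models available inside the Ollama pod
kubectl exec -n ollama deploy/ollama -- ollama list

# Show the derived model; the parameters should include "num_ctx 16384"
kubectl exec -n ollama deploy/ollama -- ollama show llama3.1-ctx16384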

Happy LLM deployment!
