Posts Tagged with “llm”
Basic LLM performance testing of A100, RTX A6000, H100, H200 Spot GPU instances from DataCrunch
It looks like DataCrunch currently offers some of the most cost-effective pricing for running GPU-powered virtual machines for LLM applications. In this guide we will provision a Spot GPU-powered VM at DataCrunch, create a Kubernetes cluster, install the NVIDIA Helm chart to unlock GPU capabilities for applications, install the Ollama Helm chart to run a self-hosted LLM in Kubernetes, configure OpenWebUI to use a remote self-hosted LLM, and test LLM performance across different GPU configurations (A100, RTX A6000, H100, H200). Read More →
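As a taste of the setup the guide walks through, here is a minimal sketch of the NVIDIA step, assuming the "NVIDIA Helm chart" in question is the GPU Operator; the release and namespace names are illustrative:

```bash
# Add NVIDIA's Helm repository and install the GPU Operator, which deploys
# the driver stack and device plugin so pods can request GPUs.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace
```

Once the operator is ready, nodes advertise the `nvidia.com/gpu` resource, which GPU workloads such as Ollama can then request.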
Basic LLM performance testing of GPU-powered Kubernetes nodes from Rackspace Spot
It looks like Rackspace Spot currently offers attractive pricing plans for running GPU-powered Kubernetes nodes for LLM applications. In this guide we will create a managed Kubernetes cluster on Rackspace Spot, install the NVIDIA Helm chart to unlock GPU capabilities for applications, install the Ollama Helm chart to run a self-hosted LLM in Kubernetes, configure OpenWebUI to use a remote self-hosted LLM, and test LLM performance across different GPU configurations (A30, H100). Read More →
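The Ollama step is similar in both guides. A minimal sketch, assuming the community otwld/ollama-helm chart with its documented `ollama.gpu` values block; release and namespace names are illustrative, and the value keys should be checked against your chart version:

```bash
# Add the Ollama Helm repository and install the chart with GPU support
# enabled, so the Ollama pod requests one NVIDIA GPU from the cluster.
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update
helm install ollama ollama-helm/ollama \
  --namespace ollama --create-namespace \
  --set ollama.gpu.enabled=true \
  --set ollama.gpu.type=nvidia \
  --set ollama.gpu.number=1
```

OpenWebUI can then be pointed at the resulting Ollama service as a remote backend (e.g. `http://ollama.ollama.svc.cluster.local:11434`, assuming that release and namespace and Ollama's default port).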
LLM context size support in the Ollama Helm chart
The Ollama Helm chart simplifies the deployment of self-hosted LLMs on Kubernetes. However, it originally didn't support model creation. To bridge this gap, I extended the chart in my Helm repository and proposed the functionality upstream to the main Ollama Helm chart. Today, that proposal was implemented by the chart maintainers. Let's look at how it works. Read More →
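A minimal sketch of what model creation enables: defining a derived model with a larger context window directly in the chart values. The `ollama.models.create` keys below reflect one reading of the chart's values and may differ across chart versions; the model name is illustrative:

```bash
# Hypothetical values snippet: create a derived model with a 32k context
# window (num_ctx) at deploy time. Key names may vary by chart version.
helm upgrade ollama ollama-helm/ollama -n ollama --reuse-values -f - <<'EOF'
ollama:
  models:
    create:
      - name: llama3-32k
        template: |
          FROM llama3
          PARAMETER num_ctx 32768
EOF
```

The embedded template is a standard Ollama Modelfile: `FROM` names the base model and `PARAMETER num_ctx` raises the context window, so clients get the larger context without setting it per request.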