Setup · ~15 minutes

Step-by-step setup

A beginner-friendly walkthrough for macOS, Linux, and Windows. Copy each command in order — by the end Claude Code will be running on free NVIDIA models.

1

Get your free NVIDIA API key

Create a free account at build.nvidia.com, verify your phone number to unlock model access, then copy your API key (it starts with nvapi-).

# 1. Go to: https://build.nvidia.com
# 2. Create free account & verify phone
# 3. Click any model → "Get API Key"
# 4. Copy it — starts with:
nvapi-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
🔒Keep this key private — never commit it to git or share in screenshots.
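One way to keep the key out of files and screenshots is to read it from an environment variable and print only a masked form. The sketch below assumes the variable is named NVIDIA_NIM_API_KEY (the name used later in this guide). The helper names looks_like_nvidia_key and mask are made up for illustration, and the check only looks at the nvapi- prefix, so it cannot tell whether NVIDIA will accept the key.

```python
import os


def looks_like_nvidia_key(key: str) -> bool:
    """Return True if the string has the shape of an NVIDIA API key.

    Only the "nvapi-" prefix is checked. Nothing is sent to NVIDIA,
    so a True result does not prove the key is valid.
    """
    prefix = "nvapi-"
    return key.startswith(prefix) and len(key) > len(prefix)


def mask(key: str) -> str:
    """Hide everything except the prefix and the last 4 characters."""
    if len(key) <= 10:
        return "*" * len(key)
    return key[:6] + "*" * (len(key) - 10) + key[-4:]


if __name__ == "__main__":
    key = os.environ.get("NVIDIA_NIM_API_KEY", "")
    if looks_like_nvidia_key(key):
        print("Key found:", mask(key))
    else:
        print("NVIDIA_NIM_API_KEY is missing or does not start with nvapi-")
```

Printing the masked form is usually enough to confirm that the right key is loaded without exposing it.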
2

Install Claude Code CLI

Install the Claude Code CLI. The native installer is recommended — it auto-updates. You need v2.1.129 or later for the /model gateway picker to work.

# Option 1 — Native installer (recommended, auto-updates)
curl -fsSL https://claude.ai/install.sh | bash

# Option 2 — Homebrew (manual updates)
brew install --cask claude-code
# To update later: brew upgrade claude-code

# Verify (need v2.1.129+)
claude --version
🔒 Later steps edit ~/.zshrc, because Zsh is the default shell on macOS. If you use Bash, edit ~/.bashrc instead.
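The version requirement in this step can also be checked from a script. The sketch below assumes that claude --version prints a string containing a MAJOR.MINOR.PATCH number; the exact output format is an assumption, and the helper names are hypothetical. It compares the numbers as integers, because a plain string comparison would rank 2.1.13 above 2.1.129.

```python
import re
import subprocess

MIN_VERSION = (2, 1, 129)  # minimum version stated in this guide


def parse_version(text: str) -> tuple:
    """Extract the first MAJOR.MINOR.PATCH triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_new_enough(text: str, minimum: tuple = MIN_VERSION) -> bool:
    """Compare numerically, so 2.1.129 ranks above 2.1.13."""
    return parse_version(text) >= minimum


if __name__ == "__main__":
    # Assumes `claude` is on PATH after installation.
    output = subprocess.run(
        ["claude", "--version"], capture_output=True, text=True, check=True
    ).stdout
    print(output.strip(), "->", "OK" if is_new_enough(output) else "too old")
```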
3

Create config.yaml

Save this file as ~/litellm-nim/config.yaml. Indentation matters — 2 spaces for - model_name, 4 for litellm_params, 6 for fields inside it.

mkdir -p ~/litellm-nim
cd ~/litellm-nim
config.yaml
model_list:
  # ═══ CORE SLOTS — Claude Code uses these automatically ═══

  # SONNET slot — daily driver. Fast, clean, reliable.
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: nvidia_nim/mistralai/mistral-medium-3.5-128b
      api_key: os.environ/NVIDIA_NIM_API_KEY

  # OPUS slot — heavy reasoning for hard, multi-file work.
  - model_name: claude-opus-4-6
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-pro
      api_key: os.environ/NVIDIA_NIM_API_KEY

  # HAIKU slot — background tasks, file reads. Must be fast.
  - model_name: claude-haiku-4-5
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-flash
      api_key: os.environ/NVIDIA_NIM_API_KEY

  # ═══ SPECIALTY MODELS — switch with /model claude-<name> ═══

  - model_name: claude-mistral
    litellm_params:
      model: nvidia_nim/mistralai/mistral-medium-3.5-128b
      api_key: os.environ/NVIDIA_NIM_API_KEY

  - model_name: claude-deepseek
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-pro
      api_key: os.environ/NVIDIA_NIM_API_KEY

  - model_name: claude-deepseek-flash
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-flash
      api_key: os.environ/NVIDIA_NIM_API_KEY

  - model_name: claude-glm
    litellm_params:
      model: nvidia_nim/z-ai/glm-5.1
      api_key: os.environ/NVIDIA_NIM_API_KEY

  - model_name: claude-minimax
    litellm_params:
      model: nvidia_nim/minimaxai/minimax-m3
      api_key: os.environ/NVIDIA_NIM_API_KEY

  - model_name: claude-gemma
    litellm_params:
      model: nvidia_nim/google/gemma-4-31b-it
      api_key: os.environ/NVIDIA_NIM_API_KEY

  - model_name: claude-step
    litellm_params:
      model: nvidia_nim/stepfun-ai/step-3.7-flash
      api_key: os.environ/NVIDIA_NIM_API_KEY

  - model_name: claude-kimi
    litellm_params:
      model: nvidia_nim/moonshotai/kimi-k2.6
      api_key: os.environ/NVIDIA_NIM_API_KEY

  - model_name: claude-nemotron
    litellm_params:
      model: nvidia_nim/nvidia/nemotron-3-ultra-550b-a55b
      api_key: os.environ/NVIDIA_NIM_API_KEY

litellm_settings:
  drop_params: true
  request_timeout: 300

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  retry_after: 5
  allowed_fails: 2
  cooldown_time: 60

general_settings:
  master_key: "sk-litellm-local"
🔒 Validate the YAML before starting (this needs the PyYAML package, e.g. pip install pyyaml): python3 -c "import yaml; yaml.safe_load(open('config.yaml')); print('YAML OK')"
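A file can be valid YAML and still have a field at the wrong indentation level, for example model sitting beside litellm_params rather than inside it. As a rough sketch, the check below walks the parsed config and reports entries that lack the fields used in this guide. The function name check_config is hypothetical, and the rules cover only the fields shown here, not everything LiteLLM accepts.

```python
def check_config(config: dict) -> list:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    model_list = config.get("model_list")
    if not isinstance(model_list, list) or not model_list:
        return ["model_list is missing or empty"]
    for index, entry in enumerate(model_list):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: not a mapping")
            continue
        name = entry.get("model_name")
        if not name:
            problems.append(f"entry {index}: missing model_name")
        params = entry.get("litellm_params")
        if not isinstance(params, dict):
            # Typical symptom of the fields under litellm_params being
            # indented 4 spaces instead of 6.
            problems.append(f"entry {index} ({name}): litellm_params is not a mapping")
            continue
        for field in ("model", "api_key"):
            if not params.get(field):
                problems.append(f"entry {index} ({name}): missing litellm_params.{field}")
    return problems


if __name__ == "__main__":
    import yaml  # PyYAML, a third-party package: pip install pyyaml

    with open("config.yaml") as handle:
        loaded = yaml.safe_load(handle)
    issues = check_config(loaded or {})
    print("\n".join(issues) if issues else "config.yaml structure OK")
```

Run it from ~/litellm-nim so that the relative path config.yaml resolves.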
4

Start LiteLLM via Docker

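Make sure Docker Desktop is running. This starts the proxy on localhost:4001: the container listens on port 4000 internally (which is why the logs mention 4000), and -p 4001:4000 maps it to port 4001 on your machine. You want to see "Application startup complete" in the logs.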

cd ~/litellm-nim

docker run -d \
  -p 4001:4000 \
  -e NVIDIA_NIM_API_KEY="nvapi-YOUR_KEY_HERE" \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  --name litellm-nim \
  --restart always \
  docker.litellm.ai/berriai/litellm:main-stable \
  --config /app/config.yaml

# Verify startup
docker logs litellm-nim
Expected output
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:4000
🔒 Replace nvapi-YOUR_KEY_HERE with your real key. If the logs show a traceback instead, the most likely cause is a YAML error in config.yaml; re-run the validation command from step 3.
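Reading the logs by eye works, but the same check can be scripted. This sketch classifies log text using the "Application startup complete" line from the expected output and Python's standard traceback header. The function name startup_status is hypothetical. The ready marker is tested first on the assumption that an earlier failed start may have left an old traceback in the logs.

```python
import subprocess

READY_MARKER = "Application startup complete"
TRACEBACK_MARKER = "Traceback (most recent call last)"


def startup_status(log_text: str) -> str:
    """Classify container logs as 'ready', 'error', or 'starting'."""
    if READY_MARKER in log_text:
        return "ready"
    if TRACEBACK_MARKER in log_text:
        return "error"
    return "starting"


if __name__ == "__main__":
    # `docker logs` replays the container's stdout and stderr on the
    # matching host streams, so both are captured and joined.
    result = subprocess.run(
        ["docker", "logs", "litellm-nim"], capture_output=True, text=True
    )
    print(startup_status(result.stdout + result.stderr))
```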
5

Add shell alias

This alias routes Claude Code through your local proxy and enables the /model gateway picker. Use ANTHROPIC_AUTH_TOKEN (not ANTHROPIC_API_KEY) to avoid an auth-conflict warning. The token value must match master_key in config.yaml.

# Add to ~/.zshrc
export NVIDIA_NIM_API_KEY="nvapi-YOUR_KEY_HERE"

alias claude-nim='\
  ANTHROPIC_BASE_URL="http://localhost:4001" \
  ANTHROPIC_AUTH_TOKEN="sk-litellm-local" \
  CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 \
  ANTHROPIC_MODEL="claude-sonnet-4-6" \
  ANTHROPIC_DEFAULT_OPUS_MODEL="claude-opus-4-6" \
  ANTHROPIC_DEFAULT_SONNET_MODEL="claude-sonnet-4-6" \
  ANTHROPIC_DEFAULT_HAIKU_MODEL="claude-haiku-4-5" \
  claude'
🔒Replace nvapi-YOUR_KEY_HERE with your real key. ANTHROPIC_BASE_URL uses port 4001.
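Shell aliases do not carry over to every environment (PowerShell, for instance), so a small launcher script is one possible alternative. The sketch below builds the same variables the alias sets and starts claude with them. The function name proxy_env is hypothetical, and the script assumes the claude executable can be found on PATH.

```python
import os
import subprocess


def proxy_env(base_env: dict, port: int = 4001,
              token: str = "sk-litellm-local") -> dict:
    """Return a copy of base_env with the alias's variables applied."""
    env = dict(base_env)
    env.update(
        {
            "ANTHROPIC_BASE_URL": f"http://localhost:{port}",
            "ANTHROPIC_AUTH_TOKEN": token,
            "CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY": "1",
            "ANTHROPIC_MODEL": "claude-sonnet-4-6",
            "ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-6",
            "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6",
            "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5",
        }
    )
    return env


if __name__ == "__main__":
    # The variables apply to the child process only; the calling
    # shell's environment is left untouched.
    raise SystemExit(subprocess.call(["claude"], env=proxy_env(os.environ)))
```

Because the variables are scoped to the child process, running plain claude elsewhere still uses your normal configuration.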

6

Launch Claude Code

Reload your shell config and launch. Inside Claude Code, type /model (bare) to see your NVIDIA models in the gateway picker.

source ~/.zshrc
claude-nim
🔒If asked "Do you want to use this API key?" — select Yes. Then type /model to switch models.

Quick validation

Run these checks to confirm the proxy works on its own, independent of the claude-nim alias.

# 1) Confirm container is running
docker ps --filter "name=litellm-nim"

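# 2) Confirm proxy responds (config.yaml sets a master_key, so send it as a bearer token)
# macOS / Linux / WSL
curl -H "Authorization: Bearer sk-litellm-local" http://localhost:4001/v1/models

# Windows PowerShell
iwr http://localhost:4001/v1/models -Headers @{ Authorization = "Bearer sk-litellm-local" }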

# 3) Launch Claude Code via proxy
# macOS / Linux / WSL
ANTHROPIC_BASE_URL="http://localhost:4001" \
  ANTHROPIC_AUTH_TOKEN="sk-litellm-local" \
  ANTHROPIC_MODEL="claude-sonnet-4-6" claude

# Windows PowerShell
$env:ANTHROPIC_BASE_URL="http://localhost:4001"
$env:ANTHROPIC_AUTH_TOKEN="sk-litellm-local"
$env:ANTHROPIC_MODEL="claude-sonnet-4-6"
claude
Tip: After this works, add the permanent alias/function to your shell profile.
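The proxy check can also be done from Python, which behaves the same on all three platforms. This sketch assumes /v1/models returns an OpenAI-style body with a data list of objects that each carry an id; treat that shape as an assumption to confirm against your own proxy's response. The function names are hypothetical.

```python
import json
import urllib.request


def model_ids(response_body: str) -> list:
    """Pull model names out of an OpenAI-style /v1/models response."""
    payload = json.loads(response_body)
    return sorted(item["id"] for item in payload.get("data", []))


def fetch_models(base_url: str = "http://localhost:4001",
                 token: str = "sk-litellm-local") -> list:
    """Query the proxy, sending the master key as a bearer token."""
    request = urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return model_ids(response.read().decode("utf-8"))


if __name__ == "__main__":
    for name in fetch_models():
        print(name)
```

If the list includes the names from config.yaml (claude-sonnet-4-6 and so on), the proxy has loaded your configuration.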

Pro tip

Triple your rate limit with 3 free keys

NVIDIA's free tier allows ~40 requests/minute per API key. Create 3 free accounts, add all 3 keys to LiteLLM's config, and the router automatically distributes requests across them — effectively tripling your limit.

Keys: 3 free accounts
Limit: ~120 req/min combined
Strategy: simple-shuffle
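To see why three keys roughly triple the budget, it helps to model the router as a weighted random pick. The sketch below is a simulation only and does not call LiteLLM. It assumes that simple-shuffle chooses among deployments at random, weighted by each deployment's rpm value. The names combined_rpm and pick_deployment are made up for illustration.

```python
import random
from collections import Counter


def combined_rpm(per_key_rpm: int, key_count: int) -> int:
    """Total requests per minute if every key is used up to its cap."""
    return per_key_rpm * key_count


def pick_deployment(deployments: list, rng: random.Random) -> str:
    """Pick one deployment at random, weighted by its rpm setting."""
    names = [d["name"] for d in deployments]
    weights = [d["rpm"] for d in deployments]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    keys = [{"name": f"NVIDIA_KEY_{i}", "rpm": 38} for i in (1, 2, 3)]
    rng = random.Random(0)  # fixed seed so repeated runs match
    tally = Counter(pick_deployment(keys, rng) for _ in range(3000))
    print("combined cap:", combined_rpm(38, 3), "req/min")
    print(dict(tally))
```

With equal weights each key ends up with about a third of the traffic. Capping each key slightly below 40 leaves a small margin, so the combined figure is a little under the nominal 120.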

Step 1 — Set env vars

# Add all 3 keys to ~/.zshrc (macOS) or ~/.bashrc (Linux)
export NVIDIA_KEY_1="nvapi-YOUR_FIRST_KEY"
export NVIDIA_KEY_2="nvapi-YOUR_SECOND_KEY"
export NVIDIA_KEY_3="nvapi-YOUR_THIRD_KEY"

Step 2 — Update config.yaml (simple-shuffle routing across the 3 keys)

model_list:
  # Rotate 3 free keys for 3× the rate limit (40 req/min each)
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: nvidia_nim/mistralai/mistral-medium-3.5-128b
      api_key: os.environ/NVIDIA_KEY_1
      rpm: 38
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: nvidia_nim/mistralai/mistral-medium-3.5-128b
      api_key: os.environ/NVIDIA_KEY_2
      rpm: 38
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: nvidia_nim/mistralai/mistral-medium-3.5-128b
      api_key: os.environ/NVIDIA_KEY_3
      rpm: 38

  - model_name: claude-opus-4-6
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-pro
      api_key: os.environ/NVIDIA_KEY_1
      rpm: 38
  - model_name: claude-opus-4-6
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-pro
      api_key: os.environ/NVIDIA_KEY_2
      rpm: 38
  - model_name: claude-opus-4-6
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-pro
      api_key: os.environ/NVIDIA_KEY_3
      rpm: 38

  - model_name: claude-haiku-4-5
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-flash
      api_key: os.environ/NVIDIA_KEY_1
      rpm: 38
  - model_name: claude-haiku-4-5
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-flash
      api_key: os.environ/NVIDIA_KEY_2
      rpm: 38
  - model_name: claude-haiku-4-5
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-flash
      api_key: os.environ/NVIDIA_KEY_3
      rpm: 38

litellm_settings:
  drop_params: true
  request_timeout: 300

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  retry_after: 5
  allowed_fails: 2
  cooldown_time: 60

general_settings:
  master_key: "sk-litellm-local"
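The model_list above repeats the same entry once per key, which is easy to mistype. As an optional convenience, the sketch below generates those nine entries from a slot-to-model table. The helper names build_model_list and to_yaml are hypothetical. The output covers only model_list, so the litellm_settings, router_settings and general_settings blocks still need to be added by hand.

```python
SLOTS = {
    "claude-sonnet-4-6": "nvidia_nim/mistralai/mistral-medium-3.5-128b",
    "claude-opus-4-6": "nvidia_nim/deepseek-ai/deepseek-v4-pro",
    "claude-haiku-4-5": "nvidia_nim/deepseek-ai/deepseek-v4-flash",
}
KEY_VARS = ["NVIDIA_KEY_1", "NVIDIA_KEY_2", "NVIDIA_KEY_3"]


def build_model_list(slots: dict, key_vars: list, rpm: int = 38) -> list:
    """Return one model_list entry per (slot, key) pair."""
    return [
        {
            "model_name": name,
            "litellm_params": {
                "model": model,
                "api_key": f"os.environ/{var}",
                "rpm": rpm,
            },
        }
        for name, model in slots.items()
        for var in key_vars
    ]


def to_yaml(entries: list) -> str:
    """Render entries with the 2/4/6-space indentation this guide uses."""
    lines = ["model_list:"]
    for entry in entries:
        lines.append(f"  - model_name: {entry['model_name']}")
        lines.append("    litellm_params:")
        for field, value in entry["litellm_params"].items():
            lines.append(f"      {field}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_yaml(build_model_list(SLOTS, KEY_VARS)), end="")
```

Adding a fourth key or another slot then becomes a one-line change to KEY_VARS or SLOTS.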

Step 3 — Restart Docker with all 3 keys

cd ~/litellm-nim

# Remove the container created in step 4; the name litellm-nim is already in use
docker rm -f litellm-nim

docker run -d \
  -p 4001:4000 \
  -e NVIDIA_KEY_1="$NVIDIA_KEY_1" \
  -e NVIDIA_KEY_2="$NVIDIA_KEY_2" \
  -e NVIDIA_KEY_3="$NVIDIA_KEY_3" \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  --name litellm-nim \
  --restart always \
  docker.litellm.ai/berriai/litellm:main-stable \
  --config /app/config.yaml

Hit a snag?

Check the troubleshooting guide for the most common setup errors.
