Adding API Key Authentication to a vLLM Inference Service with Nginx map

Date: 2026-03-24 Environment: Ubuntu 24.04 / Nginx / vLLM (Docker)

TL;DR

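vLLM's built-in --api-key supports only a single key and protects only the /v1 paths. Authenticating at the reverse-proxy layer with the Nginx map directive instead gives you multiple API keys, per-client identity, and access-log tracing, all without touching vLLM itself.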

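Background

When you expose an LLM inference service externally, you need to block unauthorized access. Common requirements:

Multiple API keys mapped to different users/applications
The ability to revoke one key without affecting anyone else
Access logs that show who is sending requests

If Nginx already sits in front as a reverse proxy, handling authentication at the Nginx layer is the most natural choice.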

Limitations of vLLM's built-in authentication

vLLM offers two ways to set an API key:

# Option 1: command-line flag
vllm serve model --api-key "your-key"

# Option 2: environment variable
VLLM_API_KEY="your-key" vllm serve model

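Limitations:

Issue	Description
Single key	All users share one key, so a single leak compromises everyone
Protects only `/v1`	Other endpoints (such as `/metrics` and `/health`) are unprotected
No identity	The logs cannot tell you who sent a request
No per-client revocation	Changing the key means everyone has to switch

Reference: the vLLM Security documentation states explicitly that the API key protects only the OpenAI-compatible endpoints.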

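Solution: Nginx map + authentication at the reverse proxy

Architecture

Client (sends Authorization header)
  → Nginx (:8001)
    ├── Validate API key (map resolves the identity)
    ├── Invalid key → 401 Unauthorized
    ├── Valid key → proxy_pass to vLLM
    └── Record identity in the access log
  → vLLM (:8000 localhost)

vLLM binds only to localhost, so external traffic must go through Nginx.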

Step 1: Create the API key map file

In /etc/nginx/api_keys.conf:

# API Key Map
# Format: "Bearer <key>" <identity>;
# To add a key: add a line → sudo nginx -t && sudo systemctl reload nginx

map $http_authorization $api_client {
    default "";
    "Bearer abc123...your-key-here..." "app-frontend";
    "Bearer def456...another-key..."   "partner-api";
}

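How it works:

The map directive reads the request's Authorization header (Nginx exposes it automatically as $http_authorization)
It compares the header value against the listed keys and stores the result in the $api_client variable
An unmatched key → $api_client is an empty string
map is defined in the http context but evaluated only when the variable is used (lazy evaluation), so it adds no overhead to requests that never reference $api_client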

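Step 2: Include the map and a custom log format in nginx.conf

Add the following inside the http {} block:

# Increase the bucket size (needed when the API key strings are long)
map_hash_bucket_size 128;

# Include the key map
include /etc/nginx/api_keys.conf;

# Custom log format that records the API identity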
log_format vllm_api '[$time_local] client=$api_client remote=$remote_addr '
                    'method=$request_method uri=$request_uri status=$status '
                    'bytes=$body_bytes_sent time=$request_time';

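About map_hash_bucket_size: Nginx stores the map keys in a hash table. The default bucket size follows the CPU cache line size (32, 64, or 128 bytes) and is 64 bytes on typical x86-64 machines. A 64-character hex API key plus the "Bearer " prefix is 71 bytes, which already exceeds that. Setting it to 128 is enough. See the Nginx hash configuration documentation for details.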
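The 71-byte figure can be checked in the shell. This is a minimal sketch: map_key_length is a hypothetical helper name, and the all-zero key is a placeholder standing in for the output of openssl rand -hex 32.

```shell
# Length in bytes of the string Nginx stores as the map key:
# the literal prefix "Bearer " (7 bytes) followed by the key itself.
map_key_length() {
    # wc -c counts bytes; $(( ... )) strips any padding wc adds
    echo $(( $(printf 'Bearer %s' "$1" | wc -c) ))
}

# Placeholder 64-character key (same length as `openssl rand -hex 32` output).
sample_key="$(printf '%064d' 0)"

map_key_length "$sample_key"   # prints 71
```

Nginx also needs some per-entry bookkeeping inside each bucket, so treat 71 as a lower bound on the space required rather than the exact figure.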

Step 3: Configure the authentication logic in the site config

server {
  listen 8001;
  server_name your-server;

  access_log /var/log/nginx/vllm_access.log vllm_api;

  location / {
    # API key check: an empty string means no key matched
    if ($api_client = "") {
      return 401 '{"error": "Unauthorized"}';
    }

    proxy_pass http://localhost:8000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-API-Client $api_client;

    # Streaming support (LLM responses are usually SSE streams)
    proxy_buffering off;
    proxy_request_buffering off;
    chunked_transfer_encoding on;

    # LLM inference can be slow, so use generous timeouts
    proxy_connect_timeout 360s;
    proxy_send_timeout 360s;
    proxy_read_timeout 360s;
  }
}

Step 4: Test and apply

# Syntax check
sudo nginx -t

# Apply the config (without dropping connections)
sudo systemctl reload nginx

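A note on Nginx `if`

The Nginx community has a well-known saying: "if is evil". It comes with caveats, though:

Scenario	Safety
`if` + `return`	✅ Safe: responds immediately and skips all later processing
`if` + `rewrite`	✅ Safe when used as `rewrite ... last`: rewrites the URI
`if` + `proxy_pass`	⚠️ Dangerous: can create an implicit nested location
`if` in server context	⚠️ May not behave as expected: it is evaluated before a location is selected, so it applies to every request to that server

Our usage (if + return 401) is one of the safe patterns: on a match it returns 401 directly and never reaches the proxy_pass logic.

Reference: nginx-wiki: If Is Evil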

Verification

# No key → 401
$ curl -s http://server:8001/v1/models
{"error": "Unauthorized"}

# Wrong key → 401
$ curl -s -H "Authorization: Bearer wrong-key" http://server:8001/v1/models
{"error": "Unauthorized"}

# Valid key → 200
$ curl -s -H "Authorization: Bearer <valid-key>" http://server:8001/v1/models
{"object":"list","data":[{"id":"model-name",...}]}

Log output:

[24/Mar/2026:10:38:15 +0800] client= remote=::1 ... status=401
[24/Mar/2026:10:38:15 +0800] client=app-frontend remote=::1 ... status=200

An empty client= indicates an unauthorized request; a value means the identity was recognized.
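The manual curl checks can also be wrapped into a small, repeatable smoke test. This is a sketch, assuming curl is installed; status_for and check are hypothetical helper names, and server:8001 and VALID_KEY in the usage comments are placeholders.

```shell
# Print only the HTTP status code for a request to URL $1,
# sending "Authorization: Bearer $2" when a key is given.
status_for() {
    local url="$1" key="${2:-}"
    if [ -n "$key" ]; then
        curl -s -o /dev/null -w '%{http_code}' \
             -H "Authorization: Bearer $key" "$url"
    else
        curl -s -o /dev/null -w '%{http_code}' "$url"
    fi
}

# Compare an actual status ($3) with the expected one ($2) and report it.
check() {
    local label="$1" expected="$2" actual="$3"
    if [ "$actual" = "$expected" ]; then
        echo "PASS $label ($actual)"
    else
        echo "FAIL $label (expected $expected, got $actual)"
        return 1
    fi
}

# Usage (placeholders: server:8001, VALID_KEY):
#   check "no key"    401 "$(status_for http://server:8001/v1/models)"
#   check "wrong key" 401 "$(status_for http://server:8001/v1/models wrong-key)"
#   check "valid key" 200 "$(status_for http://server:8001/v1/models "$VALID_KEY")"
```

Running the three checks after every reload catches a broken map file before clients do.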

Day-to-day management

Adding an API key

# 1. Generate a key
openssl rand -hex 32

# 2. Add it to /etc/nginx/api_keys.conf
#    "Bearer <new-key>" "new-client-name";

# 3. Test and apply
sudo nginx -t && sudo systemctl reload nginx
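Key generation and formatting can be combined into one helper that prints a ready-to-paste map line. This is a sketch only: new_map_entry is a hypothetical function name, demo-client is a placeholder identity, and the output format assumes the quoting and indentation of the map file shown in Step 1.

```shell
# Print a map entry for client name $1 with a freshly generated key.
# The line still has to be pasted inside the map { } block by hand.
new_map_entry() {
    local client="$1" key
    # 32 random bytes rendered as 64 hex characters
    key="$(openssl rand -hex 32)" || return 1
    printf '    "Bearer %s" "%s";\n' "$key" "$client"
}

# Usage: prints one line such as
#     "Bearer <64 hex chars>" "demo-client";
new_map_entry demo-client
```

The printed line contains the secret, so avoid running this where terminal output is logged or shared.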

Revoking an API key

Delete that line from api_keys.conf and reload. Other keys are unaffected.
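Revocation by client name can be scripted as well. This is a minimal sketch under stated assumptions: revoke_client is a hypothetical helper, each entry ends with "identity"; exactly as in the Step 1 example (no space before the semicolon), and the caller has write access to the file.

```shell
# Remove every map entry whose identity is $1 from the map file $2
# (default: /etc/nginx/api_keys.conf). A .bak copy is kept alongside.
revoke_client() {
    local client="$1" file="${2:-/etc/nginx/api_keys.conf}"
    if [ -z "$client" ]; then
        # An empty name would also match the `default "";` line
        echo "usage: revoke_client <client-name> [map-file]" >&2
        return 2
    fi
    cp "$file" "$file.bak" || return 1
    # -F: match a fixed string, -v: keep only the lines that do NOT match
    grep -v -F -- "\"$client\";" "$file.bak" > "$file"
}

# Usage (then validate and reload as usual):
#   revoke_client partner-api
#   sudo nginx -t && sudo systemctl reload nginx
```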

Seeing who is sending requests

# Live monitoring
tail -f /var/log/nginx/vllm_access.log

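# Count requests per client
# (client= is the third whitespace-separated field, because $time_local contains a space)
awk '{print $3}' /var/log/nginx/vllm_access.log | sort | uniq -c | sort -rn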
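Because $time_local contains a space, the position of the client= field is easy to miscount. The sketch below looks the field up by name instead; count_by_client is a hypothetical helper, and it assumes the vllm_api log format defined in Step 2.

```shell
# Count requests per client in the log file $1, most active first.
# Requests with an empty client= are reported as "(unauthorized)".
count_by_client() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^client=/) {
                c = substr($i, 8)            # text after "client="
                if (c == "") c = "(unauthorized)"
                n[c]++
                break
            }
        }
    } END {
        for (c in n) print n[c], c
    }' "$1" | sort -rn
}

# Usage:
#   count_by_client /var/log/nginx/vllm_access.log
```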

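Advanced: possible future extensions

Requirement	Approach
Rate limiting	Nginx `limit_req_zone` keyed on `$api_client` for per-client limits
Token budget control	Track each client's token consumption at the application layer
Automatic key rotation	Use a script to update `api_keys.conf` on a schedule
Fuller authentication	Switch to OAuth 2.0 / JWT, at the cost of a much more complex architecture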

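Lessons learned

Nginx map is lazily evaluated: it is defined at the http level, but the lookup runs only when the variable is referenced, so it does not slow down every request
map_hash_bucket_size must fit the key length: a 64-character hex key plus "Bearer " exceeds the default 64 bytes, so raise it to 128
if + return inside a location is safe: problems arise only when if is combined with content handlers such as proxy_pass
vLLM's --api-key protects only /v1: for full-path protection, authentication has to happen at the reverse-proxy layer
Protecting an LLM inference endpoint is more than auth: in the long run, rate limiting and token budget management matter just as much, because a single malicious request can consume a large amount of GPU compute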

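References

Nginx map module official documentation - full syntax and parameters of the map directive
Nginx hash configuration guide - the relationship between bucket size and max size
nginx-wiki: If Is Evil - known problems with the if directive and its safe uses
vLLM Security documentation - the limitations of the built-in API key
LLM API Security: Rate Limiting, Auth, and Abuse Prevention - a broad guide to protecting LLM endpoints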