VictoriaMetrics

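Prometheus's local TSDB is a good fit for short-term metrics, from a few days to a few weeks. Its disk usage and query performance become a bottleneck when you need longer retention (several months to over a year), need to aggregate data from multiple Prometheus instances, or have to handle high-cardinality metrics. VictoriaMetrics targets exactly these cases: it is compatible with the Prometheus query API and the remote_write protocol, so it can serve as remote storage for Prometheus or as the backend of your own collection pipeline.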

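1. Where it fits in the metrics stack

VictoriaMetrics exposes a query interface compatible with the Prometheus HTTP API. Grafana and Nightingale can query it with the same PromQL, with no extra adaptation on the query side.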

2. Single-node deployment

Lab environments usually run the single-node version, where one binary handles ingestion, querying, and storage:

bash

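# Download the VictoriaMetrics single-node binary
curl -fL -o victoria-metrics-amd64.tar.gz \
  https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.119.0/victoria-metrics-linux-amd64-v1.119.0.tar.gz

# Unpack; the archive contains the victoria-metrics-prod binary
tar -xzf victoria-metrics-amd64.tar.gz

# Install it to /usr/local/bin (run as root)
install -m 0755 victoria-metrics-prod /usr/local/bin/victoria-metrics

# Create the data directory for the service user (assumes the prometheus user already exists)
install -d -o prometheus -g prometheus /var/lib/victoria-metrics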

systemd unit:

ini

[Unit]
Description=VictoriaMetrics
After=network-online.target
Wants=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/victoria-metrics \
  --storageDataPath=/var/lib/victoria-metrics \
  --retentionPeriod=12 \
  --httpListenAddr=:8428
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target

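Parameter	Purpose
`--storageDataPath`	Data storage directory
`--retentionPeriod`	Retention period; a bare number is counted in months, so `12` keeps 12 months
`--httpListenAddr`	Listen address, default `:8428`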

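Verify after startup:

bash

# Confirm the instance is ready
curl -s http://127.0.0.1:8428/-/ready

# Run a PromQL query to confirm the query API works
# (an empty result with "status":"success" is expected until data arrives)
curl -sG \
  --data-urlencode 'query=up' \
  http://127.0.0.1:8428/api/v1/query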
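
In scripted setups a single readiness probe can race the service start. The sketch below polls the same /-/ready endpoint until it answers. The function name wait_ready and its retry defaults are illustrative choices, not part of VictoriaMetrics.

```shell
# wait_ready BASE_URL [TRIES] [DELAY_SEC]
# Polls BASE_URL/-/ready until curl gets a successful response
# or TRIES attempts have been used up. Returns 0 on success, 1 otherwise.
wait_ready() {
  wr_url=$1
  wr_tries=${2:-30}   # illustrative default: 30 attempts
  wr_delay=${3:-1}    # illustrative default: 1 second between attempts
  wr_i=0
  while [ "$wr_i" -lt "$wr_tries" ]; do
    # -f makes curl exit non-zero on HTTP errors, so a 5xx counts as "not ready"
    if curl -fsS -o /dev/null "$wr_url/-/ready"; then
      return 0
    fi
    wr_i=$((wr_i + 1))
    sleep "$wr_delay"
  done
  return 1
}

# Usage (not run here):
#   wait_ready http://127.0.0.1:8428 && echo "VictoriaMetrics is ready"
```

Polling with a bounded number of attempts keeps a provisioning script from hanging forever if the unit fails to start.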

3. Connecting Prometheus via remote_write

Prometheus uses remote_write to replicate samples to VictoriaMetrics in near real time. Add the configuration to prometheus.yml:

yaml

remote_write:
  - url: http://127.0.0.1:8428/api/v1/write
    # Prometheus keeps short-term data locally (e.g. 7 days); VictoriaMetrics keeps long-term data (e.g. 12 months)

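With remote_write enabled, the Prometheus local TSDB still retains data as usual (per --storage.tsdb.retention.time), and VictoriaMetrics retains its copy independently (per --retentionPeriod). The two stores are independent, so a VictoriaMetrics outage does not affect Prometheus's local data. Prometheus sends remote_write from its WAL and retries failed requests, so after a short outage it catches up on its own. Samples that have already been truncated from the WAL (roughly two hours are kept by default) are not re-sent.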
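
To see whether replication is keeping up, Prometheus's own remote-write metrics can be queried. The sketch below assumes Prometheus listens on 127.0.0.1:9090 and that the metric names match recent Prometheus 2.x releases (they have been renamed between versions, so check /metrics on your instance). The function names are illustrative.

```shell
# PROM_URL is an assumption: Prometheus on its default port on the same host.
PROM_URL=${PROM_URL:-http://127.0.0.1:9090}

# prom_query EXPR: run an instant query against Prometheus and print the raw JSON.
prom_query() {
  curl -sG --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
}

# remote_write_status: print the current send backlog and the recent failure rate.
remote_write_status() {
  # Samples waiting in the remote-write queue; a steadily growing value means the backlog is building up.
  prom_query 'prometheus_remote_storage_samples_pending'
  # Samples per second that failed to send over the last 5 minutes; should stay at 0.
  prom_query 'rate(prometheus_remote_storage_samples_failed_total[5m])'
}

# Usage (not run here):
#   remote_write_status
```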

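Reading history back into Prometheus through remote_read is not an option with this setup: VictoriaMetrics does not implement the Prometheus remote read API, so a remote_read entry pointing at it will not return data. In practice this is rarely a limitation. Grafana and Nightingale query VictoriaMetrics directly, so there is no need to route reads back through Prometheus.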

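4. Differences from the Prometheus TSDB

Aspect	Prometheus TSDB	VictoriaMetrics
Storage compression	Moderate	Higher; noticeably less disk for the same amount of data
Query performance	Adequate at small scale	More stable for long time ranges and high-cardinality queries
Clustering	Not supported natively; needs Thanos/Mimir	Available as a separate cluster version (victoria-metrics-cluster)
Deployment complexity	Single binary	The single-node version is also a single binary
PromQL compatibility	Full	Mostly compatible (queries run as MetricsQL); a few edge-case functions behave slightly differently

VictoriaMetrics typically compresses data 3-7 times better than the Prometheus TSDB, depending on the shape of the data. For the same 30 days of data, VictoriaMetrics may use only a fraction of the disk that the Prometheus TSDB needs.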
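
A rough back-of-the-envelope calculation makes the 3-7x range concrete. The sketch below assumes 100,000 series scraped every 15 seconds and about 1.5 bytes per sample for the Prometheus TSDB; both numbers are illustrative assumptions, not measurements, and the function name is hypothetical.

```shell
# estimate_bytes SERIES INTERVAL_SEC DAYS MILLIBYTES_PER_SAMPLE
# Prints the estimated on-disk size in bytes.
# Bytes per sample are passed in thousandths (1500 = 1.5 B) to stay in integer math.
estimate_bytes() {
  eb_samples=$(( $1 * 86400 * $3 / $2 ))   # total samples over the retention window
  echo $(( eb_samples * $4 / 1000 ))
}

# Assumed workload: 100,000 series, 15s scrape interval, 30 days, 1.5 B/sample.
prom_bytes=$(estimate_bytes 100000 15 30 1500)
echo "Prometheus TSDB (assumed 1.5 B/sample): $prom_bytes bytes"
echo "VictoriaMetrics at 3x compression:      $(( prom_bytes / 3 )) bytes"
echo "VictoriaMetrics at 7x compression:      $(( prom_bytes / 7 )) bytes"
```

Under these assumptions the Prometheus side comes to about 25.9 GB, and the VictoriaMetrics side to somewhere between roughly 3.7 GB and 8.6 GB. Real numbers depend on the data, so treat this only as a sizing starting point.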

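5. Common operations

Verify writes:

bash

# Write a test metric
curl -s -X POST http://127.0.0.1:8428/api/v1/import/prometheus \
  --data 'test_metric{job="test"} 1'

# Confirm it can be queried (a new sample may take a few seconds, up to about 30, to show up)
curl -sG \
  --data-urlencode 'query=test_metric' \
  http://127.0.0.1:8428/api/v1/query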

View storage statistics:

bash

# TSDB status: total series count plus the top entries by metric name, label name, and label=value pair
curl -s http://127.0.0.1:8428/api/v1/status/tsdb

Check the total series count (by default for the current day). The value plays a role similar to Prometheus's prometheus_tsdb_head_series and shows whether label cardinality is growing:

bash

curl -s http://127.0.0.1:8428/api/v1/status/tsdb \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print(d['data']['totalSeries'])"

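When troubleshooting high cardinality, /api/v1/status/tsdb also returns series counts broken down by metric name and by label, which quickly shows which metrics or labels have too many values.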
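
The per-metric breakdown can be pulled out of the response in the same way as totalSeries. This is a minimal sketch: the helper name top_metrics is illustrative, and it assumes the response carries a seriesCountByMetricName list of name/value entries, as the Prometheus-style TSDB status format does.

```shell
# top_metrics: read an /api/v1/status/tsdb response on stdin and print
# "<series count> <metric name>" for each entry in seriesCountByMetricName.
top_metrics() {
  python3 -c '
import json, sys
data = json.load(sys.stdin)["data"]
for entry in data.get("seriesCountByMetricName", []):
    print(entry["value"], entry["name"])
'
}

# Usage (not run here): ask for the 10 largest metric names.
#   curl -s "http://127.0.0.1:8428/api/v1/status/tsdb?topN=10" | top_metrics
```

Running this periodically and comparing the output is a simple way to spot which metric name is driving series growth.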

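6. Regional clusters and long-term storage

The VictoriaMetrics cluster version (victoria-metrics-cluster) splits ingestion, querying, and storage into separate components (vminsert, vmselect, and vmstorage), which suits cross-region setups and aggregation from multiple Prometheus instances.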
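
In the cluster version, Prometheus writes to vminsert and query clients read from vmselect, and the URL paths carry a tenant ID. A minimal sketch of how the endpoints are formed, assuming the default ports (8480 for vminsert, 8481 for vmselect). The host names and helper function names are placeholders for illustration.

```shell
# vm_cluster_write_url HOST ACCOUNT_ID
# Prints the remote_write URL served by vminsert (default port 8480 assumed).
vm_cluster_write_url() {
  printf 'http://%s:8480/insert/%s/prometheus/api/v1/write\n' "$1" "$2"
}

# vm_cluster_query_url HOST ACCOUNT_ID
# Prints the instant-query URL served by vmselect (default port 8481 assumed).
vm_cluster_query_url() {
  printf 'http://%s:8481/select/%s/prometheus/api/v1/query\n' "$1" "$2"
}

# Example with placeholder hosts and tenant 0:
vm_cluster_write_url vminsert.example 0
vm_cluster_query_url vmselect.example 0
```

Compared with the single-node setup, only the URL in remote_write and in the data source definition changes; the PromQL used by Grafana and Nightingale stays the same.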

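The single-node version fits lab environments and small-to-medium scale. The cluster version fits aggregation from multiple Prometheus instances, cross-datacenter disaster recovery, and long-term storage of large volumes. For the current lab environment, the single-node version is enough.

VictoriaMetrics also ships its own alerting-rule evaluator (vmalert). In this lab, alert evaluation is handled by Prometheus and Nightingale, so vmalert is not covered here.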
