Distributed Tracing (OpenTelemetry)#
Salt can emit OpenTelemetry spans for every inter-process hop, so a single
job (salt '*' test.ping) becomes a single distributed trace that crosses
the CLI, the master, the minion, the return path, and any reactor or syndic
forwarding in between.
The implementation uses standard W3C TraceContext (traceparent /
tracestate) for propagation and ships spans through an OTLP exporter.
Jaeger ingests OTLP natively, as do most modern tracing backends
(Tempo, Honeycomb, Datadog OTLP, etc.).
Trace context propagates inside the AES-encrypted Salt envelope: an attacker on the wire cannot see the trace headers, and authenticated participants (master / minion / syndic) decode them after AES decryption.
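As a rough illustration of what travels inside the envelope, the sketch below builds and parses a version-00 W3C traceparent value and carries it in a plain dict standing in for a decrypted Salt payload. It uses only the standard library. The key name TRACE_KEY and the helper names are hypothetical; the key Salt actually uses is not stated in this section.

```python
import re
import secrets

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)

# Hypothetical key name, chosen for illustration only.
TRACE_KEY = "traceparent"


def new_traceparent(sampled=True):
    """Build a version-00 traceparent with random trace and span IDs."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value):
    """Return the header fields as a dict, or None if the value is invalid."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The spec forbids version ff and all-zero trace or span IDs.
    if fields["version"] == "ff":
        return None
    if set(fields["trace_id"]) == {"0"} or set(fields["span_id"]) == {"0"}:
        return None
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields


def inject(payload, traceparent):
    """Return a copy of the payload that carries the trace context."""
    carried = dict(payload)
    carried[TRACE_KEY] = traceparent
    return carried


def extract(payload):
    """Parse the trace context out of a decrypted payload, if present."""
    raw = payload.get(TRACE_KEY)
    return parse_traceparent(raw) if isinstance(raw, str) else None
```

In this model the sender calls inject() before encryption and the receiver calls extract() after decryption, so the context is never visible on the wire.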
Tracing is disabled by default and is a complete no-op when not configured. No spans are created, no exporter is initialised, and no background threads are started.
Configuration#
Add a tracing block to the master and minion configs. The block is
identical on both daemons, and applies to salt-cli, salt-call,
salt-api and salt-ssh as well.
tracing:
  enabled: true
  exporter: otlp-http        # otlp-http | otlp-grpc | console
  endpoint: ""               # OTel SDK default endpoint when empty
  service_name: ""           # auto-derived when empty
  sampler: parent_based      # parent_based | always_on | always_off | trace_id_ratio
  sampler_arg: 1.0
  resource_attributes: {}
  insecure: true             # gRPC TLS disabled (ignored for HTTP)
  headers: {}                # OTLP authentication headers
enabled
    Master switch. When false (the default), everything in this module is a no-op.
exporter
    otlp-http (default) sends spans via HTTP/protobuf to endpoint. Pure-Python; ships in Salt's base requirements; works on every interpreter.
    otlp-grpc sends via gRPC. Requires opentelemetry-exporter-otlp-proto-grpc to be installed separately (it pulls in grpcio, which lacks prebuilt wheels for some platform / interpreter combinations).
    console prints spans to stdout for debugging.
endpoint
    OTLP collector URL. When empty, the OTel SDK default is used (http://localhost:4318/v1/traces for HTTP, http://localhost:4317 for gRPC).
service_name
    The service.name resource attribute. When empty, Salt fills this in automatically: salt-master, salt-minion-<id>, salt-cli, salt-call, salt-api.
sampler
    Which sampler to install on the TracerProvider.
    parent_based (default): follow the parent's sample decision; root spans are sampled. Use sampler_arg < 1.0 to apply a ratio to root spans.
    always_on: sample every span.
    always_off: drop every span (testing only).
    trace_id_ratio: sample sampler_arg fraction of trace IDs.
resource_attributes
    Extra attributes merged into the OTel Resource (e.g. deployment.environment: prod).
insecure
    Disable gRPC TLS to the collector. Ignored for the HTTP exporter.
headers
    Additional headers sent on every OTLP request, e.g. Authorization: Bearer <token> for a hosted collector.
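The sampler options can be easier to reason about as a decision function. The following is a standard-library sketch of the behaviour described above, not Salt's or the OpenTelemetry SDK's code. The function names are hypothetical, and the ratio comparison against the low 64 bits of the trace ID is modelled on how the OpenTelemetry Python SDK's ratio sampler is understood to work.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # only the low 64 bits of the trace ID are compared


def ratio_decision(trace_id, ratio):
    """Deterministic ratio decision: the same trace ID gets the same answer everywhere."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_LIMIT) < bound


def should_sample(sampler, sampler_arg, trace_id, parent_sampled=None):
    """Decide whether to record a span.

    parent_sampled is None for a root span, otherwise the parent's sampled flag.
    """
    if sampler == "always_on":
        return True
    if sampler == "always_off":
        return False
    if sampler == "trace_id_ratio":
        return ratio_decision(trace_id, sampler_arg)
    if sampler == "parent_based":
        if parent_sampled is not None:
            return parent_sampled  # follow the parent's decision
        return ratio_decision(trace_id, sampler_arg)  # root span: apply the ratio
    raise ValueError(f"unknown sampler: {sampler!r}")
```

Because the ratio decision depends only on the trace ID, every process that sees the same trace reaches the same verdict without coordination. With parent_based, only the process that creates the root span applies the ratio at all.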
Hops covered#
A single salt '*' test.ping produces a trace spanning at least:
salt.cli.test.ping: root span on the CLI.
salt.req.send.publish: CLI → master request.
salt.req.recv.publish: master receives the request.
salt.pub.send: master publishes the job.
salt.minion.recv.test.ping: minion receives the published command.
salt.minion.exec.test.ping: minion executes the function.
salt.req.send._return: minion returns to master.
salt.req.recv._return: master receives the return.
Other instrumented hops:
Event bus (fire_event / get_event): every IPC and TCP-IPC event carries trace context in its data dict.
Reactor: extracts trace context from incoming events and parents the reaction span correctly.
Syndic forwarding: both inbound (from upstream master) and outbound (to downstream minions).
Salt-SSH: propagates trace context as the TRACEPARENT environment variable on the remote shim.
Salt-API: extracts the traceparent HTTP header from incoming requests; webhooks inject context into the events they fire.
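The hops above use three different carriers for the same traceparent string: an event data dict, an environment variable, and an HTTP header. A minimal sketch of reading each one follows. The function names and the EVENT_TRACE_KEY key are hypothetical; only the TRACEPARENT variable name and the traceparent header name come from the text above.

```python
import os

# Hypothetical key for event data dicts; Salt's real key is not documented here.
EVENT_TRACE_KEY = "traceparent"


def from_http_headers(headers):
    """HTTP header names are case-insensitive, so normalise before comparing."""
    for name, value in headers.items():
        if name.lower() == "traceparent":
            return value
    return None


def from_environ(environ=None):
    """Read TRACEPARENT, as a process launched by a remote shim might."""
    environ = os.environ if environ is None else environ
    return environ.get("TRACEPARENT") or None


def from_event(data):
    """Read trace context carried in an event's data dict."""
    value = data.get(EVENT_TRACE_KEY)
    return value if isinstance(value, str) else None
```

Each helper returns the raw string or None, so a caller can treat a missing carrier as "start a new root span" rather than as an error.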
Running a quick demo#
Spin up an all-in-one Jaeger:
docker run -d --name jaeger \
-p 16686:16686 -p 4318:4318 \
jaegertracing/all-in-one:latest
Configure master + minion with:
tracing:
  enabled: true
  exporter: otlp-http
  endpoint: http://localhost:4318/v1/traces
  sampler: always_on
Start them, run salt '*' test.ping, then visit
http://localhost:16686 and search for the salt-cli service. You
should see a single trace with spans hanging off three services:
salt-cli, salt-master and salt-minion-<id>.
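If no trace appears, a quick first check is whether the two ports published by the docker command are accepting connections. The helper below is a generic TCP preflight sketch using only the standard library. The function names are hypothetical, and the check says nothing about whether the collector accepts the spans themselves.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_demo_ports(host="localhost"):
    """Map each port published by the demo container to its reachability."""
    ports = {"otlp-http": 4318, "jaeger-ui": 16686}
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling check_demo_ports() on the Docker host returns a dict such as {"otlp-http": True, "jaeger-ui": True} when the container is up.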
Fork handling#
The OTel BatchSpanProcessor runs a background thread that does not
survive fork(). Salt rebuilds the provider in every forked child the
first time a tracing API is invoked, so worker processes spun up by the
master / minion get their own functioning exporter without any caller
action. Spans still queued in the parent at the instant of fork may
be lost; for short-lived spans this is rarely visible, but if you observe
gaps, consider lowering the BatchSpanProcessor export delay via the OTel
environment variables (for example OTEL_BSP_SCHEDULE_DELAY, in milliseconds).
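The "rebuild on first use in the child" pattern can be sketched with a PID check. This is an illustration of the general technique, not Salt's implementation; the ForkAwareProvider class and its factory argument are hypothetical names.

```python
import os


class ForkAwareProvider:
    """Lazily build a per-process resource and rebuild it after fork()."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds a fresh provider/exporter
        self._pid = None
        self._provider = None

    def get(self):
        pid = os.getpid()
        if self._provider is None or self._pid != pid:
            # Either this is the first use in this process, or we are a forked
            # child holding the parent's provider, whose background thread did
            # not survive fork(). Build a fresh one in both cases.
            self._provider = self._factory()
            self._pid = pid
        return self._provider
```

The check costs one os.getpid() call per access and needs no hook at fork time, which is why callers do not have to do anything special in worker processes.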
Payload overhead#
When tracing is enabled and a recording span is active, every Salt request
and event grows by roughly 60 bytes (the W3C traceparent string).
When no recording span is active — for example, an internal periodic event
fired outside a request handler — no headers are added.
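For reference, the size of the traceparent string itself follows directly from the W3C field widths. The sketch below computes it and checks the result against a generated value. The names are hypothetical, and the bytes beyond the string (key name and serialization framing) are not modelled because this section does not specify them.

```python
import secrets

# Hex characters per field in a version-00 traceparent.
FIELD_HEX_CHARS = {"version": 2, "trace_id": 32, "parent_id": 16, "flags": 2}


def traceparent_length():
    """Length of the header: all field characters plus the '-' separators."""
    separators = len(FIELD_HEX_CHARS) - 1
    return sum(FIELD_HEX_CHARS.values()) + separators


def example_traceparent():
    """Generate a sampled traceparent with random IDs."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


# The string is pure ASCII, so its byte length equals its character length.
assert len(example_traceparent().encode("ascii")) == traceparent_length()
print(traceparent_length())  # prints 55
```

The string accounts for 55 bytes of the per-message figure quoted above.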