salt.runners.cluster

Salt runner for cluster ring management and inspection.

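Operator surface for the Raft-backed cluster. The query runners (members, rings, routes, shed_status) read the per-master persisted Raft state or local files on disk, so they do not need IPC into the publish daemon's RaftService (which runs in a separate process and is not reachable from a runner subprocess). Most write runners (ring_create, ring_destroy, ring_set, route_set, route_clear, collect_from_peers, shed_unowned_all and sync_roots) instead fire events on the master's local event bus for the publish daemon to act on.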

CLI Examples:

# Show this master's view of the cluster voter/learner set.
salt-run cluster.members

# Show this master's current ring state.
salt-run cluster.ring_info

New in version 3009.0.

salt.runners.cluster.collect_from_peers(channels=(), banks=('jobs/loads', 'jobs/minions', 'jobs/endtimes', 'jobs/nocache'))

Pull cache contents from every peer to this master.

The migration "going out" runner: it runs in the reverse direction of cluster.sync_roots(). This master fires a cluster/runner/collect_from_peers event; the publish daemon broadcasts a collect-request to every peer. Each peer streams its cache contents for the requested channels back over the existing state-sync chunk transport, and this master's receiver applies them locally.

Use it to gather full coverage before flipping a data type back to broadcast with cluster.route_clear(): once every master has run this runner successfully, every master holds the full keyspace again, so a route flip won't strand reads.

Two channel families are supported:

channels — fixed state-sync channels keys and denied_keys (the join-time minion-key transport).
banks — arbitrary salt.cache.Cache banks (e.g. the salt_cache returner's jobs/* banks). Each bank name is wrapped as a bank:<bank> channel on the wire and the peer streams it via salt.cluster.state_sync.iter_bank_chunks().

Parameters:

channels -- Iterable of fixed state-sync channel names (subset of {"keys", "denied_keys"}). Defaults to empty; only set when migrating PKI banks (the default keys/denied_keys layout is intentionally broadcast in this branch — see MULTI_RING_DESIGN.md).
banks -- Iterable of salt.cache.Cache bank names. Defaults to the four jobs/* banks written by salt.returners.salt_cache, which is the production case for multi-ring migrations.

Fire-and-forget: the runner returns immediately after the event is on the bus. Poll local cache contents (or tail the master log for state-sync ... installed N items) to confirm delivery from each peer.
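The two channel families end up as a single list of wire channel names. The sketch below shows one plausible way to build that list; the helper name build_collect_channels and its rejection of unknown fixed channels are assumptions for illustration, and only the bank:<bank> prefix and the keys/denied_keys names come from the description above.

```python
# Hypothetical helper: not part of salt.runners.cluster.
FIXED_CHANNELS = frozenset({"keys", "denied_keys"})
DEFAULT_BANKS = ("jobs/loads", "jobs/minions", "jobs/endtimes", "jobs/nocache")


def build_collect_channels(channels=(), banks=DEFAULT_BANKS):
    """Return the wire channel names for a collect request.

    Fixed state-sync channels pass through unchanged; each cache bank is
    wrapped as ``bank:<bank>``. Rejecting unknown fixed channels with
    ValueError is an assumed behaviour for this sketch.
    """
    wire = []
    for channel in channels:
        if channel not in FIXED_CHANNELS:
            raise ValueError(f"unsupported state-sync channel: {channel!r}")
        wire.append(channel)
    # Bank names are namespaced so the peer knows to stream them with the
    # bank iterator rather than treating them as a fixed channel.
    wire.extend(f"bank:{bank}" for bank in banks)
    return wire


print(build_collect_channels(channels=["keys"], banks=[]))
# ['keys']
```

Keeping the prefix on the wire means a single collect-request can mix both families without the peer having to guess which transport each name refers to.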

CLI Examples:

# Default: collect the jobs/* banks from every peer.
salt-run cluster.collect_from_peers

# Collect a specific bank only.
salt-run cluster.collect_from_peers banks='["jobs/loads"]'

# Operator migrating a routed PKI-keys bank (rare).
salt-run cluster.collect_from_peers channels='["keys"]' banks='[]'

salt.runners.cluster.members()

Return this master's view of the cluster's committed Raft membership.

Reads the persisted Raft log and snapshot from the local SaltStorage and replays committed CONFIG entries through a fresh MembershipStateMachine. The returned set is what this master has applied locally — in a healthy cluster every master converges to the same answer, but the response is local-only and may briefly diverge during membership changes.

Output:

{
    "node_id":            str,         # this master's interface
    "voters":             [str, ...],  # sorted
    "learners":           [str, ...],  # sorted
    "membership_version": int,         # log index of latest CONFIG entry
    "voter_count":        int,
    "learner_count":      int,
}

membership_version is -1 when no CONFIG entry has been applied yet (e.g. a fresh master that has not finished joining).
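Replaying committed CONFIG entries is what lets this runner answer without talking to the daemon. A minimal sketch of that replay follows, assuming each log entry is a dict with index and type keys and that a CONFIG entry carries the complete new voters/learners sets; MembershipView and replay_membership are hypothetical stand-ins, not Salt's MembershipStateMachine.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipView:
    """Hypothetical stand-in for the state a MembershipStateMachine rebuilds."""
    voters: set = field(default_factory=set)
    learners: set = field(default_factory=set)
    version: int = -1  # stays -1 until a CONFIG entry has been applied


def replay_membership(committed_entries):
    """Rebuild membership from committed log entries (assumed dict shape)."""
    view = MembershipView()
    for entry in committed_entries:
        if entry["type"] != "CONFIG":
            continue  # non-membership entries do not affect the view
        # Assumption: each CONFIG entry replaces the whole configuration.
        view.voters = set(entry.get("voters", ()))
        view.learners = set(entry.get("learners", ()))
        view.version = entry["index"]
    return view


def members_output(node_id, view):
    # Same keys as the documented cluster.members output.
    return {
        "node_id": node_id,
        "voters": sorted(view.voters),
        "learners": sorted(view.learners),
        "membership_version": view.version,
        "voter_count": len(view.voters),
        "learner_count": len(view.learners),
    }
```

Because the replay only sees entries this master has committed locally, two masters can briefly return different answers while a membership change is still propagating.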

CLI Example:

salt-run cluster.members

New in version 3009.0.

salt.runners.cluster.migrate_jobs_to_cache(dry_run=False)

Migrate job-cache state from the local_cache returner layout into the bank layout salt.returners.salt_cache uses.

The default master_job_cache: local_cache returner writes each JID to <cachedir>/jobs/<2-hex>/<28-hex>/{.load.p, .minions.p, <minion_id>/return.p, …}. Operators flipping to master_job_cache: salt_cache (the multi-ring-capable returner) start with an empty bank set — every job submitted before the flip becomes invisible to the new returner.

This one-shot runner walks the old filesystem layout and populates the salt_cache banks:

<cachedir>/jobs/<2>/<28>/.load.p        -> bank "jobs/loads",     key=jid
<cachedir>/jobs/<2>/<28>/.minions.p     -> bank "jobs/minions",   key=jid
<cachedir>/jobs/<2>/<28>/endtime        -> bank "jobs/endtimes",  key=jid
<cachedir>/jobs/<2>/<28>/nocache        -> bank "jobs/nocache",   key=jid
<cachedir>/jobs/<2>/<28>/<m>/return.p   -> bank "jobs/returns/<jid>", key=<m>
<cachedir>/jobs/<2>/<28>/<m>/out.p      -> folded into the same record
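The table above can be read as a pure function from a file inside one job directory to a (bank, key) pair. The sketch below assumes the JID is already known for the directory being walked (how the runner recovers it is not shown here), and map_job_file is a hypothetical name, not the runner's code.

```python
# Hypothetical mapping helper mirroring the table above.
PER_JOB_FILES = {
    ".load.p": "jobs/loads",
    ".minions.p": "jobs/minions",
    "endtime": "jobs/endtimes",
    "nocache": "jobs/nocache",
}


def map_job_file(jid, rel_parts):
    """Map a path inside one job directory to a (bank, key) pair.

    ``rel_parts`` is the path relative to <cachedir>/jobs/<2>/<28>/, split
    into components. Returns None for files with no bank of their own
    (out.p is folded into the return record) or unrecognised files.
    """
    if len(rel_parts) == 1 and rel_parts[0] in PER_JOB_FILES:
        return PER_JOB_FILES[rel_parts[0]], jid
    if len(rel_parts) == 2 and rel_parts[1] == "return.p":
        minion_id = rel_parts[0]
        # Per-job returns live in their own bank, keyed by minion id.
        return f"jobs/returns/{jid}", minion_id
    return None
```

A dry run would only count the non-None results; a real run would additionally store each record through the cache driver.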

The original files are left in place. Operators who want to reclaim the disk space can rm -rf <cachedir>/jobs after confirming that the new banks are correct; running salt-run jobs.list_jobs against the new returner is the smoke check.

Parameters:

dry_run -- If True, walk and count without writing any cache entries. Use to verify the runner sees every JID before committing.

Returns a structured result:

{
    "status":           "ok" | "skipped",
    "scanned":          int,   # JIDs walked
    "migrated":         int,   # JIDs successfully written
    "skipped":          int,   # malformed entries the runner ignored
    "returns_migrated": int,   # minion return records written
    "dry_run":          bool,
    "jobs_root":        str,   # path that was walked
}

CLI Examples:

# Preview without writing anything.
salt-run cluster.migrate_jobs_to_cache dry_run=True

# Actually copy the state across.
salt-run cluster.migrate_jobs_to_cache

Operationally: stop the master before flipping master_job_cache so that new writes don't race the migration, run this runner, then restart the master with the new option set.

salt.runners.cluster.ring_create(name, voters)

Create a named ring with the given founding voters.

Fires a cluster/runner/ring_create event on the master's local bus; the publish daemon intercepts it and proposes a RING_REGISTRY entry through the cluster Raft group. Each master that is in voters will then bring up the per-ring Raft group locally when the registry entry commits.

Parameters:

name -- Operator-chosen ring identifier (e.g. "jobs").
voters -- List of master node-ids (interface addresses) to serve as the founding voter set of the ring.

Asymmetric with cluster.ring_destroy: this runner only requests creation — bring-up of the per-ring Node is driven by the registry's commit callback inside the daemon.
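Since the runner only fires an event, the useful work it can do locally is argument validation. A minimal sketch follows; the event tag comes from the description above, while the payload keys name and voters and the specific checks are assumptions for illustration.

```python
RING_CREATE_TAG = "cluster/runner/ring_create"


def build_ring_create_event(name, voters):
    """Validate arguments and build a (tag, data) pair for the local bus.

    Hypothetical helper: only the tag is taken from the documentation;
    the payload layout is assumed.
    """
    if not isinstance(name, str) or not name:
        raise ValueError("ring name must be a non-empty string")
    if isinstance(voters, str):
        # A bare string would otherwise be iterated character by character.
        raise ValueError("voters must be a list of node-ids, not a string")
    voters = list(voters)
    if not voters:
        raise ValueError("a ring needs at least one founding voter")
    if len(set(voters)) != len(voters):
        raise ValueError("duplicate node-id in voters")
    return RING_CREATE_TAG, {"name": name, "voters": voters}
```

Rejecting bad input before the event is fired matters here because the runner returns as soon as the event is on the bus; a malformed proposal would otherwise only surface in the daemon's log.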

CLI Example:

salt-run cluster.ring_create name=jobs voters='["m1","m2","m3"]'

salt.runners.cluster.ring_destroy(name)

Mark the named ring as destroyed.

Fires a cluster/runner/ring_destroy event; the publish daemon proposes a RING_REGISTRY entry with status="destroyed". Once committed, every master that hosted the ring's Raft group tears it down locally. The on-disk state is left in place so an operator who re-creates the same ring picks up the persisted state.

Parameters:

name -- Ring identifier (must match the name used at ring_create() time).

CLI Example:

salt-run cluster.ring_destroy name=jobs

salt.runners.cluster.ring_info()

Return a snapshot of this master's ring state.

Reads the per-process ring populated by RaftService. Output:

{
    "is_clustered": bool,
    "node_count":   int,
    "nodes":        [str, ...],   # sorted
    "vnodes":       int,
}

Note that runners run in their own subprocess; the ring instance they see is not the publish daemon's ring. In the current design that subprocess never has a populated ring, so this function will always report is_clustered=False until stage 2 introduces a process-shared ring (see GAPS.md). The signature is stable so the caller's contract does not change when the backing source does.
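The vnodes field refers to virtual nodes on a consistent-hash ring. The following is a generic consistent-hashing sketch to illustrate what node_count, nodes and vnodes describe and what an owns() check answers; SketchHashRing is hypothetical and is not Salt's HashRing, and the rule used for is_clustered is an assumption.

```python
import bisect
import hashlib


def _point(value):
    # Map a string to a position on a 64-bit ring.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class SketchHashRing:
    """Illustrative consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self.nodes = sorted(set(nodes))
        self.vnodes = vnodes
        self._points = []  # sorted vnode positions on the ring
        self._owner = {}   # vnode position -> node
        for node in self.nodes:
            for i in range(vnodes):
                pos = _point(f"{node}#{i}")
                self._owner[pos] = node
                bisect.insort(self._points, pos)

    def owner(self, key):
        if not self._points:
            return None
        # First vnode clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[idx]]

    def owns(self, node, key):
        return self.owner(key) == node

    def info(self):
        return {
            "is_clustered": len(self.nodes) > 1,  # assumption for this sketch
            "node_count": len(self.nodes),
            "nodes": list(self.nodes),
            "vnodes": self.vnodes,
        }
```

More vnodes per node spread ownership more evenly, and adding a node only moves keys onto the new node rather than reshuffling keys between existing ones.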

CLI Example:

salt-run cluster.ring_info

salt.runners.cluster.ring_set(name=None, members=None, replicas=None)

Propose a new policy for the named ring.

Fires a cluster/runner/ring_set event; the publish daemon proposes a RING_CONFIG entry on the ring's own Raft log (not the cluster log). Partial updates are honoured — omit a knob to keep its existing value.

Parameters:

name -- Ring identifier (required).
members -- "self" (the ring contains only this master, so gated writes are broadcast) or "voters" (the ring tracks its committed voter set, so gated writes are sharded). None keeps the existing value.
replicas -- Integer >= 1. None keeps the existing value.

Must be invoked on the master that is currently the leader of the named ring's Raft group. Operators typically start from cluster.members to see which masters are voters before locating the ring's current leader.
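Partial updates mean that an omitted knob (None) keeps its committed value. A minimal sketch of that merge rule follows, assuming the current policy is a dict with members and replicas keys; merge_ring_config is a hypothetical helper, not the daemon's RING_CONFIG handler.

```python
VALID_MEMBERS = ("self", "voters")


def merge_ring_config(current, members=None, replicas=None):
    """Apply a partial ring_set update to an existing policy dict.

    Any knob passed as None keeps its current value. The validation
    mirrors the documented constraints ("self"/"voters", replicas >= 1).
    """
    updated = dict(current)  # never mutate the caller's copy
    if members is not None:
        if members not in VALID_MEMBERS:
            raise ValueError(f"members must be one of {VALID_MEMBERS}")
        updated["members"] = members
    if replicas is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 1:
            raise ValueError("replicas must be an integer >= 1")
        updated["replicas"] = replicas
    return updated
```

Using None as the "keep" sentinel is why the runner's signature defaults every knob to None rather than to a concrete value.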

CLI Example:

salt-run cluster.ring_set name=jobs members=voters replicas=2

salt.runners.cluster.rings()

Return the cluster-log multi-ring registry as this master sees it.

Reads the persisted cluster Raft log on this master and replays every committed RING_REGISTRY entry through a fresh RingRegistryStateMachine. The result is the registry view this master has applied locally — in a healthy cluster every master converges to the same answer, but during a membership change a follower may lag by a heartbeat.

Output:

{
    "node_id": str,                # this master's interface
    "rings": {
        "<ring_id>": {
            "founding_voters": [str, ...],
            "status":          "active" | "destroyed",
        },
        ...
    },
    "active_rings":       [str, ...],   # sorted, status=="active" only
    "registry_version":   int,          # log index of last commit, -1 if none
}
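The registry replay works like the membership replay, except that later entries for the same ring update its status. A minimal sketch follows, assuming each entry is a dict with index, type, ring_id, status and (on creation) founding_voters keys; replay_registry is hypothetical and not Salt's RingRegistryStateMachine.

```python
def replay_registry(committed_entries):
    """Rebuild the ring registry from committed RING_REGISTRY entries.

    Later entries for the same ring overwrite its status; founding_voters
    is kept from the creation entry when a later entry omits it.
    """
    rings = {}
    version = -1  # -1 until a RING_REGISTRY entry has been applied
    for entry in committed_entries:
        if entry["type"] != "RING_REGISTRY":
            continue  # routes, configs, etc. are handled by other replays
        ring = rings.setdefault(
            entry["ring_id"], {"founding_voters": [], "status": "active"}
        )
        if "founding_voters" in entry:
            ring["founding_voters"] = list(entry["founding_voters"])
        ring["status"] = entry["status"]
        version = entry["index"]
    active = sorted(rid for rid, r in rings.items() if r["status"] == "active")
    return {"rings": rings, "active_rings": active, "registry_version": version}
```

Keeping destroyed rings in the result (rather than deleting them) matches the output shape above, where status can be "destroyed" and only active_rings filters them out.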

CLI Example:

salt-run cluster.rings

New in version 3009.0.

salt.runners.cluster.route_clear(data_type)

Clear the route for a data type, returning it to broadcast.

Fires a cluster/runner/route_clear event; the publish daemon proposes a ROUTE entry mapping data_type to None. Once committed, every master mirrors the data type's writes again (the pre-multi-ring default).

Parameters:

data_type -- Logical cache identifier (e.g. "jobs").

CLI Example:

salt-run cluster.route_clear data_type=jobs

salt.runners.cluster.route_set(data_type, ring)

Route a data type to a named ring.

Fires a cluster/runner/route_set event; the publish daemon proposes a ROUTE entry through the cluster Raft group. Once committed, gate sites in salt.master consult the routing table when they receive a write for data_type and defer to that ring's HashRing.owns() answer.

Parameters:

data_type -- Logical cache identifier (e.g. "jobs").
ring -- Ring name to route to (must have been created via ring_create()).

CLI Example:

salt-run cluster.route_set data_type=jobs ring=jobs

salt.runners.cluster.routes()

Return the cluster-log data-type -> ring routing table as this master sees it.

Reads the persisted cluster Raft log and replays every committed ROUTE entry through a fresh RoutingStateMachine. Same caveats as rings(): a follower's view may briefly lag the leader during a routing change.

Output:

{
    "node_id":         str,
    "routes":          {"<data_type>": "<ring_id>" | None, ...},
    "routing_version": int,           # log index of last commit, -1 if none
    "drop_stats":      {              # see ring_membership.drop_stats
        "<data_type>": {
            "ring_id":           str,
            "not_a_member":      int,
            "other_ring_member": int,
        },
        ...
    },
}

The drop_stats field is local-process only — it reflects what this master has gated since startup. not_a_member is the misconfig signal: a non-zero count means traffic for the named data type landed on a master that isn't in the routed ring (the load balancer probably needs adjusting).

Note: the runner subprocess and the publish daemon are separate processes with their own counter state, so this surface reflects the runner's view, not the daemon's. For an operational signal, grep the master log for "ring_membership: dropping".
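The two drop counters correspond to two different reasons a gate site can refuse a routed write. The sketch below is one reading of that logic, not salt.master's actual gate: it assumes other_ring_member counts writes dropped because another member of the routed ring owns the key, and gate_write, new_stats and the owner_of callback are hypothetical.

```python
from collections import defaultdict


def new_stats():
    # Counters shaped like the documented drop_stats entries.
    return defaultdict(
        lambda: {"ring_id": None, "not_a_member": 0, "other_ring_member": 0}
    )


def gate_write(data_type, key, self_id, routes, ring_voters, owner_of, stats):
    """Return True if this master should keep a write for ``data_type``.

    ``routes`` maps data_type -> ring_id or None, ``ring_voters`` maps
    ring_id -> set of node-ids, and ``owner_of(ring_id, key)`` returns the
    node that owns ``key`` in that ring.
    """
    ring_id = routes.get(data_type)
    if ring_id is None:
        return True  # unrouted: broadcast, every master keeps the write
    entry = stats[data_type]
    entry["ring_id"] = ring_id
    if self_id not in ring_voters.get(ring_id, set()):
        entry["not_a_member"] += 1  # misconfig: traffic reached a non-member
        return False
    if owner_of(ring_id, key) != self_id:
        entry["other_ring_member"] += 1  # another ring member owns the key
        return False
    return True
```

Under this reading, other_ring_member grows in normal sharded operation, while any non-zero not_a_member count points at routing or load-balancer configuration.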

CLI Example:

salt-run cluster.routes

New in version 3009.0.

salt.runners.cluster.shed_status()

Read this master's local cluster-shed-status.json sentinel, if any.

The sentinel is written by the master daemon whenever it runs a local shed (either an operator-triggered cluster.shed_unowned or a peer-triggered fan-out via cluster.shed_unowned_all). Operators check this file on every master to confirm that the shed completed cluster-wide.

Returns {"status": "missing"} when no sentinel has been written yet — typical on a master that has never run shed.
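Reading the sentinel is a plain JSON file read with a fallback. A minimal sketch follows, assuming the file lives at cachedir/cluster-shed-status.json as described for shed_unowned_all; read_shed_status is a hypothetical helper, and a corrupt file is simply allowed to raise here.

```python
import json
import os


def read_shed_status(cachedir):
    """Return the parsed shed sentinel, or {"status": "missing"}.

    Only the missing-file case is handled; an unreadable or malformed
    file propagates its exception in this sketch.
    """
    path = os.path.join(cachedir, "cluster-shed-status.json")
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {"status": "missing"}
```

Catching only FileNotFoundError keeps "never shed" distinct from "shed wrote something broken", which an operator would want to notice.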

CLI Example:

salt-run cluster.shed_status

salt.runners.cluster.shed_unowned(ring, banks=('jobs/loads', 'jobs/minions', 'jobs/endtimes', 'jobs/nocache'), subbank_template='jobs/returns/{key}', driver=None, dry_run=False)

Drop cache entries this master does not own for the named ring.

The migration "going in" runner. After cluster.ring_create/route_set have wired ring into the routing table and the per-ring Raft group has elected a leader, every master still has the full keyspace in its caches (a legacy of the pre-multi-ring broadcast era). This runner walks the configured cache banks on this master and deletes the entries that hash to other ring members.

Parameters:

ring -- Ring identifier whose voter set defines ownership.
banks -- Cache banks to scan. Defaults match the salt.returners.salt_cache job layout (jobs/loads is the primary JID index; the others are sibling banks keyed by JID). Operators routing other caches override.
subbank_template -- Optional str.format-able template. When set, for each unowned key found in the first banks entry, the runner also flushes the templated bank in its entirety; this is used for the salt_cache returner's per-JID returns bank ("jobs/returns/{key}"). Pass None for caches without sub-banks.
driver -- Optional override for the salt.cache.Cache driver. Defaults to the cache: opt — the same driver the returner writes through.
dry_run -- If True, compute the counts but don't flush anything. Use to preview the partition before committing.

Returns a structured result:

{
    "status":           "ok" | "skipped",
    "ring":             str,
    "dropped":          int,   # primary-bank entries flushed
    "kept":             int,   # primary-bank entries this master owns
    "subbanks_dropped": int,   # cascade banks flushed wholesale
    "dry_run":          bool,
}

Reads membership from local persisted Raft state (same path cluster.members already uses) so the runner subprocess can answer "what does the ring look like?" without IPC into the publish daemon.
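The partition step can be sketched independently of the cache driver. The example below assumes keys_by_bank maps bank names to their keys, owns(key) answers ownership in the ring, flush(bank, key) stands in for the driver (key=None drops a whole bank), and sibling banks share the primary bank's keys; shed_unowned_sketch and these callbacks are hypothetical, not the runner's implementation.

```python
def shed_unowned_sketch(ring, keys_by_bank, owns, banks,
                        subbank_template=None, dry_run=False, flush=None):
    """Partition the primary bank by ownership and flush unowned entries."""
    primary, siblings = banks[0], banks[1:]
    sibling_keys = {bank: set(keys_by_bank.get(bank, ())) for bank in siblings}
    result = {"status": "ok", "ring": ring, "dropped": 0, "kept": 0,
              "subbanks_dropped": 0, "dry_run": dry_run}
    for key in keys_by_bank.get(primary, ()):
        if owns(key):
            result["kept"] += 1
            continue
        result["dropped"] += 1
        if subbank_template is not None:
            result["subbanks_dropped"] += 1
        if dry_run:
            continue  # preview: counts are computed, nothing is deleted
        flush(primary, key)
        for bank in siblings:
            if key in sibling_keys[bank]:
                flush(bank, key)  # sibling banks are keyed by the same JID
        if subbank_template is not None:
            # Cascade: drop the whole per-key sub-bank.
            flush(subbank_template.format(key=key), None)
    return result
```

Because counting happens before the dry_run check, a preview reports exactly the numbers a real run would produce.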

CLI Examples:

# Preview which JIDs would be dropped on this master.
salt-run cluster.shed_unowned ring=jobs dry_run=True

# Commit the deletions on the default jobs/* banks.
salt-run cluster.shed_unowned ring=jobs

# Shard a different cache type (the keys/denied-keys banks
# are intentionally broadcast and should NOT be sharded; this
# example assumes the operator has built a routed
# ``inventory`` cache).
salt-run cluster.shed_unowned ring=inventory \
    banks='["inventory/items"]' subbank_template=None

salt.runners.cluster.shed_unowned_all(ring, banks=('jobs/loads', 'jobs/minions', 'jobs/endtimes', 'jobs/nocache'), subbank_template='jobs/returns/{key}', driver=None, dry_run=False)

Fan-out shed_unowned() across every master in the cluster.

The single-master shed_unowned() runner drops only the local master's unowned cache entries. For a complete migration the operator would have to run it on every ring member, which is error-prone and verbose for clusters with more than three or four masters. This runner automates the fan-out:

1. The runner subprocess on this master fires a cluster/runner/shed_unowned_all event.
2. The publish daemon intercepts the event and broadcasts a cluster/peer/shed-request event (cluster_aes-encrypted) carrying the runner's parameters to every peer.
3. Each peer's daemon intercepts the request and runs the same shed-unowned logic locally, writing a per-master sentinel at cachedir/cluster-shed-status.json so the operator can poll for results without tailing logs.
4. The originator also runs its own local shed inline, so the runner returns a useful result even before peer sentinels appear.

Parameters:

ring -- Ring identifier whose voter set defines ownership. Same shape as shed_unowned().
banks -- Cache banks to scan; defaults to the salt_cache jobs layout.
subbank_template -- Cascade bank template; defaults to "jobs/returns/{key}". Pass None to disable the cascade.
driver -- Optional salt.cache.Cache driver override. Defaults to the cache: opt.
dry_run -- When True, runs the partition preview on every master without committing.

Returns the same shape as shed_unowned() for this master's local pass, plus a fan_out field naming the cluster/peer/shed-request event that fanned to peers. Per-peer results land in their own sentinel files; operators can collect them with cluster.shed_status.

CLI Examples:

# Preview shed across every master in the cluster.
salt-run cluster.shed_unowned_all ring=jobs dry_run=True

# Commit shed across every master.
salt-run cluster.shed_unowned_all ring=jobs

salt.runners.cluster.sync_roots(roots='both')

Push this master's file_roots and/or pillar_roots to every other cluster master.

Runs the operator-driven counterpart of the bulk state-sync that fires automatically during a cluster join. Use it when the canonical content on this master has changed and you want every peer to pick up the new files without restarting them or waiting for the next join handshake.

The runner fires a local event; the master daemon picks it up and fans out chunks to every peer over the encrypted cluster pub bus (same transport as the join-time state-sync). Returns immediately after the event is fired — the actual sync runs asynchronously in the master process. Check each peer's master log for the state-sync ... installed N items lines to confirm delivery.

Parameters:

roots -- "file", "pillar", or "both" (default "both"). Selects which content trees to sync.
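The roots argument selects which master options supply the trees to push. A minimal sketch follows, assuming opts is the master's configuration dict with the standard file_roots and pillar_roots options (each a mapping of saltenv to a list of directories); select_roots is a hypothetical helper, not the runner's code.

```python
ROOTS_CHOICES = {
    "file": ("file_roots",),
    "pillar": ("pillar_roots",),
    "both": ("file_roots", "pillar_roots"),
}


def select_roots(opts, roots="both"):
    """Return {opt_name: {saltenv: [paths]}} for the trees to push.

    Raises ValueError for anything other than "file", "pillar" or "both".
    """
    try:
        names = ROOTS_CHOICES[roots]
    except KeyError:
        raise ValueError(f"roots must be one of {sorted(ROOTS_CHOICES)}") from None
    # Missing options yield an empty tree rather than an error.
    return {name: opts.get(name, {}) for name in names}
```

Validating roots before the event is fired is worthwhile because the sync itself runs asynchronously in the master process, where a typo would only show up in the log.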

CLI Example:

# Push both file_roots and pillar_roots to all peers
salt-run cluster.sync_roots

# Push only file_roots
salt-run cluster.sync_roots roots=file

New in version 3009.0.
