Reproducibility Policy

Closes phase-00 issue #7. Pinned by the regression batteries in tests/core/test_seed_policy.py, tests/core/test_deterministic_replay.py, and tests/core/test_execution_cache.py.

macroforecast v0.1 promises that the same recipe produces the same artifacts bit-for-bit, on the same machine and across machines that share the package version + dependency lockfile. This page documents what “reproducible” means in practice, what knobs control it, and what is deliberately out of scope.

Public API

import macroforecast

# Run any recipe (inline YAML, dict, or Path).
result = macroforecast.run("recipe.yaml", output_directory="out/")

# Re-execute the stored manifest and verify per-cell sink hashes match.
replication = macroforecast.replicate("out/manifest.json")
assert replication.recipe_match
assert replication.sink_hashes_match

Seed-policy modes (L0)

The L0 layer’s reproducibility_mode axis selects one of two regimes:

| Mode | When | Seed source | Best for |
| --- | --- | --- | --- |
| seeded_reproducible (default) | every run is a deterministic replay | 0_meta.leaf_config.random_seed (default 0) | paper replication, regression tests, multi-cell sweeps |
| exploratory | seed is left to whatever process state happens to be | none | one-off interactive runs where determinism doesn't matter |

Any other value (strict included) is rejected by the L0 schema validator. Pass random_seed explicitly when you want a non-zero base.

0_meta:
  fixed_axes:
    reproducibility_mode: seeded_reproducible
  leaf_config:
    random_seed: 42
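
Because run() also accepts a plain dict (see Public API above), the same knobs can be set programmatically. A minimal sketch, with the non-seed layers elided exactly as they would be filled in for a real recipe:

import macroforecast

recipe = {
    "0_meta": {
        "fixed_axes": {"reproducibility_mode": "seeded_reproducible"},
        "leaf_config": {"random_seed": 42},
    },
    # ... remaining layers (1_data, 3_feature_engineering, etc.) go here
}
result = macroforecast.run(recipe, output_directory="out/")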

_resolve_seed(recipe_root) returns:

  • the explicit leaf_config.random_seed if present,

  • 0 for the default seeded_reproducible mode,

  • None for exploratory (or any other non-seeded_reproducible value).
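
In other words, the resolution amounts to roughly the following sketch (the real helper's signature and recipe traversal may differ):

def _resolve_seed(recipe_root: dict) -> int | None:
    meta = recipe_root.get("0_meta", {})
    mode = meta.get("fixed_axes", {}).get("reproducibility_mode", "seeded_reproducible")
    leaf = meta.get("leaf_config", {})
    if "random_seed" in leaf:            # an explicit seed always wins
        return leaf["random_seed"]
    if mode == "seeded_reproducible":    # default mode falls back to seed 0
        return 0
    return None                          # exploratory (or any other value): unseeded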

What _apply_seed actually seeds

A best-effort propagation that covers every RNG that macroforecast or its dependencies are likely to touch:

| Library | Call |
| --- | --- |
| Python random module | random.seed(seed) |
| NumPy global state | np.random.seed(seed % 2**32) |
| Process env (hash-seed-sensitive iteration) | os.environ.setdefault("PYTHONHASHSEED", str(seed)) |
| PyTorch (when installed) | torch.manual_seed(seed) + torch.cuda.manual_seed_all(seed) |

scikit-learn estimators receive random_state=seed_int from the L4 recipe params (_build_l4_model) – the global numpy seed isn’t enough for sklearn because most estimators capture random_state=None and call check_random_state once. Pin random_state per estimator if you need deterministic ensembles.
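
Condensed into code, the propagation above looks roughly like this (a sketch; the real helper's import guards and error handling may differ):

import os
import random

import numpy as np

def _apply_seed(seed: int) -> None:
    random.seed(seed)                                   # Python stdlib RNG
    np.random.seed(seed % 2**32)                        # NumPy legacy global state
    os.environ.setdefault("PYTHONHASHSEED", str(seed))  # hash-seed-sensitive iteration
    try:
        import torch                                    # optional dependency
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass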

Cell-index seed schedule

A multi-cell sweep is not run with the same seed in every cell. The sweep loop applies base_seed + (cell_index - 1) so:

  • Cell 1 uses random_seed.

  • Cell 2 uses random_seed + 1.

  • … cell N uses random_seed + N - 1.

This means two cells of the same recipe with different {sweep: [...]} values produce different RNG streams (guarded by test_distinct_cells_get_distinct_seeds), while a re-run of the same sweep produces identical streams cell by cell.
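
The schedule is trivial to recompute outside the sweep loop, e.g. when you want to know in advance which seed a given cell will receive (cell indices are 1-based, as in the bullets above):

base_seed = 100   # leaf_config.random_seed
n_cells = 4       # number of expanded sweep cells

cell_seeds = [base_seed + (cell_index - 1) for cell_index in range(1, n_cells + 1)]
assert cell_seeds == [100, 101, 102, 103]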

Bit-exact replicate

macroforecast.replicate(manifest_path) reads the stored manifest, expands the same sweep, and re-executes every cell. The returned ReplicationResult carries:

  • recipe_match: bool – the canonicalized recipe dict round-trips identically (key order, sweep marker placement, etc.).

  • sink_hashes_match: bool – every cell’s per-sink SHA-256 matches the original.

  • per_cell_match: dict[str, bool] – per-cell breakdown.

Two sinks are exempt from the strict equality check because they legitimately encode environmental data:

  • l1_data_definition_v1 – carries leaf_config.cache_root which depends on the local filesystem layout.

  • l8_artifacts_v1 – records the absolute paths of exported files.

The remaining eight sinks (L1 regime, L2, L3 features + metadata, L4 forecasts + models + training, L5 evaluation), plus the L6 / L7 / L8 outputs when produced, are byte-equal across runs.
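
When sink_hashes_match comes back False, per_cell_match narrows the failure down to specific cells. A minimal triage loop (the keys are whatever cell identifiers the manifest uses):

import macroforecast

replication = macroforecast.replicate("out/manifest.json")
if not replication.sink_hashes_match:
    mismatched = [cell_id for cell_id, ok in replication.per_cell_match.items() if not ok]
    print(f"{len(mismatched)} cell(s) failed to replicate: {mismatched}")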

Shared raw cache (cache_root)

Multi-cell sweeps that hit the same FRED vintage many times share the on-disk raw cache when you pass cache_root=:

macroforecast.run(
    "recipe.yaml",
    output_directory="out/sweep_a",
    cache_root="/var/macroforecast/raw_cache",
)

Resolution order (first non-None wins):

  1. The explicit cache_root= argument.

  2. recipe['1_data']['leaf_config']['cache_root'] (recipe-level override).

  3. output_directory / ".raw_cache" (auto-derived).

  4. The raw loader’s package default.

The effective value is recorded in manifest.json["cache_root"] and on the ManifestExecutionResult.cache_root attribute, so a follow-up run or a downstream auditor can verify exactly which cache backed the artifacts.
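
A quick audit of the recorded value after the run above, assuming the manifest is written into the output directory as in the replicate() examples on this page:

import json
from pathlib import Path

manifest = json.loads(Path("out/sweep_a/manifest.json").read_text())
assert manifest["cache_root"] == "/var/macroforecast/raw_cache"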

Determinism boundaries

| Boundary | Guarantee | Caveats |
| --- | --- | --- |
| Two re-runs of the same recipe in the same Python session | byte-identical sinks (excluding l1_data_definition_v1 + l8_artifacts_v1 when output paths differ) | |
| Two re-runs in different processes with the same package + lockfile | byte-identical sinks (validated by tests/core/test_v01_1_hot_patch.py::test_hash_sink_with_set_payload_is_stable_across_processes, which rotates PYTHONHASHSEED) | |
| compute_mode = parallel cell loop | byte-identical sinks vs. serial run for the same cells (validated by tests/core/test_compute_mode_parallel.py::test_parallel_matches_serial_sink_hashes) | l8_artifacts_v1 legitimately differs because of output paths |
| Across machines with the same package version + lockfile | numerical equality at machine epsilon | floating-point summation order across BLAS implementations can drift on the last bit |
| Across xgboost / lightgbm / catboost versions | best-effort | C++ trees are sensitive to library upgrades; pin via the lockfile |
| Deep-NN families (lstm / gru / transformer) | seeded (we call torch.manual_seed) but not guaranteed bit-exact across torch versions or CUDA driver versions | install torch[cpu] for tighter portability |
| Across shap versions or with shap not installed | best-effort | the L7 SHAP path falls back to a coefficient / permutation proxy when shap is missing; the proxy is itself deterministic |

Worked examples

Single-path recipe -> identical artifacts twice

import macroforecast
from pathlib import Path

a = macroforecast.run("recipe.yaml", output_directory=Path("out/a"))
b = macroforecast.run("recipe.yaml", output_directory=Path("out/b"))

# Every cell's sink hashes match (excluding path-dependent l1, l8).
for left, right in zip(a.cells, b.cells):
    for sink_name in left.sink_hashes:
        if sink_name in {"l1_data_definition_v1", "l8_artifacts_v1"}:
            continue
        assert left.sink_hashes[sink_name] == right.sink_hashes[sink_name]

Sweep variant ID -> distinct seed

recipe = """
0_meta:
  fixed_axes: {reproducibility_mode: seeded_reproducible}
  leaf_config: {random_seed: 100}
3_feature_engineering:
  nodes:
    - {id: lag_x, type: step, op: lag, params: {n_lag: {sweep: [1, 2, 3, 4]}}, ...}
"""
result = macroforecast.run(recipe)
# Cells get seeds 100, 101, 102, 103.

Replicate the manifest

import macroforecast

primary = macroforecast.run("paper_recipe.yaml", output_directory="paper_out/")
replication = macroforecast.replicate("paper_out/manifest.json")
assert replication.sink_hashes_match

Out of scope

  • GPU determinism beyond torch.manual_seed. Set torch.use_deterministic_algorithms(True) and the relevant cuDNN flags yourself if you need bit-exact CUDA output (see the sketch after this list) – that is a platform-specific decision.

  • Reproducibility across BLAS implementations (OpenBLAS vs. MKL vs. Apple Accelerate). The L4 estimators are deterministic given fixed parameters, but floating-point reductions are not associative.

  • Reproducibility across Python versions. The package targets python>=3.10; minor versions are tested in CI but cross-version hash equality is not guaranteed.
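
For the GPU case above, a typical opt-in looks like the sketch below. These are standard PyTorch switches, not something macroforecast sets for you, and CUBLAS_WORKSPACE_CONFIG must be exported before the first CUDA kernel launches:

import os

# Some cuBLAS ops require this workspace setting once deterministic algorithms are enforced.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch

torch.manual_seed(0)                        # same base seed you passed to the recipe
torch.use_deterministic_algorithms(True)    # raise on kernels with no deterministic variant
torch.backends.cudnn.deterministic = True   # force deterministic cuDNN convolutions
torch.backends.cudnn.benchmark = False      # disable autotuning, which can change kernel choice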