Reproducibility Policy

Closes phase-00 issue #7. Pinned by the regression batteries in tests/core/test_seed_policy.py, tests/core/test_deterministic_replay.py, and tests/core/test_execution_cache.py.

macroforecast v0.1 promises that the same recipe produces the same artifacts bit-for-bit, on the same machine and across machines that share the package version + dependency lockfile. This page documents what “reproducible” means in practice, what knobs control it, and what is deliberately out of scope.

Public API

import macroforecast

# Run any recipe (inline YAML, dict, or Path).
result = macroforecast.run("recipe.yaml", output_directory="out/")

# Re-execute the stored manifest and verify per-cell sink hashes match.
replication = macroforecast.replicate("out/manifest.json")
assert replication.recipe_match
assert replication.sink_hashes_match

Seed-policy modes (L0)

The L0 layer’s reproducibility_mode axis selects one of two regimes:

| Mode | When | Seed source | Best for |
| --- | --- | --- | --- |
| seeded_reproducible (default) | every run is a deterministic replay | 0_meta.leaf_config.random_seed (default 0) | paper replication, regression tests, multi-cell sweeps |
| exploratory | seed is left to whatever process state happens to be | none | one-off interactive runs where determinism doesn't matter |

Any other value (strict included) is rejected by the L0 schema validator. Pass random_seed explicitly when you want a non-zero base.

0_meta:
  fixed_axes:
    reproducibility_mode: seeded_reproducible
  leaf_config:
    random_seed: 42
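
Because run() also accepts a plain dict (see Public API above), the same knobs can be set programmatically. A minimal sketch, with the non-seed layers elided exactly as they would be filled in for a real recipe:

import macroforecast

recipe = {
    "0_meta": {
        "fixed_axes": {"reproducibility_mode": "seeded_reproducible"},
        "leaf_config": {"random_seed": 42},
    },
    # ... remaining layers (1_data, 3_feature_engineering, etc.) go here
}
result = macroforecast.run(recipe, output_directory="out/")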

_resolve_seed(recipe_root) returns:

  • the explicit leaf_config.random_seed if present,

  • 0 for the default seeded_reproducible mode,

  • None for exploratory (or any other non-seeded_reproducible value).
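
In other words, the resolution amounts to roughly the following sketch (the real helper's signature and recipe traversal may differ):

def _resolve_seed(recipe_root: dict) -> int | None:
    meta = recipe_root.get("0_meta", {})
    mode = meta.get("fixed_axes", {}).get("reproducibility_mode", "seeded_reproducible")
    leaf = meta.get("leaf_config", {})
    if "random_seed" in leaf:            # an explicit seed always wins
        return leaf["random_seed"]
    if mode == "seeded_reproducible":    # default mode falls back to seed 0
        return 0
    return None                          # exploratory (or any other value): unseeded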

What _apply_seed actually seeds

A best-effort propagation that covers every RNG that macroforecast or its dependencies are likely to touch:

| Library | Call |
| --- | --- |
| Python random module | random.seed(seed) |
| NumPy global state | np.random.seed(seed % 2**32) |
| Process env (hash-seed-sensitive iteration) | os.environ.setdefault("PYTHONHASHSEED", str(seed)) |
| PyTorch (when installed) | torch.manual_seed(seed) + torch.cuda.manual_seed_all(seed) |

scikit-learn estimators receive random_state=seed_int from the L4 recipe params (_build_l4_model) – the global numpy seed isn’t enough for sklearn because most estimators capture random_state=None and call check_random_state once. Pin random_state per estimator if you need deterministic ensembles.
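
Condensed into code, the propagation above looks roughly like this (a sketch; the real helper's import guards and error handling may differ):

import os
import random

import numpy as np

def _apply_seed(seed: int) -> None:
    random.seed(seed)                                   # Python stdlib RNG
    np.random.seed(seed % 2**32)                        # NumPy legacy global state
    os.environ.setdefault("PYTHONHASHSEED", str(seed))  # hash-seed-sensitive iteration
    try:
        import torch                                    # optional dependency
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass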

Cell-index seed schedule

A multi-cell sweep is not run with the same seed in every cell. The sweep loop applies base_seed + (cell_index - 1) so:

  • Cell 1 uses random_seed.

  • Cell 2 uses random_seed + 1.

  • … cell N uses random_seed + N - 1.

This means two cells of the same recipe with different {sweep: [...]} values produce different RNG streams (guarded by test_distinct_cells_get_distinct_seeds), while a re-run of the same sweep produces identical streams cell by cell.
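
The schedule is trivial to recompute outside the sweep loop, e.g. when you want to know in advance which seed a given cell will receive (cell indices are 1-based, as in the bullets above):

base_seed = 100   # leaf_config.random_seed
n_cells = 4       # number of expanded sweep cells

cell_seeds = [base_seed + (cell_index - 1) for cell_index in range(1, n_cells + 1)]
assert cell_seeds == [100, 101, 102, 103]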

Bit-exact replicate

macroforecast.replicate(manifest_path) reads the stored manifest, expands the same sweep, and re-executes every cell. The returned ReplicationResult carries:

  • recipe_match: bool – the canonicalized recipe dict round-trips identically (key order, sweep marker placement, etc.).

  • sink_hashes_match: bool – every cell’s per-sink SHA-256 matches the original.

  • per_cell_match: dict[str, bool] – per-cell breakdown.

Two sinks are exempt from the strict equality check because they legitimately encode environmental data:

  • l1_data_definition_v1 – carries leaf_config.cache_root which depends on the local filesystem layout.

  • l8_artifacts_v1 – records the absolute paths of exported files.

The remaining eight sinks (L1 regime, L2, L3 features + metadata, L4 forecasts + models + training, L5 evaluation), plus the L6 / L7 / L8 outputs when produced, are byte-equal across runs.
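
When sink_hashes_match comes back False, per_cell_match narrows the failure down to specific cells. A minimal triage loop (the keys are whatever cell identifiers the manifest uses):

import macroforecast

replication = macroforecast.replicate("out/manifest.json")
if not replication.sink_hashes_match:
    mismatched = [cell_id for cell_id, ok in replication.per_cell_match.items() if not ok]
    print(f"{len(mismatched)} cell(s) failed to replicate: {mismatched}")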

Shared raw cache (cache_root)

Multi-cell sweeps that hit the same FRED vintage many times share the on-disk raw cache when you pass cache_root=:

macroforecast.run(
    "recipe.yaml",
    output_directory="out/sweep_a",
    cache_root="/var/macroforecast/raw_cache",
)

Resolution order (first non-None wins):

  1. The explicit cache_root= argument.

  2. recipe['1_data']['leaf_config']['cache_root'] (recipe-level override).

  3. output_directory / ".raw_cache" (auto-derived).

  4. The raw loader’s package default.

The effective value is recorded in manifest.json["cache_root"] and on the ManifestExecutionResult.cache_root attribute, so a follow-up run or a downstream auditor can verify exactly which cache backed the artifacts.
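
A quick audit of the recorded value after the run above, assuming the manifest is written into the output directory as in the replicate() examples on this page:

import json
from pathlib import Path

manifest = json.loads(Path("out/sweep_a/manifest.json").read_text())
assert manifest["cache_root"] == "/var/macroforecast/raw_cache"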

Determinism boundaries

| Boundary | Guarantee | Caveats |
| --- | --- | --- |
| Two re-runs of the same recipe in the same Python session | byte-identical sinks (excluding l1_data_definition_v1 + l8_artifacts_v1 when output paths differ) | |
| Two re-runs in different processes with the same package + lockfile | byte-identical sinks (validated by tests/core/test_v01_1_hot_patch.py::test_hash_sink_with_set_payload_is_stable_across_processes, which rotates PYTHONHASHSEED) | |
| compute_mode = parallel cell loop | byte-identical sinks vs. serial run for the same cells (validated by tests/core/test_compute_mode_parallel.py::test_parallel_matches_serial_sink_hashes) | l8_artifacts_v1 legitimately differs because of output paths |
| Across machines with the same package version + lockfile | numerical equality at machine epsilon | floating-point summation order across BLAS implementations can drift on the last bit |
| Across xgboost / lightgbm / catboost versions | best-effort | C++ trees are sensitive to library upgrades; pin via the lockfile |
| Deep-NN families (lstm / gru / transformer) | seeded (we call torch.manual_seed) but not guaranteed bit-exact across torch versions or CUDA driver versions | install torch[cpu] for tighter portability |
| Across shap versions or with shap not installed | best-effort | the L7 SHAP path falls back to a coefficient / permutation proxy when shap is missing; the proxy is itself deterministic |

Worked examples

Single-path recipe -> identical artifacts twice

import macroforecast
from pathlib import Path

a = macroforecast.run("recipe.yaml", output_directory=Path("out/a"))
b = macroforecast.run("recipe.yaml", output_directory=Path("out/b"))

# Every cell's sink hashes match (excluding path-dependent l1, l8).
for left, right in zip(a.cells, b.cells):
    for sink_name in left.sink_hashes:
        if sink_name in {"l1_data_definition_v1", "l8_artifacts_v1"}:
            continue
        assert left.sink_hashes[sink_name] == right.sink_hashes[sink_name]

Sweep variant ID -> distinct seed

recipe = """
0_meta:
  fixed_axes: {reproducibility_mode: seeded_reproducible}
  leaf_config: {random_seed: 100}
3_feature_engineering:
  nodes:
    - {id: lag_x, type: step, op: lag, params: {n_lag: {sweep: [1, 2, 3, 4]}}, ...}
"""
result = macroforecast.run(recipe)
# Cells get seeds 100, 101, 102, 103.

Replicate the manifest

import macroforecast

primary = macroforecast.run("paper_recipe.yaml", output_directory="paper_out/")
replication = macroforecast.replicate("paper_out/manifest.json")
assert replication.sink_hashes_match

Out of scope

  • GPU determinism beyond torch.manual_seed. Set torch.use_deterministic_algorithms(True) and the relevant cuDNN flags yourself if you need bit-exact CUDA output (see the sketch after this list) – that is a platform-specific decision.

  • Reproducibility across BLAS implementations (OpenBLAS vs. MKL vs. Apple Accelerate). The L4 estimators are deterministic given fixed parameters, but floating-point reductions are not associative.

  • Reproducibility across Python versions. The package targets python>=3.10; minor versions are tested in CI but cross-version hash equality is not guaranteed.
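
For the GPU case above, a typical opt-in looks like the sketch below. These are standard PyTorch switches, not something macroforecast sets for you, and CUBLAS_WORKSPACE_CONFIG must be exported before the first CUDA kernel launches:

import os

# Some cuBLAS ops require this workspace setting once deterministic algorithms are enforced.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch

torch.manual_seed(0)                        # same base seed you passed to the recipe
torch.use_deterministic_algorithms(True)    # raise on kernels with no deterministic variant
torch.backends.cudnn.deterministic = True   # force deterministic cuDNN convolutions
torch.backends.cudnn.benchmark = False      # disable autotuning, which can change kernel choice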