Reproducibility Policy
Closes phase-00 issue #7. Pinned by the regression batteries in `tests/core/test_seed_policy.py`, `tests/core/test_deterministic_replay.py`, and `tests/core/test_execution_cache.py`.
macroforecast v0.1 promises that the same recipe produces the same artifacts bit-for-bit, on the same machine and across machines that share the package version + dependency lockfile. This page documents what “reproducible” means in practice, what knobs control it, and what is deliberately out of scope.
Public API
```python
import macroforecast

# Run any recipe (inline YAML, dict, or Path).
result = macroforecast.run("recipe.yaml", output_directory="out/")

# Re-execute the stored manifest and verify per-cell sink hashes match.
replication = macroforecast.replicate("out/manifest.json")
assert replication.recipe_match
assert replication.sink_hashes_match
```
Seed-policy modes (L0)
The L0 layer’s `reproducibility_mode` axis selects one of two regimes:

| Mode | When | Seed source | Best for |
|---|---|---|---|
| `seeded_reproducible` | every run is a deterministic replay | `leaf_config.random_seed`, defaulting to `0` | paper replication, regression tests, multi-cell sweeps |
| `exploratory` | seed is left to whatever process state happens to be | none | one-off interactive runs where determinism doesn’t matter |
`strict` and any other unknown value are rejected by the L0 schema validator. Pass `random_seed` explicitly when you want a non-zero base:
```yaml
0_meta:
  fixed_axes:
    reproducibility_mode: seeded_reproducible
  leaf_config:
    random_seed: 42
```
`_resolve_seed(recipe_root)` returns:

- the explicit `leaf_config.random_seed` if present,
- `0` for the default `seeded_reproducible` mode,
- `None` for `exploratory` (or any other non-`seeded_reproducible` value).
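That lookup order can be sketched as a plain function. This is a hypothetical re-implementation for illustration, not the library’s own code; `resolve_seed` and the dict shape are assumptions based on the recipe layout shown above:

```python
def resolve_seed(recipe_root: dict):
    """Hypothetical sketch of the seed lookup order described above."""
    meta = recipe_root.get("0_meta", {})
    mode = meta.get("fixed_axes", {}).get("reproducibility_mode", "seeded_reproducible")
    if mode != "seeded_reproducible":
        return None  # exploratory (or any other non-seeded mode): leave RNGs alone
    # An explicit leaf_config.random_seed wins; otherwise the default base seed 0.
    return meta.get("leaf_config", {}).get("random_seed", 0)
```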
What `_apply_seed` actually seeds
A best-effort propagation that covers every RNG that macroforecast or its dependencies are likely to touch:

| Library | Call |
|---|---|
| Python | `random.seed(seed)` |
| NumPy global state | `numpy.random.seed(seed)` |
| Process env (hash-seed-sensitive iteration) | `PYTHONHASHSEED` |
| PyTorch (when installed) | `torch.manual_seed(seed)` |
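The propagation amounts to something like the following. This is a hypothetical stand-in for `_apply_seed`, not the library’s code; the exact call list and the optional-import handling are assumptions:

```python
import os
import random


def apply_seed(seed: int) -> None:
    """Best-effort sketch: push one seed into every RNG we might touch."""
    # Stdlib RNG.
    random.seed(seed)
    # Hash-seed-sensitive iteration; note this only affects child processes,
    # not the already-running interpreter.
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Optional dependencies: seed them only when installed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass
```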
scikit-learn estimators receive `random_state=seed_int` from the L4 recipe params (`_build_l4_model`) – the global NumPy seed isn’t enough for sklearn because most estimators capture `random_state=None` and call `check_random_state` once. Pin `random_state` per estimator if you need deterministic ensembles.
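For example, pinning `random_state` makes two fits of the same ensemble agree exactly. This sketch uses scikit-learn directly, outside macroforecast; `RandomForestRegressor` is just an illustrative estimator:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.rand(60, 3)
y = X @ np.array([1.0, -2.0, 0.5])

# Seeding the global NumPy RNG would NOT be enough: each fit draws fresh
# entropy via check_random_state(None). Pin random_state on the estimator.
m1 = RandomForestRegressor(n_estimators=10, random_state=42).fit(X, y)
m2 = RandomForestRegressor(n_estimators=10, random_state=42).fit(X, y)
assert np.array_equal(m1.predict(X), m2.predict(X))
```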
Cell-index seed schedule
A multi-cell sweep is not run with the same seed in every cell. The sweep loop applies `base_seed + (cell_index - 1)`, so:

- Cell 1 uses `random_seed`.
- Cell 2 uses `random_seed + 1`.
- … cell N uses `random_seed + N - 1`.
This means two cells of the same recipe with different `{sweep: [...]}` values produce different RNG streams (bug-catching: see `test_distinct_cells_get_distinct_seeds`), but a re-run of the same sweep produces identical streams cell-by-cell.
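The schedule reduces to one line of arithmetic. The helper name `cell_seed` is hypothetical; the sweep loop applies the same formula inline:

```python
def cell_seed(base_seed: int, cell_index: int) -> int:
    # Cells are 1-indexed: cell 1 keeps base_seed, cell N gets base_seed + N - 1.
    return base_seed + (cell_index - 1)


# A 4-cell sweep with random_seed: 100 is seeded 100, 101, 102, 103.
seeds = [cell_seed(100, i) for i in range(1, 5)]
```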
Bit-exact replicate
`macroforecast.replicate(manifest_path)` reads the stored manifest, expands the same sweep, and re-executes every cell. The returned `ReplicationResult` carries:

- `recipe_match: bool` – the canonicalized recipe dict round-trips identically (key order, sweep marker placement, etc.).
- `sink_hashes_match: bool` – every cell’s per-sink SHA-256 matches the original.
- `per_cell_match: dict[str, bool]` – per-cell breakdown.
Two sinks are exempt from the strict equality check because they legitimately encode environmental data:

- `l1_data_definition_v1` – carries `leaf_config.cache_root`, which depends on the local filesystem layout.
- `l8_artifacts_v1` – records the absolute paths of exported files.
The other eight sinks (L1 regime, L2, L3 features + metadata, L4 forecasts + models + training, L5 evaluation, plus L6 / L7 / L8 outputs when produced) are byte-equal across runs.
Determinism boundaries
| Boundary | Guarantee | Caveats |
|---|---|---|
| Two re-runs of the same recipe in the same Python session | byte-identical sinks (excluding the path-dependent `l1_data_definition_v1` / `l8_artifacts_v1`) | – |
| Two re-runs in different processes with the same package + lockfile | byte-identical sinks (validated by the regression batteries) | – |
| Parallel sweep execution | byte-identical sinks vs. serial run for the same cells (validated by the regression batteries) | – |
| Across machines with the same package version + lockfile | numerical equality at machine epsilon | floating-point summation order across BLAS implementations can drift on the last bit |
| Across tree-library versions | best-effort | C++ trees are sensitive to library upgrades; pin via the lockfile |
| Deep-NN families (PyTorch-backed) | seeded (we call `torch.manual_seed`) | install `torch` from the lockfile; GPU bit-exactness additionally needs the deterministic flags listed under Out of scope |
| Across optional L7 dependencies | best-effort | the L7 SHAP path falls back to a coefficient / permutation proxy when `shap` is unavailable |
Worked examples
Single-path recipe -> identical artifacts twice
```python
import macroforecast
from pathlib import Path

a = macroforecast.run("recipe.yaml", output_directory=Path("out/a"))
b = macroforecast.run("recipe.yaml", output_directory=Path("out/b"))

# Every cell's sink hashes match (excluding path-dependent l1, l8).
for left, right in zip(a.cells, b.cells):
    for sink_name in left.sink_hashes:
        if sink_name in {"l1_data_definition_v1", "l8_artifacts_v1"}:
            continue
        assert left.sink_hashes[sink_name] == right.sink_hashes[sink_name]
```
Sweep variant ID -> distinct seed
```python
recipe = """
0_meta:
  fixed_axes: {reproducibility_mode: seeded_reproducible}
  leaf_config: {random_seed: 100}
3_feature_engineering:
  nodes:
    - {id: lag_x, type: step, op: lag, params: {n_lag: {sweep: [1, 2, 3, 4]}}, ...}
"""
result = macroforecast.run(recipe)
# Cells get seeds 100, 101, 102, 103.
```
Replicate the manifest
```python
import macroforecast

primary = macroforecast.run("paper_recipe.yaml", output_directory="paper_out/")
replication = macroforecast.replicate("paper_out/manifest.json")
assert replication.sink_hashes_match
```
Out of scope
- GPU determinism beyond `torch.manual_seed`. Set `torch.use_deterministic_algorithms(True)` and the relevant cuDNN flags yourself if you need bit-exact CUDA output – that is a platform-specific decision.
- Reproducibility across BLAS implementations (OpenBLAS vs. MKL vs. Apple Accelerate). The L4 estimators are deterministic given fixed parameters, but floating-point reductions are not associative.
- Reproducibility across Python versions. The package targets `python>=3.10`; minor versions are tested in CI, but cross-version hash equality is not guaranteed.