`failure_policy`

Back to L0 | Browse all axes | Browse all options

Axis failure_policy on sub-layer l0_a (layer l0).

Sub-layer

l0_a

Axis metadata

Default: 'fail_fast'
Sweepable: False
Status: operational

Operational status summary

Operational: 2 option(s)
Future: 0 option(s)

Options

`fail_fast` – operational

Stop the entire study on the first cell that errors.

When the cell-loop catches an exception in any sweep cell, fail_fast raises immediately and the manifest is not written. The remaining cells are skipped.

This is the default because the typical authoring failure mode is a schema or data error that affects every cell – catching it after the first cell saves wall-clock and surfaces the problem with a single traceback rather than a wall of identical errors. For sweeps where cells can fail independently (e.g., one model family throws on a particular target while others succeed), use continue_on_failure instead so partial results survive.

When to use

Default for every authoring iteration. Pick this while the recipe is still being tuned; the first failure tells you exactly what to fix without waiting for a full sweep to finish.

When NOT to use

Long-running production sweeps where a transient failure on one cell (e.g., a memory hiccup on one bootstrap iteration) should not abort the whole study.

References

macroforecast design Part 1, L0 §A: ‘fail_fast vs continue_on_failure is the canonical execution-policy choice for any cell-loop study.’

Related options: continue_on_failure

Examples

Author-time recipe (default)

0_meta:
  fixed_axes:
    failure_policy: fail_fast

Last reviewed 2026-05-04 by macroforecast author.

`continue_on_failure` – operational

Record failed cells in the manifest and keep the sweep running.

Per-cell exceptions are caught by the cell loop, the cell’s CellExecutionResult.error and traceback fields are populated, and the loop moves on to the next cell. The manifest’s cells_summary distinguishes succeeded from failed cells; the failed-cell entries carry the captured traceback for post-hoc diagnosis.

Replication still runs end-to-end on a manifest with failed cells: replicate() re-executes every cell and verifies the failure occurs in the same place with the same exception class.

When to use

Production horse-race sweeps where partial coverage is more useful than no coverage. Common examples: a 50-cell model-family sweep where one optional family (xgboost without the extra) fails to import, or a long bootstrap where a single iteration trips a numerical edge case.

When NOT to use

Authoring iteration – failures are usually configuration problems that affect every cell, and fail_fast shortens the feedback loop.

References

macroforecast design Part 1, L0 §A: ‘continue_on_failure preserves partial coverage; the manifest carries enough context to diagnose each failed cell after the run.’

Related options: fail_fast

Examples

Production sweep over many model families

0_meta:
  fixed_axes:
    failure_policy: continue_on_failure