Forecast-Time And Source-Frame Policies (1.5)

Declares how forecast-time information and source-frame quality are handled before Layer 2 sees the data. Data revision / vintage status is controlled by information_set_type in the forecast-time information page. This page covers publication lag, same-period x access, raw-source missing/outlier treatment before T-codes, and frame availability after the Layer 1 source frame exists.

Section	Axis	Role
1.5.1	`missing_availability`	Frame Availability Policy: what to do when predictor / target rows contain NaN after the Layer 1 source frame exists
1.5.2	`raw_missing_policy`	Whether to repair raw-source missing values before FRED transforms/T-codes
1.5.3	`raw_outlier_policy`	Whether to repair raw-source outliers before FRED transforms/T-codes
1.5.4	`release_lag_rule`	Publication Lag Rule: when predictor observations are treated as published and usable
1.5.5	`contemporaneous_x_rule`	Same-Period Predictor Rule: whether x observed at the target date may enter the model

Note on dropped axes:

alignment_rule: mixed-frequency calendar axis; meaningful mainly for FRED-SD. The current runtime uses explicit monthly/quarterly conversion plus provenance reports, Layer 2 native-frequency block payloads, custom mixed-frequency adapters, and narrow built-in MIDAS routes (midas_almon, midasr with nealmon / almonp / nbeta / genexp / harstep). State-space mixed-frequency likelihoods remain future work.
evaluation_scale: re-homed to Layer 2 (PreprocessContract.evaluation_scale), where the actual runtime effect lives.
exogenous_block: redundant with feature_builder default logic.
regime_task: duplicates 1.3 oos_period.recession_only_oos / expansion_only_oos.
vintage_policy: dropped as a separate axis. Current data revision / vintage control is handled by information_set_type plus leaf_config.data_vintage, including FRED-SD vintages.
x_map_policy: single-op non-axis; multi-target X mapping is owned by study_scope (0.2).

At a glance (defaults):

missing_availability = zero_fill_leading_predictor_gaps — Frame Availability Policy. After the selected sample period is sliced, predictor leading missing values before each column’s first valid observation are filled with zero and recorded in provenance. Switch to require_complete_rows, keep_available_rows, or impute_predictors_only only when a specific missing-data treatment matters.
raw_missing_policy = preserve_raw_missing — leave raw-source missing values unchanged before FRED transforms/T-codes. Switch only when the research design intentionally cleans raw data before T-code construction.
raw_outlier_policy = preserve_raw_outliers — leave raw-source outliers unchanged before FRED transforms/T-codes. Switch only when the research design intentionally clips or flags raw data before T-code construction.
release_lag_rule = ignore_release_lag — Publication Lag Rule. Every column is available at its nominal date. Switch to fixed_lag_all_series / series_specific_lag when you need to simulate a publication lag.
contemporaneous_x_rule = forbid_same_period_predictors — Same-Period Predictor Rule. Realistic real-time constraint. Switch to allow_same_period_predictors only for oracle / data-leak benchmarks.

Most research runs leave all five at the default.
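
The defaults and allowed values on this page can be summarized as a small lookup. The sketch below is illustrative only: `DEFAULTS`, `ALLOWED`, and `resolve_axes` are hypothetical names, not part of macroforecast. The axis names and values are copied from this page.

```python
# Hypothetical helper, not part of macroforecast.
# Axis names and values are copied from the value catalogs on this page.
DEFAULTS = {
    "missing_availability": "zero_fill_leading_predictor_gaps",
    "raw_missing_policy": "preserve_raw_missing",
    "raw_outlier_policy": "preserve_raw_outliers",
    "release_lag_rule": "ignore_release_lag",
    "contemporaneous_x_rule": "forbid_same_period_predictors",
}

ALLOWED = {
    "missing_availability": {
        "zero_fill_leading_predictor_gaps", "require_complete_rows",
        "keep_available_rows", "impute_predictors_only",
    },
    "raw_missing_policy": {
        "preserve_raw_missing", "zero_fill_leading_predictor_missing_before_tcode",
        "impute_raw_predictors", "drop_raw_missing_rows",
    },
    "raw_outlier_policy": {
        "preserve_raw_outliers", "winsorize_raw", "iqr_clip_raw",
        "mad_clip_raw", "zscore_clip_raw", "set_raw_outliers_to_missing",
    },
    "release_lag_rule": {
        "ignore_release_lag", "fixed_lag_all_series", "series_specific_lag",
    },
    "contemporaneous_x_rule": {
        "forbid_same_period_predictors", "allow_same_period_predictors",
    },
}


def resolve_axes(fixed_axes):
    """Overlay user-supplied axis values on the defaults and validate them."""
    unknown = set(fixed_axes) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown axes: {sorted(unknown)}")
    resolved = {**DEFAULTS, **fixed_axes}
    for axis, value in resolved.items():
        if value not in ALLOWED[axis]:
            raise ValueError(f"{axis}: unsupported value {value!r}")
    return resolved


# Override one axis; the other four fall back to their defaults.
resolved = resolve_axes({"release_lag_rule": "fixed_lag_all_series"})
```

A dropped value such as `calendar_exact_lag` would be rejected by this sketch with a `ValueError`.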

1.5.1 `missing_availability`

Frame Availability Policy. Selects how NaN rows are handled after the source frame exists and before Layer 2 representation construction. Four operational values.

Value catalog

Value	Status	What it does
`zero_fill_leading_predictor_gaps`	operational	Default. Within the selected sample period, leading missing predictor values are filled with 0. Fully missing predictors are also filled with 0, with a warning. Leading missing target dates are reported; a mid-sample missing target value blocks execution.
`require_complete_rows`	operational	Drop rows where any non-date column has NaN before training. Aggressive but legitimate on short fixture windows.
`keep_available_rows`	operational	No panel-level filter; downstream executors handle NaNs per their own policy.
`impute_predictors_only`	operational	Impute predictor (non-target) columns using `leaf_config.x_imputation` ∈ {`mean`, `median`, `ffill`, `bfill`}. Target column retains NaNs so the OOS loop still sees target missingness.

Functions & features

Sample-period availability path: macroforecast.execution.build._apply_sample_period_and_availability(raw_result, recipe, *, target) implements zero_fill_leading_predictor_gaps and records data_reports["availability"].
General missing policy path: macroforecast.execution.build._apply_missing_availability(raw_result, rule, *, target, spec).
Called during dataset loading in execute_recipe after official transforms have produced the selected frame and before researcher preprocessing runs.
Compile guard: impute_predictors_only without valid leaf_config.x_imputation raises CompileValidationError.
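
To make the default `zero_fill_leading_predictor_gaps` behaviour concrete, here is a minimal pure-Python sketch. It is not the code in `_apply_sample_period_and_availability`. The helper `zero_fill_leading_gaps`, the column `NEWSERIES`, and the shape of `availability_report` are assumptions made for illustration.

```python
import math

NAN = float("nan")


def zero_fill_leading_gaps(column):
    """Replace NaNs before the first valid observation with 0.0.

    Returns (filled_column, n_filled). NaNs after the first valid
    observation stay in place; a fully missing column is filled entirely.
    """
    first_valid = next(
        (i for i, value in enumerate(column) if not math.isnan(value)),
        len(column),
    )
    return [0.0] * first_valid + list(column[first_valid:]), first_valid


predictors = {
    "INDPRO": [NAN, NAN, 1.0, NAN, 2.0],  # leading gap plus a mid-sample gap
    "RPI": [3.0, 4.0, 5.0, 6.0, 7.0],     # complete
    "NEWSERIES": [NAN] * 5,               # fully missing (hypothetical column)
}

# Hypothetical stand-in for the provenance record: filled cells per column.
filled, availability_report = {}, {}
for name, values in predictors.items():
    filled[name], availability_report[name] = zero_fill_leading_gaps(values)

print(availability_report)
```

Only the two leading INDPRO gaps are filled. The mid-sample gap remains NaN, so downstream steps still see it.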

Dropped values

target_date_drop_if_missing, real_time_missing_as_missing, state_space_fill, factor_fill, em_fill: complex or niche imputation strategies, deferred to v1.1+.

Recipe usage

# Forward-fill predictor columns, keep target NaNs visible
path:
  1_data_task:
    fixed_axes:
      missing_availability: impute_predictors_only
    leaf_config:
      x_imputation: ffill

1.5.2 `raw_missing_policy`

Selects raw-source missing treatment before FRED transforms/T-codes. Four operational values.

Value catalog

Value	Status	What it does
`preserve_raw_missing`	operational	Default. Leave raw-source missing values untouched before FRED transforms/T-codes.
`zero_fill_leading_predictor_missing_before_tcode`	operational	Within the selected sample period, fill predictor leading missing values with 0 before FRED transforms/T-codes.
`impute_raw_predictors`	operational	Impute raw predictor columns before FRED transforms/T-codes using `leaf_config.raw_x_imputation` in {`mean`, `median`, `ffill`, `bfill`}.
`drop_raw_missing_rows`	operational	Drop rows with any raw-source missing value before FRED transforms/T-codes. Aggressive; use only for explicit full-mode designs.

Functions & features

Runtime path: macroforecast.execution.build._apply_raw_missing_policy(raw_result, rule, *, target, spec).
Called before macroforecast.execution.build._apply_tcode_preprocessing(...), so any changes affect T-code construction.
Compile guard: impute_raw_predictors without valid leaf_config.raw_x_imputation raises CompileValidationError.
Provenance: runtime records data_reports["raw_missing"] with before_official_transform: true.
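
Because the raw policy runs before T-code preprocessing, a single raw gap can change several transformed observations. The sketch below illustrates this with a first difference of logs (FRED-MD T-code 5). The helpers `impute` and `log_diff` are hypothetical and are not the macroforecast implementation. The four method names match `leaf_config.raw_x_imputation`.

```python
import math
import statistics

NAN = float("nan")


def impute(column, method):
    """Fill NaNs in one column with mean, median, ffill, or bfill."""
    observed = [v for v in column if not math.isnan(v)]
    if method in ("mean", "median"):
        fill = getattr(statistics, method)(observed)
        return [fill if math.isnan(v) else v for v in column]
    if method in ("ffill", "bfill"):
        # bfill is ffill applied to the reversed column.
        seq = column if method == "ffill" else column[::-1]
        out, last = [], NAN
        for v in seq:
            if not math.isnan(v):
                last = v
            out.append(last)
        return out if method == "ffill" else out[::-1]
    raise ValueError(f"unknown imputation method: {method!r}")


def log_diff(column):
    """First difference of logs; the first row has no predecessor."""
    return [NAN] + [
        math.log(cur) - math.log(prev) for prev, cur in zip(column, column[1:])
    ]


raw = [100.0, NAN, 110.0, 121.0]

untouched = log_diff(raw)               # preserve_raw_missing
repaired = log_diff(impute(raw, "ffill"))  # impute_raw_predictors with ffill

n_missing_untouched = sum(math.isnan(v) for v in untouched)
n_missing_repaired = sum(math.isnan(v) for v in repaired)
```

In this example one raw gap leaves three NaNs after the transform, while forward-filling first leaves only the unavoidable first-row NaN. The repaired series also contains a zero growth rate that comes from the imputation rather than from the data, which is why the default leaves raw values untouched.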

Recipe usage

# Clean raw predictors before official FRED T-codes are applied
path:
  1_data_task:
    fixed_axes:
      raw_missing_policy: impute_raw_predictors
    leaf_config:
      raw_x_imputation: ffill

1.5.3 `raw_outlier_policy`

Selects raw-source outlier treatment before FRED transforms/T-codes. Six operational values.

Value catalog

Value	Status	What it does
`preserve_raw_outliers`	operational	Default. Leave raw-source outliers untouched before FRED transforms/T-codes.
`winsorize_raw`	operational	Clip raw numeric columns at the 1st and 99th percentiles.
`iqr_clip_raw`	operational	Clip raw numeric columns by 1.5 IQR fences.
`mad_clip_raw`	operational	Clip raw numeric columns by 3 MAD fences.
`zscore_clip_raw`	operational	Clip raw numeric columns by 3 standard deviations.
`set_raw_outliers_to_missing`	operational	Convert values outside the 1st and 99th percentiles to missing before FRED transforms/T-codes.

Functions & features

Runtime path: macroforecast.execution.build._apply_raw_outlier_policy(raw_result, rule, *, spec).
Called before macroforecast.execution.build._apply_tcode_preprocessing(...), so any changes affect T-code construction.
Optional column subset: leaf_config.raw_outlier_columns. If omitted, all raw numeric non-date columns are eligible.
Provenance: runtime records data_reports["raw_outliers"] with before_official_transform: true.
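
The clipping rules in the value catalog can be sketched with the standard library. This is an illustration under stated assumptions, not the `_apply_raw_outlier_policy` code: the quantile interpolation method, the use of an unscaled MAD, the sample standard deviation, and the function names are all choices made here, and the input is assumed to contain no NaNs.

```python
import statistics


def clip(values, lower, upper):
    """Pull values outside [lower, upper] back to the nearest fence."""
    return [min(max(v, lower), upper) for v in values]


def iqr_fences(values, k=1.5):
    """Fences at k interquartile ranges beyond the first and third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def mad_fences(values, k=3.0):
    """Fences at k median absolute deviations around the median (unscaled)."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    return center - k * mad, center + k * mad


def zscore_fences(values, k=3.0):
    """Fences at k sample standard deviations around the mean."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return mu - k * sigma, mu + k * sigma


raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]

iqr_clipped = clip(raw, *iqr_fences(raw))
mad_clipped = clip(raw, *mad_fences(raw))
zscore_clipped = clip(raw, *zscore_fences(raw))
```

On this sample the IQR and MAD fences pull the value 100.0 down to 13.0 and 11.0 respectively. The 3-standard-deviation fence leaves it unchanged, because the single extreme value inflates the standard deviation. That is one reason to prefer the robust rules on short raw series.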

Recipe usage

# Clip selected raw columns before official FRED T-codes are applied
path:
  1_data_task:
    fixed_axes:
      raw_outlier_policy: iqr_clip_raw
    leaf_config:
      raw_outlier_columns: [INDPRO, RPI]

1.5.4 `release_lag_rule`

Publication Lag Rule. Selects publication-lag policy for predictor shifts. Three operational values.

Value catalog

Value	Status	What it does
`ignore_release_lag`	operational	Default, no-op. Every column is available at its nominal date.
`fixed_lag_all_series`	operational	Shift every non-date column by 1 period (uniform 1-month publication lag).
`series_specific_lag`	operational	Per-column shift declared via `leaf_config.release_lag_per_series: dict[str, int]`. Columns absent from the dict are left untouched.

Functions & features

Runtime path: macroforecast.execution.build._apply_release_lag(raw_result, rule, *, spec).
Compile guard: series_specific_lag without a non-empty leaf_config.release_lag_per_series dict raises CompileValidationError.
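
The three rules amount to per-column shifts. The sketch below is a simplified stand-in for `_apply_release_lag`, written over plain lists. It assumes that a lag of k delays each observation by k rows and pads the start with NaN, and it uses `ValueError` in place of `CompileValidationError`. The function names are hypothetical.

```python
NAN = float("nan")


def shift(column, periods):
    """Delay a column by `periods` rows, padding the start with NaN."""
    return ([NAN] * periods + list(column))[: len(column)]


def apply_release_lag(frame, rule, lag_per_series=None):
    """Apply a publication-lag rule to a dict of non-date columns."""
    if rule == "ignore_release_lag":
        return {name: list(col) for name, col in frame.items()}
    if rule == "fixed_lag_all_series":
        return {name: shift(col, 1) for name, col in frame.items()}
    if rule == "series_specific_lag":
        if not lag_per_series:
            # Stand-in for the compile guard described above.
            raise ValueError("series_specific_lag needs a non-empty lag mapping")
        # Columns absent from the mapping are left untouched (lag 0).
        return {
            name: shift(col, lag_per_series.get(name, 0))
            for name, col in frame.items()
        }
    raise ValueError(f"unknown release_lag_rule: {rule!r}")


frame = {"INDPRO": [1.0, 2.0, 3.0], "UNRATE": [4.0, 5.0, 6.0]}

uniform = apply_release_lag(frame, "fixed_lag_all_series")
per_series = apply_release_lag(frame, "series_specific_lag", {"INDPRO": 1})
```

With the per-series mapping, INDPRO becomes `[nan, 1.0, 2.0]` while UNRATE is unchanged, since it is absent from the mapping.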

Dropped values

calendar_exact_lag, lag_conservative, lag_aggressive — pure duplicates (observable behaviour identical to other values).

Recipe usage

# Typical FRED-MD release convention: most series lag 1 month, UNRATE lags 0
path:
  1_data_task:
    fixed_axes:
      release_lag_rule: series_specific_lag
    leaf_config:
      release_lag_per_series:
        INDPRO: 1
        CPIAUCSL: 1
        UNRATE: 0

Moved Out Of Layer 1

structural_break_segmentation is now a Layer 2 representation/feature-block decision. It augments the model input with break dummies, so it no longer belongs to the FRED data-frame task. For user-supplied break dates, use Layer 2 deterministic_components=break_dummies with leaf_config.break_dates.

1.5.5 `contemporaneous_x_rule`

Same-Period Predictor Rule. Selects whether x observed at the target date may enter the model. Two operational values.

Value catalog

Value	Status	What it does
`forbid_same_period_predictors`	operational	Default. `X_pred` is taken at the forecast origin `t` (no contemporaneous observation of y_{t+h}). Realistic real-time forecasting.
`allow_same_period_predictors`	operational	`X_pred` is taken at the target date `t+h`, aligned with `y_{t+h}`. Oracle / data-leak benchmark used in some comparisons.

Functions & features

Wired inside macroforecast.execution.build._build_raw_panel_training_data — the axis value selects how X_train and X_pred align with the target.
Applies to raw-panel recipes only (target_lag_features uses target lags, so the axis is irrelevant there).
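
The difference between the two values is an index offset on the predictor side. The sketch below is a toy single-predictor version, not the logic inside `_build_raw_panel_training_data`. The function name, the scalar predictor, and the pairing of training rows are assumptions, chosen to match the `X_pred` alignment described in the value catalog.

```python
def build_training_data(x, y, origin, horizon, rule):
    """Return (X_train, y_train, X_pred) for one forecast origin.

    x and y are lists indexed by period. The target is y[s + horizon].
    """
    if rule == "forbid_same_period_predictors":
        offset = 0        # predictor dated at the origin side, x[s]
    elif rule == "allow_same_period_predictors":
        offset = horizon  # predictor dated with the target, x[s + horizon]
    else:
        raise ValueError(f"unknown contemporaneous_x_rule: {rule!r}")

    # Keep only rows whose target is already observed at the origin.
    rows = range(0, origin - horizon + 1)
    X_train = [x[s + offset] for s in rows]
    y_train = [y[s + horizon] for s in rows]
    X_pred = x[origin + offset]
    return X_train, y_train, X_pred


x = [10, 11, 12, 13, 14, 15]
y = [0, 1, 2, 3, 4, 5]

realistic = build_training_data(x, y, 4, 1, "forbid_same_period_predictors")
oracle = build_training_data(x, y, 4, 1, "allow_same_period_predictors")
```

With origin 4 and horizon 1, the realistic variant predicts from `x[4]`, while the oracle variant predicts from `x[5]`, a value that would not be known at the forecast origin.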

Recipe usage

# Oracle contemporaneous-X benchmark
path:
  1_data_task:
    fixed_axes:
      contemporaneous_x_rule: allow_same_period_predictors
  3_feature_engineering:
    nodes:
      - {id: src_x, type: source, selector: {layer_ref: l2, sink_name: l2_clean_panel_v1, subset: {role: predictors}}}
      - {id: src_y, type: source, selector: {layer_ref: l2, sink_name: l2_clean_panel_v1, subset: {role: target}}}
      - {id: y_h, type: step, op: target_construction, params: {horizon: 1}, inputs: [src_y]}
    sinks:
      l3_features_v1: {X_final: src_x, y_final: y_h}