FRED-SD
Real-time U.S. state-level macroeconomic panel maintained by the Federal Reserve Bank of St. Louis. Provides per-state variables (labor market, production, housing) with vintage history, enabling real-time forecasting exercises at the state level. Loaded via macroforecast.load_fred_sd() when path.1_data_task.fixed_axes.dataset == "fred_sd".
FRED-SD differs structurally from FRED-MD / FRED-QD: the file format is an Excel workbook, and the panel mixes monthly and quarterly series — a user-facing complication discussed in v1.0 limitations below.
What macroforecast downloads
Current vintage: macroforecast reads the official FRED-SD landing page and selects the latest Data by Series workbook link, for example
.../fred-sd/series/series-YYYY-MM.xlsx. This is the workbook layout the loader expects: tabs are variables, columns are states.Historical vintage: macroforecast first tries the direct Data by Series workbook URL,
.../fred-sd/series/series-YYYY-MM.xlsx. If a vintage is only distributed inside an official by-series archive, it falls back to the appropriatefredsd_byseries_*.zipfile and extracts the requested workbook.
The workbook is laid out with one sheet per variable; within each sheet, columns are states (50 states plus DC) and rows are observations indexed by date. The macroforecast loader (macroforecast/raw/datasets/fred_sd.py) accepts two selectors:
load_fred_sd(
vintage=None,
states=None, # list of state abbreviations, e.g. ["CA", "TX", "NY"]
variables=None, # list of variable names (sheet names)
)
When both selectors are None, the loader ingests every sheet and every state, producing a wide panel with column names of the form {variable}_{state} (e.g., PAYEMS_CA).
Structure — ~28 variables per state
Per the 2022 paper, FRED-SD provides approximately 28 variables per state, grouped into three broad categories:
Labor market — payroll employment (
PAYEMS-analogues), unemployment rate, labor force participation, average weekly hours.Production / output — state personal income, coincident activity index, industrial production proxies where available.
Housing — housing starts, permits, house price indices.
Exact variable list and state coverage at a given vintage: see the data appendix on the St. Louis Fed site. macroforecast does not redistribute the per-variable metadata.
Mixed frequency — the core structural issue
Unlike FRED-MD (purely monthly) or FRED-QD (purely quarterly), FRED-SD mixes series at different native frequencies:
Monthly series: labor market indicators (unemployment, payroll employment), housing starts / permits.
Quarterly series: state GDP, state personal income (only from certain BEA releases), some productivity measures.
Within a single workbook, the monthly series have 12 observations per year while the quarterly series have 4. macroforecast now separates this into two explicit decisions:
Layer 1 reports and optionally gates the selected native-frequency mix through
fred_sd_frequency_report_v1andfred_sd_frequency_policy.Layer 2 chooses how the selected FRED-SD panel enters the representation through
fred_sd_mixed_frequency_representation.
The built-in estimator set is intentionally narrow. The executable surface is
a Layer 2 native-frequency block payload plus Layer 3 registered-custom-model
adapter routes and package-owned direct MIDAS baselines: midas_almon,
midasr with nealmon, almonp, nbeta, genexp, or harstep weights,
and the legacy midasr_nealmon alias. Full state-space mixed-frequency
likelihoods and regularized group MIDAS remain research-extension work.
Real-time vintage discipline
FRED-SD is explicitly designed as a real-time database — each .xlsx file is the dataset as known at a specific publication month. This matters for studies that want to avoid contaminating forecasts with data revisions:
information_set_type: real_time_vintage+leaf_config.data_vintage: "2022-03"loads the March-2022 vintage exactly.information_set_type: final_revised_data(default) loads the latest official Data by Series workbook discovered from the St. Louis Fed landing page.
The authors emphasise in the 2022 paper that some state-level series see substantial revisions (especially GDP by state), so real-time studies on FRED-SD generally should pin a vintage.
State and variable selection
The loader supports direct source selectors:
from macroforecast import load_fred_sd
result = load_fred_sd(states=["CA", "TX", "NY", "FL"], variables=["PAYEMS"])
Recipe/runtime selection is available through Layer 1 and the simple API:
import macroforecast as mf
exp = (
mf.Experiment(
dataset="fred_md+fred_sd",
target="INDPRO",
start="1985-01",
end="2019-12",
horizons=[1, 3, 6],
)
.use_fred_sd_selection(states=["CA", "TX"], variables=["UR", "BPPRIVSA"])
)
Built-in groups are also Layer 1 source-load selectors:
exp = (
mf.Experiment(
dataset="fred_md+fred_sd",
target="INDPRO",
start="1985-01",
end="2019-12",
horizons=[1, 3, 6],
)
.use_fred_sd_groups(
state_group="census_region_west",
variable_group="labor_market_core",
)
)
FRED-SD has two different variable concepts:
sd_variable_selectionchooses workbook sheets before loading, vialeaf_config.sd_variables.fred_sd_variable_groupis a recipe-level shortcut that resolves tosd_variable_selection=selected_sd_variablesbefore loading.variable_universeremains the generic post-load column universe filter, after columns have canonicalVARIABLE_STATEnames.
The built-in state groups are Census regions and divisions plus
contiguous_48_plus_dc. Full recipes may also use
fred_sd_state_group=custom_state_group with either
leaf_config.sd_state_group_members or
leaf_config.sd_state_groups + leaf_config.sd_state_group_name.
Built-in SD-variable groups are:
labor_market_core:ICLAIMS,LF,NA,PARTRATE,URemployment_sector:CONS,FIRE,GOVT,INFO,MFG,MFGHRS,MINNG,PSERVgsp_output: aggregate and sector GSP/output variableshousing:BPPRIVSA,RENTS,STHPItrade:EXPORTS,IMPORTSincome:OTOTt-code-review groups:
direct_analog_high_confidence,provisional_analog_medium,semantic_review_outputs,no_reliable_analog
Use fred_sd_variable_group=custom_sd_variable_group with
leaf_config.sd_variable_group_members or
leaf_config.sd_variable_groups + leaf_config.sd_variable_group_name for
paper-specific bundles. Group selectors and explicit list selectors are
mutually exclusive for the same side; use one source of truth per state or
variable selection.
Series metadata contract
FRED-SD runs write fred_sd_series_metadata.json when the selected dataset
includes FRED-SD. The contract version is fred_sd_series_metadata_v1 and its
owner is Layer 1. It records one row per loaded FRED-SD column:
canonical column name,
sd_variable, state, and source sheetinferred native frequency (
monthly,quarterly,annual,irregular, orunknown) from the non-missing observation calendarobserved start/end dates and non-missing observation count
selected states, selected SD variables, and native-frequency counts
For composite datasets, the same contract is stored from the FRED-SD component
before generic post-load variable_universe filtering.
Frequency composition report
FRED-SD runs also write fred_sd_frequency_report.json. The contract version
is fred_sd_frequency_report_v1 and its owner is Layer 1. It is derived from
fred_sd_series_metadata_v1 and makes the selected native-frequency mix
explicit before Layer 2 chooses any post-frame representation.
The report records:
native_frequency_countsandknown_native_frequency_countsfrequency_status:empty,unknown_only,single_frequency,single_frequency_with_unknown,mixed_frequency, ormixed_frequency_with_unknownhas_monthly_quarterly_mixrequires_mixed_frequency_decisionfrequency counts by state and by SD variable
This report feeds the Layer 1 fred_sd_frequency_policy axis. The default is
report_only, which records the selected panel without changing execution.
Full recipes can instead choose:
allow_mixed_frequency: explicitly accept mixed or unknown native frequencies.reject_mixed_known_frequency: block selected panels with more than one known native frequency, such as monthly plus quarterly.require_single_known_frequency: require exactly one known native frequency and no unknown-frequency selected series.
The policy runs before Layer 2 representation construction. Use the strict policy for same-frequency FRED-SD studies where quarterly-vs-monthly conversion should not be silently folded into the downstream representation.
Layer 2 mixed-frequency representation
After Layer 1 has selected the FRED-SD series and written the frequency report,
Layer 2 can shape the panel with
fred_sd_mixed_frequency_representation:
Value |
Runtime behavior |
|---|---|
|
Default. Keep the selected FRED-SD columns on the recipe target calendar after generic monthly/quarterly conversion. This preserves the current behavior and records a Layer 2 report. |
|
Drop selected FRED-SD columns whose inferred native frequency is |
|
Keep only selected FRED-SD columns whose inferred native frequency matches the recipe frequency ( |
|
Operational narrow. Preserve the calendar-aligned frame and emit |
|
Operational narrow. Emit the native-frequency block payload and |
Simple API:
exp = (
mf.Experiment(
dataset="fred_md+fred_sd",
target="INDPRO",
start="1985-01",
end="2019-12",
horizons=[1],
)
.use_fred_sd_groups(variable_group="labor_market_core")
.use_fred_sd_mixed_frequency_representation("drop_non_target_native_frequency")
)
Runtime writes fred_sd_mixed_frequency_representation.json with owner
2_preprocessing, the selected policy, target frequency, kept/dropped FRED-SD
columns, and dropped counts by native frequency. Non-FRED-SD columns in a
composite dataset are preserved. A FRED-SD target column is never silently
dropped; runtime raises an execution error if the selected representation would
remove it.
Advanced block/adapter choices are deliberately narrow. They require:
datasetincludesfred_sdfeature_builder="raw_feature_panel"forecast_type="direct"model_familyis a registered custom model,midas_almon,midasr, ormidasr_nealmon
The custom model receives the regular tabular X_train, y_train, X_test
plus context["auxiliary_payloads"]. For native_frequency_block_payload,
that context includes fred_sd_native_frequency_block_payload with
blocks, block_order, and column_to_native_frequency. For
mixed_frequency_model_adapter, it also includes
fred_sd_mixed_frequency_model_adapter, whose current adapter kind is
registered_custom_model. See
examples/custom_fred_sd_mixed_frequency_model.py for an executable custom
model template and
examples/recipes/templates/fred-sd-custom-mixed-frequency-model.yaml for the
matching recipe skeleton. Import the Python module that registers the callable
before running a YAML recipe that selects the custom model_family.
model_family="midas_almon" is a narrow built-in direct executor for users who
want a package-owned MIDAS-style baseline without adding an R dependency. It
builds Almon polynomial distributed-lag basis columns from the raw-panel FRED-SD
predictors, forward-fills native lower-frequency columns within the available
calendar, and fits a ridge-regularized direct forecast. Tunable fields are
midas_max_lag (default 3), midas_almon_degree (default 2), and
midas_alpha (default 1.0) in the training/leaf config. This is not a full
midasr-style nonlinear MIDAS implementation and does not replace custom
research adapters.
model_family="midasr" is the Python parity surface against the R midasr
restricted-weight grammar. It consumes the same FRED-SD native-frequency
block/adapter route, constructs raw lag tensors from the selected predictors,
and estimates a restricted direct MIDAS forecast by nonlinear least squares.
The runtime midasr_weight_family choices are:
|
Runtime status |
Meaning |
|---|---|---|
|
operational narrow |
Normalized exponential Almon weights; compatibility alias |
|
operational narrow |
Raw polynomial Almon weights. |
|
operational narrow |
Normalized beta weights with three parameters. |
|
operational narrow |
Generalized exponential weights with four parameters. |
|
operational narrow |
HAR(3)-RV step weights with three parameters and exactly 20 lags. |
Tunable fields are midas_max_lag (default 3), midasr_nealmon_degree
(default 2 for nealmon), midasr_almonp_degree (default 2 for
almonp), midasr_max_terms (default 12), and midasr_max_nfev (default
500). nbeta, genexp, and harstep use fixed parameter widths rather than
a polynomial degree. harstep requires midas_max_lag=20; the compiler uses
20 as the default when harstep is selected and no explicit lag is supplied.
Use Layer 2 state/variable/feature selection before this route when the raw
panel is wide; the NLS slice is meant for explicit research
specifications, not blind high-dimensional grids.
import macroforecast as mf
@mf.custom_model("my_fred_sd_mf_model")
def my_fred_sd_mf_model(X_train, y_train, X_test, context):
blocks = context["auxiliary_payloads"]["fred_sd_native_frequency_block_payload"]
monthly_columns = blocks["blocks"].get("monthly", {}).get("columns", [])
quarterly_columns = blocks["blocks"].get("quarterly", {}).get("columns", [])
# Fit a research model using X_train plus the block metadata.
return float(y_train[-1])
exp = (
mf.Experiment(
dataset="fred_sd",
target="UR_CA",
start="2000-01",
end="2020-12",
horizons=[1],
frequency="monthly",
feature_builder="raw_feature_panel",
model_family="my_fred_sd_mf_model",
)
.use_fred_sd_selection(states=["CA"], variables=["UR", "NQGSP"])
.use_fred_sd_mixed_frequency_adapter()
)
Built-in MIDAS-style route:
exp = (
mf.Experiment(
dataset="fred_sd",
target="UR_CA",
start="2000-01",
end="2020-12",
horizons=[1],
frequency="monthly",
model_family="midas_almon",
feature_builder="raw_feature_panel",
)
.use_fred_sd_selection(states=["CA", "TX"], variables=["UR", "NQGSP"])
.use_fred_sd_mixed_frequency_adapter()
)
R midasr-style weight-family route:
exp = (
mf.Experiment(
dataset="fred_sd",
target="UR_CA",
start="2000-01",
end="2020-12",
horizons=[1],
frequency="monthly",
model_family="midasr",
feature_builder="raw_feature_panel",
benchmark_config={"minimum_train_size": 60, "rolling_window_size": 60},
)
.sweep({"midasr_weight_family": "almonp"})
.use_fred_sd_selection(states=["CA", "TX"], variables=["UR", "NQGSP"])
.use_fred_sd_mixed_frequency_adapter()
)
To sweep all package-owned R midasr weight branches, use:
exp = exp.sweep(
{"midasr_weight_family": ["nealmon", "almonp", "nbeta", "genexp", "harstep"]}
)
Do not force midas_max_lag=3 in that sweep; the harstep branch needs 20
lags and the compiler will supply 20 when the branch is selected.
The adapter variant uses .use_fred_sd_mixed_frequency_adapter(). The package
now supplies the enforced Layer 2 payload, Layer 3 custom-adapter route, and
built-in direct MIDAS baselines. State-space estimators and regularized group
MIDAS remain future extensions.
Changes from the 2020 working paper to current
Compared with FRED-MD / FRED-QD the FRED-SD maintenance history is shorter (first release late 2020):
Coverage expansion — the initial release focused on the subset of states for which all three category groups were available in real-time; subsequent vintages added state-series coverage as Fed / BEA / BLS backfilled missing vintage history.
Methodology refinement — the authors’ methodology for constructing real-time coincident activity indices evolved between the 2020 working paper and the 2022 published version; the data appendix on the St. Louis Fed site tracks the exact method currently in use.
Series additions — a small number of series have been added (e.g., certain housing-quality indices for states where BLS released new data). Always consult the live data appendix for the current variable roster.
Loader behaviour — things to know
Excel parsing via
openpyxl. Each sheet is read independently;pd.read_excel(..., sheet_name=None)returnsdict[str, DataFrame]and the loader concatenates wide-form.Cache: same mechanism as FRED-MD / FRED-QD (
~/.cache/macroforecast/raw/).support_tier = “stable” on the returned
RawDatasetMetadata— the live/vintage loader, selectors, metadata report, and FRED-SD t-code policy surface are package-owned. Mixed-frequency estimator families still report their own operational-narrow status.Series metadata — runtime runs that include FRED-SD write
fred_sd_series_metadata.json, which makes the selected state/variable panel and native-frequency mix auditable.Frequency report — runtime also writes
fred_sd_frequency_report.json, which reduces the selected panel to a Layer 1 frequency-composition contract for downstream policy decisions.Mixed-frequency representation report — runtime writes
fred_sd_mixed_frequency_representation.jsonfor FRED-SD runs; this is the Layer 2 panel-shaping contract consumed before t-code preprocessing.No T-code row — the FRED-SD workbook does not encode stationarity codes per variable the way FRED-MD / FRED-QD do. FRED-SD transformation codes are therefore a research decision, not source metadata.
T-code policy choices — state panels create a real choice between national-analog t-codes, one empirically selected code per SD variable, or independent state-by-series codes. The default is no FRED-SD t-code. The reviewed national-analog map is opt-in via
Experiment.use_sd_inferred_tcodes(). The empirical variable-global map is opt-in viaExperiment.use_sd_empirical_tcodes(unit="variable_global"). State-by-series empirical codes require an explicit column map viaExperiment.use_sd_empirical_tcodes(unit="state_series", code_map={...}). All three recordofficial=falsein runtime reports.
Known limitations in macroforecast v1.0
Built-in mixed-frequency estimators are narrow — monthly-to-quarterly and quarterly-to-monthly conversion, strict same-frequency filtering, unknown-frequency filtering, native-frequency block payloads, custom adapter routes, the built-in
midas_almondirect Almon-lag baseline,midasrwithnealmon/almonp/nbeta/genexp/harstep, and the compatibilitymidasr_nealmonrestricted-NLS slice are available. Regularized group MIDAS and state-space nowcasting estimators remain future work.State and SD-variable groups are recipe-level selectors —
fred_sd_state_groupandfred_sd_variable_groupresolve into explicitsd_states/sd_variablesbefore loading. They are not post-loadvariable_universefilters.Generic
variable_universeis post-load — usesd_variable_selectionfor workbook-sheet selection andvariable_universefor loadedVARIABLE_STATEcolumns.No official T-code row —
official_transform_policy: apply_official_tcodehas no FRED-SD workbook T-code row to consume. FRED-SD inferred T-codes are macroforecast research metadata and must be opted into separately.Mixed-frequency estimator scope is narrow —
support_tieris stable for FRED-SD loading and metadata, but advanced mixed-frequency estimators remain operational-narrow unless explicitly documented as built-in runtime families.
See also
Source & Frame (1.1) —
dataset/information_set_type/frequencyaxis interaction.Horizon & evaluation window (1.3) — how mixed-frequency panels interact with horizon / OOS structure.