
Add ChaisemartinDHaultfoeuille (dCDH) DID_M estimator (Phase 1) #290

Open
igerber wants to merge 13 commits into main from dcdh-estimator

Conversation

@igerber
Owner

@igerber igerber commented Apr 11, 2026

Summary

  • Adds ChaisemartinDHaultfoeuille (alias DCDH) — Phase 1 of the de Chaisemartin-D'Haultfœuille estimator family. The only modern staggered DiD estimator in the library that handles non-absorbing (reversible) treatments — treatment can switch on AND off over time (marketing campaigns, seasonal promotions, on/off policy cycles, binary fuzzy designs).
  • Implements DID_M from de Chaisemartin & D'Haultfœuille (2020) AER, equivalently DID_1 (horizon l = 1) of the dynamic companion paper (NBER WP 29873).
  • Forward-compatible API: Phase 2 (multi-horizon event study via aggregate/L_max) and Phase 3 (covariate adjustment via controls, group-specific linear trends, HonestDiD) parameters present from day one with NotImplementedError gates pointing at the relevant phase.
  • Ships joiners-only (DID_+) and leavers-only (DID_-) decompositions, single-lag placebo (DID_M^pl) point estimate, optional multiplier bootstrap (Rademacher / Mammen / Webb), TWFE decomposition diagnostic from Theorem 1 of AER 2020 (per-cell weights, fraction negative, sigma_fe, beta_fe), plus a standalone twowayfeweights() helper.
  • Adds generate_reversible_did_data() generator with 7 patterns (single_switch default, joiners_only, leavers_only, mixed_single_switch, random, cycles, marketing).
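As a concrete illustration of the horizon-1 joiners building block DID_{+,t} from AER 2020 Theorem 3, here is a minimal hand computation on a toy two-group panel. The panel values and helper names are illustrative, not the library's API:

```python
# Toy panel: (group, period) -> (treatment, outcome).
# Group A switches 0 -> 1 at t=2 (a joiner); group B stays at 0
# in both periods, so it is a stable_0 control for t=2.
panel = {
    ("A", 1): (0, 1.0), ("A", 2): (1, 3.0),
    ("B", 1): (0, 1.0), ("B", 2): (0, 1.2),
}

def change(g):
    # Outcome change from period 1 to period 2 for group g.
    return panel[(g, 2)][1] - panel[(g, 1)][1]

# DID_{+,t=2}: switcher outcome change minus stable-control change.
did_plus_t = change("A") - change("B")
print(round(did_plus_t, 6))  # 1.8
```

DID_M then aggregates these per-period joiner (and leaver) terms weighted by switcher counts.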

Methodology references

  • Method: dCDH DID_M estimator (= DID_1 at horizon l = 1 of the dynamic paper)
  • Sources:
    • de Chaisemartin, C. & D'Haultfœuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects. American Economic Review, 110(9), 2964–2996. https://doi.org/10.1257/aer.20181169
    • de Chaisemartin, C. & D'Haultfœuille, X. (2022, revised 2024). Difference-in-Differences Estimators of Intertemporal Treatment Effects. NBER Working Paper 29873. https://www.nber.org/papers/w29873
    • Three paper reviews under docs/methodology/papers/chaisemartin-dhaultfoeuille-{2020-review,2020-appendix-review,dynamic-review}.md (committed in earlier work; the dynamic-paper review is the source for the cohort-recentered analytical variance from Web Appendix Section 3.7.3)
  • Reference implementation parity: R DIDmultiplegtDYN v2.3.3 (CRAN, maintained by the paper authors)

Intentional deviations from R DIDmultiplegtDYN (documented as **Note (deviation from R):** in REGISTRY.md)

Stable-control set definition. Python uses period-based stable definitions (stable_0(t) = any cell with D_{g,t-1} = D_{g,t} = 0, regardless of baseline D_{g,1}), matching the AER 2020 Theorem 3 cell-count notation N_{0,0,t} and N_{1,1,t} literally. R uses cohort-based definitions that additionally require D_{g,1} to match the side, matching the dynamic companion paper's cohort (D_{g,1}, F_g, S_g) framework. The two definitions agree exactly under any of: (a) the panel is pure-direction (joiners-only or leavers-only), (b) no joiner's post-switch state ever overlaps a period when leavers are switching (and vice versa), or (c) the worked example. They diverge by O(1%) on mixed-direction panels with both joiners and leavers when some joiners' post-switch cells could serve as leavers' controls (or vice versa). Parity tests use 1e-4 tolerance for pure-direction scenarios and 2.5% tolerance for mixed-direction scenarios, with the deviation explicitly documented in docs/methodology/REGISTRY.md.
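The divergence between the two stable-control definitions can be sketched on a tiny treatment-path matrix. This is an illustrative reconstruction of the definitions above, not the library's internals:

```python
# D[g][t]: treatment paths over 4 periods (index 0 = period 1).
D = [
    [0, 0, 0, 0],   # group 0: never treated
    [1, 0, 0, 1],   # group 1: leaver at period 2, baseline D_{g,1} = 1
    [0, 1, 1, 1],   # group 2: joiner at period 2
]

t = 3  # period 3 (1-based): compare 0-based indices t-2 and t-1

# Period-based (Python): D_{g,t-1} = D_{g,t} = 0, baseline ignored.
period_based = [g for g in range(3) if D[g][t - 2] == 0 and D[g][t - 1] == 0]

# Cohort-based (R): additionally require baseline D_{g,1} = 0.
cohort_based = [g for g in period_based if D[g][0] == 0]

print(period_based)  # [0, 1]: group 1 qualifies despite its treated baseline
print(cohort_based)  # [0]: the cohort definition excludes group 1
```

On pure-direction panels no group like group 1 exists, so both definitions enumerate identical control sets, which is why the pure-direction parity tests can use 1e-4 tolerance.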

Phase 1 placebo SE. Intentionally NaN with a UserWarning. The dynamic companion paper Section 3.7.3 derives the cohort-recentered analytical variance for DID_l only — not for the placebo DID_M^pl. Phase 2 will add multiplier-bootstrap support for the placebo via the dynamic paper's machinery; until then, placebo_se, placebo_t_stat, placebo_p_value, and placebo_conf_int remain NaN even when n_bootstrap > 0. The placebo point estimate is still computed and exposed via results.placebo_effect. A regression test (test_placebo_bootstrap_unavailable_in_phase_1) pins this contract.

Assumption 11 zero-retention convention. When some period t has joiners but no stable_0 controls (or leavers but no stable_1 controls), Python sets that period's DID_{+,t} (or DID_{-,t}) to zero but retains the switcher count in the N_S denominator, matching the AER 2020 Theorem 3 worked-example arithmetic. R drops the cohort entirely. Per-period flags (did_plus_t_a11_zeroed, did_minus_t_a11_zeroed) and a consolidated UserWarning make this visible to users. The R parity scenarios add never-treated and always-treated control cohorts so neither implementation triggers the convention, eliminating this as a source of disagreement in the parity tests.
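The arithmetic difference between the two conventions is easy to see on made-up per-period numbers (illustrative only):

```python
# (n_switchers_t, did_t, has_stable_controls) per period.
per_period = [
    (2, 1.5, True),
    (2, None, False),   # joiners present but no stable_0 controls
]

# Python convention: zero the period's DID, retain switchers in N_S.
num = sum(n * (d if ok else 0.0) for n, d, ok in per_period)
den = sum(n for n, d, ok in per_period)
print(num / den)        # 0.75

# R convention: drop the period (and its switchers) entirely.
num_r = sum(n * d for n, d, ok in per_period if ok)
den_r = sum(n for n, d, ok in per_period if ok)
print(num_r / den_r)    # 1.5
```

Adding never-treated and always-treated control cohorts, as the parity scenarios do, ensures `has_stable_controls` is always true, so neither branch is exercised.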

Documented matches with R (worth noting because they're substantive choices)

  • drop_larger_lower=True default — drops multi-switch groups before estimation (matches R; required for the analytical variance to be consistent with the point estimate)
  • Singleton-baseline filter (footnote 15 of dynamic paper) — drops groups whose D_{g,1} is unique
  • Cohort-recentered analytical variance from Web Appendix Section 3.7.3 of the dynamic paper (subtracts cohort-conditional means, not a grand mean — the load-bearing variance correctness check that has its own dedicated test)
  • Conservative CI under Assumption 8 (independent groups), exact only under iid sampling — documented as a **Note:** in REGISTRY.md

No silent failures

Every drop / round / fallback emits an explicit warnings.warn() or ValueError. Distinct warnings/errors fire on each relevant code path: NaN treatment validation, NaN outcome validation, placebo not-computable, TWFE diagnostic denom == 0, and the never-switching groups filter from variance, plus the multi-switch and singleton-baseline filters and the consolidated A11 warning.

Validation

Test files added

  • tests/test_chaisemartin_dhaultfoeuille.py — 44 tests covering basic API, forward-compat gates (one per Phase 2/3 parameter), drop_larger_lower filter, A11 zero-retention, NaN handling, bootstrap (Rademacher / Mammen / Webb), results dataclass, and the new test_placebo_bootstrap_unavailable_in_phase_1 regression test
  • tests/test_methodology_chaisemartin_dhaultfoeuille.py — 8 fast + 2 slow Tier 2 methodology tests including the hand-calculable 4-group worked example (asserts DID_M = 2.5, DID_+ = 2.0, DID_- = 3.0 exactly), test_cohort_recentering_not_grand_mean (the load-bearing variance correctness check that catches the Add initial diff-diff library implementation #1 implementation bug), TWFE diagnostic correctness, and large-N recovery (slow)
  • tests/test_chaisemartin_dhaultfoeuille_parity.py — 5 R DIDmultiplegtDYN parity scenarios at horizon l = 1: single_switch_mixed, joiners_only, leavers_only, mixed_single_switch, and the hand-calculable worked example
  • tests/test_prep_dgp_reversible.py — 38 tests for the new generate_reversible_did_data() generator (shape, patterns, reproducibility, true-effect column, switcher-type column, validation errors)
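The quoted worked-example numbers are internally consistent, since DID_M is the switcher-count-weighted average of DID_+ and DID_-. Equal joiner and leaver counts are assumed here for illustration (the test file holds the exact panel):

```python
# Consistency check on the worked-example values asserted above.
did_plus, did_minus = 2.0, 3.0
n_joiners, n_leavers = 1, 1   # assumed equal weights for illustration

did_m = (n_joiners * did_plus + n_leavers * did_minus) / (n_joiners + n_leavers)
print(did_m)  # 2.5
```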

Test files modified

  • tests/conftest.py — added _check_r_dcdh_available(), r_dcdh_available session fixture, and require_r_dcdh per-test fixture parallel to the existing r_available / require_r pattern, with DIFF_DIFF_R=skip env-var override

R parity reference

  • benchmarks/R/generate_dcdh_dynr_test_values.R — mirror generator + 5 scenarios, generates golden values via did_multiplegt_dyn(..., effects=1) from R DIDmultiplegtDYN v2.3.3
  • benchmarks/data/dcdh_dynr_golden_values.json — committed golden values (~73 KB)
  • The R script also exercises the polars backend (installed from r-universe) that DIDmultiplegtDYN >= 2.x requires
  • The Python parity test fixture skips cleanly when R or DIDmultiplegtDYN is unavailable, so the test suite still runs on dev machines without R installed

Documentation surface

Per docs/doc-deps.yaml for the new chaisemartin_dhaultfoeuille group:

  • docs/methodology/REGISTRY.md — full ## ChaisemartinDHaultfoeuille section with assumption checks, estimator equations, the four **Note:** blocks (conservative CI, drop_larger_lower, A11 zero-retention, singleton-baseline filter), the **Note (deviation from R DIDmultiplegtDYN):** block, edge cases, requirements checklist
  • docs/api/chaisemartin_dhaultfoeuille.rst — autoclass directives + autofunction for chaisemartin_dhaultfoeuille() and twowayfeweights(), with usage examples
  • docs/api/index.rst — toctree entry; docs/api/prep.rst — generate_reversible_did_data autofunction
  • docs/choosing_estimator.rst — Quick Reference row, decision-flowchart question, Survey Compatibility Matrix row (-- in all four columns), Detailed Guidance subsection
  • docs/practitioner_decision_tree.rst — new "Reversible Treatment (On/Off Cycles)" section with business-language framing
  • docs/llms.txt and docs/llms-full.txt — bullet + ~100-line section
  • README.md — features bullet, alias table entry, full usage section
  • CHANGELOG.md — ### Added entry under ## [Unreleased]
  • ROADMAP.md — Phase 1 items (1a–1i) flipped from "Not started" to "Shipped"; Phase 2 / Phase 3 retained

Test count

96 dCDH tests passing locally (44 API + 8 methodology fast + 2 methodology slow [deselected by default] + 5 R parity + 38 generator – overlap = 95 deselectable + 1 placebo regression = 96 total). The hand-calculable 4-group worked example produces DID_M = 2.5 exactly matching R DIDmultiplegtDYN.

No tutorial in this PR

Per the multi-phase plan, a single comprehensive dCDH tutorial ships in Phase 3 after covariate support is added. Phase 1 deliberately ships no tutorial.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

igerber and others added 3 commits April 11, 2026 15:50
Phase 1 of the de Chaisemartin-D'Haultfoeuille estimator family — the only
modern staggered DiD estimator in the library that handles non-absorbing
(reversible) treatments. Treatment can switch on AND off over time
(marketing campaigns, seasonal promotions, on/off policy cycles).

Implements DID_M from de Chaisemartin & D'Haultfoeuille (2020) AER,
equivalently DID_1 (horizon l=1) of the dynamic companion paper
(NBER WP 29873).

Ships:
- Headline DID_M point estimate with cohort-recentered analytical SE
  from Web Appendix Section 3.7.3 of the dynamic companion paper
- Joiners-only (DID_+) and leavers-only (DID_-) decompositions
- Single-lag placebo DID_M^pl (Theorem 4 of AER 2020); analytical
  placebo SE deferred to Phase 2 (NaN with explicit warning)
- Optional multiplier bootstrap clustered at group level
  (Rademacher / Mammen / Webb)
- TWFE decomposition diagnostic from Theorem 1 of AER 2020 (per-cell
  weights, fraction negative, sigma_fe, beta_fe), exposed both as
  estimator field and standalone twowayfeweights() helper
- Multi-switch group filtering (drop_larger_lower=True default,
  matches R DIDmultiplegtDYN); singleton-baseline filter (footnote 15
  of dynamic paper); never-switching groups filter from variance —
  all with explicit warnings (no silent failures)
- Forward-compatible fit() signature: Phase 2 (multi-horizon event
  study, aggregate, L_max) and Phase 3 (covariate adjustment via
  controls, group-specific linear trends, HonestDiD) parameters
  raise NotImplementedError with phase pointers
- generate_reversible_did_data() generator with 7 patterns (single_switch,
  joiners_only, leavers_only, mixed_single_switch, random, cycles, marketing)
- Validated against R DIDmultiplegtDYN v2.3.3 at horizon l=1 via
  tests/test_chaisemartin_dhaultfoeuille_parity.py (5 scenarios)

Methodology notes documented in REGISTRY.md:
- Conservative CI under Assumption 8 (independent groups), exact only
  under iid sampling
- A11 zero-retention convention (paper-faithful, with consolidated warning)
- drop_larger_lower default and inconsistency warning when False
- Singleton-baseline filter (footnote 15)
- Phase 1 placebo SE intentionally NaN (deferred to Phase 2)
- Note (deviation from R DIDmultiplegtDYN): Python uses period-based
  stable-control sets matching AER 2020 Theorem 3 literally; R uses
  cohort-based controls from the dynamic paper. Agree exactly on
  pure-direction panels and the worked example; differ by ~1% on
  mixed-direction panels with both joiners and leavers.

Tests: 95 passing locally (44 API, 8 fast methodology + 2 slow Tier 2,
38 generator, 5 R parity). Hand-calculable 4-group worked example
produces DID_M = 2.5 exactly matching R.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s API

Methodology / inference:
- Bootstrap-updated overall/joiners/leavers inference now goes through
  safe_inference() instead of inline t_stat = effect / se. Matches the
  project anti-pattern rule and keeps NaN behavior consistent with the
  rest of the library.

Placebo doc/code alignment:
- Phase 1 placebo bootstrap is deferred to Phase 2 (the dynamic companion
  paper Section 3.7.3 derives the cohort-recentered analytical variance
  for DID_l only; the placebo's influence-function representation is not
  yet implemented). The bootstrap path covers DID_M, DID_+, and DID_-
  only.
- Updated the source warning text, the placebo_inputs comment, README,
  CHANGELOG, REGISTRY, choosing_estimator.rst, llms-full.txt to drop
  the misleading "set n_bootstrap > 0 for a placebo SE today" promise
  and state clearly that placebo inference fields stay NaN even with
  bootstrap enabled.
- Added test_placebo_bootstrap_unavailable_in_phase_1 to pin this
  contract so future contributors do not silently widen the bootstrap
  surface without wiring up the documented Phase 2 derivation.

twowayfeweights() public API:
- Added ValueError validation for missing outcome/group/time/treatment
  columns (was treatment-only).
- Updated README, docs/api/chaisemartin_dhaultfoeuille.rst, and
  docs/llms-full.txt examples to pass outcome="outcome" so the
  documented usage actually runs.

set_params():
- Re-runs the same __init__ validation rules so invalid parameter
  combinations cannot slip in after construction.

Tests:
- 96 passing locally (was 95). The new placebo test runs alongside
  TestBootstrap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Final doc inconsistency caught on re-review: practitioner_decision_tree.rst
still said "set n_bootstrap > 0 for a bootstrap-based placebo SE today" while
README, REGISTRY, choosing_estimator, and llms-full had been updated to
reflect that placebo bootstrap is deferred to Phase 2 and placebo inference
fields stay NaN even when n_bootstrap > 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: fc881fb78d806e089150f0eba1d584886f845a5e


Overall Assessment

⚠️ Needs changes

This PR adds a new estimator and touches methodology-critical code paths, but I found three unmitigated P1 issues in the estimation/inference logic. Static review only: I was not able to run the new test suite in this shell because the local Python runtime here is missing project deps such as pandas.

Executive Summary

  • The Phase 1 DID_M / DID_M^pl implementation is using per-cell observation counts n_gt as Theorem 3/4 weights, while the new Methodology Registry defines N_{a,b,t} as counts of (g,t) cells. That silently changes the estimand on individual-level inputs with uneven cell sizes.
  • The analytical SE path does not implement the registry’s full U^G_g = (Lambda^G_{g,l=1} * Y_g).sum() construction; it explicitly substitutes a first-switch-only “Phase 1 simplification,” and the bootstrap reuses that same vector.
  • The singleton-baseline filter is applied to the estimation sample, although the registry documents it as a variance-stage filter. Under the documented period-based stable-control deviation, that can change DID_M, not just the variance.
  • The new methodology-reference files linked from the roadmap/doc-deps are not present in the repo, and the added tests do not actually validate SE parity or the claimed “cohort recentering vs grand mean” distinction.
  • I did not find additional security issues in the changed files.

Methodology

Code Quality

No additional material findings beyond the methodology issues above.

Performance

No material findings in the changed performance-sensitive paths beyond the correctness issues above.

Maintainability

No additional material findings beyond the documentation/test gaps below.

Tech Debt

No mitigating TODO.md entry covers the P1 issues above, so they are not deferrable under the project’s review policy.

Security

No material security findings in the changed files.

Documentation/Tests

  • Severity: P2. Impact: the PR adds methodology-reference links that do not resolve in this repo, and the new tests do not actually validate the claimed inference correctness. ROADMAP.md and docs/doc-deps.yaml now point reviewers to three docs/methodology/papers/chaisemartin-dhaultfoeuille-*.md files, but those files are not present in this checkout. Separately, the supposed “load-bearing” recentering test only asserts that overall_se > 0, and the R parity suite checks point estimates only, not SEs. That leaves the incorrect SE path above effectively unguarded. Concrete fix: commit the referenced methodology-review files, make doc-deps/path resolution fail CI when missing, turn test_cohort_recentering_not_grand_mean into a real λ-vs-grand-mean regression test, and add SE parity assertions for pure-direction scenarios. Refs: ROADMAP.md:L125-L129, docs/doc-deps.yaml:L261-L275, tests/test_methodology_chaisemartin_dhaultfoeuille.py:L162-L251, tests/test_chaisemartin_dhaultfoeuille_parity.py:L122-L168

Path to Approval

  1. Rework the Theorem 3/4 aggregation to use transition-cell counts rather than n_gt observation counts, or explicitly document and test a weighted estimand as a deviation.
  2. Replace the current first-switch-only SE shortcut with the full Lambda^G_{g,l=1} influence-function implementation from Section 3.7.3, and use that same vector for the bootstrap path.
  3. Move the singleton-baseline filter out of the point-estimate sample unless you also document it as a point-estimate deviation and show parity/hand-worked evidence for the altered estimator.
  4. Commit the missing methodology-review markdown files and strengthen the new tests so they actually fail on grand-mean recentering, wrong λ construction, and SE mismatches versus R on pure-direction designs.

Addresses three methodology issues in the dCDH Phase 1 implementation:

1. Cell-count weighting (P1): the per-period DID and placebo formulas
   were computing N_{a,b,t} as the sum of within-cell observation counts
   instead of the count of (g, t) cells. This silently changed the
   estimand on individual-level inputs with uneven cell sizes. Fixed by
   replacing n_curr[mask].sum() with int(mask.sum()) and using simple
   .mean() over cells instead of n_curr-weighted averages. The fix
   matches the AER 2020 Theorem 3 cell-count notation literally and
   matches R DIDmultiplegtDYN's default behavior on cell-aggregated
   input. Renamed the results field n_switcher_obs -> n_switcher_cells
   for accuracy. Test data uses balanced 1-obs-per-cell panels so the
   regression test_cell_count_weighting_unbalanced_input is needed to
   guard the unbalanced-input path.

2. Full Lambda^G_{g,l=1} influence function (P1): the analytical SE
   path was using a "first-switch only" shortcut where each switching
   group's U^G_g came from its own switch period only. This omitted the
   periods where switching groups serve as stable controls for other
   cohorts. Replaced with _compute_full_per_group_contributions, which
   sums role-weighted outcome differences (joiner +1, stable_0
   -n_10/n_00, leaver -1, stable_1 +n_01/n_11) across all periods. The
   bootstrap path inherits the corrected vectors automatically. SE
   parity vs R DIDmultiplegtDYN on pure-direction scenarios narrowed
   from ~18% to ~3%; pure-direction parity tests now assert SE within
   5% rtol.

3. Singleton-baseline filter at variance stage only (P1): footnote 15
   of the dynamic paper says singleton-baseline groups have no cohort
   peer for the cohort-recentered variance. The previous code applied
   the filter as a point-estimate filter (dropping the groups before
   the per-period DIDs), which under Python's documented period-based
   stable-control interpretation could change DID_M when a unique-
   baseline always-treated or never-treated group would otherwise serve
   as a valid stable control. Fixed by computing the singleton-baseline
   group set in fit() and passing it to the variance helper, which
   excludes those groups from the cohort enumeration and U vector
   only. The cell DataFrame retains them. Updated
   TestSingletonBaselineFilter to assert the variance-only semantics
   and the warning text.
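The cell-count fix in item 1 boils down to which mean is taken over switching cells. A minimal sketch with illustrative numbers:

```python
# Two switching (g, t) cells with uneven observation counts n_gt.
cell_effects = [1.0, 3.0]
n_gt = [1, 9]   # unbalanced individual-level input

# Old behavior: observation-weighted mean (n_gt as Theorem 3 weights).
obs_weighted = sum(n * e for n, e in zip(n_gt, cell_effects)) / sum(n_gt)

# Fixed behavior: simple mean over cells, reading N_{a,b,t} as
# counts of (g, t) cells per the AER 2020 notation.
cell_count = sum(cell_effects) / len(cell_effects)

print(obs_weighted)  # 2.8
print(cell_count)    # 2.0
```

On balanced one-observation-per-cell panels the two coincide, which is why the dedicated unbalanced-input regression test is needed to guard this path.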

Documentation:
- doc-deps.yaml and ROADMAP.md no longer link to gitignored
  paper review files (REGISTRY.md is the canonical source).
- REGISTRY.md updated with cell-count notation, the full IF formula,
  the variance-only singleton-filter scope, the new SE parity gap
  numbers (3% vs the prior 18%), and a clearer period-vs-cohort
  deviation note.

Tests:
- test_cohort_recentering_not_grand_mean now actually computes both
  formulas on the same data and asserts they differ materially. The
  prior version only checked overall_se > 0.
- test_cell_count_weighting_unbalanced_input pins the cell-count
  contract on a deliberately unbalanced individual-level panel.
- test_parity_joiners_only and test_parity_leavers_only now assert
  SE parity at 5% rtol (was previously skipped).
- 97 dCDH tests passing (96 + the new cell-count regression).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 0d70f4d0deac473176a1c6fd467e06eb7aa810e4


Overall Assessment

⚠️ Needs changes

Static review only: this shell is missing pytest, pandas, and numpy, so I could not run the new dCDH test suite locally. Assessment is based on two unmitigated P1 findings in methodology-critical paths.

Executive Summary

  • The three prior P1 blockers from the last review appear resolved: Theorem 3 now uses cell counts, the full Lambda^G_{g,l=1} influence-function path is present, and the singleton-baseline filter is now variance-only.
  • P1 [Newly identified]: the analytical variance path still assumes a complete baseline observation and observed adjacent treatment pairs, but the estimator accepts ragged panels; missing periods can be misread as switches or crash cohort-key construction.
  • P1 [Newly identified]: standalone twowayfeweights() bypasses the estimator’s documented validation contract and can silently drop/round malformed treatment/outcome cells.
  • P2: joiner/leaver metadata is mislabeled and partially miscomputed, so summaries and to_dataframe(level="joiners_leavers") can report the wrong sample-size quantities.
  • P2: the parity suite still skips without live R even though committed JSON golden values are already available, which weakens regression coverage for a methodology-critical estimator.

Methodology

Affected methods: dCDH DID_M / DID_+ / DID_- variance path from Web Appendix Section 3.7.3, and the standalone TWFE diagnostic from AER 2020 Theorem 1. Re-review note: the prior findings on cell-count weighting, full IF construction, and variance-only singleton-baseline filtering look addressed in diff_diff/chaisemartin_dhaultfoeuille.py:L511-L554, diff_diff/chaisemartin_dhaultfoeuille.py:L1088-L1096, and diff_diff/chaisemartin_dhaultfoeuille.py:L1237-L1346.

  • Severity: P1 [Newly identified]. Impact: the estimator now tolerates missing (group, time) cells in the point-estimate path, but the cohort/variance path still treats raw NaN entries in D_mat as if they were observed treatment states. D_mat is built with NaN gaps for missing cells, the point-estimate path correctly gates transitions on present, but _compute_cohort_recentered_inputs() ignores that guard and detects first switches from raw D_mat[g, t] != D_mat[g, t-1]; a group missing the first global period can also hit int(np.nan) when cohort keys are built. Step 7 also uses each group’s first observed row, not guaranteed D_{g,1}, for singleton-baseline identification. This can silently misclassify cohorts and distort overall_se / joiners_se / leavers_se, or fail with an opaque error on late-entry groups. Concrete fix: after building N_mat, either explicitly reject groups missing the first global period or drop them with a user-facing warning, and compute first_switch_idx / switch_direction only on adjacent observed pairs (N_mat[:, t] > 0 and N_mat[:, t-1] > 0). Add regression tests for a missing-baseline group and an interior-gap group. Refs: diff_diff/chaisemartin_dhaultfoeuille.py:L524-L528, diff_diff/chaisemartin_dhaultfoeuille.py:L568-L583, diff_diff/chaisemartin_dhaultfoeuille.py:L1078-L1083, diff_diff/chaisemartin_dhaultfoeuille.py:L1443-L1467.

  • Severity: P1 [Newly identified]. Impact: the new public twowayfeweights() helper does not honor the dCDH validation rules documented in the registry and enforced in fit(). fit() explicitly rejects NaN treatment/outcome values, rejects non-binary treatment, and warns before majority-rounding within-cell mixed treatment. twowayfeweights() does none of that; it only coerces treatment numeric, then aggregates with mean()/count() and thresholds d_gt >= 0.5. That means NaN outcomes/treatments can be silently dropped by the aggregation, non-binary treatment can be silently thresholded into {0,1}, and within-cell varying treatment is rounded with no warning, yielding misleading Theorem 1 diagnostics. Concrete fix: factor the validation + cell-aggregation logic into a shared helper used by both fit() and twowayfeweights(), with identical NaN, binary-treatment, and within-cell-rounding behavior, and add tests for each case. Refs: docs/methodology/REGISTRY.md:L470-L473, docs/methodology/REGISTRY.md:L581-L583, diff_diff/chaisemartin_dhaultfoeuille.py:L392-L456, diff_diff/chaisemartin_dhaultfoeuille.py:L1765-L1809.

Code Quality

Performance

  • No material findings in the changed performance-sensitive paths.

Maintainability

  • Severity: P3. Impact: summary() still reports never-switching groups under “Groups dropped before estimation”, but the methodology registry now says n_groups_dropped_never_switching is backward-compat metadata and no longer represents an actual exclusion. That makes the results summary internally inconsistent with the documented estimator contract. Concrete fix: rename that block to something like “Group filter / metadata counts” and label the never-switching count as “reported, not dropped”. Refs: diff_diff/chaisemartin_dhaultfoeuille_results.py:L412-L429, docs/methodology/REGISTRY.md:L476-L476.

Tech Debt

  • No separate tech-debt finding, but TODO.md does not currently track either P1 above, so they are not mitigated under the project’s review rules. Ref: TODO.md:L51-L98.

Security

  • No material security findings in the changed files.

Documentation/Tests

  • Severity: P2. Impact: the parity tests already load committed JSON golden values, but every scenario still takes require_r_dcdh, so the whole regression suite skips when live R / DIDmultiplegtDYN is unavailable. That leaves the main methodology-regression checks inactive on ordinary Python-only environments even though the assertions themselves do not need R once the goldens are committed. Concrete fix: keep the R availability check only for the golden-value generation workflow, and let the JSON-backed parity assertions run whenever benchmarks/data/dcdh_dynr_golden_values.json exists. Refs: tests/test_chaisemartin_dhaultfoeuille_parity.py:L40-L58, tests/test_chaisemartin_dhaultfoeuille_parity.py:L122-L184, tests/conftest.py:L107-L178.

  • Severity: P3. Impact: the docs are internally inconsistent about the dCDH paper-review markdowns. ROADMAP.md now says the primary papers are not committed in-repo, while CHANGELOG.md says three dCDH paper reviews were committed under docs/methodology/papers/. Concrete fix: align the changelog with the roadmap, or add the promised review files. Refs: ROADMAP.md:L127-L129, CHANGELOG.md:L22-L22.

Path to Approval

  1. Make the Section 3.7.3 variance/cohort path robust to ragged panels: either reject groups missing the baseline period with a clear ValueError, or base baseline/switch detection on observed adjacent pairs from N_mat instead of raw NaN comparisons. Add regression tests for a late-entry group and for an interior missing period.
  2. Make twowayfeweights() reuse the same validation and cell-aggregation contract as fit(): reject NaN treatment/outcome, reject non-binary treatment before aggregation, and emit the same warning on within-cell mixed treatment. Add one test for each case.

Addresses two new P1s, two P2s, and two P3s from CI re-review:

P1: ragged panel handling
- Add fit() Step 5a validation: reject groups missing the first global
  period with a clear ValueError listing offenders, and drop groups with
  interior period gaps with an explicit UserWarning. Eliminates the
  silent NaN propagation in _compute_cohort_recentered_inputs that
  could crash the cohort enumeration with int(np.nan) or misclassify
  late-entry groups' baselines.
- Fix singleton-baseline computation in fit() Step 7 to read the
  validated first global period explicitly instead of using
  groupby.first() which returned the first OBSERVED row per group.
- Add defensive presence-gating in _compute_cohort_recentered_inputs:
  the helper now refuses to run if N_mat[:, 0] has any zero entries
  (a fit() validation regression tripwire), and the first-switch
  detection loop only counts transitions between adjacent OBSERVED
  periods. Even with fit() validation in place, the helper is now
  safe to call directly.
- Add 2 regression tests: test_missing_baseline_period_raises_value_error
  and test_interior_gap_drops_group_with_warning.
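The presence-gating above can be sketched in isolation. Since NaN compares unequal to everything, naive adjacent-period comparison misreads a gap as a switch, while gating on observed pairs does not (array names mirror the commit's description but the code is illustrative):

```python
nan = float("nan")
D_mat = [0.0, 0.0, nan, 1.0, 1.0]   # interior gap at index 2
N_mat = [1, 1, 0, 1, 1]             # per-period cell counts; 0 = unobserved

# Naive detection: NaN != anything is True, so the gap itself
# is misread as a first switch.
naive = next((t for t in range(1, 5) if D_mat[t] != D_mat[t - 1]), None)

# Presence-gated detection: only adjacent OBSERVED pairs count.
gated = next(
    (t for t in range(1, 5)
     if N_mat[t] > 0 and N_mat[t - 1] > 0 and D_mat[t] != D_mat[t - 1]),
    None,
)
print(naive)  # 2: spurious switch at the gap
print(gated)  # None: no transition between observed adjacent periods
```

With gating in place, a group like this has no detectable first switch, so it is dropped with the explicit interior-gap warning rather than misclassified.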

P1: twowayfeweights validation refactor
- Extract _validate_and_aggregate_to_cells helper enforcing the dCDH
  validation contract (NaN treatment / NaN outcome / non-binary
  treatment all raise ValueError; within-cell varying treatment
  emits UserWarning before majority rounding).
- Both fit() and twowayfeweights() now call the helper. Single source
  of truth for the validation rules, no drift between the two public
  entry points.
- Add 4 regression tests for twowayfeweights() validation:
  test_twowayfeweights_rejects_nan_treatment,
  test_twowayfeweights_rejects_nan_outcome,
  test_twowayfeweights_rejects_non_binary_treatment,
  test_twowayfeweights_warns_on_within_cell_rounding.

P2: joiner/leaver metadata
- Fix n_joiner_cells = int(n_10_t_arr.sum()) (was count_nonzero counting
  PERIODS, not cells). Same for n_leaver_cells.
- Compute n_joiner_obs and n_leaver_obs as actual observation counts
  (sum of n_gt over the joiner/leaver cells across periods), not as
  cell totals. For balanced one-obs-per-cell panels they equal
  n_*_cells; for individual-level inputs with multiple obs per cell
  they can be larger. Update results dataclass docstrings.

P2: parity tests run on JSON without R
- Decouple golden_values fixture from require_r_dcdh. Tests now run
  whenever the JSON file exists. R is only needed to regenerate the
  JSON via benchmarks/R/generate_dcdh_dynr_test_values.R.
- Verified by running DIFF_DIFF_R=skip pytest tests/test_chaisemartin_dhaultfoeuille_parity.py
  — all 5 parity scenarios PASS (was previously skipping entirely).
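The decoupling above can be sketched as a plain loader that the session fixture wraps; only golden-value regeneration then needs live R. Function and docstring are illustrative, not the committed conftest code:

```python
import json
from pathlib import Path

GOLDEN = Path("benchmarks/data/dcdh_dynr_golden_values.json")

def load_golden_values(path: Path = GOLDEN):
    """Return the committed golden values, or None when not generated."""
    if not path.exists():
        return None
    return json.loads(path.read_text())

# A pytest session fixture can wrap this and call
# pytest.skip("regenerate via benchmarks/R/generate_dcdh_dynr_test_values.R")
# when it returns None, so the parity assertions run wherever the
# JSON exists, with or without R / DIDmultiplegtDYN installed.
```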

P3: summary() label rename
- Rename "Groups dropped before estimation:" to "Group filter /
  metadata counts:". Label never-switching as "(reported, not dropped)".
  Reflects the Round 2 change where never-switching groups participate
  in the variance via stable-control roles.

P3: CHANGELOG/ROADMAP consistency
- Remove the CHANGELOG bullet that claimed three paper review files
  were committed under docs/methodology/papers/. Replace with a
  reference to REGISTRY.md as the canonical methodology surface
  (matching the ROADMAP wording).

Tests: 103 dCDH passing (97 + 6 new). Worked example DID_M = 2.5 still
exact. Pure-direction R parity tests still pass at 1e-4 / 5% rtol.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber


igerber commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: df7a0a389e78d3b82eeb3455948a9adc8d74ca12


Overall Assessment

✅ Looks good

Static review only: I could not run the new dCDH suite in this environment because pytest, numpy, and pandas are not installed.

Executive Summary

Methodology

Methodology cross-check: the PR’s Phase 1 framing is consistent with the cited source material and with the registry’s current-scope definition for DID_M / DID_1; the previously flagged methodology blockers noted above also appear resolved in the implementation. (aeaweb.org)

Code Quality

  • Severity: P2. Impact: to_dataframe(level="joiners_leavers") exposes a single n_obs column whose units differ by row: the DID_M row reports a switcher-cell count, while the DID_+ / DID_- rows report raw observation counts summed over switching cells. That makes the table easy to misread programmatically and breaks the semantic contract of a single count field. Concrete fix: split this into separate n_cells and n_obs columns, or rename the current field so all rows use the same unit consistently. Refs: diff_diff/chaisemartin_dhaultfoeuille_results.py:L645, diff_diff/chaisemartin_dhaultfoeuille.py:L796.

Performance

  • No material findings in the changed performance-sensitive paths.

Maintainability

  • No material maintainability finding beyond the documentation/contract drift noted below.

Tech Debt

  • No separate tech-debt blocker in this diff. The remaining issues are low-severity contract/documentation cleanup, not silent correctness bugs.

Security

  • No material security findings in the changed files. I did not see secrets, unsafe subprocess additions, or new network-facing behavior.

Documentation/Tests

  • Severity: P3. Impact: the results-object docs are stale relative to the implemented Phase 1 contract. DCDHBootstrapResults still says placebo bootstrap outputs are populated when available, but Phase 1 intentionally never surfaces placebo bootstrap inference; n_groups_dropped_never_switching still says never-switchers are excluded from variance, while the registry/code now treat that field as backward-compat metadata only. Concrete fix: update those docstrings to match current runtime behavior. Refs: diff_diff/chaisemartin_dhaultfoeuille_results.py:L45, diff_diff/chaisemartin_dhaultfoeuille_results.py:L237, docs/methodology/REGISTRY.md:L476, docs/methodology/REGISTRY.md:L554, diff_diff/chaisemartin_dhaultfoeuille.py:L952.
  • Severity: P3. Impact: a couple of new comments overstate what the fixtures/docs guarantee. The R golden-value script says set.seed() and NumPy default_rng() produce identical draws, and the reversible DGP doc says Assumption 5 is “automatically satisfied” for binary treatment even though the PR explicitly relies on dropping multi-switch groups for those paths. These are comment-level issues, but they can mislead future maintainers about parity and identification assumptions. Concrete fix: reword both comments to describe structural mirroring rather than RNG identity, and to say the default single-switch path is A5-safe while multi-switch patterns are stress tests for the drop filter. Refs: benchmarks/R/generate_dcdh_dynr_test_values.R:L37, diff_diff/prep_dgp.py:L1877.

…trings)

P2: split joiners_leavers DataFrame into n_cells + n_obs columns
- to_dataframe(level="joiners_leavers") previously had a single n_obs
  column with mixed semantics by row (DID_M used switcher cell count;
  DID_+/DID_- used raw observation counts). Two columns with consistent
  units across all rows: n_cells (count of switching (g, t) cells) and
  n_obs (sum of n_gt over the same cells). DID_M row uses union of
  joiner + leaver cells. Updated test_to_dataframe_joiners_leavers to
  pin the new contract.
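The two-column contract can be sketched on a toy switching-cell table (illustrative column names and values, not the library's internal layout):

```python
import pandas as pd

# Each row is one switching (g, t) cell with its observation count n_gt.
cells = pd.DataFrame({
    "kind": ["joiner", "joiner", "leaver"],
    "n_gt": [3, 2, 4],
})

rows = []
for kind, sub in cells.groupby("kind"):
    rows.append({"row": kind,
                 "n_cells": len(sub),                 # count of switching cells
                 "n_obs": int(sub["n_gt"].sum())})    # observations over them
# The DID_M row covers the union of joiner and leaver cells.
rows.append({"row": "did_m", "n_cells": len(cells),
             "n_obs": int(cells["n_gt"].sum())})
table = pd.DataFrame(rows)

assert table.loc[table["row"] == "joiner", "n_cells"].item() == 2
assert table.loc[table["row"] == "joiner", "n_obs"].item() == 5
assert table.loc[table["row"] == "did_m", "n_obs"].item() == 9
```

Every row now uses the same unit per column, which is what the strengthened test pins.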

P3: stale docstrings on results object
- DCDHBootstrapResults class docstring now states explicitly that
  placebo bootstrap fields ALWAYS remain None in Phase 1 (the previous
  wording said they were "populated when available"). Per-field
  docstrings for placebo_se / placebo_ci / placebo_p_value now point
  back to the class-level note.
- n_groups_dropped_never_switching docstring now reflects the Round 2
  full-IF fix: never-switching groups participate in the variance via
  stable-control roles and the field is reported for backwards
  compatibility only — no actual exclusion happens.
- n_groups_dropped_singleton_baseline docstring clarifies the
  variance-only filter scope (cell DataFrame retains them as
  period-based stable controls).

P3: misleading R-script + prep_dgp comments
- benchmarks/R/generate_dcdh_dynr_test_values.R: clarified that the
  Python and R generators mirror each other STRUCTURALLY (same pattern
  logic, same FE/effect/noise model), not at the RNG level. R's
  set.seed and NumPy's default_rng use different RNGs. Parity tests
  load the R script's golden-value JSON so both sides operate on
  byte-identical input regardless of how it was originally generated.
- prep_dgp.py generate_reversible_did_data: clarified that the default
  single_switch pattern is A5-safe by construction (every group has at
  most one transition). Other patterns (random/cycles/marketing) ARE
  allowed to violate A5 and exist primarily as stress tests for the
  drop_larger_lower=True filter. The cohort-recentered variance
  formula is derived under A5, which is why drop_larger_lower defaults
  to True.

Tests: 103 dCDH passing (no new tests; the existing
test_to_dataframe_joiners_leavers was strengthened to assert the new
n_cells / n_obs contract).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 787eae20a2aab0a52701ab1d6f3ea88ed449e895


Overall Assessment

Needs changes

Static review only: pytest, numpy, and pandas are unavailable in this environment, so I could not execute the new dCDH suite.

Executive Summary

  • The prior re-review items I spot-checked look addressed in the current diff: shared validation in twowayfeweights(), the joiners_leavers count contract, and the previously stale dCDH docstrings/comments.
  • P1 [Newly identified]: the analytical variance path can collapse to overall_se = 0.0 on valid panels where every exact (D_{g,1}, F_g, S_g) cohort is a singleton. The PR’s own 4-group worked example appears to hit that case, while the committed R golden file records overall_se = 1.8027756377.
  • P1 [Newly identified]: the placebo path silently applies A11 zero-retention when placebo switchers lack eligible stable controls, but it does not expose the warning/flags that the main estimator exposes.
  • The documented period-vs-cohort control-set deviation in REGISTRY.md does not mitigate the SE issue; the registry explicitly treats the 4-group worked example as a case where that deviation should not matter.
  • The new tests pin ATT parity on the worked example, but they still do not pin its SE or the placebo A11 warning path, which is why the above issues can escape.

Methodology

Code Quality

  • No separate code-quality finding beyond the methodology defects above.

Performance

  • No material performance findings in the changed paths reviewed.

Maintainability

  • No separate maintainability finding beyond the methodology/test contract issues below.

Tech Debt

  • Severity: P3 informational. Impact: I did not find any TODO.md entry that tracks either methodology issue above, so they remain unmitigated under the project’s deferment rules and still count toward the assessment. Concrete fix: resolve these in code in this PR; only any leftover non-blocking cleanup should be deferred into TODO.md afterward. Refs: TODO.md:L51-L90

Security

  • No material security findings in the changed files reviewed. I did not see secrets, unsafe subprocess additions, or new network-facing behavior.

Documentation/Tests

Path to Approval

  1. Fix the analytical SE path so valid panels with singleton exact (D_{g,1}, F_g, S_g) cohorts do not silently collapse to overall_se = 0.0; if the variance is genuinely unidentified after the intended paper-consistent filtering, surface NaN plus an explicit warning instead.
  2. Make placebo A11 handling explicit by either adding placebo A11 flags plus a consolidated warning or by marking the placebo unavailable when the required placebo control sets are missing.
  3. Add regression coverage for both cases: assert the worked-example overall_se/CI against the committed golden JSON, and add a placebo panel that exercises missing placebo controls on one side.

Addresses two new P1s + a P2 from CI re-review:

P1: degenerate-cohort SE collapses to 0.0
- _plugin_se now returns NaN when sum(U_centered**2) <= 0 (in addition
  to the existing n==0 and divisor<=0 guards). This catches the
  case where every variance-eligible group forms its own
  (D_{g,1}, F_g, S_g) cohort, so cohort recentering produces an
  identically-zero centered IF vector. Returning NaN rather than 0.0
  prevents the silently-implies-infinite-precision failure mode.
- fit() now detects this case after the analytical SE call and emits
  a UserWarning explaining that the variance is unidentified, the
  point estimate is still valid, and the bootstrap path inherits the
  same degeneracy on the same data.
- Worked example test now asserts overall_se is NaN with the warning,
  alongside the existing DID_M = 2.5 exact assertion.
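A sketch of the guard, assuming a simplified plug-in formula (hypothetical `plugin_se`; the real `_plugin_se` takes more inputs):

```python
import math
import numpy as np

def plugin_se(u_centered, divisor):
    """Plug-in SE with the degenerate-cohort guard (sketch, not the library
    function): return NaN instead of 0.0 when the centered influence
    function is identically zero, e.g. when every variance-eligible group
    forms its own (D_{g,1}, F_g, S_g) cohort."""
    u = np.asarray(u_centered, dtype=float)
    if len(u) == 0 or divisor <= 0:
        return float("nan")
    ssq = float(np.sum(u ** 2))
    if ssq <= 0.0:           # cohort recentering zeroed every contribution
        return float("nan")  # variance unidentified, not infinitely precise
    return math.sqrt(ssq) / divisor

assert math.isnan(plugin_se(np.zeros(4), 4.0))   # singleton cohorts -> NaN
assert plugin_se(np.array([1.0, -1.0]), 2.0) > 0
```

Returning NaN rather than 0.0 is what lets `fit()` detect the case and attach the "variance is unidentified" warning.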

P1: placebo A11 silent zeroing
- _compute_placebo now mirrors the main path's A11 distinction:
  "no joiners" stays a natural zero (no flag), while "joiners but no
  3-period stable_0 controls" sets the placebo period contribution
  to zero AND appends a warning string to placebo_a11_warnings.
  Symmetric for leavers/stable_1.
- _compute_placebo's return signature widened from
  Optional[Tuple[float, bool]] to Optional[Tuple[float, bool, List[str]]]
  to expose the A11 warnings to the caller.
- fit() unpacks the new tuple and surfaces a consolidated
  "Placebo (DID_M^pl) Assumption 11 violations" warning when any
  placebo period was zeroed by the A11 contract. Mirrors the main
  DID path's warning format.
- New test_placebo_a11_violation_emits_warning constructs a panel
  where placebo joiners exist but no placebo stable_0 controls do
  (4-group T=3 with 2 joiners + 2 always-treated controls), asserts
  the placebo A11 warning fires.

REGISTRY.md: two new Note blocks
- The degenerate-cohort SE contract (NaN + warning, deviation from R)
- The placebo A11 zero-retention contract (mirrors main DID path)

Tests: 105 dCDH passing (was 103, added 2 new). The existing
test_hand_calculable_4group_3period_joiners_and_leavers wraps fit()
in a warnings.catch_warnings(simplefilter("ignore")) block since the
degenerate-cohort warning now fires on this panel; the dedicated
test_worked_example_se_is_unidentified_with_warning asserts the
warning explicitly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 1e05c9d86fb2cb12451f888fc0e123c3064f2594


Overall Assessment

⚠️ Needs changes

Static review only. I could not execute the new dCDH suite here because pytest, numpy, and pandas are unavailable in this environment.

Executive Summary

  • The prior re-review blockers I spot-checked look addressed: degenerate singleton-cohort analytical SE now surfaces NaN plus a warning instead of 0.0, and placebo A11 zero-retention now emits a consolidated warning, with both behaviors documented and tested.
  • P1 [Newly identified]: ChaisemartinDHaultfoeuille.fit() now errors on groups missing the first global period and drops groups with interior period gaps before computing DID_M, DID_+, DID_-, or DID_M^pl, but the dCDH registry/docs do not document that sample restriction even though the PR also presents DIDmultiplegtDYN as the reference implementation and that manual explicitly supports unbalanced panels. diff_diff/chaisemartin_dhaultfoeuille.py:L534-L607, docs/methodology/REGISTRY.md:L468-L478, docs/methodology/REGISTRY.md:L544-L569, docs/api/chaisemartin_dhaultfoeuille.rst:L18-L43, tests/test_chaisemartin_dhaultfoeuille.py:L462-L507.
  • The documented Phase 1 placebo-SE=NaN contract and the documented degenerate-cohort overall_se=NaN behavior are not blockers.
  • P3: several changed docs/comments still describe never-switching groups as filtered from variance or as true “filter counts,” while the implementation and main registry note now say they are retained and that field is metadata only.

Methodology

Code Quality

  • No separate findings beyond the methodology item above.

Performance

  • No material performance findings in the changed paths reviewed.

Maintainability

  • No separate maintainability findings beyond the documentation drift noted below.

Tech Debt

  • Severity: P3 informational. Impact: I did not find a TODO.md entry tracking the P1 above, so it remains unmitigated under the project’s deferment rules. Concrete fix: resolve or document it in this PR rather than deferring it. TODO.md:L51-L95

Security

  • No material security findings in the changed files reviewed.

Documentation/Tests

Path to Approval

  1. Decide the intended dCDH unbalanced/ragged-panel contract and document it in docs/methodology/REGISTRY.md: if Phase 1 requires first-period observation plus consecutive periods, say so explicitly and label it as a deviation/limitation relative to DIDmultiplegtDYN; otherwise, implement the reference package’s unbalanced-panel handling.
  2. Mirror that contract in the public API/README text that currently advertises DIDmultiplegtDYN parity, so users do not expect R-like behavior on late-entry or interior-gap panels.
  3. Add one regression test that pins the remaining unspoken branch of the contract, e.g. whether terminal missingness / late exit is retained or rejected, so the documented behavior is fully specified.

Documents the dCDH Phase 1 ragged-panel handling contract that was
implemented in Round 3 but not surfaced in any user-facing doc:

- Balanced baseline required: groups missing the first global period
  raise ValueError with offending IDs
- Interior period gaps: groups dropped with UserWarning
- Terminal missingness (early exit / right-censoring): retained,
  contributes from observed periods only via the per-period present
  guard at three sites in the variance computation

This is a Phase 1 limitation relative to R DIDmultiplegtDYN, which
supports unbalanced panels with documented missing-treatment-before-
first-switch handling. Added a new **Note (deviation from R)** block
in REGISTRY.md, mirrored in the API rst, README, and CHANGELOG so
users do not expect R-like behavior on late-entry or interior-gap
panels.
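The three-branch contract can be sketched as follows (hypothetical `check_panel` helper; the real validation lives inside `fit()`):

```python
import warnings
import numpy as np

def check_panel(group_periods, first_period):
    """Sketch of the Phase 1 ragged-panel contract: missing first period
    raises ValueError; an interior gap drops the group with a UserWarning;
    terminal missingness (early exit) is retained. Hypothetical helper."""
    kept = {}
    for g, periods in group_periods.items():
        p = np.sort(np.asarray(periods))
        if p[0] != first_period:
            raise ValueError(f"group {g} is missing the first period {first_period}")
        if np.any(np.diff(p) > 1):
            warnings.warn(f"group {g} has an interior period gap; dropping it",
                          UserWarning)
            continue
        kept[g] = p  # early exit is fine: contributes from observed periods only
    return kept

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = check_panel({"a": [0, 1, 2], "b": [0, 2, 3], "c": [0, 1]}, 0)
assert set(kept) == {"a", "c"}  # "b" dropped for its 0 -> 2 gap
assert len(caught) == 1
```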

Also fixes doc drift on never-switching groups: after the Round 2
full-IF fix they participate in the variance via stable-control roles
and are no longer filtered, but CHANGELOG/README/REGISTRY checklist
and a methodology test comment still described them as filtered. The
n_groups_dropped_never_switching field stays as backwards-compat
metadata only (already documented correctly in the dataclass docstring
since Round 2).

Adds test_terminal_missingness_retained as a regression test pinning
the contract: a group with periods [0, 1, 2] in a 5-period panel is
retained in results.groups, the fit completes without error, and
the per-period DIDs are populated.

Files modified:
- docs/methodology/REGISTRY.md (assumption check + new deviation Note + checklist)
- docs/api/chaisemartin_dhaultfoeuille.rst (Phase 1 panel requirements section)
- README.md (Note block + n_groups_dropped_never_switching field row split)
- CHANGELOG.md (panel requirement sub-bullet + line 17 never-switching fix)
- tests/test_chaisemartin_dhaultfoeuille.py (test_terminal_missingness_retained)
- tests/test_methodology_chaisemartin_dhaultfoeuille.py (fix test comment)

Test count: 105 -> 106 (one new regression test).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: cf639d44a87b2f2c322284bd3a289578c473b546


Overall Assessment

⚠️ Needs changes

Static review only. I could not execute the added dCDH tests in this runner because pytest and numpy are not installed here, although the project declares them and CI installs them in normal jobs. pyproject.toml:L48-L57, .github/workflows/rust-test.yml:L191-L199

Executive Summary

Methodology

The previous ragged-panel methodology blocker appears resolved; I do not see a remaining finding there. docs/methodology/REGISTRY.md:L477-L477, docs/api/chaisemartin_dhaultfoeuille.rst:L52-L72

Code Quality

No additional findings beyond the public cluster no-op. The changed dCDH inference path consistently routes through safe_inference(). diff_diff/chaisemartin_dhaultfoeuille.py:L920-L942, diff_diff/chaisemartin_dhaultfoeuille.py:L1023-L1038

Performance

No material findings in the changed paths I reviewed.

Maintainability

Tech Debt

  • Severity: P3. Impact: I did not find a dCDH entry in TODO.md that would mitigate the cluster issue, so it remains unmitigated under the project’s deferment rules. Concrete fix: resolve or explicitly document/gate the behavior in this PR rather than deferring it. TODO.md:L51-L97

Security

No material security findings in the changed files reviewed.

Documentation/Tests

Path to Approval

  1. Choose a Phase 1 cluster contract and make it explicit in code: either reject any non-None cluster with a clear error, or implement cluster-aware analytical/bootstrap inference end-to-end.
  2. If cluster is intentionally forward-compatible only, add an explicit dCDH **Note:** in docs/methodology/REGISTRY.md and mirror that contract in the public docs so the behavior is not silent.
  3. Add a regression test that pins the chosen cluster behavior, e.g. ChaisemartinDHaultfoeuille(cluster="state").fit(...) should either raise a clear Phase 1 message or exercise documented cluster-aware inference.

…nguage

Resolves three findings from the prior CI review:

P1 — cluster parameter was a public no-op. The constructor exposed
`cluster: Optional[str] = None` and stored it on `self.cluster`, but
neither `fit()` nor `_compute_dcdh_bootstrap()` ever read it, so
`cluster="state"` silently produced the same group-level inference as
`cluster=None`. The Phase 1 contract is now: dCDH always clusters at
the group level (via the cohort-recentered influence function and the
multiplier bootstrap), and any non-None cluster value raises
`NotImplementedError` at construction time. The same gate fires from
`set_params`. Added `test_cluster_parameter_raises_not_implemented`
covering all four entry points (`__init__`, `set_params`, the
`cluster=None` happy path, and the convenience function).
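The gate pattern can be sketched as follows (illustrative class, not the real estimator):

```python
class DCDHSketch:
    """Minimal sketch of the Phase 1 cluster gate: inference is always
    clustered at the group level, so any non-None cluster raises
    NotImplementedError both at construction and via set_params."""

    def __init__(self, cluster=None):
        self._check_cluster(cluster)
        self.cluster = cluster

    def set_params(self, **params):
        if "cluster" in params:
            self._check_cluster(params["cluster"])
        self.__dict__.update(params)
        return self

    @staticmethod
    def _check_cluster(cluster):
        if cluster is not None:
            raise NotImplementedError(
                "Phase 1 always clusters at the group level; "
                "cluster-level inference is planned for a later phase")

DCDHSketch()  # cluster=None happy path
for call in (lambda: DCDHSketch(cluster="state"),
             lambda: DCDHSketch().set_params(cluster="state")):
    try:
        call()
    except NotImplementedError:
        pass
    else:
        raise AssertionError("non-None cluster must raise")
```

Gating at construction time (rather than inside `fit()`) means the user finds out before spending any compute.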

P3 — TWFE diagnostic was running on the post-Step-5a sample but the
inline comment claimed "FULL pre-filter cell dataset" and the
standalone `twowayfeweights()` function used the truly pre-filter
cell. On ragged panels with interior gaps, the two paths diverged.
Fixed by reordering: the TWFE diagnostic now runs as Step 5a (was
5b) BEFORE the ragged-panel validation, which is now Step 5b (was
5a). The blocks are independent — the diagnostic just reads `cell`
while the ragged-panel block mutates it. Both API surfaces now
operate on the same `_validate_and_aggregate_to_cells()` output.
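On a balanced panel the Theorem 1 weights the diagnostic reports can be sketched via two-way demeaning of the treatment, which is equivalent to residualizing on group and period fixed effects in the balanced case (illustrative helper and column names, not the library's `twowayfeweights()`):

```python
import numpy as np
import pandas as pd

def twfe_cell_weights(cell):
    """Residualize treatment on group and period FEs (two-way demeaning,
    valid on a balanced panel), then normalize the residuals over TREATED
    cells. These are the weights TWFE puts on each treated cell's effect;
    some can be negative. Hedged sketch of the Theorem 1 diagnostic."""
    d = cell.pivot(index="group", columns="period", values="treatment").to_numpy(float)
    eps = d - d.mean(axis=1, keepdims=True) - d.mean(axis=0, keepdims=True) + d.mean()
    treated = d == 1
    w = np.where(treated, eps, 0.0)
    return w / w[treated].sum()

cell = pd.DataFrame({
    "group":  [0, 0, 0, 1, 1, 1],
    "period": [0, 1, 2, 0, 1, 2],
    "treatment": [0, 1, 1, 0, 0, 1],  # staggered adoption, two groups
})
w = twfe_cell_weights(cell)
assert abs(w.sum() - 1.0) < 1e-12  # weights sum to one over treated cells
assert (w < 0).any()               # the early adopter's late period is negative
```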

P3 — API rst step 3 said "Filters singleton-baseline groups" which
read as a point-estimate sample drop. After Round 3, singleton-
baseline groups are excluded from the variance computation only
(they remain in the point-estimate sample as period-based stable
controls). Fixed the language to match.

Documentation:
- REGISTRY.md gets a new `**Note (Phase 1 cluster contract):**`
  block in the dCDH section explaining the always-group-level
  semantics and the construction-time NotImplementedError gate.
- API rst step 3 reflects the variance-only singleton-baseline scope.
- The dCDH `cluster` parameter docstring now describes the Phase 1
  contract instead of "Currently unused — analytical SEs are always
  at the group level via the cohort-recentered plug-in."

Test counts: 106 -> 107 (one new cluster gate regression test).

Files modified:
- diff_diff/chaisemartin_dhaultfoeuille.py (cluster gate in __init__
  and set_params, docstring update, TWFE diagnostic block reorder
  with renumbered Step 5a/5b labels)
- docs/methodology/REGISTRY.md (cluster contract Note + rename
  Step 5a -> Step 5b in the ragged-panel deviation Note)
- docs/api/chaisemartin_dhaultfoeuille.rst (singleton-baseline
  language fix in step 3 of the overview)
- tests/test_chaisemartin_dhaultfoeuille.py
  (test_cluster_parameter_raises_not_implemented in
  TestForwardCompatGates; rename Step 5a -> Step 5b in three
  ragged-panel test docstrings)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 83cc093c82faeaac67b9dadceeefbc3dfe10b5e9


Overall Assessment

✅ Looks good

This re-review clears the prior P1 blocker: cluster is no longer a silent no-op in Phase 1, and I did not find a new unmitigated P0/P1 in the changed dCDH estimator or its supporting files. Static review only; I could not execute the added tests here because pytest is not installed in this runner.

Executive Summary

Methodology

Affected methods: Phase 1 dCDH DID_M, DID_+, DID_-, DID_M^pl, plus the standalone Theorem 1 twowayfeweights() diagnostic.

No unmitigated P0/P1 findings. The changed implementation matches the registry on the core Theorem 3 / Theorem 4 formulas, the Section 3.7.3 cohort-recentered variance, and the explicitly documented deviations from R, so I am not treating those documented differences as defects (docs/methodology/REGISTRY.md:L480, docs/methodology/REGISTRY.md:L517, docs/methodology/REGISTRY.md:L565, docs/methodology/REGISTRY.md:L569, diff_diff/chaisemartin_dhaultfoeuille.py:L785, diff_diff/chaisemartin_dhaultfoeuille.py:L1506).

Code Quality

No material findings. The changed dCDH inference path uses safe_inference() rather than inline t-stat / p-value / CI logic, so I did not find new partial-NaN or inline-inference anti-patterns (diff_diff/chaisemartin_dhaultfoeuille.py:L950, diff_diff/utils.py:L152).

Performance

No material findings from static review. I did not benchmark in this runner.

Maintainability

Tech Debt

  • Severity: P3 (tracked in TODO.md). Impact: the broader gap that allowed the .txt AI guide to drift is already tracked: .txt documentation is outside current smoke validation, so this does not block the PR under the repo’s deferment rules (TODO.md:L93, docs/llms-full.txt:L238).
    Concrete fix: no blocker for this PR; follow the existing TODO by extending doc-snippet or consistency checks to docs/*.txt.

Security

No security findings in the changed files reviewed.

Documentation/Tests

Static only here: I could not run the added tests because pytest is unavailable in this runner. The changed suite itself does cover the key edge cases from the checklist, including missing baseline, interior gaps, terminal missingness, A11 zero-retention, placebo bootstrap deferral, and R golden-value parity (tests/test_chaisemartin_dhaultfoeuille.py:L509, tests/test_chaisemartin_dhaultfoeuille.py:L528, tests/test_chaisemartin_dhaultfoeuille.py:L556, tests/test_chaisemartin_dhaultfoeuille.py:L690, tests/test_chaisemartin_dhaultfoeuille.py:L922, tests/test_chaisemartin_dhaultfoeuille_parity.py:L123).

Two P3 doc-drift cleanups from the prior CI review.

P3 — Bootstrap helper docstring drift. The dCDH bootstrap mixin
documented the re-aggregation divisor as a "count of contributing
groups" with the formula `W @ u_centered / n_groups_target`, but
the live code (and the registry) use switching-cell denominators
(`N_S`, joiner cells, leaver cells) per Theorem 3 of AER 2020. The
numbers were correct; only the docstring drifted. Renamed the public
mixin parameter from `n_groups_overall` to `divisor_overall` and
updated the docstring to describe the divisor as the switching-cell
count from the Theorem 3 weighting formula. The joiners/leavers
tuple description in the docstring also says `(u_centered, divisor,
original_effect)` instead of `(u_centered, n_groups, original_effect)`
to match. The fit() call site at line 1041 passes `divisor_overall=N_S`
under the new name.
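The re-aggregation formula `W @ u_centered / divisor` can be sketched with Rademacher multipliers (illustrative function; the real mixin also supports Mammen and Webb weights and the joiner/leaver tuples):

```python
import numpy as np

def multiplier_bootstrap_se(u_centered, divisor, n_boot=2000, seed=0):
    """Rademacher multiplier bootstrap around the centered influence
    function: each draw reweights groups by +/-1 and re-aggregates with the
    same switching-cell divisor. Hedged sketch, not the library's mixin."""
    rng = np.random.default_rng(seed)
    u = np.asarray(u_centered, dtype=float)
    W = rng.choice([-1.0, 1.0], size=(n_boot, len(u)))  # Rademacher weights
    draws = W @ u / divisor
    return float(draws.std(ddof=1))

u = np.array([1.0, -2.0, 0.5, 0.5])
se = multiplier_bootstrap_se(u, divisor=4.0)
# For independent +/-1 weights the draw variance is sum(u^2) / divisor^2.
assert abs(se - np.sqrt((u ** 2).sum()) / 4.0) < 0.05
```

The divisor is a switching-cell count (N_S, joiner cells, or leaver cells per Theorem 3), not a group count, which is exactly the drift the docstring fix corrects.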

P3 — `docs/llms-full.txt` line 238 said the dCDH `cluster` parameter
was "Reserved for future use," which is stale after Round 7's
NotImplementedError gate. Updated to "Phase 1: must be None; non-None
raises NotImplementedError" to match REGISTRY.md and the constructor
contract.

No behavior change; both fixes are purely textual / parameter
naming. 107 tests still pass; black, ruff clean.

Files modified:
- diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py
  (rename n_groups_overall -> divisor_overall, expanded docstring
  describing the cell-count divisor and the formula `W @ u_centered
  / divisor`)
- diff_diff/chaisemartin_dhaultfoeuille.py
  (call site at line 1041 updates the keyword argument to match)
- docs/llms-full.txt
  (cluster comment matches the Phase 1 contract)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 8cafae13ad3d7d1516c7a198b9331730dd5414fb


Overall Assessment

⚠️ Needs changes — one unmitigated P1 finding.

Executive Summary

  • Previous blocker resolved: cluster is now rejected in both __init__ and set_params, matching the Phase 1 contract.
  • Affected methodology here is the dCDH DID_M / placebo surface plus the AER 2020 TWFE decomposition diagnostic, with analytical variance mapped to the intertemporal DiD working paper. (aeaweb.org)
  • Severity P1 [Newly identified]: fitted twfe_* outputs are computed on the pre-filter cell sample, while overall_att, results.groups, and inference use the post-filter sample. When interior-gap or multi-switch filters fire, the diagnostic is not actually from the “same fit” / “same data” as documented.
  • That deviation is not documented in docs/methodology/REGISTRY.md and is not tracked in TODO.md, so it remains unmitigated.
  • Static review only: this runner does not have numpy, pandas, or pytest, so I could not execute the added tests.

Methodology

Affected methodology: the dCDH DID_M / placebo estimator family and the TWFE decomposition diagnostic introduced by de Chaisemartin and D’Haultfoeuille, plus the dynamic-paper variance mapping. (aeaweb.org)

Beyond that sample-contract issue, I did not find another unmitigated methodology defect in the changed DID_M, placebo, or cohort-recentered inference paths.

Code Quality

No findings. I did not see a new inline-inference or partial-NaN regression; the changed estimator path routes inference through diff_diff/utils.py:L152.

Performance

No separate findings from static review. The blocker above also does unnecessary TWFE work on rows later dropped, but the correctness/contract problem is the part that needs action.

Maintainability

No separate findings.

Tech Debt

No separate deferrable finding. The P1 above is not tracked under the TODO.md “Tech Debt from Code Reviews” section, so it is not mitigated by project deferment rules.

Security

No findings.

Documentation/Tests

Path to Approval

  1. Make the fitted twfe_* results use the same filtered cell sample as overall_att and results.groups, or explicitly split pre-filter vs post-filter diagnostics and warn when they differ.
  2. Add regression coverage for both filter paths that change the estimation sample: one interior-gap drop case and one drop_larger_lower=True multi-switch-drop case.
  3. Update the dCDH docstring, results docstring, API docs, and estimator-selection docs so their “same fit” / “same data” wording matches the implemented contract.

Documents the dCDH TWFE diagnostic sample contract that Round 7's
swap left implicit. The fitted results.twfe_* values are computed on
the FULL pre-filter cell sample (matching the standalone
twowayfeweights() function), NOT on the post-filter estimation
sample used by overall_att / results.groups / inference fields. The
existing user-facing wording said "TWFE on the same data" /
"diagnostic from the same fit" — phrases that naturally read as
"same data as overall_att" — which contradicted the post-Round-7
behavior. This commit:

1. Adds a new `**Note (TWFE diagnostic sample contract):**` block in
   REGISTRY.md enumerating all three sample-shaping filters
   (interior-gap, multi-switch, singleton-baseline) and explicitly
   carving singleton-baseline as variance-only (no fitted-vs-overall_att
   mismatch, so no warning).

2. Rewrites the `twfe_diagnostic` parameter docstring in
   chaisemartin_dhaultfoeuille.py to describe the pre-filter contract
   and the divergence warning.

3. Rewrites the twfe_weights / twfe_fraction_negative / twfe_sigma_fe
   / twfe_beta_fe field docstrings in the results dataclass to clarify
   they describe the FULL pre-filter cell sample, with a pointer to
   the REGISTRY contract Note.

4. Adds a `UserWarning` from `fit()` whenever the user requested the
   TWFE diagnostic AND any of the interior-gap or multi-switch filters
   dropped groups. The warning explains the divergence with explicit
   counts and points at REGISTRY for the rationale. The warning fires
   regardless of whether the diagnostic itself succeeded or hit the
   rank-deficient fallback (the plan-review correctly flagged that the
   `twfe_diagnostic_payload is not None` guard would swallow the rare
   rank-deficient + filtered-panel intersection — dropped that guard).

5. Updates docs/api/chaisemartin_dhaultfoeuille.rst and
   docs/choosing_estimator.rst to replace "from the same fit" with
   "computed on the data you pass in (pre-filter)".

6. Adds three regression tests in TestTwowayFeweightsHelper:
   - test_twfe_pre_filter_contract_with_interior_gap_drop: panel with
     a dropped interior-gap group, asserts fitted twfe_* matches
     standalone, estimation sample is smaller, and the divergence
     warning fires with the expected counts.
   - test_twfe_pre_filter_contract_with_multi_switch_drop: panel with
     an injected multi-switch crosser, similar assertions.
   - test_twfe_no_divergence_warning_on_clean_panel: negative test
     asserting NO divergence warning fires on a clean panel
     (hard-codes pattern="single_switch" to close a future footgun).

7. Fixes the stale "Step 5a guarantees..." comment at line 712 to
   "Step 5b guarantees..." (post-Round-7 the ragged-panel validation
   is Step 5b, not Step 5a). Independent cleanup; bundled because
   it's in the same file and the same topic.
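The divergence warning described in item 4 can be sketched as follows. This is a minimal standalone sketch, not the actual implementation: the function name and parameters are hypothetical, and in the real change the check lives inside `fit()` at Step 6b.

```python
import warnings


def warn_if_twfe_sample_diverges(twfe_requested, n_dropped_interior_gap,
                                 n_dropped_multi_switch):
    """Hypothetical sketch of the item-4 divergence warning.

    Fires whenever the TWFE diagnostic was requested AND any filter
    dropped groups -- regardless of whether the diagnostic itself
    succeeded or hit the rank-deficient fallback.
    """
    n_dropped = n_dropped_interior_gap + n_dropped_multi_switch
    if twfe_requested and n_dropped > 0:
        warnings.warn(
            f"TWFE diagnostic was computed on the full pre-filter cell "
            f"sample, but {n_dropped} group(s) were dropped from the dCDH "
            f"estimation sample ({n_dropped_interior_gap} interior-gap, "
            f"{n_dropped_multi_switch} multi-switch). See the REGISTRY "
            f"sample-contract Note for the rationale.",
            UserWarning,
        )
```

On a clean panel (no dropped groups) the function is silent, which matches the negative test in item 6.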

This resolution preserves Round 7's standalone-vs-fitted parity
(both APIs use the pre-filter cell sample) and addresses Round 9's
P1 about the documentation contract. Both reviewers' concerns are
now satisfied: the standalone and fitted produce identical numbers
on the same input, AND users see an explicit warning when filters
make the fitted sample diverge from the dCDH estimation sample.

Test counts: 107 -> 110 (three new sample-contract regression
tests). Black, ruff clean.

Files modified:
- docs/methodology/REGISTRY.md
  (new TWFE sample contract Note enumerating all three filters)
- diff_diff/chaisemartin_dhaultfoeuille.py
  (twfe_diagnostic param docstring, n_groups_dropped_interior_gap
  tracking, divergence warning at Step 6b, stale comment fix)
- diff_diff/chaisemartin_dhaultfoeuille_results.py
  (twfe_weights / twfe_fraction_negative / twfe_sigma_fe /
  twfe_beta_fe field docstrings)
- docs/api/chaisemartin_dhaultfoeuille.rst (wording fix)
- docs/choosing_estimator.rst (wording fix)
- tests/test_chaisemartin_dhaultfoeuille.py (3 new tests + 1
  parity test comment update)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 63edb3d9041ef37a53b58a34b6de32539ecfb2bd


Overall Assessment

Needs changes. One unmitigated P1 finding.

Executive Summary

Methodology

Code Quality

No unmitigated findings beyond the bootstrap inference-surface issue above.

Performance

No separate performance findings from static review.

Maintainability

No separate maintainability findings from the changed code.

Tech Debt

No separate deferrable finding. The bootstrap inference-surface mismatch is not tracked in TODO.md, so it remains unmitigated. See TODO.md:L51-L110.

Security

No findings.

Documentation/Tests

  • Severity P2. Impact: the new bootstrap tests only verify that bootstrap_results is populated and that placebo bootstrap stays unavailable; they do not assert that the top-level public fields (overall_p_value, overall_conf_int, joiner/leaver analogues, event_study_effects[1]) or user-facing renderers (summary(), to_dataframe()) actually reflect bootstrap inference. That gap is why the P1 above would pass CI. Separately, the README currently describes the top-level overall_* fields as analytical, which conflicts with the runtime behavior once bootstrap is requested. Concrete fix: add a regression that compares the top-level inference fields and rendered summaries/tables against bootstrap_results when n_bootstrap > 0, and update the README / API docs / result docstrings to state exactly which fields switch to bootstrap inference. Refs: tests/test_chaisemartin_dhaultfoeuille.py:L867-L955, README.md:L1204-L1214, docs/api/chaisemartin_dhaultfoeuille.rst:L196-L206.

Path to Approval

  1. In diff_diff/chaisemartin_dhaultfoeuille.py:L1108-L1122, propagate bootstrap p-values and CIs from br into the top-level overall_*, joiners_*, and leavers_* result fields instead of recomputing them with safe_inference().
  2. Add a regression test that fits with n_bootstrap > 0 and asserts the public inference surface (overall_*, joiners_*, leavers_*, event_study_effects[1]) matches bootstrap_results.
  3. Add user-facing coverage that summary() and to_dataframe() surface the corrected bootstrap inference, and update the dCDH docs so the semantics of the top-level fields are explicit when bootstrap is enabled.

Fixes a P1 where the dCDH bootstrap branch silently replaced the
multiplier-bootstrap percentile p-values and CIs with normal-theory
recomputations from the bootstrap SE. The bootstrap helper computes
the per-target SE / CI / p-value triples correctly on the
DCDHBootstrapResults object, but fit() was only copying the SE and
then calling safe_inference() which returns normal-theory p-values
and CIs. The user requested multiplier bootstrap inference and got
a hybrid (bootstrap SE + normal-theory p/CI) — the public surface
fields, event_study_effects[1], summary(), to_dataframe(),
is_significant, and significance_stars all reflected the
normal-theory recomputations instead of the bootstrap inference.

The fix: in the fit() bootstrap branch, propagate br.overall_p_value
and br.overall_ci directly to the top-level overall_p_value /
overall_conf_int (and the joiners/leavers analogues). Keep the t-stat from
safe_inference()[0] since percentile bootstrap doesn't define an
alternative t-stat semantic, and that satisfies the project
anti-pattern rule (never compute t = effect/se inline).
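As a rough sketch of this propagation contract, using plain dicts in place of the real result and DCDHBootstrapResults objects (the helper name is hypothetical; the field names follow the ones quoted above):

```python
def propagate_bootstrap_inference(top_level, br, se_based_t_stat):
    """Hypothetical sketch: copy bootstrap percentile p-values and CIs
    onto the public surface, keeping the SE-derived t-stat (percentile
    bootstrap defines no alternative t-stat semantic)."""
    for target in ("overall", "joiners", "leavers"):
        # p-value and CI come straight from the bootstrap results,
        # NOT recomputed via normal theory from the bootstrap SE.
        top_level[f"{target}_p_value"] = br[f"{target}_p_value"]
        top_level[f"{target}_conf_int"] = br[f"{target}_ci"]
    # t-stat stays SE-derived, per the project anti-pattern rule.
    top_level["overall_t_stat"] = se_based_t_stat
    return top_level
```

The point of the sketch is the asymmetry: p/CI are copied, the t-stat is not.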

Library precedent: imputation.py:790-805, two_stage.py:778-787, and
efficient_did.py:1009-1013 all propagate bootstrap p/CI to the
public surface while keeping a SE-derived t-stat. dCDH was the only
modern bootstrap-enabled estimator that didn't follow this pattern.

Documentation updates:
- New `**Note (bootstrap inference surface):**` block in REGISTRY.md
  documenting the propagation contract, the rationale for the
  SE-based t-stat, and the placebo carve-out (placebo bootstrap
  remains deferred to Phase 2).
- Inference-method switch paragraph added to the
  ChaisemartinDHaultfoeuilleResults class docstring `Notes` section.
- README.md row updated to clarify "cohort-recentered analytical SE
  by default; multiplier-bootstrap percentile inference when
  n_bootstrap > 0".
- API rst Multiplier bootstrap example now shows that
  results.overall_p_value and results.overall_conf_int reflect the
  bootstrap inference (not just bootstrap_results.overall_*).

Regression test added: test_bootstrap_p_value_and_ci_propagated_to_top_level
asserts results.overall_p_value == bootstrap_results.overall_p_value,
overall_conf_int == bootstrap_results.overall_ci, the joiners/leavers
analogues, event_study_effects[1] reflects bootstrap, the t-stat is
the SE-derived value, summary() doesn't crash, and to_dataframe()
returns the bootstrap-derived numbers. This pins the contract so
the silent inference-method swap can't regress.

Test counts: 110 -> 111 (one new bootstrap propagation regression
test). The existing TestBootstrap tests at lines 855-955 only
asserted that bootstrap_results was populated and SE was finite,
which is why the silent swap passed CI for nine commits.

Files modified:
- diff_diff/chaisemartin_dhaultfoeuille.py
  (fit() bootstrap branch propagates p/CI from br instead of
  recomputing via safe_inference)
- diff_diff/chaisemartin_dhaultfoeuille_results.py
  (class docstring `Notes` section gains the inference-method
  switch paragraph)
- docs/methodology/REGISTRY.md
  (new bootstrap inference surface Note)
- README.md (field-description row clarification)
- docs/api/chaisemartin_dhaultfoeuille.rst
  (Multiplier bootstrap example reflects the new propagation)
- tests/test_chaisemartin_dhaultfoeuille.py
  (test_bootstrap_p_value_and_ci_propagated_to_top_level)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 819ba5789e3ce411dd64066126eda1bc99a99208


Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings remain in the changed dCDH code. Static review only; I could not run the tests because this runner is missing pytest, numpy, and pandas.

Executive Summary

  • The previous P1 bootstrap-inference blocker appears fixed. fit() now propagates bootstrap percentile p_value and conf_int into the top-level overall_*, joiners_*, and leavers_* fields, rather than silently recomputing normal-theory inference from the bootstrap SE: diff_diff/chaisemartin_dhaultfoeuille.py:L1090-L1136.
  • That bootstrap contract is now documented in both the results docstring and the Methodology Registry, so the main inference surface is aligned with the stated methodology: diff_diff/chaisemartin_dhaultfoeuille_results.py:L141-L158, docs/methodology/REGISTRY.md:L561-L565.
  • The substantive dCDH deviations from the papers or R that I checked are documented in REGISTRY.md with Note / Note (deviation from R ...) labels, so under the review rules they are informational rather than defects.
  • Severity P2: summary() still prints the analytical-CI caution unconditionally, even when n_bootstrap > 0 and the displayed p_value / conf_int are bootstrap-based. That is a reporting/docs mismatch, not a methodology correctness bug: diff_diff/chaisemartin_dhaultfoeuille_results.py:L522-L526.
  • The new regression covers numeric bootstrap propagation to top-level fields and to_dataframe(), but it only smoke-tests summary(), which is why the text mismatch can slip through: tests/test_chaisemartin_dhaultfoeuille.py:L974-L1049.

Methodology

Code Quality

  • No unmitigated findings in the changed code from static review.

Performance

  • No findings from static review.

Maintainability

  • No separate findings beyond the documentation/reporting mismatch noted below.

Tech Debt

  • No separate findings. The remaining issue is documentation/test debt rather than a correctness blocker.

Security

  • No findings.

Documentation/Tests

P2 fix: summary() was unconditionally printing "dCDH analytical CI
is conservative under Assumption 8" even when n_bootstrap > 0 and
the displayed p_value / conf_int were bootstrap-based. The footer
now branches on bootstrap_results: when bootstrap is active, it
prints "p-value and CI are multiplier-bootstrap percentile inference
(N iterations, weight_type weights)"; when analytical, it keeps the
original conservative-CI note.

Added test assertions pinning both paths:
- test_summary_formats_without_error (analytical mode): asserts
  "analytical CI is conservative" is present and "multiplier-
  bootstrap" is absent
- test_bootstrap_p_value_and_ci_propagated_to_top_level (bootstrap
  mode): asserts "multiplier-bootstrap percentile inference" is
  present and "analytical CI is conservative" is absent

111 tests still pass; black, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber commented Apr 12, 2026

/ai-review

@igerber added the `ready-for-ci` label (Triggers CI test workflows) on Apr 12, 2026
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 3824de8fcf853155ef17631e45ce9ae2771605d7


Overall Assessment

⚠️ Needs changes.

Highest unmitigated severity is P1. This is a static re-review of the changed files; I could not run the tests on this runner because pytest is not installed.

Executive Summary

Methodology

  • Severity P1 [Newly identified]. Impact: _validate_and_aggregate_to_cells() averages raw treatment within each (group, time) cell and then majority-rounds any fractional d_gt to binary, and both fit() and twowayfeweights() consume that helper. The same registry section defines the estimator on binary cell-level treatment paths and transition sets (joiners, stable_0, leavers, stable_1), so majority-rounding a mixed cell mutates switcher/control membership before Theorem 3 / Section 3.7 arithmetic is applied. REGISTRY mentions this only as a generic implementation bullet, not a **Note:** / **Note (deviation...)** block, and there is no dCDH mitigation entry in TODO.md, so this remains an unmitigated methodology defect under the review rubric. Concrete fix: raise ValueError on any fractional aggregated d_gt instead of warning-and-rounding; require users to pre-aggregate to constant binary (group, time) treatment; remove the Phase 1 “binary fuzzy designs” claim until an actual fuzzy dCDH extension is implemented. Refs: diff_diff/chaisemartin_dhaultfoeuille.py:121, diff_diff/chaisemartin_dhaultfoeuille.py:199, diff_diff/chaisemartin_dhaultfoeuille.py:475, docs/methodology/REGISTRY.md:473, docs/methodology/REGISTRY.md:492, README.md:1158, docs/choosing_estimator.rst:232, TODO.md:51

Previous methodology findings from the last review cycle appear addressed: the bootstrap inference surface and summary() note now match the registry/tests. Refs: diff_diff/chaisemartin_dhaultfoeuille.py:1101, diff_diff/chaisemartin_dhaultfoeuille_results.py:523

Code Quality

  • No unmitigated findings from static review.

Performance

  • No findings from static review.

Maintainability

  • Severity P3. Impact: the inline comment immediately above the placebo inference block still says Phase 1 “approximate[s]” placebo SE using the plug-in formula, but the executable code below intentionally leaves placebo inference as NaN. The runtime behavior is correct; the comment is not. Concrete fix: update or remove that stale comment so it matches the warning text and the registry contract. Ref: diff_diff/chaisemartin_dhaultfoeuille.py:1034

Tech Debt

  • No separate tech-debt finding, but there is no dCDH-specific mitigation entry under TODO.md:51, so the P1 above is not downgraded by tracking.

Security

  • No findings.

Documentation/Tests

  • No additional unmitigated documentation/test finding beyond the methodology issue above.
  • I could not execute the new dCDH test files on this runner because pytest is unavailable, so the assessment is based on static review only.

Path to Approval

  1. Change _validate_and_aggregate_to_cells() to reject any fractional aggregated d_gt with ValueError instead of warning-and-rounding, and keep that contract shared by both fit() and twowayfeweights().
  2. Update the Phase 1 docs/docstrings to say treatment must already be constant within each (group, time) cell, and remove or explicitly defer the “binary fuzzy designs” language in README.md:1158, docs/choosing_estimator.rst:232, and docs/methodology/REGISTRY.md:466.
  3. Replace the current mixed-cell warning regression at tests/test_chaisemartin_dhaultfoeuille.py:1521 with rejection tests for both public entry points, and add one positive test showing that already-constant cell-level treatment still fits correctly.
