Norms Behind Closed Doors:
Misperceptions and Maternal Employment in Couples

A Field Experiment in Bogotá, Colombia

Marie Boltz · Monserrat Bustelo · Ana María Díaz · Agustina Suaya

U. Strasbourg / BETA · IADB · Pontificia Universidad Javeriana · IADB

RIDGE Forum · Barbados, May 22, 2026

Pre-registered · AEARCTR-0014648 · Under Review EDCC

Sample · 1,732 couples · Bogotá, Colombia

Paper in a Nutshell

What?

A randomized information intervention that corrects pluralistic ignorance about community support for mothers of young children working (a sharper norm than the "women working" norm studied so far), delivered to both spouses individually.

Where?

1,732 cohabiting couples with at least one child under 6 in Bogotá, Colombia — designed to be representative on observable characteristics benchmarked to census and GEIH data.

How?

WhatsApp chatbot with personalized feedback on the gap between the respondent's belief and the true Bogotá-wide support; endline phone survey 2–3 months later.

Why?

Where institutions and private preferences are already permissive, beliefs about the community become the binding constraint on intra-household decisions about women's careers.

Main results.

Beliefs: the binding misperception (men's beliefs about other men's support) falls by 11 pp**; beliefs about the partner's support rise by about 6 pp** (spillover, both genders).
Intra-household decision: treated men are 23 % more likely (+9.1 pp) to nominate their wife for an online career-development course instead of taking it themselves.
Labor outcomes: treated women search for work / change jobs more (+9.6 pp***, RW p = 0.013); men's work–family balance preference +11 pp*. Aspirations null.
Boundary: the chain runs end-to-end only among labor-attached women — where structural barriers are already low.

Motivation · Pluralistic Ignorance

Pluralistic ignorance is global — and LAC is no exception

Bursztyn, Cappelen, Tungodden, Voena, Yanagizawa-Drott (2023): 60-country survey on support for women working outside the home
In nearly every region, both men and women underestimate how much others support women working — and the gap is remarkably similar in size across contexts (≈ 15–25 pp); LAC is no exception
Our focus is sharper: pluralistic ignorance specifically about mothers of young children working — the norm that binds exactly when the gender gap in employment widens

Pluralistic ignorance — Global and by region, LAC highlighted

Bursztyn et al. (2023). Bars: actual support (orange) vs. perceived support (navy) for women working. Global on the left of the dashed line; regions on the right; LAC framed in green as our setting.

Research Questions

Three Research Questions

RQ 1 · Societal beliefs

Does information on actual community support correct second-order beliefs about the share of men and women who support maternal employment?

RQ 2 · Spousal beliefs

Does correcting community-level misperceptions spill over into updated beliefs about the partner's views on maternal employment?

RQ 3 · Decisions & labor

Do belief updates affect (a) intra-household allocation of a career-building course, and (b) short-run job search and labor outcomes?

Causal pathway

Information → Community & spousal beliefs → Intra-household decision → Labor outcomes

Information corrects second-order beliefs (community first; partner beliefs follow) — which jointly feed the household decision.

Motivation · Why LAC

Why LAC? — high private support, frozen gap, no obvious culprit

The LAC puzzle. Female LFP stalled around 2002 even as fertility fell, education rose, and private support for women's work approached parity (Gasparini & Marchionni 2015; Marchionni, Edo & Berniell 2024). The deceleration is a normative puzzle, not an income or institutional one.

Where the gap concentrates — Colombia, women aged 25–45 (GEIH 2023–25)

Childless women: LFP ≈ 82 % · mothers of children under 6: LFP ≈ 61 % → ~20 pp child penalty
Fathers' LFP is essentially unaffected by having young children → ~25 pp mother-vs-father gap among parents of young children
Hourly wages: mothers earn 14–17 % less per hour than comparable childless women; fathers face no comparable penalty

Consistent with the LAC-wide child-penalty pattern in Berniell, de la Mata, Edo & Marchionni (2021).

The question. If norms — not preferences, not institutions — are the binding constraint, then correcting beliefs about the norm should move behaviour.

Motivation · Unit of analysis

Why couples: a joint decision, shaped by two complementary mechanisms

The context.

~78% of LATAM women live in a partnered household — maternal employment is negotiated within the couple, not chosen unilaterally
Spouses are already mildly optimistic about each other (within-couple gap 1–3 pp); the binding friction is at the community level — both underestimate societal support by 20–30 pp
Surveying both partners lets us measure and control the within-household belief gap — and track how it moves after treatment

① COMMUNITY → PARTNER SPILLOVER

We deliver community information only — nothing about the partner
Yet beliefs about the partner also update
Leading interpretation: the treatment makes the shared community misperception salient and triggers within-couple discussion

② HUSBAND AS GATEKEEPER (Bernhardt et al. 2018)

The perceived social cost of women's work falls on men — the husband's status is what is seen as diminished
Correcting his belief about what other men accept is what moves the household decision — not changing her own attitude

Why complementary. Without ①, the community correction would not trigger discussion within the household nor reach beliefs about the partner; without ②, updated beliefs would not translate into intra-household action. We cannot fully isolate ①'s channel — but the couples design lets us measure how within-household beliefs and decisions move together after treatment.

Section II

Experimental Design

1,732 couples · Bogotá · WhatsApp + phone · Three survey waves

Experimental Design Baseline Facts Results Heterogeneity Conclusions

Experimental Design · Sample

A sample designed to be representative of Bogotá couples with young children

Couples

1,732

Cohabiting heterosexual couples; 3,464 adults surveyed

Eligibility

≥1 child <6

Stage when gender gaps in LFP widen most sharply

Frame

Census & GEIH

Benchmarked to Colombia's population census and the Gran Encuesta Integrada de Hogares on demographics, income, education

Design

1:1

Randomization at couple level; stratified by wife's LFP and husband's first-order belief

Representative of Bogotá households with ≥1 child under 6 on the dimensions that matter for our reference group (education, income, woman's labor attachment) — so the measured norm (89% of fathers / 91% of mothers support maternal employment) is a credible true reference value
Both partners individually surveyed in private (in-person or phone, July–September 2024) — necessary to measure misperceptions about the partner without contamination
Household income: 28% low · 60% middle · 12% high — mirrors city distribution
Three survey waves: Baseline (Jul–Sep 2024) → Midline/WhatsApp (Oct 2024) → Endline by phone (Nov 2024–Jan 2025)

Experimental Design · Stage 1

Stage 1 — Baseline Survey & Belief Elicitation

First-order belief: do you agree mothers should be free to work?
Second-order beliefs: community-level estimates (how many fathers/mothers agree?)
Spousal beliefs: what do you think your partner believes?

Experimental Design · Stage 2

Stage 2 — Randomization at the Couple Level

Unit: couple (both partners receive the same arm) — eliminates within-household spillover from arm assignment
Stratification: wife's labor status, husband's first-order support, presence of children <6
Treatment: WhatsApp chatbot with actual Bogotá-level support; Control: placebo on public-transport subsidies

Experimental Design · Two Arms

What Each Arm Saw — One Norm, Two Topics

Treatment (866 couples)

"Mothers of children under six should be free to work for pay outside the home."

Gender-norm statement (target).

Control (866 couples)

"Companies should subsidize public transport."

Placebo norm — unrelated topic.

Same chatbot, same four steps, same schedule. The only difference is the norm each arm sees. Any T–C gap in downstream behavior is attributable to corrected beliefs about gender norms.

Experimental Design · Treatment Chatbot

Treatment Arm — Personalized Feedback on the Gender Norm

Chatbot mockup (Norms Study), treatment arm, gender norm:

Bot (10:02): Hello 👋 In the baseline survey you answered a question about the statement: "Mothers of children under six should be free to work for pay outside the home."
Bot (10:02): You estimated that out of 100 fathers in Bogotá, 60 agree with this statement.
Bot (10:02): Do you think your estimate matches the true share in Bogotá? [Yes • No • Not sure]
Respondent (10:03): No
Bot (10:03): Actual share in Bogotá (baseline data): Fathers 89 / 100 · Mothers 91 / 100. In fact, 89 out of 100 fathers and 91 out of 100 mothers in Bogotá agree with the statement.
Bot (10:03): How does this information feel to you? [Interesting • Irrelevant • Disappointing]

The four steps (as in the paper)

Step 1 — Recall: shows the respondent's own baseline estimate of fathers' / mothers' agreement
Step 2 — Check: asks whether that estimate matches reality
Step 3 — Reveal: shows the actual share computed from the baseline (as a WhatsApp message with numbers and emojis)
Step 4 — Rate: asks the respondent to rate the discrepancy — interesting / irrelevant / disappointing

The same four steps are repeated for beliefs about men's and women's support, with order randomized.
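The four steps can be sketched as a small message builder. This is an illustrative sketch only: the function names `feedback_steps` and `gap_pp` are hypothetical, the message wording is copied from the mockup on this slide, and nothing here reflects the actual chatbot implementation.

```python
STATEMENT = ("Mothers of children under six should be free to work "
             "for pay outside the home.")

def feedback_steps(group, estimate, actual):
    """Return the four messages (recall, check, reveal, rate) for one group.

    group    -- reference group label, e.g. "fathers" or "mothers"
    estimate -- respondent's baseline guess, out of 100
    actual   -- share measured in the baseline survey, out of 100
    """
    recall = (f"You estimated that out of 100 {group} in Bogotá, "
              f'{estimate} agree with the statement: "{STATEMENT}"')
    check = ("Do you think your estimate matches the true share in Bogotá? "
             "(Yes / No / Not sure)")
    reveal = (f"In fact, {actual} out of 100 {group} in Bogotá "
              "agree with the statement.")
    rate = ("How does this information feel to you? "
            "(Interesting / Irrelevant / Disappointing)")
    return [recall, check, reveal, rate]

def gap_pp(estimate, actual):
    """Signed gap in percentage points; positive means an underestimate."""
    return actual - estimate

# Example from the mockup: a respondent who guessed 60 for fathers.
messages = feedback_steps("fathers", 60, 89)
for message in messages:
    print(message)
print("Gap (pp):", gap_pp(60, 89))
```

Calling the builder once per reference group, in shuffled order, would mirror the randomized ordering described above.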

Experimental Design · Treatment — The Reveal

What Treated Respondents Saw at Step 3 — The Figure

Figure (two bars): share of fathers in Bogotá who agree with the statement "Mothers of children under six should be free to work for pay outside the home."

Your estimate (from the baseline survey): 60 / 100 (60%)

Actual share (measured in Bogotá): 89 / 100 (89%)

+29 pp — fathers in Bogotá support working mothers much more than respondents believe

Repeated for mothers: a second pair of bars shows the respondent's estimate for mothers (≈80%) and the actual (≈91%). Order randomized across respondents.

Experimental Design · Control Chatbot

Control Arm — Same Structure, Placebo Norm

Chatbot mockup (Norms Study), control arm, placebo norm:

Bot (10:02): Hello 👋 In the baseline survey you answered a question about the statement: "Companies should subsidize public transport."
Bot (10:02): You estimated that out of 100 people in Bogotá, 75 agree with this statement.
Bot (10:02): Do you think your estimate matches the true share in Bogotá? [Yes • No • Not sure]
Respondent (10:03): Not sure
Bot (10:03): Actual share in Bogotá (baseline data): Men 94 / 100 · Women 95 / 100. In fact, 94 out of 100 men and 95 out of 100 women in Bogotá agree.
Bot (10:03): How does this information feel to you? [Interesting • Irrelevant • Disappointing]

Why this placebo

Same channel (WhatsApp chatbot), same format, same four-step sequence, same schedule
Unrelated topic: attitudes toward corporate subsidies for public transport
Shares the belief-elicitation mechanics without addressing gender norms

Identification: the T – C contrast isolates the effect of correcting beliefs about maternal employment, net of any attention, engagement, or framing effect from the chatbot itself.

Experimental Design · Stage 3

Stage 3 — Midline: WhatsApp Engagement

Delivery: WhatsApp chatbot with 4 interactive steps (Sep–Oct 2024)
Engagement: only ~29% of respondents completed the interaction (501 of 1,732 individuals in treatment; 518 of 1,732 in control; each arm has 866 couples, i.e. 1,732 adults)
Note: low engagement suggests digital interventions face uptake barriers in this population
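The engagement shares can be checked with a few lines of arithmetic. The sketch assumes that the denominator of 1,732 refers to individuals per arm (866 couples with two adults each), which is an interpretation rather than something the slide states explicitly.

```python
def share_pct(numerator, denominator):
    """Share expressed in percent."""
    return 100 * numerator / denominator

# Assumption: 866 couples per arm, two adults per couple.
individuals_per_arm = 866 * 2

treatment_rate = share_pct(501, individuals_per_arm)
control_rate = share_pct(518, individuals_per_arm)
overall_rate = share_pct(501 + 518, 2 * individuals_per_arm)

print(f"Treatment: {treatment_rate:.1f}%")
print(f"Control:   {control_rate:.1f}%")
print(f"Overall:   {overall_rate:.1f}%")
```

Under that reading both arms sit close to the ~29% quoted above, with a gap of about one percentage point between them.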

Experimental Design · Two outcome measures

Two outcomes: an intra-household decision (in-chatbot) and individual labor responses (endline)

① Intra-household decision · elicited at the end of the WhatsApp chatbot

Course nomination — one real online career course per household, keep it or give to partner
Zero-sum, revealed-preference choice with direct personal cost
Captures the joint household allocation decision, immediately after belief correction

② Individual labor outcomes · endline phone survey, 2–3 months later

Job search effort, mobility, aspirations, work–family balance
1,382 of 3,464 individuals re-interviewed (≈40%)
Measured before any treatment reinforcement

Section III

Baseline Facts

Near-universal private support — and a 28 pp misperception

Baseline Facts · Sample

Individual Attributes — Large Gender Gap in LFP

Variable	Husbands	Wives	Δ (Husbands − Wives)
Demographics
Age (years)	34.9	32.0	2.8***
Education
Low	14.3%	10.8%	3.5***
Medium	69.5%	71.1%	−1.6
High	16.2%	18.1%	−2.0
Employment Status
Employed	90.5%	52.0%	38.5***
Unemployed	5.0%	6.3%	−1.3
Inactive	4.5%	41.7%	−37.2***
Weekly hours	48.7	37.6	11.1***
Job Flexibility
High	23.6%	33.0%	−9.4***
Some	27.2%	31.5%	−4.3**
None	48.9%	35.1%	13.8***
Job Search
Looking for job	10.6%	16.2%	−5.5***
Start business	9.1%	7.2%	1.9**
↳ Actively pursuing job mobility (sum)	19.7%	23.4%	−3.7*
Not looking, but would like to	49.3%	51.8%	−2.5
Satisfied	31.0%	24.9%	6.2***

Baseline Facts · Household

Household Attributes & Income Distribution

Characteristic	Sample Composition
Household Size & Composition
Average household size	3.8 members
Children under 6 per HH	1.13
HH with child <6 not in childcare	27.6%
HH with member needing permanent care	32.0%
Household Income Category
Low income (<1.3M COP)	28%	~$6,200 USD
Middle income (1.3–3.9M COP)	60%	~$6,200–$18,600 USD
High income (>3.9M COP)	12%	>$18,600 USD
Sample Composition
Total households	1,732
Total individuals	3,464 (1,732 couples)

Baseline Facts · Misperception of community support

Both sexes underestimate community support — sharply for men, mildly for women

Distribution of beliefs about men's support for maternal employment

True share

89 %

Perceived (peak)

60 %

Gap

27–33 pp

Distribution of beliefs about women's support for maternal employment

True share

91 %

Perceived (peak)

80 %

Gap

10–11 pp

Pluralistic ignorance runs on both sides. Both husbands and wives underestimate community support for maternal employment — sharply for men's support (≈ 30 pp), more mildly for women's (≈ 10 pp). The intervention corrects misperceptions about both by giving each spouse the true Bogotá distribution.

Section IV

Empirical Strategy

IPWRA · Attrition correction · Multiple testing

Empirical Strategy · Setup

The estimand, two threats, and three analysis samples

Target estimand — ATT (Average Treatment effect on the Treated):

y_i = β₀ + β₁D_i + ρ·y_i0 + X_i'γ + ε_i

Stratification FE · SE clustered at household · gender-specific effects via D_i × Female
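One way to write the gender-interacted version of this specification is shown here as a sketch. The coefficient names β₂ and β₃ and the strata term δ are notation introduced for illustration and do not appear in the slides; the effect for men would be β₁ and the effect for women β₁ + β₂.

```latex
y_i = \beta_0 + \beta_1 D_i + \beta_2 \,(D_i \times \mathrm{Female}_i)
      + \beta_3 \,\mathrm{Female}_i + \rho\, y_{i0} + X_i'\gamma
      + \delta_{s(i)} + \varepsilon_i
```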

Threat 1 · Differential attrition

Baseline respondents are not all observed at midline / endline. If survey participation correlates with potential outcomes — and especially if it does so differently by treatment arm — ITT estimates are biased.

Threat 2 · Selective engagement

Only ~29% of treated respondents complete the WhatsApp chatbot. Engagers can differ from non-engagers on covariates that also predict outcomes, which creates covariate imbalance within the realized sample.

Three analysis samples — one for each class of outcome:

Sample 2

Chatbot completers → course nomination (decision elicited inside the WhatsApp module)

Sample 3

Endline survey → labor outcomes (job search, mobility, aspirations, work–family balance)

Sample 2 ∩ 3

Chatbot + endline → community & spousal second-order beliefs

Empirical Strategy · IPWRA

Two-step IPWRA: re-weight for attrition, then estimate the ATT

Step 1 — Selection weights (one per sample S, estimated separately by gender)

p̂_i^S = Pr(S_i=1 │ D_i, X_i) ⟹ w_i^S = Pr(S=1) / p̂_i^S

Probit specification · X_i ≈ 50 baseline covariates (demographics, household, education, labor, baseline first- and second-order beliefs, strata FE) · weights stabilized around 1.
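The weight formula can be illustrated with a minimal sketch. It assumes the fitted selection probabilities are already available (the probit itself is not reproduced) and approximates Pr(S=1) by the observed share of baseline units; the function name and the numbers are hypothetical.

```python
def selection_weights(p_hat_observed, n_baseline):
    """Stabilized inverse-probability weights w_i = Pr(S=1) / p_hat_i.

    p_hat_observed -- fitted selection probabilities of the units observed in S
    n_baseline     -- number of baseline units the sample was drawn from
    """
    marginal_rate = len(p_hat_observed) / n_baseline  # empirical Pr(S=1)
    return [marginal_rate / p for p in p_hat_observed]

# Made-up fitted probabilities: 4 observed units out of 10 at baseline.
weights = selection_weights([0.2, 0.4, 0.5, 0.8], n_baseline=10)
print(weights)
```

Units that were unlikely to be observed receive weights above 1, and units that were likely to be observed receive weights below 1, so the weights stay centred near 1.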

Step 2 — IPWRA within each sample S

Step-1 weights w_i^S enter as pweights. The estimator combines:

Treatment model e(X_i^D, strata) = Pr(D=1│X, strata) → propensity score
Outcome models m_d(X_i^D, strata) = E[Y│D=d, X, strata], d ∈ {0,1}

Reweights the control group toward the covariate distribution of the treated; adjusts flexibly for remaining differences via the outcome regression.

Doubly-robust property. The ATT is consistently estimated if either the treatment model e(·) or the outcome models m_d(·) are correctly specified — we do not need both. Reported with cluster-robust analytic variances at the household level.
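The two ingredients can be illustrated on a toy dataset with one binary covariate. This is not the IPWRA estimator itself (which fits weighted outcome regressions); it is a simplified sketch with saturated cell-mean models, where reweighting controls by e/(1−e) and regression adjustment give the same ATT. All names and data are made up.

```python
from collections import defaultdict

# (covariate cell x, treatment d, outcome y): illustrative data only
rows = [
    (0, 1, 5), (0, 1, 7), (0, 0, 2), (0, 0, 4), (0, 0, 3), (0, 0, 3),
    (1, 1, 10), (1, 1, 12), (1, 1, 11), (1, 1, 11), (1, 0, 6), (1, 0, 8),
]

def cell_stats(rows):
    """Per cell: number treated, number of controls, sum of control outcomes."""
    stats = defaultdict(lambda: {"n1": 0, "n0": 0, "sum0": 0.0})
    for x, d, y in rows:
        if d == 1:
            stats[x]["n1"] += 1
        else:
            stats[x]["n0"] += 1
            stats[x]["sum0"] += y
    return stats

def att_ipw(rows):
    """Treated mean minus control mean reweighted by e(x) / (1 - e(x))."""
    stats = cell_stats(rows)
    treated = [y for _, d, y in rows if d == 1]
    weighted_sum = weight_total = 0.0
    for x, d, y in rows:
        if d == 0:
            cell = stats[x]
            e = cell["n1"] / (cell["n1"] + cell["n0"])  # propensity score
            w = e / (1 - e)                             # ATT weight for controls
            weighted_sum += w * y
            weight_total += w
    return sum(treated) / len(treated) - weighted_sum / weight_total

def att_ra(rows):
    """Mean over treated units of y minus the control cell mean m_0(x)."""
    stats = cell_stats(rows)
    gaps = [y - stats[x]["sum0"] / stats[x]["n0"]
            for x, d, y in rows if d == 1]
    return sum(gaps) / len(gaps)

def naive_difference(rows):
    """Unadjusted treated-minus-control difference in means."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

print(att_ipw(rows), att_ra(rows), naive_difference(rows))
```

In this toy case the treated are concentrated in the high-outcome cell, so the unadjusted difference overstates the effect, while both adjusted estimates agree.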

Empirical Strategy · Inference

Inference: Fisher randomization as primary, Romano-Wolf for multiple testing

Primary inference · Fisher randomization-inference p-values

Derived from the permutation distribution of the actual treatment assignment, respecting the stratification
Exact and finite-sample valid — does not rely on asymptotic approximations or large-sample CLT arguments
Reported as the primary p-value next to every ATT throughout the paper (Fisher, 1935; Imbens & Rubin, 2015)
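A stratified randomization test can be sketched as follows. This is a Monte Carlo approximation that shuffles treatment labels within strata and uses a difference in means as the test statistic; the paper's statistic, covariate adjustment, and couple-level assignment are not reproduced, and all names and data are hypothetical.

```python
import random

def diff_in_means(y, d):
    """Treated-minus-control difference in mean outcomes."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def fisher_ri_pvalue(y, d, strata, draws=2000, seed=0):
    """Two-sided randomization p-value, permuting labels within strata."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(y, d))
    # Group unit positions by stratum so shuffles respect the design.
    by_stratum = {}
    for i, s in enumerate(strata):
        by_stratum.setdefault(s, []).append(i)
    extreme = 0
    for _ in range(draws):
        d_perm = list(d)
        for positions in by_stratum.values():
            labels = [d[i] for i in positions]
            rng.shuffle(labels)
            for i, label in zip(positions, labels):
                d_perm[i] = label
        if abs(diff_in_means(y, d_perm)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + extreme) / (1 + draws)

# Toy data: two strata of ten units, five treated in each (made-up numbers).
outcomes_one_stratum = [10, 11, 12, 13, 14, 0, 1, 2, 3, 4]
y = outcomes_one_stratum * 2
d = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5
strata = ["A"] * 10 + ["B"] * 10

p_value = fisher_ri_pvalue(y, d, strata)
print(p_value)
```

With assignment at the couple level, the shuffle would be applied to couples rather than to individuals; the sketch permutes individual units only to stay short.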

Multiple-testing correction · Romano-Wolf step-down (Appendix Table A·rwolf)

Step-down adjustment within each outcome family — F1 community beliefs · F2 spousal beliefs · F3 course outcomes · F4 labor outcomes
Controls family-wise error rate while preserving power vs. Bonferroni
Computed on unweighted OLS → conservative relative to our preferred IPWRA estimates (we treat RW as a stress test of the headline pattern)
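The step-down logic within one family can be sketched as below. This is a simplified illustration that assumes a matrix of null draws of the test statistics is already available (for example from resampling); the studentization and resampling steps of the actual procedure are not reproduced, and the numbers are made up.

```python
def romano_wolf(t_obs, t_null):
    """Step-down adjusted p-values for one family of hypotheses.

    t_obs  -- observed test statistics, one per hypothesis
    t_null -- list of null draws, each a list with one statistic per hypothesis
    """
    k = len(t_obs)
    # Visit hypotheses from the largest to the smallest |t|.
    order = sorted(range(k), key=lambda j: -abs(t_obs[j]))
    adjusted = [None] * k
    running_max = 0.0
    for step, j in enumerate(order):
        remaining = order[step:]  # hypotheses not yet stepped past
        exceed = sum(
            1 for draw in t_null
            if max(abs(draw[m]) for m in remaining) >= abs(t_obs[j])
        )
        p = (1 + exceed) / (1 + len(t_null))
        running_max = max(running_max, p)  # enforce monotone p-values
        adjusted[j] = running_max
    return adjusted

t_obs = [3.0, 1.0, 2.0]
t_null = [
    [0.5, 1.2, 0.3], [2.5, 0.1, 0.4], [0.2, 0.9, 2.1],
    [1.1, 0.3, 0.6], [0.4, 0.2, 0.1], [3.2, 0.5, 0.7],
    [0.3, 0.8, 1.5], [0.6, 1.4, 0.2], [0.9, 0.6, 0.5],
]
adjusted = romano_wolf(t_obs, t_null)
print(adjusted)
```

Because each step compares against the maximum over the remaining hypotheses only, the adjustment is less severe than a Bonferroni correction over the whole family.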

Headline robustness. Women's job mobility survives Romano-Wolf at the 5% level (RW p = 0.013). Men's course nomination and work–family balance are only suggestive under RW, although their Fisher p-values remain below 0.10.

Empirical Strategy · Attrition robustness

Beyond IPWRA: bounds, near-miss diagnostic, and weight sensitivity

Lee (2009) sharp bounds

Under monotone selection (treatment never makes someone less likely to be observed), trim the over-represented arm at the appropriate quantile to get sharp upper/lower bounds on the ATT — without parametric assumptions on selection. Reported in Appendix Table A·leebounds for all main outcomes where monotonicity is plausible.
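The trimming step can be sketched as follows, assuming the treated arm is the over-represented one and rounding the number of trimmed observations to the nearest integer. The function name and the data are hypothetical, and the sketch omits covariates and standard errors.

```python
def lee_bounds(y_treat_obs, n_treat, y_ctrl_obs, n_ctrl):
    """Lower and upper trimming bounds on the treatment effect.

    y_treat_obs, y_ctrl_obs -- outcomes of the units observed in each arm
    n_treat, n_ctrl         -- units originally assigned to each arm
    """
    q_treat = len(y_treat_obs) / n_treat  # response rate, treated arm
    q_ctrl = len(y_ctrl_obs) / n_ctrl     # response rate, control arm
    if q_treat < q_ctrl:
        raise ValueError("sketch assumes the treated arm is over-represented")
    trim_share = (q_treat - q_ctrl) / q_treat
    k = round(trim_share * len(y_treat_obs))  # treated observations to trim
    ys = sorted(y_treat_obs)
    ctrl_mean = sum(y_ctrl_obs) / len(y_ctrl_obs)
    keep_for_lower = ys[: len(ys) - k]  # drop the k largest outcomes
    keep_for_upper = ys[k:]             # drop the k smallest outcomes
    lower = sum(keep_for_lower) / len(keep_for_lower) - ctrl_mean
    upper = sum(keep_for_upper) / len(keep_for_upper) - ctrl_mean
    return lower, upper

# Made-up example: 8 of 10 treated and 6 of 10 controls are observed.
lower, upper = lee_bounds([1, 2, 3, 4, 5, 6, 7, 8], 10,
                          [1, 2, 3, 4, 5, 6], 10)
print(lower, upper)
```

When response rates are nearly equal across arms the trimming share is small and the two bounds sit close together, which is the situation described for the course outcome.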

Near-miss diagnostic — harder-to-reach respondents

Compare November (easy) vs. December–January (hard-to-reach) endline respondents on baseline covariates. The variable the treatment directly corrects — community SOB about men's support — does not differ: 58.3 vs. 59.4 (p>0.5). A 3.7 pp gap on FOB exists, but FOB is a stratification variable.

Four weight specifications

Baseline probit-PS (main spec)
Weights winsorised at p95
Trimmed sample: drop PS < 0.10
Alternative logit-PS specification

Headline results stable across all four (Appendix Table A·ipwrasens).

Reference-group accuracy

For the disclosed Bogotá-average to be the right anchor, engagers and non-engagers must share the targeted prior. They do: 58.1 vs. 58.6, p>0.5. Maximum subgroup deviation from the city-wide mean is 3.3 pp — far below the 28 pp misperception being corrected.

Macours (2025): only 28% of dev-econ RCTs explicitly correct for attrition in estimation; a further 23% stop at balance tests on attrition rates. Our IPWRA + Lee + near-miss design addresses both the estimation and the diagnostic gap.

Section V

Results

Community beliefs · Spousal beliefs · Course allocation · Labor market outcomes

Results · RQ1: Community beliefs

Preferences unchanged — but the binding misperception falls for men

♂

Men

N = 453 · Sample 2∩3

Own belief (FOB)	━ null
Belief about ♂'s support	+2.8 pp
Belief about ♀'s support	+4.5 pp ★
⚡ Misperception of ♂'s	−11 pp ★★
Misperception of ♀'s	━ null

♀

Women

N = 649 · Sample 2∩3

Own belief (FOB)	━ null
Belief about ♂'s support	+2.6 pp
Belief about ♀'s support	+2.3 pp
Misperception of ♂'s	━ null
Misperception of ♀'s	━ null

The binding misperception falls — men only · P[underestimate ≥ 5 pp]

Control men: 83 of 100 underestimate

Treated men: 72 of 100 underestimate (−11 pp ★★)

Why binary > continuous? The mean SOB shifts only +2.8 pp (Panel B), but the share who substantially underestimate (≥5 pp) drops 11 pp. The treatment compresses the distribution toward truth: a meaningful share of men crosses out of substantially underestimating, even when the average estimate moves modestly. Preferences (FOB) are unchanged.

Definitions: Misperception of ♂'s/♀'s = 1 if respondent underestimates true community support by ≥ 5 pp. ★ p<0.10 · ★★ p<0.05 · IPWRA · Fisher randomization p-values
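The binary indicator, and why it can move more than the mean, can be illustrated with a short sketch. The 89% true share and the 5 pp threshold come from the slides; the individual estimates are made up purely to show the mechanics.

```python
TRUE_SHARE_MEN = 89  # actual share of fathers who agree, from the baseline

def underestimates(estimate, truth=TRUE_SHARE_MEN, threshold=5):
    """1 if the estimate falls at least `threshold` pp below the truth."""
    return int(truth - estimate >= threshold)

def share_underestimating(estimates):
    """Share of respondents flagged by the binary indicator."""
    return sum(underestimates(e) for e in estimates) / len(estimates)

def mean(values):
    return sum(values) / len(values)

# Hypothetical estimates: two respondents near the threshold update slightly.
control = [60, 70, 80, 83, 90]
treated = [60, 70, 85, 86, 90]

print("Mean shift (pp):", mean(treated) - mean(control))
print("Share underestimating:", share_underestimating(control),
      "->", share_underestimating(treated))
```

Small updates by respondents who start just below the threshold are enough to change the binary share substantially while leaving the average estimate almost unchanged.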

Results · RQ2: Spousal beliefs

Spillover into spousal beliefs — even with no information about the partner

♂

Men

N = 453 · Sample 2∩3

Own attitude (first-order belief)	━ null
⚡ Belief about ♀'s support	+5.9 pp ★★

♀

Women

N = 649 · Sample 2∩3

Own attitude (first-order belief)	━ null
⚡ Belief about ♂'s support	+6.3 pp ★★

The spillover — community correction lifts perceived partner support, even from an already-high baseline

♂ Men's belief about ♀'s support for maternal employment: Control 90.9 % → Treated 96.8 % (+5.9 pp ★★)

♀ Women's belief about ♂'s support for maternal employment: Control 86.8 % → Treated 93.1 % (+6.3 pp ★★)

Why this is striking. At baseline, spouses are already mildly optimistic about their partner (within-couple gap ≈ 1–3 pp). The dominant friction is at the community level, not within the couple. Yet correcting community beliefs lifts the partner-belief further. Leading interpretation: the treatment makes the shared misperception salient and triggers within-couple discussion — though the channel is not separately identified.

Why is the binary misperception null? The spousal misperception indicator is defined differently from RQ1: =1 if the respondent's dichotomised belief about the partner doesn't match the partner's actual yes/no support (no 5 pp threshold). At baseline most respondents already get the direction right (control mean ≈ 18 %), so there is little room for the binary to move — the action is in the continuous belief level.

★★ p<0.05 · ★★★ p<0.01 · IPWRA · Fisher randomization p-values

Results · RQ3(a): Course allocation

Belief-corrected men nominate their wives 23 % more often for the course

♂

Men

N = 373 · Sample 2

Control: nominate wife

40 %

→

Treated

49 %

+9.1 pp (+23 %) ★★

Fisher exact p = 0.011 · IPWRA p = 0.104

♀

Women

N = 644 · Sample 2

Control: nominate self

84 %

→

Treated

84 %

━ ceiling effect

Already 84% nominate themselves — no room to move

Out of every 10 men, one more nominates his wife after the intervention

Control men: 4 of 10 nominate wife

Treated men: 5 of 10 nominate wife (+1 ★★)

Robustness. Effect for men stable across all 4 IPWRA weight specs (0.091–0.094); Lee sharp bounds strictly positive at 0.8 % trimming. Secondary outcomes (Q&A backup): own interest in course — null for both genders; belief about partner's interest — null. The decision shifts; the stated preferences do not.

★★ p<0.05 · Course nomination is a zero-sum revealed-preference choice elicited inside the WhatsApp chatbot

Results · RQ3(b): Labor outcomes

Women act in the market; men report stronger work–family balance preferences

♂

Men

N = 453 · Sample 2∩3

Job mobility	━ null
LM aspirations	━ null
Work–family balance (stated preference)	+11 pp ★

WFB is a stated preference — interpret with caution (possible social-desirability echo; see SDB slide). The clear behavioural result is on the women's side.

♀

Women

N = 649 · Sample 2∩3

⚡ Job mobility	+9.6 pp ★★★
LM aspirations	━ null
Work–family balance	━ null

Control mean 73 % → treated 82 % (+13 % relative). Survives Romano-Wolf: RW p = 0.013.

The asymmetric response — same belief update, different behaviour

♂ Men's work–family balance preference: Control 32 % → Treated 43 % (+11 pp ★)

♀ Women's job mobility (changed job / started business): Control 73 % → Treated 82 % (+9.6 pp ★★★)

Job mobility is not social desirability. The placebo arm (Sample 3∖2, endline-only respondents who never saw the chatbot) shows negative coefficients on job mobility (−8.7 pp for women, n.s.), which rules out a survey-response artifact. Women's mobility survives Romano-Wolf (RW p = 0.013).

Men's work–family balance is a stated preference and should be interpreted with caution: it may partly echo the treatment script. Main defense: the placebo arm shows no analogous shift, and the effect persists in Sample 3 (+6.6 pp ★) under a separate window. Aspirations are null for both genders.

★ p<0.10 · ★★★ p<0.01 · IPWRA · Fisher randomization p-values

Results · Mechanisms

Two mechanisms, one chain: community → partner spillover and husband as gatekeeper, both supported

① COMMUNITY → PARTNER SPILLOVER

The setup. We deliver community-level information only — nothing about the partner. If the treatment touched only community beliefs, beliefs about the partner should be unaffected.

The result. Belief about the partner's support moves +6 pp ★★ for both men and women — from an already-optimistic baseline.

Leading interpretation: the treatment makes the shared misperception salient and triggers within-couple discussion. The channel is not separately identified.

② HUSBAND AS GATEKEEPER

The theory. Bernhardt et al. (2018): the social cost of women's work falls on men. So the belief to correct is his — what he thinks other men accept — not her own attitude.

Men's misperception of ♂'s support: −11 pp ★★ (the binding belief moves)
Treated men then nominate wife: +9.1 pp, Fisher p = 0.011 (the gatekeeper releases)

Women already nominate themselves 84 % — it is the husband's belief that has to move, not the wife's.

The 4-step chain that ties them together

Community info → Husband's belief about other men −11 pp → Husband's belief about wife +6 pp → Husband releases as gatekeeper +9 pp course → Wife moves in the market +9.6 pp mobility ★★★

Results · Heterogeneity (mechanism evidence)

The gatekeeper releases — but only where the wife is labor-market attached

Heterogeneity by wife's labor attachment — Course, Job mobility, Aspirations

Wife active (green): the chain runs. Course nomination by husband +12.8 pp ★, Job mobility +10.8 pp ★★, Aspirations +10.6 pp ★.
Wife inactive (orange): all three point estimates are smaller and not statistically significant (Course −0.9 pp, Job mobility +6.1 pp, Aspirations +0.4 pp).
Interaction: a genuine boundary for Course (Δ −13.7 pp ★★★) and Aspirations (Δ −10.2 pp ★★★). For Job mobility the interaction is not significant (Δ −4.7 pp), so we cannot reject equal effects across groups, but neither can we claim that the treatment works for inactive women on its own.
Reading: the husband releases as gatekeeper, and the wife acts, only where there is an actionable labor-market margin.
Specification: interaction model with Z_i = 1 if the wife is inactive. Wife's inactivity is a stratification variable, so the subgroup effects are causally identified. Magnitudes are IPW estimates; interaction stars come from OLS Fisher randomization-inference p-values (paper Table G, Panel B Men for Course, Panel C Women for Job mobility and Aspirations).
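The reported interaction terms are the differences between the two subgroup estimates, which can be verified directly. The dictionary layout is only an illustration; the numbers are the point estimates quoted on this slide.

```python
# Subgroup point estimates in percentage points, as quoted on the slide.
effects = {
    "Course nomination": {"active": 12.8, "inactive": -0.9},
    "Job mobility": {"active": 10.8, "inactive": 6.1},
    "Aspirations": {"active": 10.6, "inactive": 0.4},
}

# Interaction = effect for inactive wives minus effect for active wives.
deltas = {
    outcome: round(est["inactive"] - est["active"], 1)
    for outcome, est in effects.items()
}

for outcome, delta in deltas.items():
    print(f"{outcome}: {delta:+.1f} pp")
```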

Results · Synthesis

The boundary of the norms channel — where it stops

Among inactive wives, the chain breaks down:
- Course nomination by husband ≈ 0
- Labor-market aspirations ≈ 0
- Job mobility — not statistically significant in this group
Structural barriers (childcare, labor demand, skill certification) are the first-order constraint here — information about norms is not enough
Information is insufficient where participation constraints bind: correcting the husband's beliefs about other men does not translate into intra-household action when there is no actionable labor-market margin for the wife
Consistent with Afridi, Dhillon, Roy & Sangwan (2023): when access and information are offered but structural barriers bind, women's outside-the-home employment does not move

Policy implication. Information is a low-cost lever where structural conditions already permit. Where they don't, norms and structure are complements, not substitutes — childcare, demand, and certification are the binding tools.

Robustness · Social desirability bias

Five defenses against social-desirability bias

The concern. If treated respondents simply echo back the progressive message rather than acting on it, the headline effects would be artefactual. The paper's design and analysis include five complementary defenses.

① Timing placebo (Sample 3 ∖ 2)

Endline-only respondents received the info after their behavioral window. If the results were driven by SDB, a job-mobility effect would still appear here. The coefficient is negative / null for women (paper Table, col 3), which rules out a survey-response artefact.

② Real behavior > stated preferences

SDB would predict stated preferences (aspirations) move most. We find the opposite: aspirations null; behavior (course nomination +9.1 pp, job mobility +9.6 pp ★★★) moves.

③ Course as revealed preference

Zero-sum, costly choice (give up own slot to wife). Hard to drive via cheap talk — the husband faces a real loss if he nominates wife only to please the experimenter.

④ Engager balance

Treatment engagers vs. control engagers have near-identical baseline beliefs (max gap 2.5 pp on any belief variable) → no selection on motivated reasoning (paper §6, engager diagnostics).

⑤ Spillover to partner-specific beliefs (RQ2)

Treated respondents update beliefs about their own partner, whose views they were never told. Inventing partner-specific beliefs to please the experimenter is implausible — these are private, idiosyncratic targets. The spillover (+6 pp for both) is hard to reconcile with SDB.

One exception we flag honestly: men's work–family balance preference (+11 pp ★) is a stated preference; the timing placebo helps but a demand effect cannot be fully ruled out. We interpret it conservatively (paper §6.3).

Robustness

The Main Results Are Robust

Lee (2009) Sharp Bounds

Trimming fraction for course (men) < 0.8% — both bounds strictly positive
Women's job search: Lee bounds positive; belief outcomes include zero
Attrition-robust: monotonicity holds for course (Sample 2 = midline engagers)

IPWRA Sensitivity (4 specs)

Baseline probit · Winsorized (p95) · Trimmed (PS<0.10) · Logit PS
Course men: 0.091–0.094 across all specs; crosses 10% threshold under winsorized weights
Women job search: 0.091–0.096 (p=0.006–0.009) — stable and significant
Work–family men: fragile to trimming (p=0.198 with 11 high-leverage obs dropped)

Romano-Wolf Step-Down

F1 (community beliefs): RW p > 0.40
F3 (course, men): RW p = 0.151 (does not clear 10% under MHT correction; suggestive)
F4 (men's work–family balance): RW p = 0.077 (clears 10% threshold)
F4 (women's job mobility): RW p = 0.013 — survives MHT at 5%

Additional Checks

Near-miss timing placebo: no differential loss on key belief variables
Engager characterization: beliefs identical between engagers and non-engagers
Reference group accuracy: disclosed norm accurate within ±3.3 pp for all subgroups

Section VII

Conclusions

What we found · What it means · What comes next

Conclusions · Summary

What We Found

① Own attitudes don't move — beliefs about the partner do

First-order beliefs (own support for women's work) are already high and don't shift. The channel runs through second-order beliefs about the partner: spousal SOB +6.2 pp; community SOB +3–5 pp. The friction is informational, not a matter of preferences.

② Within-couple decisions move

Treated men are 9.1 pp (+23%) more likely to nominate their wife for a career-development course. A zero-sum allocation with direct personal cost — a lower bound on the willingness to invest in the wife's career.

③ Women's labor decisions follow

Treated women report +9.6 pp more job mobility in 1–2 months (p=0.006; RW p=0.013). Placebo timing check rules out social desirability. The behavioral chain — SOB → bargaining → action — runs end-to-end.

④ The boundary: where the chain stops

For households with inactive wives, beliefs partially update (the inference channel still operates), but the chain breaks at the behavioral translation step: course allocation and labor-market margins do not move because there is no actionable margin for the husband to act on. Information about norms complements, not substitutes for, structural policy.

Conclusions · Implications

What It Means

The binding constraint is shared misperception about the community, not about each other. Spouses are already mildly optimistic about their partner's support; the friction sustaining the FLFP gap is the gap between private attitudes and what each spouse believes the broader community endorses. Correcting that community-level misperception is what unblocks couple-level bargaining at low cost.
Couples are the right unit of analysis. Single-respondent designs miss the spillover from community to partner beliefs and the within-household allocation margin. Surveying both spouses lets us measure and control for the within-household belief gap, and track how beliefs and decisions move together after treatment.
We identify the boundary of the norms channel. The chain runs end-to-end (SOB → bargaining → labor decisions) among labor-attached households and is silent among inactive women. The asymmetry is itself a policy lesson: information is a low-cost tool for the first group; structural policy (childcare, demand, certification) is the binding tool for the second. Norms and structure are complements, not substitutes.
Scalability: WhatsApp-based norm correction is low-cost and digitally deliverable at scale in LAC cities. Engagement rates (~36%) are typical for low-cost digital interventions but underscore the importance of sustained exposure (endline reinforcement was necessary to achieve the full effect).

Appendix

Full tables · Robustness · Diagnostics · Additional results

A2 Full beliefs · A3 Romano-Wolf (+ intuition) · A4 Lee bounds (+ intuition) · A5 IPWRA sensitivity (+ intuition) · A6 Balance tests · A7 Attrition · A8 PS overlap · A9 OLS vs IPWRA · A10 Het table · A11 Indirect effects · A12 Near-miss + Engagers (+ intuition)

Appendix · backup table

Results · RQ1 — Community Beliefs

Does Information Correct Community Second-Order Beliefs? Complete Results

Outcome	First-Order Belief (1)	2nd-Order: Men's Support (2)	2nd-Order: Women's Support (3)	Misperception Men D (4)	Misperception Women D (5)
Panel A — All (N = 1,102)
ATT	0.001	2.75*	3.58**	−0.054*	−0.047
Control mean	0.902	63.2	75.8	0.832	0.553
Panel B — Men (N = 453)
ATT	−0.004	2.82	4.50*	−0.110**	−0.047
Control mean	0.874	65.6	75.3	0.830	0.566
Panel C — Women (N = 649)
ATT	−0.002	2.57	2.30	−0.018	−0.040
Control mean	0.930	61.4	76.8	0.840	0.536

Key finding 1: No change in first-order beliefs (already near ceiling at 90%). The intervention leaves own attitudes unchanged.

Key finding 2: Community beliefs do correct. Men's misperception of male support falls by 11 pp (p=0.031). Men's estimate of women's support rises by 4.5 pp (p=0.058; Panel B, column 3).

Sample: Respondents in both midline and endline surveys (Sample 2∩3, N=1,102). ATT = Average Treatment effect on the Treated (IPWRA, 90% CI). Control means are unadjusted baseline/endline values.

Appendix · backup table

Results · RQ2 — Spousal Beliefs

Spousal Beliefs: +6 pp Perceived Partner Support

	Working Mothers			Equal Task Sharing
Panel	1st-order	2nd-order (Spouse)	Misperception D	1st-order	2nd-order (Spouse)	Misperception D
Panel A — All (N = 1,102)
ATT	0.009	0.063***	−0.028	0.021**	0.038**	−0.027
Control mean	0.900	0.885	0.184	0.965	0.899	0.104
Panel B — Men (N = 453)
ATT	0.010	0.059**	−0.046	0.013	−0.004	−0.002
Panel C — Women (N = 649)
ATT	0.004	0.063**	−0.004	0.025***	0.061**	−0.040

Working mothers: Perceived spousal support rises by 6.3 pp (p=0.001) — similar for men (+5.9 pp) and women (+6.3 pp). Community-level correction spills over into within-couple beliefs.

Equal task sharing: Women update perceived husband support by 6.1 pp (p=0.016) — consistent with women updating beliefs about men more broadly when community misperceptions are corrected.

Appendix · backup table

Results · RQ3a — Course Allocation

Men +9.1 pp More Likely to Nominate Wife for the Course

	Wife Should Attend Course (1)	Are You Interested? (2)	Is Partner Interested? (3)
Panel A — All (N = 1,017)
ATT	0.023	−0.014	−0.022
Control mean	0.688	0.793	0.460
Panel B — Men (N = 373)
ATT	0.091 (p = 0.104)	−0.046	0.018
Control mean	0.402	0.743	0.574
Panel C — Women (N = 644)
ATT	−0.006	0.007	−0.019
Control mean	0.841	0.819	0.374

Men (+9.1 pp, +23%): IPWRA p=0.104; Fisher exact p=0.011; Lee bounds strictly positive at 0.8% trimming; stable across all 4 IPWRA weight specs (0.091–0.094). Suggestive but credible.

Women (near zero): 84.1% of women in the control group already nominate themselves — a ceiling effect. Little scope for the treatment to move this.
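
The Fisher p-value quoted above comes from randomization inference. The block below is a minimal sketch of that idea for a difference in means, not the authors' code: the function names are hypothetical, and it assumes each entry is one independently randomized unit (in a couple-level design the reshuffling would be done over couples).

```python
import random

def diff_in_means(outcomes, treated):
    """Mean outcome among treated units minus mean among control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)

def randomization_p_value(outcomes, treated, draws=2000, seed=0):
    """Two-sided randomization-inference p-value for the difference in means.

    Assumption: labels can be reshuffled freely across entries, i.e. each
    entry is its own randomization unit.
    """
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    extreme = 0
    for _ in range(draws):
        rng.shuffle(labels)  # re-draw the assignment under the sharp null
        if abs(diff_in_means(outcomes, labels)) >= observed - 1e-12:
            extreme += 1
    # The +1 terms count the realized assignment as one of the draws.
    return (extreme + 1) / (draws + 1)
```

The p-value is the share of re-drawn assignments that produce a gap at least as large as the observed one, so it does not rely on large-sample approximations.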

Appendix · backup table

Results · RQ3b — Labor Market

Women Search More (+10 pp) · Men Value Work–Family Balance More (+11 pp)

	Job Mobility		Aspires Better LM		Work–Family Balance
	Sample 2∩3 (1)	Placebo 3∖2 (3)	Sample 2∩3 (4)	Sample 3 (5)	Sample 2∩3 (6)	Sample 3 (7)
Panel A — All (N = 1,102)
ATT	0.058*	−0.056	0.005	0.000	0.052	0.038
Panel B — Men (N = 453)
ATT	0.012	−0.048	−0.042	−0.027	0.110*	0.066*
Control mean	0.664		0.492		0.317
Panel C — Women (N = 649)
ATT	0.096***	−0.087	0.054	0.018	−0.006	0.015
Control mean	0.725		0.507		0.361

Women's job mobility: +9.6 pp (p=0.006), +13% relative to control. Placebo negative and p>0.10 → not social desirability. RW p=0.013.

Men's work–family balance: +11 pp (p=0.054), +35% relative to control. Robust in Sample 3 (+6.6 pp, p=0.099). Aspirations: null for both.

Appendix · Baseline beliefs — full table

Baseline Beliefs: Target Norm & Placebo

Belief Type	Husbands	Wives	Difference
A. Target Norm: "Mothers of children <6 should be free to work"
First-order (own view)	88.5%	90.5%	−2.0 pp**
Second-order: Men (estimate of fathers)	61.0%	55.7%	+5.3 pp***
Second-order: Women (estimate of mothers)	79.6%	80.0%	−0.4 pp
Spousal second-order	93.9%	89.9%	+4.1 pp***
B. Placebo Norm: "Companies should subsidize public transport"
First-order (own view)	93.5%	94.9%	−1.4 pp***

N = 1,732 couples. High first-order support for both norms (88–95%). Misperception is concentrated on fathers' support for maternal employment (gap: 27–33 pp). Placebo norm shows no such gap.

Appendix · Within-couple exposure (descriptive)

Information does not diffuse automatically within couples

⚠ Caveat — endogeneity: we did not randomize which spouse engaged with the WhatsApp module. Direct / indirect / joint exposure configurations are endogenous. We therefore interpret these results as descriptive, not causal.

Direct (own engagement): spousal SOB +6.2 pp — replicates main result

Indirect (only partner): spousal SOB ≈ 0; men's 1st-order ↓ 4–6 pp (possible reactance)

Joint (both partners): spousal SOB +8.9 pp; course allocation +9–10 pp — strongest

Policy implication: ensure both partners receive the information directly. Don't rely on within-couple diffusion — bilateral exposure delivers the largest and most coherent updates. The asymmetric indirect-exposure pattern (men retreat when info comes via wife) suggests possible reactance.

Appendix · Baseline

Full Baseline Beliefs — 8 Gender Norms

Norm Statement	Men 1st-order	Women 1st-order	Men's est. men's support	Men's est. women's support
Mothers with children <6 should be free to work	88.5%	90.5%	61.0%	79.6%
Fathers and mothers should share childcare equally	—	—	—	—
Children suffer when mother works	—	—	—	—
Problems arise if wife earns more than husband	—	—	—	—
Placebo: companies should subsidize green transport	93.5%	94.9%	—	—

Across all 8 gender-norm items, the same pattern holds: progressive private attitudes coexist with sizable misperceptions about others' views, particularly men's support. The placebo shows near-universal agreement and no misperceptions — confirming misperceptions are norm-specific, not general pessimism.

Appendix · Multiple Testing — What does it do?

Romano-Wolf Step-Down: Intuition

The threat: when we test many outcomes (beliefs, course, labor), the chance of finding at least one spurious "significant" result rises. Pure luck can produce false positives.

What we do: Romano-Wolf adjusts each p-value by simulating the joint distribution of all test statistics under the null (1,000 bootstrap replications, clustered by household). It produces a family-wise error rate–controlled p-value for every outcome, accounting for the dependence structure between them.

Plain English: "Even if I throw lots of outcomes at this experiment, here is the p-value adjusted for the fact that I am fishing in many ponds. Results that survive RW are not lucky strikes."

Computed on unweighted OLS — conservative relative to IPWRA
Outcome families: F1 community beliefs · F2 spousal beliefs · F3 course allocation · F4 labor
Headline: women's job mobility survives (RW p = 0.013); men's course nomination is marginal (RW p = 0.077)
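
The step-down logic described above can be sketched as follows. This is an illustration under stated assumptions, not the routine used for the reported p-values: the function name and the `t_null` input (resampled test statistics under the null, one row per replication) are hypothetical, and how those null draws are generated is left to the caller.

```python
def romano_wolf_stepdown(t_obs, t_null):
    """Step-down adjusted p-values that control the family-wise error rate.

    t_obs  : observed test statistics, one per outcome.
    t_null : resampled statistics under the null, one row per replication
             and one column per outcome (hypothetical input).
    """
    n_reps = len(t_null)
    # Visit outcomes from the largest to the smallest |t|.
    order = sorted(range(len(t_obs)), key=lambda s: -abs(t_obs[s]))
    adjusted = [None] * len(t_obs)
    running_max = 0.0
    for step, s in enumerate(order):
        remaining = order[step:]  # outcomes not yet stepped past
        # Largest null |t| among the remaining outcomes, per replication.
        max_null = [max(abs(row[k]) for k in remaining) for row in t_null]
        exceed = sum(1 for m in max_null if m >= abs(t_obs[s]))
        p = (exceed + 1) / (n_reps + 1)
        running_max = max(running_max, p)  # keep adjusted p-values monotone
        adjusted[s] = running_max
    return adjusted

# Toy input: two outcomes, four null replications.
print(romano_wolf_stepdown([3.0, 1.0],
                           [[0.5, 0.2], [1.5, 0.3], [2.0, 2.5], [0.1, 0.4]]))
# [0.2, 0.4]
```

Comparing each statistic with the maximum over the remaining outcomes is what accounts for the dependence between outcomes within a family.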

Appendix · Multiple Testing

Romano-Wolf Step-Down p-values

Outcome Family & Variable	OLS coef.	Fisher p	RW p	Survives?
F1 — Community Beliefs (4 outcomes)
Perceived men's support	+2.75	(0.089)	(>0.40)	✗
Misperception indicator, men	−0.054	(0.061)	(>0.40)	✗
F3 — Course Allocation (men only)
Wife should attend course (men)	+0.091	0.011	0.077	marginal
F4 — Labor Outcomes (women)
Job mobility (women, Sample 2∩3)	+0.096	0.006	0.013 **	✓
Aspires better LM (women)	+0.054	(0.223)	(>0.49)	✗

Notes: Romano-Wolf computed on OLS (unweighted) — conservative vs. IPWRA. 1,000 replications, seed(12345), clustered by household. Women's job mobility survives stepdown correction (RW p=0.013). Course nomination for men is marginal (RW p=0.077).

Appendix · Attrition Robustness — What does it do?

Lee (2009) Sharp Bounds: Intuition

The threat: IPW corrects attrition only on observable covariates. What if dropouts are different on unobservables correlated with the outcome (e.g., motivation)?

What we do: Trim observations from the lower-attrition arm to make response rates equal across treatment and control. Then compute the worst-case and best-case ATT: the interval brackets all values consistent with monotonicity (treatment moves sample retention in one direction only).

Plain English: "Imagine the worst scenario about who dropped out that monotonicity still allows. Even then, my treatment effect lies somewhere in this range. If both ends of the range exclude zero, my result holds even under unobservable bias."

Headline: Both bounds strictly positive for men's course nomination and women's job search
Belief outcomes: bounds include zero — consistent with no robust belief effects
Key assumption: monotonicity (treatment doesn't push you to drop out)
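
The trimming step above can be sketched in a few lines. This is a simplified illustration without covariates or weights, not the estimation code behind the reported bounds; the function name and argument names are hypothetical, and the number of observations kept is rounded down to an integer.

```python
def lee_bounds(y_treated, n_treated_assigned, y_control, n_control_assigned):
    """Lee (2009) trimming bounds on the difference in mean outcomes.

    y_* hold outcomes for respondents only; n_*_assigned counts everyone
    randomized to that arm. Returns (lower, upper).
    """
    def mean(values):
        return sum(values) / len(values)

    rate_t = len(y_treated) / n_treated_assigned
    rate_c = len(y_control) / n_control_assigned
    if rate_t >= rate_c:
        # Treated arm responds more: trim it down to the control response rate.
        keep = (len(y_control) * n_treated_assigned) // n_control_assigned
        ranked = sorted(y_treated)
        lower = mean(ranked[:keep]) - mean(y_control)                 # drop highest
        upper = mean(ranked[len(ranked) - keep:]) - mean(y_control)   # drop lowest
    else:
        # Control arm responds more: trim the control arm instead.
        keep = (len(y_treated) * n_control_assigned) // n_treated_assigned
        ranked = sorted(y_control)
        lower = mean(y_treated) - mean(ranked[len(ranked) - keep:])   # control kept high
        upper = mean(y_treated) - mean(ranked[:keep])                 # control kept low
    return lower, upper

# Toy data: 8 of 10 treated and 6 of 10 control units respond.
low, high = lee_bounds([1, 1, 1, 1, 1, 0, 0, 0], 10, [1, 1, 1, 0, 0, 0], 10)
```

When the two response rates are equal nothing is trimmed and the two bounds coincide with the simple difference in means.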

Appendix · Attrition Robustness

Lee (2009) Sharp Bounds

Outcome	ATT (IPWRA)	Lower Bound	Upper Bound	Trimming %	Both Positive?
Course Allocation
Wife attends course (men)	0.091	0.079	0.106	0.8%	✓ Yes
Labor Market (Sample 2∩3)
Job mobility (women)	0.096	0.048	0.134	2.1%	✓ Yes
Work–family balance (men)	0.110	−0.008	0.214	1.6%	—
Community Beliefs (Sample 2∩3)
Perceived men's support	2.75	−1.2	+6.5	—	—

Interpretation: Lee bounds apply when treatment monotonically increases probability of being in sample. For the course (Sample 2), this is satisfied by design (engagers). Both bounds strictly positive for men's course nomination and women's job search — key results hold under worst-case attrition scenarios consistent with monotonicity.

Appendix · IPWRA Sensitivity — What does it do?

IPWRA Sensitivity: Intuition

The threat: attrition propensity scores range from 0.06 to 0.76. A few observations get very large weights (≈ 1/0.06 ≈ 17). Those few units could drive the entire ATT.

What we do: re-estimate the IPWRA ATT under 4 alternative weight constructions to check that headline results don't depend on the most extreme weights:

(i) Baseline: probit propensity score (preferred)
(ii) Winsorised: cap weights at 95th percentile
(iii) Trimmed: drop observations with PS < 0.10 (≈ 1% of obs)
(iv) Logit PS: alternative functional form for the selection model

Plain English: "If a handful of unusual observations were driving my result, the estimate would change a lot when I cap or drop them. It doesn't change → my result is robust, not an artifact of extreme weights."

Headline: men's course estimate stable at 0.091–0.094 across all 4 specs; women's job mobility stable at 0.091–0.096.
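
The weight constructions (i) to (iii) above can be sketched as follows. This is a minimal illustration, assuming the weight is the inverse of the estimated propensity score (as the 1/0.06 example suggests); the helper names are hypothetical, and the percentile uses a nearest-rank definition that may differ from the one in the estimation software.

```python
def inverse_probability_weights(scores):
    """Weight each respondent by 1 / (estimated propensity score)."""
    return [1.0 / p for p in scores]

def winsorize_weights(weights, pct=95):
    """Cap weights at the pct-th percentile (nearest-rank definition)."""
    ranked = sorted(weights)
    rank = (pct * len(ranked) + 99) // 100  # ceil(pct * n / 100) with integers
    cap = ranked[rank - 1]
    return [min(w, cap) for w in weights]

def trim_low_scores(scores, floor=0.10):
    """Indices of observations kept after dropping scores below the floor."""
    return [i for i, p in enumerate(scores) if p >= floor]

scores = [0.06, 0.10, 0.20, 0.50, 0.76]           # toy propensity scores
weights = inverse_probability_weights(scores)      # largest weight is 1 / 0.06
capped = winsorize_weights(weights, pct=80)        # toy cap so the effect is visible
kept = trim_low_scores(scores)                     # drops only the 0.06 observation
```

Re-estimating the ATT with `capped` weights, or on the `kept` observations only, shows how much a result leans on the few largest weights.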

Appendix · IPWRA Sensitivity

IPWRA Sensitivity to Alternative Weight Specifications

Specification	Course (men) coef. / p	Job mobility (women) coef. / p	Work–family (men) coef. / p
Headline: Probit PS weights (untrimmed)
Baseline	0.091 / (0.115)	0.096 / (0.006)	0.110 / (0.054)
Sensitivity checks
Winsorized (cap p95)	0.094 / (0.086) *	0.091 / (0.009)	0.101 / (0.057)
Trimmed (drop PS < 0.10, N−11)	0.091 / (0.108)	0.096 / (0.006)	0.072 / (0.198) ✗
Logit PS	0.094 / (0.105)	0.095 / (0.007)	0.109 / (0.051)

Course (men): Estimate stable at 0.091–0.094 across all 4 specs. Crosses 10% threshold under winsorized weights. Lee bounds positive → the 9 pp estimate is credible.

Work–family balance (men): Fragile to trimming — 11 high-leverage observations matter. Interpret cautiously; direction consistent but precision conditional on those obs.

Appendix · Diagnostics

Balance Tests: Treatment Assignment

After IPWRA weighting, maximum absolute standardized mean differences (SMDs) are below 0.10 in all samples and genders. Some covariates show marginal imbalance in Sample 2∩3 (joint F tests reject), but effect sizes are small, and post-weighting balance is tight. The key variable — second-order belief about men's community support — does not differ significantly across treatment and control arms in any sample.
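
For reference, a standardized mean difference for one covariate can be computed as in the sketch below. The function names are hypothetical, and the pooled standard deviation uses weighted variances, which is one common convention and may differ from the one used for the reported diagnostics.

```python
import math

def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def weighted_variance(values, weights):
    mu = weighted_mean(values, weights)
    return sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / sum(weights)

def standardized_mean_difference(x_t, w_t, x_c, w_c):
    """Weighted difference in covariate means, in pooled-SD units."""
    pooled_sd = math.sqrt(
        (weighted_variance(x_t, w_t) + weighted_variance(x_c, w_c)) / 2
    )
    return (weighted_mean(x_t, w_t) - weighted_mean(x_c, w_c)) / pooled_sd

# Toy covariate with unit weights in both arms.
smd = standardized_mean_difference([2, 3, 4], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

An absolute value below 0.10 is the rule of thumb the slide applies after weighting.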

Appendix · Attrition

Attrition Diagnostics

Endline response rate: 40% (attrition ≈ 60%). Attritors are more likely to be employed and younger, consistent with time availability. After weighting, SMDs < 0.10.

Near-miss timing: November (easy) vs. Dec–Jan (hard to reach) respondents have near-identical second-order beliefs (58.3 vs. 59.4, diff. +1.2 pp, p>0.5) — the key variable the treatment corrects.

Appendix · Diagnostics

Propensity Score Overlap

[Figure: propensity-score densities. Panels: Treatment PS (All); Attrition PS (All).]

Propensity scores range from 0.06 to 0.76, with no extreme regions of non-overlap that would invalidate IPWRA; overlap is adequate in all samples. Effective sample sizes remain large after weighting; mass outside common support is small.

Appendix · Specification

OLS vs. IPWRA: Estimates Are Similar

Outcome	OLS	OLS + weights	IPWRA (preferred)	Direction consistent?
Beliefs — Men's community SOB (men only)
Perceived men's support	+3.1*	+2.9*	+2.82	✓
Course — Wife attends (men only)
Wife should attend course	+0.087*	+0.089	+0.091	✓
Labor — Job mobility (women, Sample 2∩3)
Job mobility	+0.094***	+0.096***	+0.096***	✓
Labor — Work–family balance (men)
Wants work–family balance	+0.108*	+0.109*	+0.110*	✓

IPWRA is the preferred specification chosen a priori to address selection into midline take-up. OLS and weighted-OLS produce nearly identical point estimates across all headline results — the choice of estimator does not drive the findings.

Appendix · Heterogeneity

Heterogeneity by Wife's Baseline Labor Status — Full Results

Outcome (Women)	All Women	Employed	Unemployed	Inactive
Job Mobility (Sample 2∩3)
ATT	0.096***	0.148**	0.089*	0.018
Control mean	0.725	0.712	0.780	0.699
Labor-Market Aspirations (Sample 2∩3)
ATT	0.054	0.031	0.119	0.018
Course — Wife attends (Men, by wife's status)
ATT (men)	0.091	0.134*	0.051	0.042

Job mobility effects are concentrated among employed (+14.8 pp) and unemployed (+8.9 pp) women. Inactive women show near-zero effects. The course nomination effect is also largest when the wife is employed (+13.4 pp, p<0.10). Together, these results suggest information works at the margin where action is already feasible.

Appendix · Exposure Patterns

Indirect vs. Direct Exposure — Spillovers Within Couples

Direct exposure (Sample 2∩3): Respondent personally engaged with WhatsApp chatbot + received endline reinforcement. Main analysis sample.
Indirect exposure: Respondent did not engage at midline, but their partner did. Column (2) in labor market table includes "direct or indirect T" — captures potential within-couple discussion spillovers.
Result: Women's job mobility under direct+indirect exposure = +7.5 pp (p=0.028), somewhat smaller than direct only (+9.6 pp). Suggests some information diffuses within the couple, but the effect is weaker than under direct receipt.
Men's beliefs: Indirect exposure effects on men's community beliefs and course nomination are small and p>0.10 — consistent with low within-couple discussion of labor-market plans for men.

Interpretation: Joint exposure (both spouses engaged) produces the strongest and most coherent belief updates. Indirect exposure through a treated spouse is comparatively weak — direct delivery to each partner matters. This favors individual-level norm delivery rather than relying on couple-level diffusion.

Appendix · Validity Checks — What do they do?

Near-Miss & Engager Diagnostics: Intuition

Near-miss timing: compare respondents reached easily at endline (November) vs. those reached only after extra effort (December–January). The hard-to-reach are a proxy for would-be attriters.

Plain English: "If the people who barely picked up the phone are statistically the same as easy-to-reach respondents on the variable I'm measuring, then the people who never picked up are probably also similar — so attrition isn't biasing my result."

Engager characterization (selective take-up): only 36% engage with the WhatsApp module. We compare engagers vs. non-engagers on (a) demographics and (b) the targeted second-order belief.

Plain English: "Engagers are more inactive (selection on demographics — fix with IPW step 2). But they hold the same prior on community support as non-engagers — so the disclosed norm is accurate for them, the people who actually received it."

Reference-group accuracy: max deviation of any subgroup's mean SOB from city-wide average is 3.3 pp. The misperception we are correcting is 28 pp → reference-group mismatch is < 12% of the corrected signal. Disclosed Bogotá-average norm is a valid proxy for every demographic subgroup.

Appendix · Validity Checks

Near-Miss Timing Placebo & Engager Characterization

Near-Miss Timing Placebo

Endline ran Nov 18 – Jan 20 (63 days). "Hard to reach" = Dec–Jan (N=492); "Easy" = November (N=379)
Key belief variable: 2nd-order belief about men's support. Nov: 58.3; Dec–Jan: 59.4; diff. +1.2 pp (p>0.5)
No differential loss on the variable the treatment corrects → attrition unlikely to confound

Engager Characterization

Engagers (N=1,236) vs. non-engagers (N=2,228): more inactive (+11 pp), more care burden, fewer employed (−11 pp)
BUT: 2nd-order beliefs virtually identical (58.1 vs. 58.6, diff 0.5 pp, p>0.5) → disclosed norm is accurate for engagers' reference group
Engagement balanced across arms: 35.8% treated vs. 35.6% control

Reference group accuracy: Max deviation of any subgroup's mean SOB from city-wide average = 3.3 pp (high-SES). The corrected misperception is ~28 pp → reference group mismatch is <12% of the corrected signal. Disclosed norm is valid for all demographic subgroups in the sample.

Appendix · Figures

Spousal Beliefs — IPWRA Estimates by Gender

IPWRA estimates of treatment effects on spousal second-order beliefs, by gender and norm. 90% CI. Sample 2∩3.

Appendix · Mechanism

Mechanism: IV Mediation (Exploratory)

Setup: 2SLS system: treatment Z instruments mediator M (follow-up perceived community support); the instrumented M then enters the equation for labor outcomes Y. With one instrument and one mediator, the mediated share = 1 mechanically → interpreted as a sign check, not a proportion estimate.
Sign pattern: Consistent with the proposed pathway. Updated perceived societal support → increased job search for women; updated work–family balance beliefs → increased aspiration for men.
Caveat: Cannot cleanly distinguish community-level vs. spousal-level channel, as both beliefs updated simultaneously (particularly under double exposure) — consistent with the treatment making the shared misperception salient and prompting within-couple discussion.
Belief updates close 15–25% of the gap between treatment and control on labor outcomes — the mediation channel is real but partial, consistent with norm correction being a necessary but not sufficient condition for full behavioral response.

The mediation exercise supplements rather than replaces the reduced-form evidence. We treat it as a consistency check on the sign pattern and direction of the channels.
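
Why the mediated share equals 1 mechanically in the just-identified case can be seen from the Wald algebra. The notation below is not in the slides and is introduced only for this sketch: π is the first-stage effect of Z on M, ρ the reduced-form effect of Z on Y, and covariates are omitted.

```latex
\begin{aligned}
M &= \alpha_M + \pi Z + u, \\
Y &= \alpha_Y + \rho Z + v, \\
\hat\beta_{IV} &= \frac{\hat\rho}{\hat\pi}
\quad\Longrightarrow\quad
\underbrace{\hat\pi\,\hat\beta_{IV}}_{\text{indirect effect}} = \hat\rho
\quad\Longrightarrow\quad
\frac{\hat\pi\,\hat\beta_{IV}}{\hat\rho} = 1 .
\end{aligned}
```

Because the IV coefficient is defined as the ratio of the reduced form to the first stage, the implied indirect effect reproduces the total effect exactly, so only the signs of the first stage and of the IV coefficient carry information.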

Appendix · Pre-registration & IRB

Pre-Registration, IRB, and Timeline

Pre-registration

AEA Social Science Registry · Trial ID AEARCTR-0014648. Pre-registered outcomes, samples, and specification before endline data collection.

IRB

IRB certificate from Pontificia Universidad Javeriana · Approved 2024-04-24. Both partners consented individually.

Stage	Date	N
Baseline survey (in-person/phone)	Jul–Sep 2024	3,464 adults (1,732 couples)
Randomization	End Oct 2024	1,732 couples (1:1)
Midline — WhatsApp chatbot	Oct–Nov 2024	1,236 engaged (36%)
Endline — phone survey	Nov 2024–Jan 2025	1,382 (≈40%)
Sample 2∩3 (both midline + endline)		1,102

Replication data and code available at doi.org/10.7910/DVN/QYWHLA.
