
Norms Behind Closed Doors:
Misperceptions and Maternal Employment in Couples

A Field Experiment in Bogotá, Colombia
Marie Boltz  ·  Monserrat Bustelo  ·  Ana María Díaz  ·  Agustina Suaya
U. Strasbourg / BETA  ·  IADB  ·  Pontificia Universidad Javeriana  ·  IADB
RIDGE Forum · Barbados, May 22, 2026
Pre-registered · AEARCTR-0014648 · Under Review EDCC
Sample · 1,732 couples · Bogotá, Colombia
Paper in a Nutshell

Where?
1,732 cohabiting couples with at least one child under 6 in Bogotá, Colombia — designed to be representative on observable characteristics benchmarked to census and GEIH data.
How?
WhatsApp chatbot with personalized feedback on the gap between the respondent's belief and the true Bogotá-wide support; endline phone survey 2–3 months later.

Main results.

  • Beliefs: the binding misperception (men about other men's support) falls −11 pp**; spousal belief about partner +6 pp** (spillover, both genders).
  • Intra-household decision: treated men are 23 % more likely (+9.1 pp) to nominate their wife for an online career-development course instead of taking it themselves.
  • Labor outcomes: treated women search for work / change jobs more (+9.6 pp***, RW p = 0.013); men's work–family balance preference +11 pp*. Aspirations null.
  • Boundary: the chain runs end-to-end only among labor-attached women — where structural barriers are already low.
Motivation · Pluralistic Ignorance

Pluralistic ignorance is global — and LAC is no exception

  • Bursztyn, Cappelen, Tungodden, Voena, Yanagizawa-Drott (2023): 60-country survey on support for women working outside the home
  • In nearly every region, both men and women underestimate how much others support women working — and the gap is remarkably similar in size across contexts (≈ 15–25 pp); LAC is no exception
  • Our focus is sharper: pluralistic ignorance specifically about mothers of young children working — the norm that binds exactly when the gender gap in employment widens
Pluralistic ignorance — Global and by region, LAC highlighted
Bursztyn et al. (2023). Bars: actual support (orange) vs. perceived support (navy) for women working. Global on the left of the dashed line; regions on the right; LAC framed in green as our setting.
Research Questions

Three Research Questions

RQ 3 · Decisions & labor
Do belief updates affect (a) intra-household allocation of a career-building course, and (b) short-run job search and labor outcomes?
Causal pathway
Information → Community & spousal beliefs → Intra-household decision → Labor outcomes
Information corrects second-order beliefs (community first; partner beliefs follow) — which jointly feed the household decision.
Motivation · Why LAC

Why LAC? — high private support, frozen gap, no obvious culprit

The LAC puzzle. Female LFP stalled around 2002 even as fertility fell, education rose, and private support for women's work approached parity (Gasparini & Marchionni 2015; Marchionni, Edo & Berniell 2024). The deceleration is a normative puzzle, not an income or institutional one.

The question. If norms — not preferences, not institutions — are the binding constraint, then correcting beliefs about the norm should move behaviour.

Motivation · Unit of analysis

Why couples: a joint decision, shaped by two complementary mechanisms

① COMMUNITY → PARTNER SPILLOVER
  • We deliver community information only — nothing about the partner
  • Yet beliefs about the partner also update
  • Leading interpretation: the treatment makes the shared community misperception salient and triggers within-couple discussion
② HUSBAND AS GATEKEEPER (Bernhardt et al. 2018)
  • The perceived social cost of women's work falls on men — the husband's status is what is seen as diminished
  • Correcting his belief about what other men accept is what moves the household decision — not changing her own attitude

Why complementary. Without ①, the community correction would not trigger discussion within the household nor reach beliefs about the partner; without ②, updated beliefs would not translate into intra-household action. We cannot fully isolate ①'s channel — but the couples design lets us measure how within-household beliefs and decisions move together after treatment.

Section II

Experimental Design

1,732 couples · Bogotá · WhatsApp + phone · Three survey waves
Experimental Design Baseline Facts Results Heterogeneity Conclusions
Experimental Design · Sample

A sample designed to be representative of Bogotá couples with young children

Eligibility: ≥1 child <6. Stage when gender gaps in LFP widen most sharply.
Frame: Census & GEIH. Benchmarked to Colombia's population census and the Gran Encuesta Integrada de Hogares on demographics, income, education.
Design: 1:1. Randomization at couple level; stratified by wife's LFP and husband's first-order belief.
  • Representative of Bogotá households with ≥1 child under 6 on the dimensions that matter for our reference group (education, income, woman's labor attachment) — so the measured norm (89% of fathers / 91% of mothers support maternal employment) is a credible true reference value
  • Both partners individually surveyed in private (in-person or phone, July–September 2024) — necessary to measure misperceptions about the partner without contamination
  • Household income: 28% low · 60% middle · 12% high — mirrors city distribution
  • Three survey waves: Baseline (Jul–Sep 2024) → Midline/WhatsApp (Oct 2024) → Endline by phone (Nov 2024–Jan 2025)
Experimental Design · Stage 1

Stage 1 — Baseline Survey & Belief Elicitation

Stage 1 Baseline
  • First-order belief: do you agree mothers should be free to work?
  • Second-order beliefs: community-level estimates (how many fathers/mothers agree?)
  • Spousal beliefs: what do you think your partner believes?
Experimental Design · Stage 2

Stage 2 — Randomization at the Couple Level

Stage 2 Randomization
  • Unit: couple (both partners receive the same arm) — eliminates within-household spillover from arm assignment
  • Stratification: wife's labor status, husband's first-order support, presence of children <6
  • Treatment: WhatsApp chatbot with actual Bogotá-level support; Control: placebo on public-transport subsidies
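
The couple-level, stratified 1:1 assignment described above can be sketched as follows. This is a minimal illustration, not the study's randomization code: the field names (`id`, `wife_active`, `husband_supports`) and the coin flip for odd-sized strata are assumptions made for the example.

```python
import random
from collections import defaultdict


def assign_couples(couples, seed=0):
    """Assign each couple to 'T' or 'C', 1:1 within strata.

    `couples` is a list of dicts with hypothetical keys:
    'id', 'wife_active' (bool) and 'husband_supports' (bool).
    Both partners inherit the arm of their couple.
    """
    rng = random.Random(seed)

    # Group couple ids by stratum (wife's labor status x husband's support).
    strata = defaultdict(list)
    for c in couples:
        strata[(c["wife_active"], c["husband_supports"])].append(c["id"])

    arm = {}
    for key in sorted(strata):
        ids = sorted(strata[key])
        rng.shuffle(ids)
        n_treat = len(ids) // 2
        # Odd-sized stratum: a coin flip decides which arm gets the extra couple.
        if len(ids) % 2 == 1 and rng.random() < 0.5:
            n_treat += 1
        for position, couple_id in enumerate(ids):
            arm[couple_id] = "T" if position < n_treat else "C"
    return arm
```

Because the couple is the unit, both spouses always share an arm, so no respondent lives with a partner who received the other message.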
Experimental Design · Two Arms

What Each Arm Saw — One Norm, Two Topics

Treatment (866 couples)

"Mothers of children under six should be free to work for pay outside the home."

Gender-norm statement (target).

Control (866 couples)

Same chatbot, same four steps, same schedule. The only difference is the norm each arm sees. Any T–C gap in downstream behavior is attributable to corrected beliefs about gender norms.

Experimental Design · Treatment Chatbot

Treatment Arm — Personalized Feedback on the Gender Norm

Norms Study (WhatsApp chatbot mock-up)
Bot: Hello 👋 In the baseline survey you answered a question about the statement: "Mothers of children under six should be free to work for pay outside the home."
Bot: You estimated that out of 100 fathers in Bogotá, 60 agree with this statement.
Bot: Do you think your estimate matches the true share in Bogotá? (Yes • No • Not sure)
Respondent: No
Bot: Actual share in Bogotá (baseline data): Fathers 89 / 100 · Mothers 91 / 100
Bot: In fact, 89 out of 100 fathers and 91 out of 100 mothers in Bogotá agree with the statement.
Bot: How does this information feel to you? (Interesting • Irrelevant • Disappointing)
Treatment arm — gender norm
The four steps (as in the paper)
  • Step 1 — Recall: shows the respondent's own baseline estimate of fathers' / mothers' agreement
  • Step 2 — Check: asks whether that estimate matches reality
  • Step 3 — Reveal: shows the actual share computed from the baseline (as a WhatsApp message with numbers and emojis)
  • Step 4 — Rate: asks the respondent to rate the discrepancy — interesting / irrelevant / disappointing

The same four steps are repeated for beliefs about men's and women's support, with order randomized.
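
As an illustration of the four-step sequence, the sketch below builds the messages for one respondent. The function names and the exact wording are hypothetical; only the step order, the 60 vs. 89 example, and the randomized order of the two reference groups come from the slides.

```python
import random


def chatbot_steps(statement, group, own_estimate, true_share):
    """Return the four messages (recall, check, reveal, rate) for one belief."""
    gap = true_share - own_estimate
    return [
        f'Recall: you estimated that out of 100 {group} in Bogotá, '
        f'{own_estimate} agree with "{statement}"',
        "Check: do you think your estimate matches the true share in Bogotá? "
        "(Yes / No / Not sure)",
        f"Reveal: in fact, {true_share} out of 100 {group} agree "
        f"({gap:+d} pp relative to your estimate).",
        "Rate: how does this information feel to you? "
        "(Interesting / Irrelevant / Disappointing)",
    ]


def build_session(statement, estimates, true_shares, seed=None):
    """Repeat the four steps for each reference group, in random order."""
    groups = list(estimates)
    random.Random(seed).shuffle(groups)
    return [
        (g, chatbot_steps(statement, g, estimates[g], true_shares[g]))
        for g in groups
    ]
```

For example, `chatbot_steps("Mothers of children under six should be free to work for pay outside the home.", "fathers", 60, 89)` produces a reveal message reporting a +29 pp gap.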

Experimental Design · Treatment — The Reveal

What Treated Respondents Saw at Step 3 — The Figure

Share of fathers in Bogotá who agree with the statement

"Mothers of children under six should be free to work for pay outside the home."
Your estimate (from the baseline survey): 60% (60 / 100)
Actual share (measured in Bogotá): 89% (89 / 100)
+29 pp  — fathers in Bogotá support working mothers much more than respondents believe
Experimental Design · Control Chatbot

Control Arm — Same Structure, Placebo Norm

Norms Study (WhatsApp chatbot mock-up)
Bot: Hello 👋 In the baseline survey you answered a question about the statement: "Companies should subsidize public transport."
Bot: You estimated that out of 100 people in Bogotá, 75 agree with this statement.
Bot: Do you think your estimate matches the true share in Bogotá? (Yes • No • Not sure)
Respondent: Not sure
Bot: Actual share in Bogotá (baseline data): Men 94 / 100 · Women 95 / 100
Bot: In fact, 94 out of 100 men and 95 out of 100 women in Bogotá agree.
Bot: How does this information feel to you? (Interesting • Irrelevant • Disappointing)
Control arm — placebo norm
Why this placebo
  • Same channel (WhatsApp chatbot), same format, same four-step sequence, same schedule
  • Unrelated topic: attitudes toward corporate subsidies for public transport
  • Shares the belief-elicitation mechanics without addressing gender norms
Experimental Design · Stage 3

Stage 3 — Midline: WhatsApp Engagement

Stage 3 Midline
  • Delivery: WhatsApp chatbot with 4 interactive steps (Sep–Oct 2024)
  • Engagement: only ~29% of respondents completed the interaction (501 of 1,732 treated individuals; 518 of 1,732 control individuals)
  • Note: low engagement suggests digital interventions face uptake barriers in this population
Experimental Design · Two outcome measures

Two outcomes: an intra-household decision (in-chatbot) and individual labor responses (endline)

Follow-up timeline
① Intra-household decision · elicited at the end of the WhatsApp chatbot
  • Course nomination — one real online career course per household, keep it or give to partner
  • Zero-sum, revealed-preference choice with direct personal cost
  • Captures the joint household allocation decision, immediately after belief correction
② Individual labor outcomes · endline phone survey, 2–3 months later
  • Job search effort, mobility, aspirations, work–family balance
  • 1,382 of 3,464 individuals re-interviewed (≈40%)
  • Measured before any treatment reinforcement
Section III

Baseline Facts

Near-universal private support — and a 28 pp misperception
Baseline Facts · Sample

Individual Attributes — Large Gender Gap in LFP

Variable                                   Husbands   Wives     Δ
Demographics
  Age (years)                              34.9       32.0      2.8***
Education
  Low                                      14.3%      10.8%     3.5***
  Medium                                   69.5%      71.1%     −1.6
  High                                     16.2%      18.1%     −2.0
Employment Status
  Employed                                 90.5%      52.0%     38.5***
  Unemployed                               5.0%       6.3%      −1.3
  Inactive                                 4.5%       41.7%     −37.2***
  Weekly hours                             48.7       37.6      11.1***
Job Flexibility
  High                                     23.6%      33.0%     −9.4***
  Some                                     27.2%      31.5%     −4.3**
  None                                     48.9%      35.1%     13.8***
Job Search
  Looking for job                          10.6%      16.2%     −5.5***
  Start business                           9.1%       7.2%      1.9**
  ↳ Actively pursuing job mobility (sum)   19.7%      23.4%     −3.7*
  Not looking, but would like to           49.3%      51.8%     −2.5
  Satisfied                                31.0%      24.9%     6.2***
Baseline Facts · Household

Household Attributes & Income Distribution

Characteristic                             Sample composition
Household Size & Composition
  Average household size                   3.8 members
  Children under 6 per HH                  1.13
  HH with child <6 not in childcare        27.6%
  HH with member needing permanent care    32.0%
Household Income Category
  Low income (<1.3M COP)                   28%    ~$6,200 USD
  Middle income (1.3–3.9M COP)             60%    ~$6,200–$18,600 USD
  High income (>3.9M COP)                  12%    >$18,600 USD
Sample Composition
  Total households                         1,732
  Total individuals                        3,464 (1,732 couples)
Baseline Facts · Misperception of community support

Both sexes underestimate community support — sharply for men, mildly for women

Distribution of beliefs about men's support for maternal employment
True share: 89 % · Perceived (peak): 60 % · Gap: 27–33 pp
[Figure: men's support misperception distribution]
Distribution of beliefs about women's support for maternal employment
True share: 91 % · Perceived (peak): 80 % · Gap: 10–11 pp
[Figure: women's support misperception distribution]

Pluralistic ignorance runs on both sides. Both husbands and wives underestimate community support for maternal employment: sharply for men's support (≈ 30 pp), more mildly for women's (≈ 10 pp). The intervention corrects both misperceptions by giving each spouse the true Bogotá-wide shares.

Section IV

Empirical Strategy

IPWRA · Attrition correction · Multiple testing
Empirical Strategy · Setup

The estimand, two threats, and three analysis samples

Threat 1 · Differential attrition

Baseline respondents are not all observed at midline / endline. If survey participation correlates with potential outcomes — and especially if it does so differently by treatment arm — ITT estimates are biased.

Threat 2 · Selective engagement

Only ~29% of treated respondents complete the WhatsApp chatbot. Engagers can differ from non-engagers on covariates that also predict outcomes → covariate imbalance within the realized sample.

Three analysis samples — one for each class of outcome:

Sample 2: Chatbot completers → course nomination (decision elicited inside the WhatsApp module)

Sample 3: Endline survey → labor outcomes (job search, mobility, aspirations, work–family balance)

Sample 2 ∩ 3: Chatbot + endline → community & spousal second-order beliefs

Empirical Strategy · IPWRA

Two-step IPWRA: re-weight for attrition, then estimate the ATT

Step 2 — IPWRA within each sample S

Step-1 weights $w_i^S$ enter as pweights. The estimator combines:

  • Treatment model $e(X_i^D, \text{strata}) = \Pr(D = 1 \mid X, \text{strata})$ → propensity score
  • Outcome models $m_d(X_i^D, \text{strata}) = E[Y \mid D = d, X, \text{strata}]$, $d \in \{0, 1\}$

Reweights the control group toward the covariate distribution of the treated; adjusts flexibly for remaining differences via the outcome regression.

Doubly-robust property. The ATT is consistently estimated if either the treatment model $e(\cdot)$ or the outcome models $m_d(\cdot)$ is correctly specified; both need not hold. Reported with cluster-robust analytic variances at the household level.
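
A minimal sketch of an IPWRA estimator for the ATT is given below, assuming NumPy is available. It is illustrative only: it uses a logit propensity score rather than the probit of the main specification, linear outcome models, and hypothetical names (`ipwra_att`, `attr_w` for the step-1 attrition weights), and it omits the strata terms and the cluster-robust variance.

```python
import numpy as np


def fit_logit(Z, d, w, iters=25):
    """Weighted logit via Newton-Raphson; Z must include an intercept column."""
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (w * (d - p))
        hess = (Z * (w * p * (1.0 - p))[:, None]).T @ Z
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_wls(Z, y, w):
    """Weighted least-squares coefficients."""
    Zw = Z * w[:, None]
    return np.linalg.solve(Zw.T @ Z, Zw.T @ y)


def ipwra_att(y, d, X, attr_w=None):
    """ATT by inverse-probability-weighted regression adjustment."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = len(y)
    w = np.ones(n) if attr_w is None else np.asarray(attr_w, dtype=float)
    Z = np.column_stack([np.ones(n), X])

    # Treatment model: propensity score e(X).
    p = 1.0 / (1.0 + np.exp(-Z @ fit_logit(Z, d, w)))

    treated = d == 1
    control = ~treated
    # ATT weights: treated keep their base weight, controls get odds e/(1-e),
    # which reweights them toward the covariate distribution of the treated.
    w_control = w[control] * p[control] / (1.0 - p[control])

    # Outcome models m_0 and m_1, each fitted on its own arm.
    b0 = fit_wls(Z[control], y[control], w_control)
    b1 = fit_wls(Z[treated], y[treated], w[treated])

    # Average the predicted contrast over the treated units.
    contrast = Z[treated] @ b1 - Z[treated] @ b0
    return float(np.sum(w[treated] * contrast) / np.sum(w[treated]))
```

On simulated data with selection on covariates and a constant effect, the estimate should land close to the true effect whenever either of the two models is well specified.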

Empirical Strategy · Inference

Inference: Fisher randomization as primary, Romano-Wolf for multiple testing

Multiple-testing correction · Romano-Wolf step-down (Appendix Table A·rwolf)
  • Step-down adjustment within each outcome family — F1 community beliefs · F2 spousal beliefs · F3 course outcomes · F4 labor outcomes
  • Controls family-wise error rate while preserving power vs. Bonferroni
  • Computed on unweighted OLS → conservative relative to our preferred IPWRA estimates (we treat RW as a stress test of the headline pattern)

Headline robustness. Women's job mobility survives Romano-Wolf double-starred (RW p = 0.013). Men's course nomination and work–family balance are suggestive under RW but the Fisher p-values remain below 0.10.
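
The step-down logic can be sketched as follows for one outcome family. This is a generic illustration of the Romano-Wolf adjustment, not the code behind Appendix Table A·rwolf: the function name and the input layout (observed statistics plus a list of null resampling draws) are assumptions.

```python
def romano_wolf(t_obs, t_null):
    """Romano-Wolf step-down adjusted p-values for one family of hypotheses.

    t_obs:  observed test statistics, one per hypothesis.
    t_null: resampling draws; each draw is a list of statistics in the same
            order as t_obs, generated under the null (for example from
            re-randomized treatment or a centered bootstrap).
    """
    k = len(t_obs)
    n_draws = len(t_null)
    # Step down from the largest to the smallest observed |t|.
    order = sorted(range(k), key=lambda j: -abs(t_obs[j]))

    adjusted = [0.0] * k
    running_max = 0.0
    for position, j in enumerate(order):
        remaining = order[position:]
        # Compare |t_j| with the max null statistic over hypotheses not yet rejected.
        exceed = sum(
            1
            for draw in t_null
            if max(abs(draw[m]) for m in remaining) >= abs(t_obs[j])
        )
        p = (exceed + 1) / (n_draws + 1)
        # Enforce monotonicity: adjusted p-values cannot decrease down the order.
        running_max = max(running_max, p)
        adjusted[j] = running_max
    return adjusted
```

Because later steps take the maximum over fewer hypotheses, the procedure is less conservative than Bonferroni while still controlling the family-wise error rate.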

Empirical Strategy · Attrition robustness

Beyond IPWRA: bounds, near-miss diagnostic, and weight sensitivity

Lee (2009) sharp bounds

Under monotone selection (treatment never makes someone less likely to be observed), trim the over-represented arm at the appropriate quantile to get sharp upper/lower bounds on the ATT — without parametric assumptions on selection. Reported in Appendix Table A·leebounds for all main outcomes where monotonicity is plausible.
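
The trimming step can be illustrated with a short sketch. It assumes monotone selection and takes the observed outcomes plus the number of units assigned to each arm; the function name and the rounding of the number of trimmed observations are choices made for this example, not the paper's implementation.

```python
def lee_bounds(y_treat, y_ctrl, n_treat, n_ctrl):
    """Lee (2009) bounds on the treatment effect under monotone selection.

    y_treat, y_ctrl: outcomes of respondents actually observed in each arm.
    n_treat, n_ctrl: number of units originally assigned to each arm.
    Returns (lower, upper).
    """
    q_treat = len(y_treat) / n_treat  # response rate, treated
    q_ctrl = len(y_ctrl) / n_ctrl     # response rate, control

    if q_treat < q_ctrl:
        # Control arm is over-represented: trim it instead and flip the signs.
        lo, hi = lee_bounds(y_ctrl, y_treat, n_ctrl, n_treat)
        return -hi, -lo

    # Share of observed treated units to trim so that response rates match.
    share = (q_treat - q_ctrl) / q_treat
    k = int(round(share * len(y_treat)))

    ys = sorted(y_treat)
    keep = len(ys) - k
    mean_ctrl = sum(y_ctrl) / len(y_ctrl)

    lower = sum(ys[:keep]) / keep - mean_ctrl  # drop the k largest outcomes
    upper = sum(ys[k:]) / keep - mean_ctrl     # drop the k smallest outcomes
    return lower, upper
```

When response rates are nearly equal across arms, as with the small trimming fraction reported for the men's course outcome, almost nothing is trimmed and the two bounds sit close together.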

Near-miss diagnostic — harder-to-reach respondents

Compare November (easy) vs. December–January (hard-to-reach) endline respondents on baseline covariates. The variable the treatment directly corrects — community SOB about men's support — does not differ: 58.3 vs. 59.4 (p>0.5). A 3.7 pp gap on FOB exists, but FOB is a stratification variable.

Four weight specifications
  • Baseline probit-PS (main spec)
  • Weights winsorised at p95
  • Trimmed sample: drop PS < 0.10
  • Alternative logit-PS specification

Headline results stable across all four (Appendix Table A·ipwrasens).

Reference-group accuracy

For the disclosed Bogotá-average to be the right anchor, engagers and non-engagers must share the targeted prior. They do: 58.1 vs. 58.6, p>0.5. Maximum subgroup deviation from the city-wide mean is 3.3 pp — far below the 28 pp misperception being corrected.

Macours (2025): only 28% of dev-econ RCTs explicitly correct for attrition in estimation; a further 23% stop at balance tests on attrition rates. Our IPWRA + Lee + near-miss design addresses both the estimation and the diagnostic gap.

Section V

Results

Community beliefs · Spousal beliefs · Course allocation · Labor market outcomes
Results · RQ1 — Community beliefs

Preferences unchanged — but the binding misperception falls for men

Men
N = 453 · Sample 2∩3
Own belief (FOB) ━  null
Belief about ♂'s support +2.8 pp
Belief about ♀'s support +4.5 pp
⚡ Misperception of ♂'s −11 pp ★★
Misperception of ♀'s ━  null
Women
N = 649 · Sample 2∩3
Own belief (FOB) ━  null
Belief about ♂'s support +2.6 pp
Belief about ♀'s support +2.3 pp
Misperception of ♂'s ━  null
Misperception of ♀'s ━  null
The binding misperception falls — men only · P[underestimate ≥ 5 pp]
Control men: 83 of 100 underestimate
Treated men: 72 of 100 underestimate (−11 pp ★★)

Why binary > continuous? The mean SOB shifts only +2.8 pp (Panel B), but the share who substantially underestimate (≥5 pp) drops 11 pp. The treatment compresses the distribution toward truth: a meaningful share of men crosses out of substantially underestimating, even when the average estimate moves modestly. Preferences (FOB) are unchanged.

Definitions: Misperception of ♂'s/♀'s = 1 if respondent underestimates true community support by ≥ 5 pp. ★ p<0.10 · ★★ p<0.05 · IPWRA · Fisher randomization p-values
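
The misperception indicator and a Fisher randomization p-value can be sketched as below. This is a simplified illustration: it uses a raw difference in means as the test statistic, whereas the slides pair the Fisher p-values with IPWRA estimates, and all function names are hypothetical.

```python
import random


def underestimates(belief, true_share, threshold=5):
    """1 if the respondent underestimates true support by at least `threshold` pp."""
    return 1 if true_share - belief >= threshold else 0


def diff_in_means(y, d):
    """Treated-minus-control difference in mean outcomes."""
    treated = [v for v, a in zip(y, d) if a == 1]
    control = [v for v, a in zip(y, d) if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def fisher_p(y, d, strata, draws=2000, seed=0):
    """Two-sided randomization p-value, re-shuffling treatment within strata."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(y, d))

    by_stratum = {}
    for i, s in enumerate(strata):
        by_stratum.setdefault(s, []).append(i)

    hits = 0
    for _ in range(draws):
        permuted = list(d)
        for idx in by_stratum.values():
            labels = [d[i] for i in idx]
            rng.shuffle(labels)  # keeps the number treated per stratum fixed
            for i, a in zip(idx, labels):
                permuted[i] = a
        if abs(diff_in_means(y, permuted)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (draws + 1)
```

With a true share of 89, a belief of 60 is coded as a misperception while a belief of 86 is not, which is why the binary outcome can move even when the mean belief shifts only modestly.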

Results · RQ2 — Spousal beliefs

Spillover into spousal beliefs — even with no information about the partner

Men
N = 453 · Sample 2∩3
Own attitude (first-order belief) ━  null
⚡ Belief about ♀'s support +5.9 pp ★★
Women
N = 649 · Sample 2∩3
Own attitude (first-order belief) ━  null
⚡ Belief about ♂'s support +6.3 pp ★★
The spillover — community correction lifts perceived partner support, even from an already-high baseline
♂ men's belief about ♀'s support for maternal employment: Control 90.9 % → Treated 96.8 % (+5.9 pp ★★)
♀ women's belief about ♂'s support for maternal employment: Control 86.8 % → Treated 93.1 % (+6.3 pp ★★)

Why this is striking. At baseline, spouses are already mildly optimistic about their partner (within-couple gap ≈ 1–3 pp). The dominant friction is at the community level, not within the couple. Yet correcting community beliefs lifts the partner-belief further. Leading interpretation: the treatment makes the shared misperception salient and triggers within-couple discussion — though the channel is not separately identified.

Why is the binary misperception null? The spousal misperception indicator is defined differently from RQ1: =1 if the respondent's dichotomised belief about the partner doesn't match the partner's actual yes/no support (no 5 pp threshold). At baseline most respondents already get the direction right (control mean ≈ 18 %), so there is little room for the binary to move — the action is in the continuous belief level.

★★ p<0.05 · ★★★ p<0.01 · IPWRA · Fisher randomization p-values

Results · RQ3a — Course allocation

Belief-corrected men nominate their wives 23 % more often for the course

Men (N = 373 · Sample 2)
Nominate wife: Control 40 % → Treated 49 %; +9.1 pp (+23 %) ★★
Fisher exact p = 0.011 · IPWRA p = 0.104
Women (N = 644 · Sample 2)
Nominate self: Control 84 % → Treated 84 %; ceiling effect (already 84% nominate themselves, no room to move)
Out of every 10 men, one more nominates his wife after the intervention: 4 of 10 control men vs. 5 of 10 treated men (+1 ★★)

Robustness. Effect for men stable across all 4 IPWRA weight specs (0.091–0.094); Lee sharp bounds strictly positive at 0.8 % trimming. Secondary outcomes (Q&A backup): own interest in course — null for both genders; belief about partner's interest — null. The decision shifts; the stated preferences do not.

★★ p<0.05 · Course nomination is a zero-sum revealed-preference choice elicited inside the WhatsApp chatbot

Results · RQ3b — Labor outcomes

Women act in the market; men report stronger work–family balance preferences

Men
N = 453 · Sample 2∩3
Job mobility ━  null
LM aspirations ━  null
Work–family balance (stated preference) +11 pp ★

WFB is a stated preference — interpret with caution (possible social-desirability echo; see SDB slide). The clear behavioural result is on the women's side.

Women
N = 649 · Sample 2∩3
⚡ Job mobility +9.6 pp ★★★
LM aspirations ━ null
Work–family balance ━ null

Control mean 73 % → treated 82 % (+13 % relative). Survives Romano-Wolf: RW p = 0.013.

The asymmetric response — same belief update, different behaviour
♂ men's work–family balance preference: Control 32 % → Treated 43 % (+11 pp ★)
♀ women's job mobility (changed job / started business): Control 73 % → Treated 82 % (+9.6 pp ★★★)

Job mobility is not a social-desirability artefact. The placebo arm (Sample 3∖2, endline-only respondents who never saw the chatbot) shows negative coefficients on job mobility (−8.7 pp for women, n.s.), which rules out a survey-response artefact. Women's mobility survives Romano-Wolf (RW p = 0.013).

Men's work–family balance is a stated preference: interpret with caveat, since it may be a social-desirability echo of the treatment script. Main defense: the placebo arm shows no analogous shift, and the effect persists in Sample 3 (+6.6 pp ★) under a separate window. Aspirations are null for both genders.

★ p<0.10 · ★★★ p<0.01 · IPWRA · Fisher randomization p-values

Results · Mechanisms

Two mechanisms, one chain — community→partner spillover & husband as gatekeeper, supported

COMMUNITY → PARTNER SPILLOVER

The setup. We deliver community-level information only — nothing about the partner. If the treatment touched only community beliefs, beliefs about the partner should be unaffected.

The result. Belief about the partner's support moves +6 pp ★★ for both men and women — from an already-optimistic baseline.

Leading interpretation: the treatment makes the shared misperception salient and triggers within-couple discussion. The channel is not separately identified.

HUSBAND AS GATEKEEPER

The theory. Bernhardt et al. (2018): the social cost of women's work falls on men. So the belief to correct is his — what he thinks other men accept — not her own attitude.

  • Men's misperception of ♂'s support: −11 pp ★★   (the binding belief moves)
  • Treated men then nominate wife: +9.1 pp, Fisher p = 0.011   (the gatekeeper releases)

Women already nominate themselves 84 % — it is the husband's belief that has to move, not the wife's.

The 4-step chain that ties them together
Community info → Husband's belief about other men (−11 pp) → Husband's belief about wife (+6 pp) → Husband releases as gatekeeper (+9 pp course) → Wife moves in the market (+9.6 pp mobility ★★★)
Results · Heterogeneity (mechanism evidence)

The gatekeeper releases — but only where the wife is labor-market attached

Heterogeneity by wife's labor attachment — Course, Job mobility, Aspirations
Wife active (green): the chain runs. Course nomination by husband +12.8 pp ★, Job mobility +10.8 pp ★★, Aspirations +10.6 pp ★.
Wife inactive (orange): all three point estimates are smaller and not statistically significant (Course −0.9 pp, Job mobility +6.1 pp, Aspirations +0.4 pp).
Genuine boundary for Course (Δ −13.7 pp ★★★) and Aspirations (Δ −10.2 pp ★★★). For Job mobility the interaction is n.s. (Δ −4.7 pp), so we cannot reject equal effects across groups, but neither can we claim the treatment works for inactive women on its own.
The husband only releases as gatekeeper, and the women only act, where there is an actionable labor-market margin.
Notes: interaction model with $Z_i = 1$ if wife inactive; wife inactivity is a stratification variable → subgroup effects are causally identified. Magnitudes are IPW estimates; interaction stars from OLS Fisher randomization-inference p-values (paper Table G, Panel B Men for Course, Panel C Women for Job mobility and Aspirations).
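
The interaction model behind this figure can be illustrated with cell means. For a binary treatment D and a binary indicator Z (wife inactive), the saturated regression Y = α + βD + γZ + δ(D × Z) has β equal to the effect among active wives and β + δ equal to the effect among inactive wives; the reported numbers follow this identity (for Course, 12.8 − 13.7 = −0.9). The sketch below is unweighted and the function name is hypothetical.

```python
def interaction_effects(y, d, z):
    """Subgroup effects and their difference from a saturated D x Z model.

    y: outcomes; d: treatment (0/1); z: 1 if wife inactive, 0 if active.
    With binary d and z, OLS with an interaction reproduces these cell means.
    """
    def cell_mean(d_value, z_value):
        values = [yi for yi, di, zi in zip(y, d, z)
                  if di == d_value and zi == z_value]
        return sum(values) / len(values)

    effect_active = cell_mean(1, 0) - cell_mean(0, 0)    # beta
    effect_inactive = cell_mean(1, 1) - cell_mean(0, 1)  # beta + delta
    return {
        "active": effect_active,
        "inactive": effect_inactive,
        "delta": effect_inactive - effect_active,        # interaction term
    }
```

A significant negative `delta` is what the slide calls a genuine boundary; an insignificant one, as for job mobility, means equal effects across the two groups cannot be rejected.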
Results · Synthesis

The boundary of the norms channel — where it stops

  • Among inactive wives, the chain breaks down:
    • Course nomination by husband ≈ 0
    • Labor-market aspirations ≈ 0
    • Job mobility — not statistically significant in this group
  • Structural barriers (childcare, labor demand, skill certification) are the first-order constraint here — information about norms is not enough
  • Information is insufficient where participation constraints bind: correcting the husband's beliefs about other men does not translate into intra-household action when there is no actionable labor-market margin for the wife
  • Consistent with Afridi, Dhillon, Roy & Sangwan (2023): when access and information are offered but structural barriers bind, women's outside-the-home employment does not move

Policy implication. Information is a low-cost lever where structural conditions already permit. Where they don't, norms and structure are complements, not substitutes — childcare, demand, and certification are the binding tools.

Robustness · Social desirability bias

Five defenses against social-desirability bias

① Timing placebo (Sample 3 ∖ 2)

Endline-only respondents received the info after their behavioral window. If results were SDB, job mobility would still appear here. Coefficient is negative / null for women (paper Table, col 3) → rules out a survey-response artefact.

② Real behavior > stated preferences

SDB would predict stated preferences (aspirations) move most. We find the opposite: aspirations null; behavior (course nomination +9.1 pp, job mobility +9.6 pp ★★★) moves.

③ Course as revealed preference

Zero-sum, costly choice (give up own slot to wife). Hard to drive via cheap talk — the husband faces a real loss if he nominates wife only to please the experimenter.

④ Engager balance

Treatment engagers vs. control engagers have near-identical baseline beliefs (max gap 2.5 pp on any belief variable) → no selection on motivated reasoning (paper §6, engager diagnostics).

⑤ Spillover to partner-specific beliefs (RQ2)

Treated respondents update beliefs about their own partner, whose views they were never told. Inventing partner-specific beliefs to please the experimenter is implausible — these are private, idiosyncratic targets. The spillover (+6 pp for both) is hard to reconcile with SDB.

One exception we flag honestly: men's work–family balance preference (+11 pp ★) is a stated preference; the timing placebo helps but a demand effect cannot be fully ruled out. We interpret it conservatively (paper §6.3).

Robustness

The Main Results Are Robust

Lee (2009) Sharp Bounds
  • Trimming fraction for course (men) < 0.8% — both bounds strictly positive
  • Women's job search: Lee bounds positive; belief outcomes include zero
  • Attrition-robust: monotonicity holds for course (Sample 2 = midline engagers)
IPWRA Sensitivity (4 specs)
  • Baseline probit · Winsorized (p95) · Trimmed (PS<0.10) · Logit PS
  • Course men: 0.091–0.094 across all specs; crosses 10% threshold under winsorized weights
  • Women job search: 0.091–0.096 (p=0.006–0.009) — stable and significant
  • Work–family men: fragile to trimming (p=0.198 with 11 high-leverage obs dropped)
Romano-Wolf Step-Down
  • F1 (community beliefs): RW p > 0.40
  • F3 (course, men): RW p = 0.151 (does not clear 10% under MHT correction; suggestive)
  • F4 (men's work–family balance): RW p = 0.077 (clears 10% threshold)
  • F4 (women's job mobility): RW p = 0.013 — survives MHT at 5%
Additional Checks
  • Near-miss timing placebo: no differential loss on key belief variables
  • Engager characterization: targeted prior near-identical between engagers and non-engagers (58.1 vs. 58.6, p>0.5)
  • Reference group accuracy: disclosed norm accurate within ±3.3 pp for all subgroups
Section VII

Conclusions

What we found · What it means · What comes next
Conclusions · Summary

What We Found

② Within-couple decisions move
Treated men are 9.1 pp (+23%) more likely to nominate their wife for a career-development course. A zero-sum allocation with direct personal cost — a lower bound on the willingness to invest in the wife's career.
③ Women's labor decisions follow
Treated women report +9.6 pp more job mobility in 1–2 months (p=0.006; RW p=0.013). Placebo timing check rules out social desirability. The behavioral chain — SOB → bargaining → action — runs end-to-end.
④ The boundary: where the chain stops
For households with inactive wives, beliefs partially update (the inference channel still operates), but the chain breaks at the behavioral translation step: course allocation and labor-market margins do not move because there is no actionable margin for the husband to act on. Information about norms complements, not substitutes for, structural policy.
Conclusions · Implications

What It Means

  • The binding constraint is shared misperception about the community, not about each other. Spouses are already mildly optimistic about their partner's support; the friction sustaining the FLFP gap is the gap between private attitudes and what each spouse believes the broader community endorses. Correcting that community-level misperception is what unblocks couple-level bargaining at low cost.
  • Couples are the right unit of analysis. Single-respondent designs miss the spillover from community to partner beliefs and the within-household allocation margin. Surveying both spouses lets us measure and control the within-household belief gap — and track how beliefs and decisions move together after treatment.
  • We identify the boundary of the norms channel. The chain runs end-to-end (SOB → bargaining → labor decisions) among labor-attached households and is silent among inactive women. The asymmetry is itself a policy lesson: information is a low-cost tool for the first group; structural policy (childcare, demand, certification) is the binding tool for the second. Norms and structure are complements, not substitutes.
  • Scalability: WhatsApp-based norm correction is low-cost and digitally deliverable at scale in LAC cities. Engagement rates (~36%) are typical for low-cost digital interventions but underscore the importance of sustained exposure (endline reinforcement was necessary to achieve the full effect).
Appendix

Full tables · Robustness · Diagnostics · Additional results
A2 Full beliefs · A3 Romano-Wolf (+ intuition) · A4 Lee bounds (+ intuition) · A5 IPWRA sensitivity (+ intuition) · A6 Balance tests · A7 Attrition · A8 PS overlap · A9 OLS vs IPWRA · A10 Het table · A11 Indirect effects · A12 Near-miss + Engagers (+ intuition)
Appendix · backup table
Results · RQ1 — Community Beliefs

Does Information Correct Community Second-Order Beliefs? Complete Results

Outcome          (1) First-Order   (2) 2nd-Order:    (3) 2nd-Order:     (4) Misperception   (5) Misperception
                 Belief            Men's Support     Women's Support    Men D               Women D
Panel A — All (N = 1,102)
  ATT            0.001             2.75*             3.58**             −0.054*             −0.047
  Control mean   0.902             63.2              75.8               0.832               0.553
Panel B — Men (N = 453)
  ATT            −0.004            2.82              4.50*              −0.110**            −0.047
  Control mean   0.874             65.6              75.3               0.830               0.566
Panel C — Women (N = 649)
  ATT            −0.002            2.57              2.30               −0.018              −0.040
  Control mean   0.930             61.4              76.8               0.840               0.536

Key finding 2: Community beliefs do correct. Men's misperception of male support falls by 11 pp (p=0.031). Men's belief about women's support rises +4.5 pp (p=0.058).

Sample: Respondents in both midline and endline surveys (Sample 2∩3, N=1,102). ATT = Average Treatment effect on the Treated (IPWRA, 90% CI). Control means are unadjusted baseline/endline values.
Appendix · backup table
Results · RQ2 — Spousal Beliefs

Spousal Beliefs: +6 pp Perceived Partner Support

Working Mothers Equal Task Sharing
Panel 1st-order 2nd-order (Spouse) Misperception D 1st-order 2nd-order (Spouse) Misperception D
Panel A — All (N = 1,102)
ATT 0.009 0.063*** −0.028 0.021** 0.038** −0.027
Control mean 0.900 0.885 0.184 0.965 0.899 0.104
Panel B — Men (N = 453)
ATT 0.010 0.059** −0.046 0.013 −0.004 −0.002
Panel C — Women (N = 649)
ATT 0.004 0.063** −0.004 0.025*** 0.061** −0.040

Working mothers: Perceived spousal support rises by 6.3 pp (p=0.001) — similar for men (+5.9 pp) and women (+6.3 pp). Community-level correction spills over into within-couple beliefs.

Appendix · backup table
Results · RQ3a — Course Allocation

Men +9.1 pp More Likely to Nominate Wife for the Course

                Wife Should         Are You           Is Partner
                Attend Course (1)   Interested? (2)   Interested? (3)
Panel A — All (N = 1,017)
ATT             0.023               −0.014            −0.022
Control mean    0.688               0.793             0.460
Panel B — Men (N = 373)
ATT             0.091 (p = 0.104)   −0.046            0.018
Control mean    0.402               0.743             0.574
Panel C — Women (N = 644)
ATT             −0.006              0.007             −0.019
Control mean    0.841               0.819             0.374

Men (+9.1 pp, +23%): IPWRA p=0.104; Fisher exact p=0.011; Lee bounds strictly positive at 0.8% trimming; stable across all 4 IPWRA weight specs (0.091–0.094). Suggestive but credible.

Appendix · backup table
Results · RQ3b — Labor Market

Women Search More (+10 pp) · Men Value Work–Family Balance More (+11 pp)

Job Mobility Aspires Better LM Work–Family Balance
Sample 2∩3 (1) Placebo 3∖2 (3) Sample 2∩3 (4) Sample 3 (5) Sample 2∩3 (6) Sample 3 (7)
Panel A — All (N = 1,102)
ATT 0.058* −0.056 0.005 0.000 0.052 0.038
Panel B — Men (N = 453)
ATT 0.012 −0.048 −0.042 −0.027 0.110* 0.066*
Control mean 0.664 0.492 0.317
Panel C — Women (N = 649)
ATT 0.096*** −0.087 0.054 0.018 −0.006 0.015
Control mean 0.725 0.507 0.361

Women's job mobility: +9.6 pp (p=0.006), +13% relative to control. Placebo negative and p>0.10 → not social desirability. RW p=0.013.

Men's work–family balance: +11 pp (p=0.054), +35% relative to control. Robust in Sample 3 (+6.6 pp, p=0.099). Aspirations: null for both.
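The relative effects quoted above follow from dividing each ATT by its control mean. A minimal sketch of that arithmetic, using only the point estimates and control means from the table (the dictionary keys and the helper name `relative_effect` are illustrative, not from the replication package):

```python
# ATT and control mean for the two headline labor outcomes (Sample 2∩3),
# copied from the table above.
effects = {
    "women_job_mobility": (0.096, 0.725),
    "men_work_family_balance": (0.110, 0.317),
}


def relative_effect(att, control_mean):
    """Return the ATT as a fraction of the control mean."""
    return att / control_mean


relative = {
    name: relative_effect(att, mean) for name, (att, mean) in effects.items()
}

for name, value in relative.items():
    # Print each relative effect as a whole-number percentage.
    print(f"{name}: {value:.0%}")
```

The same division applied to men's course nomination (0.091 on a control mean of 0.402) gives the +23% reported for RQ3a.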

Appendix · backup table
Appendix · Baseline beliefs — full table

Baseline Beliefs: Target Norm & Placebo

Belief Type                                 Husbands   Wives   Difference
A. Target Norm: "Mothers of children <6 should be free to work"
First-order (own view)                      88.5%      90.5%   −2.0 pp**
Second-order: Men (estimate of fathers)     61.0%      55.7%   +5.3 pp***
Second-order: Women (estimate of mothers)   79.6%      80.0%   −0.4 pp
Spousal second-order                        93.9%      89.9%   +4.1 pp***
B. Placebo Norm: "Companies should subsidize public transport"
First-order (own view)                      93.5%      94.9%   −1.4 pp***

N = 1,732 couples. High first-order support for both norms (88–95%). Misperception is concentrated on fathers' support for maternal employment (gap: 27–33 pp). Placebo norm shows no such gap.
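The 27–33 pp range can be reproduced from the table. The sketch below assumes the benchmark for fathers' actual support is husbands' own first-order rate (88.5%); that choice of benchmark and the variable names are assumptions made for illustration.

```python
# Percentages from Panel A of the baseline table.
fathers_actual_support = 88.5   # husbands' first-order support (assumed benchmark)
estimate_by_husbands = 61.0     # husbands' estimate of fathers' support
estimate_by_wives = 55.7        # wives' estimate of fathers' support

# Misperception gap = actual support minus perceived support, in pp.
gap_husbands = round(fathers_actual_support - estimate_by_husbands, 1)
gap_wives = round(fathers_actual_support - estimate_by_wives, 1)

print(f"Husbands underestimate by {gap_husbands} pp")
print(f"Wives underestimate by {gap_wives} pp")
```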

Appendix · Within-couple exposure (descriptive)

Information does not diffuse automatically within couples

⚠ Caveat — endogeneity: we did not randomize which spouse engaged with the WhatsApp module. Direct / indirect / joint exposure configurations are endogenous. We therefore interpret these results as descriptive, not causal.

Information diffusion patterns

Direct (own engagement): spousal SOB +6.2 pp — replicates main result

Indirect (only partner): spousal SOB ≈ 0; men's 1st-order ↓ 4–6 pp (possible reactance)

Joint (both partners): spousal SOB +8.9 pp; course allocation +9–10 pp — strongest

Appendix · Baseline

Full Baseline Beliefs — 8 Gender Norms

Norm Statement                                        Men         Women       Men's est.      Men's est.
                                                      1st-order   1st-order   men's support   women's support
Mothers with children <6 should be free to work       88.5%       90.5%       61.0%           79.6%
Fathers and mothers should share childcare equally
Children suffer when mother works
Problems arise if wife earns more than husband
Placebo: companies should subsidize green transport   93.5%       94.9%

Across all 8 gender-norm items, the same pattern holds: progressive private attitudes coexist with sizable misperceptions about others' views, particularly men's support. The placebo shows near-universal agreement and no misperceptions — confirming misperceptions are norm-specific, not general pessimism.

[Figure: equal-tasks beliefs]
Appendix · Multiple Testing — What does it do?

Romano-Wolf Step-Down: Intuition

What we do: Romano-Wolf adjusts each p-value by simulating the joint distribution of all test statistics under the null (1,000 bootstrap replications, clustered by household). It produces a family-wise error rate–controlled p-value for every outcome, accounting for the dependence structure between them.

Plain English: "Even if I throw lots of outcomes at this experiment, here is the p-value adjusted for the fact that I am fishing in many ponds. Results that survive RW are not lucky strikes."

  • Computed on unweighted OLS — conservative relative to IPWRA
  • Outcome families: F1 community beliefs · F2 spousal beliefs · F3 course allocation · F4 labor
  • Headline: women's job mobility survives (RW p = 0.013); men's course nomination is marginal (RW p = 0.077)
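To make the step-down logic concrete, here is a minimal sketch, not the authors' implementation. It assumes the bootstrap has already produced null-centered test statistics for every outcome in a family; the function name `romano_wolf_pvalues` and its argument names are hypothetical.

```python
import numpy as np


def romano_wolf_pvalues(t_obs, t_null):
    """Step-down adjusted p-values for one family of hypotheses.

    t_obs:  observed test statistics, shape (n_hyp,)
    t_null: bootstrap statistics under the null, shape (n_boot, n_hyp)
    """
    t_obs = np.abs(np.asarray(t_obs, dtype=float))
    t_null = np.abs(np.asarray(t_null, dtype=float))
    n_boot, n_hyp = t_null.shape

    order = np.argsort(-t_obs)          # most significant hypothesis first
    p_adj = np.empty(n_hyp)
    running_max = 0.0
    for step, idx in enumerate(order):
        remaining = order[step:]        # hypotheses not yet stepped past
        # Max null statistic over the remaining hypotheses, per replication.
        max_null = t_null[:, remaining].max(axis=1)
        p = (1 + np.sum(max_null >= t_obs[idx])) / (1 + n_boot)
        running_max = max(running_max, p)   # enforce monotone p-values
        p_adj[idx] = running_max
    return p_adj


# Toy family: two outcomes, four bootstrap replications.
toy_obs = [3.0, 1.0]
toy_null = [[0.5, 0.2], [3.5, 0.1], [1.2, 0.3], [0.4, 2.0]]
toy_p = romano_wolf_pvalues(toy_obs, toy_null)
```

Taking the maximum across the remaining outcomes in each replication is what carries the dependence between outcomes into the adjusted p-value.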
Appendix · Multiple Testing

Romano-Wolf Step-Down p-values

Outcome Family & Variable          OLS coef.   Fisher p   RW p      Survives?
F1 — Community Beliefs (4 outcomes)
Perceived men's support            +2.75       (0.089)    (0.40+)
Misperception indicator, men       −0.054      (0.061)    (>0.40)
F3 — Course Allocation (men only)
Wife should attend course (men)    +0.091      0.011      0.077     marginal
F4 — Labor Outcomes (women)
Job mobility (women, Sample 2∩3)   +0.096      0.006      0.013     **
Aspires better LM (women)          +0.054      (0.223)    (>0.49)

Notes: Romano-Wolf computed on OLS (unweighted) — conservative vs. IPWRA. 1,000 replications, seed(12345), clustered by household. Women's job mobility survives stepdown correction (RW p=0.013). Course nomination for men is marginal (RW p=0.077).

Appendix · Attrition Robustness — What does it do?

Lee (2009) Sharp Bounds: Intuition

What we do: Trim observations from the lower-attrition arm to make response rates equal across treatment and control. Then compute the worst-case and best-case ATT. The resulting interval brackets all values consistent with monotonicity (treatment moves sample retention in one direction only).

Plain English: "Imagine the absolute worst possible scenario about who dropped out. Even then, my treatment effect lies somewhere in this range. If both ends of the range exclude zero, my result holds even under unobservable bias."

  • Headline: Both bounds strictly positive for men's course nomination and women's job search
  • Belief outcomes: bounds include zero — consistent with no robust belief effects
  • Key assumption: monotonicity (treatment doesn't push you to drop out)
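The trimming procedure can be sketched as follows. This is a simplified illustration on raw differences in means, without the IPWRA adjustment used in the paper; the function name `lee_bounds` and the toy data are hypothetical.

```python
import numpy as np


def lee_bounds(y, treated, observed):
    """Return (lower, upper, trimming share) for the difference in means."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    observed = np.asarray(observed, dtype=bool)

    q_t = observed[treated].mean()      # response rate, treatment arm
    q_c = observed[~treated].mean()     # response rate, control arm
    y_t = np.sort(y[treated & observed])
    y_c = np.sort(y[~treated & observed])

    if q_t >= q_c:
        # Treatment arm responds more: trim its top (lower bound)
        # or bottom (upper bound) outcomes.
        share = (q_t - q_c) / q_t
        k = int(np.floor(share * len(y_t)))
        lower = y_t[: len(y_t) - k].mean() - y_c.mean()
        upper = y_t[k:].mean() - y_c.mean()
    else:
        # Control arm responds more: trim the control arm instead.
        share = (q_c - q_t) / q_c
        k = int(np.floor(share * len(y_c)))
        lower = y_t.mean() - y_c[k:].mean()
        upper = y_t.mean() - y_c[: len(y_c) - k].mean()
    return lower, upper, share


# Toy data: 4 treated units (all observed), 4 controls (3 observed).
toy_y = [1, 2, 3, 4, 1, 1, 1, np.nan]
toy_treated = [1, 1, 1, 1, 0, 0, 0, 0]
toy_observed = [1, 1, 1, 1, 1, 1, 1, 0]
toy_lower, toy_upper, toy_share = lee_bounds(toy_y, toy_treated, toy_observed)
```

In the toy data the treated arm has the higher response rate, so a quarter of its observed outcomes are trimmed from one tail at a time.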
Appendix · Attrition Robustness

Lee (2009) Sharp Bounds

Outcome                     ATT (IPWRA)   Lower Bound   Upper Bound   Trimming %   Both Positive?
Course Allocation
Wife attends course (men)   0.091         0.079         0.106         0.8%         ✓ Yes
Labor Market (Sample 2∩3)
Job mobility (women)        0.096         0.048         0.134         2.1%         ✓ Yes
Work–family balance (men)   0.110         −0.008        0.214         1.6%
Community Beliefs (Sample 2∩3)
Perceived men's support     2.75          −1.2          +6.5

Interpretation: Lee bounds apply when treatment monotonically increases probability of being in sample. For the course (Sample 2), this is satisfied by design (engagers). Both bounds strictly positive for men's course nomination and women's job search — key results hold under worst-case attrition scenarios consistent with monotonicity.

Appendix · IPWRA Sensitivity — What does it do?

IPWRA Sensitivity: Intuition

What we do: re-estimate the IPWRA ATT under 4 alternative weight constructions to check that headline results don't depend on the most extreme weights:

  • (i) Baseline: probit propensity score (preferred)
  • (ii) Winsorized: cap weights at 95th percentile
  • (iii) Trimmed: drop observations with PS < 0.10 (≈ 1% of obs)
  • (iv) Logit PS: alternative functional form for the selection model

Plain English: "If a handful of unusual observations were driving my result, the estimate would change a lot when I cap or drop them. It doesn't change → my result is robust, not an artifact of extreme weights."

Headline: men's course estimate stable at 0.091–0.094 across all 4 specs; women's job mobility stable at 0.091–0.096.
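Specifications (ii) and (iii) amount to two simple operations on the weights and propensity scores. A minimal sketch, assuming the weights and propensity scores are already estimated; the helper names and toy values are illustrative and not taken from the replication code.

```python
import numpy as np


def winsorize_weights(weights, pct=95.0):
    """Cap weights at the given percentile (spec ii)."""
    weights = np.asarray(weights, dtype=float)
    cap = np.percentile(weights, pct)
    return np.minimum(weights, cap)


def trim_by_propensity(ps, floor=0.10):
    """Boolean mask keeping observations with PS >= floor (spec iii)."""
    return np.asarray(ps, dtype=float) >= floor


# Toy inputs: one extreme weight and one propensity score below the floor.
toy_ps = np.array([0.06, 0.20, 0.50, 0.76])
toy_weights = np.array([1.0, 2.0, 3.0, 100.0])

capped = winsorize_weights(toy_weights)   # extreme weight pulled down to the cap
keep = trim_by_propensity(toy_ps)         # first observation is dropped
```

Re-estimating the ATT with `capped` weights, or on the `keep` subsample, and comparing it with the baseline is the robustness comparison described above.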

Appendix · IPWRA Sensitivity

IPWRA Sensitivity to Alternative Weight Specifications

Specification                    Course (men)        Job mobility (women)   Work–family (men)
                                 coef. / p           coef. / p              coef. / p
Headline: Probit PS weights (untrimmed)
Baseline                         0.091 / (0.115)     0.096 / (0.006)        0.110 / (0.054)
Sensitivity checks
Winsorized (cap p95)             0.094 / (0.086) *   0.091 / (0.009)        0.101 / (0.057)
Trimmed (drop PS < 0.10, N−11)   0.091 / (0.108)     0.096 / (0.006)        0.072 / (0.198) ✗
Logit PS                         0.094 / (0.105)     0.095 / (0.007)        0.109 / (0.051)

Course (men): Estimate stable at 0.091–0.094 across all 4 specs. Crosses 10% threshold under winsorized weights. Lee bounds positive → the 9 pp estimate is credible.

Work–family balance (men): Fragile to trimming. Dropping 11 high-leverage observations lowers the estimate from 0.110 to 0.072 (p=0.198). Interpret cautiously: the direction is consistent, but magnitude and precision depend on those observations.

Appendix · Diagnostics

Balance Tests: Treatment Assignment

[Figure: balance table]

After IPWRA weighting, maximum absolute standardized mean differences (SMDs) are below 0.10 in all samples and genders. Some covariates show marginal imbalance in Sample 2∩3 (joint F tests reject), but effect sizes are small, and post-weighting balance is tight. The key variable — second-order belief about men's community support — does not differ significantly across treatment and control arms in any sample.
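For reference, a standardized mean difference can be computed as below. This is a sketch under one common convention (weighted means and weighted variances, pooled as a simple average); the paper's exact SMD formula is not stated on the slide, and the function name `weighted_smd` is hypothetical.

```python
import numpy as np


def weighted_smd(x, treated, weights=None):
    """Standardized mean difference of covariate x between arms."""
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    if weights is None:
        weights = np.ones_like(x)       # unweighted case
    weights = np.asarray(weights, dtype=float)

    def mean_and_var(mask):
        m = np.average(x[mask], weights=weights[mask])
        v = np.average((x[mask] - m) ** 2, weights=weights[mask])
        return m, v

    m_t, v_t = mean_and_var(treated)
    m_c, v_c = mean_and_var(~treated)
    # Scale the mean difference by the pooled standard deviation.
    return (m_t - m_c) / np.sqrt((v_t + v_c) / 2.0)


# Toy covariate: treated mean 2, control mean 1, both variances 1.
toy_smd = weighted_smd([1, 3, 0, 2], [1, 1, 0, 0])
```

Under this convention, the balance criterion on the slide corresponds to `abs(weighted_smd(...)) < 0.10` for every covariate after weighting.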

Appendix · Attrition

Attrition Diagnostics

[Figure: attrition diagnostics]

Endline response rate: 40% (attrition ≈ 60%). Attritors are more likely to be employed and younger, consistent with time constraints. After weighting, SMDs < 0.10.

Appendix · Diagnostics

Propensity Score Overlap

[Figure: treatment propensity-score overlap, all respondents]
[Figure: attrition propensity-score overlap, all respondents]

Propensity scores range from 0.06 to 0.76, and overlap is adequate in all samples: there are no extreme regions of non-overlap that would invalidate IPWRA. Effective sample sizes remain large after weighting, and the mass outside common support is small.

Appendix · Specification

OLS vs. IPWRA: Estimates Are Similar

Outcome                      OLS         OLS + weights   IPWRA (preferred)   Direction consistent?
Beliefs — Men's community SOB (men only)
Perceived men's support      +3.1*       +2.9*           +2.82
Course — Wife attends (men only)
Wife should attend course    +0.087*     +0.089          +0.091
Labor — Job mobility (women, Sample 2∩3)
Job mobility                 +0.094***   +0.096***       +0.096***
Labor — Work–family balance (men)
Wants work–family balance    +0.108*     +0.109*         +0.110*

IPWRA is the preferred specification chosen a priori to address selection into midline take-up. OLS and weighted-OLS produce nearly identical point estimates across all headline results — the choice of estimator does not drive the findings.

Appendix · Heterogeneity

Heterogeneity by Wife's Baseline Labor Status — Full Results

Outcome (Women)   All Women   Employed   Unemployed   Inactive
Job Mobility (Sample 2∩3)
ATT               0.096***    0.148**    0.089*       0.018
Control mean      0.725       0.712      0.780        0.699
Labor-Market Aspirations (Sample 2∩3)
ATT               0.054       0.031      0.119        0.018
Course — Wife attends (Men, by wife's status)
ATT (men)         0.091       0.134*     0.051        0.042

Job mobility effects are concentrated among employed (+14.8 pp) and unemployed (+8.9 pp) women. Inactive women show near-zero effects. The course nomination effect is also largest when the wife is employed (+13.4 pp, p<0.10). Together, these results suggest information works at the margin where action is already feasible.

Appendix · Exposure Patterns

Indirect vs. Direct Exposure — Spillovers Within Couples

  • Direct exposure (Sample 2∩3): Respondent personally engaged with WhatsApp chatbot + received endline reinforcement. Main analysis sample.
  • Indirect exposure: Respondent did not engage at midline, but their partner did. Column (2) in labor market table includes "direct or indirect T" — captures potential within-couple discussion spillovers.
  • Result: Women's job mobility under direct+indirect exposure = +7.5 pp (p=0.028), somewhat smaller than under direct exposure only (+9.6 pp). This suggests some information diffuses within the couple, but more weakly than with direct receipt.
  • Men's beliefs: Indirect exposure effects on men's community beliefs and course nomination are small and p>0.10 — consistent with low within-couple discussion of labor-market plans for men.
Appendix · Validity Checks — What do they do?

Near-Miss & Engager Diagnostics: Intuition

Engager characterization (selective take-up): only 36% engage with the WhatsApp module. We compare engagers vs. non-engagers on (a) demographics and (b) the targeted second-order belief.

Plain English: "Engagers are more inactive (selection on demographics — fix with IPW step 2). But they hold the same prior on community support as non-engagers — so the disclosed norm is accurate for them, the people who actually received it."

Reference-group accuracy: max deviation of any subgroup's mean SOB from city-wide average is 3.3 pp. The misperception we are correcting is 28 pp → reference-group mismatch is < 12% of the corrected signal. Disclosed Bogotá-average norm is a valid proxy for every demographic subgroup.

Appendix · Validity Checks

Near-Miss Timing Placebo & Engager Characterization

Near-Miss Timing Placebo
  • Endline ran Nov 18 – Jan 20 (63 days). "Hard to reach" = Dec–Jan (N=492); "Easy" = November (N=379)
  • Key belief variable: 2nd-order belief about men's support. Nov: 58.3; Dec–Jan: 59.4; diff. +1.2 pp (p>0.5)
  • No differential loss on the variable the treatment corrects → attrition unlikely to confound
Engager Characterization
  • Engagers (N=1,236) vs. non-engagers (N=2,228): more inactive (+11 pp), more care burden, fewer employed (−11 pp)
  • BUT: 2nd-order beliefs virtually identical (58.1 vs. 58.6, diff 0.5 pp, p>0.5) → disclosed norm is accurate for engagers' reference group
  • Engagement balanced across arms: 35.8% treated vs. 35.6% control

Reference group accuracy: Max deviation of any subgroup's mean SOB from city-wide average = 3.3 pp (high-SES). The corrected misperception is ~28 pp → reference group mismatch is <12% of the corrected signal. Disclosed norm is valid for all demographic subgroups in the sample.

Appendix · Figures

Spousal Beliefs — IPWRA Estimates by Gender

[Figure: spousal beliefs]
IPWRA estimates of treatment effects on spousal second-order beliefs, by gender and norm. 90% CI. Sample 2∩3.
Appendix · Mechanism

Mechanism: IV Mediation (Exploratory)

  • Setup: 2SLS system: treatment Z instruments mediator M (follow-up perceived community support), and the instrumented M then enters the equation for labor outcomes Y. With one instrument and one mediator, the mediated share = 1 mechanically → interpreted as a sign check, not a proportion estimate.
  • Sign pattern: Consistent with the proposed pathway. Updated perceived societal support → increased job search for women; updated work–family balance beliefs → increased aspiration for men.
  • Caveat: Cannot cleanly distinguish community-level vs. spousal-level channel, as both beliefs updated simultaneously (particularly under double exposure) — consistent with the treatment making the shared misperception salient and prompting within-couple discussion.
  • Belief updates close 15–25% of the gap between treatment and control on labor outcomes — the mediation channel is real but partial, consistent with norm correction being a necessary but not sufficient condition for full behavioral response.

The mediation exercise supplements rather than replaces the reduced-form evidence. We treat it as a consistency check on the sign pattern and direction of the channels.
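Why the mediated share equals 1 mechanically in the just-identified case can be seen in a small simulation. This is an illustrative sketch on synthetic data, not the paper's estimation: the coefficients 0.5 and 0.3, the sample size, and the helper `slope` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000
z = rng.integers(0, 2, size=n).astype(float)   # randomized treatment Z
m = 0.5 * z + rng.normal(size=n)               # mediator M (synthetic)
y = 0.3 * m + rng.normal(size=n)               # outcome Y (synthetic)


def slope(x, target):
    """OLS slope of target on x, with an intercept."""
    x_c = x - x.mean()
    return float(x_c @ (target - target.mean()) / (x_c @ x_c))


first_stage = slope(z, m)       # effect of Z on M
reduced_form = slope(z, y)      # total effect of Z on Y

# With one instrument, 2SLS reduces to the Wald ratio.
iv_effect = reduced_form / first_stage

# Indirect effect = (Z -> M) x (M -> Y); it equals the reduced form by
# construction, so the "share mediated" is 1 regardless of the data.
mediated_share = first_stage * iv_effect / reduced_form
```

Because the ratio is an identity, only the sign of `iv_effect` is informative here, which is how the slide proposes to read it.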

Appendix · Pre-registration & IRB

Pre-Registration, IRB, and Timeline

IRB
IRB certificate from Pontificia Universidad Javeriana · Approved 2024-04-24. Both partners consented individually.
Stage                                 Date                N
Baseline survey (in-person/phone)     Jul–Sep 2024        3,464 adults (1,732 couples)
Randomization                         End Oct 2024        1,732 couples (1:1)
Midline — WhatsApp chatbot            Oct–Nov 2024        1,236 engaged (36%)
Endline — phone survey                Nov 2024–Jan 2025   1,382 (≈40%)
Sample 2∩3 (both midline + endline)                       1,102

Replication data and code available at doi.org/10.7910/DVN/QYWHLA.