Corporate Culture

Make Work Fair

by Iris Bohnet, Siri Chilazi

18 min read
10 key ideas


In Brief

Workplace bias isn't a people problem; it's a design flaw you can engineer away. Harvard behavioral scientists reveal evidence-backed system fixes that produce measurable equity without relying on goodwill or unconscious-bias training, such as switching a performance rating scale from 10 points to 6, a single change that nearly erased one employer's gender gap in top ratings.

Key Ideas

1.

Universal screening replaces biased referrals

Replace subjective referrals with universal screening wherever possible: Broward County's switch required zero eligibility changes yet raised gifted-program enrollment 80% for Black students and 130% for Hispanic students

2.

Measure what you control only

Measure only what you can control: Ros Atkins excluded himself and uncontrollable news figures from his gender tally, then used daily Post-it data to hit 50/50 in three months

3.

Multiple underrepresented candidates change statistics

Design slate requirements with at least two candidates from underrepresented groups—one 'different' finalist faces disproportionate scrutiny and is rarely hired; two changes the statistical outcome

4.

Give DEI substantial evaluation weight

Attach real weight to DEI in high-stakes decisions: the Massport Model gave DEI 25% of the bid evaluation score, producing outcomes that voluntary goodwill never achieved

5.

Six-point scales eliminate gender bias

Switch performance rating scales from 10-point to 6-point: the gender gap in top ratings 'almost magically vanished' with this single change, because '10' conjures perfection while '6' signals something merely impressive

6.

Eliminate friction signaling not belonging

Remove application friction that marginalized candidates read as 'you don't belong here'—the FDNY's $30 fee waiver raised female applications 83% and Black applications 84%

7.

Anonymize assessments to reduce bias

Move coding and work-sample tests from public whiteboards to private, anonymized environments, because a public performance measures stress and stereotype threat rather than ability

8.

Surface peer support before appeals

Correct pluralistic ignorance before appealing to values: Saudi men who learned the true level of peer support for women working were 57% more likely to act than those who received a moral appeal

9.

Pair pay data with action

Treat pay transparency as a lever, not a trophy—disclosure without a mechanism for stakeholders to act on it becomes wallpaper, as the US tech sector's 2014 data release demonstrated

10.

Audit workload, not talent pipeline

Audit 'greedy work' structures before assuming the problem is the talent pipeline: researchers at one consulting firm found identical turnover for men and women; the problem was overwork culture, not women's choices

Who Should Read This

Business operators, founders, and managers interested in Organizational Behavior and Behavioral Economics who want frameworks they can apply this week.

Make Work Fair: Data-Driven Design for Real Results – The Harvard Experts' Blueprint: Actionable Practices for Equity and Inclusion in Organizations



Why does it matter? Because your DEI program may well be making things worse, not better.

Here's what most organizations get backwards: they treat workplace inequality as a motivation problem, so they run training sessions, post values statements, and wait for hearts to change. The data says that approach doesn't just fail—it frequently makes things worse. Iris Bohnet and Siri Chilazi spent years inside the research, and what they found is both humbling and, oddly, relieving. Think about the last time your team filled a role, or rated someone's performance, or decided whose idea to run with in a meeting. The unfairness baked into those moments isn't coming from bad people. It's coming from invisible architecture—flawed system design that nobody built on purpose and nobody thinks to question. The good news is that architecture can be rebuilt. Not with better intentions, but with better engineering: specific, testable changes to how decisions get structured. That's what the book is actually about—and why it's more useful than anything that ends with a call to do better.

The System Was Never Neutral—It Was Designed for Someone Else

In 2005, administrators in Florida's Broward County school district made a quiet procedural change: instead of waiting for teachers and parents to nominate promising students for the gifted program, they screened every second grader with a standardized test. Same eligibility requirements, same program. Just a different front door. The results were startling. Black enrollment in gifted classrooms jumped 80 percent. Hispanic enrollment jumped 130 percent. No one had been trained to see differently. No one's unconscious bias had been addressed. The system had simply stopped depending on someone's gut feeling about which kids looked the part.

This is the book's core argument, delivered early and forcefully: the problem was never primarily the people making decisions. It was the decision-making process itself. Under the old referral model, gifted students weren't being missed because teachers were unusually prejudiced; they were being missed because subjective nomination is structurally poor at spotting talent that doesn't resemble the person doing the nominating. Swap the process, and the talent that was always there finally shows up in the data.

Most organizations respond to fairness gaps by trying to change what's happening inside people's heads—rolling out unconscious bias trainings, hosting awareness workshops, encouraging managers to examine their assumptions. The research is unambiguous that this approach rarely moves the needle on actual outcomes. One analysis covering more than four decades of personnel data from over 800 American companies found that mandatory diversity training was not correlated with more women or people of color reaching management. In some cases, it correlated with fewer. Good intentions don't survive contact with a broken process.

That's the lever worth pulling—not the attitudes of the people inside the system, but the design of the system itself.

You Can't Improve What You Won't Measure—But Most Leaders Won't Look

Ros Atkins had just returned from a week at Stanford, energized by research on how data drives organizational change, when he realized he had no idea what his own show actually looked like. Atkins presented a nightly BBC news program, and he cared about representing the communities his journalism served—but caring hadn't produced numbers. So in early 2017, he persuaded his producers to spend two minutes after each broadcast doing something almost embarrassingly low-tech: tallying women and men contributors on Post-it notes. The first month of results landed like cold water. Only 39 percent of contributors were women. Not because anyone had been selecting against them deliberately—but because no one had been watching.

Within three months of watching, the show hit 50 percent women contributors. Then it stayed there, month after month, for six years, until the program ended. The Post-it notes eventually scaled into a formal initiative across 750 BBC teams and 150 external partners. None of that happened through training or cultural messaging. It happened because real-time, specific data made the gap visible at the moment decisions were being made—and visibility changed the decisions.

The contrast that makes this stick is the CEO who, when asked about pay equity at his firm, explained that after forty years in the business he was confident there was no problem. Forty years of impressions, zero data. You cannot correct for a gap you haven't measured, and most leaders haven't measured—they've just accumulated a confident narrative assembled from memory and intuition and mistaken it for evidence.

What Atkins's Post-it notes reveal is that the act of measurement is itself an intervention. When his team started counting after every show, they weren't analyzing historical data and drawing conclusions. They were feeding information back into the same decisions, in real time, while those decisions could still be changed. The gap between 'we value representation' and 'we track it weekly' turns out to be the gap between intention and result. Your organization almost certainly has a version of the 39 percent waiting to be counted.
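Atkins's after-show count is simple enough to sketch in a few lines of Python. This is a minimal illustration, not the BBC's actual tracking system: it assumes each show's contributors are logged with a gender label and a flag marking whether the team could have chosen them, echoing how Atkins left himself and uncontrollable news figures out of the tally. The `Contributor` class and `share_of_women` function are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Contributor:
    name: str
    gender: str         # hypothetical labels: "woman" or "man", as on the Post-its
    controllable: bool  # False for people the team could not choose (e.g. the presenter)


def share_of_women(shows: list[list[Contributor]]) -> float:
    """Share of women among controllable contributors across the logged shows."""
    counts = Counter(
        c.gender
        for show in shows
        for c in show
        if c.controllable  # skip the presenter and unavoidable newsmakers
    )
    # Only the two tallied categories enter the denominator.
    total = counts["woman"] + counts["man"]
    return counts["woman"] / total if total else 0.0


# Hypothetical log of two shows.
shows = [
    [Contributor("A", "woman", True), Contributor("B", "man", True)],
    [Contributor("C", "man", True), Contributor("D", "man", False)],
]
print(f"{share_of_women(shows):.0%}")  # 33%
```

The design choice that matters is when the function runs: computed after every show rather than once a quarter, the number arrives while the next booking decision can still change.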

Good Intentions Don't Move Pipelines—Specific Targets Do (But Only If Designed Right)

The Rooney Rule looked like a structural fix. When the NFL required teams in 2003 to interview at least one minority candidate for every head coaching vacancy, a nonwhite candidate became about 20 percent more likely to get hired than in the pre-rule era. Then the gains evaporated. By 2013, all eight open coaching spots went to white candidates. In the four years leading up to 2023, at most three of the league's thirty-two head coaches were Black, despite Black players comprising 60 to 70 percent of rosters. The NFL Players Association's own executive director eventually called the Rooney Rule a tokenism tool—and the research tells you exactly why it became one.

When a single minority candidate is required in a finalist pool, that person faces a structurally different evaluation than everyone else. Being the only one from a given demographic triggers disproportionate scrutiny and exaggerated differences—the candidate isn't just being assessed, they're being assessed as a representative. Studies of three-person finalist pools found that evaluators reliably recommended one of the two candidates who shared a demographic characteristic, regardless of whether that majority pair was male or female, white or Black. The math is simple and brutal: mandate one, and you've nearly guaranteed a token. Require two, and something shifts. Between 2013 and 2017, a Black coaching candidate was significantly more likely to be hired when at least two Black finalists were interviewed rather than one. The process goal looked identical from the outside—interview minority candidates—but the design detail of how many produced statistically different outcomes.

The target didn't need to be bigger. It needed to be smarter.

Fairness Doesn't Happen Without Skin in the Game

When the Massachusetts Port Authority solicited bids for a major hotel development in Boston's Seaport District—a half-billion-dollar project on land it owned—it did something unusual: diversity and inclusion plans were given exactly the same weight in the bid evaluation as building design, construction experience, and financial capability. Twenty-five percent of the score. Not a tiebreaker. Not a bonus consideration. A full quarter of what would determine who won.

Jonathan Davis, a real estate developer on the winning bid, was candid about what happened to his thinking. Developers naturally gravitate toward the fastest and simplest path—working with people and firms they already know. That's not prejudice; it's friction avoidance. But when Massport changed the rules, Davis's calculus changed with them. The winning bid directed 30 percent of construction and architectural fees to minority- and women-owned businesses and set purchasing commitments for ongoing operations. Davis didn't arrive at those numbers because someone made a compelling presentation about the business case for diversity. He arrived there because the incentive structure made the familiar path the losing path.

Status quo bias is stubborn precisely because it doesn't feel like bias—it feels like efficiency. The path of least resistance is invisible until something raises the cost of taking it. Most organizations respond to persistent fairness gaps by making the case again, more urgently, to more senior people. That's mostly wasted effort. The gap between believing diversity matters and acting differently under real decision-making pressure isn't a knowledge gap. It's an incentive gap. Close it by attaching real consequences to DEI outcomes—not as a gesture, but at a weight that actually changes what winning looks like.

Transparency Is a Tool, Not a Trophy—And It Can Backfire

What happens when you force organizations to publish their diversity numbers? The intuitive answer—sunlight disinfects, embarrassment drives change—turns out to be right only under specific conditions that most transparency efforts fail to create.

In 2014, FOIA requests compelled Google, Facebook, and other major tech firms to release workforce demographic data. The numbers were bad: women made up less than a third of employees, and Black and Hispanic workers each came in under 10 percent. But very little followed. Because every company's numbers were roughly equally bad, the social stigma that transparency was supposed to generate never materialized. There was no outlier to shame. No one looked worse than their peers, so no one faced the reputational pressure to act differently. Intel's share of female employees barely shifted in the years that followed. Disclosure had become wallpaper: the data existed, but the accountability mechanism didn't.

The Danish experience adds a different wrinkle, and it's the more surprising one. A 2006 pay transparency law required companies to report salary data broken down by gender. It did narrow the gender wage gap—but it also slowed overall wage growth by about 3 percent. The mechanism was almost perverse: once managers knew that pay decisions were visible, they became reluctant to grant above-average raises to any individual man, since doing so might create a documented disparity they'd have to explain. Reducing discrimination came at the cost of suppressing pay across the board. The workers who weren't discriminated against paid a real price for the visibility that was supposed to help others.

The lesson running through both cases is the same. Transparency isn't a values statement—it's a design input, and it only produces accountability when someone with the power to act is watching the data and has a reason to respond. Publishing numbers into a vacuum, where peers are equally implicated and no stakeholder is positioned to apply pressure, produces disclosure without consequence. Before you announce what you'll measure, ask who will see it, what they can do about it, and why they'd bother. That's what determines whether your transparency effort changes anything or just adds another dashboard no one checks.

Your Job Ads and Application Process Are Quietly Filtering Out the Best Candidates

Every application process filters someone out. The question is whether it's filtering out the wrong people—and the answer, more often than not, is yes, in ways the organization has never noticed because the design was never questioned.

The New York City Fire Department discovered this with a $30 fee. The FDNY charged that amount to take the civil service exam required for firefighter candidates, and the fee seemed trivially small—an afterthought in a process full of real tests and real physical demands. But when the department waived it, applications from women rose 83 percent and from Black candidates 84 percent. Not because those applicants couldn't scrape together thirty dollars. Because the fee functioned as a signal. For candidates already uncertain whether someone like them belonged in this profession, a fee to even attempt the process read as confirmation of that doubt. The barrier wasn't financial—it was interpretive. The ambiguity of 'do I fit here?' defaulted, as it reliably does, to 'probably not.'

Ambiguity in a hiring process is never neutral. It lands differently depending on how confident the applicant already is about whether they belong, and that confidence tracks closely with how well-represented their group is in the profession. The same dynamic plays out in job postings that inflate credential requirements beyond what the role actually needs: men apply when they match 52 percent of the listed qualifications, while women wait until they match 56 percent. The gap isn't irrational timidity. It's a calibrated response: rejection costs more when you're already reading every friction point as a verdict on whether you belong.

The fix, when organizations actually apply it, is almost insultingly simple: be specific. On Upwork, female applications for certain roles sat at 6 percent until the score threshold required to qualify as 'Expert-level' was stated explicitly—a number rather than a vague label. Female applications jumped to 29 percent. The pool didn't change. The talent didn't change. What changed was that ambiguity stopped doing the work of exclusion. Your job ad, your application steps, your qualifications list—each one is either telling a broad pool of candidates how to self-select in, or quietly telling some of them to self-select out. Right now, you probably don't know which.

Interviews Feel Like Judgment—They're Actually Just Noise

A computer science student sits down at a whiteboard in front of two evaluators. The task is a coding problem she's solved before, in private, without difficulty. But in this room, watched and expected to narrate her thinking aloud, she freezes. She doesn't complete the problem. Neither does a single other female student in the study when tested this way. Switch the setting—same problem, no observers—and every one of them gets it right.

That's not a story about confidence. It's a story about what the test is actually measuring. The public whiteboard exercise, standard practice at many tech companies, reliably induces stress, and stress hits harder when stereotype threat is already present—when a candidate is aware that their performance might confirm a negative assumption about their group. What looks like a competence filter is functioning as a stress test. The two things produce identical-looking results until you separate them, as data scientist Mahnaz Behroozi and colleagues did. In the private condition, performance was more than twice as high overall. The format was generating the outcome, not the candidates.

Slack noticed this and redesigned accordingly. Applicants now work through problems privately, and all identifying information is stripped before evaluators see the output. In one move, the company removed both the stereotype threat facing the candidate and the bias available to the reviewer. The result is a window into actual ability rather than into who handles an audience.

The same logic applies further down the pipeline, in how we rate people once they're inside an organization. Sociologists Lauren Rivera and András Tilcsik examined what happened when a single employer switched its performance rating scale from ten points to six. On the ten-point scale, men were about 50 percent more likely to receive the top score than equally performing women. On the six-point scale, the gap nearly disappeared. The scale wasn't measuring performance differently—it was triggering different mental comparisons. A rating of ten carries a cultural weight that six doesn't: it implies perfection, genius, someone extraordinary. And because we don't instinctively picture women as extraordinary in male-dominated fields, the bar shifts upward for them without anyone noticing. Shrink the scale, and you shrink the room for that unconscious adjustment. The ten-point version wasn't rating performance. It was rating who felt like a ten.

The Gender Gap Isn't About Women Leaving—It's About Jobs Designed for People With Wives

A major consulting firm hired three researchers to find out why women weren't reaching partner. They spent a year collecting data, and what they found was inconvenient enough that the CEO shut the project down. Turnover among women and men was virtually identical. Every employee, regardless of gender, was distressed by the firm's expectations of constant availability and extreme hours. Women advanced more slowly not because they were leaving, but because they were more likely to use the firm's own accommodations—reduced hours, internal projects instead of client-facing ones—and when they did, their careers stalled. The researchers recommended fixing the structure of work. The firm ended the engagement.

That episode contains the whole argument. When leaders explain the gender gap by pointing to women's choices, they are describing something real—women do use flexibility accommodations at higher rates—while misidentifying why. Claudia Goldin calls the underlying mechanism 'greedy work': professions like consulting, law, and finance pay disproportionately for extreme time, not just for output. Someone working 80 hours a week doesn't earn twice what someone working 40 hours earns—they earn more than twice, because availability itself is rewarded. That structure creates a financial incentive, in households with caregiving responsibilities, for one person to go all-in and the other to absorb everything else. In heterosexual couples, this plays out predictably. The choice that follows is real, but the architecture that shapes it was never neutral.

The pharmacy comparison makes this concrete. In 1970, more than 90 percent of pharmacists were men, and female pharmacists earned 67 cents to a male colleague's dollar. Over the following decades, independent pharmacies were absorbed into chains, drug standardization improved, and IT systems made prescriptions trackable and transferable. Pharmacists became interchangeable. If one person needs to leave early, take leave, or cut hours, a colleague can pick up exactly where they left off—no client relationship disrupted, no institutional knowledge lost. The flexibility penalty disappeared. Today women are the majority of pharmacists, and a pharmacist working half-time earns almost exactly half of what a full-time colleague earns. Not 67 cents on the dollar.
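Goldin's contrast between greedy work and pharmacy can be written as a short worked comparison. The functional form below is a stylized assumption chosen for illustration, not a formula from the book: let $w(h)$ be weekly pay for $h$ hours, suppose greedy-work pay grows as $h^{3/2}$, and suppose pharmacy pay is simply proportional to hours.

```latex
\begin{aligned}
\text{Greedy work: } w(h) &= c\,h^{3/2}
  &&\Rightarrow\; \frac{w(80)}{w(40)} = 2^{3/2} \approx 2.83 > 2 \\
\text{Household split: } w(80) + w(0) &\approx 2.83\,c\,40^{3/2}
  &&>\; 2\,c\,40^{3/2} = w(40) + w(40) \\
\text{Pharmacy: } w(h) &= c\,h
  &&\Rightarrow\; w(20) = \tfrac{1}{2}\,w(40)
\end{aligned}
```

Under any schedule where the hourly rate rises with hours worked, one partner working 80 hours out-earns two partners working 40 each, which is the household incentive greedy work creates. Under proportional pay, the two arrangements earn exactly the same, so the financial push toward one person going all-in disappears.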

The lesson isn't that pharmacy is uniquely egalitarian. It's that pharmacy stopped being organized around the assumption that the person doing the job had no competing obligations. Law, finance, and consulting still are. They remain structured so that one colleague can't easily substitute for another—which means every hour you don't work is an hour that can't be covered, which makes every accommodation a career cost. Women bear that cost more often. The cost is real. So is the structure that created it.

Norms Are More Powerful Than Rules—If You Know How to Shift Them

Here's the question worth asking before you commission another all-hands training: what do your employees actually believe their colleagues are already doing?

In Saudi Arabia, researchers found that 87 percent of married men supported their wives working outside the home. But those same men estimated that only 63 percent of their peers felt the same way. The real norm and the perceived norm were miles apart—and that gap was doing all the work. When some of these men were told the actual figure (that 87 percent of their peers, not 63, were supportive), they became 57 percent more likely to sign their wives up for a job-matching app. Five months later, significantly more of those women had applied for actual jobs. No training, no mandate, no cultural overhaul. Just a correction to a false belief about what the people around them were already doing.

Behavioral scientists call this pluralistic ignorance—the condition where nearly everyone privately holds a view while publicly assuming they're the outlier. It's the mechanism behind most entrenched workplace norms. Employees who find an all-male panel uncomfortable often assume their discomfort is unusual. Managers who want to take parental leave assume their peers would judge them for it. These beliefs function like invisible fences, holding behavior in place long after the actual consensus has moved.

Norms, in other words, are a design element, not just a cultural mood. You can intervene directly. Find out what your people falsely assume their colleagues believe, correct the record, and let the actual majority norm do what majorities naturally do: spread.

The Question Worth Carrying Forward

Here is what the research keeps returning to, in case you carry nothing else forward: you don't have to change what people believe to change what they do. Broward County didn't persuade teachers to see differently—it stopped asking teachers to look. Ros Atkins didn't run a workshop—he grabbed Post-it notes. Every mechanism in this book works by redesigning the decision point itself, not the person standing at it.

Which means the hard part isn't finding the right intervention. It's admitting that the process you've been defending as a meritocracy was never actually neutral. That's uncomfortable. Organizations resist it, and so do the people whose judgment the old process was quietly deferring to.

But that's exactly where the leverage is. The question worth taking back to your organization isn't about culture or values or hearts and minds. It's this: where are you still running on the equivalent of teacher referrals? Where is a gut feeling doing the work that a process should be doing instead? Find that decision point. Change how it works. The talent was always there. You just need a better front door.

Notable Quotes

Hit it out of the park.

The number ten carries this cultural connotation of perfection. . . . Research shows that, due to gender stereotypes of competence, we just don’t think women are perfect. We are more likely to scrutinize women and their performance.

Hey man, just remember that there are real people who are hurt when you harass them with that kind of language.

Frequently Asked Questions

What is the main argument of Make Work Fair?
Make Work Fair reframes workplace inequality as a design failure rather than a people problem. Drawing on behavioral science and case studies, authors Iris Bohnet and Siri Chilazi demonstrate that organizations can measurably reduce bias through evidence-backed interventions. Instead of focusing on changing people's attitudes, the book advocates for redesigning systems and processes—from hiring to performance evaluation—to make fair outcomes automatic. Examples include restructured rating scales, anonymized hiring processes, and universal screening mechanisms that consistently outperform goodwill-based approaches to equity.
What does Make Work Fair recommend about changing performance rating systems?
A key finding involves switching performance rating scales from 10-point to 6-point systems. The book notes that the gender gap in top ratings "almost magically vanished" with this change, explaining that "10" conjures perfection while "6" signals something merely impressive. This single structural modification addresses unconscious bias without requiring people to change their thinking, illustrating how design choices, not individual intentions, determine equitable outcomes. The book presents it as one of many evidence-backed interventions where changing systems achieves what appealing to values cannot.
What hiring process changes does Make Work Fair recommend based on real results?
Make Work Fair recommends replacing subjective referrals with universal screening wherever possible. The book cites Broward County's experience: switching to universal screening required zero eligibility changes yet produced 80-130% gains in underrepresented participation. Additionally, the book advocates designing slate requirements with at least two candidates from underrepresented groups, noting that one "different" finalist faces disproportionate scrutiny and is rarely hired. For technical roles, the book recommends moving coding tests from public whiteboards to private, anonymized environments to reduce stereotype threat and measure actual ability.
How does Make Work Fair address pay transparency and organizational culture issues?
Make Work Fair treats pay transparency as a mechanism for action, not a symbolic gesture. The book warns that disclosure without a mechanism for stakeholders to act on it becomes wallpaper, referencing how the US tech sector's 2014 data release failed to create meaningful change. Additionally, the book recommends auditing "greedy work" structures before assuming problems stem from talent pipeline issues. One consulting firm's research found identical turnover for men and women—the actual problem was overwork culture, not gender-based choices.
