#496 – FFmpeg: The Incredible Technology Behind Video on the Internet

Lex Fridman Podcast

4h 24m episode

In Brief

Most engineers trust compilers to optimize their code, but the handwritten assembly inside FFmpeg beats compiler output by up to 62x. The project is maintained by roughly 10 to 15 volunteers, and the closely related VLC project has refused intelligence-agency requests for a backdoor.

Key Ideas

1. Handwritten Assembly Beats Compiler Output

dav1d has 240K lines of handwritten assembly: 8x its own C code, and more than twice the assembly in all other FFmpeg codecs combined.

2. Hand-Tuned SIMD Vastly Beats Auto-Vectorization

Compiler auto-vectorization loses to hand-tuned SIMD by 10–62x, not a rounding error.

3. License Compliance Requires Genealogical Investigation

JB tracked down a dead man's factory-worker father just to change a software license.

4. AI Spam Floods Open Source Maintainers

AI-generated bug reports are now a denial-of-service attack on volunteer open source maintainers.

5. FFmpeg Mars, VLC Pancakes Paradox

FFmpeg runs on Mars. VLC can't open pancakes. Both facts are true.

Why does it matter? Because civilization's video infrastructure runs on volunteers one burnout away from breaking

The software secretly rendering every Netflix stream, every YouTube video, every Discord call, and every video conference you've ever joined is maintained by roughly ten to fifteen unpaid volunteers — and most engineers at trillion-dollar companies depending on it have no idea it exists. This conversation opens the hood on what it actually takes to build and sustain foundational open-source infrastructure, and why the assumptions most developers carry about compilers, codecs, and long-term sustainability are quietly, expensively wrong.

• Handwritten assembly beats compiler-optimized C by 10–62x (not percent), and the gap widens precisely as hardware stops getting faster
• Changing a software license required JB to track down a factory worker whose programmer son had died, just to relicense two lines of code
• AI-generated security reports are now a functional denial-of-service attack on volunteer maintainers, while trillion-dollar companies treat a public bug tracker like a paid SLA vendor
• Two intelligence agencies asked for a backdoor in VLC; JB's architecture was designed from day one to have nothing to hand over

Modern compilers have not solved handwritten assembly — the gap is 10x to 62x, and it's widening as hardware plateaus

Ten to sixty-two times faster. Not percent. That's what separates handwritten SIMD assembly from compiler-optimized C on the operations that make video possible — and Kieran has the benchmarks posted publicly, to years of furious pushback from the broader software engineering community.

The argument from the optimism camp is reasonable on its face: modern auto-vectorization should handle SIMD. JB and Kieran have spent two years showing it doesn't. "For two years, and two years later, showing hundreds of examples of handwritten assembly. 'No, no, no, you're doing it wrong. The compiler can do this.'" No one has come back with code that closes the gap.

The reason the stakes are existential rather than academic: dav1d, VideoLAN's AV1 decoder, runs on an estimated three billion devices decoding video continuously. "Every cycle matters," JB says. "We are talking about probably three billion devices which are going to decode video nonstop because, for example, thirty percent of the video from Netflix are now in AV1, fifty percent of YouTube."

What JB and Kieran do with assembly goes beyond normal optimization. "We abuse the machine," Kieran says. "We go and use the machine in ways that the creator didn't expect. Sometimes we use an instruction that's completely unrelated to what we do. We use a cryptography instruction in video processing." dav1d also invents its own calling convention — bypassing the operating system's standard method of sharing state between functions — because saving registers the standard way burns cycles they cannot spare at this scale.

JB's synthesis is brisk: "We are at the end of Moore's law. You need to go down in the stack and optimize more to get more power from what you have." The compiler will not take you there. Learning CPU architecture is still, for certain classes of problem, the most powerful thing a developer can do.

Video compression isn't minimizing mathematical error — it's engineering deception calibrated to human eyes, and academia had it backwards for two decades

Hobbyists making anime fansubs figured out what MPEG academics couldn't: you are optimizing the wrong metric.

For twenty years, the holy measure in video compression was PSNR — peak signal-to-noise ratio, a mathematical score of how closely the compressed signal resembles the original. The problem, as Kieran explains, is that PSNR optimization produces blurring: "It leads to loads and loads of blurring." Distributing small errors across every pixel minimizes the average but makes images look washed out. A sharp image with a few visible artifacts might score worse mathematically while looking dramatically better to the person watching it.
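To make the PSNR problem concrete, here is a minimal sketch in Python. It assumes 8-bit samples (peak value 255) and two made-up 8x8 error patterns; the function name and the sample data are illustrative and do not come from any encoder.

```python
import math

def psnr(original, compressed, peak=255):
    """Peak signal-to-noise ratio in dB for two equal-length sample lists."""
    if len(original) != len(compressed):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)

# Hypothetical 8x8 block of mid-gray pixels (64 samples).
original = [128] * 64

# Small error spread across every pixel.
smeared = [p + 2 for p in original]

# 60 pixels exact, 4 pixels visibly wrong.
artifacts = list(original)
for i in (0, 1, 2, 3):
    artifacts[i] += 16

psnr_smeared = psnr(original, smeared)
psnr_artifacts = psnr(original, artifacts)
print(f"small error everywhere: {psnr_smeared:.2f} dB")
print(f"few large errors:       {psnr_artifacts:.2f} dB")
```

The evenly smeared block scores about 6 dB higher, because its mean squared error is a quarter of the other block's. The metric says nothing about which block a viewer would prefer, which is the gap the paragraph above describes.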

JB frames the compression philosophy in terms that sound obvious only in retrospect: "All the compressions that we do, and that's very important, people forget about that, is to be viewed by humans." The operation is deliberate degradation, not fidelity: "We are degrading both the audio and the video signal in the best way possible."

What unlocked x264 — and through it, essentially all modern video compression — were two techniques that hobbyist encoder developers smuggled past the academic establishment. Psychovisual rate distortion used block energy to make encoding decisions based on human perceptual complexity rather than mean squared error. Adaptive quantization redistributed bits away from simple areas — flat grass, still backgrounds — toward the complex regions where the eye would actually notice loss.

The test environment that validated this wasn't a professional screening room. Loren Merritt, x264's key engineer, was explicit: "I don't wanna test this on a thirty thousand dollar screen. I want this to look good on someone's laptop at home." That constraint — optimize for the real viewer, not the ideal one — is part of why x264 became the reference against which every codec since has been benchmarked. When an entire field is optimizing the wrong number, the people closest to the actual use case can leapfrog decades of institutional work in months.

The Alliance for Open Media said AV1 required dedicated hardware decoding. VideoLAN proved them wrong with 240,000 lines of handwritten assembly.

The people who designed AV1 thought it was too complex to decode in software. The people who built dav1d were not interested in that consensus.

When AV1 launched, the view inside the Alliance for Open Media — which includes Google, Netflix, Amazon, Apple, and Mozilla — was that the format's complexity made hardware decoding effectively mandatory. JB's paraphrase is unsparing: "Many people said, especially even from the Alliance for Open Media, 'Well, this format is so complex, it must be done in hardware to do decoding.'"

VideoLAN built dav1d instead. The numbers are staggering: 30,000 lines of C and 240,000 lines of handwritten assembly. For context, all other codecs in FFmpeg combined contain roughly 100,000 lines of assembly. dav1d has more than twice that total — for a single decoder. The outcome: "With one or two cores you were able to decode 720p correctly." Not dedicated silicon. Not a GPU. One or two standard CPU cores.
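The ratios behind those line counts are easy to mix up, so here is the arithmetic as a short Python sketch. It uses only the rounded figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the text above (all approximate).
dav1d_c_lines = 30_000
dav1d_asm_lines = 240_000
other_ffmpeg_asm_lines = 100_000

# Assembly in dav1d relative to dav1d's own C code.
asm_to_c = dav1d_asm_lines / dav1d_c_lines

# Assembly in dav1d relative to assembly in all other FFmpeg codecs.
asm_vs_other_codecs = dav1d_asm_lines / other_ffmpeg_asm_lines

print(f"dav1d assembly vs its own C: {asm_to_c:.1f}x")
print(f"dav1d assembly vs other FFmpeg assembly: {asm_vs_other_codecs:.1f}x")
```

On these figures, the 8x ratio compares dav1d's assembly with its own C, while the comparison with the rest of FFmpeg's assembly comes out at 2.4x.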

That decoder now runs on every device streaming the 30% of Netflix traffic and 50% of YouTube traffic delivered in AV1. It did what the hardware manufacturers said was impossible, on hardware those manufacturers never designed for it.

Kieran's summary of the codebase is categorical: "This is what peak video codec should look like. Seventy-nine point nine percent assembly, nineteen point six percent C, and zero point five percent other."

The provocation isn't merely technical. dav1d is proof that when an entire industry agrees something requires a hardware solution, the real question is whether anyone with sufficient craft and stubbornness has actually tried to prove otherwise in software. JB and a small team found no fundamental ceiling. They kept writing assembly until the wall disappeared.

To change a software license, JB tracked down a grieving factory worker whose programmer son had died

Three hundred and fifty people. That's how many JB had to contact to move VLC's core from GPL to LGPL — finding, reaching, and getting explicit consent from every contributor, including people who had changed jobs, moved countries, gone silent, or died.

The legal logic is airtight: open source projects are collective works where each contributor retains copyright on what they wrote. You cannot relicense without everyone's permission. "Everyone" is not a manageable or findable list. JB tracked people by decade-old email addresses, traveled in person when necessary, and knocked on a factory door when the trail led there.

"I arrived to the work of a person who was a factory worker," JB says. "And I said, 'Well, I need you to sign that,' because it was his son who died who actually wrote the code. So I had to explain all those types of open source meanings, and no, I was not a company trying to rip out the two lines or five lines that that guy did." He was young. The father had never encountered any of this. "We talked about the photo of this guy," JB says. He was almost in tears.

What the process clarified: "The license is a social contract in terms of Rousseau de facto of the community. The community does not agree on much besides the license."

Kieran makes the operational consequence explicit for a project like FFmpeg, with thousands of contributors, some of whom are no longer alive: relicensing is effectively impossible. "It would need all of their agreement." The license chosen on day one is nearly immutable. That permanence is a structural strength, and it makes the choice of license the most consequential decision a project will ever make, usually before anyone understands what the project will become.

FFmpeg runs on up to a billion CPUs right now, is maintained by 10–15 volunteers, and AI-generated bug reports are now a viable denial-of-service weapon

Kieran states the scale without hedging: "As we speak, easily 100 million, maybe even a billion CPUs" are running FFmpeg at this moment. The project sustaining that load has a core team of ten to fifteen people. None of them are paid for it.

The gap between scale and resourcing was theoretical until Google deployed AI to generate security vulnerability reports on FFmpeg in volume. Kieran's description: "It's almost a denial of service by AI-generated bug reports on very niche codecs." The reports arrived with aggressive framing, standard 90-day commercial deadlines, and no accompanying patches. The vulnerability that sparked public controversy was in "an obscure 1990s game codec." It was flagged high priority.

The contrast Kieran draws is precise: a 16-year-old contributor named Ruikai Peng discovered and fixed a real vulnerability in three days, filed no CVE, requested no bounty, and generated no press release. The AI-driven process produced the inverse — maximum noise, minimum contribution.

Microsoft Teams filed a bug report naming the product and its user base to establish urgency, treating FFmpeg's public tracker like a vendor's paid support queue. JB's team responded by offering a support contract. Microsoft offered a one-time payment of a few thousand dollars.

Donations did rise after the public confrontation. "Donations have increased substantially," Kieran says. "They're still not enough to cover even a single full-time developer."

The XZ attack spelled out where this trajectory leads: one burned-out maintainer, two social engineers asking questions "nonstop at weird times at night to block him, and at some point he got fed up" and granted commit access just to make it stop. The exploit wasn't technical. It was attrition — and AI makes attrition cheap to industrialize.

The last offer was 'obscene' — JB turned it down because taking it would have ended the project in three years anyway

The Reddit meme is accurate. JB has turned down tens of millions of dollars, multiple times, to avoid bundling adware, spyware, or toolbars into VLC. He is careful to separate this from ideology.

"The last offer I had was obscene," he says. "And they say, 'Yeah, but imagine with all that money you could build something new, open source.' It was like the mind trick was… it was difficult."

The personal reckoning was direct: "I need to go to bed at night and be happy about what I've done." But the strategic calculus arrived at the same place. "If I do that, right, I would have a ton of money, right? And then three years later, the project is gone, right? Someone forks it and something else happens." A project built on community trust cannot survive its betrayal; the fork would happen before the runway ran out. "If I had sold out, I would have betrayed so many other people who work here."

Context matters. In the early 2000s, installing a program and finding three extra toolbars on your browser was industry standard. Saying no wasn't idealism — it was genuinely unusual and commercially costly. That VLC became a symbol of what free software could be is inseparable from JB holding the line during years when the rest of the software distribution world wasn't.

The ethical and strategic calculations were identical. That alignment — where doing the right thing and doing the smart thing converge — is rare enough that JB's account of it is worth sitting with. He wasn't betting on virtue. He was modeling second-order effects accurately when everyone else was fixated on the first.

One megabyte of binary blob takes roughly a month to reverse engineer. Kostya was doing 20- and 30-megabyte blobs — alone, for fun.

Without the people who reverse engineer proprietary codecs, GoToMeeting recordings go dark when the company pivots. Early 2000s CCTV archives become permanently unreadable when the hardware vendor disappears. An obscure Star Wars game's opening cinematic — decoded by one person, preserved in FFmpeg — would simply be gone.

Kostya Shishkov — described by JB as "borderline genius" — was one of the people who prevented that. He had a phrase for his methodology: binary specification. He needed no documentation, no source code, no vendor cooperation. "He looked at the world as a binary specification. He didn't need documentation or anything." He opened the binary and reasoned from it until it yielded.

Kieran walks through what the process actually involves: locate the decoding module inside a large application binary, find a way to hook into it and dump raw YUV output as ground truth, then open a disassembler and begin inferring codec structure from patterns in machine code. "For a long time, you don't see anything. So you're debugging purely in memory." Every stage — entropy decoding, intra prediction, motion compensation, inverse transform — must be reconstructed before a single valid pixel appears.
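The ground-truth step can be sketched in Python as a byte-level comparison between the dump from the original binary and the output of a work-in-progress decoder. This is a minimal illustration, assuming the dump is 8-bit planar YUV 4:2:0 with Y, U, and V planes stored in that order; the real layout depends on the codec and on how the module was hooked. The function names are hypothetical.

```python
def yuv420p_frame_size(width, height):
    """Bytes per 8-bit YUV 4:2:0 planar frame: full-size Y plus two subsampled chroma planes."""
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return width * height + 2 * chroma_w * chroma_h

def first_mismatch(reference, candidate, width, height):
    """Return (frame, plane, offset) of the first differing byte, or None if equal."""
    frame_size = yuv420p_frame_size(width, height)
    luma_size = width * height
    chroma_size = (frame_size - luma_size) // 2
    for pos in range(min(len(reference), len(candidate))):
        if reference[pos] != candidate[pos]:
            frame, in_frame = divmod(pos, frame_size)
            if in_frame < luma_size:
                return frame, "Y", in_frame
            if in_frame < luma_size + chroma_size:
                return frame, "U", in_frame - luma_size
            return frame, "V", in_frame - luma_size - chroma_size
    if len(reference) != len(candidate):
        # One stream is shorter: report where the shorter one ends.
        frame, in_frame = divmod(min(len(reference), len(candidate)), frame_size)
        return frame, "length", in_frame
    return None

# Two tiny 4x2 frames (12 bytes each) standing in for a real dump.
reference = bytes(range(24))
candidate = bytearray(reference)
candidate[21] ^= 0xFF  # corrupt one chroma byte in the second frame

result = first_mismatch(reference, bytes(candidate), width=4, height=2)
print(result)
```

Reporting the plane and offset of the first divergence narrows the search to a specific frame and plane, which helps when nothing can be inspected visually yet.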

"You're stepping through the debugger, like one by one, instruction by instruction going, 'Hey, this instruction changes this.'" Some codecs have encryption on top, requiring a virtual machine to dump memory mid-execution.

A one-megabyte binary is about a month of work. Kostya routinely processed 20- and 30-megabyte blobs. The GoToMeeting decoder he produced came with jokes threaded through the comments — JB's name and Kostya's own, woven into code that no one was supposed to be able to write.

As JB frames the stakes: "This is why those type of work are exceptionally useful for humanity." Every proprietary format without an open decoder is a countdown clock on a category of human memory.

Two intelligence agencies asked for a backdoor in VLC. JB's answer was architectural: there is nothing to hand over.

The request came twice, and both times JB's reply was "a lot less polite" than a simple no. The more important answer, though, is not the refusal itself but why it is structurally binding rather than merely principled.

"There is no telemetry in VLC," JB says. Binaries are compiled on machines that have never touched the internet. The compiler itself is compiled from source before anything else runs. Binaries carry double signatures. When a suspected government actor tried to push a fake binary onto VideoLAN's own servers, the architecture held. "We compile on boxes that are offline, where we start by compiling the compiler. We do everything offline on places that have never been connected to the internet."

The stated principle is absolute: "If we had to compromise our software, we would shut it down. This is clear." But the structural reality is that there is nothing to compromise in the conventional sense — no server to compel, no viewing history to subpoena, no telemetry pipeline to quietly redirect. Privacy-by-architecture beats privacy-by-policy every time a government decides policy is negotiable.

The CIA Vault 7 documents revealed the workaround: build a modified VLC with an added DLL that exfiltrates documents during playback, then deliver it via targeted advertising. The vulnerability wasn't in VLC's code. It was in users downloading from unofficial sources — VLC was impersonated, not penetrated. Which is, in a specific sense, the highest compliment you can pay software designed from day one to have nothing to hide.
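On the user side, one common defense against an impersonated download is to compare the file against a checksum obtained from the official project site. The Python sketch below is illustrative only. It assumes a SHA-256 checksum is available through a trusted channel, and the function names and demo data are hypothetical. A checksum fetched from the same untrusted source as the file proves nothing, and this check does not replace signature verification.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large installers need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_checksum(path, published_hex):
    """True only if the file's SHA-256 equals the published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.strip().lower())

def _write_temp(data):
    """Write bytes to a throwaway file and return its path (demo helper)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        return tmp.name

# Demo: a stand-in "installer" and a copy with extra bytes appended.
genuine = _write_temp(b"pretend installer bytes")
modified = _write_temp(b"pretend installer bytes" + b"extra payload")
published = hashlib.sha256(b"pretend installer bytes").hexdigest()

genuine_ok = matches_published_checksum(genuine, published)
modified_ok = matches_published_checksum(modified, published)
os.remove(genuine)
os.remove(modified)
print(genuine_ok, modified_ok)
```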

The efficiency gap that made FFmpeg matter is opening in AI inference — and most engineers still can't close it

JB draws the line himself: the reason quantization in LLMs matters — why researchers are working in FP8, FP4, and 1.58-bit precision — is the same reason dav1d exists. Hardware is no longer keeping pace with demand. The efficiency gap that made 240,000 lines of handwritten assembly worth writing for a video decoder is opening now in machine learning, on devices that haven't doubled in speed in years.
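The "1.58-bit" figure is the information content of a weight that can take three values, since log2(3) is about 1.58. The Python sketch below shows one simple way to produce such weights: scale by the mean absolute value, then round and clamp to -1, 0, or +1. This absmean scheme is an assumption chosen for illustration, not something described in the episode, and the sample weights are made up.

```python
import math

# Three possible values (-1, 0, +1) carry log2(3) bits each.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

def quantize_ternary(weights):
    """Map floats to {-1, 0, +1} plus one shared scale (mean absolute value)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Approximate reconstruction of the original weights."""
    return [q * scale for q in quantized]

weights = [0.9, -0.05, 0.4, -1.2, 0.0, 0.3]  # hypothetical values
q, scale = quantize_ternary(weights)
approx = dequantize(q, scale)

print(f"bits per weight: {BITS_PER_TERNARY_WEIGHT:.2f}")
print("ternary weights:", q)
```

Each weight now needs under two bits instead of 16 or 32, at the cost of a coarse approximation. That trade of precision for throughput on fixed hardware is the one the paragraph above compares to dav1d.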

The craft that most engineers have been trained to ignore (assembly, SIMD, CPU architecture, the actual shape of the machine underneath the language) is about to matter again at massive scale. FFmpeg, built on that craft, already runs on Mars.

The question this episode leaves hanging is who will write it.


Topics: open source, video codecs, FFmpeg, VLC, assembly language, multimedia compression, software engineering, reverse engineering, open source sustainability, AV1, x264, video streaming, patent licensing, privacy by design, software maintenance

Frequently Asked Questions

What is FFmpeg and why does it matter for video on the internet?
FFmpeg is foundational software for video compression and playback across the internet. Maintained by roughly 10 to 15 volunteers, it gets much of its performance from handwritten assembly that beats compiler-optimized C by 10–62x. The community around it is also notable for its independence: JB twice refused intelligence-agency requests for a backdoor in VLC. FFmpeg's efficiency makes it essential for streaming services, content creators, and any application requiring reliable video processing at scale.

Why does FFmpeg use handwritten assembly code instead of relying on compilers?
Compiler auto-vectorization loses to hand-tuned SIMD by 10–62x, a gap that compounds across billions of devices. dav1d, VideoLAN's AV1 decoder, alone contains 240K lines of handwritten assembly, more than twice as much as all other FFmpeg codecs combined. This manual optimization lets developers exploit specific CPU capabilities that compilers miss. Handwritten assembly demands more expertise and maintenance, but the performance gains justify the effort for infrastructure used globally in real-time video processing.

What challenges do FFmpeg's volunteer maintainers face?
FFmpeg's 10 to 15 volunteer maintainers face growing pressure from scale and from automated noise. AI-generated bug reports are now a denial-of-service attack on volunteer open source maintainers, consuming their time without contributing fixes. The related VLC project also ran into unusual legal complications: JB tracked down a dead contributor's factory-worker father just to change a software license. Together these show how volunteer-maintained critical infrastructure faces mounting pressure from automated reports and legal obstacles.

What are some surprising facts about FFmpeg's real-world impact?
FFmpeg runs on Mars, while VLC can't open pancakes. Both facts are true. This episode showcases FFmpeg's presence across unexpected environments, from space exploration to consumer devices. The project's reach extends far beyond typical video streaming; it powers everything from medical imaging to scientific research. FFmpeg's universal adoption stems from its reliability, open-source nature, and exceptional performance optimizations, demonstrating how critical the project has become to modern technology infrastructure.
