AI-Enabled Prediction Accountability | Generated by AI

2026.04

Question: Is it now easier to judge someone’s ability to predict the future, given that billions of people leave public records online — and can AI be used to review and score past writings like those of He Qinglian (China analysis), Yin Wang (computer science), and Paul Graham (startups)?

Answer:

Yes — And This Is One of the Most Underrated Intellectual Tools of Our Era

You’ve identified something genuinely important. The combination of mass digitized public writing + AI review creates, for the first time in history, a scalable way to audit who was actually right and who was just loud.


The Core Idea: Prediction Accountability at Scale

With billions of people writing in public, billions of timestamped social media posts, essays, and blog entries now exist. Before the internet age, predictions were largely ephemeral: spoken at a dinner, printed in a pamphlet, forgotten. Now they are permanently indexed.

The traditional problem, as Philip Tetlock documented in his landmark research on superforecasting, was precisely this: every day, the news media deliver forecasts without reporting, or even asking, how good the forecasters who made the forecasts really are. Tetlock noted that figures like Tom Friedman may have been famous, but the accuracy of his forecasting has never been rigorously tested — there are no hard facts about track records, just endless opinions.

AI changes this. You can now feed years of someone’s public writing into a language model and ask: “What did they predict? Were they right? How often? In what domains?” This is a form of post-hoc calibration scoring that previously required enormous human labor.
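As a toy sketch of what such calibration scoring computes, the standard Brier score averages the squared error between stated probabilities and actual outcomes. The forecasts below are invented for illustration, not real track records:

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and outcomes.
    0.0 is perfect; 0.25 is what always saying 50% earns; higher is worse.
    Each forecast is (probability_assigned_to_event, event_happened)."""
    return sum((p - float(happened)) ** 2 for p, happened in forecasts) / len(forecasts)

# Hypothetical extracted predictions: (stated probability, did it happen?)
pundit_a = [(0.9, True), (0.8, True), (0.7, False), (0.95, True)]   # hedged, mostly right
pundit_b = [(1.0, True), (1.0, False), (1.0, True), (1.0, False)]  # absolute certainty, 50/50 hit rate

print(round(brier_score(pundit_a), 3))  # 0.136
print(brier_score(pundit_b))            # 0.5, worse than always guessing 50%
```

Note how the confident pundit who is right half the time scores worse than someone who never commits beyond a coin flip, which is exactly Tetlock’s point about absolute certainty.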


The Three Thinkers You Mentioned

He Qinglian (何清涟) — China Analysis

He Qinglian is a Chinese economist and social commentator, well-regarded for her critical analysis of China’s political economy, corruption, and propaganda systems. Her work (including books like China’s Pitfall) has been praised for anticipating the contradictions in China’s development model decades ahead of mainstream Western coverage. Because her writing spans 20+ years and makes specific structural claims, it is exactly the kind of corpus that AI can systematically audit for predictive accuracy.

Yin Wang (王垠) — Computer Science Trends

Yin Wang at yinwang.org is a well-known Chinese programmer and thinker who spent time at top U.S. universities (Indiana, Cornell) and worked at companies like Google and Uber. His blog covers topics like “There is no human-level computer vision” and generally skeptical takes on mainstream CS trends. His writing style is contrarian and specific — ideal for AI review, since he makes falsifiable claims about which technologies will or won’t work. Over time, you can score which critiques aged well.

Paul Graham — Startups

From 1993 to 2020, Paul Graham published 188 essays on his website, totalling some 500,000 words — roughly 1,000 pages. His essays are among the most publicly documented, timestamped, and specific sets of predictions about startups and technology. Y Combinator, the accelerator he co-founded, helped launch companies like Dropbox, Airbnb, and Reddit, which gives his pattern-recognition claims real-world grounding. He himself has noted that his startup essays get “tested by about 70 people every 6 months,” treating his own writing as falsifiable. This epistemic humility — combined with a large, public, timestamped corpus — makes him one of the best candidates for AI-assisted track-record analysis.


What Makes a Good “Futurist” by Rigorous Standards?

Tetlock’s superforecasting research gives us a useful framework here. Good political forecasters attach explicit probabilities to what they believe will happen. Anyone who speaks with absolute certainty was never among the good forecasters and was often among the worst. Real superforecasters maintain a growth mindset: they hold their conclusions with an open mind rather than with arrogance or rigid attachment to a decision, and they constantly update as new data arrives.

In Tetlock’s data, the key predictors of forecasting accuracy were cognitive ability, political knowledge, and open-mindedness; superforecasters also scored higher on inductive reasoning, pattern detection, and cognitive flexibility.

This is the red flag for viral “predictors” like Jiang Xueqin (“China’s Nostradamus”): he presents his claims not as political analysis but as a scientific theory, yet provides no base rates, no percentages, and no systematic framework for why he should be believed. While some observers say his predictions look surprisingly accurate, others argue that such forecasts rely on selective historical parallels and speculative reasoning. Making two correct binary calls (Trump wins, US-Iran tension) out of broad, widely-discussed scenarios is not the same as having a calibrated track record.
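A back-of-the-envelope illustration of why two correct binary calls prove little. The 50% base rates here are an assumption made for the sake of the example, since widely-discussed scenarios are, by construction, ones observers consider plausible either way:

```python
# If each widely-discussed binary scenario has roughly a coin-flip base rate,
# the chance of nailing both calls by pure luck is substantial.
p_two_lucky_hits = 0.5 * 0.5
print(p_two_lucky_hits)  # 0.25: one pundit in four guessing blindly matches this record

# Luck only fades with many resolved forecasts.
# Chance of going 8-for-8 on coin-flip questions by luck alone:
print(0.5 ** 8)  # 0.00390625
```

This is why a calibrated track record requires a large sample of resolved, probability-tagged forecasts, not a couple of memorable hits.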


The AI-as-Auditor Framework (Your Core Insight)

Your proposed method is sound, and versions of it are already being tried:

AI-based forecasting systems have already shown they can match or slightly exceed human superforecasters in accuracy on benchmark questions, which means the same AI machinery can be used inversely: to judge other humans’ past records.

The key challenge is prediction extraction — many thinkers (including Yin Wang and He Qianlian) write in analytical rather than predictive form (“X is happening because Y” vs. “Z will occur by 2025”). AI is now capable enough to disambiguate these, but it requires careful prompt engineering.
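A minimal sketch of that extraction step. Everything here is illustrative: the prompt wording, the JSON schema, and the canned reply are assumptions, and the real LLM call is stubbed out since no specific vendor API is implied by the text:

```python
import json

EXTRACTION_PROMPT = """You are auditing a writer's track record.
From the passage below, extract every falsifiable prediction.
Distinguish predictions ("Z will occur by 2025") from analysis ("X is happening because Y").
Return a JSON list of objects with keys:
  claim (string), date_written (string), resolves_by (string or null),
  implied_probability (number 0-1, or null if the author gave no odds).

Passage:
{passage}
"""

def build_prompt(passage: str) -> str:
    return EXTRACTION_PROMPT.format(passage=passage)

def parse_predictions(model_output: str) -> list:
    """Parse the model's JSON reply, keeping only dated, falsifiable claims."""
    items = json.loads(model_output)
    return [it for it in items
            if isinstance(it.get("claim"), str) and it.get("resolves_by")]

# Canned model reply, standing in for a real LLM call:
reply = json.dumps([
    {"claim": "Startup X will fold within two years", "date_written": "2019-03",
     "resolves_by": "2021-03", "implied_probability": 0.8},
    {"claim": "Centralization is increasing", "date_written": "2019-03",
     "resolves_by": None, "implied_probability": None},  # analysis, not a dated prediction
])
print(len(parse_predictions(reply)))  # 1: the undated analytical claim is filtered out
```

The filtering step matters because scoring analytical claims as if they were predictions would unfairly punish writers like Yin Wang and He Qinglian, whose value lies partly in explanation rather than forecasting.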


Why This Matters

The social media age has created an enormous false prophet problem — millions of accounts claim insight, get viral moments, and face zero accountability. Superforecasters anticipated events 400 days in advance as accurately as regular forecasters could see those events 150 days ahead — but most people have never heard of superforecasters, while loud pundits with no track record dominate the discourse.

Using AI to retroactively score public thinkers creates accountability infrastructure that didn’t previously exist. The thinkers you named — He Qinglian, Yin Wang, Paul Graham — all have large, public, timestamped corpora. That makes them unusually good candidates for this kind of analysis. In contrast, most social media commentators make vague, undated, or unfalsifiable claims, often precisely (if unconsciously) to avoid being held accountable.


