Something Big Is Happening — AI Perception Gap

Perception Gap

Same Era, Completely Different Realities

Most people use the free version of AI. Matt Shumer describes this as "judging the smartphone era with a flip phone." Free AI is more than a year behind the latest paid models.

TECH INDUSTRY

Latest paid model users

"When you ask AI to build an entire app, it tests itself, iterates, and completes it. It shows not just coding ability, but judgment."

GENERAL PUBLIC

Free version users

"It gives useful answers sometimes, but gets a lot wrong too. Isn't it just a chatbot that can't handle complex tasks?"

AI Capability Awareness by Group

How accurately each group understands AI's current level

AI Researchers: 92%

Tech Workers: 72%

Tech-Adjacent: 35%

General Public: 12%

Exponential Growth

AI progress follows an exponential curve — and it's accelerating

According to METR benchmarks, the length of task AI can complete autonomously doubles every ~6.5 months (197 days) overall, and has accelerated to every ~89 days (~3 months) since 2024. GPT-5.3-Codex and Claude Opus 4.6, both released on Feb 5, 2026, suggest this acceleration is continuing.
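A doubling time can be turned into a yearly growth factor with growth = 2^(elapsed / doubling time). The following is a minimal sketch, assuming clean exponential growth at the two doubling times quoted above; the function name `growth_factor` is illustrative and not part of any METR tooling.

```python
def growth_factor(elapsed_days: float, doubling_days: float) -> float:
    """Multiplier after elapsed_days if capability doubles every doubling_days."""
    return 2 ** (elapsed_days / doubling_days)


# Doubling times taken from the text above; 365 days approximates one year.
overall_per_year = growth_factor(365, 197)
recent_per_year = growth_factor(365, 89)

print(f"197-day doubling: about {overall_per_year:.1f}x per year")
print(f"89-day doubling: about {recent_per_year:.1f}x per year")
```

Under these assumptions the overall trend works out to roughly 3.6x per year, while the post-2024 trend works out to roughly 17x per year.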

197 days: capability doubling time (overall average)

89 days: accelerated doubling time since 2024

6.6 hrs: METR official best (GPT-5.2)

~113x: capability increase vs GPT-4 over 2.7 years
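As a rough consistency check, the ~113x figure can be reproduced from the time horizons given in the timeline (~3.5 minutes for GPT-4, 394 minutes for GPT-5.2), and the same two points imply an average doubling time over that span. This is a sketch under the assumption of steady exponential growth between the two models; `implied_doubling_days` is a hypothetical helper name.

```python
import math


def implied_doubling_days(ratio: float, elapsed_days: float) -> float:
    """Doubling time that yields `ratio` growth over `elapsed_days`."""
    return elapsed_days * math.log(2) / math.log(ratio)


gpt4_minutes = 3.5         # GPT-4 time horizon from the timeline
gpt52_minutes = 394        # GPT-5.2 time horizon from the timeline
elapsed_days = 2.7 * 365   # "2.7 years" from the stat above

ratio = gpt52_minutes / gpt4_minutes
doubling = implied_doubling_days(ratio, elapsed_days)

print(f"Capability ratio: about {ratio:.0f}x")
print(f"Implied average doubling time: about {doubling:.0f} days")
```

The implied average comes out to roughly 145 days, which sits between the 197-day overall trend and the 89-day post-2024 trend.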

METR Benchmark: Autonomous Task Duration by Model

Task duration AI can complete with 50% probability without human expert help (50% time horizon)

Source: METR Time Horizon 1.1 (updated 2026.01.29). GPT-5.3-Codex and Opus 4.6 were released after this update, and official METR evaluations have not yet been published. Applying the 89-day doubling trend, the latest models' time horizons are estimated at 10+ hours.
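The 10+ hour estimate can be reproduced by extrapolating from GPT-5.2's measured 394-minute time horizon. The sketch below is an illustration, not METR's method: it assumes the 89-day doubling trend holds, and it assumes about 56 days elapsed between the GPT-5.2 release in December 2025 and Feb 5, 2026 (the source gives only the month, so `ELAPSED_DAYS` is an assumption).

```python
def extrapolate_minutes(base_minutes: float, elapsed_days: float,
                        doubling_days: float) -> float:
    """Project a time horizon forward, assuming steady exponential growth."""
    return base_minutes * 2 ** (elapsed_days / doubling_days)


BASE_MINUTES = 394    # GPT-5.2 time horizon reported by METR
DOUBLING_DAYS = 89    # post-2024 doubling trend
ELAPSED_DAYS = 56     # ASSUMPTION: approximate gap to Feb 5, 2026

projected_hours = extrapolate_minutes(BASE_MINUTES, ELAPSED_DAYS, DOUBLING_DAYS) / 60
print(f"Projected time horizon: about {projected_hours:.1f} hours")
```

With these inputs the projection lands just above 10 hours. A shorter or longer assumed gap shifts the result accordingly.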

2026. 2. 5

Two models launched on the same day

OpenAI's GPT-5.3-Codex and Anthropic's Claude Opus 4.6 were released on the same day. GPT-5.3-Codex was announced as "the first model to contribute to building itself," while Opus 4.6 nearly doubled its predecessor's score on ARC-AGI-2. Official METR evaluations haven't been published yet, but benchmark improvements show the curve steepening.

Benchmark	GPT-5.2	GPT-5.3-Codex (new)	Opus 4.5	Opus 4.6 (new)
SWE-bench Verified	80.0%	—	80.9%	80.8%
SWE-Bench Pro	56.4%	56.8%	—	—
Terminal-Bench 2.0	64.0%	77.3%	59.8%	65.4%
OSWorld	38.2%	64.7%	66.3%	72.7%
ARC-AGI-2	—	—	37.6%	68.8%
GPQA Diamond	—	—	87.0%	91.3%
HLE (with tools)	—	—	43.4%	53.1%
GDPval-AA Elo	~1462	—	1416	1606
BrowseComp	—	—	67.8%	84.0%

Source: OpenAI, Anthropic official announcements (2026.02.05). GPT-5.3-Codex: 25% faster + 400K context. Opus 4.6: 1M context + adaptive thinking.

Feb 5 Models vs Previous Gen — Key Benchmark Comparison

How much they improved over the previous generation on the same benchmarks
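The generation-over-generation changes can be computed directly from the table above. This is a small sketch that uses only rows where both the previous and the new model have a published score; the dictionary layout and the `point_gains` helper are illustrative.

```python
# (previous generation, new generation) scores in percent, copied from the table.
gpt_scores = {
    "SWE-Bench Pro": (56.4, 56.8),
    "Terminal-Bench 2.0": (64.0, 77.3),
    "OSWorld": (38.2, 64.7),
}
opus_scores = {
    "SWE-bench Verified": (80.9, 80.8),
    "Terminal-Bench 2.0": (59.8, 65.4),
    "OSWorld": (66.3, 72.7),
    "ARC-AGI-2": (37.6, 68.8),
    "GPQA Diamond": (87.0, 91.3),
    "HLE (with tools)": (43.4, 53.1),
    "BrowseComp": (67.8, 84.0),
}


def point_gains(scores: dict) -> dict:
    """Change in percentage points from the previous to the new generation."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


gpt_gains = point_gains(gpt_scores)
opus_gains = point_gains(opus_scores)

for label, gains in (("GPT-5.2 to GPT-5.3-Codex", gpt_gains),
                     ("Opus 4.5 to Opus 4.6", opus_gains)):
    print(label)
    for name, gain in gains.items():
        print(f"  {name}: {gain:+.1f} points")
```

The gains are uneven: ARC-AGI-2 and OSWorld move by tens of points, while SWE-bench Verified is essentially flat for Opus.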

Exponential Growth in AI Autonomous Task Duration

50% time horizon change by model release date (log scale). The curve has steepened since 2024.

Source: METR Time Horizon 1.1 / Epoch AI. Gray dashed line: overall trend (197-day doubling). Red dashed line: post-2024 acceleration (89-day doubling).

Timeline

What Happened in 4 Years

From AI that couldn't do basic arithmetic to AI that autonomously completes complex, expert-level tasks.

2022

ChatGPT launches. The world is amazed, but it frequently gets basic arithmetic wrong. Autonomous task time: unmeasurable.

2023. 3

GPT-4 launches. Passes the US bar exam. METR autonomous task time: ~3.5 minutes.

2024. 10

Claude 3.5 Sonnet. Can write complete software. Autonomous task time grows to ~20 minutes.

H1 2025

Claude 3.7 Sonnet: ~1 hour. o3: ~1.6 hours. Top engineers begin delegating most coding to AI.

H2 2025

Gemini 3 Pro: ~4 hours. Claude Opus 4.5: ~5.3 hours. MathArena Apex: 1% to 23%, a more than 20x leap.

2025. 12

GPT-5.2 — ~6.6 hours (394 min) on METR. All-time record. "Everything before feels like a different era."

2026. 2. 5 — Present

GPT-5.3-Codex and Claude Opus 4.6 launch on the same day. GPT-5.3-Codex gains 13.3 percentage points on Terminal-Bench 2.0 and is announced as "the first model to contribute to building itself." Opus 4.6 leaps from 37.6% to 68.8% on ARC-AGI-2 in one generation and offers a 1M-token context. The gains are consistent with the 89-day doubling trend, though official METR evaluations are still pending.

The Dangerous Gap

Why the Growing Gap Is Dangerous

Actual AI capability is rising exponentially, but public perception barely moves. Society faces shocks it's not prepared for.

Actual AI Capability vs. Public Perception

The wider the red area, the greater the societal shock

WARNING

"Within the next 1–5 years, 50% of entry-level office jobs could be eliminated. We are only 1–2 years away from AI being able to build models that are fundamentally far superior to the current generation."

— Dario Amodei, CEO of Anthropic, 2025

Impact

Jobs Already Under Threat


Customer Service

95%

With complex multi-step problem solving now possible, AI is replacing not just simple inquiries but expert consulting roles.

Software Development

90%

AI auto-generates hundreds of thousands of lines of code. Even top engineers delegate most coding to AI. The developer's role is shifting from "writing code" to "directing AI."

Legal / Law

85%

AI performs contract review, case analysis, and brief drafting at or above human lawyer level. Law firms are already reducing junior hires.

Finance / Analysis

80%

Financial modeling, data analysis, and report writing are being automated. AI generates investment analysis reports in minutes.

Content / Marketing

75%

Ad copy, blog posts, social media content, and translation — writing-based work is being automated at scale.

Medical / Diagnostics

70%

AI has achieved specialist-level accuracy in image reading and initial diagnosis. Radiology and pathology are especially impacted.

What You Can Do

What You Can Do Now

Recognizing this gap already puts you one step ahead.

Try the latest AI yourself

A $20/month subscription can close a 1+ year technology gap.

Apply it to real work

Integrate AI into daily tasks like writing, analysis, and coding.

Invest 1 hour every day

Spend one hour a day experimenting with AI tools. In a month, you'll be a completely different person.

Build adaptability

The ability to learn itself — not any specific skill — becomes your most valuable asset.

Spread the word

The fastest way to close the gap is to share this information.

The gap between the AI you see and the real AI is growing wider
