Are You Getting Better, or Just Getting Used to It?
After a year of serious language practice, most learners notice something unsettling. Their overall success rate is climbing. They feel like they are improving. But when they encounter a native speaker in the wild — a podcast they haven't heard before, a conversation they weren't expecting — they understand far less than their practice stats suggest.
The stats were not lying. But they were measuring the wrong thing.
The Recognition Problem
Every spaced repetition system accumulates a large body of material you have seen before. After a year, the vast majority of your practice is review — sentences you have heard dozens of times, whose rhythm and vocabulary have become familiar through repetition alone. Your success rate on these sentences reflects memory, not comprehension. You would recognize your own name even if you had never studied the language.
This splits your performance into two very different quantities:

- Your ability to transcribe sentences you have encountered before — a mix of genuine skill and accumulated familiarity.
- Your ability to transcribe a sentence you have never heard in your life — cold, no context, no prior exposure.
These two things diverge sharply over time. A learner who has reviewed the same 500 sentences a thousand times each can achieve a 90% overall success rate while being nearly helpless with unfamiliar input. The metric flatters. The skill hasn't transferred.
The Cold Start
The metric that actually matters is your success rate on first attempts — sentences you have never seen before. We call this the cold-start success rate.
It measures something purer: given a sentence spoken by a native speaker, with no prior exposure, no accumulated familiarity, no memory advantage — can you hear it and reproduce it? That is listening comprehension. Everything else is partially memory.
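Computing the cold-start rate from an attempt log is straightforward: count only each sentence's first attempt. A minimal sketch, assuming a chronological log with a hypothetical `Attempt` record (the field names are illustrative, not from any particular app):

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    sentence_id: int
    correct: bool

def cold_start_rate(attempts: list[Attempt]) -> float:
    """Success rate over first attempts only.

    `attempts` must be in chronological order; every attempt after the
    first on a given sentence is ignored, because it is contaminated
    by familiarity.
    """
    seen: set[int] = set()
    first_total = 0
    first_correct = 0
    for a in attempts:
        if a.sentence_id not in seen:
            seen.add(a.sentence_id)
            first_total += 1
            first_correct += a.correct  # bool counts as 0 or 1
    return first_correct / first_total if first_total else 0.0
```

For example, a log of `[Attempt(1, False), Attempt(1, True), Attempt(2, True)]` has two first attempts, one correct, so the cold-start rate is 0.5 even though the overall success rate is 2/3.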
Most language learning apps have no equivalent metric. Anki, for instance, cannot easily compute it: new cards are capped at a small daily number by default, and learners facing a large review backlog commonly cut that number to zero. At scale, review dominates. New cards are rarely surfaced. The system has no meaningful data on first-attempt performance, because it almost never creates first attempts.
This is not an accident of implementation. It follows from the dogmatic scheduling philosophy described in the essay on spaced repetition: review always takes priority. New exposure is a second-class citizen. And so the one metric that would honestly reflect whether you are actually improving goes unmeasured.
What the Data Shows
With a fiat factor that regularly forces new, unseen sentences into every session — regardless of how large the review backlog grows — the cold-start rate accumulates enough data to be meaningful. Here is what four years of German practice produced:
| Streak | Success rate | What it reflects |
|---|---|---|
| First attempt (cold start) | ~45% | Raw listening skill |
| 1 prior success | 67% | Skill + one exposure |
| 2 prior successes | 75% | Skill + familiarity building |
| 3 prior successes | 85% | Mostly familiarity |
| 4+ successes (mastered) | 96–100% | Memory |
The 45% cold-start rate is not a failure. It is an honest number. It means that on a genuinely unfamiliar sentence, the right answer comes roughly half the time on the first attempt. That number climbs — slowly, over years — as the underlying listening skill improves. Watching it climb is one of the few reliable signals that real progress is happening.
An overall success rate of 85% sounds better. It is less honest. It includes thousands of sentences you could transcribe in your sleep.
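The streak buckets in the table above can be derived from the same attempt log: before scoring each attempt, look up how many consecutive prior successes that sentence has. A sketch under assumed data shapes (chronological `(sentence_id, correct)` pairs; the collapse of streaks of 4 and above into one "mastered" bucket mirrors the table):

```python
from collections import defaultdict

def success_by_streak(attempts):
    """Per-bucket success rate, where the bucket is the sentence's
    streak of consecutive successes *before* the current attempt."""
    streak = defaultdict(int)            # sentence_id -> current streak
    stats = defaultdict(lambda: [0, 0])  # bucket -> [correct, total]
    for sid, correct in attempts:
        bucket = min(streak[sid], 4)     # 4+ collapses into "mastered"
        stats[bucket][0] += correct
        stats[bucket][1] += 1
        streak[sid] = streak[sid] + 1 if correct else 0  # failure resets
    return {b: c / t for b, (c, t) in sorted(stats.items())}
```

Bucket 0 in the result is exactly the cold-start-plus-relearning rate: first attempts and attempts immediately after a failure, the two cases with no streak to lean on.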
Why This Gives Away Nothing
Publishing this metric description does not hand competitors an advantage. The insight is simple once stated: measure first attempts separately from review attempts. Any system could do this. Most won't, because doing it properly requires a fiat factor — a deliberate, algorithmic commitment to surfacing new material even when reviews are pending. That commitment contradicts the foundational assumption of dogmatic spaced repetition. Systems built on that assumption will not add a metric that implicitly indicts their scheduling philosophy.
The real moat is not the metric. It is four years of iteration, a validated corpus of 26,000 Romanian sentences with bundled audio and translations, and an offline desktop app that runs without a network connection. The insight is free. The infrastructure is not.
What to Watch Instead
If you are serious about measuring progress in a foreign language, track these three numbers separately and watch them over months, not days:
- **Cold-start rate** — your success on sentences you have never attempted. This is your raw skill. It should trend slowly upward over years. If it is not moving, you are not getting better at listening — you are getting better at reviewing.
- **Coverage** — what percentage of your sentence corpus you have attempted at least once. A learner with 10,000 sentences who has only encountered 2,000 of them has a very different profile from one who has touched 8,000. Coverage measures breadth of exposure.
- **Mastery rate** — the percentage of encountered sentences at streak 4 or above. This measures depth. Combined with coverage, it gives you an honest picture: wide but shallow, or narrow but deep.
Overall success rate — the number most apps show you most prominently — is the least informative of the four. Watch it last, if at all.
See your cold-start rate
Romanian shareware — free, offline, macOS. Session statistics built in.
Download Free