A Quiz in a Lab Coat

I took a pre-test on a learning platform and scored 100%. The platform was supposed to give me study recommendations based on my knowledge and experience, but according to the assessment there were “no observable skill gaps.” I am an expert, at least according to this pre-test.
There were 25 questions drawn from a validated pool of over 2,000. The assessment was timed and used what the platform calls an adaptive testing engine, all backed by industry benchmarking analytics.
The issue is that I’m not an expert in the domain, however much my ego would love that to be true. The truth is, the assessment is garbage.
Let’s explore why.
25 questions cannot reliably measure expertise across a domain that normally requires ~80–120 items
In psychometrics, reliability is a function of:
- domain breadth
- item discrimination
- item difficulty spread
- number of items
25 items is:
- barely enough for entry-level skill classification,
- nowhere near enough to make a claim like “no observable skill gaps” across an entire domain.
Even NCLEX, FAA tests, and CompTIA exams use 70–150 items with enormous research backing.
A validated pool of 2,000 questions doesn’t matter if you’re only sampling 1% of it.
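For a rough sense of how much reliability you give up by shrinking a test, the Spearman–Brown prophecy formula is the standard back-of-the-envelope tool. The numbers below are hypothetical (assume a well-built 100-item exam with reliability 0.90); the point is the direction and size of the drop, not the exact figure.

```python
def spearman_brown(rho_full: float, length_ratio: float) -> float:
    """Predicted reliability when a test is shortened (or lengthened) by length_ratio."""
    return (length_ratio * rho_full) / (1 + (length_ratio - 1) * rho_full)

# Hypothetical numbers: a well-built 100-item exam with reliability 0.90,
# cut down to a 25-item pre-test (length ratio = 0.25).
full_reliability = 0.90
short_reliability = spearman_brown(full_reliability, 25 / 100)
print(f"Predicted reliability of the 25-item version: {short_reliability:.2f}")
# ~0.69, well below the 0.90+ usually expected when making high-stakes claims.
```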
“Adaptive testing engine” is usually marketing language unless it’s built on true IRT
Legitimate adaptive testing uses:
- 2PL or 3PL Item Response Theory
- item difficulty calibration
- discrimination parameters
- exposure control
- Bayesian ability estimation
Most commercial “adaptive tests” instead just do:
- If correct → harder question; if incorrect → easier question.
These types of tests are not adaptive measurements; they're just branching quizzes.
Branching ≠ CAT (Computer Adaptive Testing).
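To make the contrast concrete, here is a minimal sketch of the 2PL machinery a genuine CAT is built on: every item carries a calibrated difficulty and discrimination, and the next item is chosen to maximize Fisher information at the current ability estimate, not just “harder” or “easier.” The item bank and ability estimate below are invented for illustration.

```python
import numpy as np

def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(correct | ability theta, discrimination a, difficulty b)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information the item provides about theta under the 2PL model."""
    p = p_correct_2pl(theta, a, b)
    return a**2 * p * (1 - p)

# Hypothetical calibrated item bank: (discrimination a, difficulty b)
bank = [(0.8, -1.5), (1.2, -0.5), (1.5, 0.0), (1.1, 0.7), (1.6, 1.8), (1.3, 2.5)]

theta_hat = 0.4  # current ability estimate

# A real CAT picks the unused item that is most informative at theta_hat...
best = max(bank, key=lambda ab: item_information(theta_hat, *ab))
print("Max-information pick (a, b):", best)

# ...whereas a branching quiz just steps one difficulty level up or down,
# regardless of how much the item actually tells you about the examinee.
```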
If you end with 100%, the algorithm failed to probe your upper limit
A proper CAT exam will:
- push you until you hit your ceiling,
- present items until your ability estimate stabilizes,
- continue until the measurement error drops below a threshold.
If you hit 100%, that means one of two things:
- The test had no upper-difficulty items in the domain, or
- The engine reached its stopping rule too early (a common problem with poorly implemented CATs).
Either way: It didn’t actually measure your ability. It just ran out of questions.
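Here is a hedged sketch of what that stopping rule looks like, and why an all-correct run signals the engine gave up rather than found a ceiling. The item pool, prior, and threshold are invented, and the ability update is a simple Bayesian grid (EAP) estimate rather than whatever a particular vendor ships; the behavior is the point.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item pool (a, b); note there is nothing harder than b = 1.0.
pool = [(1.2, -1.0), (1.0, -0.5), (1.3, 0.0), (1.1, 0.5), (1.4, 1.0)]

grid = np.linspace(-4, 4, 161)      # ability grid for a simple Bayesian (EAP) update
posterior = np.exp(-grid**2 / 2)    # standard-normal prior
posterior /= posterior.sum()

SE_TARGET = 0.30                    # stop once the ability estimate is precise enough
administered = 0

for a, b in pool:                   # a real CAT would pick each item by max information
    response = 1                    # the examinee answers everything correctly
    like = p_correct(grid, a, b) if response else 1 - p_correct(grid, a, b)
    posterior *= like
    posterior /= posterior.sum()
    administered += 1

    theta_hat = float(np.sum(grid * posterior))
    se = float(np.sqrt(np.sum((grid - theta_hat)**2 * posterior)))
    if se < SE_TARGET:
        break

print(f"items given: {administered}, theta estimate: {theta_hat:.2f}, SE: {se:.2f}")
# With only easy and medium items and all-correct responses, the SE never reaches
# the target; the loop simply exhausts the pool. A perfect score here means the
# engine ran out of questions, not that it found a ceiling.
```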
No “observable skill gaps” is a nonsense conclusion from such a small sample
This is like:
- giving someone 25 random math problems,
- which all happen to be easy,
- and concluding: “You have mastered all of mathematics.”
A high score on a narrow sample ≠ mastery of the whole domain.
In psychometrics, this is called domain underrepresentation, which is the single most common assessment error.
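A quick way to quantify this: even a perfect 25/25 score leaves a wide confidence interval on the proportion of the domain actually mastered. The sketch below uses a standard Wilson score interval and assumes, generously, that the 25 items were a random sample of the domain.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

low, high = wilson_interval(25, 25)
print(f"95% CI for 'proportion of domain mastered': {low:.2f} to {high:.2f}")
# Roughly 0.87 to 1.00: a perfect 25-item score is still consistent with gaps
# across more than a tenth of the domain, before even accounting for item difficulty.
```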
The Real Issue
This platform is offering:
- a diagnostic test
- personalized learning recommendations
- confidence-based claims about your “expert” level
But mathematically: A 25-item sample cannot produce that level of diagnostic precision.
The recommendation engine has no data to work with, so it returns the only thing it can:
“No observable skill gaps.”
This is not an insight; it’s a failure state.
What Does a Real Adaptive Diagnostic Look Like?
A legitimate assessment would:
- map skills to a competency model
- sample multiple items per skill
- adapt at the skill-cluster level
- continue until confidence intervals shrink
- estimate your ability, not your score
- return granular skill-gap probabilities (see the sketch below)
And would require at least:
- 60–120 items for broad domains
- 30–50 items for narrow domains
- or continuous sampling until uncertainty is low
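As a minimal sketch of what “granular skill-gap probabilities” could mean in practice (not how any particular platform does it): give each skill cluster its own items, keep a Beta posterior over per-skill proficiency, and report the probability that proficiency falls below a mastery threshold. The skills, counts, and threshold below are invented, and a production system would use IRT-calibrated items rather than raw counts.

```python
from scipy.stats import beta

MASTERY_THRESHOLD = 0.75   # "skill gap" = P(true proficiency < 0.75)

# Hypothetical per-skill results: (items administered, items correct)
skills = {
    "skill_area_a": (8, 7),
    "skill_area_b": (6, 3),
    "skill_area_c": (5, 5),
}

for skill, (n, k) in skills.items():
    # Beta(1 + correct, 1 + incorrect) posterior from a uniform prior
    gap_probability = beta.cdf(MASTERY_THRESHOLD, 1 + k, 1 + (n - k))
    print(f"{skill:15s} P(skill gap) = {gap_probability:.2f}")

# Even 5/5 on a skill leaves a meaningful gap probability (about 0.18 here),
# which is exactly the uncertainty a 25-item, whole-domain "no gaps" verdict hides.
```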
Anything else is a quiz wearing a lab coat.