The Deskilling Cycle
When AI makes experts worse: is “deskilling” a strategic behavior, an information loss, or both?
Recently, these messy masterpieces were created in my kitchen:
Oops, wrong messy masterpieces. This one is mine:
Because this was not messy enough, I redrew it messier and quietly replaced it with this version in a few posts without comment:
This post explains what changed in the diagram, and why it shows how AI may cause unpredictable changes to the net effects of the interventions it’s meant to improve.
Decomposition Imposition
My kids obviously have innate artistic genius, but the rest of the world has to acquire skill through a combination of training and experience. Exercising skill can have social and political (strategic) as well as informational components, since the people exercising it are social and political animals. And there’s potentially useful variation right now in how experts in the real world are exercising their skill to make technology-mediated decisions, as artificial intelligence is being rolled out broadly to assist these decisions. You say heterogeneity, I say data.
Recall that we (I) need some kind of hack to begin decomposing causal pathways (classification, strategy, information, and resource reallocation) in mass screenings for low-prevalence problems. In security case studies, we can sometimes simulate the different pathways, albeit based on data of questionable internal and external validity. Polygraphs are a great example, because we have some data on deterrence (a facet of strategic behavior) and some data on the bogus pipeline effect (an information pathway phenomenon). I did that in this working paper, but (a) come on, these simulations are fictional; and (b) I am now wondering whether I should rewrite the paper completely around two to four of my favorite medical case studies, since they offer more plausible solutions to the validation problem and would demonstrate the argument much better.
In medical case studies, we sometimes have much better data in terms of internal and external validity, particularly on mammography and colonoscopy. It even looks like we have great randomized trial data in those particular cases, although in practice we still have to emulate target trials because people have minds of their own, and those target trial emulations (TTEs) still need to be replicated and modeled as draws from distributions. And there is another challenge here to tackle going forward: it is less obvious in this domain how to separate out the non-classification pathway effects from net effect estimates. But differential rollout of AI-assisted screening may give us leverage to begin doing that.
Why is this a hard problem?
As I’ve written previously, in the security and medical domains alike,
we have to worry about perverse incentives on both sides motivating “small fish” (individuals), “bigger fish” (an entire security-industrial complex and its adversaries, pictured here as octopi), and the “biggest fish” (societal-level predation, pictured here as circling sharks). As in all human endeavors, we have to worry about common cognitive biases including dichotomania (as in the dichotomization of the liberty-security sequential-feedback continua, and as represented by the rainbow smiley-face) and the base rate bias (star-eyed red smiley) in our expectations of what sorts of outcome spreads even highly accurate tests (Rudolf) produce.
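To make the base rate point concrete, here is a minimal sketch of the arithmetic using Bayes' rule. The prevalence, sensitivity, and specificity values are hypothetical, chosen only for illustration; they are not estimates for polygraphs, mammography, or colonoscopy.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Share of positive results that are true positives (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs: a test that is 90% sensitive and 90% specific,
# applied to populations where the problem is rare.
for prevalence in (0.001, 0.005, 0.05):
    ppv = positive_predictive_value(prevalence, sensitivity=0.90, specificity=0.90)
    print(f"prevalence {prevalence:.1%}: PPV = {ppv:.1%}")
```

With these made-up inputs, most positives are false positives at every prevalence shown, which is the outcome spread the base rate bias leads us to underestimate.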
Let’s say we can cobble together qualitative and quantitative data on the size of the incentives motivating these medical fish. Then how do we estimate and model the effects of those incentives on relevant clinician and patient behavior? I’m sure there is a range of defensible ways to actually do this. There’s just not one right way, it’s not established practice, and it’s going to include plenty of uncertainty and interpretation. Let’s call that strategy.
Same goes for information effects. Welch et al. and others have written about the psychosocial components of clinicians and patients alike wanting more and more screening tests. Getting a negative result feeds the cycle because you get relief; getting a positive result feeds the cycle because you think you got saved, even if, in all likelihood, you were a false positive.
Same goes for modeling resource reallocation effects. What if postpartum breast cancer (understudied and deadly) could be largely prevented through a strict regimen of low-dose ibuprofen, anti-inflammatory diet, exercise, sleep, and stress reduction, in which, when you have a baby, you get help taking care of yourself and your kid when you need it? (Presumably this help comes from a decentralized, state-subsidized fleet of highly trained Swedish nannies.) But we don’t know whether that’s true, even though it would be way more cost-effective (and fun) than mammography, because mammography has captured the funds that would be needed to research and administer such an intervention.
Yeah. What if. There are lots of what-ifs along the resource reallocation pathway, no one ever models it when they talk about net effects of these massive resource-hogging programs, and it’s not clear to me that there is a hugely credible way of doing it, precisely because it deals with what we don’t know. OK, we can work it out a bit. But the uncertainties will be large and the interpretive dimensions will be substantial.
Part of me just does not care how bad the argument is for generalizability (e.g., bogus pipeline lab to field contexts); I want some data to be able to simulate some effects to be able to add up the pathways and see what the net is. I want answer.
But no, I cannot have answer. We need to throw the ball again — to try to ask a more answerable empirical question. What would give us better traction? Where do we actually observe these pathways interacting in real life with good data we can access?
How does looking at AI rollout data make this picture clearer instead of hazier?
Deskilling. The term is descriptive, not causal. It tells you something happened to skill, but not how, not through which pathway, and not whether the same mechanism applies across domains. Before we can use deskilling as evidence in a causal model of screening systems, we need to categorize it logically. Is it a behavioral adaptation (strategy)? A knowledge loss (information)? A feedback between the two? The answer matters for how we model it, how we measure it, and how we intervene on it if we want to prevent it, so that AI-assisted screening improves net outcomes instead of degrading them.
Budzyń, Mori, Bretthauer et al. (2025) reported evidence of deskilling in AI-assisted colonoscopy. I read it as a cross-pathway feedback effect, wherein strategic behavioral adaptation (reduced vigilance) produces cumulative skill degradation (information loss), creating a reinforcing cycle. It’s a preliminary finding from Polish data from ACCEPT, a much larger, ongoing cross-country colonoscopy trial.
Specifically, the data are compatible with as little as a 1.6 percentage-point decrease in the adenoma detection rate (ADR), a standard measure of colonoscopist skill at detecting adenomas, or as much as a 10.5 percentage-point decrease. At either end, this is a clinically important effect: colonoscopists who get used to AI appear to be worse at their jobs when they then don’t have access to it. Each percentage point of ADR is associated with measurable downstream cancer risk.
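For a sense of scale, here is a small sketch that converts those two bounds into counts. It assumes the bounds are absolute (percentage-point) differences in ADR, and it uses a hypothetical cohort of 1,000 colonoscopies performed without AI; neither the cohort size nor the function name comes from the study.

```python
# Bounds quoted in the text, read here as absolute percentage-point drops in ADR.
# That reading is an assumption, and the cohort size below is hypothetical.
ADR_DROP_LOW_PP = 1.6
ADR_DROP_HIGH_PP = 10.5
COHORT = 1_000  # hypothetical number of colonoscopies done without AI


def fewer_detections(drop_pp: float, n_procedures: int) -> float:
    """Fewer procedures with at least one adenoma found, for a given ADR drop."""
    return drop_pp / 100.0 * n_procedures


low = fewer_detections(ADR_DROP_LOW_PP, COHORT)
high = fewer_detections(ADR_DROP_HIGH_PP, COHORT)
print(f"Per {COHORT} colonoscopies: roughly {low:.0f} to {high:.0f} fewer with an adenoma detected")
```

Because ADR counts procedures in which at least one adenoma is found, these are counts of patients, not of individual polyps.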
The implications of this finding warrant explanation: Colonoscopy is a preventive treatment and screening in one, since the typical scenario is “find polyp, remove polyp, done.” So, unlike in mammographic breast cancer detection, a decrease in ADR is also a decrease in treated cancers, not just a decrease in diagnosed cancers. In other words, it looks like there’s a signal here and it’s one we care about, but we can’t be sure of its magnitude or generalizability.
What’s going on here?
Hypothesis 1: Strategy (behavioral adaptation)
One hypothesis is that the expert reduces visual attention because AI is handling detection. We go on autopilot when we can, because we have more important things to do in our heads, like replaying conversations and writing poetry (right, guys? right??). Troya et al. (2022) showed gaze patterns change when AI is present — endoscopists literally look differently. That’s a behavioral response to a changed decision environment. Classic strategy.
Hypothesis 2: Information (skill loss)
A competing hypothesis, however, is that this is really an information effect. After sustained AI exposure, the endoscopist’s pattern recognition has atrophied. They haven’t just chosen to look less carefully — they’ve actually lost the ability to detect as well as they used to. The skill is gone, not just the effort. That’s knowledge loss.
The answer, I think, is that it’s both. That’s the point: deskilling is not a single pathway effect. It’s a feedback in a dynamic system.
The Cycle
Here’s what I think is happening:
Strategy → Information (moderating): The endoscopist’s behavioral adaptation (reduced vigilance when AI is present) moderates what information effects occur. If you stop practicing a skill, you lose it. The strategy of relying on AI determines the rate and magnitude of skill degradation.
Information → Strategy (feedback): Once skill has degraded, the endoscopist’s strategy changes further. If you know (or sense) that you’re worse at detecting polyps than you used to be, you rely more on AI when it’s available. The skill loss reinforces the behavioral dependence.
Information → Classification: Degraded skill directly affects classification accuracy. When AI is unavailable, the endoscopist’s detection rate is now below their pre-AI baseline. (Think of how well you played the piano when you practiced daily as a kid, versus how you sound now when you sit down on a lark after years of rarely playing…)
Classification → Outcome → Strategy: Observed detection rates (outcomes) feed back into institutional decisions about AI adoption. If decision-makers see only AI-on performance, they conclude AI is essential — reinforcing the deployment decisions that created the deskilling in the first place.
This is a cycle. It can’t be represented in a directed acyclic graph (DAG), because the whole point is that the effects loop. We need a directed cyclic graph (DCG), and to build out Besserve and Schölkopf’s (2022) equilibrium framework accordingly.
Thus, the updated directed cyclic graph above adds two arrows between Strategy and Information that were missing from the previous version, which treated Strategy and Information as independent pathways from Test to Outcome. Orange (moderating): Strategy → Information — how the endoscopist behaves moderates what skill effects accumulate over time. Red (feedback): Information → Strategy — accumulated skill change feeds back to modify behavioral strategy. Deskilling shows these pathways are coupled, and the coupling is what creates the reinforcing loop.
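The difference between the two versions of the diagram can be checked mechanically. The sketch below encodes each version as an adjacency list and tests for a cycle with a depth-first search. The node names and the edge list for the previous version are my reconstruction from the description above (four independent pathways from Test to Outcome), so treat them as assumptions; only the two Strategy and Information arrows are added in the updated version.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]

# Assumed reconstruction of the previous diagram: four independent pathways.
previous: Graph = {
    "Test": ["Classification", "Strategy", "Information", "Resources"],
    "Classification": ["Outcome"],
    "Strategy": ["Outcome"],
    "Information": ["Outcome"],
    "Resources": ["Outcome"],
    "Outcome": [],
}

# Updated diagram: the same edges plus the two new arrows.
updated: Graph = {node: list(children) for node, children in previous.items()}
updated["Strategy"].append("Information")  # orange, moderating
updated["Information"].append("Strategy")  # red, feedback


def has_cycle(graph: Graph) -> bool:
    """Depth-first search; reaching a node already on the current path means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for child in graph[node]:
            if color[child] == GRAY:
                return True
            if color[child] == WHITE and visit(child):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


print("previous diagram has a cycle:", has_cycle(previous))
print("updated diagram has a cycle:", has_cycle(updated))
```

The previous version passes as a DAG; the updated one does not, which is why standard DAG-based tools no longer apply directly.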
Why This Matters
The deskilling cycle is likely not unique to AI-assisted colonoscopy. It poses a potential risk wherever AI assists human judgment. This could affect a variety of domains:
Radiology: Radiologists using AI-assisted mammography may develop similar attention patterns. We don’t have the deskilling data yet from mammography, but the proposed mechanism is the same.
Aviation: Autopilot deskilling is well-documented. Pilots who rely on automated systems lose manual flying skills. Aviation has addressed this with mandatory manual flying requirements, which amount to intermittent AI deployment. It’s also customary in some niches to hack the AI-induced deskilling problem with more AI use: simulating manual flying under a range of more difficult conditions keeps or augments the human skill without burning fuel or taking the risks of live flight.
Security screening: TSA officers using automated threat detection may reduce visual vigilance — but here we usually can’t observe the deskilling because there’s no gold standard (the validation problem is unsolvable for most security applications). (Or can we? There have been some good studies involving contraband like guns in transportation hubs, so maybe I am being too hasty here.) Point is, we might be losing human skill by incorporating AI and never know it.
The Methodological Implication
If deskilling is a cross-pathway effect that operates through a Strategy → Information → Classification → Outcome → Strategy cycle, then any analysis that assigns it to a single pathway will underestimate its system-level consequences.
Standard RCT analysis treats AI as a classification intervention: does AI improve detection? Yes. But the deskilling finding shows that the same intervention also operates through the strategy and information pathways, creating feedback loops that the RCT wasn’t designed to detect — because RCTs measure short-term effects in AI-naive practitioners, not the equilibrium the system reaches after sustained exposure.
This is why the relevant professional guidance (the AGA’s Living Clinical Practice Guideline on Computer-Aided Detection-Assisted Colonoscopy, Sultan et al. 2025) couldn’t translate ADR improvement into a recommendation. The authors instead recognized that classification gains don’t automatically translate to outcome improvements. (My causal diagrams offer a logical framework explaining why.)
The deskilling cycle is one possible concrete mechanism for this translation failure: AI improves detection when present, but degrades the human component, making the system fragile and AI-dependent. The “human in the loop” gets crushed in the gears.
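Here is a toy simulation of that fragility, as a sketch only. It assumes a simple discrete-time model in which reliance on AI erodes unaided skill, vigilant practice rebuilds it, and lost skill raises reliance. Every parameter value, the functional forms, and the names (`simulate`, `Snapshot`) are hypothetical; nothing here is fitted to the colonoscopy data.

```python
from dataclasses import dataclass
from typing import List

# All parameter values are hypothetical, picked to make the loop visible.
BASELINE_SKILL = 0.60   # unaided per-lesion detection probability before AI
AI_SENSITIVITY = 0.70   # AI's own per-lesion detection probability
BASE_RELIANCE = 0.30    # attention handed to AI even at full skill
RELIANCE_GAIN = 1.00    # extra reliance per unit of relative skill loss
ADJUST_RATE = 0.20      # how fast reliance moves toward its target
REBUILD_RATE = 0.05     # skill regained through vigilant practice
DECAY_RATE = 0.01       # skill lost through delegated attention


@dataclass
class Snapshot:
    skill: float
    reliance: float
    ai_on: float   # detection with AI present (human and AI combined)
    ai_off: float  # detection if AI is withdrawn at this point


def simulate(periods: int = 300) -> List[Snapshot]:
    skill, reliance = BASELINE_SKILL, BASE_RELIANCE
    history: List[Snapshot] = []
    for _ in range(periods):
        # Classification: with AI on, the human only applies the attention they kept.
        human_with_ai = skill * (1.0 - reliance)
        ai_on = 1.0 - (1.0 - human_with_ai) * (1.0 - AI_SENSITIVITY)
        history.append(Snapshot(skill, reliance, ai_on, ai_off=skill))

        # Information -> Strategy: lost skill raises the reliance target.
        skill_gap = max(0.0, BASELINE_SKILL - skill) / BASELINE_SKILL
        target = min(1.0, BASE_RELIANCE + RELIANCE_GAIN * skill_gap)
        new_reliance = reliance + ADJUST_RATE * (target - reliance)

        # Strategy -> Information: reliance erodes skill, vigilance rebuilds it.
        rebuild = REBUILD_RATE * (1.0 - reliance) * (BASELINE_SKILL - skill)
        decay = DECAY_RATE * reliance * skill
        skill = min(BASELINE_SKILL, max(0.0, skill + rebuild - decay))
        reliance = new_reliance
    return history


history = simulate()
first, last = history[0], history[-1]
print(f"start: AI-on {first.ai_on:.2f}, AI-off {first.ai_off:.2f}")
print(f"end:   AI-on {last.ai_on:.2f}, AI-off {last.ai_off:.2f}, reliance {last.reliance:.2f}")
```

In this toy setup a short trial in AI-naive practitioners would observe only the first row, where AI-on detection beats the unaided baseline. The AI-off number at the end is the one that falls below baseline, and nobody sees it unless the AI is withdrawn.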
Open Questions
Can we just systematize AI access as periodic to maximize accuracy benefit and minimize deskilling cost? Is this an equilibrium-finding kind of question with a possible experimental answer we can test for generalizability across domains? Do we need to get enough data that we can run subgroup analyses by case difficulty level, following Ribers and Ullrich (2024) on how AI and humans can be better than each other at diagnosing different types of cases?
I don’t know. I drafted another apocryphal postdoc proposal on this topic last month and haven’t had time to look at it since. Ask me anything!






