Citation hallucination: the silent killer in AI-assisted writing
The Reality of AI in Academic Research
LLMs are structural mimics: they reproduce the form of scientific writing — citations, journal names, statistical phrasing — without any guarantee the content behind it is real. This is most dangerous with references, where a model will produce a citation that looks perfectly legitimate and points to nothing. The same mechanism that makes AI writing fluent is the one that makes it fabricate. That is why I treat verification, not detection, as the only durable safeguard.
When I sit down to write, I don't want magic. I want reliability. I've tested dozens of tools, and most fail when subjected to the rigors of peer review. I use my own workflows to ensure that the data I present is accurate, verifiable, and free of hallucinations.
The Mechanics of Hallucination
Before submitting a manuscript I had drafted, I caught three hallucinated references in it. I was stunned. I had used an LLM to help me synthesize background literature, and it confidently presented three papers that sounded perfectly relevant. When I went to look them up, they didn't exist. The DOIs were fake, and the author combinations were plausible but incorrect. I realized that the model wasn't retrieving information; it was predicting the next most likely token. I knew I had to fix this.
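To make "I went to look them up" concrete, here is a minimal sketch of an automated existence check for a DOI. It assumes the public Crossref REST API at api.crossref.org, which answers with HTTP 404 when it has no record for a DOI. The names fetch_crossref_record and doi_exists are hypothetical, chosen for this illustration, and this is not how CiteCheck works internally. A DOI registered with another agency may be missing from Crossref, so treat a miss as a reason to check by hand, not as proof of fabrication.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Assumption: Crossref's public works endpoint; other registries need other lookups.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref_record(doi: str) -> Optional[dict]:
    """Return Crossref metadata for a DOI, or None if Crossref has no record."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    request = urllib.request.Request(
        url, headers={"User-Agent": "doi-check-sketch/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            # Crossref wraps the work metadata in a "message" field.
            return json.load(response)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # unknown to Crossref
        raise  # rate limits and outages should not be read as "fake"


def doi_exists(
    doi: str,
    fetch: Callable[[str], Optional[dict]] = fetch_crossref_record,
) -> bool:
    """True if the lookup returns a record for this DOI."""
    return fetch(doi) is not None
```

The lookup is passed in as a parameter so the same check can run against a local index or a stub in tests, without touching the network.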
Why It Happens
LLMs know what a citation is supposed to look like. They know that a paper on cardiology should probably have "Circulation" or "JACC" in the journal field, and they use this structural knowledge to generate fake references that look entirely legitimate. This is a serious danger to scientific integrity: I have seen fabricated references slip past peer review.
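A small illustration of why form alone proves nothing: the sketch below checks only whether a string is shaped like a DOI. The pattern is a common approximation of modern DOI syntax, not an official validator, and the DOI in the example is one I invented for this purpose. It passes the shape check without referring to any real paper.

```python
import re

# Approximate shape of a modern DOI: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(text: str) -> bool:
    """Check DOI syntax only; this says nothing about whether the DOI exists."""
    return DOI_PATTERN.match(text.strip()) is not None


made_up = "10.1234/invented.2023.001"  # invented for illustration, not a real paper
print(looks_like_doi(made_up))  # True: well-formed, yet it refers to nothing
```

A format check like this is worth keeping only as a cheap first filter for typos. It cannot stand in for looking the reference up.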
How I Combat It
I use CiteCheck to verify every single reference against a database of 240 million real papers. I open-sourced this tool because I believe the community needs it. I run pip install citecheck on every new environment I set up. You can find it at pypi.org/project/citecheck. I also pass all my drafts through the AVR platform at aiforacademic.world/app to ensure that the references actually support the claims being made.
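Existence is only half of verification, because a real DOI can still be attached to the wrong title or authors. The sketch below shows the general idea of comparing a claimed reference with the record a trusted database returns. It is not CiteCheck's API or implementation: Reference, mismatches, and the 0.9 title threshold are assumptions made for this example, and the matching uses Python's standard difflib.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Reference:
    # Hypothetical minimal record; real metadata has many more fields.
    title: str
    author_surnames: list[str]
    year: int


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 for two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def mismatches(
    claimed: Reference, record: Reference, title_threshold: float = 0.9
) -> list[str]:
    """List the fields where the claimed reference disagrees with the record."""
    problems = []
    if title_similarity(claimed.title, record.title) < title_threshold:
        problems.append("title")
    claimed_authors = {normalize(s) for s in claimed.author_surnames}
    record_authors = {normalize(s) for s in record.author_surnames}
    if claimed_authors != record_authors:
        problems.append("authors")  # catches plausible but wrong combinations
    if claimed.year != record.year:
        problems.append("year")
    return problems
```

An empty list means the claimed title, authors, and year agree with the record. Anything else names the fields to inspect by hand. The threshold is a judgment call: set it too low and near-miss fabrications pass, too high and harmless punctuation differences get flagged.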
The Broader Impact
I talk more about the overall stack in my post /blog/ai-research-stack-5-tools-that-save-time. I also structured my entire approach into the CIVER framework, which I detail in /blog/civer-4-tier-research-integrity-framework. I believe we must shift from trying to detect AI to proving our own integrity. I don't care if you used AI to write a sentence; I care if the sentence is true and the citation is real.