For one month beginning on October 5, I ran an experiment: Every day, I asked ChatGPT 5 (more precisely, its “Extended Thinking” version) to find an error in “Today’s featured article”. In 28 of these 31 featured articles (90%), ChatGPT identified what I considered a valid error, often several. I have so far corrected 35 such errors.
Calling this a “90% error rate” isn’t accurate. It doesn’t mean that 90% of the facts on Wikipedia are wrong; it means that 90% of the featured articles contained at least one error, so the articles were still mostly correct.
And featured articles are usually quite long. As an example, today’s featured article is about a type of crab: the article runs over 3,700 words, with 129 references and 30-something books in the bibliography.
It’s neither unreasonable nor surprising to be able to find a single error in articles that complex.