A review of regulator-approved medical AI models in eyecare has found that they vary widely in the evidence provided for clinical performance and lack transparency about training data, including details of gender, age and ethnicity.
The analysis, led by researchers at Moorfields Eye Hospital and the UCL Institute of Ophthalmology (IoO), examined 36 regulator-approved artificial intelligence as a medical device (AIaMD) tools in Europe, Australia and the US, and found “concerning trends”.
Of the devices reviewed, 19 per cent had no published peer-reviewed data on accuracy or outcomes. In evaluating the available evidence for the remainder, the researchers found that across 131 clinical evaluations, only 52 per cent of studies reported patient age, 51 per cent reported sex, and only 21 per cent reported ethnicity.
The review also highlights that most validation relied on archival image sets, which often lacked diversity, reported basic demographic characteristics inadequately, and were unevenly distributed geographically.
Very few studies compared the AI tools head-to-head with each other (eight per cent) or against the standard of care delivered by human doctors (22 per cent). Only 11 of the 131 studies (eight per cent) were interventional, the kind that test devices in real-life clinical settings and influence clinical care. This means real-world validation is still scarce, the authors claim.
More than two-thirds of the AI tools target diabetic retinopathy in a screening context, either singly or together with glaucoma and macular degeneration, while other common sight-threatening conditions and settings remain largely unaddressed, the authors reported.
Almost all the devices examined (97 per cent) are approved in the European Union, but only 22 per cent have Australian clearance and just eight per cent are authorised in the US. “This uneven regulatory landscape means devices cleared on one continent may not meet standards elsewhere”, the authors claim.
The authors are calling for these “shortcomings” to be addressed and for “rigorous, transparent evidence and data that meets the FAIR principles of Findability, Accessibility, Interoperability, and Reusability, since lack of transparency can hide biases”.
Lead author Dr Ariel Ong commented: “AI has the potential to help fill the global gap in eyecare. In many parts of the world, there simply aren’t enough eye specialists, leading to delayed diagnoses and preventable vision loss. AI screening could help identify disease earlier and support clinical management, but only if the AI is built on solid foundations.
“We must hold AI tools to the same high standards of evidence as any medical test or drug. Facilitating greater transparency from manufacturers, validation across diverse populations, and high-quality interventional studies with implementation-focused outcomes are key steps towards building user confidence and supporting clinical integration.”
Senior author Jeffry Hogg said: “Our review found that the evidence available to evaluate the effectiveness of individual AIaMDs is extremely variable, with limited data on how these devices work in the real world. Greater emphasis should be placed on accurate and transparent reporting of datasets. This is critical to ensuring devices work equally well for all people, as some populations may be underrepresented in the training data.”
The authors are encouraging manufacturers and regulators to adopt standardised reporting – for example, publishing detailed ‘model cards’ or trial results at each stage of development. They noted that regulatory frameworks for AIaMDs may benefit from a more standardised approach to evidence reporting, which would give clarity to both device developers and end users.
The review also highlights new guidance, such as the EU AI Act, which could “raise the bar for data diversity and real-world trials”.