A Multicenter, Scan-Rescan, Human and Machine Learning CMR Study to Test Generalizability and Precision in Imaging Biomarker Analysis

Anish N. Bhuva, Wenjia Bai, Clement Lau, Rhodri H. Davies, Yang Ye, Heeraj Bulluck, Elisa McAlindon, Veronica Culotta, Peter P. Swoboda, Gabriella Captur, Thomas A. Treibel, Joao B. Augusto, Kristopher D. Knott, Andreas Seraphim, Graham D. Cole, Steffen E. Petersen, Nicola C. Edwards, John P. Greenwood, Chiara Bucciarelli-Ducci, Alun D. Hughes, Daniel Rueckert, James C. Moon, Charlotte H Manisty


Automated analysis of cardiac structure and function using machine learning (ML) has great potential, but is currently hindered by poor generalizability. Comparison is traditionally against clinicians as a reference, ignoring inherent human inter- and intraobserver error, and ensuring that ML cannot demonstrate superiority. Measuring precision (scan:rescan reproducibility) addresses this. We compared precision of ML and humans using a multicenter, multi-disease, scan:rescan cardiovascular magnetic resonance data set.


One hundred ten patients (5 disease categories, 5 institutions, 2 scanner manufacturers, and 2 field strengths) underwent scan:rescan cardiovascular magnetic resonance (96% within one week). After identification of the most precise human technique, left ventricular chamber volumes, mass, and ejection fraction were measured by an expert, a trained junior clinician, and a fully automated convolutional neural network trained on 599 independent multicenter disease cases. Scan:rescan coefficient of variation and 1000 bootstrapped 95% CIs were calculated and compared using mixed linear effects models.
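The scan:rescan statistic described above can be sketched as follows. This is a minimal illustration, not the study's actual analysis code: it assumes the coefficient of variation is the within-subject standard deviation of paired measurements divided by the overall mean, and that the 95% CI comes from a percentile bootstrap over patients. The function names and the synthetic data are hypothetical, and the mixed-effects comparison between observers is not shown.

```python
import numpy as np


def scan_rescan_cov(scan, rescan):
    """Within-subject coefficient of variation (%) for paired scan/rescan values.

    Assumed definition: within-subject SD = sqrt(mean(d^2) / 2), where d is the
    scan-minus-rescan difference, divided by the mean of all measurements.
    """
    scan = np.asarray(scan, dtype=float)
    rescan = np.asarray(rescan, dtype=float)
    diff = scan - rescan
    within_sd = np.sqrt(np.mean(diff ** 2) / 2.0)
    grand_mean = np.mean((scan + rescan) / 2.0)
    return 100.0 * within_sd / grand_mean


def bootstrap_ci(scan, rescan, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the CoV, resampling patients with replacement."""
    scan = np.asarray(scan, dtype=float)
    rescan = np.asarray(rescan, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(scan)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # keep each patient's pair together
        stats[b] = scan_rescan_cov(scan[idx], rescan[idx])
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper


if __name__ == "__main__":
    # Synthetic ejection fractions for 110 hypothetical patients (not study data).
    rng = np.random.default_rng(42)
    true_ef = rng.normal(55.0, 10.0, size=110)
    scan = true_ef + rng.normal(0.0, 3.0, size=110)
    rescan = true_ef + rng.normal(0.0, 3.0, size=110)
    cov = scan_rescan_cov(scan, rescan)
    lower, upper = bootstrap_ci(scan, rescan)
    print(f"CoV = {cov:.1f}% (95% CI {lower:.1f}%-{upper:.1f}%)")
```

Resampling whole patients, rather than individual scans, keeps each scan paired with its rescan, so every bootstrap replicate is a valid paired data set.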


Clinicians can be confident in detecting a 9% change in left ventricular ejection fraction, with more than half of the coefficient of variation attributable to intraobserver variation. Expert, trained junior, and automated scan:rescan precision were similar (for left ventricular ejection fraction, coefficient of variation 6.1 [5.2%–7.1%], P=0.2581; 8.3 [5.6%–10.3%], P=0.3653; 8.8 [6.1%–11.1%], P=0.8620). Automated analysis was 186× faster than humans (0.07 versus 13 minutes).


Automated ML analysis is faster than, and similarly precise to, the most precise human techniques, even when challenged with real-world scan:rescan data. Assessment of multicenter, multi-vendor, multi-field strength scan:rescan data, which the authors make available, permits a generalizable assessment of ML precision and may facilitate direct translation of ML to clinical practice.
