Publication
Mar. 11, 2025
Title
Value of Using a Generative AI Model in Chest Radiography Reporting: A Reader Study
Author
Eun Kyoung Hong1,2, Byungseok Roh3, Beomhee Park3, Jae-Bock Jo4, Woong Bae4, Jai Soung Park5, Dong-Wook Sung6
1Department of Radiology, Mass General Brigham, Boston, Mass
2Department of Radiology, Brigham & Women’s Hospital, 75 Francis St, Boston, MA 02115
3Kakaocorp, Seoul, South Korea
4Soombit.ai, Seoul, South Korea
5Department of Radiology, Soonchunhyang University College of Medicine, Cheonan, South Korea
6Department of Radiology, Kyung Hee University School of Medicine, Seoul, South Korea
Published
Radiology, 2025
https://doi.org/10.1148/radiol.241646
Abstract
Use of a multimodal generative artificial intelligence model increased the efficiency and quality of chest radiograph interpretations by reducing reading times and increasing report accuracy and agreement.
Background
Multimodal generative artificial intelligence (AI) technologies can produce preliminary radiology reports, and validation with reader studies is crucial for understanding the clinical value of these technologies.
Purpose
To assess the clinical value of the use of a domain-specific multimodal generative AI tool for chest radiograph interpretation by means of a reader study.
Materials and Methods
A retrospective, sequential, multireader, multicase reader study was conducted using 758 chest radiographs from a publicly available dataset from 2009 to 2017. Five radiologists interpreted the chest radiographs in two sessions: without AI-generated reports and with AI-generated reports as preliminary reports. Reading times, reporting agreement (RADPEER), and quality scores (five-point scale) were evaluated by two experienced thoracic radiologists and compared between the first and second sessions from October to December 2023. Reading times, report agreement, and quality scores were analyzed using a generalized linear mixed model. Additionally, a subset of 258 chest radiographs was used to assess the factual correctness of the reports, and sensitivities and specificities were compared between the reports from the first and second sessions with use of the McNemar test.
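The paired sensitivity comparison described above relies on the McNemar test, which looks only at the discordant pairs, cases where the two reading sessions disagree. A minimal sketch of the exact form of that test is below; the discordant counts used are hypothetical illustrations, not data from the study.

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from discordant-pair counts.

    b = cases read correctly only in session 1 (without AI),
    c = cases read correctly only in session 2 (with AI).
    Under the null hypothesis, the b discordant outcomes among the
    b + c discordant pairs follow a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    k = min(b, c)
    # Double the one-sided binomial tail probability, capped at 1.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(p, 1.0)

# Hypothetical example (not the study's data): for one finding,
# 5 cases were correct only without AI and 25 only with AI.
p_value = mcnemar_exact(5, 25)
print(f"McNemar exact P = {p_value:.4f}")
```

With counts this lopsided toward the AI-assisted session, the test yields P < .001, mirroring the direction of the sensitivity gains reported in the Results.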
Results
The introduction of AI-generated reports significantly reduced average reading times from 34.2 seconds ± 20.4 to 19.8 seconds ± 12.5 (P < .001). Report agreement scores shifted from a median of 5.0 (IQR, 4.0–5.0) without AI reports to 5.0 (IQR, 4.5–5.0) with AI reports (P < .001). Report quality scores changed from 4.5 (IQR, 4.0–5.0) without AI reports to 4.5 (IQR, 4.5–5.0) with AI reports (P < .001). From the subset analysis of factual correctness, the sensitivity for detecting various abnormalities increased significantly, including widened mediastinal silhouettes (84.3% to 90.8%; P < .001) and pleural lesions (77.7% to 87.4%; P < .001). While the overall diagnostic performance improved, variability among individual radiologists was noted.
Conclusion
The use of a domain-specific multimodal generative AI model increased the efficiency and quality of radiology report generation.