
5 December 2025


Study on Integration of AI for Breast Cancer Screening in Ireland

This pilot, prospective, population-based randomised double-blind study investigates whether artificial intelligence (AI) can be integrated into Ireland’s national BreastCheck programme by comparing AI + Radiologist double reading with the current gold standard of double reading by two Radiologists. The study evaluates cancer detection rates, false positives, and workflow efficiency to determine if AI can maintain diagnostic performance while reducing radiologist workload.

Updated: 16 December 2025

Background


Population health screening is the application of a test to people who have no symptoms but are still at risk of developing a particular disease. The objective is to reduce morbidity and mortality from the disease being screened for. Mammography (X-ray of the breast) has been used for breast cancer screening since the 1980s. Classical mammography has a reported sensitivity of about 90% and specificity of about 95%. A persistent global shortage of Radiologists means there is a mismatch between supply and demand, and therefore an unmet need in cancer prevention.1

Multiple retrospective studies suggest Artificial Intelligence (AI) tools can accurately read mammograms independently.2-8 A recent prospective study from Sweden demonstrated that AI-powered mammography reading is non-inferior to the current standard of care while reducing Radiologists' workload within an existing screening workflow.9 Ireland's national breast cancer screening programme, BreastCheck, commenced in 2000. It offers biennial mammograms to all women aged 50-69 years. Current standard practice is for each mammogram to be read and reported by two independent Radiologists. According to the latest BreastCheck Statistical Report (2023), 123,891 women attended for screening during 2021, with 1,202 cancers detected.10 This translates into a 'cancer detection rate' (CDR) of 9.7 per 1,000 women screened. The COVID-19 pandemic caused major disruptions to cancer screening services in Ireland and throughout the world. As such, innovative, automated strategies to speed up screening workflows and reduce human workload could have significant benefits in maximising limited specialist healthcare resources.
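The CDR quoted above is simple arithmetic on the two reported figures. A minimal sketch, assuming CDR is defined as screen-detected cancers per 1,000 women screened (the function name is illustrative, not part of any BreastCheck tooling):

```python
def cancer_detection_rate(cancers_detected: int, women_screened: int, per: int = 1000) -> float:
    """Return screen-detected cancers per `per` women screened."""
    if women_screened <= 0:
        raise ValueError("women_screened must be positive")
    return cancers_detected / women_screened * per


# Figures quoted in the text for women screened during 2021.
cdr_2021 = cancer_detection_rate(1202, 123891)
print(round(cdr_2021, 1))  # 9.7
```

Changing `per` to 100,000 gives the same rate on the population scale often used in public health reporting.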


Research Definition


Can AI be integrated into the current workflow of BreastCheck, and how would it affect cancer detection and false positive rates? Specifically, the project aims to prospectively evaluate whether double reading by AI and one Radiologist (AI+R) is comparable to the gold standard of double reading by two Radiologists (R+R) in a randomly selected sample from the cohort eligible for mammography in 2024. CDRs will be compared on the primary research outcome: screen-detected breast cancer identified through either method.


Research Strategy


The project will employ a quantitative research strategy, randomly assigning the cohort of women attending for mammography through the centralised BreastCheck programme during a three-month period (Q2 of 2024) to one of two study arms. The null hypothesis (H0) is that there is no difference between AI+R and R+R, thereby denoting non-inferiority of the former compared with the latter standard of care. Acceptance of H0 would suggest AI+R can be implemented into the screening workflow by virtue of non-inferiority to the current standard. The main 'parameter' is CDR, to be measured during Phase 1 (Q2 of 2024) in both groups and during Phase 2 (Q3 of 2024) in the experimental group only. The primary 'statistic' is the number of formal, biopsy-proven diagnoses of breast cancer fed back to the BreastCheck register within three months of any given mammographic reading. As AI is not yet an established modality, all screens randomly assigned to the experimental group (AI+R) will subsequently be read by a Radiologist (R), as per the current standard of care, during an extension phase (Q3 of 2024). The study workflow is summarised below (Figure 1):




Figure 1: Breast screening workflow over project cycle of nine months (Q2-Q4 of 2024).


Phase 1 (Q2 of 2024):

• Experimental Arm (AI+R) - double read

• Standard Arm (R+R) - double read


Phase 2 (Q3 of 2024):

• Experimental Arm (R) - single read

• Standard Arm - no read


Phase 3 (Q4 of 2024):

• Experimental Arm - routine monitoring

• Standard Arm - routine monitoring


Research Design & Methodology


The project combines experimental and longitudinal designs: a control group undergoing standard screening (R+R) and an experimental group undergoing investigative screening (AI+R). Both groups will be compared using the CDR parameter at the end of Phase 1. The experimental group will subsequently enter an extension phase designed to ensure the current standard of care in BreastCheck is maintained for all those being screened. CDR measurements of the experimental group at the end of Phase 2 will again be compared with baseline values. Outcomes for both groups will be compared longitudinally with their respective eventual cancer diagnoses per 1,000 women within three months of a screen reading.


It should be noted that control and experimental groups cannot be matched on individual characteristics, as BreastCheck is universally offered to all women in the target cohort aged 50-69 years. However, this is less of an issue given the programme is well defined, with a narrow screening objective for that age group. Further, the design will be double-blind, i.e. neither the invitee nor the Radiologist will be informed of which study arm a particular screen has been randomly assigned to. The AI tool will operate independently (blinded to both invitee and Radiologist) in the background of the current workflow during Phase 1 only. Once baseline mammography has been performed, participants are not required to reattend before their next scheduled visit (except in the event of an abnormal result), thus minimising the attrition problem of longitudinal designs. The measurable parameter of CDR remains constant, and information on subsequent formal cancer diagnoses is fed back by specialty clinics to the existing central register. Finally, the proposed study will utilise the established resources of BreastCheck, with only the upfront overheads of AI tool implementation expected to be an additional cost.


Sampling & Research Instruments


The study 'population' is well defined as women aged 50-69 in Ireland who are routinely invited to participate in BreastCheck once every two years. Specifically, those attending for mammography during Q2 of 2024 constitute the 'sampling frame'. Extrapolating from the latest (2021) figure of approximately 120,000 attendances annually, this translates to a 'sample size' of about 30,000 for the quarter, i.e. approximately 15,000 in each study arm. 'Sampling error' is likely to be insignificant given the population is homogeneously defined by age boundaries, unless the sampling frame becomes particularly skewed due to the profile of attendees during Q2 of 2024.


As it is impractical to sample the whole population through 'simple random sampling', the strategy is more consistent with 'systematic sampling', whereby participants attending during Q2 are randomly assigned to either study arm. Unless the sampling frame is particularly skewed (an unlikely scenario in the period following the COVID-19 lockdowns), this method is not expected to introduce significant bias.
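The sample-size arithmetic and the random allocation to study arms can be sketched as follows. This is a minimal illustration assuming simple 1:1 randomisation of one quarter's attendees; the function name, the seed and the use of integer screen identifiers are hypothetical, and the actual allocation mechanism is not specified in this proposal.

```python
import random


def assign_arms(screen_ids, seed=2024):
    """Randomly split screens 1:1 between the AI+R and R+R arms."""
    rng = random.Random(seed)        # fixed seed so the allocation is reproducible
    shuffled = list(screen_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"AI+R": shuffled[:half], "R+R": shuffled[half:]}


annual_attendance = 120_000          # approximate annual figure quoted in the text
q2_attendance = annual_attendance // 4
arms = assign_arms(range(q2_attendance))
print(q2_attendance, len(arms["AI+R"]), len(arms["R+R"]))  # 30000 15000 15000
```

Because allocation depends only on the random shuffle and not on any characteristic of the invitee, neither arm is expected to differ systematically from the other.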


CDR data will be prospectively collected for both groups between April and June 2024 and compared with formal, biopsy-proven cancer diagnoses at the end of Phase 1. Subsequently, CDR data for the experimental group will continue to be prospectively collected between July and September 2024, with a second comparison against formal, biopsy-proven cancer diagnoses at the end of Phase 2. Finally, both groups will be subject to status quo monitoring as per the current BreastCheck programme protocol between October and December 2024. The project will conclude at the end of Phase 3 with a third and final comparison. Performing the comparative analysis of CDR against cancer diagnoses at the end of each quarter allows double-blinding to be preserved. Interpretation of the CDR parameter depends on subsequent biopsy confirmation (or not) of breast cancer. By design, the study workflow will proceed to biopsy only if there is at least one positive screen reading, by either AI or R. Any positive screen by either modality that is subsequently proven by biopsy to be cancer is a true positive (TP); similarly, any positive screen by either modality that biopsy subsequently shows not to be cancer is a false positive (FP).
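The labelling rule described above can be expressed as a small function. This is a sketch of the rule as stated in the text only; the function name and return labels are illustrative assumptions, and it does not model any consensus or arbitration step.

```python
from typing import Optional


def classify_screen(reader_1_positive: bool, reader_2_positive: bool,
                    biopsy_cancer: Optional[bool]) -> str:
    """Label one screen under a screen-positive design.

    reader_1/reader_2 are the two reads in an arm (AI and R, or R and R).
    biopsy_cancer is None until a biopsy result has been fed back.
    """
    screen_positive = reader_1_positive or reader_2_positive
    if not screen_positive:
        # Negative screens are not biopsied, so true status stays unverified.
        return "screen negative (unverified)"
    if biopsy_cancer is None:
        return "screen positive (awaiting biopsy)"
    return "TP" if biopsy_cancer else "FP"


print(classify_screen(True, False, True))    # TP
print(classify_screen(False, True, False))   # FP
print(classify_screen(False, False, None))   # screen negative (unverified)
```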


Data Analysis Techniques


Any screening test is expected to satisfy seven criteria: simple and quick, inexpensive, acceptable to the population, accurate, repeatable, sensitive and specific. The sensitivity and specificity of screening can be influenced by modifying the 'Criterion of Positivity', i.e. the value at which a test outcome is considered positive. In health screening, a balanced Criterion of Positivity is often sought: the magnitude of false positives (increased anxiety and costs) due to a lower Criterion of Positivity is traded off against that of false negatives (a false sense of security) due to a higher Criterion of Positivity (illustrated in Figure 2).
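The trade-off can be made concrete with a toy example. The scores and cancer labels below are invented purely for illustration and assume the true disease status of every case is known, which is not so in a real screening programme; the function name is also hypothetical.

```python
def confusion_at_threshold(scores, has_cancer, threshold):
    """Count TP, FP, FN and TN when a score >= threshold is called positive."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for score, cancer in zip(scores, has_cancer):
        positive = score >= threshold
        if positive and cancer:
            counts["TP"] += 1
        elif positive and not cancer:
            counts["FP"] += 1
        elif not positive and cancer:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical suspicion scores for seven screens (not study data).
scores = [0.10, 0.25, 0.40, 0.55, 0.60, 0.80, 0.90]
has_cancer = [False, False, False, True, False, True, True]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, confusion_at_threshold(scores, has_cancer, threshold))
```

In this toy data, lowering the threshold from 0.7 to 0.3 removes the single false negative at the cost of two false positives, which is the balance a Criterion of Positivity has to strike.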

Standard methodology for medical screening trials employs a 'screen positive design'. Accordingly, only those detected as positive upon screening undergo additional investigation with biopsy to confirm a breast cancer diagnosis. This means 'absolute' sensitivity and specificity values cannot be measured, as negative test results are not verified with biopsy to definitively rule out breast cancer (a limitation true of all medical screening tests).



Figure 2. Sensitivity & specificity of CDR as determined by Gold Standard (R+R) vs. Test (AI+R)


[Table showing randomisation of Q2 sub cohort CDR with True Positive, False Positive, False Negative, True Negative calculations for Sensitivity and Specificity - see original Figure 2]


However, CDR can be considered a good surrogate marker for comparing the relative strength of an experimental screen reading strategy against the current standard. Thus 'relative' CDR will be calculated by dividing the number of positives per 1,000 screens under AI+R by the number of positives per 1,000 screens under R+R plus the cancers subsequently confirmed by biopsy despite a negative AI screen; this 'relative positive fraction' represents the sensitivity of the experimental strategy. Similarly, the 'relative false positive fraction' will be calculated by dividing the number of negatives per 1,000 screens under AI+R by the number of negatives per 1,000 screens under R+R plus the screens subsequently confirmed by biopsy as normal despite a positive AI screen; this represents the specificity of the experimental strategy.
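Both relative fractions share the same shape: an experimental rate divided by the sum of the standard rate and a discordant rate. A minimal sketch under that literal reading of the definitions above, with a hypothetical function name and invented per-1,000 rates that are not study data:

```python
def relative_fraction(experimental_rate: float, standard_rate: float,
                      discordant_rate: float) -> float:
    """Experimental rate divided by (standard rate + discordant rate).

    All three inputs are rates per 1,000 screens. For the relative positive
    fraction, discordant_rate is biopsy-confirmed cancer despite a negative
    AI screen; for the negative counterpart it is biopsy-confirmed normal
    despite a positive AI screen.
    """
    denominator = standard_rate + discordant_rate
    if denominator <= 0:
        raise ValueError("standard_rate + discordant_rate must be positive")
    return experimental_rate / denominator


# Hypothetical rates per 1,000 screens, for illustration only.
relative_positive = relative_fraction(9.5, 9.7, 0.3)
print(round(relative_positive, 2))  # 0.95
```

A value close to 1 would indicate that the experimental strategy detects cancers at nearly the same rate as the reference denominator.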


Standards established by Alonzo et al. (2002) for comparative studies of screening tests allow for non-inferiority designs (novel compared with established).11 Such a method allows 'inter-test variability' to be quantified prior to approval and widespread adoption. There is also significant 'intra-test variability' in reading performance among high-volume Radiologists. For example, Salim et al. (2020) demonstrated that the sensitivity of the least sensitive quartile of Radiologists was 15% lower than that of the most sensitive quartile in a population-based breast cancer screening cohort.12 This study will adopt a non-inferiority margin of a 15% (0.15) relative reduction in CDR sensitivity, tested with a one-tailed t-test at a significance level (α) of 0.025. A p-value threshold of ≤0.05 (95% confidence) will be used as the measure of how likely it is that any observed difference between AI+R and R+R arose by chance. Acceptance of the null hypothesis in this methodology denotes non-inferiority of the experimental strategy relative to the standard.
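One conventional way to operationalise a 15% margin is to check whether the lower confidence bound of the CDR ratio (AI+R over R+R) stays above 0.85. The sketch below does this with a log-scale Wald interval for a ratio of two independent proportions. This is a swapped-in approximation chosen for illustration, not the t-test named above; the function name is hypothetical and the counts in the usage lines are invented, not study results.

```python
import math
from statistics import NormalDist


def noninferiority_rate_ratio(cancers_exp, n_exp, cancers_std, n_std,
                              margin=0.15, alpha=0.025):
    """Return (ratio, lower_bound, non_inferior) for experimental vs standard CDR."""
    if min(cancers_exp, cancers_std) <= 0 or min(n_exp, n_std) <= 0:
        raise ValueError("counts must be positive")
    ratio = (cancers_exp / n_exp) / (cancers_std / n_std)
    # Standard error of log(ratio) for two independent binomial proportions.
    se = math.sqrt(1 / cancers_exp - 1 / n_exp + 1 / cancers_std - 1 / n_std)
    z = NormalDist().inv_cdf(1 - alpha)   # about 1.96 for a one-sided alpha of 0.025
    lower = math.exp(math.log(ratio) - z * se)
    return ratio, lower, lower > 1 - margin


# Hypothetical counts per arm, for illustration only.
ratio, lower, non_inferior = noninferiority_rate_ratio(145, 15_000, 145, 15_000)
print(round(ratio, 2), round(lower, 2), non_inferior)
```

The width of the interval depends mainly on the number of cancers detected in each arm, so the same function can be used to explore how many screens a given margin would require.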


Discussion


This pilot project seeks to implement AI as an independent reader of mammograms within the established workflow of the BreastCheck programme. It's designed as a prospective, randomised, double-blind study of a representative cohort of women during one quarter of the annual screening cycle. Therefore, comparative findings between experimental (AI+R) and standard (R+R) groups can readily be normalised to a population of 100,000 women screened.


Building on numerous retrospective studies to date, it will evaluate the real-world feasibility of implementing AI in a population-based breast screening programme. Findings may indicate whether such a strategy could reduce Radiologists' workload (and therefore cost) by up to 50%.

Yet unaddressed challenges remain: can 'data drift' affecting performance of the AI tool be controlled for? Currently no validated standardised AI system calibration or quality assurance protocols exist. The screen-positive study design limits determination of absolute sensitivity and specificity values, instead allowing indirect estimates through CDR measurements. A better estimate can be reached by longer term follow-up of readings over a 24-month period, which would capture interval cancers in between the usual biennial mammography screens. Finally, findings would not be generalisable beyond the particular AI tool adopted in the absence of universal standardisation among different platforms available in the market.


Conclusions


The feasibility of an AI+R strategy may be assessed against the current standard of R+R using the project design described above. The design can be implemented within the existing workflow of BreastCheck. Longitudinal comparison of CDR with biopsy-confirmed breast cancer diagnoses may allow for estimation of relative positive and negative predictive values.


References


  1. Kwee, T.C. and Kwee, R.M. (2021). Workload of diagnostic radiologists in the foreseeable future based on recent scientific advances: growth expectations and role of artificial intelligence. Insights into Imaging, 12(1), pp.1-12.

  2. Kim, H.E., Kim, H.H., Han, B.K., Kim, K.H., Han, K., Nam, H., Lee, E.H. and Kim, E.K. (2020). Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study. The Lancet Digital Health, 2(3), pp.e138-e148.

  3. Lotter, W., Diab, A.R., Haslam, B., Kim, J.G., Grisot, G., Wu, E., Wu, K., Onieva, J.O., Boyer, Y., Boxerman, J.L. and Wang, M. (2021). Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nature Medicine, 27(2), pp.244-249.

  4. McKinney, S.M., Sieniek, M., Godbole, V., Godwin, J., Antropova, N., Ashrafian, H., Back, T., Chesus, M., Corrado, G.S., Darzi, A. and Etemadi, M. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577(7788), pp.89-94.

  5. Dembrower, K., Wåhlin, E., Liu, Y., Salim, M., Smith, K., Lindholm, P., Eklund, M. and Strand, F. (2020). Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: a retrospective simulation study. The Lancet Digital Health, 2(9), pp.e468-e474.

  6. Rodriguez-Ruiz, A., Lång, K., Gubern-Merida, A., Broeders, M., Gennaro, G., Clauser, P., Helbich, T.H., Chevalier, M., Tan, T., Mertelmeier, T. and Wallis, M.G. (2019). Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. JNCI: Journal of the National Cancer Institute, 111(9), pp.916-922.

  7. Leibig, C., Brehmer, M., Bunk, S., Byng, D., Pinker, K. and Umutlu, L. (2022). Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis. The Lancet Digital Health, 4(7), pp.e507-e519.

  8. Salim, M., Wåhlin, E., Dembrower, K., Azavedo, E., Foukakis, T., Liu, Y., Smith, K., Eklund, M. and Strand, F. (2020). External evaluation of 3 commercial artificial intelligence algorithms for independent assessment of screening mammograms. JAMA Oncology, 6(10), pp.1581-1588.

  9. Dembrower, K., Crippa, A., Colón, E., Eklund, M. and Strand, F. (2023). Artificial intelligence for breast cancer detection in screening mammography in Sweden: a prospective, population-based, paired-reader, non-inferiority study. The Lancet Digital Health, 5(10), pp.e703-e711.

  10. National Screening Service (2023). BreastCheck Statistical Report 2021. Dublin: National Screening Service.

  11. Alonzo, T.A., Pepe, M.S. and Moskowitz, C.S. (2002). Sample size calculations for comparative studies of medical tests for detecting presence of disease. Statistics in Medicine, 21(6), pp.835-852.

  12. Salim, M., Dembrower, K., Eklund, M., Lindholm, P. and Strand, F. (2020). Range of radiologist performance in a population-based screening cohort of 1 million digital mammography examinations. Radiology, 297(1), pp.33-39.

Dr. Suranga Senanayake

Physician (MD, DO, Resident)


Dr. Suranga is a Doctor of Medicine and Registrar at Ireland's Health Services
