Female breast cancer shows a bimodal age frequency distribution at diagnosis, which can be interpreted as two main etiologic subtypes, or groupings of tumors with shared risk factors. While RNA-based methods including PAM50 have identified well-established clinical subtypes, age distribution patterns at diagnosis, as a proxy for etiologic subtype, have not been established for molecular and genomic tumor classifications.
We evaluated smoothed age frequency distributions at diagnosis for Carolina Breast Cancer Study cases within immunohistochemistry-based and RNA-based expression categories. We used Akaike information criterion (AIC) values to compare the fit of single-density versus two-component mixture models. Two-component mixture models were then used to estimate the proportions of the early-onset and late-onset categories by immunohistochemistry-based ER category (n = 2860) and by RNA-based ESR1 category and PAM50 subtype (n = 1965). PAM50 findings were validated using pooled publicly available data (n = 8103).
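The model comparison described above can be illustrated with a minimal sketch. It is not the study's actual code: it assumes Gaussian component densities (the abstract does not state the density family), fits the mixture with a plain expectation-maximization loop, and uses simulated ages rather than Carolina Breast Cancer Study data. All function names, starting values, and simulation parameters are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution (assumed component family)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def simulate_ages(n_early, n_late, seed=0):
    """Simulated ages at diagnosis; NOT study data, for illustration only."""
    rng = random.Random(seed)
    early = [rng.gauss(45.0, 6.0) for _ in range(n_early)]
    late = [rng.gauss(65.0, 6.0) for _ in range(n_late)]
    return early + late


def fit_single(ages):
    """Maximum-likelihood fit of one normal density; AIC with k = 2."""
    n = len(ages)
    mu = sum(ages) / n
    sigma = math.sqrt(sum((a - mu) ** 2 for a in ages) / n)
    loglik = sum(
        -0.5 * ((a - mu) / sigma) ** 2
        - math.log(sigma)
        - 0.5 * math.log(2.0 * math.pi)
        for a in ages
    )
    return {"mu": mu, "sigma": sigma, "aic": 2 * 2 - 2 * loglik}


def fit_two_component(ages, mu_init=(45.0, 65.0), n_iter=100):
    """EM fit of a two-component normal mixture; AIC with k = 5."""
    n = len(ages)
    w = 0.5                      # mixing proportion of the early-onset component
    mu1, mu2 = mu_init           # hypothetical starting means
    s1 = s2 = 10.0               # hypothetical starting standard deviations
    for _ in range(n_iter):
        # E-step: posterior probability that each case is early-onset
        resp = []
        for a in ages:
            p1 = w * normal_pdf(a, mu1, s1)
            p2 = (1.0 - w) * normal_pdf(a, mu2, s2)
            resp.append(p1 / (p1 + p2))
        # M-step: update weight, means, and standard deviations
        r1 = sum(resp)
        r2 = n - r1
        mu1 = sum(r * a for r, a in zip(resp, ages)) / r1
        mu2 = sum((1.0 - r) * a for r, a in zip(resp, ages)) / r2
        s1 = max(math.sqrt(sum(r * (a - mu1) ** 2 for r, a in zip(resp, ages)) / r1), 1.0)
        s2 = max(math.sqrt(sum((1.0 - r) * (a - mu2) ** 2 for r, a in zip(resp, ages)) / r2), 1.0)
        w = r1 / n
    loglik = sum(
        math.log(w * normal_pdf(a, mu1, s1) + (1.0 - w) * normal_pdf(a, mu2, s2))
        for a in ages
    )
    return {
        "w_early": w, "mu_early": mu1, "mu_late": mu2,
        "sigma_early": s1, "sigma_late": s2,
        "aic": 2 * 5 - 2 * loglik,   # 2 means + 2 SDs + 1 mixing proportion
    }


if __name__ == "__main__":
    ages = simulate_ages(800, 1200)
    single = fit_single(ages)
    mixture = fit_two_component(ages)
    print("single-density AIC:", round(single["aic"], 1))
    print("two-component AIC: ", round(mixture["aic"], 1))
    print("estimated early-onset proportion:", round(mixture["w_early"], 3))
```

A lower AIC for the two-component fit indicates that the bimodal model describes the ages better even after the penalty for its three extra parameters. The fitted `w_early` corresponds to the early-onset proportion that the mixture models estimate within each expression category.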
Breast cancers were best characterized by a bimodal age distribution at diagnosis, with incidence peaks near 45 and 65 years, regardless of molecular characteristics. However, the proportional composition of early-onset and late-onset age distributions varied by molecular and genomic characteristics. Higher ER-protein and ESR1-RNA categories showed a greater proportion of late-onset cases. Similarly, PAM50 subtypes showed a shifting age-at-onset distribution, with the most pronounced early-onset and late-onset peaks found in Basal-like and Luminal A, respectively.
A bimodal age distribution at diagnosis was detected in the Carolina Breast Cancer Study, similar to that seen in national cancer registry data. Our data support two fundamental age-defined etiologic breast cancer subtypes that persist across molecular and genomic characteristics. Better criteria to distinguish etiologic subtypes could improve understanding of breast cancer etiology and contribute to prevention efforts.