Interpreting Clinical Narrative Diagnosis Models with Sentence Importance

Research output: Contribution to conferenceAbstract

Abstract

Despite advances in the application of deep neural networks to various kinds of medical data, extracting information from unstructured textual sources remains a challenging task. Using a dataset of de-identified clinical letters gathered at a memory clinic, we evaluate recurrent neural networks (RNNs) on their ability to predict patients’ diagnoses of ‘Dementia’, ‘Mild Cognitive Impairment’ or ‘Non-impaired’. This classification framework can also have applications in the automatic identification of patients as candidates for clinical trials.

After showing that models trained on state-of-the-art sentence-level embeddings outperform both word-level models and a recent benchmark model that fine-tunes a pre-trained general-domain language model, we probe sentence embedding models in order to reveal interpretable insights into the types of sentence-level representations the RNNs build. Specifically, we take a measure of sentence importance with respect to a given class and identify clusters of sentences in the embedding space that correlate strongly with importance scores for each class. Extracting the most frequent phrases within each group of sentence representations shows that the model is sensitive to sentences that cluster around semantic concepts such as a patient’s level of geriatric depression and how independent the patient is in their daily activities.

In addition to showing which sentences in a document are most informative about the patient’s condition, our method can identify the types of sentences that can lead the model to make incorrect diagnoses.
Original languageEnglish
Number of pages1
Publication statusPublished - 02 Sep 2019
EventInternational Conference of the Royal Statistical Society (RSS 2019) - Belfast, United Kingdom
Duration: 02 Sep 201905 Sep 2019
https://events.rss.org.uk/rss/frontend/reg/thome.csp?pageID=83705&ef_sel_menu=1647&eventID=270

Conference

ConferenceInternational Conference of the Royal Statistical Society (RSS 2019)
Abbreviated titleRSS 2019
CountryUnited Kingdom
CityBelfast
Period02/09/201905/09/2019
Internet address

Fingerprint

Recurrent neural networks
Geriatrics
Semantics
Data storage equipment
Deep neural networks

Cite this

Ormerod, M., Martinez del Rincon, J., McGuinness, B., & Devereux, B. (2019). Interpreting Clinical Narrative Diagnosis Models with Sentence Importance. Abstract from International Conference of the Royal Statistical Society (RSS 2019), Belfast, United Kingdom.
Ormerod, Mark ; Martinez del Rincon, Jesus ; McGuinness, Bernadette ; Devereux, Barry. / Interpreting Clinical Narrative Diagnosis Models with Sentence Importance. Abstract from International Conference of the Royal Statistical Society (RSS 2019), Belfast, United Kingdom.1 p.
@conference{093091794dd34e23938d7c53a6815fa6,
title = "Interpreting Clinical Narrative Diagnosis Models with Sentence Importance",
abstract = "Despite advances in the application of deep neural networks to various kinds of medical data, extracting information from unstructured textual sources remains a challenging task. Using a dataset of de-identified clinical letters gathered at a memory clinic, we evaluate recurrent neural networks (RNNs) on their ability to predict patients’ diagnoses of ‘Dementia’, ‘Mild Cognitive Impairment’ or ‘Non-impaired’. This classification framework can also have applications in the automatic identification of patients as candidates for clinical trials.After showing that models trained on state-of-the-art sentence-level embeddings outperform both word-level models and a recent benchmark model that fine-tunes a pre-trained general-domain language model, we probe sentence embedding models in order to reveal interpretable insights into the types of sentence-level representations the RNNs build. Specifically, we take a measure of sentence importance with respect to a given class and identify clusters of sentences in the embedding space that correlate strongly with importance scores for each class. Extracting the most frequent phrases within each group of sentence representations shows that the model is sensitive to sentences that cluster around semantic concepts such as a patient’s level of geriatric depression and how independent the patient is in their daily activities.In addition to showing which sentences in a document are most informative about the patient’s condition, our method can identify the types of sentences that can lead the model to make incorrect diagnoses.",
author = "Mark Ormerod and {Martinez del Rincon}, Jesus and Bernadette McGuinness and Barry Devereux",
year = "2019",
month = "9",
day = "2",
language = "English",
note = "International Conference of the Royal Statistical Society (RSS 2019), RSS 2019 ; Conference date: 02-09-2019 Through 05-09-2019",
url = "https://events.rss.org.uk/rss/frontend/reg/thome.csp?pageID=83705&ef_sel_menu=1647&eventID=270",

}

Ormerod, M, Martinez del Rincon, J, McGuinness, B & Devereux, B 2019, 'Interpreting Clinical Narrative Diagnosis Models with Sentence Importance', International Conference of the Royal Statistical Society (RSS 2019), Belfast, United Kingdom, 02/09/2019 - 05/09/2019.

Interpreting Clinical Narrative Diagnosis Models with Sentence Importance. / Ormerod, Mark; Martinez del Rincon, Jesus; McGuinness, Bernadette; Devereux, Barry.

2019. Abstract from International Conference of the Royal Statistical Society (RSS 2019), Belfast, United Kingdom.

Research output: Contribution to conferenceAbstract

TY - CONF

T1 - Interpreting Clinical Narrative Diagnosis Models with Sentence Importance

AU - Ormerod, Mark

AU - Martinez del Rincon, Jesus

AU - McGuinness, Bernadette

AU - Devereux, Barry

PY - 2019/9/2

Y1 - 2019/9/2

N2 - Despite advances in the application of deep neural networks to various kinds of medical data, extracting information from unstructured textual sources remains a challenging task. Using a dataset of de-identified clinical letters gathered at a memory clinic, we evaluate recurrent neural networks (RNNs) on their ability to predict patients’ diagnoses of ‘Dementia’, ‘Mild Cognitive Impairment’ or ‘Non-impaired’. This classification framework can also have applications in the automatic identification of patients as candidates for clinical trials.After showing that models trained on state-of-the-art sentence-level embeddings outperform both word-level models and a recent benchmark model that fine-tunes a pre-trained general-domain language model, we probe sentence embedding models in order to reveal interpretable insights into the types of sentence-level representations the RNNs build. Specifically, we take a measure of sentence importance with respect to a given class and identify clusters of sentences in the embedding space that correlate strongly with importance scores for each class. Extracting the most frequent phrases within each group of sentence representations shows that the model is sensitive to sentences that cluster around semantic concepts such as a patient’s level of geriatric depression and how independent the patient is in their daily activities.In addition to showing which sentences in a document are most informative about the patient’s condition, our method can identify the types of sentences that can lead the model to make incorrect diagnoses.

AB - Despite advances in the application of deep neural networks to various kinds of medical data, extracting information from unstructured textual sources remains a challenging task. Using a dataset of de-identified clinical letters gathered at a memory clinic, we evaluate recurrent neural networks (RNNs) on their ability to predict patients’ diagnoses of ‘Dementia’, ‘Mild Cognitive Impairment’ or ‘Non-impaired’. This classification framework can also have applications in the automatic identification of patients as candidates for clinical trials.After showing that models trained on state-of-the-art sentence-level embeddings outperform both word-level models and a recent benchmark model that fine-tunes a pre-trained general-domain language model, we probe sentence embedding models in order to reveal interpretable insights into the types of sentence-level representations the RNNs build. Specifically, we take a measure of sentence importance with respect to a given class and identify clusters of sentences in the embedding space that correlate strongly with importance scores for each class. Extracting the most frequent phrases within each group of sentence representations shows that the model is sensitive to sentences that cluster around semantic concepts such as a patient’s level of geriatric depression and how independent the patient is in their daily activities.In addition to showing which sentences in a document are most informative about the patient’s condition, our method can identify the types of sentences that can lead the model to make incorrect diagnoses.

M3 - Abstract

ER -

Ormerod M, Martinez del Rincon J, McGuinness B, Devereux B. Interpreting Clinical Narrative Diagnosis Models with Sentence Importance. 2019. Abstract from International Conference of the Royal Statistical Society (RSS 2019), Belfast, United Kingdom.