Paper source: https://www.ncbi.nlm.nih.gov/pubmed/27185194
Deep learning has been applied successfully to several fields such as image retrieval, natural language processing, and speech recognition. However, deep learning has not been effectively used to derive patient representations from aggregated electronic health records (EHRs) to benefit preventive medicine. This paper applies deep learning to a large-scale EHR dataset to extract robust patient descriptors that can be used to predict future patient diseases. This paper was published in Scientific Reports, 2016.
What is electronic health records (EHRs)?
Developing predictive approaches to maintain health and to prevent diseases, disability, and death is one of the primary goals of preventive medicine. EHRs capture and integrate data on all aspects of care over time, with the data being represented according to relevant controlled vocabularies. EHR data comprise various data types, from structured information such as drug prescription data consisting of dates and dosages that are captured through a standardized system, to unstructured data such as clinical narratives that describe the medical reasoning behind the prescription (Figure 1). In this context, information retrieval applied to EHRs has shown great promise in providing search engines that could support physicians in identifying patients at risk of diseases given their clinical status.
Figure 1. Electronic health record content. Figure source: (2012) Nature Reviews Genetics. Mining electronic health records: towards better research applications and clinical care
EHR dataset and data pre-processing
The range of different data types highlights the challenge in EHR integration. To this end, the authors proposed a framework that allows flexible customization in terms of how to process and summarize patient EHRs. Briefly, for each patient in the dataset, they retained some general demographic details (i.e., gender and race) as well as diagnoses, medications, procedures, lab tests, and clinical notes recorded by the split-point. All the clinical features were pre-processed to obtain harmonized codes for procedures and lab tests, normalized medications based on brand name and dosages, and parsed representations of notes summarizing clinically relevant information extracted from the text. Finally, EHR data were grouped to be represented as one vector for every patient (Figure 2).
Figure 2. EHR data pre-processing.
Algorithm framework and performance
The vectors obtained from all the patients are then processed by the unsupervised deep feature learning architecture, which derives a set of high level descriptors through a multi-layer neural network (Figure 3). This type of deep architecture aims to combine the original features into a more compact representation in a hierarchical and non-linear manner. At every layer of the deep network, several overlapping descriptors are joined together to create a higher-level clinical concept (e.g., diseases and medications).
Figure 3. The raw patient representations are converged into a set of general and robust features by unsupervised deep feature learning architecture.
The authors used a stack of denoising autoencoders (SDA) to model EHRs. All the autoencoders in the deep architecture share the same structure. The output of the last layer is the patient representation that can be used to predict future diseases (Figure 4).
Figure 4. Each layer of the deep neural networks is trained to produce a higher-level feature representation.
The accuracy of DeepPatient in predicting the future of patients in temporal windows outperformed alternative feature learning strategies.
Figure 5. The precision results for patient representation with DeepPatient, original descriptors and alternative methods over several time intervals.
Limitations and future perspectives
This article demonstrates the feasibility of applying deep learning to prognosis using their EHRs. In addition to predicting patients’ diseases, HER data may have more potential clinical applications including personalized prescriptions, therapy recommendation, and clinical trial recruitment. There are also some limitations in the current study. For example, the authors used the frequency of lab tests, instead of test results to discover patient patterns. This can be improved by better data collection procedure and more sophisticated pre-processing techniques.
There are quite a few works in recent years on data analysis with EHRs from patients. The usage of the existing large amounts of EHR data is becoming widespread. The key aspect of EHR data mining to the success of medical applications is extracting effective features from patients’ EHRs. To overcome this challenge, deep learning approach is used as a powerful tool to fullfil this aim. Mining of EHRs has the potential for revealing unknown correlations between disease and diagnosis phenotype. In addition to better prediction performance, more efforts should be made for better understanding on the features learned from deep learning model, which can establish new patient-centric principles in medical practice.
Miotto R et al. Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records. Scientific Reports, 2016.
Analyst: Genome Hunter | Localized by Synced Global Team : Xiang Chen