RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records
The paper “RAM-EHR” introduces a new model that combines retrieval augmentation with deep learning for clinical predictions. Let’s go through it.
Motivation
EHRs, short for electronic health records, encompass the information collected during patients’ hospital visits. Since clinical concepts carry meaning beyond raw codes and lab values, augmenting EHRs with external knowledge can improve predictions. However, naive augmentation often introduces noise and irrelevant information.
RAM-EHR proposes a new multi-source retrieval augmentation pipeline for EHR-based predictions.
Retrieval Augmentation
First, they transform information from PubMed, Wikipedia, medical ontologies, and other sources into dense vectors. Then, they retrieve the passages most similar to the diagnosis, medication, and procedure codes observed in a patient’s visit.
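To make the retrieval step concrete, here is a minimal sketch of dense retrieval over external passages. The encoder, the passages, and the top-k value are placeholders chosen for illustration, not the paper’s actual setup:

```python
# A minimal sketch of the dense-retrieval step, not the paper's exact pipeline.
# The encoder name, passages, and top_k value are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

# External knowledge passages (e.g., drawn from PubMed, Wikipedia, ontologies).
passages = [
    "Metformin is a first-line medication for type 2 diabetes.",
    "Atrial fibrillation is an irregular, often rapid heart rhythm.",
    "Hemodialysis removes waste products from the blood.",
]
passage_emb = encoder.encode(passages, normalize_embeddings=True)

def retrieve(code_name: str, top_k: int = 2):
    """Return the top_k passages most similar to a medical code's name."""
    query_emb = encoder.encode([code_name], normalize_embeddings=True)
    scores = passage_emb @ query_emb.T  # cosine similarity (embeddings are normalized)
    top = np.argsort(-scores[:, 0])[:top_k]
    return [(passages[i], float(scores[i, 0])) for i in top]

print(retrieve("type 2 diabetes mellitus"))
```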
To avoid long contexts, they summarize the retrieved passages with GPT-3.5, including task-specific instructions in the prompt.
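A rough sketch of how such task-aware summarization could look with the OpenAI client; the prompt wording, task description, and code name are illustrative, not taken from the paper:

```python
# Hedged sketch of task-aware summarization; the actual prompt in RAM-EHR differs.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize(code_name: str, passages: list[str], task: str) -> str:
    prompt = (
        f"You are assisting with the clinical task: {task}.\n"
        f"Summarize the following passages about '{code_name}', keeping only "
        f"information relevant to this task, in a few sentences.\n\n"
        + "\n\n".join(passages)
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example (hypothetical): summarize retrieved passages for a hypertension code
# in the context of a readmission prediction task.
# summary = summarize("essential hypertension", retrieved_passages, "30-day readmission prediction")
```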
Co-training strategy
Visits capture co-occurrence relations between codes, while summaries encode semantic information. So, the authors co-train two models, each focused on one of these aspects.
For the visits, they use a Hypergraph Transformer, where nodes are medical codes, and hyperedges represent visits. After aggregation via multi-head self-attention, hyperedge features are used for predictions via an MLP.
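Here is a simplified sketch of that visit branch in PyTorch. It only shows a single node-to-hyperedge attention step with a learned query, whereas the actual Hypergraph Transformer uses richer message passing; the dimensions and class count are placeholders:

```python
# Minimal sketch of the visit branch: nodes are medical codes, each hyperedge
# is one visit, and codes are aggregated into a visit feature via attention.
import torch
import torch.nn as nn

class VisitHypergraphSketch(nn.Module):
    def __init__(self, num_codes: int, dim: int = 128, num_heads: int = 4, num_classes: int = 2):
        super().__init__()
        self.code_emb = nn.Embedding(num_codes, dim)              # node (code) embeddings
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.visit_query = nn.Parameter(torch.randn(1, 1, dim))   # learned hyperedge query
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, num_classes))

    def forward(self, code_ids: torch.Tensor) -> torch.Tensor:
        # code_ids: (batch, max_codes); padding mask omitted for brevity
        nodes = self.code_emb(code_ids)                           # (batch, max_codes, dim)
        query = self.visit_query.expand(code_ids.size(0), -1, -1)
        visit_feat, _ = self.attn(query, nodes, nodes)            # aggregate codes into a visit feature
        return self.mlp(visit_feat.squeeze(1))                    # per-visit prediction logits

logits_visit = VisitHypergraphSketch(num_codes=5000)(torch.randint(0, 5000, (8, 20)))
```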
For the summarized information, they decompose the codes from a patient’s visit into three sets: diagnoses, medications, and procedures. Then, they concatenate the code names and their summaries into documents and use a pre-trained language model (UMLS-BERT) to obtain a representation of each. Finally, they concatenate the three vectors and predict with an MLP.
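A sketch of this text branch under those assumptions; the checkpoint below is a publicly available clinical BERT used as a stand-in for UMLS-BERT, and the documents are toy examples:

```python
# Sketch of the text branch: one document per code type (diagnoses, medications,
# procedures), encoded with a clinical BERT, then concatenated for prediction.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

checkpoint = "emilyalsentzer/Bio_ClinicalBERT"  # stand-in; the paper uses UMLS-BERT
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
encoder = AutoModel.from_pretrained(checkpoint)
head = nn.Linear(3 * encoder.config.hidden_size, 2)  # prediction head over concatenated vectors

def encode_doc(text: str) -> torch.Tensor:
    """Encode one concatenated document and return its [CLS] representation."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    return encoder(**inputs).last_hidden_state[:, 0]  # (1, hidden_size)

docs = {  # toy documents: code names followed by their retrieved summaries
    "diagnoses": "type 2 diabetes: <summary> ... ; hypertension: <summary> ...",
    "medications": "metformin: <summary> ... ; lisinopril: <summary> ...",
    "procedures": "hemodialysis: <summary> ...",
}
patient_vec = torch.cat([encode_doc(t) for t in docs.values()], dim=-1)
logits_text = head(patient_vec)
```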
The models are regularized through the loss function, which uses KL divergence to encourage agreement between the two models’ predicted probability distributions. A weighted sum of the predictions from both models is used as the final output.
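In PyTorch-style code, the objective and the final prediction could look like the sketch below; the symmetric form of the KL term and the weighting values are assumptions for illustration, not the paper’s exact choices:

```python
# Hedged sketch of the co-training objective: each model has its own supervised
# loss, plus a KL term that pulls the two predicted distributions together.
import torch
import torch.nn.functional as F

def co_training_loss(logits_visit, logits_text, labels, consistency_weight=0.5):
    p_visit = F.softmax(logits_visit, dim=-1)
    p_text = F.softmax(logits_text, dim=-1)

    ce = F.cross_entropy(logits_visit, labels) + F.cross_entropy(logits_text, labels)
    # Symmetric KL between the two distributions (assumption; the paper's exact form may differ).
    kl = F.kl_div(p_visit.log(), p_text, reduction="batchmean") + \
         F.kl_div(p_text.log(), p_visit, reduction="batchmean")
    return ce + consistency_weight * kl

def final_prediction(logits_visit, logits_text, alpha=0.5):
    # Weighted sum of the two models' predicted probabilities.
    return alpha * F.softmax(logits_visit, dim=-1) + (1 - alpha) * F.softmax(logits_text, dim=-1)
```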
Results
RAM-EHR outperforms baselines that only use visit information, single external knowledge sources, or clinical notes. They also observe that lightweight PLMs can be used for summarization with little loss in performance, and that knowledge bases are more beneficial than literature sources.
Finally, they present a case study comparing the summaries produced by RAM-EHR with responses coming directly from the LLM, showing that the former capture more relevant information.
Conclusion
RAM-EHR is a deep model that augments medical codes with multi-source retrieval. The authors show that leveraging external knowledge, together with both visit-level and semantic information, leads to improved clinical predictions.