GraphCare: Enhancing Healthcare Predictions with Personalized Knowledge Graphs
The paper “GraphCare” introduces a new framework for creating personalized patient knowledge graphs for clinical predictions. Let’s go through it.
Motivation
There are many sources of external biomedical knowledge, including knowledge graphs (KGs) such as UMLS. These connect related entities, such as diagnoses and medications, in a single large network.
However, it is often hard to use these resources to enrich patient data. How can KGs be leveraged to get better patient representations for ML?
Concept-specific KGs
GraphCare first generates a small, concept-specific KG for every condition, procedure, and medication in the EHR data.
For this, the authors either prompt an LLM for triplets (head entity, relation type, tail entity) or extract a k-hop subgraph from UMLS:
- LLM: "Given the concept 'diabetes', return a list of triplets involving the concept and other related concepts."
- UMLS: Find the 'diabetes' node in the graph and extract its neighborhood.
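The UMLS route can be sketched as a breadth-first expansion around the concept node. This is a minimal illustration, not the authors' code: the `TRIPLES` list, the relation names, and the `k_hop_subgraph` helper are hypothetical, and treating edges as undirected during the search is an assumption.

```python
from collections import defaultdict

# Toy (head, relation, tail) triples; hypothetical, not taken from UMLS.
TRIPLES = [
    ("diabetes", "may_cause", "neuropathy"),
    ("diabetes", "treated_by", "insulin"),
    ("neuropathy", "may_cause", "foot ulcer"),
    ("foot ulcer", "may_require", "toe amputation"),
    ("asthma", "treated_by", "albuterol"),
]


def k_hop_subgraph(triples, seed, k):
    """Return the triples reachable within k hops of `seed`.

    Assumption: edge direction is ignored while expanding the neighborhood.
    """
    # Index every triple under both of its endpoints.
    incident = defaultdict(list)
    for triple in triples:
        head, _, tail = triple
        incident[head].append(triple)
        incident[tail].append(triple)

    visited = {seed}
    frontier = {seed}
    kept = set()
    for _ in range(k):
        next_frontier = set()
        for node in frontier:
            for triple in incident[node]:
                kept.add(triple)
                head, _, tail = triple
                for neighbor in (head, tail):
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_frontier.add(neighbor)
        frontier = next_frontier
    return kept
```

Calling `k_hop_subgraph(TRIPLES, "diabetes", 2)` keeps the two triples that touch 'diabetes' plus the one linking 'neuropathy' to 'foot ulcer', and never reaches the unrelated 'asthma' triple.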
Then, they perform hierarchical agglomerative clustering on nodes and edges based on the cosine similarity between their word embeddings, which groups similar nodes and similar relations together. The original subgraphs are then mapped to their clustered versions.
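To make the clustering step concrete, here is a small pure-Python sketch of agglomerative clustering on cosine distance. Several choices are assumptions made for illustration and are not stated in this summary: average linkage, a fixed `distance_threshold` as the stopping rule, and the toy 2-d vectors standing in for word embeddings. Vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def agglomerative_clusters(embeddings, distance_threshold):
    """Average-linkage agglomerative clustering on cosine distance.

    `embeddings` maps a name to its vector. Merging stops once the two
    closest clusters are farther apart than `distance_threshold`.
    """
    clusters = [[name] for name in embeddings]

    def linkage(c1, c2):
        dists = [1.0 - cosine_similarity(embeddings[a], embeddings[b])
                 for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > distance_threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cluster_map(clusters):
    """Map every original name to the id of its cluster."""
    return {name: cid for cid, members in enumerate(clusters) for name in members}


# Toy 2-d vectors standing in for word embeddings (hypothetical values).
EMBEDDINGS = {
    "diabetes": [1.0, 0.1],
    "diabetes mellitus": [0.9, 0.15],
    "insulin": [0.1, 1.0],
    "insulin therapy": [0.2, 0.95],
}
```

With `agglomerative_clusters(EMBEDDINGS, 0.1)`, the two diabetes terms end up in one cluster and the two insulin terms in another. The dictionary returned by `cluster_map` is what would be used to rewrite each triple's head and tail with cluster ids; the same procedure can be run separately on relation names.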
Patient KGs
Given the concepts that appear in a patient's visits, the corresponding concept-specific KGs are retrieved (e.g., for visit 1: diabetes, tuberculosis, and acetaminophen; for visit 2: toe amputation and hemodialysis).
Then, the concept KGs are combined into a single graph based on intra- and inter-relations (co-occurrence and relations present in the bigger KG, respectively). A new patient node acts as an anchor, connecting everything. This is the final representation used for predictions.
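A rough sketch of the assembly step follows. It is only one reading of the description above: the `build_patient_graph` helper, the `has_concept` and `co_occurs_with` relation names, and the choice to model co-occurrence as "two concepts appear in the same visit" are all assumptions, and edges derived from the bigger KG are left out.

```python
from itertools import combinations


def build_patient_graph(visits, concept_kgs, patient_id="patient"):
    """Merge per-concept KGs into one edge set anchored at a patient node.

    visits: list of visits (oldest first), each a list of concept names.
    concept_kgs: dict mapping a concept name to a set of (head, relation, tail).
    Returns a set of (head, relation, tail, visit_index) edges.
    """
    edges = set()
    for visit_index, concepts in enumerate(visits):
        for concept in concepts:
            # Anchor edge from the patient node to each concept in the visit.
            edges.add((patient_id, "has_concept", concept, visit_index))
            # Copy the concept's own small KG, tagged with the visit index.
            for head, relation, tail in concept_kgs.get(concept, set()):
                edges.add((head, relation, tail, visit_index))
        # Assumption: co-occurrence links concepts that share a visit.
        for a, b in combinations(sorted(concepts), 2):
            edges.add((a, "co_occurs_with", b, visit_index))
    return edges


# Visits from the example in the text; the concept KGs are hypothetical.
VISITS = [
    ["diabetes", "tuberculosis", "acetaminophen"],
    ["toe amputation", "hemodialysis"],
]
CONCEPT_KGS = {
    "diabetes": {("diabetes", "treated_by", "insulin")},
    "hemodialysis": {("hemodialysis", "treats", "kidney failure")},
}
```

Keeping the visit index on every edge is a design choice made here so that a downstream model could weight older visits differently.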
The patient graphs are processed by a novel bi-attention augmented graph neural network (BAT-GNN). One attention mechanism handles the temporal factor, assigning smaller weights to nodes from older visits. The other concerns node relevance and is initialized as the cosine similarity between the word embedding of the node and that of a term relevant to the task (e.g., 'diabetes' and 'mortality', for mortality prediction).
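The two ingredients described above can be illustrated in a few lines. This is a sketch under stated assumptions rather than the BAT-GNN itself: the exponential form of the decay, the `gamma` parameter, the helper names, and the toy 2-d embeddings are all hypothetical, and the learned attention layers are omitted.

```python
import math


def temporal_decay_weights(num_visits, gamma=0.1):
    """One weight per visit, oldest first; the most recent visit gets 1.0.

    Assumption: exponential decay with rate `gamma` in the visit distance.
    """
    return [math.exp(-gamma * (num_visits - 1 - j)) for j in range(num_visits)]


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def initial_relevance(node_embeddings, task_embedding):
    """Initial relevance of each node: similarity to the task term."""
    return {
        node: cosine_similarity(vector, task_embedding)
        for node, vector in node_embeddings.items()
    }


# Toy 2-d vectors standing in for word embeddings (hypothetical values).
NODE_EMBEDDINGS = {
    "respiratory failure": [0.9, 0.1],
    "acetaminophen": [0.1, 0.9],
}
MORTALITY_EMBEDDING = [1.0, 0.0]
```

With these toy values, `initial_relevance(NODE_EMBEDDINGS, MORTALITY_EMBEDDING)` scores 'respiratory failure' above 'acetaminophen', and `temporal_decay_weights(3, gamma=0.5)` increases monotonically toward the latest visit.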
Results
The authors focus on four prediction tasks: mortality, length of stay, readmission, and drug recommendation. The GraphCare + BAT-GNN model outperforms existing baselines.
Interpretability
The authors also show how graph-based patient representations can help interpret the predictions, in this case for mortality. Critical nodes, such as deadly cancer and respiratory failure, receive higher attention-based importance scores.
Conclusion
GraphCare is a novel framework that effectively combines external KGs, built from both an LLM's innate knowledge and a biomedical KG, for improved patient predictions.