Projects

FaNDaE (Fake News Detection and Explanation using Knowledge Graphs)

Detecting fake news is of utmost importance in today’s social-media-centric society, where people often quickly form shallow or wrong opinions about complex facts. For example, the World Health Organization (WHO) observed that, besides the coronavirus pandemic, there is also an infodemic fueled by fake coronavirus claims. A plethora of domain-agnostic algorithms has been developed to detect fake news. However, applying these techniques to specific domains like health, where background knowledge plays a crucial role, can lead to unsatisfactory results. Moreover, in today’s era dominated by black-box data-driven AI algorithms, corroborating predictions (i.e., algorithmic verdicts about news) with some form of (visual) explanation becomes crucial.

This project aims at developing a fake news detection framework called FaNDaE (Fake News Detection and Explanation using Knowledge Graphs) to tackle these challenges. FaNDaE will be designed as a modular architecture composed of three layers. First, a customizable knowledge substrate combines textual information from news in a target domain with external domain-specific knowledge available from knowledge graphs (KGs). Second, a learning layer builds upon the knowledge substrate and intertwines data-driven AI learning models with knowledge-driven inference rules to make predictions. Third, an explanation layer leverages the previous two layers to generate both plausibility scores for news articles and explanations for those scores. FaNDaE’s data-level explanations can be abstracted into complex patterns that enrich KGs with new domain knowledge. We plan to test FaNDaE in both general and specific domains.
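The three-layer design can be pictured, in a minimal and purely illustrative form, as a pipeline of three components. The sketch below is a toy stand-in, not the actual system: all class names, the `enrich`/`predict`/`explain` methods, and the hash-map "KG" are hypothetical, and the plausibility score is a trivial placeholder for a real learned model.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of FaNDaE's three-layer architecture; every name
# here is illustrative, not part of the actual framework.

@dataclass
class KnowledgeSubstrate:
    """Layer 1: combine news text with domain knowledge from a KG."""
    kg_facts: dict = field(default_factory=dict)

    def enrich(self, article: str) -> dict:
        # Naive entity linking: keep tokens that match a KG entity.
        entities = [tok for tok in article.split() if tok in self.kg_facts]
        return {"text": article,
                "facts": {e: self.kg_facts[e] for e in entities}}

class LearningLayer:
    """Layer 2: intertwine a data-driven score with rule-based inference."""
    def predict(self, enriched: dict) -> float:
        # Toy plausibility score: fraction of linked entities whose KG
        # facts support the claim (stand-in for a trained model).
        facts = enriched["facts"]
        if not facts:
            return 0.5  # no background knowledge -> uncertain
        supported = sum(1 for v in facts.values() if v.get("supported", False))
        return supported / len(facts)

class ExplanationLayer:
    """Layer 3: attach the evidence behind the score."""
    def explain(self, enriched: dict, score: float) -> dict:
        return {"score": score, "evidence": list(enriched["facts"])}

kg = KnowledgeSubstrate(kg_facts={"hydroxychloroquine": {"supported": False}})
enriched = kg.enrich("hydroxychloroquine cures COVID")
score = LearningLayer().predict(enriched)
report = ExplanationLayer().explain(enriched, score)
```

The point of the sketch is the layering: each layer consumes only the output of the one below it, so any layer (e.g., the toy scorer) can be swapped for a real model without touching the others.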

[ Funded by Sapienza University of Rome, 10K euros ]

SaGRL (Semantic-Aware Graph Representation Learning)

Knowledge Graphs (KGs) are becoming an essential bridge between logic/symbolic knowledge and machine learning. Graph Neural Networks (GNNs) are the state of the art for KG representation learning. However, existing GNN methods primarily leverage the graph structure, while the semantics in the KG is almost entirely discarded; indeed, neighbors are determined solely by topological information, thus overlooking pre-existing feature-based or schema-based node similarity. Moreover, these strategies mainly adopt a black-box approach and lack interpretability of model predictions.

The goal of this project is to design, implement, and experimentally evaluate novel GNN learning models that incorporate KG semantics to improve model accuracy, scalability, and interpretability.

We plan research work along the following lines:

- Semantic-based message passing, focused on designing novel notions of node neighborhood and novel message-passing strategies used by GNNs to aggregate/refine node representations. Instead of recursively aggregating node neighbors, the idea is to leverage node/edge types and the KG schema to (pre)build contextualized subgraphs and perform the aggregation over them, thus avoiding expensive recursive procedures. This will also pave the way toward semantic-driven node sampling strategies that improve the scalability of GNNs.
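To make the contrast concrete, the toy sketch below compares plain topology-only neighbor aggregation with a semantics-aware variant that pre-filters neighbors by schema-compatible node types before averaging features. Both functions and the tiny example graph are hypothetical illustrations under these assumptions, not the project's actual models.

```python
import numpy as np

# Toy contrast: topology-only aggregation vs. a semantics-aware variant
# that restricts neighbors via node types and a schema compatibility map.

def aggregate_plain(node, adj, feats):
    """Average features over all topological neighbors."""
    return np.mean([feats[n] for n in adj[node]], axis=0)

def aggregate_semantic(node, adj, feats, node_type, compatible):
    """Average features only over neighbors whose type the schema
    declares compatible with the target node's type."""
    nbrs = [n for n in adj[node]
            if node_type[n] in compatible[node_type[node]]]
    if not nbrs:
        return feats[node]  # fall back to the node's own features
    return np.mean([feats[n] for n in nbrs], axis=0)

# Tiny KG: node 0 is a Disease linked to two Drug nodes and one Author node.
adj = {0: [1, 2, 3]}
feats = {0: np.array([0.0, 0.0]),
         1: np.array([1.0, 0.0]),
         2: np.array([0.0, 1.0]),
         3: np.array([9.0, 9.0])}
node_type = {0: "Disease", 1: "Drug", 2: "Drug", 3: "Author"}
compatible = {"Disease": {"Drug"}}  # schema: only Drug neighbors matter

plain = aggregate_plain(0, adj, feats)       # skewed by the Author node
semantic = aggregate_semantic(0, adj, feats, node_type, compatible)
```

Here the Author node drags the plain aggregate far from the Drug neighborhood, while the type filter yields a representation built only from semantically relevant neighbors; precomputing such filtered neighbor sets is what makes a single-pass, non-recursive aggregation possible.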

- Model prediction explanations, focused on generating explanations by intertwining schema information with the learning model to identify the data in the KG that were activated during the generation of a prediction. Explanations can have different levels of granularity, spanning from simple paths or fixed substructures (e.g., triangles) to complex subgraphs, thus providing different levels of explanatory detail and, to some extent, witnessing the formal reasoning process.
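A minimal sketch of what a subgraph-level explanation could look like, assuming the learning model exposes per-edge importance scores for a prediction (a common post-hoc setup; the function name, threshold, and example triples are all hypothetical):

```python
# Hypothetical post-hoc explanation step: keep only the KG edges that
# contributed most to a prediction and return them as a subgraph.

def explain_prediction(edges, importance, threshold=0.5):
    """Select edges whose importance for a prediction exceeds the
    threshold and return the induced explanatory subgraph."""
    kept = [e for e in edges if importance[e] >= threshold]
    nodes = sorted({n for (head, _, tail) in kept for n in (head, tail)})
    return {"nodes": nodes, "edges": kept}

# Toy KG triples with made-up importance scores for one prediction.
edges = [("aspirin", "treats", "headache"),
         ("aspirin", "citedBy", "blog42"),
         ("headache", "symptomOf", "flu")]
importance = {edges[0]: 0.9, edges[1]: 0.1, edges[2]: 0.6}

expl = explain_prediction(edges, importance)
```

Varying the threshold (or constraining `kept` to paths or fixed motifs such as triangles) is one simple way to realize the different explanation granularities mentioned above.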

[ Funded by Sapienza University of Rome, ~50K euros ]

Predicting and Explaining Clinical Trial Outcomes

In today's era, where fake medical news (e.g., that COVID can be treated with hydroxychloroquine) spreads at an unprecedented pace, it is crucial to stress the importance of clinical research based on well-designed trials. However, conducting such trials is expensive, and the effects of poorly designed trials can be devastating. The main question we tackle in this project is whether it is possible to make an a priori "in silico" assessment of the feasibility of a clinical trial. The availability of open medical data and the technological advances in AI offer an unprecedented opportunity to tackle this challenge. This project aims at designing an approach to predicting clinical trial outcomes by enriching clinical trial information with background knowledge and applying AI techniques supported by human medical expertise.

We foresee three main challenges. The first is filtering, integrating, and representing both structured and unstructured data about clinical trials coherently and comprehensively. We will tackle this challenge by complementing clinical trial information with background knowledge represented via knowledge graphs. The second challenge is designing a learning solution that fits the richness of the data at hand and can scale to large amounts of data. We will tackle this challenge by developing novel graph neural network learning models that combine textual and structured information, together with novel semantic-driven subgraph sampling techniques. Finally, in a medical context, blindly relying on AI techniques can be problematic, mainly because they lack explanation capabilities indicating, for instance, why a particular prediction was correct or wrong. The third challenge is thus interpreting model predictions. We will tackle it by developing an explanation component able to identify the portion of the data and the model involved in a prediction, thus facilitating the collection of medical feedback on that prediction.
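One hedged way to picture the second challenge, fusing textual and structured trial information into a single representation, is a simple late-fusion step. Both encoders below are stubs under stated assumptions (a bag-of-words hash for text, a fixed feature order for the structured side); the function names and feature keys are hypothetical, not the project's actual models or schema.

```python
import numpy as np

# Illustrative late fusion of a trial's textual description with
# structured, KG-derived features. The encoders are toy stubs.

def embed_text(description: str, dim: int = 4) -> np.ndarray:
    """Stub text encoder: hashed bag-of-words, length-normalized."""
    vec = np.zeros(dim)
    tokens = description.lower().split()
    for tok in tokens:
        vec[hash(tok) % dim] += 1.0
    return vec / max(1.0, len(tokens))

def embed_structured(kg_features: dict) -> np.ndarray:
    """Stub structured encoder: numeric features in a fixed order
    (feature keys are hypothetical examples)."""
    keys = ["phase", "enrollment", "num_kg_links"]
    return np.array([float(kg_features.get(k, 0)) for k in keys])

def fuse(description: str, kg_features: dict) -> np.ndarray:
    """Concatenate both views into one trial representation that a
    downstream classifier could consume."""
    return np.concatenate([embed_text(description),
                           embed_structured(kg_features)])

trial = fuse("randomized trial of drug X for condition Y",
             {"phase": 3, "enrollment": 120, "num_kg_links": 7})
```

In the actual project this late concatenation would be replaced by a learned joint model (e.g., a GNN whose node features include text embeddings), but the shape of the problem, two heterogeneous views merged into one vector per trial, is the same.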

[ Funded by Sapienza University of Rome, 75K euros ]