Artificial Intelligence and Information Mining

Resources

Codes

GRETEL(steele): Graph Counterfactual Explanation Evaluation Framework

Machine Learning (ML) systems are a building block of the modern tools that affect our daily lives across many application domains. Because of their black-box nature, these systems are rarely adopted in domains (e.g. health, finance) where understanding the decision process is of paramount importance. Explanation methods were developed to explain how an ML model reached a specific decision for a given case/instance. Graph Counterfactual Explanation (GCE) is one of the explanation techniques adopted in the Graph Learning domain. Existing works on GCE differ mostly in problem definition, application domain, test data, and evaluation metrics, and most do not compare exhaustively against the other counterfactual explanation techniques in the literature. Here we release GRETEL [1,2], an open-source, unified framework for developing and evaluating GCE methods in several settings. It is implemented using the Object-Oriented paradigm and the Factory Method design pattern. Our main goal is a generic platform that lets researchers speed up the development and testing of new Graph Counterfactual Explanation methods. GRETEL is a highly extensible evaluation framework that promotes Open Science and reproducible evaluation by providing well-defined mechanisms to easily integrate and manage real and synthetic datasets, ML models, state-of-the-art explanation techniques, and evaluation measures. Code on GitHub
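To illustrate how the Factory Method pattern can decouple an evaluation loop from the explanation method being tested, here is a minimal sketch. It is not GRETEL's actual API: every class, function, and metric name below (Explainer, EdgeRemovalExplainer, EvaluationPipeline, dense_oracle, "correct", "edit_distance") is hypothetical, and the graph is reduced to a set of edges with a toy oracle.

```python
from abc import ABC, abstractmethod


class Explainer(ABC):
    """Hypothetical base class for counterfactual explanation methods."""

    @abstractmethod
    def explain(self, edges, oracle):
        """Return a counterfactual edge set, or the input edges if none is found."""


class EdgeRemovalExplainer(Explainer):
    """Toy method: remove one edge at a time until the oracle's prediction flips."""

    def explain(self, edges, oracle):
        original = oracle(edges)
        for edge in sorted(edges):
            candidate = edges - {edge}
            if oracle(candidate) != original:
                return candidate
        return edges


class EvaluationPipeline(ABC):
    """Creator: the evaluation logic is fixed, the explainer is supplied by subclasses."""

    @abstractmethod
    def create_explainer(self):
        """Factory method: each subclass decides which explainer to build."""

    def run(self, edges, oracle):
        explainer = self.create_explainer()
        counterfactual = explainer.explain(edges, oracle)
        return {
            # Did the prediction actually change?
            "correct": oracle(counterfactual) != oracle(edges),
            # Number of edges added or removed.
            "edit_distance": len(edges ^ counterfactual),
        }


class EdgeRemovalPipeline(EvaluationPipeline):
    def create_explainer(self):
        return EdgeRemovalExplainer()


def dense_oracle(edges):
    """Toy classifier (an assumption): class 1 if the graph has at least 3 edges."""
    return int(len(edges) >= 3)


graph = frozenset({(0, 1), (1, 2), (0, 2)})
result = EdgeRemovalPipeline().run(graph, dense_oracle)
print(result)
```

Under this design, adding a new explanation method means adding one Explainer subclass and one pipeline subclass, while the evaluation code in run stays unchanged.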

Datasets

CLAIRE - COVID19 Task Force Dataset

The initiative brings together AI, clinical, and life-sciences experts working on the analysis of complex, multi-sourced biomedical data that integrates clinical evidence on COVID-19 with genomic, proteomic, and molecular information. We are exploring data-driven AI methodologies and bioinformatics approaches covering network data analysis, machine learning and deep learning for graphs, predictive modelling, and feature selection of Omics data. Prof. G. Stilo, Dr. L. Madeddu, et al. assembled a resource that fuses information from heterogeneous sources and different studies in the literature into a single network-based representation, facilitating the use of relational and graph-based learning methods. Dataset on GitHub

MNIST dataset for Outlier Detection [MNIST4OD]

Prof. G. Stilo and Dr. B. Prekaj present MNIST4OD, a large dataset (in both number of dimensions and number of instances) suitable for the Outlier Detection task. The dataset is based on the well-known MNIST dataset. We build MNIST4OD as follows: to distinguish between outliers and inliers, we choose the images belonging to one digit as inliers (e.g. digit 1) and sample outliers with uniform probability from the remaining images, such that the number of outliers equals 10% of the number of inliers. We repeat this generation process for every digit. For implementation simplicity, we then flatten the images (28 × 28) into vectors. Each file MNIST_x.csv.gz contains the dataset whose inlier class is x. Each line of the data holds one instance (vector), and the last column is the outlier label (yes/no) of the data point. The data also contains a column indicating the original image class (0-9). Dataset on Figshare
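The generation process described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name build_outlier_dataset, the seed, the 0/1 encoding of the outlier label, and the position of the original-class column (second to last) are all assumptions.

```python
import random


def build_outlier_dataset(images, labels, inlier_digit, outlier_ratio=0.10, seed=0):
    """Build one outlier-detection dataset for a chosen inlier digit.

    images: list of 28 x 28 nested lists of pixel values.
    labels: list of digit labels (0-9), one per image.
    Returns rows of the form: 784 pixels + [original_class, is_outlier].
    """
    rng = random.Random(seed)

    # All images of the chosen digit are inliers.
    inlier_idx = [i for i, y in enumerate(labels) if y == inlier_digit]
    # Outliers are drawn uniformly, without replacement, from every other digit.
    other_idx = [i for i, y in enumerate(labels) if y != inlier_digit]
    n_outliers = round(outlier_ratio * len(inlier_idx))
    outlier_idx = rng.sample(other_idx, n_outliers)

    rows = []
    for i in inlier_idx + outlier_idx:
        # Flatten the 28 x 28 image into a 784-element vector.
        flat = [pixel for row in images[i] for pixel in row]
        # Assumed layout: original class, then outlier label (1 = yes, 0 = no) last.
        rows.append(flat + [labels[i], int(labels[i] != inlier_digit)])
    return rows


# Tiny synthetic stand-in for MNIST: 20 constant-valued images per digit.
labels = [d for d in range(10) for _ in range(20)]
images = [[[d] * 28 for _ in range(28)] for d in labels]
rows = build_outlier_dataset(images, labels, inlier_digit=1)
print(len(rows), len(rows[0]))
```

Calling the function once per digit from 0 to 9 would yield the ten datasets, one per inlier class, that the text describes.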

LOD compliant multi-domain interests dataset [Wiki-MID]

Wiki-MID is a LOD-compliant multi-domain interests dataset for training and testing Recommender Systems. Our English dataset includes an average of 90 multi-domain preferences per user on music, books, movies, celebrities, sport, politics and much more, for about half a million Twitter users traced over six months in 2017. Preferences are either extracted from the messages of users who use Spotify, Goodreads and other similar content-sharing platforms, or induced from their "topical" friends, i.e., followees representing an interest rather than a social relation between peers. In addition, preferred items are matched with the Wikipedia articles describing them. This unique feature of our dataset provides a means to categorize preferred items by exploiting semantic resources linked to Wikipedia, such as the Wikipedia Category Graph, DBpedia, BabelNet and others. Dataset release