Search results

  • Conference paper
    Kotonya N, Spooner T, Magazzeni D, Toni F et al., 2021,

    Graph reasoning with context-aware linearization for interpretable fact extraction and verification

    , FEVER 2021, Publisher: Association for Computational Linguistics, Pages: 21-30

    This paper presents an end-to-end system for fact extraction and verification using textual and tabular evidence, the performance of which we demonstrate on the FEVEROUS dataset. We experiment with both a multi-task learning paradigm to jointly train a graph attention network for both the task of evidence extraction and veracity prediction, as well as a single objective graph model for solely learning veracity prediction and separate evidence extraction. In both instances, we employ a framework for per-cell linearization of tabular evidence, thus allowing us to treat evidence from tables as sequences. The templates we employ for linearizing tables capture the context as well as the content of table data. We furthermore provide a case study to show the interpretability of our approach. Our best performing system achieves a FEVEROUS score of 0.23 and 53% label accuracy on the blind test data.
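
    As a concrete illustration of per-cell linearization, the minimal sketch below (not the authors' code) renders each cell as a sentence from a template mentioning the table's topic, the row header and the column header, so the resulting sequence preserves both the content and the context of the cell. The template wording and the example table are assumptions made purely for illustration.

```python
# Minimal sketch of per-cell table linearization with a context-aware template.
# The template wording and the example table are illustrative assumptions,
# not the templates used in the paper.

def linearize_cell(page_title, row_header, col_header, value):
    """Turn one table cell into a sentence that keeps its context."""
    return f"In the table about {page_title}, the {col_header} for {row_header} is {value}."

def linearize_table(page_title, headers, rows):
    """Linearize every cell so tabular evidence can be fed to a sequence model."""
    sentences = []
    for row in rows:
        row_header, *values = row
        for col_header, value in zip(headers[1:], values):
            sentences.append(linearize_cell(page_title, row_header, col_header, value))
    return sentences

if __name__ == "__main__":
    headers = ["Country", "Capital", "Population (millions)"]
    rows = [["France", "Paris", "67"], ["Italy", "Rome", "59"]]
    for s in linearize_table("European countries", headers, rows):
        print(s)
```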

  • Conference paper
    Albini E, Rago A, Baroni P, Toni F et al., 2021,

    Influence-driven explanations for Bayesian network classifiers

    , PRICAI 2021, Publisher: Springer Verlag, Pages: 88-100, ISSN: 0302-9743

    We propose a novel approach to building influence-driven explanations (IDXs) for (discrete) Bayesian network classifiers (BCs). IDXs feature two main advantages wrt other commonly adopted explanation methods. First, IDXs may be generated using the (causal) influences between intermediate, in addition to merely input and output, variables within BCs, thus providing a deep, rather than shallow, account of the BCs’ behaviour. Second, IDXs are generated according to a configurable set of properties, specifying which influences between variables count towards explanations. Our approach is thus flexible and can be tailored to the requirements of particular contexts or users. Leveraging on this flexibility, we propose novel IDX instances as well as IDX instances capturing existing approaches. We demonstrate IDXs’ capability to explain various forms of BCs, and assess the advantages of our proposed IDX instances with both theoretical and empirical analyses.

  • Conference paper
    Rago A, Cocarascu O, Bechlivanidis C, Toni F et al., 2020,

    Argumentation as a framework for interactive explanations for recommendations

    , KR 2020, 17th International Conference on Principles of Knowledge Representation and Reasoning, Publisher: IJCAI, Pages: 805-815, ISSN: 2334-1033

    As AI systems become ever more intertwined in our personal lives, the way in which they explain themselves to and interact with humans is an increasingly critical research area. The explanation of recommendations is thus a pivotal functionality in a user’s experience of a recommender system (RS), providing the possibility of enhancing many of its desirable features in addition to its effectiveness (accuracy wrt users’ preferences). For an RS that we prove empirically is effective, we show how argumentative abstractions underpinning recommendations can provide the structural scaffolding for (different types of) interactive explanations (IEs), i.e. explanations empowering interactions with users. We prove formally that these IEs empower feedback mechanisms that guarantee that recommendations will improve with time, hence rendering the RS scrutable. Finally, we prove experimentally that the various forms of IE (tabular, textual and conversational) induce trust in the recommendations and provide a high degree of transparency in the RS’s functionality.

  • Book chapter
    Cocarascu O, Cyras K, Rago A, Toni F et al., 2021,

    Mining property-driven graphical explanations for data-centric AI from argumentation frameworks

    , Human-Like Machine Intelligence, Pages: 93-113
  • Conference paper
    Cyras K, Rago A, Albini E, Baroni P, Toni F et al., 2021,

    Argumentative XAI: a survey

    , The 30th International Joint Conference on Artificial Intelligence (IJCAI-21), Publisher: International Joint Conferences on Artificial Intelligence, Pages: 4392-4399

    Explainable AI (XAI) has been investigated for decades and, together with AI itself, has witnessed unprecedented growth in recent years. Among various approaches to XAI, argumentative models have been advocated in both the AI and social science literature, as their dialectical nature appears to match some basic desirable features of the explanation activity. In this survey we overview XAI approaches built using methods from the field of computational argumentation, leveraging its wide array of reasoning abstractions and explanation delivery methods. We overview the literature focusing on different types of explanation (intrinsic and post-hoc), different models with which argumentation-based explanations are deployed, different forms of delivery, and different argumentation frameworks they use. We also lay out a roadmap for future work.

  • Conference paper
    Zylberajch H, Lertvittayakumjorn P, Toni F, 2021,

    HILDIF: interactive debugging of NLI models using influence functions

    , 1st Workshop on Interactive Learning for Natural Language Processing (InterNLP), Publisher: ASSOC COMPUTATIONAL LINGUISTICS-ACL, Pages: 1-6

    Biases and artifacts in training data can cause unwelcome behavior in text classifiers (such as shallow pattern matching), leading to lack of generalizability. One solution to this problem is to include users in the loop and leverage their feedback to improve models. We propose a novel explanatory debugging pipeline called HILDIF, enabling humans to improve deep text classifiers using influence functions as an explanation method. We experiment on the Natural Language Inference (NLI) task, showing that HILDIF can effectively alleviate artifact problems in fine-tuned BERT models and result in increased model generalizability.

  • Journal article
    Albini E, Baroni P, Rago A, Toni F et al., 2021,

    Interpreting and explaining pagerank through argumentation semantics

    , Intelligenza Artificiale, Vol: 15, Pages: 17-34, ISSN: 1724-8035

    In this paper we show how re-interpreting PageRank as an argumentation semantics for a bipolar argumentation framework empowers its explainability. After showing that PageRank, naively re-interpreted as an argumentation semantics for support frameworks, fails to satisfy some generally desirable properties, we propose a novel approach able to reconstruct PageRank as a gradual semantics of a suitably defined bipolar argumentation framework, while satisfying these properties. We then show how the theoretical advantages afforded by this approach also enjoy an enhanced explanatory power: we propose several types of argument-based explanations for PageRank, each of which focuses on different aspects of the algorithm and uncovers information useful for the comprehension of its results.
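
    For orientation only, the sketch below runs the standard PageRank power iteration over a small directed graph whose edges are read as supports; this corresponds to the naive re-interpretation discussed above, not to the paper's bipolar reconstruction, and the damping factor and example graph are assumptions.

```python
# Illustrative sketch only: standard PageRank power iteration on a small
# directed graph, read here as "a supports b" edges. This is the naive
# reading discussed in the paper, not its bipolar reconstruction.

def pagerank(edges, damping=0.85, iters=100):
    nodes = sorted({n for e in edges for n in e})
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes          # dangling nodes spread evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

if __name__ == "__main__":
    # a and c support b; b supports c
    edges = [("a", "b"), ("c", "b"), ("b", "c")]
    for arg, score in sorted(pagerank(edges).items()):
        print(f"{arg}: {score:.3f}")
```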

  • Report
    Paulino-Passos G, Toni F, 2021,

    Monotonicity and Noise-Tolerance in Case-Based Reasoning with Abstract Argumentation (with Appendix)

    Recently, abstract argumentation-based models of case-based reasoning (AA-CBR in short) have been proposed, originally inspired by the legal domain, but also applicable as classifiers in different scenarios. However, the formal properties of AA-CBR as a reasoning system remain largely unexplored. In this paper, we focus on analysing the non-monotonicity properties of a regular version of AA-CBR (that we call AA-CBR⪰). Specifically, we prove that AA-CBR⪰ is not cautiously monotonic, a property frequently considered desirable in the literature. We then define a variation of AA-CBR⪰ which is cautiously monotonic. Further, we prove that such variation is equivalent to using AA-CBR⪰ with a restricted casebase consisting of all "surprising" and "sufficient" cases in the original casebase. As a by-product, we prove that this variation of AA-CBR⪰ is cumulative, rationally monotonic, and empowers a principled treatment of noise in "incoherent" casebases. Finally, we illustrate AA-CBR and cautious monotonicity questions on a case study on the U.S. Trade Secrets domain, a legal casebase.

  • Journal article
    Rago A, Cocarascu O, Bechlivanidis C, Lagnado D, Toni F et al., 2021,

    Argumentative explanations for interactive recommendations

    , Artificial Intelligence, Vol: 296, Pages: 1-22, ISSN: 0004-3702

    A significant challenge for recommender systems (RSs), and in fact for AI systems in general, is the systematic definition of explanations for outputs in such a way that both the explanations and the systems themselves are able to adapt to their human users' needs. In this paper we propose an RS hosting a vast repertoire of explanations, which are customisable to users in their content and format, and thus able to adapt to users' explanatory requirements, while being reasonably effective (proven empirically). Our RS is built on a graphical chassis, allowing the extraction of argumentation scaffolding, from which diverse and varied argumentative explanations for recommendations can be obtained. These recommendations are interactive because they can be questioned by users and they support adaptive feedback mechanisms designed to allow the RS to self-improve (proven theoretically). Finally, we undertake user studies in which we vary the characteristics of the argumentative explanations, showing users' general preferences for more information, but also that their tastes are diverse, thus highlighting the need for our adaptable RS.

  • Journal article
    Cyras K, Oliveira T, Karamlou M, Toni F et al., 2021,

    Assumption-based argumentation with preferences and goals for patient-centric reasoning with interacting clinical guidelines

    , Argument and Computation, Vol: 12, Pages: 149-189, ISSN: 1946-2166

    A paramount, yet unresolved issue in personalised medicine is that of automated reasoning with clinical guidelines in multimorbidity settings. This entails enabling machines to use computerised generic clinical guideline recommendations and patient-specific information to yield patient-tailored recommendations where interactions arising due to multimorbidities are resolved. This problem is further complicated by patient management desiderata, in particular the need to account for patient-centric goals as well as preferences of various parties involved. We propose to solve this problem of automated reasoning with interacting guideline recommendations in the context of a given patient by means of computational argumentation. In particular, we advance a structured argumentation formalism ABA+G (short for Assumption-Based Argumentation with Preferences (ABA+) and Goals) for integrating and reasoning with information about recommendations, interactions, patient’s state, preferences and prioritised goals. ABA+G combines assumption-based reasoning with preferences and goal-driven selection among reasoning outcomes. Specifically, we assume defeasible applicability of guideline recommendations with the general goal of patient well-being, resolve interactions (conflicts and otherwise undesirable situations) among recommendations based on the state and preferences of the patient, and employ patient-centered goals to suggest interaction-resolving, goal-importance maximising and preference-adhering recommendations. We use a well-established Transition-based Medical Recommendation model for representing guideline recommendations and identifying interactions thereof, and map the components in question, together with the given patient’s state, prioritised goals, and preferences over actions, to ABA+G for automated reasoning. In this, we follow principles of patient management and establish corresponding theoretical properties as well as illustrate our approach in realis

  • Conference paper
    Dejl A, He P, Mangal P, Mohsin H, Surdu B, Voinea E, Albini E, Lertvittayakumjorn P, Rago A, Toni F et al., 2021,

    Argflow: a toolkit for deep argumentative explanations for neural networks

    , Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems, Pages: 1761-1763, ISSN: 1558-2914

    In recent years, machine learning (ML) models have been successfully applied in a variety of real-world applications. However, they are often complex and incomprehensible to human users. This can decrease trust in their outputs and render their usage in critical settings ethically problematic. As a result, several methods for explaining such ML models have been proposed recently, in particular for black-box models such as deep neural networks (NNs). Nevertheless, these methods predominantly explain outputs in terms of inputs, disregarding the inner workings of the ML model computing those outputs. We present Argflow, a toolkit enabling the generation of a variety of ‘deep’ argumentative explanations (DAXs) for outputs of NNs on classification tasks.

  • Journal article
    Cyras K, Heinrich Q, Toni F, 2021,

    Computational complexity of flat and generic assumption-based argumentation, with and without probabilities

    , Artificial Intelligence, Vol: 293, Pages: 1-36, ISSN: 0004-3702

    Reasoning with probabilistic information has recently attracted considerable attention in argumentation, and formalisms of Probabilistic Abstract Argumentation (PAA), Probabilistic Bipolar Argumentation (PBA) and Probabilistic Structured Argumentation (PSA) have been proposed. These foundational advances have been complemented with investigations on the complexity of some approaches to PAA and PBA, but not to PSA. We study the complexity of an existing form of PSA, namely Probabilistic Assumption-Based Argumentation (PABA), a powerful, implemented formalism which subsumes several forms of PAA and other forms of PSA. Specifically, we establish membership (general upper bounds) and completeness (instantiated lower bounds) of reasoning in PABA for the class FP#P (of functions with a #P-oracle for counting the solutions of an NP problem) with respect to newly introduced probabilistic verification, credulous and sceptical acceptance function problems under several ABA semantics. As a by-product necessary to establish PABA complexity results, we provide a comprehensive picture of the ABA complexity landscape (for both flat and generic, possibly non-flat ABA) for the classical decision problems of verification, existence, credulous and sceptical acceptance under those ABA semantics.

  • Journal article
    Lertvittayakumjorn P, Toni F, 2021,

    Explanation-based human debugging of NLP models: a survey

    , Transactions of the Association for Computational Linguistics, Vol: 9, Pages: 1508-1528, ISSN: 2307-387X

    Debugging a machine learning model is hard since the bug usually involves the training data and the learning process. This becomes even harder for an opaque deep learning model if we have no clue about how the model actually works. In this survey, we review papers that exploit explanations to enable humans to give feedback and debug NLP models. We call this problem explanation-based human debugging (EBHD). In particular, we categorize and discuss existing work along three dimensions of EBHD (the bug context, the workflow, and the experimental setting), compile findings on how EBHD components affect the feedback providers, and highlight open problems that could be future research directions.

  • Conference paper
    Paulino-Passos G, Toni F, 2021,

    Monotonicity and Noise-Tolerance in Case-Based Reasoning with Abstract Argumentation

    , Pages: 508-518
  • Conference paper
    Lauren S, Belardinelli F, Toni F, 2021,

    Aggregating Bipolar Opinions

    , 20th International Conference on Autonomous Agents and Multiagent Systems
  • Conference paper
    Kotonya N, Toni F, 2020,

    Explainable Automated Fact-Checking: A Survey

    , Barcelona, Spain, 28th International Conference on Computational Linguistics (COLING 2020), Publisher: International Committee on Computational Linguistics, Pages: 5430-5443

    A number of exciting advances have been made in automated fact-checking thanks to increasingly larger datasets and more powerful systems, leading to improvements in the complexity of claims which can be accurately fact-checked. However, despite these advances, there are still desirable functionalities missing from the fact-checking pipeline. In this survey, we focus on the explanation functionality -- that is, fact-checking systems providing reasons for their predictions. We summarize existing methods for explaining the predictions of fact-checking systems and we explore trends in this topic. Further, we consider what makes for good explanations in this specific domain through a comparative analysis of existing fact-checking explanations against some desirable properties. Finally, we propose further research directions for generating fact-checking explanations, and describe how these may lead to improvements in the research area.

  • Conference paper
    Kotonya N, Toni F, 2020,

    Explainable Automated Fact-Checking for Public Health Claims

    , 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Publisher: ACL, Pages: 7740-7754

    Fact-checking is the task of verifying the veracity of claims by assessing their assertions against credible evidence. The vast majority of fact-checking studies focus exclusively on political claims. Very little research explores fact-checking for other topics, specifically subject matters for which expertise is required. We present the first study of explainable fact-checking for claims which require specific expertise. For our case study we choose the setting of public health. To support this case study we construct a new dataset PUBHEALTH of 11.8K claims accompanied by journalist crafted, gold standard explanations (i.e., judgments) to support the fact-check labels for claims. We explore two tasks: veracity prediction and explanation generation. We also define and evaluate, with humans and computationally, three coherence properties of explanation quality. Our results indicate that, by training on in-domain data, gains can be made in explainable, automated fact-checking for claims which require specific expertise.

  • Conference paper
    Lertvittayakumjorn P, Specia L, Toni F, 2020,

    FIND: Human-in-the-loop debugging deep text classifiers

    , 2020 Conference on Empirical Methods in Natural Language Processing, Publisher: ACL

    Since obtaining a perfect training dataset (i.e., a dataset which is considerably large, unbiased, and well-representative of unseen cases) is hardly possible, many real-world text classifiers are trained on the available, yet imperfect, datasets. These classifiers are thus likely to have undesirable properties. For instance, they may have biases against some sub-populations or may not work effectively in the wild due to overfitting. In this paper, we propose FIND, a framework which enables humans to debug deep learning text classifiers by disabling irrelevant hidden features. Experiments show that by using FIND, humans can improve CNN text classifiers which were trained under different types of imperfect datasets (including datasets with biases and datasets with dissimilar train-test distributions).
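
    The core mechanism described above, disabling hidden features that humans judge irrelevant, can be pictured as applying a binary mask to the feature layer before the final classification layer. The numpy sketch below is a hypothetical illustration of that idea with made-up shapes and weights; it is not the FIND implementation.

```python
import numpy as np

# Hypothetical illustration of human-in-the-loop feature disabling: a binary
# mask zeroes out hidden features a user marks as irrelevant before the final
# classification layer. Shapes and weights are made up; this is not FIND code.

rng = np.random.default_rng(0)
n_hidden, n_classes = 8, 2

hidden = rng.random(n_hidden)               # pooled hidden features for one document
W = rng.normal(size=(n_classes, n_hidden))  # final-layer weights
b = np.zeros(n_classes)

def predict(features, mask):
    logits = W @ (features * mask) + b      # disabled features contribute nothing
    return int(np.argmax(logits)), logits

full_mask = np.ones(n_hidden)
debug_mask = full_mask.copy()
debug_mask[[2, 5]] = 0.0                    # features a human judged irrelevant

print("before debugging:", predict(hidden, full_mask))
print("after debugging: ", predict(hidden, debug_mask))
```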

  • Conference paper
    Albini E, Baroni P, Rago A, Toni F et al., 2020,

    PageRank as an Argumentation Semantics

    , Biennial International Conference on Computational Models of Argument (COMMA), Publisher: IOS PRESS, Pages: 55-66, ISSN: 0922-6389
  • Conference paper
    Cocarascu O, Stylianou A, Cyras K, Toni F et al., 2020,

    Data-empowered argumentation for dialectically explainable predictions

    , 24th European Conference on Artificial Intelligence (ECAI 2020), Publisher: IOS Press, Pages: 2449-2456

    Today’s AI landscape is permeated by plentiful data and dominated by powerful data-centric methods with the potential to impact a wide range of human sectors. Yet, in some settings this potential is hindered by these data-centric AI methods being mostly opaque. Considerable efforts are currently being devoted to defining methods for explaining black-box techniques in some settings, while the use of transparent methods is being advocated in others, especially when high-stake decisions are involved, as in healthcare and the practice of law. In this paper we advocate a novel transparent paradigm of Data-Empowered Argumentation (DEAr in short) for dialectically explainable predictions. DEAr relies upon the extraction of argumentation debates from data, so that the dialectical outcomes of these debates amount to predictions (e.g. classifications) that can be explained dialectically. The argumentation debates consist of (data) arguments which may not be linguistic in general but may nonetheless be deemed to be ‘arguments’ in that they are dialectically related, for instance by disagreeing on data labels. We illustrate and experiment with the DEAr paradigm in three settings, making use, respectively, of categorical data, (annotated) images and text. We show empirically that DEAr is competitive with another transparent model, namely decision trees (DTs), while also providing naturally dialectical explanations.

  • Journal article
    Baroni P, Toni F, Verheij B, 2020,

    On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games: 25 years later (foreword)

    , Argument & Computation, Vol: 11, Pages: 1-14, ISSN: 1946-2166
  • Conference paper
    Čyras K, Karamlou A, Lee M, Letsios D, Misener R, Toni F et al., 2020,

    AI-assisted schedule explainer for nurse rostering

    , AAMAS, Pages: 2101-2103, ISSN: 1548-8403

    We present an argumentation-supported explanation generating system, called Schedule Explainer, that assists with makespan scheduling. Our stand-alone generic tool explains to a lay user why a resource allocation schedule is good or not, and offers actions to improve the schedule given the user's constraints. Schedule Explainer provides actionable textual explanations via an interactive graphical interface. We illustrate our system with a proof-of-concept application tool in a nurse rostering scenario whereby a shift-lead nurse aims to account for unexpected events by rescheduling some patient procedures to nurses and is aided by the system to do so.

  • Conference paper
    Albini E, Rago A, Baroni P, Toni F et al., 2020,

    Relation-Based Counterfactual Explanations for Bayesian Network Classifiers

    , The 29th International Joint Conference on Artificial Intelligence (IJCAI 2020)
  • Book chapter
    Cocarascu O, Toni F, 2020,

    Deploying Machine Learning Classifiers for Argumentative Relations “in the Wild”

    , Argumentation Library, Pages: 269-285

    Argument Mining (AM) aims at automatically identifying arguments and components of arguments in text, as well as at determining the relations between these arguments, on various annotated corpora using machine learning techniques (Lippi & Torroni, 2016).

  • Conference paper
    Jha R, Belardinelli F, Toni F, 2020,

    Formal verification of debates in argumentation theory.

    , Publisher: ACM, Pages: 940-947
  • Conference paper
    Cocarascu O, Cabrio E, Villata S, Toni F et al., 2020,

    Dataset Independent Baselines for Relation Prediction in Argument Mining.

    , Publisher: IOS Press, Pages: 45-52
  • Conference paper
    Lertvittayakumjorn P, Toni F, 2019,

    Human-grounded evaluations of explanation methods for text classification

    , 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Publisher: ACL Anthology, Pages: 5195-5205

    Due to the black-box nature of deep learning models, methods for explaining the models’ results are crucial to gain trust from humans and support collaboration between AIs and humans. In this paper, we consider several model-agnostic and model-specific explanation methods for CNNs for text classification and conduct three human-grounded evaluations, focusing on different purposes of explanations: (1) revealing model behavior, (2) justifying model predictions, and (3) helping humans investigate uncertain predictions. The results highlight dissimilar qualities of the various explanation methods we consider and show the degree to which these methods could serve for each purpose.

  • Conference paper
    Schulz C, Toni F, 2019,

    On the responsibility for undecisiveness in preferred and stable labellings in abstract argumentation (extended abstract)

    , IJCAI International Joint Conference on Artificial Intelligence, Pages: 6382-6386, ISSN: 1045-0823

    Different semantics of abstract Argumentation Frameworks (AFs) provide different levels of decisiveness for reasoning about the acceptability of conflicting arguments. The stable semantics is useful for applications requiring a high level of decisiveness, as it assigns to each argument the label “accepted” or the label “rejected”. Unfortunately, stable labellings are not guaranteed to exist, thus raising the question as to which parts of AFs are responsible for the non-existence. In this paper, we address this question by investigating a more general question concerning preferred labellings (which may be less decisive than stable labellings but are always guaranteed to exist), namely why a given preferred labelling may not be stable and thus undecided on some arguments. In particular, (1) we give various characterisations of parts of an AF, based on the given preferred labelling, and (2) we show that these parts are indeed responsible for the undecisiveness if the preferred labelling is not stable. We then use these characterisations to explain the non-existence of stable labellings.
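
    To make the semantics concrete, the brute-force sketch below enumerates the stable labellings of two tiny AFs (the example frameworks are assumptions for illustration): an even attack cycle, which has stable labellings, and an odd attack cycle, which has none, which is exactly the non-existence phenomenon the paper characterises.

```python
from itertools import product

# Brute-force sketch: enumerate stable labellings of a small AF. A labelling is
# stable here if every argument is labelled in/out, every "in" argument has no
# "in" attacker, and every "out" argument has at least one "in" attacker.
# Example frameworks are illustrative assumptions.

def stable_labellings(args, attacks):
    labellings = []
    for labels in product(["in", "out"], repeat=len(args)):
        lab = dict(zip(args, labels))
        ok = all(
            (lab[a] == "in" and not any(lab[x] == "in" for x, y in attacks if y == a))
            or (lab[a] == "out" and any(lab[x] == "in" for x, y in attacks if y == a))
            for a in args
        )
        if ok:
            labellings.append(lab)
    return labellings

if __name__ == "__main__":
    # Even cycle: two stable labellings exist.
    print(stable_labellings(["a", "b"], [("a", "b"), ("b", "a")]))
    # Odd cycle: no stable labelling exists, illustrating the question above.
    print(stable_labellings(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]))
```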

  • Conference paper
    Karamlou A, Cyras K, Toni F, 2019,

    Deciding the winner of a debate using bipolar argumentation

    , International Conference on Autonomous Agents and MultiAgent Systems, Publisher: IFAAMAS / ACM, Pages: 2366-2368, ISSN: 2523-5699

    Bipolar Argumentation Frameworks (BAFs) are an important class of argumentation frameworks useful for capturing, reasoning with, and deriving conclusions from debates. They have the potential to make solid contributions to real-world multi-agent systems and human-agent interaction in domains such as legal reasoning, healthcare and politics. Despite this fact, practical systems implementing BAFs are largely lacking. In this demonstration, we provide a software system implementing novel algorithms for calculating extensions (winning sets of arguments) of BAFs. Participants in the demonstration will be able to input their own debates into our system, and watch a graphical representation of the algorithms as they process information and decide which sets of arguments are winners of the debate.

  • Conference paper
    Cocarascu O, Rago A, Toni F, 2019,

    Extracting dialogical explanations for review aggregations with argumentative dialogical agents

    , International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), Publisher: International Foundation for Autonomous Agents and Multiagent Systems

    The aggregation of online reviews is fast becoming the chosen method of quality control for users in various domains, from retail to entertainment. Consequently, fair, thorough and explainable aggregation of reviews is increasingly sought-after. We consider the movie review domain, and in particular Rotten Tomatoes' ubiquitous (and arguably over-simplified) aggregation method, the Tomatometer Score (TS). For a movie, this amounts to the percentage of critics giving the movie a positive review. We define a novel form of argumentative dialogical agent (ADA) for explaining the reasoning within the reviews. ADA integrates: 1.) NLP with reviews to extract a Quantitative Bipolar Argumentation Framework (QBAF) for any chosen movie to provide the underlying structure of explanations, and 2.) gradual semantics for QBAFs for deriving a dialectical strength measure for movies, as an alternative to the TS, satisfying desirable properties for obtaining explanations. We evaluate ADA using some prominent NLP methods and gradual semantics for QBAFs. We show that they provide a dialectical strength which is comparable with the TS, while at the same time being able to provide dialogical explanations of why a movie obtained its strength via interactions between the user and ADA.
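
    As a rough illustration of how a gradual semantics can assign a dialectical strength in such a setting, the sketch below implements a DF-QuAD-style computation on a small acyclic QBAF: each argument has a base score, the strengths of its attackers and supporters are aggregated, and the base score is pulled down or up accordingly. The formulas, base scores and example graph are assumptions recalled for illustration; the paper should be consulted for the semantics actually evaluated.

```python
from functools import reduce

# Sketch of a DF-QuAD-style gradual semantics on a small acyclic QBAF.
# Base scores, the example graph and the exact formulas are assumptions for
# illustration; see the paper for the semantics actually evaluated.

def aggregate(values):
    """Probabilistic-sum aggregation of attacker/supporter strengths."""
    return reduce(lambda x, y: x + y - x * y, values, 0.0)

def strength(arg, base, attackers, supporters, cache=None):
    cache = {} if cache is None else cache
    if arg not in cache:
        va = aggregate([strength(a, base, attackers, supporters, cache)
                        for a in attackers.get(arg, [])])
        vs = aggregate([strength(s, base, attackers, supporters, cache)
                        for s in supporters.get(arg, [])])
        v0 = base[arg]
        if va >= vs:
            cache[arg] = v0 - v0 * (va - vs)        # attackers outweigh supporters
        else:
            cache[arg] = v0 + (1 - v0) * (vs - va)  # supporters outweigh attackers
    return cache[arg]

if __name__ == "__main__":
    # "movie" is supported by one positive review argument and attacked by two
    # negative ones; all arguments start from a neutral base score of 0.5.
    base = {"movie": 0.5, "pos": 0.5, "neg1": 0.5, "neg2": 0.5}
    attackers = {"movie": ["neg1", "neg2"]}
    supporters = {"movie": ["pos"]}
    print(round(strength("movie", base, attackers, supporters), 3))
```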

