Computing Reviews
Cause Effect Pairs in Machine Learning
Guyon I., Statnikov A., Bakir Batu B., Springer, New York, NY, 2020. 372 pp. Type: Book (978-3-030-21809-6)
Date Reviewed: May 17 2022

From the earliest courses in statistics we are taught that correlation does not imply causation. However, apart from some special cases, it is also generally true that causation implies correlation (or dependency). While the first statement is a severe warning against erroneous or even dangerous conclusions, the second sheds light on the possibility of learning causal relations from observational data. The book explores some intriguing paths toward inducing causal relations from data--with special emphasis on bivariate data--on the basis of the experience collected after the 2013 ChaLearn cause-effect pairs challenge.
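The warning in the first statement can be made concrete with a minimal simulation (the scenario and variable names are hypothetical, not taken from the book): a hidden common cause makes two variables strongly correlated even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# hypothetical scenario: an unobserved confounder z drives both x and y
z = rng.normal(size=n)            # hidden common cause
x = z + 0.5 * rng.normal(size=n)  # x does not cause y
y = z + 0.5 * rng.normal(size=n)  # y does not cause x

r = np.corrcoef(x, y)[0, 1]
print(f"corr(x, y) = {r:.2f}")  # strongly correlated, yet causally unrelated
```

Here the population correlation is 0.8, so any naive reading of "x and y move together" as causation would be wrong in both directions.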

The discovery of cause-effect relations is at the core of many scientific tasks because such relations are needed to explain observed phenomena. Discovering them typically requires carefully designed experiments, such as the randomized controlled trials that are commonplace in clinical research. Moreover, the recent advent of explainable artificial intelligence (XAI) has brought fresh ideas to the development of intelligent systems that are not only capable of making decisions on the basis of observational data, but also focus on providing users with explanations of those decisions, both to detect possible erroneous or unethical use of data and to discover new insights into real-world phenomena. Explanations are user-centered representations of causal chains; therefore, the problem of discovering causal relations emerges in XAI and widens the scope of causal modeling beyond statistics.

While controlled experiments are the best way to discover cause-effect relations, they are not always feasible, for reasons including cost, physical impediments, and ethics. As a result, often only observational data is available, from which dependency relations can be estimated. Yet not everything is lost: careful data analysis, together with some reasonable assumptions, may help to identify possible causal relations, which can then guide large-scale experiments (when feasible) that confirm or refute them. Needless to say, the problem of discovering causal relations from observational data is overwhelmingly complex; in fact, the book focuses only on pairs of variables, and even in this simplified setting several problems (including identifiability, confounding, and non-causal dependency) can arise.

The first short chapter of the book introduces the basic ideas of the so-called “cause-effect problem,” that is, the inference of the causal relation between two variables X and Y for which observational data is available. Given a pair of variables, various types of causal explanations may exist, including cases where a hidden confounding variable causing both X and Y interferes with the hypothetical causal relation between the two variables. Throughout the book (with some notable exceptions) only three cases are considered: X causes Y; Y causes X; or a special case in which X and Y are either unrelated or related only through an unobserved confounder. (Additional cases where X and Y are both causally related and influenced by a confounder are ignored.) Based on this setting, the first chapter presents some general ideas and gives some hints on possible methods for causal discovery that are detailed in the subsequent chapters. Interestingly, some intuitive notions are explored and cautionary counterexamples are provided to warn the reader against relying too heavily on possibly misleading intuitions.

Chapter 2 offers an in-depth illustration of evaluation methods for cause-effect pairs by casting causal relation discovery as a pattern recognition problem. This approach is, in fact, the core of the book: the problem of finding the causal relation between two variables is reinterpreted as a supervised classification problem in which the dataset consists of many samples, each representing a joint probability distribution of a variable pair and labeled with its causal direction. Supposing that a classification method is available, the problem of evaluating cause-effect pairs arises from three points of view: that of the algorithm developer, who must face problems of identifiability; that of the practitioner, who wants to use the discovered causal relations for real-world problems; and, finally, that of the benchmark organizers, who need to assemble a good collection of datasets (both synthetic and real world).

Chapter 3 describes several methods for learning bivariate functional causal models, grouped into several categories. Some methods assume that the observed variables are causally related through a structural (functional) model belonging to a restricted class (for example, linear models with non-Gaussian noise, or additive noise models). These methods are effective if the assumed class is correct: a strong assumption that may pose serious practical limitations. To overcome such limitations, other categories of methods can be adopted, such as non-parametric methods or methods that exploit the independence between the cause and the mechanism that produces the effect. However, the more flexible the method, the harder it is to correctly identify the causal direction.
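The additive-noise asymmetry can be sketched in a few lines (a toy illustration, not the book's algorithms): when y = f(x) + noise, regressing in the causal direction leaves residuals roughly independent of the input, while no such additive model exists in the reverse direction. The binned-means regression and the squared-residual correlation below are crude stand-ins for proper nonparametric regression and independence tests (such as HSIC).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1.5, 1.5, n)          # hypothetical cause
y = x ** 3 + 0.5 * rng.normal(size=n)  # effect = nonlinear mechanism + additive noise

def residual_dependence(inp, out, bins=60):
    """Fit out ~ inp by binned means (a crude nonparametric regression),
    then score how strongly the residuals still depend on the input.
    |corr(resid^2, |inp|)| is a cheap stand-in for an independence test."""
    edges = np.quantile(inp, np.linspace(0, 1, bins + 1)[1:-1])
    idx = np.digitize(inp, edges)
    means = np.array([out[idx == k].mean() for k in range(bins)])
    resid = out - means[idx]
    return abs(np.corrcoef(resid ** 2, np.abs(inp))[0, 1])

forward = residual_dependence(x, y)   # y = f(x) + noise: residuals ~ independent
backward = residual_dependence(y, x)  # x = g(y) + noise: no such model exists
print(forward, backward)              # forward score should be clearly smaller
```

The direction with the smaller score is declared causal; the asymmetry vanishes in the linear-Gaussian case, which is exactly the identifiability caveat the chapter discusses.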

The fourth chapter is the core of the book, as it gives an in-depth discussion of tackling the induction of a causal model as a supervised classification task. The general procedure is to first find a proper representation of the joint distributions of the variable pairs--recall that a joint distribution corresponds to a data sample, hence a dataset here is a collection of labeled joint distributions--after which a classification algorithm can be run on this representation to induce a model that labels a new data distribution, that is, finds the causal direction of a newly observed pair of variables. While the choice of classification algorithm poses no particular problem, since standard machine learning algorithms can be used, the main issue is to find an effective representation of the data samples, which can be based on special-purpose causal features or learned automatically through embeddings.

The second half of the book is devoted to more specific topics. Chapter 5 analyzes causal modeling in time series. Time series greatly benefit causal modeling because the temporal dimension constrains the possible causal relations (the future cannot cause the past). Chapter 6 gives an interesting account of extending the existing approaches to discover more complex causal relations, possibly including confounding variables that determine or interfere with the dependency of the observed variable pairs. Chapter 7 gives an account of the cause-effect pairs challenge. The remaining short chapters describe specific methods: chapter 8 shows that, in the case of cause-effect pairs with additive noise, an asymmetry in models can be exploited to infer the causal direction; chapter 9 presents a machine learning algorithm for predicting the existence of causal links between pairs of variables in a multivariate dataset; chapter 10 focuses on feature extraction for an effective representation of joint distributions to be fed to classification algorithms; chapter 11 presents a method for detecting causality based on information-theoretic features and gradient boosting machines; chapter 12 presents some measures of conditional distributions that can be effective for detecting causal directions; chapter 13 gives an account of the types of features that matter most when detecting causality in the presence of categorical and/or numerical variables; and chapter 14 introduces a feature ranking method that can be used for causal discovery in a multivariate setting.
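The temporal constraint can be exploited with a Granger-style test (a generic sketch, not necessarily the specific methods of chapter 5): the past of a cause improves prediction of its effect, but not the other way around. The simulated system and coefficients below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 5000
x = rng.normal(size=T)            # driving series (i.i.d. for simplicity)
y = np.zeros(T)
for t in range(1, T):             # y is driven by the past of x
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

def granger_gain(a, b, lag=1):
    """Fraction of b's one-step prediction error removed by adding
    the lagged values of a to an autoregression of b on its own past."""
    m = len(b) - lag
    target = b[lag:]
    own = np.column_stack([b[:-lag], np.ones(m)])
    full = np.column_stack([b[:-lag], a[:-lag], np.ones(m)])
    r_own = target - own @ np.linalg.lstsq(own, target, rcond=None)[0]
    r_full = target - full @ np.linalg.lstsq(full, target, rcond=None)[0]
    return 1 - np.sum(r_full ** 2) / np.sum(r_own ** 2)

fwd = granger_gain(x, y)  # past of x helps predict y: large gain
rev = granger_gain(y, x)  # past of y does not help predict x: ~0 gain
print(fwd, rev)
```

This asymmetry is purely predictive, so the usual caveats about hidden confounders (here, none by construction) still apply.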

Overall, the book can be recommended for researchers in causal discovery with expertise in either statistics or machine learning. Although the chapters are written by different authors, readers will appreciate the book’s coherent organization, especially in the first part (notwithstanding some overlap among chapters). The book is not a primer in causal discovery and causal modeling; for introductory material, Pearl’s Causality [1] is recommended instead.

Reviewer: Corrado Mencar. Review #: CR147440 (2207-0094)
1) Pearl, J. Causality: models, reasoning and inference (2nd ed.). Cambridge University Press, Cambridge, UK, 2009.
Categories: Learning (I.2.6); General (I.0)