The Challenges and Opportunities of Explainable AI

One of the most notable themes of NIPS 2017, against a backdrop of spectacular progress in AI on many fronts, was the fear of machine learning systems as black boxes: closed systems that receive an input, produce an output, and offer no clue why. The topic received its own symposium, capped by a fascinating debate — highly recommended viewing — of the proposition “Interpretability is necessary in machine learning.” As a sign of the Zeitgeist, explainability in AI even reached the magazine section of the New York Times just weeks before NIPS.

Explainability is a scientifically fascinating and societally important topic that sits at the intersection of several areas of active research in machine learning and AI:

  • Bias: How can I ensure that my AI system hasn’t learned a biased view of the world (or perhaps an unbiased view of a biased world) based on shortcomings of the training data, model, or objective function? What if its human creators harbor a conscious or unconscious bias? (Kate Crawford’s NIPS Keynote did a great job of presenting the hazards here.)
  • Fairness: If decisions are made based on an AI system, can I verify that they were made fairly? And what does fair mean in this context (e.g., fair for whom)?
  • Transparency: Do I have a right to have decisions affecting me explained to me in terms I can understand? On what basis can I appeal a decision? (For a great discussion of transparency in intelligent systems, see [Weller 2017].)
  • Safety: Can I gain confidence in the reliability of my AI system without an explanation of how it reaches conclusions? This is closely related to the fundamental problem of generalization in statistical learning theory: how tightly can I bound errors on unseen data?
  • Causality: If I can learn a model from data, can this model provide me not only correct inferences but also some explanation for the underlying phenomena? Can I gain a mechanistic understanding from a learned model?
  • Engineering: How do I debug incorrect outputs from a trained model?
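
The "Safety" point above has a precise classical form. As one illustrative (and deliberately simple) example from statistical learning theory: for a finite hypothesis class H, a Hoeffding-plus-union-bound argument gives, with probability at least 1 - delta over a training sample of n examples,

    R(h) <= R_hat(h) + sqrt( (ln|H| + ln(1/delta)) / (2n) )   for all h in H,

where R(h) is the true error and R_hat(h) the training error. The bound says nothing about why a model decides as it does; it only limits how badly training performance can mislead us about unseen data, and it loosens as the hypothesis class grows. Bounds of this flavor are one route to confidence without explanation.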

Reflecting on the above, here are four additional thoughts:

1. All of the problems above are hard, but we and the community are hard at work. Intel AI is a member of the recently founded Partnership on AI, which was formed to bring together researchers, developers, and users to ensure that AI technologies work to serve people and society. The partnership’s mission includes addressing challenges and concerns around “the safety and trustworthiness of AI technologies, [and] the fairness and transparency of systems.” On the technical side, we recently presented a paper at NIPS on a novel technique for learning the structure of a deep network from unlabeled data. Techniques such as these can be useful for automatically matching the architecture of a model to the structure of the data, hopefully making it easier to train, interpret, and debug.

2. These issues are not specific to deep learning, or even machines. Deep neural networks have achieved dramatic improvements on a number of challenging tasks, in part due to their enormous expressive power. This power, enabled by large numbers of free parameters and non-linearities, can make it difficult to interpret the learned values of any given parameter, especially in deeper layers. But other classifiers, such as kernel machines, linear or logistic regressions, or decision trees, can also become very difficult to interpret for high-dimensional inputs [Lipton 2016]. Explainability can also be a challenge for decisions made by human experts. For example, college admissions officers face the enormous challenge of combing through a pool of high-school superstars and selecting the most promising few. Some of these officers may offer general guidelines and explanations for how they decide — but are these how they actually decide? Daniel Kahneman’s Thinking, Fast and Slow is a great introduction to the science of human decision making (which he pioneered, with Amos Tversky), including systematic cognitive biases, subconscious forces, and blind spots. In short, we humans also lack explainability!
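To make the point about simple models concrete, here is a minimal sketch (hypothetical data and feature names, plain NumPy) of the usual sense in which a logistic regression is "interpretable": each learned weight can be read as a feature's contribution to the log-odds. With three features this reading is easy; with thousands of correlated features, the same inspection tells you very little.

```python
import numpy as np

# Hypothetical illustration: interpret a logistic regression by reading its
# weights. Feature 0 truly pushes toward class 1, feature 1 pushes away,
# feature 2 is pure noise.
rng = np.random.default_rng(0)

n, d = 500, 3  # 3 features stays readable; thousands would not
X = rng.normal(size=(n, d))
logits = 2.0 * X[:, 0] - 1.5 * X[:, 1]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits))).astype(float)

# Plain gradient descent on the average logistic loss.
w = np.zeros(d)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))   # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / n   # gradient step

# The "explanation": sign and magnitude of each weight.
for name, weight in zip(["feature_0", "feature_1", "feature_2"], w):
    print(f"{name}: {weight:+.2f}")
```

On this toy data the recovered weights have the expected signs, so the model's behavior can be summarized in a sentence. That summary is exactly what stops being available once the input dimension, feature correlations, or model non-linearity grow.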

3. We will need to learn how to deal with AI systems that outperform humans on specific tasks. One key reason for building AI systems is not just to match human performance but to exceed it, especially on tasks, such as predicting disease outbreaks or controlling a data-center cooling system, that may not play to a human’s evolutionary strengths. Indeed, AI systems that outperform humans in specific domains already exist and will become more common. One consequence of an AI system’s superhuman performance may be that there is no explanation for how it works that is easily digestible by a human. Yet there could be compelling social benefits from deploying some tools even before they are completely understood. As an analogy, modern medicine has developed a framework for testing the safety and efficacy of novel treatments, even when the underlying mechanism of action is unknown. We may need similar frameworks for AI systems in critical deployments.

4. There is an opportunity to make our decision-making more systematic and accountable. Many issues around explainability are social policy questions: What are the qualities of decision making that we want? To engineers, these can be translated into system requirements that can be designed, measured, and continuously tested. They will depend on the domain where they are applied — tagging vacation photos has different requirements compared to analyzing medical images. But as we rely more on automated systems for making decisions, we have an unprecedented opportunity to be more explicit and systematic about the values that guide how we decide.