A fully-fledged approach to the Winograd Schema Challenge: Tackling, utilizing and developing Winograd instances
Isaak, Nicos X.
The Winograd Schema Challenge (WSC), a novel litmus test for machine intelligence, has been proposed to advance the field of Artificial Intelligence (AI). In the last decade, the challenge has received considerable interest as a step towards building machines with commonsense reasoning, humanity's long-sought goal since the late fifties. The WSC requires resolving pronouns in carefully structured sentences, where the information needed to resolve them is not grammatically present. The challenge consists of pairs of halves (schemas), where each half comprises a sentence, a question referring to an unresolved pronoun, and two possible pronoun targets (answers). It is believed that tackling the challenge will advance the field of AI while also helping the research community understand the human mechanisms used when answering such questions. In this regard, each WSC instance should tell us something about human behavior that needs to be explained. Although humans have no difficulty with the challenge, developing systems that can tackle it appears to be hard. This dissertation focuses on methods and tools covering multiple aspects of the WSC. Given AI's tendency to focus on behavior in a purely statistical sense, which can lead to the development of non-transparent systems (sub-symbolic AI), and given that human language is not based on word patterns, we start by presenting how we developed a commonsense reasoning system to tackle the WSC. In terms of experimentation, we compare the developed system with well-known coreference resolvers. The advantage of this transparent solution is demonstrated through experiments performed on existing WSC schemas developed by experts in the field. The findings indicate that systems based on classical/symbolic AI must be part of the solution toward endowing machines with commonsense reasoning.
Additional systems based on both classical AI and machine learning were developed to answer research questions such as: a) How can we promote the WSC to various academic disciplines so that they work on solving it? b) How can we design systems that automatically differentiate between Winograd instances according to their perceived hardness for humans? c) How can we build systems that automatically build schemas, or considerably help humans develop them, from scratch? In this regard, we show how we utilized the WSC as a novel form of a Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA). We expect that the adoption and use of a WSC-based CAPTCHA will bring the WSC to the attention of various academic disciplines, encouraging them to work on solving it and perhaps, in the process, help build machines able to reason with commonsense knowledge. Experiments we undertook show that a WSC-based CAPTCHA is generally faster and easier to solve than the most typical existing CAPTCHA tasks, and equally entertaining. Given that the WSC is a challenging task for machines and that future Winograd challenges should be organized according to how humans tackle them, this dissertation also shows how we designed multiple approaches that can automatically differentiate between Winograd instances according to their perceived hardness for humans. According to our results, the automated approaches' performance correlates positively with the performance of humans, suggesting that these kinds of systems could be used as a metric of hardness for WSC instances. Finally, given that schema availability is limited and that the schema development process is challenging and troublesome, this dissertation shows how we provided the research community with the necessary tools for designing Winograd schemas from scratch.
The experiments show the benefits of the developed systems, which, among other things, can considerably help humans in the schema development task. The dissertation concludes with the thesis findings and a discussion of the implications of this research, accompanied by our thoughts on the missing links required for future progress in the field.
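
To make the schema structure described in the abstract concrete (pairs of halves, each with a sentence, a question about an unresolved pronoun, and two candidate answers), the following is a minimal illustrative sketch in Python. The class names, field names, and the `answers_flip` helper are hypothetical and are not taken from the dissertation's systems. The example instance is the widely cited "city councilmen" schema from the WSC literature, and the check that the correct answer differs between the two halves reflects the usual formulation of the challenge rather than anything stated in this abstract.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WinogradHalf:
    """One half of a schema: sentence, question, two candidates, correct answer."""
    sentence: str
    question: str
    candidates: Tuple[str, str]
    answer: str

    def __post_init__(self) -> None:
        # The correct answer must be one of the two pronoun targets.
        if self.answer not in self.candidates:
            raise ValueError("answer must be one of the two candidates")


@dataclass(frozen=True)
class WinogradSchema:
    """A schema is a pair of halves sharing the same candidate answers."""
    first: WinogradHalf
    second: WinogradHalf

    def answers_flip(self) -> bool:
        # Hypothetical helper: both halves offer the same candidates,
        # but the correct referent differs between them.
        return (
            self.first.candidates == self.second.candidates
            and self.first.answer != self.second.answer
        )


CANDIDATES = ("the city councilmen", "the demonstrators")

SCHEMA = WinogradSchema(
    first=WinogradHalf(
        sentence=("The city councilmen refused the demonstrators a permit "
                  "because they feared violence."),
        question="Who feared violence?",
        candidates=CANDIDATES,
        answer="the city councilmen",
    ),
    second=WinogradHalf(
        sentence=("The city councilmen refused the demonstrators a permit "
                  "because they advocated violence."),
        question="Who advocated violence?",
        candidates=CANDIDATES,
        answer="the demonstrators",
    ),
)
```

Under these assumptions, `SCHEMA.answers_flip()` evaluates to `True`: the two halves differ in a single word, yet the pronoun "they" resolves to a different candidate in each, which is why surface grammar alone does not settle the answer.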