A fully-fledged approach to the Winograd Schema Challenge: Tackling, utilizing and developing Winograd instances
Abstract
The Winograd Schema Challenge (WSC), a novel litmus test for machine intelligence,
has been proposed to advance the field of Artificial Intelligence (AI). In the last decade,
the challenge has received considerable interest as a step towards building machines with
commonsense reasoning, a long-standing goal pursued since the late fifties.
The WSC requires resolving pronouns in carefully structured sentences, where the
information needed to resolve them is not grammatically present. The challenge consists of
pairs of halves (schemas), where each half comprises a sentence, a question referring to an
unresolved pronoun, and two possible pronoun targets (answers). It is believed that tackling
the challenge will advance the field of AI while also helping the research community
understand human behavior by uncovering the mechanisms humans use
when answering such questions. In this regard, each WSC instance should tell us something
about human behavior that needs to be explained. Although humans have no difficulty
tackling the challenge, developing systems that can do the same remains challenging.
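As a minimal illustration of the structure described above, the following Python sketch models a schema as a pair of halves. The class names, field names, and the well-formedness check are hypothetical and are not taken from the dissertation; the example sentence is the widely cited councilmen/demonstrators schema from the WSC literature.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WinogradHalf:
    """One half of a schema (hypothetical representation)."""
    sentence: str                 # sentence containing the ambiguous pronoun
    question: str                 # question referring to the unresolved pronoun
    candidates: Tuple[str, str]   # the two possible pronoun targets
    answer: str                   # the correct target

    def __post_init__(self) -> None:
        if self.answer not in self.candidates:
            raise ValueError("answer must be one of the two candidates")


@dataclass(frozen=True)
class WinogradSchema:
    """A pair of halves that differ in a special word and in their answer."""
    first: WinogradHalf
    second: WinogradHalf

    def is_well_formed(self) -> bool:
        # Assumption: both halves share the same candidates,
        # and swapping the special word flips the correct answer.
        same_candidates = set(self.first.candidates) == set(self.second.candidates)
        return same_candidates and self.first.answer != self.second.answer


candidates = ("the city councilmen", "the demonstrators")
schema = WinogradSchema(
    first=WinogradHalf(
        sentence="The city councilmen refused the demonstrators a permit "
                 "because they feared violence.",
        question="Who feared violence?",
        candidates=candidates,
        answer="the city councilmen",
    ),
    second=WinogradHalf(
        sentence="The city councilmen refused the demonstrators a permit "
                 "because they advocated violence.",
        question="Who advocated violence?",
        candidates=candidates,
        answer="the demonstrators",
    ),
)
```

Both halves are grammatically identical apart from a single word, so the correct answer cannot be read off the syntax alone; this is the property that makes the task a test of commonsense reasoning.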
This dissertation focuses on methods and tools covering multiple aspects of the WSC.
Given AI's tendency to focus on behavior in a purely statistical sense, which can lead to
the development of non-transparent systems (sub-symbolic AI), and given that human language
is not based on word patterns, we start by presenting how we developed a commonsense
reasoning system to tackle the WSC. In terms of experimentation, we compare the developed
system with well-known coreference resolvers. The advantage of this transparent
solution is demonstrated through experiments performed on existing WSC schemas developed by
experts in the field. The findings indicate that systems based on classical/symbolic AI must
be part of any solution for endowing machines with commonsense reasoning.
Additional systems based on both classical AI and machine learning were developed to
answer research questions such as: a) How can we promote the WSC to various academic
disciplines so that they work on solving it? b)
How can we design systems that automatically differentiate between Winograd instances
according to their perceived hardness for humans? c) How can we build systems that automatically
build schemas from scratch, or considerably help humans develop them?
In this regard, we show how we utilized the WSC as a novel form of a completely
automated public Turing test to tell computers and humans apart (CAPTCHA). We expect
that the adoption and use of a WSC-based CAPTCHA will bring the WSC to the attention of various
academic disciplines, encouraging them to work on solving it and perhaps, in
the process, help build machines able to reason with commonsense knowledge. Experiments
we undertook show that a WSC-based CAPTCHA is generally faster and easier to solve than,
and equally entertaining as, the most typical existing CAPTCHA tasks.
Given that the WSC is a challenging task for machines and that future Winograd
challenges should be organized according to how humans tackle them, this dissertation also
shows how we designed multiple approaches that can automatically differentiate between
Winograd instances according to their perceived hardness for humans. According to our
results, the automated approaches’ performance correlates positively with the performance
of humans, suggesting that these kinds of systems could be used as a metric of hardness for
WSC instances.
Finally, given that schema availability is limited and that the schema development
process is difficult and time-consuming, this dissertation shows how we provided
the research community with the necessary tools for designing Winograd schemas from
scratch. The undertaken experiments show the benefits of utilizing our developed systems,
which, among other things, can considerably help humans in the schema development task.
The dissertation concludes with the thesis findings, discussing the implications of this
research, accompanied by our thoughts on the missing links required for future progress in
the field.