Correct Reasoning in Hybrid Intelligence
This project is defined within the scope of the Hybrid Intelligence: Augmenting Human Intellect project. It addresses the Trust and Trustworthiness challenge, the Hybrid Law use case and the NLP4HI SIG. The project is a collaboration between partners from Utrecht University and the University of Groningen.
What is our aim?
Sound reasoning in state-of-the-art machine learning, particularly in generative large language models, is a timely and open issue. While their performance is often good, these models are prone to hallucination and generally perform poorly on new, unseen reasoning tasks. Moreover, they act as black boxes, making it difficult to analyze their internal structure. It is also hard to evaluate their reasoning capabilities using static benchmarks, as state-of-the-art commercially available models are regularly retrained on new data, which may include those very benchmarks. By creating dynamically generated benchmarks of scaling complexity, we aim to evaluate LLMs quantitatively.
In previous research (HI project 1.02 - Aligning learning and reasoning systems for responsible HI), a neuro-symbolic system was designed that can solve legal textual entailment tasks as part of the international COLIEE competition on legal information extraction and entailment. In this project, we aim to extend this system into an assisting tool that a human can use within a hybrid intelligent system, building on elicited and artificially generated knowledge structures.
Why is this important?
The project is primarily concerned with correct reasoning in data-driven approaches, which is essential for responsible hybrid intelligence. Collaboration with domain experts will not only ensure a hybrid approach during the design phase but also give us insight into how we can create tailored systems that can collaborate with the human end-users. In the neuro-symbolic experiments, explainability will play a key role, as we try to unravel how data-driven approaches reason and create neuro-symbolic systems that are explainable and adaptive by design. In the potential user studies, the user and the AI will form a hybrid intelligent system, augmenting intelligence through synergy.
How will we approach this?
To create benchmarks for evaluating the reasoning capabilities of LLMs, we will use argumentation frameworks that represent legal reasoning tasks. These representations will be generated dynamically and with scaling complexity. They are then translated into natural language, which serves as the input query to the large language models under evaluation.
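The pipeline described above can be sketched as follows. This is a minimal illustration, not the project's actual benchmark generator: it randomly generates an abstract argumentation framework (arguments plus an attack relation), computes the grounded extension as a gold label, and verbalises the framework into a natural-language query. The function names and the verbalisation template are our own assumptions; complexity scales with the number of arguments.

```python
import random

def generate_af(n_args, attack_prob, seed=0):
    """Randomly generate an abstract argumentation framework:
    a set of arguments and a binary attack relation."""
    rng = random.Random(seed)
    args = [f"A{i}" for i in range(n_args)]
    attacks = {(a, b) for a in args for b in args
               if a != b and rng.random() < attack_prob}
    return args, attacks

def grounded_extension(args, attacks):
    """Compute the grounded extension as the least fixed point of the
    characteristic function: repeatedly accept every argument whose
    attackers are all defeated by the currently accepted arguments."""
    accepted = set()
    while True:
        defeated = {b for (a, b) in attacks if a in accepted}
        new = {x for x in args
               if all(a in defeated for (a, b) in attacks if b == x)}
        if new == accepted:
            return accepted
        accepted = new

def verbalise(attacks, query):
    """Translate the framework into a natural-language entailment query
    (a hypothetical template; the real system would use richer text)."""
    lines = [f"Argument {a} attacks argument {b}." for (a, b) in sorted(attacks)]
    lines.append(f"Is argument {query} acceptable under grounded semantics?")
    return "\n".join(lines)

# Generate one benchmark instance with its gold label.
args, attacks = generate_af(n_args=4, attack_prob=0.3, seed=42)
prompt = verbalise(attacks, "A0")
gold = "A0" in grounded_extension(args, attacks)
print(prompt)
print("Gold label:", gold)
```

Because the frameworks are sampled freshly each run, the resulting queries cannot leak into a model's training data the way a static benchmark can, and increasing `n_args` scales the reasoning difficulty.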
The neuro-symbolic legal textual entailment system will be further explored by improving its NLP components, collaborating with legal experts to craft more knowledge representations of the law, and further investigating the artificial generation of knowledge representations using generative AI.
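To make "knowledge representations of the law" concrete, the toy sketch below encodes statute-like articles as if-then rules and decides textual entailment by forward chaining over given facts. The `Rule` class, the rule contents, and the predicate names are purely hypothetical illustrations, not the representation used in the COLIEE system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """A hypothetical encoding of a statute article: if all antecedent
    conditions hold, the conclusion follows."""
    antecedents: frozenset
    conclusion: str

def entails(rules, facts, hypothesis):
    """Forward-chain: derive new conclusions until a fixed point is
    reached, then check whether the hypothesis was derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for r in rules:
            if r.antecedents <= derived and r.conclusion not in derived:
                derived.add(r.conclusion)
                changed = True
    return hypothesis in derived

# Toy, invented rules in the style of contract law.
rules = [Rule(frozenset({"offer", "acceptance"}), "contract"),
         Rule(frozenset({"contract", "breach"}), "liable_for_damages")]
print(entails(rules, {"offer", "acceptance", "breach"}, "liable_for_damages"))
```

In a neuro-symbolic setting, the NLP side would map statute text and case facts to such structures, whether hand-crafted with legal experts or proposed by a generative model, while the symbolic side performs the entailment check transparently, which is what makes the overall system explainable by design.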