Deep Reinforcement Learning for Interactive Systems
Yang, Grace Hui
Artificial intelligence (AI) aims to build intelligent systems that can interact with and assist humans. During an interaction, a system learns the user's requirements and adapts to them in order to complete tasks. A popular type of interactive system is retrieval-based: the system uses a retrieval function to retrieve relevant answers from a document collection or a knowledge repository. Because the setting is interactive and the objective is goal-oriented, reinforcement learning (RL) has become a popular solution. However, developing RL-based interactive systems is not always successful. Prior methods either failed to build representations that give a complete picture of the task or could not let the system control the retrieval results directly. The costly process of labeling interactive data further hampers the application of RL-based methods, and RL agents trained on limited annotated data may fail to generalize. In addition, the evaluation metrics for interactive systems are often unbounded, and the large variance among search tasks may bias the evaluation.
In this dissertation, I formulate the task of building retrieval-based interactive systems as an RL problem and propose a systematic solution for building, training, generalizing, and evaluating RL-based interactive systems. I propose to provide the system with a global representation of the knowledge repository to enable full exploration of the state and action spaces. I then employ a differentiable retrieval action to allow the system to control the retrieval process effectively. To improve generalizability, I propose methods that adaptively train the system in randomized environments and generate high-quality, diverse interactions. I also propose a metric normalization schema that effectively improves the fairness of evaluation.
The proposed representation improves substantially over other neural methods on ad-hoc retrieval tasks in the Text REtrieval Conference (TREC) Web track and LETOR. With the representation and the differentiable retrieval function, the interactive system improves the state-of-the-art performance on the TREC Dynamic Domain (DD) track. The proposed adaptive training method enhances the system's generalizability when tested in novel environments on TREC DD. The proposed trajectory diversification method boosts RL systems' performance on the MultiWOZ dialogue dataset. The metric normalization schema has been adopted by TREC DD and enables a fairer and more robust evaluation.