Conversational AI
LABORATORY STUDIES VS. EVALUATIONS IN THE WILD
For laboratory evaluations, users are recruited from available resources: in academic research laboratories they may be college students who have volunteered to take part, while industrial laboratories may maintain a panel of users who have agreed to participate in tests and evaluations. Users interact with the dialogue system in pre-defined scenarios and complete a questionnaire at the end of the session. In this way the evaluation is tightly controlled and a range of different scenarios can be investigated, ensuring within-test and between-test reliability and allowing more extensive data and feedback to be collected. However, laboratory evaluations may not reflect real-life usage, which undermines the validity of the measurements. Evaluations in the wild, by contrast, involve recording users interacting with a real dialogue system to accomplish a real task.
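A minimal sketch of how post-session questionnaire scores from such a laboratory evaluation might be aggregated per scenario; the scenario names and the 1-5 Likert-style item scores here are illustrative assumptions, not from the text:

```python
import statistics

# Hypothetical questionnaire scores (1-5 Likert scale) collected from
# users after sessions in two pre-defined scenarios.
responses = {
    "restaurant_booking": [4, 5, 3, 4],
    "flight_enquiry":     [3, 3, 4, 2],
}

# Mean satisfaction per scenario, supporting between-scenario comparison.
summary = {scenario: round(statistics.mean(scores), 2)
           for scenario, scores in responses.items()}
```

Because each user completes the same questionnaire under the same scenarios, per-scenario means like these can be compared across test sessions, which is what supports the within-test and between-test reliability mentioned above.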
User simulators are one way to address the cost of recruiting real users and conducting user studies. The idea is that a user simulator should interact with the dialogue system as if it were a real user. User simulators enable the collection of large amounts of training data, which is a necessary requirement for statistical, data-driven dialogue systems. They also support the exploration of dialogue strategies that would be difficult to elicit from real users and that may not be represented in existing dialogue corpora.
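The idea can be sketched as a simple rule-based simulator with a goal it tries to convey to the system; the slot-filling domain, slot names, and dialogue acts below are illustrative assumptions rather than any particular system's API:

```python
import random

class UserSimulator:
    """Minimal rule-based user simulator for a slot-filling dialogue system."""

    def __init__(self, goal):
        # The goal is the set of slot values the simulated user wants to
        # communicate, e.g. {"cuisine": "thai", "area": "centre"}.
        self.goal = dict(goal)
        self.pending = list(goal)  # slots not yet conveyed to the system

    def respond(self, system_act):
        """Return a user dialogue act in response to a system act."""
        act, slot = system_act
        if act == "request" and slot in self.goal:
            # Answer the system's question about a goal slot.
            self.pending = [s for s in self.pending if s != slot]
            return ("inform", slot, self.goal[slot])
        if act == "confirm":
            return ("affirm", None, None)
        if self.pending:
            # Volunteer a not-yet-mentioned goal slot at random.
            s = random.choice(self.pending)
            self.pending.remove(s)
            return ("inform", s, self.goal[s])
        return ("bye", None, None)

# Generating a synthetic training dialogue by running the simulator
# against a scripted stand-in for the system's side of the conversation:
sim = UserSimulator({"cuisine": "thai", "area": "centre"})
dialogue = [sim.respond(("request", "cuisine")),
            sim.respond(("request", "area")),
            sim.respond(("confirm", None))]
```

Running many such simulated dialogues with varied goals yields the large volumes of training data described above, and goals or system behaviours absent from existing corpora can be injected deliberately to explore otherwise hard-to-obtain dialogue strategies.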
Crowdsourcing allows developers to devise clearly defined tasks and to recruit users who fit a specified profile.