Presentation

Evaluating Human-Centered AI in Human–AI and Human–Human Teams Using Performance Metrics
Description

As artificial intelligence (AI) systems increasingly operate in high-stakes, team-based environments such as disaster response and tactical simulations, there is growing interest in evaluating whether AI agents can function effectively as human-centered teammates. This study explores how team performance data can be used to evaluate Human-Centered AI (HCAI) more objectively. First, we identified key objective performance metrics, including task completion rate, efficiency, and risk management, and organized them under the IPSO (Input-Process-State-Outcome) teamwork framework to support phase-specific evaluation of an AI agent’s human-centeredness. Second, we applied these metrics in an experimental Minecraft tower defense task under four conditions: Human-Solo, Human–TaskVoyager (a less communicative LLM-based AI agent), Human–TARS (a normally communicative LLM-based AI agent), and Human–Human teams. Results show that Human–Human teams outperformed Human–AI teams in task success and efficiency, and that both types of team performed better than the Human-Solo condition. These findings suggest that IPSO can be customized to measure the effectiveness of different human–AI teams. Human-Solo performance serves as the benchmark for evaluating AI contributions, offering a grounded baseline of individual capability. In contrast, the Human–Human condition provides a reference point for team-based performance and for identifying gaps in AI’s socio-collaborative competence.
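
The roles of the Human-Solo baseline and the Human–Human reference point can be illustrated with a minimal sketch. This is not the authors' analysis code: the `Trial` record, the function names, and the normalization formula are assumptions made for illustration, and the trial values below are placeholders, not results from the study. The sketch computes a task completion rate per condition and then places a Human–AI score on a scale where 0.0 is the Human-Solo baseline and 1.0 is the Human–Human reference.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One experimental run (hypothetical record layout)."""
    condition: str   # e.g. "Human-Solo", "Human-TARS", "Human-Human"
    completed: bool  # whether the task was completed


def completion_rate(trials, condition):
    """Fraction of trials in `condition` that were completed."""
    subset = [t for t in trials if t.condition == condition]
    if not subset:
        raise ValueError(f"no trials for condition {condition!r}")
    return sum(t.completed for t in subset) / len(subset)


def relative_position(value, solo, human_human):
    """Rescale a score so that the Human-Solo baseline maps to 0.0
    and the Human-Human reference maps to 1.0 (assumed formula)."""
    if human_human == solo:
        raise ValueError("baseline and reference coincide")
    return (value - solo) / (human_human - solo)


# Placeholder trials for illustration only; NOT data from the study.
trials = [
    Trial("Human-Solo", False), Trial("Human-Solo", False),
    Trial("Human-Solo", True), Trial("Human-Solo", False),
    Trial("Human-TARS", True), Trial("Human-TARS", False),
    Trial("Human-TARS", True), Trial("Human-TARS", False),
    Trial("Human-Human", True), Trial("Human-Human", True),
    Trial("Human-Human", True), Trial("Human-Human", False),
]

solo = completion_rate(trials, "Human-Solo")
team = completion_rate(trials, "Human-Human")
ai = completion_rate(trials, "Human-TARS")
print(relative_position(ai, solo, team))
```

A value near 0.0 would indicate that the AI teammate adds little over working alone, while a value near 1.0 would indicate performance close to that of a human teammate. The same rescaling could in principle be applied to an efficiency or risk-management metric, provided higher values consistently mean better performance.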