BatchEval: Towards Human-like Text Evaluation
Abstract
The paper introduces BatchEval, a paradigm that evaluates text iteratively in batches rather than one sample at a time. By scoring samples together, much as human annotators compare texts side by side, the approach aims to reduce sensitivity to prompt design, improve robustness to noise, and strengthen ensemble performance, three weaknesses of sample-wise evaluation. Comprehensive experiments show that BatchEval outperforms state-of-the-art methods by 10.5% on Pearson correlation with human judgments, at a lower API cost.
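As a rough illustration of the batch-wise, iterative idea described above, here is a minimal Python sketch. It is not the paper's exact algorithm: the batch size, number of rounds, scoring prompt, and the `call_llm` function (standing in for any LLM API that returns one score per batched text) are all assumptions for illustration.

```python
import random
from typing import Callable

def batch_eval(
    samples: list[str],
    call_llm: Callable[[str], list[float]],  # hypothetical LLM scoring API
    batch_size: int = 10,
    num_rounds: int = 3,
) -> list[float]:
    """Iterative batch-wise scoring sketch: each round shuffles the samples
    into batches, asks the LLM to score every text in a batch within a
    single prompt, then averages each sample's scores across rounds."""
    totals = [0.0] * len(samples)
    indices = list(range(len(samples)))
    for _ in range(num_rounds):
        random.shuffle(indices)  # re-batch differently each round
        for start in range(0, len(indices), batch_size):
            batch = indices[start : start + batch_size]
            prompt = (
                "Score each text below from 1 to 10 for overall quality:\n"
                + "\n".join(f"[{j}] {samples[i]}" for j, i in enumerate(batch))
            )
            scores = call_llm(prompt)  # one score per text, in batch order
            for j, i in enumerate(batch):
                totals[i] += scores[j]
    # average across rounds to smooth out batch-composition noise
    return [t / num_rounds for t in totals]
```

Scoring texts jointly lets the model calibrate scores against the other samples in the batch, which is the intuition behind the claimed robustness gains over sample-wise prompting.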
Introduction
The paper outlines the significance of accurate text evaluation in the context of rapid progress in large language models (LLMs) and highlights the limitations of existing automatic evaluation methods in aligning with human judgments.
Background
The paper provides an overview of existing automatic text evaluation methods, including rule-based, embedding-based, …
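To make the "embedding-based" category concrete, a minimal sketch of such a metric follows: it scores a candidate against a reference by the cosine similarity of their sentence embeddings. The `embed` function is a hypothetical stand-in for any sentence encoder; the source does not specify this formulation.

```python
import math
from typing import Callable

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def embedding_score(
    candidate: str,
    reference: str,
    embed: Callable[[str], list[float]],  # hypothetical sentence encoder
) -> float:
    """Embedding-based evaluation: encode both texts and compare their
    vectors; higher similarity suggests closer meaning."""
    return cosine(embed(candidate), embed(reference))
```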
Appendix
| Field | Value |
| --- | --- |
| Date Generated | 2024-01-02 |
| HTML | https://browse.arxiv.org/html/2401.00437v1 |
| Truncated | True |
| Word Count | 15893 |