Table 3. Evaluation criteria for assessing generated responses

| Evaluation criterion | Rating | Description |
|---|---|---|
| Accuracy | 1 | Contains multiple major factual errors, or the information deviates significantly from established facts |
| | 2 | Contains noticeable factual inaccuracies or omissions that reduce reliability |
| | 3 | Mostly accurate, but minor imprecisions or missing details may appear |
| | 4 | Generally factually sound, with only minor oversights; main points are reliable and consistent with known facts |
| | 5 | Highly accurate and factually sound, with no or negligible errors; aligns well with established knowledge |
| Coherence | 1 | Extremely disjointed or unclear, making the argument or narrative very hard to follow |
| | 2 | Some sections flow awkwardly or contain logical gaps that disrupt readability |
| | 3 | Overall coherence is acceptable, though occasional abrupt transitions or mild logical gaps may occur |
| | 4 | Mostly well organized; sections and paragraphs link smoothly, with minimal logical gaps |
| | 5 | Very clear and logically consistent throughout; paragraphs and sentences link seamlessly, making the text highly readable |
| Fluency | 1 | Awkward language use, with frequent grammatical or spelling errors that significantly hinder comprehension |
| | 2 | Style or grammar issues occasionally impede reading, and some expressions feel unnatural |
| | 3 | Basic clarity is maintained; some minor awkwardness or errors may appear but do not severely impair understanding |
| | 4 | Reads smoothly with few grammatical errors; the language style is appropriate and the content is easy to understand |
| | 5 | Demonstrates excellent command of language, with near-perfect grammar, style, and fluidity, making it effortless to read |
| Reasoning ability | 1 | Lacks clear explanations or causal links; conclusions seem unfounded or are drawn abruptly |
| | 2 | Some rationales are given, but key reasoning steps are missing or poorly explained |
| | 3 | Provides reasonable explanations and causal links but may omit deeper details or skip certain logical steps |
| | 4 | Offers solid rationales and logical links that coherently explain causes, processes, and outcomes |
| | 5 | Thorough, structured reasoning with detailed cause-and-effect analysis and robust argumentation suitable for expert review |
| Justification | 1 | Offers no supporting evidence or references; claims and recommendations appear unsubstantiated |
| | 2 | References or examples are mentioned but insufficiently support the main arguments |
| | 3 | Includes general supporting details or references, though some may be vague or incomplete |
| | 4 | Claims and recommendations are consistently backed by relevant evidence, examples, or explanations and are persuasive overall |
| | 5 | Provides robust, specific evidence, references, or data that thoroughly validate and strengthen the claims, offering high credibility |
| Medical suitability | 1 | Contains information that is largely inapplicable or potentially harmful if applied in clinical or educational settings |
| | 2 | Some parts could be applied, but the main content does not align well with medical knowledge or requires significant correction |
| | 3 | Generally usable medical information, but some sections require verification or expert supervision for practical use |
| | 4 | Suitable for clinical and educational contexts with minimal adjustments; aligned overall with professional standards |
| | 5 | Highly aligned with professional practice and fully appropriate for direct application in clinical or educational settings, with little to no modification needed |
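Table 3 does not specify how ratings are recorded or combined. The sketch below makes a few assumptions to show one way the rubric could be encoded: each rater assigns exactly one integer rating from 1 to 5 to each of the six criteria, and a per-criterion mean across raters is of interest. The names `CRITERIA`, `validate_ratings`, and `summarize` are hypothetical, as is the averaging step; none of them is part of the original evaluation protocol.

```python
from statistics import fmean

# The six criteria of Table 3; each is rated on an integer scale from 1 (worst) to 5 (best).
CRITERIA = (
    "Accuracy",
    "Coherence",
    "Fluency",
    "Reasoning ability",
    "Justification",
    "Medical suitability",
)
MIN_RATING, MAX_RATING = 1, 5


def validate_ratings(ratings: dict[str, int]) -> None:
    """Raise ValueError unless the sheet rates every criterion with an integer in 1-5."""
    missing = [c for c in CRITERIA if c not in ratings]
    unknown = [c for c in ratings if c not in CRITERIA]
    if missing or unknown:
        raise ValueError(f"missing criteria: {missing}; unknown criteria: {unknown}")
    for criterion, score in ratings.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(score, bool) or not isinstance(score, int):
            raise ValueError(f"{criterion}: rating {score!r} is not an integer")
        if not MIN_RATING <= score <= MAX_RATING:
            raise ValueError(f"{criterion}: rating {score} is outside {MIN_RATING}-{MAX_RATING}")


def summarize(sheets: list[dict[str, int]]) -> dict[str, float]:
    """Hypothetical aggregation: mean rating per criterion across validated rating sheets."""
    if not sheets:
        raise ValueError("at least one rating sheet is required")
    for sheet in sheets:
        validate_ratings(sheet)
    return {c: fmean(sheet[c] for sheet in sheets) for c in CRITERIA}


if __name__ == "__main__":
    # Illustrative ratings from two hypothetical raters (not data from the study).
    sheet_a = {"Accuracy": 4, "Coherence": 5, "Fluency": 5,
               "Reasoning ability": 4, "Justification": 3, "Medical suitability": 4}
    sheet_b = {"Accuracy": 5, "Coherence": 4, "Fluency": 5,
               "Reasoning ability": 3, "Justification": 3, "Medical suitability": 4}
    print(summarize([sheet_a, sheet_b]))
```

The validation step rejects sheets that omit a criterion, add an unlisted one, or use a rating outside the 1-5 scale. This keeps any aggregate consistent with the rubric. The mean is only one possible summary; the median or the distribution of ratings per level may suit ordinal scales like this one better.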