Initial evaluation of the ability to provide health information of Copilot and Gemini Pro: “Two virtual assistants for physicians and nurses”


How to Cite

Truong, V. D., Nguyen, T. M. C., Vu, T. H., Tran, T. N., Tran, T. H. O., Dao, D. M. T., Luong, T. V., Nguyen, N. T., Nguyen, T. X., Ngo, V. D., Mai, X. T., Le, T. L., & Bui, M. N. (2026). Initial evaluation of the ability to provide health information of Copilot and Gemini Pro: “Two virtual assistants for physicians and nurses”. Journal of Nursing Science, 9(03), 185–192. https://doi.org/10.54436/jns.2026.03.1325


Abstract

Objectives: To evaluate the reliability and appropriateness of two AI chatbots, Copilot and Gemini Pro, in providing information on the symptoms, diagnosis, treatment, care, consultation, and prevention of common diseases, and to analyze factors associated with the evaluation scores of these two chatbots.

Methods: A comparative cross-sectional study was conducted using datasets of 246 health-related questions and 492 responses generated by Copilot and Gemini Pro in January 2026. The questions covered five disease groups, and each response was independently evaluated by one physician and one specialist nurse.

Results: Both Gemini Pro and Copilot demonstrated high reliability, with all median scores ≥ 4 and mean scores ranging from 3.9 to 4.7 on a 5-point scale. The rate of satisfactory responses was high, ranging from over 81% for Copilot (as rated by physicians) to 99.6% for Gemini Pro (as rated by nurses). Agreement between physicians and nurses was very high for Gemini Pro (Kappa = 0.83) but only moderate for Copilot (Kappa = 0.59). Analysis of associated factors showed that nurses tended to assign higher scores than physicians, and that Gemini Pro was rated higher and judged more appropriate than Copilot.
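For readers unfamiliar with the Kappa statistic reported above, the following is a minimal sketch of how unweighted Cohen's kappa can be computed for two raters. It assumes each response is reduced to a binary label (1 = satisfactory, 0 = unsatisfactory); the abstract does not state whether the authors used binary labels, the full 5-point scores, or a weighted variant, so this is illustrative only. The function name `cohen_kappa` and the ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten responses (NOT data from the study).
physician = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
nurse = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]

kappa = cohen_kappa(physician, nurse)
print(round(kappa, 2))  # 0.74
```

In this toy example the raters agree on 9 of 10 responses (observed agreement 0.90), while agreement expected by chance is 0.62, giving kappa = 0.28 / 0.38 ≈ 0.74. Kappa can therefore be noticeably lower than raw percentage agreement when most responses fall into one category.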

Conclusion: Both Copilot and Gemini Pro demonstrated high reliability and can be used as virtual assistants to support healthcare professionals in patient consultation.


Keywords

AI virtual assistant, Copilot, Gemini Pro, health consultation


Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright (c) 2026 Journal of Nursing Science