A Comparative Study of AI-Powered Tools for Arabic-English and English-Arabic Translation

Authors

  • Razan R. Khasawneh, Amman Arab University
  • Bilal I. Alsharif, Al-Zaytoonah University of Jordan

DOI:

https://doi.org/10.17507/jltr.1606.24

Keywords:

Computer Assisted Translation (CAT), Neural Machine Translation (NMT), Large Language Models (LLMs), Bilingual Evaluation Understudy (BLEU), Machine Translation (MT)

Abstract

This quantitative study compares the translation quality of Computer Assisted Translation (CAT) tools, Neural Machine Translation (NMT) systems, and Large Language Models (LLMs) with one another and with a human translator, for English-to-Arabic and Arabic-to-English translation. Using the Bilingual Evaluation Understudy (BLEU) metric to compare seven translation platforms, the study finds that AI models outperform both CAT tools and NMT systems in both directions. Gemini achieves the highest BLEU score in both directions, surpassing the other six platforms, while Google Translate scores the lowest. The study also finds that these platforms struggle more with English-to-Arabic translation, owing to the complexity of the Arabic language. Because CAT tools generally have the lowest scores, they assist human translators rather than replace them, providing a partially automated translation. AI models can therefore be more helpful and preferable, given their high BLEU scores. However, although MT often produces accurate translations that convey the core meaning, it may still struggle to imitate the stylistic choices and conciseness that characterize the work of a professional human translator.
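For readers unfamiliar with BLEU, the following is a minimal sketch of a sentence-level score against a single human reference, following the general definition of the metric: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It is not the authors' evaluation pipeline. The function names `ngrams` and `bleu`, the whitespace tokenization, the absence of smoothing, and the example sentences are all assumptions made for illustration. Whitespace tokenization in particular is a simplification that is likely too coarse for Arabic.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU of a candidate against one reference.

    Both arguments are plain strings; tokens are split on whitespace
    (an illustrative assumption, not a recommended tokenizer).
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            # Without smoothing, one empty precision zeroes the whole score.
            return 0.0
        log_precision_sum += math.log(overlap / total)

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    # Hypothetical sentences, not data from the study.
    human_reference = "the committee approved the new budget on Monday"
    system_output = "the committee approved the new budget yesterday"
    print(round(bleu(system_output, human_reference), 3))
```

In practice, scores are usually computed over a whole test set with an established implementation rather than sentence by sentence, so the numbers from a sketch like this are not comparable to published BLEU figures.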

Author Biographies

Razan R. Khasawneh, Amman Arab University

Department of English Language and Translation

Bilal I. Alsharif, Al-Zaytoonah University of Jordan

Department of Applied Linguistics

Published

2025-11-01

Section

Articles