Impact of Discourse Marker Accuracy on Translation Quality: Fluency, Coherence, and Patterns of Misuse in Machine Translation
Doaa Hafedh Hussein Al-Jassani 1, Elahe Sadeghi Barzani 2, Fida Mohsin Matter Al-Mawla 3, Fatinaz Karimi 4
1 - Department of English Languages, Isfahan (Khorasgan) Branch, Islamic Azad University, Isfahan, Iran
2 - Assistant Professor, Islamic Azad University, Khorasgan Branch
3 - College of Arts, Wasit University, Haideriya, Kut, Wasit Governorate
4 - Islamic Azad University, Isfahan (Khorasgan) Branch
Keywords: Discourse markers, translation quality, fluency, coherence, machine translation, human translation
Abstract:
This study examined how discourse marker (DM) accuracy predicts quality in machine translation (MT) versus human translation (HT), focusing on fluency, coherence, and patterns of misuse. Using a mixed-methods design, it quantified DM accuracy with precision, recall, and F1 scores and assessed text quality qualitatively through human judgments and BERT-based coherence models. The findings showed that HT was considerably more accurate than MT in its use of DMs (85–88% correlation with fluency/coherence for HT versus 62–65% for MT). MT systems tended to overuse additive markers (and, so), underuse contrastive and causal markers (but, therefore), and misuse however. These tendencies compromise discourse coherence, increase post-editing effort, and expose the limits of BLEU-based measures in detecting discourse-level errors. The study calls for discourse-sensitive MT models, better-informed evaluation metrics (e.g., Coh-Metrix, RST parsing), and pedagogical innovation in translator education so that trainees learn to detect DM subtleties. The findings also bear on ethical practice in MT-mediated communication and invite cross-lingual research on translation for low-resource languages. By combining theoretical linguistics with computational practice, the study takes a step toward addressing DM-based errors and facilitating multilingual communication in an increasingly digitalized world.
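The abstract does not say how precision, recall, and F1 were computed for discourse markers, so the following is only a minimal sketch of one plausible scheme, not the authors' procedure. It assumes sentence-aligned MT output and reference translations, a hypothetical marker inventory (`MARKERS`) limited to the five examples named above, and a simple token match that cannot tell a discourse use of "and" or "so" from a non-discourse use.

```python
import re
from collections import Counter

# Hypothetical inventory: only the markers named in the abstract (an assumption).
MARKERS = {"and", "so", "but", "therefore", "however"}


def marker_counts(sentence):
    """Count each discourse marker in a sentence (crude lowercase token match)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(t for t in tokens if t in MARKERS)


def dm_scores(hypotheses, references):
    """Return (precision, recall, f1) for markers over aligned sentence pairs."""
    matched = produced = expected = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = marker_counts(hyp), marker_counts(ref)
        matched += sum((h & r).values())   # markers present in both, per sentence
        produced += sum(h.values())        # markers in the translation output
        expected += sum(r.values())        # markers in the reference
    precision = matched / produced if produced else 0.0
    recall = matched / expected if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative sentences, invented for this sketch (not data from the study).
refs = [
    "It rained, but we went out.",
    "She was tired, so she left.",
    "However, he stayed.",
]
hyps = [
    "It rained, and we went out.",      # contrastive marker replaced by additive
    "She was tired, so she left.",
    "However, he stayed and waited.",   # extra additive marker
]

precision, recall, f1 = dm_scores(hyps, refs)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy example, the substituted "and" lowers both precision and recall, while the extra "and" lowers precision only, which mirrors the overuse and underuse patterns the abstract describes. A realistic evaluation would need sense disambiguation of markers and a much larger inventory.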
