Inconsistency of Spoken Language in Google Maps Navigation: Is It True That AI Does Not Yet Fully Understand Human Language?
Keywords:
navigation code, Google Maps, artificial intelligence, text-to-speech system
Abstract
Artificial Intelligence (AI) has become essential to digital navigation systems such as Google Maps. Yet the Indonesian Text-to-Speech (TTS) feature of Google Maps still mispronounces abbreviations, acronyms, and proper names, often producing speech that sounds unnatural or confusing. This study aims to identify the linguistic and technical causes of these pronunciation errors. It takes a qualitative approach that combines content analysis and computational linguistics: audio samples from Google Maps were analyzed to identify recurring pronunciation issues, followed by a technical examination of phonetic modeling, text-to-phoneme conversion, and text-normalization processes within the TTS system. The results show that the inaccuracies arise from three key factors: insufficient contextual recognition of abbreviations, limited lexical databases, and weaknesses in text-normalization mechanisms. These findings suggest that the challenges are not purely technical but also linguistic and sociocultural, reflecting the complexity of Indonesian with its diverse dialects, orthography, and local naming practices. The study recommends improving lexical coverage, developing more context-aware linguistic models, and training AI with voice and text data that better represent Indonesian language variation. Overall, this research contributes to the development of a more accurate, natural, and culturally attuned TTS system while underscoring the importance of integrating AI innovation with linguistic sensitivity.
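To make the role of the lexicon in text normalization concrete, the following is a minimal sketch, not the method used by Google Maps or by this study. It assumes a hypothetical four-entry abbreviation lexicon and a hypothetical `normalize` function; the entries are common Indonesian address abbreviations. Known abbreviations are expanded before speech synthesis, while abbreviation-like tokens missing from the lexicon are passed through unchanged and reported.

```python
import re

# Hypothetical mini-lexicon for illustration only; a real TTS front end
# would need far broader coverage of Indonesian abbreviations.
ABBREVIATIONS = {
    "jl.": "Jalan",
    "gg.": "Gang",
    "kec.": "Kecamatan",
    "kab.": "Kabupaten",
}

# Assumed heuristic: one to four letters followed by a dot looks like an abbreviation.
ABBREVIATION_SHAPE = re.compile(r"[A-Za-z]{1,4}\.")


def normalize(text, lexicon=ABBREVIATIONS):
    """Expand known abbreviations and collect abbreviation-like tokens not in the lexicon."""
    expanded = []
    unknown = []
    for token in text.split():
        key = token.lower()
        if key in lexicon:
            # Lexicon hit: replace the abbreviation with its full spoken form.
            expanded.append(lexicon[key])
        else:
            if ABBREVIATION_SHAPE.fullmatch(token):
                # Lexicon miss: the token stays as written, so a synthesizer
                # would have to guess how to pronounce it.
                unknown.append(token)
            expanded.append(token)
    return " ".join(expanded), unknown


print(normalize("Jl. Sudirman Gg. Melati"))
print(normalize("Jl. Merdeka Ds. Sukamaju"))
```

In the second call, "Ds." (Desa) is absent from the assumed lexicon and is left unexpanded, which mirrors the limited-lexicon factor named in the abstract. The shape heuristic would also flag a short sentence-final word such as "kiri.", a small example of why context matters when recognizing abbreviations.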
License
Copyright (c) 2025 Krismonika Khoirunnisa, Sahrul Romadhon, Fakhriyyah Asmay Aidha, Siti Zumrotul Maulida, Silfia Qurrotul A’yun

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.