Advances in Computational Linguistics through Big Data: Deep Learning Approaches to Natural Language Processing

Authors

  • M. Noer Fadli Hidayat, Doctoral Program (S3) in Electrical Engineering and Informatics, Universitas Negeri Malang
  • Abu Thalib, Universitas Negeri Malang

Keywords:

Big Data, Computational Linguistics, Deep Learning, Natural Language Processing, Neural Networks

Abstract

Computational linguistics has entered a transformative era with the integration of Big Data and deep learning. Traditional approaches to natural language processing (NLP) relied on rule-based systems and limited corpora, often constrained by linguistic coverage and scalability. The advent of Big Data has made it possible to train large-scale neural architectures capable of modeling complex linguistic phenomena across diverse languages and domains. This article examines how Big Data-driven deep learning advances computational linguistics in three key areas: semantic representation, language generation, and cross-linguistic modeling. Using data from large-scale repositories, including multilingual web corpora and open-source datasets, we demonstrate how deep neural networks outperform traditional models in both accuracy and adaptability. The results highlight not only technical progress but also challenges related to interpretability, bias, and ethical implications. We argue that computational linguistics, strengthened by Big Data, is moving beyond descriptive modeling to predictive and generative capabilities that reshape communication technologies, education, and cross-cultural understanding.
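The first focus area named in the abstract, semantic representation, rests on the idea that deep models encode words as dense vectors whose geometry reflects meaning. As a minimal illustrative sketch (not code from the article), the following Python snippet compares hypothetical toy embeddings with cosine similarity; real systems learn vectors with hundreds of dimensions from the kind of large multilingual corpora the abstract describes.

# Illustrative sketch only (not from the article): cosine similarity over
# hypothetical toy word embeddings shows how dense vectors capture
# semantic relatedness.
import numpy as np

# Hypothetical 4-dimensional vectors; real models learn hundreds of
# dimensions from large corpora.
embeddings = {
    "king":  np.array([0.80, 0.65, 0.10, 0.05]),
    "queen": np.array([0.78, 0.70, 0.12, 0.04]),
    "apple": np.array([0.05, 0.10, 0.90, 0.60]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high (~0.999)
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low (~0.20)

Related words ("king", "queen") score close to 1.0 while an unrelated word ("apple") scores much lower; scaling this geometric idea up with large corpora is what the abstract credits for the gains of neural models over rule-based systems on semantic tasks.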

Published

30-12-2024

How to Cite

M. Noer Fadli Hidayat, & Abu Thalib. (2024). Advances in Computational Linguistics through Big Data: Deep Learning Approaches to Natural Language Processing. Prosiding SENALA (Seminar Nasional Linguistik Indonesia), 1(1), 7–11. Retrieved from https://senala.upnjatim.ac.id/index.php/senala/article/view/3
