Linguistic Big Data Analytics for Combating Cybercrime: Detecting Fraud, Hate Speech, and Online Deception
Keywords:
Big Data, Cybercrime, Hate Speech, Linguistic Forensics, Online Deception
Abstract
The rise of cybercrime has created unprecedented challenges for governments, law enforcement, and digital communities. Traditional investigative approaches, limited by scale and speed, struggle to keep pace with the volume and velocity of online communication. In the era of Big Data, linguistic analysis emerges as a powerful tool for combating cybercrime by identifying fraud, hate speech, and online deception. This article explores how natural language processing (NLP), corpus-based methods, and machine learning techniques are applied to massive digital datasets to detect malicious communication patterns. Drawing on large corpora of phishing emails, extremist forums, and social media platforms, the study demonstrates how linguistic fingerprints—such as lexical markers, syntactic anomalies, and discourse structures—reveal deceptive practices and harmful content. Results highlight significant improvements in detection accuracy compared to traditional methods, but also point to challenges related to multilingual data, adversarial obfuscation, and ethical concerns of surveillance. The discussion argues that while Big Data analytics strengthens the fight against cybercrime, it must be guided by ethical safeguards to balance digital security with privacy rights. Ultimately, Big Data-driven forensic linguistics represents both a technological advancement and a societal responsibility in ensuring safer digital environments.
License
Copyright (c) 2024 Achmad Naufal Irsyadi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.