Linguistic Big Data Analytics for Combating Cybercrime: Detecting Fraud, Hate Speech, and Online Deception

Achmad Naufal Irsyadi

Authors

Achmad Naufal Irsyadi, Doctoral Program (S3) in Language and Literature Education, Universitas Negeri Surabaya (UNESA)

Keywords:

Big Data, Cybercrime, Hate Speech, Linguistic Forensics, Online Deception

Abstract

The rise of cybercrime has created unprecedented challenges for governments, law enforcement, and digital communities. Traditional investigative approaches, limited in scale and speed, struggle to keep pace with the volume and velocity of online communication. In the era of Big Data, linguistic analysis has emerged as a powerful tool for combating cybercrime by identifying fraud, hate speech, and online deception. This article explores how natural language processing (NLP), corpus-based methods, and machine learning techniques are applied to massive digital datasets to detect malicious communication patterns. Drawing on large corpora of phishing emails, extremist forum posts, and social media content, the study demonstrates how linguistic fingerprints (such as lexical markers, syntactic anomalies, and discourse structures) reveal deceptive practices and harmful content. The results show significant improvements in detection accuracy over traditional methods, but they also point to challenges related to multilingual data, adversarial obfuscation, and ethical concerns about surveillance. The discussion argues that while Big Data analytics strengthens the fight against cybercrime, it must be guided by ethical safeguards that balance digital security with privacy rights. Ultimately, Big Data-driven forensic linguistics represents both a technological advancement and a societal responsibility in ensuring safer digital environments.

