Violencia Identificada en el Lenguaje (VIL): Creación de recurso para mensajes violentos

Martínez-Barco, Patricio; Saquete Boró, Estela; Botella, Beatriz; Sepúlveda-Torres, Robiert

Journal:

Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2023

Issue: 70

Pages: 187-198

Type: Article

Abstract

Society is advancing amid a wealth of new and highly accessible knowledge published online. ICTs have undeniably brought many benefits to our lives, but the use of violence on digital platforms also increases year after year. Our work focuses on detecting violent messages on the social network Twitter. We first create a fine-grained annotation guide and use it to build a corpus of violent messages (VIL), so that Machine Learning tools can help detect the problem automatically. Two language models, BETO and RoBERTa base, are trained on this corpus and reach 97.03% and 96.51%, respectively, on the F1m metric when classifying whether or not a tweet is violent.

Bibliographic References

Alonso, L. y V. J. Vázquez. 2017. Sobre la libertad de expresión y el discurso del odio: Textos críticos. Athenaica ediciones universitarias.
Arcila-Calderón, C., J. J. Amores, P. Sánchez-Holgado, y D. Blanco-Herrero. 2021. Using shallow and deep learning to automatically detect hate motivated by gender and sexual orientation on twitter in spanish. Multimodal technologies and interaction, 5(10):63.
Badjatiya, P., S. Gupta, M. Gupta, y V. Varma. 2017. Deep learning for hate speech detection in tweets. En Proceedings of the 26th international conference on World Wide Web companion, páginas 759–760.
Basile, V., C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. R. Pardo, P. Rosso, y M. Sanguinetti. 2019. Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter. En Proceedings of the 13th international workshop on semantic evaluation, páginas 54–63.
Bassignana, E., V. Basile, y V. Patti. 2018. Hurtlex: A multilingual lexicon of words to hurt. En 5th Italian Conference on Computational Linguistics, CLiC-it 2018, volumen 2253, páginas 1–6. CEUR-WS.
Bruns, A. 2019. After the ‘apicalypse’: Social media platforms and their fight against critical scholarly research. Information, Communication & Society, 22(11):1544–1566.
Burnap, P. y M. L. Williams. 2014. Hate speech, machine classification and statistical modelling of information flows on twitter: Interpretation and communication for policy decision making.
Cañete, J., G. Chaperón, R. Fuentes, y J. Pérez. 2020. Spanish pre-trained bert model and evaluation data. PML4DC at ICLR, 2020.
Cohen, J. 1960. A coefficient of agreement for nominal scales. Educational and psychological measurement, 20(1):37–46.
Dadvar, M., D. Trieschnigg, R. Ordelman, y F. d. Jong. 2013. Improving cyberbullying detection with user context. En European Conference on Information Retrieval, páginas 693–696. Springer.
del Arco, F. M. P., M. D. Molina-González, L. A. Ureña-López, y M.-T. Martín-Valdivia. 2022. Integrating implicit and explicit linguistic phenomena via multi-task learning for offensive language detection. Knowledge-Based Systems, 258:109965.
Devlin, J., M.-W. Chang, K. Lee, y K. Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Fernández, J., F. Llopis, P. Martínez-Barco, Y. Gutiérrez, y A. Díez. 2017. Analizando opiniones en las redes sociales. Procesamiento del Lenguaje Natural, 58:141–148.
Flores, J. y M. Casal. 2008. Ciberbullying. Guía rápida para la prevención del acoso por medio de las nuevas tecnologías.
Fortuna, P. y S. Nunes. 2018. A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR), 51(4):1–30.
Frenda, S., A. T. Cignarella, V. Basile, C. Bosco, V. Patti, y P. Rosso. 2022. The unbearable hurtfulness of sarcasm. Expert Systems with Applications, 193:116398.
Frenda, S., V. Patti, y P. Rosso. 2022. Killing me softly: Creative and cognitive aspects of implicitness in abusive language online. Natural Language Engineering, páginas 1–22.
Gitari, N. D., Z. Zuping, H. Damien, y J. Long. 2015. A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering, 10(4):215–230.
Gutiérrez-Fandiño, A., J. Armengol Estapé, M. Pàmies, J. Llop-Palao, J. Silveira-Ocampo, C. P. Carrino, A. Gonzalez-Agirre, C. Armentano-Oller, C. Rodriguez-Penagos, y M. Villegas. 2021. Spanish language models. arXiv preprint arXiv:2107.07253.
Liu, Y., M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, y V. Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
Martins, R., M. Gomes, J. J. Almeida, P. Novais, y P. Henriques. 2018. Hate speech classification in social media using emotional analysis. Proceedings - 2018 Brazilian Conference on Intelligent Systems, BRACIS 2018, páginas 61–66, 12.
Mathew, B., P. Saha, S. M. Yimam, C. Biemann, P. Goyal, y A. Mukherjee. 2021. Hatexplain: A benchmark dataset for explainable hate speech detection. En Proceedings of the AAAI Conference on Artificial Intelligence, volumen 35, páginas 14867–14875.
McMenamin, G. R. 2017. Introducción a la lingüística forense: un libro de curso. Press at California State University, Fresno.
Nielsen, L. B. 2002. Subtle, pervasive, harmful: Racist and sexist remarks in public as hate speech. Journal of Social Issues, 58:265–280, 1.
Nobata, C., J. Tetreault, A. Thomas, Y. Mehdad, y Y. Chang. 2016. Abusive language detection in online user content. En Proceedings of the 25th international conference on world wide web, páginas 145–153.
Olteanu, A., C. Castillo, J. Boy, y K. Varshney. 2018. The effect of extremist violence on hateful speech online. En Proceedings of the international AAAI conference on web and social media, volumen 12.
Ott, B. L. 2017. The age of twitter: Donald j. trump and the politics of debasement. Critical studies in media communication, 34(1):59–68.
Plaza-Del-Arco, F.-M., M. D. Molina-González, L. A. Ureña-López, y M. T. Martín-Valdivia. 2020. Detecting misogyny and xenophobia in spanish tweets using language technologies. ACM Transactions on Internet Technology (TOIT), 20(2):1–19.
Plaza-del Arco, F. M., A. B. P. Portillo, P. L. Úbeda, B. Gil, y M.-T. Martín-Valdivia. 2022. Share: A lexicon of harmful expressions by spanish speakers. En Proceedings of the Thirteenth Language Resources and Evaluation Conference, páginas 1307–1316.
Poletto, F., V. Basile, M. Sanguinetti, C. Bosco, y V. Patti. 2021. Resources and benchmark corpora for hate speech detection: a systematic review. Language Resources and Evaluation, 55(2):477–523.
Qian, J., M. ElSherief, E. Belding, y W. Y. Wang. 2019. Learning to decipher hate symbols. arXiv preprint arXiv:1904.02418.
Rosenthal, S., P. Atanasova, G. Karadzhov, M. Zampieri, y P. Nakov. 2020. A large-scale semi-supervised dataset for offensive language identification. arXiv preprint arXiv:2004.14454.
Salado, M. R. 2022. Análisis lingüístico del discurso de odio en redes sociales. VISUAL REVIEW. International Visual Culture Review/Revista Internacional de Cultura Visual, 9(Monográfico):1–11.
Sánchez-Junquera, J., P. Rosso, M. Montes, B. Chulvi, y others. 2021. Masking and bert-based models for stereotype identification. Procesamiento del Lenguaje Natural, 67:83–94.
Sarkar, D., M. Zampieri, T. Ranasinghe, y A. Ororbia. 2021. Fbert: A neural transformer for identifying offensive content. arXiv preprint arXiv:2109.05074.
Song, B., C. Pan, S. Wang, y Z. Luo. 2021. Deepblueai at semeval-2021 task 7: Detecting and rating humor and offense with stacking diverse language model-based methods. En Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), páginas 1130–1134.
Sood, S. O., E. F. Churchill, y J. Antin. 2012. Automatic identification of personal insults on social news sites. Journal of the American Society for Information Science and Technology, 63:270–285, 2.
Stenetorp, P., S. Pyysalo, G. Topic, T. Ohta, S. Ananiadou, y J. Tsujii. 2012. Brat: a web-based tool for nlp-assisted text annotation. En Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, páginas 102–107.
Tiedemann, J. 2012. Parallel data, tools and interfaces in OPUS. En Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), páginas 2214–2218, Istanbul, Turkey, may. European Language Resources Association (ELRA).
WeAreSocial y Hootsuite. 2022. Digital report España 2022: Nueve de cada diez españoles usan las redes sociales y pasan casi dos horas al día en ellas.
Wiegand, M., J. Ruppenhofer, A. Schmidt, y C. Greenberg. 2018. Inducing a lexicon of abusive words–a feature-based approach. En Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), páginas 1046–1056.
Xu, J.-M., K.-S. Jun, X. Zhu, y A. Bellmore. 2012. Learning from bullying traces in social media. En Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies, páginas 656–666.

Data source: Dialnet