Annotating reliability to improve the disinformation detection task: annotation scheme, resource and evaluation
ISSN: 1135-5948
Year of publication: 2023
Issue: 70
Pages: 15-26
Type: Article
Published in: Procesamiento del Lenguaje Natural
Abstract
Disinformation is a critical problem in our society. The COVID-19 pandemic and the war between Russia and Ukraine have been key scenarios for the spread of fake news. Starting from the premise that fake news mixes reliable and unreliable information, we propose RUNAS (Reliable and Unreliable Annotation Scheme), a fine-grained annotation scheme that labels the structural parts and the essential content elements of a news item and allows them to be classified as Reliable or Unreliable. This annotation will be used to train systems that automatically classify the reliability of a news item. To this end, the Spanish-language RUN corpus was built and annotated with RUNAS. A set of experiments was carried out to validate the annotation scheme. The experiments support the validity of the proposed scheme, with a best F1m of 0.948.
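The abstract reports results as "F1m" without defining it here; a common reading is the macro-averaged F1 score, i.e. the unweighted mean of the per-class F1 over the two classes Reliable and Unreliable. The sketch below assumes that reading. The function names `f1_for_label` and `macro_f1` and the toy label lists are hypothetical illustrations, not code or data from the RUN corpus or the authors' experiments.

```python
from typing import Sequence


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 score treating `label` as the positive class (hypothetical helper)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # Precision or recall is zero (or undefined), so F1 is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(
    gold: Sequence[str],
    pred: Sequence[str],
    labels: Sequence[str] = ("Reliable", "Unreliable"),
) -> float:
    """Unweighted mean of per-label F1 (assumed meaning of "F1m")."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


# Hypothetical toy labels, not data from the RUN corpus.
gold = ["Reliable", "Reliable", "Unreliable", "Unreliable", "Reliable"]
pred = ["Reliable", "Unreliable", "Unreliable", "Unreliable", "Reliable"]
print(round(macro_f1(gold, pred), 3))
```

Because every class counts equally, macro-F1 is not dominated by whichever class is more frequent, which matters when Reliable and Unreliable elements are unevenly distributed. For these toy lists, scikit-learn's `f1_score(gold, pred, average="macro")` should yield the same value.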