Publication:
Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings

Authors
García Díaz, José Antonio ; Cánovas-García, Mar ; Colomo-Palacios, Ricardo ; Valencia García, Rafael
Secondary author
Facultad de Informática
Publisher
Elsevier
DOI
https://doi.org/10.1016/j.future.2020.08.032
Type
info:eu-repo/semantics/article
Description
Abstract
Online social networks allow powerless people to gain enormous amounts of control over particular people's lives and profit from the anonymity or social distance that the Internet provides in order to harass other people. One of the most frequently targeted groups comprise women, as misogyny is, unfortunately, a reality in our society. However, although great efforts have recently been made to identify misogyny, it is still difficult to distinguish as it can sometimes be very subtle and deep, signifying that the use of statistical approaches is not sufficient. Moreover, as Spanish is spoken worldwide, context and cultural differences can complicate this identification. Our contribution to the detection of misogyny in Spanish is two-fold. On the one hand, we apply Sentiment Analysis and Social Computing technologies for detecting misogynous messages in Twitter. On the other, we have compiled the Spanish MisoCorpus-2020, a balanced corpus regarding misogyny in Spanish, and classified it into three subsets concerning (1) violence towards relevant women, (2) messages harassing women in Spanish from Spain and Spanish from Latin America, and (3) general traits related to misogyny. Our proposal combines a classification based on average word embeddings and linguistic features in order to understand which linguistic phenomena principally contribute to the identification of misogyny. We have evaluated our proposal with three machine-learning classifiers, achieving the best accuracy of 85.175%. Finally the proposed approach is also validated with existing corpora for misogyny and aggressiveness detection such as AMI and HatEval obtaining good results.
Citation
José Antonio García-Díaz, Mar Cánovas-García, Ricardo Colomo-Palacios, Rafael Valencia-García, Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings, Future Generation Computer Systems, Volume 114, 2021, Pages 506-518, ISSN 0167-739X, https://doi.org/10.1016/j.future.2020.08.032
Collections