Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings

García Díaz, José Antonio; Cánovas-García, Mar; Colomo-Palacios, Ricardo; Valencia García, Rafael


Files

FGCS_Special_issue_sentiment_2020___Misogyny.pdf (1 MB)

Date

2020-08-22

Linked author profiles

García Díaz, José Antonio

Valencia García, Rafael

Authors

García Díaz, José Antonio ; Cánovas-García, Mar ; Colomo-Palacios, Ricardo ; Valencia García, Rafael

Faculty

Facultad de Informática (Faculty of Computer Science)

Publisher

Elsevier

Department

Informática y Sistemas (Informatics and Systems)

DOI

https://doi.org/10.1016/j.future.2020.08.032

Type

Article (info:eu-repo/semantics/article)

Abstract

Online social networks allow powerless people to gain enormous amounts of control over particular people's lives and to profit from the anonymity or social distance that the Internet provides in order to harass others. Women are one of the most frequently targeted groups, as misogyny is, unfortunately, a reality in our society. However, although great efforts have recently been made to identify misogyny, it remains difficult to detect because it can be very subtle and deep-rooted, which means that statistical approaches alone are not sufficient. Moreover, as Spanish is spoken worldwide, contextual and cultural differences can further complicate its identification. Our contribution to the detection of misogyny in Spanish is twofold. On the one hand, we apply Sentiment Analysis and Social Computing technologies to detect misogynous messages on Twitter. On the other, we have compiled the Spanish MisoCorpus-2020, a balanced corpus of misogyny in Spanish, organized into three subsets concerning (1) violence towards relevant women, (2) messages harassing women in Spanish from Spain and Spanish from Latin America, and (3) general traits related to misogyny. Our proposal combines a classification based on average word embeddings and linguistic features in order to understand which linguistic phenomena contribute most to the identification of misogyny. We evaluated our proposal with three machine-learning classifiers, achieving a best accuracy of 85.175%. Finally, the proposed approach is also validated with existing corpora for misogyny and aggressiveness detection, such as AMI and HatEval, obtaining good results.
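The abstract describes combining average word embeddings with linguistic features. As an illustration only, the sketch below shows one way to build such a combined feature vector for a single tweet. It assumes a hypothetical three-dimensional toy embedding table (`EMBEDDINGS`) and three hypothetical surface features (token count, uppercase ratio, exclamation marks). These are placeholders and not the authors' actual feature set, embeddings, or classifiers, which this record does not specify.

```python
from typing import Dict, List

# Hypothetical toy embedding table (3 dimensions). A real system would load
# pretrained Spanish word vectors instead.
EMBEDDINGS: Dict[str, List[float]] = {
    "mujer": [0.2, 0.1, 0.0],
    "odio": [-0.6, 0.4, 0.2],
    "hoy": [0.0, 0.0, 0.4],
}
DIM = 3
PUNCTUATION = "¡!¿?.,;:\"'"


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    stripped = (t.strip(PUNCTUATION) for t in text.lower().split())
    return [t for t in stripped if t]


def average_embedding(tokens: List[str]) -> List[float]:
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(vec[i] for vec in known) / len(known) for i in range(DIM)]


def linguistic_features(text: str, tokens: List[str]) -> List[float]:
    """Hypothetical stand-ins for linguistic features: token count,
    ratio of uppercase letters, and number of exclamation marks."""
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return [float(len(tokens)), upper_ratio, float(text.count("!"))]


def tweet_vector(text: str) -> List[float]:
    """Concatenate the averaged embedding with the linguistic features."""
    tokens = tokenize(text)
    return average_embedding(tokens) + linguistic_features(text, tokens)


if __name__ == "__main__":
    print(tweet_vector("Odio a esa mujer!"))
```

Each tweet then becomes a fixed-length vector, here 3 embedding dimensions plus 3 features. Any standard machine-learning classifier could consume that vector. Keeping the two groups as separate blocks of the vector also makes it possible to inspect how much each linguistic feature contributes, which is the kind of analysis the abstract motivates.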

Subjects

Misogyny detection, Text classification, Natural language processing, Machine-learning

Citation

José Antonio García-Díaz, Mar Cánovas-García, Ricardo Colomo-Palacios, Rafael Valencia-García, Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings, Future Generation Computer Systems, Volume 114, 2021, Pages 506-518, ISSN 0167-739X, https://doi.org/10.1016/j.future.2020.08.032

URI

http://hdl.handle.net/10201/187629

Collections

Artículos


This item is subject to a Creative Commons license. http://creativecommons.org/licenses/by-nc-nd/4.0/
