DigitalUM :: Browsing by Subject "Text classification"


Open Access
Compilation and evaluation of the Spanish SatiCorpus 2021 for satire identification using linguistic features and transformers
(Springer, 2021-12-17) García Díaz, José Antonio; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
Open Access
Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings
(Elsevier, 2020-08-22) García Díaz, José Antonio; Cánovas-García, Mar; Colomo-Palacios, Ricardo; Valencia García, Rafael; Informática y Sistemas; Facultad de Informática
Online social networks allow powerless people to gain enormous amounts of control over particular people's lives and profit from the anonymity or social distance that the Internet provides in order to harass other people. One of the most frequently targeted groups comprises women, as misogyny is, unfortunately, a reality in our society. However, although great efforts have recently been made to identify misogyny, it is still difficult to distinguish as it can sometimes be very subtle and deep, signifying that the use of statistical approaches is not sufficient. Moreover, as Spanish is spoken worldwide, context and cultural differences can complicate this identification. Our contribution to the detection of misogyny in Spanish is two-fold. On the one hand, we apply Sentiment Analysis and Social Computing technologies for detecting misogynous messages in Twitter. On the other, we have compiled the Spanish MisoCorpus-2020, a balanced corpus regarding misogyny in Spanish, and classified it into three subsets concerning (1) violence towards relevant women, (2) messages harassing women in Spanish from Spain and Spanish from Latin America, and (3) general traits related to misogyny. Our proposal combines a classification based on average word embeddings and linguistic features in order to understand which linguistic phenomena principally contribute to the identification of misogyny. We have evaluated our proposal with three machine-learning classifiers, achieving the best accuracy of 85.175%. Finally, the proposed approach is also validated with existing corpora for misogyny and aggressiveness detection such as AMI and HatEval, obtaining good results.
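The abstract above mentions combining average word embeddings with linguistic features. A minimal sketch of that general idea follows, assuming a toy embedding table and two made-up linguistic feature values; the function names (average_embedding, build_features), the vectors, and the feature values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def average_embedding(tokens: List[str],
                      embeddings: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Average the vectors of the tokens found in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known token: fall back to a zero vector of the right size.
        return [0.0] * dim
    # zip(*vectors) iterates over the vectors dimension by dimension.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_features(tokens: List[str],
                   embeddings: Dict[str, List[float]],
                   dim: int,
                   linguistic: List[float]) -> List[float]:
    """Concatenate the averaged embedding with hand-crafted features."""
    return average_embedding(tokens, embeddings, dim) + list(linguistic)


# Toy data (assumption): 2-dimensional embeddings and two linguistic
# features, e.g. a ratio and a count. Unknown tokens are skipped.
toy_embeddings = {"hola": [1.0, 0.0], "mundo": [0.0, 1.0]}
features = build_features(["hola", "mundo", "desconocido"],
                          toy_embeddings, 2, [0.25, 3.0])
```

The resulting vector could then be passed to any standard classifier; which classifiers and which linguistic features the authors actually used is described in the paper itself, not here.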
Open Access
Evaluating feature combination strategies for hate-speech detection in Spanish using linguistic features and transformers
(Springer, 2023) García Díaz, José Antonio; Jiménez Zafra, Salud María; García Cumbreras, Miguel Ángel; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
The rise of social networks has allowed misogynistic, xenophobic, and homophobic people to spread their hate-speech to intimidate individuals or groups because of their gender, ethnicity or sexual orientation. The consequences of hate-speech are devastating, causing severe depression and even leading people to commit suicide. Hate-speech identification is challenging as the large amount of daily publications makes it impossible to review every comment by hand. Moreover, hate-speech is also spread by hoaxes that require language and context understanding. With the aim of reducing the number of comments that should be reviewed by experts, or even for the development of autonomous systems, the automatic identification of hate-speech has gained academic relevance. However, the reliability of automatic approaches is still limited, specifically in languages other than English, in which some of the state-of-the-art techniques have not been analyzed in detail. In this work, we examine which features are most effective in identifying hate-speech in Spanish and how these features can be combined to develop more accurate systems. In addition, we characterize the language present in each type of hate-speech by means of explainable linguistic features and compare our results with state-of-the-art approaches. Our research indicates that combining linguistic features and transformers by means of knowledge integration outperforms current solutions regarding hate-speech identification in Spanish.
Open Access
Spanish MEACorpus 2023: a multimodal speech–text corpus for emotion analysis in Spanish from natural environments
(Elsevier, 2024-08) Pan, Ronghao; García Díaz, José Antonio; Rodríguez García, Miguel Ángel; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
In human–computer interaction, emotion recognition provides a deeper understanding of the user’s emotions, enabling empathetic and effective responses based on the user’s emotional state. While deep learning models have improved emotion recognition solutions, it is still an active area of research. One important limitation is that most emotion recognition systems use only text as input, ignoring features such as voice intonation. Another limitation is the small number of datasets available for multimodal emotion recognition. In addition, most published datasets contain emotions that are simulated by professionals and produce limited results in real-world scenarios. In languages other than English, such as Spanish, hardly any datasets are available. Therefore, our contributions to emotion recognition are as follows. First, we compile and annotate a new corpus for multimodal emotion recognition in Spanish (Spanish MEACorpus 2023), which contains 13.16 h of speech divided into 5129 segments labeled by considering Ekman’s six basic emotions. The dataset is extracted from YouTube videos in natural environments. Second, we explore several deep learning models for emotion recognition using text- and audio-based features. Third, we evaluate different multimodal techniques to build a multimodal recognition system that improves the results of unimodal models, achieving a Macro F1-score of 87.745% using a late-fusion approach with a concatenation strategy.
Open Access
Spanish MTLHateCorpus 2023: multi-task learning for hate speech detection to identify speech type, target, target group and intensity
(Elsevier, 2025-08) Pan, Ronghao; García Díaz, José Antonio; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
The rise of digital communication has exacerbated the challenge of tackling harmful speech online, particularly hate speech, which dehumanises individuals or groups on the basis of traits such as race, gender or ethnicity. This study highlights the urgent need for fine-grained detection methods that take into account several subtasks of hate speech detection, including its intensity, the groups at which hate speech is directed, and whether the target is an individual or a group. Furthermore, there is a gap in comprehensive Spanish language corpora that cover these subtasks of hate speech detection. Therefore, we created a novel corpus entitled Spanish MTLHateCorpus 2023 to facilitate the analysis of hate speech in these subtasks and evaluated the effectiveness of a multi-task learning strategy with mBART and T5, comparing its results with other Large Language Models using Zero-Shot Learning as a lower bound and an ensemble based on the mode of several fine-tuned models as an upper bound. The results achieved by the Multi-Task Learning strategy demonstrated its potential to increase model versatility, allowing a single model to effectively tackle multiple tasks while achieving competitive results, particularly in target group recognition. However, the ensemble slightly outperforms the Multi-Task Learning strategy.
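The abstract above refers to an ensemble based on the mode of several models' outputs. The sketch below illustrates mode (majority) voting in general terms, assuming each model has already produced one label per document; the function name mode_vote, the label names, and the toy predictions are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import List


def mode_vote(predictions: List[List[str]]) -> List[str]:
    """Return, for each document, the most frequent label across models.

    `predictions` holds one list of labels per model, all of the same
    length. On a tie, Counter.most_common keeps the label that was
    encountered first, i.e. the one from the earliest model in the list.
    """
    ensemble = []
    # zip(*predictions) groups the labels that all models gave to one document.
    for labels in zip(*predictions):
        ensemble.append(Counter(labels).most_common(1)[0][0])
    return ensemble


# Toy predictions (assumption) from three models over three documents.
model_a = ["hate", "none", "hate"]
model_b = ["hate", "hate", "none"]
model_c = ["none", "none", "none"]
ensemble_labels = mode_vote([model_a, model_b, model_c])
```

With an odd number of models and two classes, ties cannot occur; with more classes or an even number of models, the tie-breaking rule becomes a design choice that should be stated explicitly.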
