García Díaz, José Antonio

Department

Universidad de Murcia. Departamento de Informática y Sistemas

Search Results

Now showing 1 - 10 of 14

Open Access
Evaluation of transformer models for financial targeted sentiment analysis in Spanish
(PeerJ, 2023-05-09) Pan, Ronghao; García Díaz, José Antonio; García Sánchez, Francisco; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
Open Access
UMUCorpusClassifier: compilation and evaluation of linguistic corpus for Natural Language Processing tasks
(Sociedad Española de Procesamiento del Lenguaje Natural, 2020) Almela, Ángela; García Díaz, José Antonio; Alcaraz Mármol, Gema; Valencia García, Rafael; Filología Inglesa
The development of an annotated corpus is a very time-consuming task. Although some researchers have proposed the automatic annotation of a corpus based on ad-hoc heuristics, valid hypotheses cannot always be made. Even when the annotation process is performed by human annotators, the quality of the corpus is heavily influenced by disagreements between annotators or with themselves. Therefore, the lack of supervision of the annotation process can lead to a poor-quality corpus. In this work, we propose a demonstration of UMUCorpusClassifier, an NLP tool that helps researchers compile corpora as well as coordinate and supervise the annotation process. This tool eases the daily supervision process and makes it possible to detect deviations and inconsistencies during the early stages of the annotation process.
Open Access
Hope speech detection in Spanish. The LGBT case
(Springer, 2023-03-17) García‑Baena, Daniel; García‑Cumbreras, Miguel Ángel; Jiménez‑Zafra, Salud María; García Díaz, José Antonio; Valencia García, Rafael; Informática y Sistemas; Facultad de Informática
In recent years, systems have been developed to monitor online content and remove abusive, offensive or hateful content. Comments in online social media have been analyzed to find and stop the spread of negativity using methods such as hate speech detection, identification of offensive language or detection of abusive language. We define hope speech as the type of speech that is able to relax a hostile environment and that helps, offers suggestions to and inspires people for good in times of illness, stress, loneliness or depression. Detecting it automatically, in order to give greater diffusion to positive comments, can have a very significant effect when it comes to fighting against sexual or racial discrimination or when we intend to foster less bellicose environments. In this article we perform a complete study on hope speech in Spanish, analyzing existing solutions and available resources. In addition, we have generated a quality resource, a new Twitter dataset on the LGBT community, and we have conducted some experiments that can serve as a baseline for further research.
Open Access
Psychographic traits identification based on political ideology: an author analysis study on Spanish politicians’ tweets posted in 2020
(Elsevier, 2022-05) García Díaz, José Antonio; Colomo Palacios, Ricardo; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
In general, people are usually more reluctant to follow advice and directions from politicians who do not share their ideology. In extreme cases, people can be heavily biased in favour of a political party while being in sharp disagreement with others, which may lead to irrational decision making and can put people’s lives at risk by ignoring certain recommendations from the authorities. Therefore, considering political ideology as a psychographic trait can improve political micro-targeting by helping public authorities and local governments to adopt better communication policies during crises. In this work, we explore the reliability of determining psychographic traits concerning political ideology. Our contribution is twofold. On the one hand, we release the PoliCorpus-2020, a dataset composed of Spanish politicians’ tweets posted in 2020. On the other hand, we conduct two authorship analysis tasks with the aforementioned dataset: an author profiling task to extract demographic and psychographic traits, and an authorship attribution task to determine the author of an anonymous text in the political domain. Both experiments are evaluated with several neural network architectures grounded on explainable linguistic features, statistical features, and state-of-the-art transformers. In addition, we test whether the neural network models can be transferred to detect the political ideology of citizens. Our results indicate that the linguistic features are good indicators for identifying fine-grained political affiliation, that they boost the performance of neural network models when combined with embedding-based features, and that they preserve relevant information when the models are tested with ordinary citizens. In addition, we found that lexical and morphosyntactic features are more effective in author profiling, whereas stylometric features are more effective in authorship attribution.
Open Access
Smart analysis of economics sentiment in Spanish based on linguistic features and transformers
(IEEE, 2023-02-10) García Díaz, José Antonio; García-Sánchez, Francisco; Valencia García, Rafael; Informática y Sistemas; Facultad de Informática
Texts related to economics and finance are characterized by the use of words and expressions whose meaning (and the sentiments they convey) substantially depend on the context. This poses a major challenge to Natural Language Processing tasks in general, and Sentiment Analysis in particular. For low-resource languages such as Spanish, this situation becomes even more acute. Yet, the latest advancements in the field, including word embeddings and transformers, have made it possible to boost the performance of Sentiment Analysis solutions. In this work we explore the impact of the combination of different feature sets on the accuracy of Sentiment Analysis in Spanish financial texts. For this, a corpus with 15,915 tweets has been compiled and manually annotated as either positive, negative, or neutral. Then, feature sets based on contextual and non-contextual embeddings along with linguistic features were evaluated both individually and combined. The best results, with a weighted F1-score of 73.15880%, were obtained with a combination of feature sets by means of knowledge integration.
Open Access
Fine grain emotion analysis in Spanish using linguistic features and transformers
(PeerJ, 2024-04-30) Salmerón Ríos, Alejandro; García Díaz, José Antonio; Pan, Ronghao; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
Mental health issues are a global concern, with a particular focus on the rise of depression. Depression affects millions of people worldwide and is a leading cause of suicide, particularly among young people. Recent surveys indicate an increase in cases of depression during the COVID-19 pandemic, which affected approximately 5.4% of the population in Spain in 2020. Social media platforms such as X (formerly Twitter) have become important hubs for health information as more people turn to these platforms to share their struggles and seek emotional support. Researchers have discovered a link between emotions and mental illnesses such as depression. This correlation provides a valuable opportunity for automated analysis of social media data to detect changes in mental health status that might otherwise go unnoticed, thus preventing more serious health consequences. Therefore, this research explores the field of emotion analysis in Spanish towards mental disorders. There are two contributions in this area. The first is the compilation, translation, evaluation and correction of a novel dataset composed of a mixture of existing datasets from the literature. This dataset covers a total of 16 emotions, with an emphasis on negative emotions. The second is the in-depth evaluation of this novel dataset with several state-of-the-art transformers based on encoder-only and encoder-decoder architectures. The analysis comprises monolingual, multilingual and distilled models as well as feature integration techniques. The best results are obtained with the encoder-only MarIA model, with a macro-average F1 score of 60.4771%.
Open Access
Compilation and evaluation of the Spanish SatiCorpus 2021 for satire identification using linguistic features and transformers
(Springer, 2021-12-17) García Díaz, José Antonio; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
Open Access
Overview of FinancES 2023: Financial Targeted Sentiment Analysis in Spanish
(Sociedad Española de Procesamiento del Lenguaje Natural, 2023-09) Almela, Ángela; García Díaz, José Antonio; García Sánchez, Francisco; Alcaraz Mármol, Gema; Marín Pérez, María José; Valencia García, Rafael; Filología Inglesa
This paper presents the FinancES 2023 shared task, organized in the IberLEF 2023 workshop, within the framework of the 39th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2023). The aim of this task is to extend the challenge of sentiment analysis in Spanish to the financial domain, in order to extract the sentiment that a piece of financial information can have for several actors, including the main economic target (i.e., the specific company or asset where the economic fact applies), other companies (i.e., the entities producing the goods and services that others consume) and consumers (i.e., households/individuals). Specifically, two tasks are proposed and evaluated separately. The first is to identify the main target and to determine the sentiment polarity towards that target; the second is to assess the sentiment towards both other companies and consumers. The ranking includes results for 10 different teams proposing novel approaches, mostly based on Transformers and generative language models.
Open Access
Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings
(Elsevier, 2020-08-22) García Díaz, José Antonio; Cánovas-García, Mar; Colomo-Palacios, Ricardo; Valencia García, Rafael; Informática y Sistemas; Facultad de Informática
Online social networks allow powerless people to gain enormous amounts of control over particular people's lives and profit from the anonymity or social distance that the Internet provides in order to harass other people. One of the most frequently targeted groups comprises women, as misogyny is, unfortunately, a reality in our society. However, although great efforts have recently been made to identify misogyny, it is still difficult to distinguish as it can sometimes be very subtle and deep, signifying that the use of statistical approaches is not sufficient. Moreover, as Spanish is spoken worldwide, context and cultural differences can complicate this identification. Our contribution to the detection of misogyny in Spanish is two-fold. On the one hand, we apply Sentiment Analysis and Social Computing technologies for detecting misogynous messages on Twitter. On the other, we have compiled the Spanish MisoCorpus-2020, a balanced corpus regarding misogyny in Spanish, and classified it into three subsets concerning (1) violence towards relevant women, (2) messages harassing women in Spanish from Spain and Spanish from Latin America, and (3) general traits related to misogyny. Our proposal combines a classification based on average word embeddings and linguistic features in order to understand which linguistic phenomena principally contribute to the identification of misogyny. We have evaluated our proposal with three machine-learning classifiers, achieving the best accuracy of 85.175%. Finally, the proposed approach is also validated with existing corpora for misogyny and aggressiveness detection such as AMI and HatEval, obtaining good results.
Open Access
Ontology-driven aspect-based sentiment analysis classification: an infodemiological case study regarding infectious diseases in Latin America
(Elsevier, 2020-06-14) García Díaz, José Antonio; Cánovas García, Mar; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
Infodemiology is the process of mining unstructured and textual data so as to provide public health officials and policymakers with valuable information regarding public health. The appearance of this new data source, which was previously unimaginable, has opened up a new way in which to improve public health systems, resulting in better communication policies and better detection systems. However, the unstructured nature of the Internet, along with the complexity of the infectious disease domain, prevents the information extracted from being easily understood. Moreover, when dealing with languages other than English, for which some of the most common Natural Language Processing resources are not available, the correct exploitation of this data becomes even more difficult. We intend to fill these gaps by proposing an ontology-driven aspect-based sentiment analysis with which to measure the general public’s opinions regarding infectious diseases when expressed in Spanish, employing a case study of tweets concerning the Zika, Dengue and Chikungunya viruses in Latin America. Our proposal is based on two technologies. We first use ontologies in order to model the infectious disease domain with concepts such as risks, symptoms, transmission methods or drugs, among others. We then measure the relationship between these concepts in order to determine the degree to which one concept influences other concepts. This new information is subsequently applied in order to build an aspect-based sentiment analysis model based on statistical and linguistic features. This is done by applying deep-learning models. Our proposal is available on a web platform, where users can see the sentiment for each concept at a glance and analyse how each concept influences the sentiment of the others.