Modeling and analyzing opinions from customer reviews

  1. García Moya, Lisette
Supervised by:
  1. Rafael Berlanga Supervisor

Defense university: Universitat Jaume I

Defense date: 11 January 2016

Committee:
  1. Paolo Rosso Chair
  2. David Tomás Díaz Secretary
  3. José Antonio Troyano Jiménez Member

Type: Thesis

Teseo: 397715

Abstract

The main motivation behind this thesis is the problem of aspect-based sentiment summarization and its application to Business Intelligence (BI). Given a collection of opinion posts, aspect-based summarization consists of extracting from the collection the most relevant opined aspects (also called features) along with their associated sentiment information (usually an opinion word and/or a polarity score that expresses the sentiment orientation of the opinion). In the current e-commerce scenario, we presume that BI could rely on knowledge extracted from reviews available on the Web in order to analyze recent trends, as well as the satisfaction and behavior of customers, and to prepare strategic plans accordingly.

Specifically, this thesis proposes new methodologies to:

  - model and extract the opinions and their respective targets (i.e., aspects or features) from collections of opinion posts, and
  - integrate the extracted sentiment data into a traditional corporate data warehouse to enable BI.

The modeling of opinions and their targets takes place in the general framework of statistical language modeling. The hypothesis is that there exists a language model of opinion words able to model the opinion lexicon of a domain, and that there is also a language model of aspects that can be learned from the model of opinions. Both the learning of the models and the extraction of the sentiment data (i.e., the feature-opinion tuples) are implemented using unsupervised approaches that do not need exhaustive natural language processing (only POS tagging and lemmatization). The resulting methodologies can be applied to any language and domain, given a seed set of general-domain opinion words.

For the integration of sentiment data with traditional corporate data, two scenarios are considered: a static one, in which both the data sources and the user requirements are static and known in advance, and a dynamic one, based on an open data infrastructure where BI data can be linked to external sources on demand without being tied to predefined (rigid) data structures or multidimensional schemas.

We demonstrate our proposal on datasets of real opinions available on the Web. The results corroborate the claims of the thesis and show good effectiveness when the proposed methods are used as a BI analysis tool.
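
To make the notion of a feature-opinion tuple concrete, the following is a minimal sketch and not the method developed in the thesis. It assumes sentences are already POS-tagged with a coarse "NOUN" tag, uses a hypothetical seed lexicon, and replaces the statistical language models described above with a naive nearest-noun heuristic. The names `SEED_OPINION_WORDS`, `extract_tuples`, and `count_tuples` are illustrative only.

```python
from collections import Counter

# Hypothetical seed set of general-domain opinion words (an assumption,
# not the lexicon used in the thesis).
SEED_OPINION_WORDS = {"great", "poor", "excellent", "bad"}


def extract_tuples(tagged_sentence, seeds=SEED_OPINION_WORDS, window=3):
    """Pair each seed opinion word with the nearest noun within `window` tokens.

    `tagged_sentence` is a list of (word, pos_tag) pairs; nouns are assumed
    to carry the tag "NOUN". This is a naive proximity heuristic standing in
    for the language-model-based extraction summarized in the abstract.
    """
    pairs = []
    for i, (word, _tag) in enumerate(tagged_sentence):
        if word.lower() not in seeds:
            continue
        nearest = None
        for j, (_other, other_tag) in enumerate(tagged_sentence):
            if other_tag != "NOUN" or not 0 < abs(i - j) <= window:
                continue
            # Keep the closest noun; on ties the earlier one wins.
            if nearest is None or abs(i - j) < abs(i - nearest):
                nearest = j
        if nearest is not None:
            pairs.append((tagged_sentence[nearest][0].lower(), word.lower()))
    return pairs


def count_tuples(tagged_sentences):
    """Aggregate (feature, opinion) tuples over a collection of sentences."""
    counts = Counter()
    for sentence in tagged_sentences:
        counts.update(extract_tuples(sentence))
    return counts
```

Aggregated counts of this kind are the sort of sentiment data that could then be loaded as facts into a data warehouse, although the schema and loading process are outside the scope of this sketch.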