Publications in collaboration with Antonio Toral Ruiz (34)

2024

  1. Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

    2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings

2023

  1. MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages

    Proceedings of the 24th Annual Conference of the European Association for Machine Translation, EAMT 2023

2022

  1. Building Domain-specific Corpora from the Web: the Case of European Digital Service Infrastructures

    Proceedings of the International Conference on Language Resources and Evaluation, LREC 2022 - 15th Workshop on Building and Using Comparable Corpora, BUCC 2022

  2. MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages

    EAMT 2022 - Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

2017

  1. A multifaceted evaluation of neural versus phrase-based machine translation for 9 language directions

    15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of Conference

  2. Crawl and crowd to bring machine translation to under-resourced languages

    Language Resources and Evaluation, Vol. 51, No. 4, pp. 1019-1051

  3. Final results of Abu-MaTran (automatic building of machine translation)

    20th Annual Conference of the European Association for Machine Translation, EAMT 2017

2016

  1. Abu-MaTran at WMT 2016 Translation Task: Deep Learning, Morphological Segmentation and Tuning on Character Sequences

    Proceedings of the Annual Meeting of the Association for Computational Linguistics

  2. Producing monolingual and parallel web corpora at the same time - SpiderLing and Bitextor's love affair

    Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016

  3. Producing monolingual and parallel web corpora at the same time: SpiderLing and Bitextor's love affair

    10th conference on International Language Resources and Evaluation (LREC'16) (European Language Resources Association), pp. 2949-2956

2015

  1. Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling

    10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Proceedings

  2. Automatic Acquisition of Machine Translation Resources in the Abu-MaTran Project

    Procesamiento del lenguaje natural, No. 55, pp. 185-188

2014

  1. Abu-MaTran at WMT 2014 Translation Task: Two-step Data Selection and RBMT-style Synthetic Rules

    Proceedings of the Annual Meeting of the Association for Computational Linguistics

  2. Extrinsic evaluation of web-crawlers in machine translation: A case study on Croatian-English for the tourism domain

    Proceedings of the 17th Annual Conference of the European Association for Machine Translation, EAMT 2014

2012

  1. Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon

    Language Resources and Evaluation, Vol. 46, No. 3, pp. 383-419

2009

  1. A study on Linking Wikipedia categories to WordNet synsets using text similarity

    International Conference Recent Advances in Natural Language Processing, RANLP

  2. Exploiting Wikipedia and EuroWordNet to solve Cross-Lingual Question Answering

    Information Sciences, Vol. 179, No. 20, pp. 3473-3488

2007

  1. Applying Wikipedia's multilingual knowledge to cross-lingual question answering

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

  2. GIR with geographic query expansion

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)