Publications by the researcher in collaboration with Antonio Toral Ruiz (35)


  1. Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

    2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings


  1. MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages

    Proceedings of the 24th Annual Conference of the European Association for Machine Translation, EAMT 2023


  1. Building Domain-specific Corpora from the Web: the Case of European Digital Service Infrastructures

    Proceedings of the International Conference on Language Resources and Evaluation, LREC 2022 - 15th Workshop on Building and Using Comparable Corpora, BUCC 2022

  2. MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages

    EAMT 2022 - Proceedings of the 23rd Annual Conference of the European Association for Machine Translation


  1. A multifaceted evaluation of neural versus phrase-based machine translation for 9 language directions

    15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of Conference

  2. Crawl and crowd to bring machine translation to under-resourced languages

    Language Resources and Evaluation, Vol. 51, No. 4, pp. 1019-1051

  3. Final results of Abu-MaTran (automatic building of machine translation)

    20th Annual Conference of the European Association for Machine Translation, EAMT 2017


  1. Abu-MaTran at WMT 2016 Translation Task: Deep Learning, Morphological Segmentation and Tuning on Character Sequences

    Proceedings of the Annual Meeting of the Association for Computational Linguistics

  2. Producing monolingual and parallel web corpora at the same time - SpiderLing and Bitextor's love affair

    Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016

  3. Producing monolingual and parallel web corpora at the same time: SpiderLing and Bitextor's love affair

    10th conference on International Language Resources and Evaluation (LREC'16) (European Language Resources Association), pp. 2949-2956


  1. Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling

    10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Proceedings

  2. Automatic Acquisition of Machine Translation Resources in the Abu-MaTran Project

    Procesamiento del lenguaje natural, No. 55, pp. 185-188


  1. Abu-MaTran at WMT 2014 Translation Task: Two-step Data Selection and RBMT-style Synthetic Rules

    Proceedings of the Annual Meeting of the Association for Computational Linguistics

  2. Extrinsic evaluation of web-crawlers in machine translation: A case study on Croatian-English for the tourism domain

    Proceedings of the 17th Annual Conference of the European Association for Machine Translation, EAMT 2014


  1. Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon

    Language Resources and Evaluation, Vol. 46, No. 3, pp. 383-419


  1. A study on Linking Wikipedia categories to WordNet synsets using text similarity

    International Conference Recent Advances in Natural Language Processing, RANLP

  2. Enrichment of language resources by exploiting new text and the resources themselves: a case study on the acquisition of a NE lexicon


  3. Exploiting Wikipedia and EuroWordNet to solve Cross-Lingual Question Answering

    Information Sciences, Vol. 179, No. 20, pp. 3473-3488


  1. Applying Wikipedia's multilingual knowledge to cross-lingual question answering

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)