Publications (42) by Miquel Esplà-Gomis

2024

  1. Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

    2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings

  2. Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation

    IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 46, No. 2, pp. 837-850

2023

  1. MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages

    Proceedings of the 24th Annual Conference of the European Association for Machine Translation, EAMT 2023

2022

  1. Building Domain-specific Corpora from the Web: the Case of European Digital Service Infrastructures

    Proceedings of the International Conference on Language Resources and Evaluation, LREC 2022 - 15th Workshop on Building and Using Comparable Corpora, BUCC 2022

  2. Cross-lingual neural fuzzy matching for exploiting target-language monolingual corpora in computer-aided translation

    Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022

  3. MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages

    EAMT 2022 - Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

2021

  1. Applying automatic translation for optical music recognition’s encoding step

    Applied Sciences (Switzerland), Vol. 11, No. 9

  2. Bicleaner at WMT 2020: Universitat d'Alacant-Prompsit's submission to the parallel corpus filtering shared task

    5th Conference on Machine Translation, WMT 2020 - Proceedings

  3. Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach

    EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

2020

  1. An English-Swahili parallel corpus and its use for neural machine translation in the news domain

    Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, EAMT 2020

  2. ParaCrawl: Web-Scale Acquisition of Parallel Corpora

    58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)

  3. Presentation of the monograph «Spoken Corpus Linguistics in Romance: thoughts, design and results»

    Caplletra: revista internacional de filologia, No. 69, pp. 117-123

2019

  1. Predicting insertion positions in word-level machine translation quality estimation

    Applied Soft Computing Journal, Vol. 76, pp. 174-192

2016

  1. Bitextor's participation in WMT'16: Shared task on document alignment

    Proceedings of the Annual Meeting of the Association for Computational Linguistics