PoeTree. Poetry Treebanks in Czech, English, French, German, Hungarian, Italian, Portuguese, Russian and Spanish

  1. Plecháč, Petr 1
  2. Kolár, Robert 1
  3. Cinková, Silvie 1, 2
  4. Šeļa, Artjoms 3, 4
  5. De Sisto, Mirella 5
  6. Nugues, Lara 6
  7. Haider, Thomas 7
  8. Nagy, Benjamin 3
  9. Delente, Éliane 8
  10. Renault, Richard 8
  11. Bobenhausen, Klemens 9
  12. Hammerich, Benjamin 9
  13. Mittmann, Adiel 10
  14. Palkó, Gábor 11
  15. Horváth, Péter 11
  16. Navarro Colorado, Borja 12
  17. Ruiz Fabo, Pablo 13
  18. Bermúdez Sabel, Helena 14
  19. Korchagin, Kirill 15
  20. Plungian, Vladimir 15, 16
  21. Sitchinava, Dmitri 17
  1. 1 Czech Academy of Sciences, Institute of Czech Literature
  2. 2 Charles University in Prague
    Prague, Czech Republic

    ROR https://ror.org/024d6js02

  3. 3 The Institute of the Polish Language of the Polish Academy of Sciences
  4. 4 University of Tartu
    Tartu, Estonia

    ROR https://ror.org/03z77qz90

  5. 5 Tilburg University
    Tilburg, Netherlands

    ROR https://ror.org/04b8v1s79

  6. 6 University of Basel
    Basel, Switzerland

    ROR https://ror.org/02s6k3f65

  7. 7 University of Passau
    Passau, Germany

    ROR https://ror.org/05ydjnb78

  8. 8 Université de Caen Normandie
  9. 9 Metricalizer
  10. 10 Universidade Federal de Santa Catarina
    Florianópolis, Brazil

    ROR https://ror.org/041akq887

  11. 11 Eötvös Loránd University
    Budapest, Hungary

    ROR https://ror.org/01jsq2704

  12. 12 Universitat d'Alacant
    Alicante, Spain

    ROR https://ror.org/05t8bcz72

  13. 13 University of Strasbourg
    Strasbourg, France

    ROR https://ror.org/00pg6eq24

  14. 14 Jinntec GmbH
  15. 15 Institute of Russian Language, Russian Academy of Sciences
  16. 16 Institute of Linguistics, Russian Academy of Sciences
  17. 17 University of Potsdam
    Potsdam, Germany

    ROR https://ror.org/03bnmw459

Publisher: Zenodo

Year of publication: 2023

Type: Dataset

Abstract

PoeTree (Poetry Treebanks) is a dataset comprising over 300,000 poems / 84,000,000 tokens in nine languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Spanish, and Russian). Each corpus has been deduplicated, enriched with Universal Dependencies, provided with additional metadata and converted into a unified JSON structure (schema available at https://versologie.cz/poetree/json-schema).

  cs (~80k poems) derived from Corpus of Czech Verse
  de (~50k poems) derived from Metricalizer
  en (~40k poems) based on the texts from Project Gutenberg
  es (~9k poems) derived from Corpus of Spanish Golden-Age Sonnets and Diachronic Spanish Sonnet Corpus
  fr (~18k poems) derived from Malherbə
  hu (~13k poems) derived from ELTE Poetry Corpus
  it (~40k poems) derived from Biblioteca Italiana
  pt (~5k poems) derived from Poemas
  ru (~45k poems) derived from Corpus of Russian Poetry
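
Because every corpus shares one JSON structure, a first look at a downloaded corpus can be done without knowing any field names. The following is a minimal sketch under stated assumptions, not part of the dataset's own tooling: it assumes the corpus has been unpacked into a local directory of .json files, each holding one JSON object. The function name summarize_corpus and the directory layout are hypothetical; the actual field names should be taken from the published schema.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_corpus(corpus_dir):
    """Count JSON files under corpus_dir and tally their top-level keys.

    Assumption (not from the dataset description): each .json file
    holds a single JSON object. Files that contain something other
    than an object are still counted, but contribute no keys.
    """
    n_files = 0
    key_counts = Counter()
    for path in sorted(Path(corpus_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            record = json.load(fh)
        n_files += 1
        if isinstance(record, dict):
            key_counts.update(record.keys())
    return n_files, key_counts
```

Calling summarize_corpus("poetree/cs"), where "poetree/cs" is a hypothetical local path, would return the number of files found together with a Counter showing how often each top-level key occurs. That is a quick way to check a download against the schema before writing any field-specific code.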