PoeTree. Poetry Treebanks in Czech, English, French, German, Hungarian, Italian, Portuguese, Russian and Spanish
- Plecháč, Petr 1
- Kolár, Robert 1
- Cinková, Silvie 1, 2
- Šeļa, Artjoms 3, 4
- De Sisto, Mirella 5
- Nugues, Lara 6
- Haider, Thomas 7
- Nagy, Benjamin 3
- Delente, Éliane 8
- Renault, Richard 8
- Bobenhausen, Klemens 9
- Hammerich, Benjamin 9
- Mittmann, Adiel 10
- Palkó, Gábor 11
- Horváth, Péter 11
- Navarro Colorado, Borja 12
- Ruiz Fabo, Pablo 13
- Bermúdez Sabel, Helena 14
- Korchagin, Kirill 15
- Plungian, Vladimir 15, 16
- Sitchinava, Dmitri 17
- 1 Czech Academy of Sciences, Institute of Czech Literature
- 2 Charles University in Prague
- 3 The Institute of the Polish Language of the Polish Academy of Sciences
- 4 University of Tartu
- 5 Tilburg University
- 6 University of Basel
- 7 University of Passau
- 8 Université de Caen Normandie
- 9 Metricalizer
- 10 Universidade Federal de Santa Catarina
- 11 Eötvös Loránd University
- 12 Universitat d'Alacant
- 13 University of Strasbourg
- 14 Jinntec GmbH
- 15 Institute of Russian Language, Russian Academy of Sciences
- 16 Institute of Linguistics, Russian Academy of Sciences
- 17 University of Potsdam
Publisher: Zenodo
Publication year: 2023
Type: Dataset
Abstract
PoeTree (Poetry Treebanks) is a dataset comprising over 300,000 poems / 84,000,000 tokens in nine languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Spanish, and Russian). Each corpus has been deduplicated, enriched with Universal Dependencies, provided with additional metadata and converted into a unified JSON structure (schema available at https://versologie.cz/poetree/json-schema).
- cs (~80k poems) derived from Corpus of Czech Verse
- de (~50k poems) derived from Metricalizer
- en (~40k poems) based on the texts from Project Gutenberg
- es (~9k poems) derived from Corpus of Spanish Golden-Age Sonnets and Diachronic Spanish Sonnet Corpus
- fr (~18k poems) derived from Malherbə
- hu (~13k poems) derived from ELTE Poetry Corpus
- it (~40k poems) derived from Biblioteca Italiana
- pt (~5k poems) derived from Poemas
- ru (~45k poems) derived from Corpus of Russian Poetry
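The per-corpus sizes in the abstract are rounded, so they can only be checked roughly against the headline figures. The following is a minimal sketch of that arithmetic in Python; the variable names are illustrative and not part of the dataset, and the counts are the approximate values quoted in the abstract rather than exact totals.

```python
# Approximate poem counts per corpus, in thousands, as quoted in the abstract.
# These are rounded figures, not exact counts from the dataset itself.
approx_poems_k = {
    "cs": 80, "de": 50, "en": 40, "es": 9, "fr": 18,
    "hu": 13, "it": 40, "pt": 5, "ru": 45,
}

total_poems = sum(approx_poems_k.values()) * 1_000
total_tokens = 84_000_000  # token count stated in the abstract

# Rough average poem length implied by the two headline figures.
avg_tokens_per_poem = total_tokens / total_poems

# Share of each corpus in the whole dataset, largest first.
shares = sorted(
    ((lang, k * 1_000 / total_poems) for lang, k in approx_poems_k.items()),
    key=lambda item: item[1],
    reverse=True,
)

if __name__ == "__main__":
    print(f"languages: {len(approx_poems_k)}")
    print(f"approx. poems: {total_poems:,}")
    print(f"approx. tokens per poem: {avg_tokens_per_poem:.0f}")
    for lang, share in shares:
        print(f"{lang}: {share:.1%}")
```

The rounded counts sum to about 300,000 poems, which is consistent with the "over 300,000" stated in the abstract once rounding is taken into account, and the two headline figures imply an average on the order of a few hundred tokens per poem.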