Urdu-Nepali-English Parallel Corpus

Center for Language Engineering

The Center for Research in Urdu Language Processing (CRULP) is pleased to release Urdu and Nepali corpora that are parallel to 100,000 words of common English source text from the Penn Treebank corpus, which is available through the Linguistic Data Consortium (LDC). The text files used are listed in the README file provided for each corpus. The corpora are also tagged for part of speech. The work has been supported by the Language Resource Association (GSK) of Japan and the International Development Research Center (IDRC) of Canada, through the PAN Localization project (www.PANL10n.net).

Download
Urdu Corpus	Read Me	License
Urdu Corpus Extended	Read Me	License
POS Tagged Urdu Corpus	Read Me	License
Nepali Corpus	Read Me	License
POS Tagged Nepali Corpus	Read Me	License