The Center for Research in
Urdu Language Processing (CRULP) is pleased to release
Urdu and Nepali corpora parallel to 100,000 words of
common English source text from the Penn Treebank corpus,
available through the Linguistic Data Consortium (LDC).
The text files used are listed in the README files
provided for each corpus. The corpora are also tagged
for part of speech.
The work has been
supported by the Language Resource Association (GSK)
of Japan and the International Development Research Center (IDRC)
of Canada, through the PAN Localization project (www.PANL10n.net).