Center for Language Engineering

 
 



 

 



 
 


 
   
  Urdu Parts of Speech Tagset  
     
  Release Notes  
 

Part-of-speech (POS) tagging is a fundamental component of most natural language processing systems. Developing a tagset for a language is the first step toward this task. The current Urdu tagset consists of syntactic categories and improves upon the earlier versions available (Muaz et al., 2009; Sajjad, 2007; Sajjad and Schmid, 2009). This POS tagset has been used to develop the CLE Urdu Digest POS Tagged Corpus 100K (available at: http://cle.org.pk/clestore/urdudigestcorpus100ktagged.htm), developed by the Center for Language Engineering, KICS, UET, Lahore.

This work was developed under the project grant for Essential Urdu Linguistic Resources (www.cle.org.pk/eulr), in collaboration with the University of Konstanz (http://www.uni-konstanz.de/), Germany, and was funded by the German Academic Exchange Service, DAAD (https://www.daad.org/), Germany.

 
     
  Download (accesses counted since 31 May 2013)  
 

Urdu Parts of Speech Tagset

   
 

Urdu Parts of Speech Tagset (Old)

License  
     
 

webmaster@cle.org.pk