Center for Language Engineering
KICS-UET

Urdu Most Frequently Used Ligatures List

Release Notes

The ligature list was extracted from a 19.3-million-word corpus. The corpus was gathered with the end user's perspective in mind and covers the wide range of domains listed in the following table.

Domain                         Sub-domains
C1. Sports/Games               C1.1. Sports (special events)
C2. News                       C2.1. Local and international affairs
                               C2.2. Editorials and opinions
C3. Finance                    C3.1. Business, domestic and foreign markets
C4. Culture/Entertainment      C4.1. Music, theatre, exhibitions, review articles on literature
                               C4.2. Travel/tourism
C5. Consumer Information       C5.1. Health
                               C5.2. Popular science
                               C5.3. Consumer technology
C6. Personal communications    C6.1. Emails, online discussions, editorials, e-zines
 
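These notes do not describe the extraction step itself. The following is a minimal sketch of how ligature frequencies could be counted, assuming a ligature is a maximal run of Urdu letters that join to one another. Under that assumption, a ligature ends after a letter that does not connect to the following letter (alef, dal, reh, waw, yeh barree and similar), or at a space, digit or punctuation mark. The set `NON_JOINING` and the functions `split_ligatures` and `ligature_frequencies` are hypothetical names chosen for illustration. CLE's own segmentation rules may differ.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Assumption: letters that never join to the FOLLOWING letter
# (Unicode joining type R or U), so a ligature ends after them.
NON_JOINING = frozenset(
    "\u0621"                     # hamza (joins on neither side)
    "\u0622\u0623\u0625\u0627"   # alef forms
    "\u0624\u0648"               # waw with hamza, waw
    "\u062F\u0630\u0688"         # dal, thal, ddal
    "\u0631\u0632\u0691\u0698"   # reh, zain, rreh, jeh
    "\u0629\u06C0\u06C3"         # teh marbuta, heh with yeh above, teh marbuta goal
    "\u06D2\u06D3"               # yeh barree, yeh barree with hamza above
)
HAMZA = "\u0621"


def split_ligatures(text: str) -> list[str]:
    """Split text into ligatures (groups of connected letters)."""
    ligatures: list[str] = []
    current: list[str] = []
    closed = False  # True when `current` ends in a non-joining letter

    def flush() -> None:
        nonlocal current
        if current:
            ligatures.append("".join(current))
            current = []

    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Mn":
            # Diacritics (zer, zabar, pesh, ...) stay with the preceding
            # letter; a mark with no preceding letter is dropped.
            if current:
                current.append(ch)
        elif cat.startswith("L"):
            # Start a new ligature if the previous letter does not join
            # forward, or if this letter is hamza (which joins to nothing).
            if closed or ch == HAMZA:
                flush()
            current.append(ch)
            closed = ch in NON_JOINING
        else:
            # Spaces, digits, punctuation and ZWNJ end the ligature
            # (as a simplification, so does any other format character).
            flush()
            closed = False
    flush()
    return ligatures


def ligature_frequencies(lines: Iterable[str]) -> Counter:
    """Count ligature occurrences over an iterable of text lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(split_ligatures(line))
    return counts
```

For example, `split_ligatures("پاکستان")` returns `["پا", "کستا", "ن"]`. Calling `ligature_frequencies(lines).most_common(n)` would then give the n most frequent ligatures in a corpus read line by line.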
The domain-wise distribution of corpus size is given in the following table.

Domain                         Raw corpus size (words)   Distinct words
C1. Sports/Games               1666304                   23118
C2. News                       8957259                   67365
C3. Finance                    1162019                   17024
C4. Culture/Entertainment      3845117                   59214
C5. Consumer Information       1980723                   34151
C6. Personal communications    1685424                   30469
Total                          19296846                  104341
 
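The two numeric columns total differently. The six domain sizes add up exactly to the 19296846 total. The distinct-word counts add up to far more than 104341, because a word that occurs in several domains is counted once per domain but only once in the overall vocabulary. This short arithmetic check copies the figures from the table. The unit "words" for the size column is an assumption.

```python
# Figures copied from the table: (raw corpus size in words, distinct words)
DOMAINS = {
    "C1. Sports/Games": (1666304, 23118),
    "C2. News": (8957259, 67365),
    "C3. Finance": (1162019, 17024),
    "C4. Culture/Entertainment": (3845117, 59214),
    "C5. Consumer Information": (1980723, 34151),
    "C6. Personal communications": (1685424, 30469),
}
TOTAL_SIZE, TOTAL_DISTINCT = 19296846, 104341

size_sum = sum(size for size, _ in DOMAINS.values())
distinct_sum = sum(distinct for _, distinct in DOMAINS.values())

print(size_sum)      # 19296846, equal to TOTAL_SIZE: sizes are additive
print(distinct_sum)  # 231341, larger than TOTAL_DISTINCT: vocabularies overlap

# Each domain's share of the whole corpus, in percent
for name, (size, _) in DOMAINS.items():
    print(f"{name}: {100 * size / TOTAL_SIZE:.1f}%")
```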
       
  Download (This file has been accessed: times, since 25 April 2012)  
 

Urdu Most Frequently Used Ligatures

License  
     
 

webmaster@cle.org.pk