Center for Language Engineering

 
 



 

 

KICS
KICS-UET


 
 


 
 


 
   
 

CLE makes these linguistic resources available at no cost to support academic, non-commercial research. The processing fees charged are used to maintain these resources. Please contact CLE directly about discounts (available only to select public organizations in Pakistan) or about commercial licensing options.

 
     
  CLE Urdu Stemmer
   
 
CLE Catalog #: CLE17A006
Release Date: 7 September 2017
Language(s): Urdu
Application Type: API
Platform: JAVA
Distribution: Web Download
Processing Fee (Pakistan): 30000 PKR
Processing Fee (International): 250 USD
License: Yes
   
  Introduction
  Stemming is the process of reducing inflected or derived words to their stem, base, or root form. It is useful in many areas of NLP research, particularly information retrieval. The need for stemming is even more pronounced for morphologically rich languages such as Urdu, which have a wide variety of inflected and derived forms. CLE Urdu Stemmer is a Java API that provides an interface for extracting the stem, prefix, and postfix from an Urdu word. The Urdu Stemmer achieved 91.2% accuracy when tested on 10,418 Urdu words.
  
The minimum hardware requirements for CLE Urdu Stemmer are a 2.8 GHz Pentium-compatible CPU and 4 GB of RAM. The application requires Windows 7 or above with Java Runtime Environment 8.0.
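
The software requirements above can be checked before installation with a short Java program. The following is only a sketch and is not part of the CLE package: the class name RuntimeCheck and the method majorVersion are hypothetical, and the program only reports what the standard system properties say. Whether Java versions newer than 8.0 are supported is not stated here, so the sketch does not assume it.

```java
// Hypothetical helper, not part of the CLE Urdu Stemmer package.
final class RuntimeCheck {

    /**
     * Extracts the major Java version from a "java.specification.version"
     * string. Java 8 and earlier report "1.x" (for example "1.8"), while
     * Java 9 and later report the major number directly (for example "11").
     */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        // Both properties are standard JVM system properties.
        String spec = System.getProperty("java.specification.version");
        String os = System.getProperty("os.name");

        int major = majorVersion(spec);
        System.out.println("Operating system: " + os);
        System.out.println("Java major version: " + major);
        if (major < 8) {
            System.out.println("This runtime is older than the required Java 8.0.");
        }
    }
}
```

The sketch does not check CPU speed or installed RAM, since the standard Java library exposes no portable call for either.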
   
  Package
  The package of CLE Urdu Stemmer contains:
  1. CLE Urdu Stemmer API
  2. CLE Urdu Stemmer API - Release Notes
   
  Sample
  Input: خوش‌اخلاقی
  Output:
  Prefix: خوش
  Stem: اخلاق
  Postfix: ی
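
  To illustrate the kind of decomposition shown in the sample above, here is a minimal Java sketch of affix stripping. It is not the CLE Urdu Stemmer API and does not reflect how that API works internally. The class ToyAffixSplitter, its split method, the two-entry prefix list, the one-entry postfix list, and the minimum stem length are all assumptions made for illustration. Urdu strings are written as Unicode escapes so that the source compiles regardless of file encoding.

```java
import java.util.Arrays;
import java.util.List;

// Toy affix splitter for illustration only; NOT the CLE Urdu Stemmer API.
final class ToyAffixSplitter {

    /** The three parts of a split word; an empty string means "none found". */
    static final class Parts {
        final String prefix;
        final String stem;
        final String postfix;

        Parts(String prefix, String stem, String postfix) {
            this.prefix = prefix;
            this.stem = stem;
            this.postfix = postfix;
        }
    }

    // Zero-width non-joiner, which separates the prefix in the sample word.
    private static final String ZWNJ = "\u200C";

    // Assumed threshold: never strip an affix if fewer letters would remain.
    private static final int MIN_STEM_LENGTH = 2;

    // Hypothetical, hand-picked affix lists (a real stemmer needs far more).
    private static final List<String> PREFIXES = Arrays.asList(
            "\u062E\u0648\u0634",   // خوش (khush)
            "\u0628\u062F");        // بد (bad)
    private static final List<String> POSTFIXES = Arrays.asList(
            "\u06CC");              // ی (ye)

    static Parts split(String word) {
        String prefix = "";
        String postfix = "";
        String rest = word;

        // Strip the first listed prefix that leaves a long enough remainder.
        for (String candidate : PREFIXES) {
            if (rest.startsWith(candidate)) {
                String remainder = rest.substring(candidate.length());
                if (remainder.startsWith(ZWNJ)) {
                    remainder = remainder.substring(ZWNJ.length());
                }
                if (remainder.length() >= MIN_STEM_LENGTH) {
                    prefix = candidate;
                    rest = remainder;
                    break;
                }
            }
        }

        // Strip the first listed postfix under the same length condition.
        for (String candidate : POSTFIXES) {
            if (rest.endsWith(candidate)
                    && rest.length() - candidate.length() >= MIN_STEM_LENGTH) {
                postfix = candidate;
                rest = rest.substring(0, rest.length() - candidate.length());
                break;
            }
        }
        return new Parts(prefix, rest, postfix);
    }

    public static void main(String[] args) {
        // The sample word: خوش + ZWNJ + اخلاقی
        Parts parts = split(
                "\u062E\u0648\u0634\u200C\u0627\u062E\u0644\u0627\u0642\u06CC");
        System.out.println("Prefix:  " + parts.prefix);
        System.out.println("Stem:    " + parts.stem);
        System.out.println("Postfix: " + parts.postfix);
    }
}
```

  For the sample word, this sketch returns the prefix خوش, the stem اخلاق, and the postfix ی, matching the sample. Blind list-based stripping like this will mis-split many real words, which is why a production stemmer relies on much richer linguistic resources.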

 
  Online Urdu Stemmer
 
 
 
 

webmaster@cle.org.pk