CLE makes these linguistic resources available without cost to support academic, non-commercial research. The processing fees charged are used to maintain the resources. Please contact CLE directly about discounts (available only to selected public organizations in Pakistan) or commercial licensing options.
CLE Urdu Stemmer
CLE Catalog #: CLE17A006
Release Date: 7 September 2017
Language(s): Urdu
Application Type: API
Platform: Java
Distribution: Web Download
Processing Fee (Pakistan): 30000 PKR
Processing Fee (International): 250 USD
License: Yes
Introduction

Stemming is the process of reducing inflected or derived words to their stem, base, or root form. It is useful in many areas of NLP research, particularly information retrieval. The need for stemming is even more pronounced for languages such as Urdu, which are morphologically rich and have a wide variety of inflected and derived forms. CLE Urdu Stemmer is a Java API that provides an interface for extracting the stem, prefix, and postfix from an Urdu word. The stemmer achieved 91.2% accuracy when tested on 10,418 Urdu words.

The minimum hardware requirements for CLE Urdu Stemmer are a Pentium-compatible 2.8 GHz CPU and 4 GB of RAM. The application requires Windows 7 or later with Java Runtime Environment 8.0.
Package

The package of CLE Urdu Stemmer contains:
- CLE Urdu Stemmer API
- CLE Urdu Stemmer API - Release Notes
Sample

Input (ان پٹ): خوشاخلاقی

Output (آؤٹ پٹ):
- Prefix (سابقہ): خوش
- Stem (راس): اخلاق
- Postfix (لاحقہ): ی

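The prefix/stem/postfix split in the sample can be illustrated with a toy affix stripper. This is only a minimal sketch and is not the CLE Urdu Stemmer API, whose classes and methods are not described on this page. The class name `ToyAffixSplitter`, the `Parts` holder, the one-entry affix lists, and the minimum stem length are all hypothetical choices made for illustration. Urdu strings are written as Unicode escapes so the source compiles regardless of file encoding, and the code sticks to Java 8 features.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy example; NOT the CLE Urdu Stemmer API.
public class ToyAffixSplitter {

    /** Holds the three parts of a split word; prefix and postfix may be empty. */
    public static final class Parts {
        public final String prefix;
        public final String stem;
        public final String postfix;

        Parts(String prefix, String stem, String postfix) {
            this.prefix = prefix;
            this.stem = stem;
            this.postfix = postfix;
        }
    }

    // Assumed, deliberately tiny affix lists (one entry each, taken from the sample).
    // "\u062E\u0648\u0634" is the prefix khush; "\u06CC" is the postfix yeh.
    private static final List<String> PREFIXES = Arrays.asList("\u062E\u0648\u0634");
    private static final List<String> POSTFIXES = Arrays.asList("\u06CC");

    // Assumed guard so that short words are not stripped down to nothing.
    private static final int MIN_STEM_LENGTH = 2;

    /** Strips at most one known prefix and one known postfix from the word. */
    public static Parts split(String word) {
        String prefix = "";
        String postfix = "";
        String rest = word;

        for (String p : PREFIXES) {
            if (rest.startsWith(p) && rest.length() - p.length() >= MIN_STEM_LENGTH) {
                prefix = p;
                rest = rest.substring(p.length());
                break;
            }
        }
        for (String s : POSTFIXES) {
            if (rest.endsWith(s) && rest.length() - s.length() >= MIN_STEM_LENGTH) {
                postfix = s;
                rest = rest.substring(0, rest.length() - s.length());
                break;
            }
        }
        return new Parts(prefix, rest, postfix);
    }

    public static void main(String[] args) {
        // The sample input word from this page, written as Unicode escapes.
        String word = "\u062E\u0648\u0634\u0627\u062E\u0644\u0627\u0642\u06CC";
        Parts parts = split(word);
        System.out.println("Prefix:  " + parts.prefix);
        System.out.println("Stem:    " + parts.stem);
        System.out.println("Postfix: " + parts.postfix);
    }
}
```

For the sample word, this sketch produces the same three parts as the listing above. A real stemmer needs far larger affix lists and rules for exceptions, which a lookup this small cannot represent.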
Online Urdu Stemmer