CLE Store

Center for Language Engineering


CLE makes these linguistic resources available at no cost to support academic, non-commercial research. The processing fees charged are used to maintain the resources. Please contact CLE directly about discounts (available only to selected public organizations in Pakistan) or commercial licensing options.

CLE Urdu Phrase Chunker


CLE Catalog #:	CLE17A005
Release Date:	24 May 2017
Language(s):	Urdu
Application Type:	API
Platform:	Python
Distribution:	Web Download
Processing Fee (Pakistan):	30000 PKR
Processing Fee (International):	250 USD
License:	Yes

Introduction

The CLE Urdu Chunker (an IOB tagger) assigns IOB tags that mark syntactic phrases such as noun phrases (NPs), verb phrases (VPs), postpositional phrases (PPs) and prepositional phrases (PrPs). The chunker accepts POS-tagged text and outputs IOB-tagged text in which each word carries a POS tag and an IOB tag, in the form word/POS/IOB. The chunker is trained on the CLE Urdu Digest IOB Tagged Corpus 100K and achieves a tagging accuracy of 97.06% on a manually tagged 10% test set. For Urdu POS tagging, please see: Urdu POS Tagger
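
The word/POS/IOB format described above can be read back with a few lines of Python. The following is only a minimal sketch written for Python 3; it is not part of the CLE Urdu IOB Tagger API, and the function name `parse_tagged_line` is a hypothetical helper introduced here for illustration. It assumes that tokens are separated by whitespace and that the last two "/"-separated fields are always the POS tag and the IOB tag.

```python
def parse_tagged_line(line):
    """Split chunker-style output into (word, POS, IOB) triples.

    Hypothetical helper; assumes whitespace-separated word/POS/IOB tokens.
    """
    triples = []
    for token in line.split():
        # Split from the right so that a "/" inside the word itself is preserved.
        word, pos, iob = token.rsplit("/", 2)
        triples.append((word, pos, iob))
    return triples


# First three tokens of the sample output shown on this page.
sample = "ایک/CD/B-NP جنگل/NN/I-NP میں/PSP/B-PP"
for word, pos, iob in parse_tagged_line(sample):
    print(word, pos, iob)
```

Under Python 2.7, the platform listed for this package, the input would likely need to be decoded to Unicode before splitting.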

Package

The package of CLE Urdu Phrase Chunker contains:

CLE Urdu IOB Tagger API
CLE Urdu IOB Tagger API - Release Notes

System Requirements

The minimum hardware requirements for this application are a Pentium-compatible 2.8 GHz CPU and 1 GB of RAM. The application requires Linux with Python 2.7; the chunker was trained and tested on Ubuntu 14.04 LTS.

Sample

Input

                      ایک/CD جنگل/NN میں/PSP بہت/Q سے/PRT چرندوپرند/NN رہتے/VBF تھے/AUXT ایک/CD روز/NN وہاں/NN شیر/NN آ/VBF گیا/AUXA جس/PRR نے/PSP آتے/VBF ہی/PRT بہت/Q سے/PRT جانوروں/NN کو/PSP شکار/NN کر/VBF لیا/AUXA ۔/PU

Output

                     ایک/CD/B-NP جنگل/NN/I-NP میں/PSP/B-PP بہت/Q/B-NP سے/PRT/I-NP چرندوپرند/NN/I-NP رہتے/VBF/B-VP تھے/AUXT/I-VP ایک/CD/B-NP روز/NN/I-NP وہاں/NN/B-NP شیر/NN/B-NP آ/VBF/B-VP گیا/AUXA/I-VP جس/PRR/B-NP نے/PSP/B-PP آتے/VBF/B-VP ہی/PRT/O بہت/Q/B-NP سے/PRT/I-NP جانوروں/NN/I-NP کو/PSP/B-PP شکار/NN/B-NP کر/VBF/B-VP لیا/AUXA/I-VP ۔/PU/O
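
To turn IOB-tagged output like the sample above into phrases, consecutive tokens can be grouped by their tags: a B- tag opens a chunk, following I- tags with the same label extend it, and O marks a token outside any chunk. The sketch below illustrates this under stated assumptions; it is written for Python 3, does not use the CLE API, and `group_chunks` is a hypothetical name. As a lenient choice of this sketch, an I- tag that does not continue a chunk of the same label is treated as starting a new one.

```python
def group_chunks(tagged_text):
    """Group word/POS/IOB tokens into (label, [words]) chunks.

    Hypothetical helper; tokens tagged O are left out of the result.
    """
    chunks = []
    current_label, current_words = None, []
    for token in tagged_text.split():
        # The last two "/"-separated fields are the POS tag and the IOB tag.
        word, _pos, iob = token.rsplit("/", 2)
        if iob == "O":
            label, starts_new = None, True
        else:
            prefix, label = iob.split("-", 1)
            # A B- tag always opens a chunk; so does an I- tag with a new label.
            starts_new = prefix == "B" or label != current_label
        if starts_new:
            if current_label is not None:
                chunks.append((current_label, current_words))
            current_label = label
            current_words = [word] if label is not None else []
        else:
            current_words.append(word)
    if current_label is not None:
        chunks.append((current_label, current_words))
    return chunks


# A fragment of the sample output shown on this page.
sample = "آتے/VBF/B-VP ہی/PRT/O بہت/Q/B-NP سے/PRT/I-NP جانوروں/NN/I-NP کو/PSP/B-PP"
for label, words in group_chunks(sample):
    print(label, " ".join(words))
```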

webmaster@cle.org.pk