CLE makes these linguistic resources available at no cost to support academic, non-commercial research. The processing fees charged are used to maintain the resources. Please contact CLE directly about discounts (available only to selected public organizations in Pakistan) or about commercial licensing options.
CLE Urdu HFD 14 Point Size Instance Images
Source:
CLE Catalog #: CLE14I021
Release Date: 29 April 2014
Data Type: Image
Language(s): Urdu
Distribution: 1 DVD, Web Download
Processing Fee (Pakistan): 30000 PKR
Processing Fee (International): 250 USD
License: Yes
Introduction

CLE Urdu HFD 14 Point Size Instance Images is an image corpus of Urdu diacritics, including Ijam, Tashkil, punctuation marks and special symbols. Ligatures bearing these diacritics are written in Noori Nastalique at a 14 point font size and scanned at 300 DPI. The scanned images are then processed to extract instances of each diacritic class. The corpus covers 18 diacritic classes, each stored in its own folder, and each folder contains up to 100 instances of its class.
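
After obtaining the corpus, the folder layout described above (18 class folders, up to 100 instances each) could be sanity-checked with a short script. This is only a sketch: the function names `count_instances` and `check_layout` are hypothetical, and it assumes the class folders sit directly under a single root directory and that instance images use the `.bmp` extension.

```python
from pathlib import Path
from typing import Dict, List


def count_instances(corpus_root: str) -> Dict[str, int]:
    """Count .bmp files in each immediate subfolder of corpus_root.

    Assumption: one subfolder per diacritic class, directly under the root.
    """
    root = Path(corpus_root)
    counts: Dict[str, int] = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[folder.name] = sum(
            1
            for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() == ".bmp"
        )
    return counts


def check_layout(
    counts: Dict[str, int],
    expected_classes: int = 18,
    max_per_class: int = 100,
) -> List[str]:
    """Return human-readable warnings where the layout differs from the description."""
    problems: List[str] = []
    if len(counts) != expected_classes:
        problems.append(
            f"expected {expected_classes} class folders, found {len(counts)}"
        )
    for name, n in counts.items():
        if n > max_per_class:
            problems.append(
                f"folder {name!r} has {n} images (more than {max_per_class})"
            )
    return problems
```

An empty list from `check_layout(count_instances(path))` would suggest that the local copy matches the description given here.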
Data

The corpus contains 18 folders of images in bitmap (BMP) format. Each image file is named according to the following pattern:
B(Binarized)_HFD(HighFrequencyDiacritic)_<DiacriticClass>_CC(ConnectedComponent)_<SampleNumber>_F<FontSize>.bmp
A separate file, distributed with the corpus, lists each folder name and the number of instances of the corresponding diacritic class.
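
The naming pattern above can be split into its fields with a small parser. The sketch below rests on an assumption this page does not confirm: that the parenthesised words (Binarized, HighFrequencyDiacritic, ConnectedComponent) are explanatory glosses and that actual file names look like `B_HFD_<DiacriticClass>_CC_<SampleNumber>_F<FontSize>.bmp`. The names `InstanceName` and `parse_instance_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed on-disk form: B_HFD_<DiacriticClass>_CC_<SampleNumber>_F<FontSize>.bmp
# The class name may itself contain underscores, so it is matched greedily
# up to the last "_CC_" that is followed by the numeric fields.
_NAME_RE = re.compile(
    r"B_HFD_(?P<diacritic_class>.+)_CC_(?P<sample>\d+)_F(?P<font_size>\d+)\.bmp"
)


class InstanceName(NamedTuple):
    diacritic_class: str
    sample_number: int
    font_size: int


def parse_instance_name(filename: str) -> Optional[InstanceName]:
    """Parse one image file name; return None if it does not fit the pattern."""
    match = _NAME_RE.fullmatch(filename)
    if match is None:
        return None
    return InstanceName(
        diacritic_class=match["diacritic_class"],
        sample_number=int(match["sample"]),
        font_size=int(match["font_size"]),
    )
```

If the real file names turn out to include the glosses or use different separators, only the regular expression would need adjusting.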
Samples

Instance Image of ‘SINGLE DOT’ |
Instance Image of ‘DOUBLE DOT’ |
Instance Image of ‘MADDAH’ |
Instance Image of ‘SECONDARY STROKE OF GAAF’ |