CLE makes these linguistic resources available without cost to support academic, non-commercial research. The processing fees charged are used to maintain these resources. Please contact CLE directly about discounts (applicable only to selected public organizations in Pakistan) or commercial licensing options.

CLE Urdu 18 Point Size Distorted Instance Images

Source: CLE Urdu Image Corpus for 18 Point Size and CLE Urdu Image Corpora for 24 - 40 Point Sizes
CLE Catalog #: CLE14I028
Release Date: 13 June 2014
Data Type: Image
Language(s): Urdu and English
Distribution: 1 DVD, Web Download
Processing Fee (Pakistan): 30000 PKR
Processing Fee (International): 250 USD
License: Yes

Introduction

CLE Urdu 18 Point Size Distorted Instance Images is an image corpus collected from one hundred and ten pages scanned from thirty-two books written in Noori Nastalique. The books vary in publisher, publication date, paper, printing quality and transparency, and are selected from different domains such as literature, poetry, religion, biography, novels, interviews, culture/travel, history, autobiography, science, short stories and character representation. Up to six pages from each book are scanned at 300 DPI to generate the image corpus.
Six hundred and thirty-four distorted instances of main bodies are extracted from the image corpus. The collection covers main bodies that are broken or distorted, as well as main bodies that are attached to diacritics, noise or other main bodies. The instances are grouped into two hundred and twenty-five distorted main body classes, each stored in a separate folder. Each folder contains up to one hundred distorted instances and includes all variations of the respective ligature string: broken or noisy main bodies, and main bodies with attached diacritics or noise.

Data Source

CLE Urdu 18 Point Size Distorted Instance Images is extracted from:
- CLE Urdu Image Corpus 18 Point Size (for details, see: http://www.cle.org.pk/clestore/imagecorpora.htm).
- CLE Urdu Image Corpora for 24 - 40 Point Sizes (for details, see: http://www.cle.org.pk/clestore/imagecorpora.htm).

Data

This corpus contains 225 folders of images in bitmap (BMP) format. Each image file is named as follows, where B marks a binarized image and D a distorted instance:
B(Binarized)_D(Distorted)_<LigatureString>_<SampleNumber>_F<FontSize>.bmp
A separate file, distributed along with the corpus, lists each folder name and the number of instances of the respective main body class.
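As an illustration of the naming scheme, the following minimal Python sketch splits an image name into its parts. It assumes that the prefix appears literally as `B_D_` (with the parenthesised words in the pattern being explanations only) and that the sample number and font size are plain integers; the sample name `B_D_علی_3_F18.bmp` and the names `parse_image_name` and `DistortedInstance` are hypothetical and are not taken from the corpus or its documentation.

```python
import re
from typing import NamedTuple, Optional

# Assumed literal form of the documented pattern:
#   B_D_<LigatureString>_<SampleNumber>_F<FontSize>.bmp
# The extension is matched case-insensitively as a precaution.
_NAME_RE = re.compile(
    r"^B_D_(?P<ligature>.+)_(?P<sample>\d+)_F(?P<size>\d+)\.bmp$",
    re.IGNORECASE,
)


class DistortedInstance(NamedTuple):
    ligature: str        # ligature string of the main body class
    sample_number: int   # running number of the distorted instance
    font_size: int       # point size, taken from the F<FontSize> part


def parse_image_name(name: str) -> Optional[DistortedInstance]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    return DistortedInstance(
        ligature=match.group("ligature"),
        sample_number=int(match.group("sample")),
        font_size=int(match.group("size")),
    )


if __name__ == "__main__":
    # Hypothetical file name, used only to exercise the parser.
    print(parse_image_name("B_D_علی_3_F18.bmp"))
```

Grouping the parsed results by ligature string would give per-class instance counts that could be compared with the folder information file, whose exact layout is not described here.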

Samples

Distorted Instance Image of ‘پنجگا’ Ligature
Distorted Instance Image of ‘ا’ and ‘صلیت’ Ligatures
Distorted Instance Image of ‘بھی’ Ligature
Distorted Instance Image of ‘علی’ Ligature