CLE makes these linguistic resources available without cost to support academic, non-commercial research. The processing fees charged are used to maintain the resources. Please contact CLE directly about discounts (applicable only to selected public organizations in Pakistan) or commercial licensing options.

CLE Urdu 40 Point Size Instance Images
Source: CLE Urdu Image Corpus for 40 Point Size
CLE Catalog #: CLE14I043
Release Date: 12 December 2014
Data Type: Image
Language(s): Urdu and English
Distribution: 1 DVD, Web Download
Processing Fee (Pakistan): 30000 PKR
Processing Fee (International): 250 USD
License: Yes

Introduction

CLE Urdu 40 Point Size Instance Images is an image corpus collected from 173 pages scanned from 26 books typeset in Noori Nastalique. The books vary in publisher, publication date, paper, printing and transparency quality, and were selected from different domains: literature, poetry, religion, biography, novel, interviews, culture/travel, history, autobiography, science, short stories and character representation. Pages from each book were scanned at 300 DPI to generate the image corpus.
Main bodies are extracted from the scanned images; diacritics and broken main bodies are excluded. A total of 276 main body classes are distributed in separate folders. Each folder contains up to 35 samples, although a few classes have fewer.
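The folder layout described above can be sanity-checked after download. The following is a minimal sketch, assuming the class folders sit directly under a single root directory and the images use the `.bmp` extension; the function names `summarize_corpus` and `check_corpus` are hypothetical and not part of the distribution.

```python
from pathlib import Path

EXPECTED_CLASSES = 276  # number of main body classes stated in the description
MAX_SAMPLES = 35        # upper bound on samples per class stated in the description


def summarize_corpus(root):
    """Return {folder_name: number_of_bmp_files} for each class folder under root.

    Assumption: every sub-directory of `root` is one main body class.
    """
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(
                1
                for f in folder.iterdir()
                if f.is_file() and f.suffix.lower() == ".bmp"
            )
    return counts


def check_corpus(counts):
    """Return a list of problems; an empty list means the layout matches the description."""
    problems = []
    if len(counts) != EXPECTED_CLASSES:
        problems.append(
            f"expected {EXPECTED_CLASSES} class folders, found {len(counts)}"
        )
    for name, n in counts.items():
        if n == 0 or n > MAX_SAMPLES:
            problems.append(f"{name}: {n} samples")
    return problems
```

Calling `check_corpus(summarize_corpus("path/to/corpus"))` on an intact copy should return an empty list if these assumptions about the layout hold.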

Data Source

CLE Urdu 40 Point Size Instance Images are extracted from the CLE Urdu Image Corpus for 40 Point Size.

Data

This corpus contains 276 folders of images in bitmap (BMP) format. Each image file is named as follows:
<LigatureClass>_<SampleNumber>_F<FontSize>.bmp
A separate file, distributed with the corpus, lists each folder's name and the number of instances of its main body class.
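The naming convention above can be split back into its three fields when loading the images. The following is a minimal sketch, assuming the fields are separated by underscores and that the sample number and font size are plain decimal digits; the name `parse_instance_name` and the example file name are hypothetical.

```python
import re
from typing import NamedTuple


class InstanceName(NamedTuple):
    ligature_class: str
    sample_number: int
    font_size: int


# Assumption: <LigatureClass>_<SampleNumber>_F<FontSize>.bmp, with numeric
# sample number and font size. The greedy class group lets the class label
# itself contain underscores, since the two numeric fields anchor the end.
_NAME_PATTERN = re.compile(
    r"^(?P<cls>.+)_(?P<num>\d+)_F(?P<size>\d+)\.bmp$", re.IGNORECASE
)


def parse_instance_name(filename):
    """Split an image file name into (ligature_class, sample_number, font_size)."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {filename!r}")
    return InstanceName(match["cls"], int(match["num"]), int(match["size"]))


# Hypothetical file name for illustration; the sample number is made up.
info = parse_instance_name("ہم_12_F40.bmp")
```

Matching from the right-hand side keeps the parser tolerant of class labels that include an underscore, which the description does not rule out.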

Samples

Instance Image of “ببقبد” Ligature Class
Instance Image of “بھلے” Ligature Class
Instance Image of “علطبا” Ligature Class
Instance Image of “ہم” Ligature Class