CLE makes these linguistic resources available without cost to support academic, non-commercial research. The processing fees charged are used to maintain these resources. Please contact CLE directly about discounts (available only to selected public organizations in Pakistan) or commercial licensing options.
|
|
|
|
|
|
CLE Urdu HFL 14 Point Size (5586 classes) Instance Images |
|
|
|
|
Source: CLE Urdu HFL 14 Point Size (5586 classes) Document Images
CLE Catalog #: CLE14I015
Release Date: 26 March 2014
Data Type: Image
Language(s): Urdu
Distribution: 1 DVD, Web Download
Processing Fee (Pakistan): 30000 PKR
Processing Fee (International): 250 USD
License: Yes
|
|
|
|
Introduction |
|
CLE Urdu HFL 14 Point Size (5586 classes) Instance Images contains main body images of high frequency ligatures (HFL) of Urdu. The source corpus, CLE Urdu HFL 14 Point Size (5586 classes) Document Images, contains 5586 HFL classes written in Noori Nastalique at 14 point size and scanned at 300 DPI in grayscale. It is processed to extract the main body classes: main bodies are extracted from the scanned images, while diacritics and broken main bodies are excluded. These main body classes cover 131,000 high frequency words (for details see: http://www.cle.org.pk/software/ling_resources/wordlist.htm).
A total of 5586 main body classes are distributed in separate folders. Each folder contains up to thirty-five samples, although in a few cases the sample count is lower. In some cases the sample images in different folders may have the same shape, e.g. ‘;’ and ‘”’. For some main bodies the diacritics are always attached, e.g. ہا, ہلو, ہجے, etc.
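The folder layout described above can be checked with a short script. The following is a minimal sketch in Python, assuming the corpus is unpacked into one root directory with one sub-folder per main body class and that the images carry a `.bmp` extension; the function names and the root path are hypothetical and not part of the CLE distribution.

```python
from pathlib import Path

MAX_SAMPLES = 35  # upper bound per folder stated in the corpus description


def count_samples(corpus_root):
    """Return {folder_name: number of .bmp files} for each class folder.

    Assumes one sub-folder per main body class directly under corpus_root.
    """
    root = Path(corpus_root)
    counts = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[folder.name] = sum(
            1
            for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() == ".bmp"
        )
    return counts


def folders_below_max(counts, max_samples=MAX_SAMPLES):
    """Return only the folders holding fewer than max_samples images."""
    return {name: n for name, n in counts.items() if n < max_samples}
```

Calling `folders_below_max(count_samples("path/to/corpus"))` would list the classes with fewer than thirty-five samples; the path is a placeholder.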
|
|
|
Data Source |
|
CLE Urdu HFL 14 Point Size (5586 classes) Instance Images are extracted from:
- CLE13I006 CLE Urdu HFL 14 Point Size (5586 classes) Document Images (for details see: http://www.cle.org.pk/clestore/cleurduhfl14pt-5586.htm).
|
|
|
|
Data |
|
This corpus contains 5586 folders of images in BITMAP format. Each image file is named as follows:
B(Binarized)_HFL(HighFrequencyLigature)_<LigatureClass>_CC(ConnectedComponent)_<Number>_F<FontSize>.bmp
A separate file, distributed along with the corpus, lists for each folder the folder name, the printed ligature, and the number of instances of the respective main body class.
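The naming scheme above can be split into its fields with a regular expression. The following Python sketch assumes that the parenthesised words in the pattern are explanatory glosses rather than part of the actual file names, so that a name looks like `B_HFL_<LigatureClass>_CC_<Number>_F<FontSize>.bmp`, and that `<Number>` and `<FontSize>` are decimal integers. Both points are assumptions; the `InstanceName` type, the `parse_instance_name` function, and the example file name are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed on-disk form: B_HFL_<LigatureClass>_CC_<Number>_F<FontSize>.bmp
# The match is case-insensitive so that a ".BMP" extension is also accepted.
_NAME_RE = re.compile(
    r"^B_HFL_(?P<ligature_class>.+)_CC_(?P<number>\d+)_F(?P<font_size>\d+)\.bmp$",
    re.IGNORECASE,
)


class InstanceName(NamedTuple):
    ligature_class: str  # class identifier, kept as text
    number: int          # connected-component instance number
    font_size: int       # font size in points


def parse_instance_name(file_name: str) -> Optional[InstanceName]:
    """Parse one image file name; return None if it does not fit the pattern."""
    match = _NAME_RE.match(file_name)
    if match is None:
        return None
    return InstanceName(
        ligature_class=match["ligature_class"],
        number=int(match["number"]),
        font_size=int(match["font_size"]),
    )
```

For a made-up name such as `B_HFL_0042_CC_7_F14.bmp`, the function returns the class `"0042"`, instance number 7, and font size 14. If the real file names keep the glosses, the pattern would need adjusting.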
|
|
|
Samples |
|
|
|
|
|
|
Instance Image of "جبکہ" Ligature Class |
Instance Image of "کبجببے" Ligature Class |
Instance Image of "مجھے" Ligature Class |
Instance Image of "فبصد" Ligature Class |