Project Details |
Start date of project |
1st March, 2012 |
Duration of project |
30 Months |
Funding agency |
National Information and Communication Technology Research and Development (ICT R&D) Fund, Pakistan |
Principal investigator |
Dr. Sarmad Hussain |
Project status (completed/in progress) |
In progress |
Objectives |
- To develop an Urdu OCR for the Nastalique style of writing.
- To develop post-processing algorithms in computational linguistics for output generation and error correction of the Urdu OCR.
- To identify future research directions for graduate research in this area.
- To provide print-disabled communities with access to textual information.
|
Scope of work |
- The following character set will be recognized by the Urdu OCR:
  - Urdu alphabet set including Urdu digits, Urdu aerab and Latin digits.
  - Other symbols of Urdu, as follows:
؎ ؏ ؐ ؑ ؒ ؓ ؔ ؟ () ' " ۔ ؛ : ،
- Text written in the Noori Nastalique font style, with font sizes between 14 and 44 points, will be recognized.
- The system will handle up to 2 columns of text.
- The Urdu OCR will handle Latin script written in the Times New Roman, Arial and Courier fonts, within the font size range proposed for Urdu.
- The system will output plain Urdu text in Unicode format.
|
(Anticipated) Deliverables |
- OCR for 14 point size
- OCR for 16 point size
- OCR for 24 point size
- OCR for 36 point size
- Complete Urdu OCR System for 14 to 44 point sizes
|
Useful links |
http://www.cle.org.pk/ocr |