Project Details |
Start date of project |
18 October, 2010 |
Duration of project |
1 Year |
Funding agency |
PAN Localization |
Principal investigator |
Dr. Sarmad Hussain |
Project status (completed/in progress) |
In progress |
Objectives |
- To develop and mature algorithms for analyzing and recognizing Urdu text images using segmentation-based and ligature-based methods.
- To develop automatic scaling algorithms for Urdu ligatures so that the system is independent of font size.
- To develop an Urdu OCR system for the Nastalique style of writing.
- To develop post-processing algorithms in computational linguistics for output generation and error correction in the Urdu OCR.
|
Scope of work |
- The Urdu OCR will recognize Urdu text written in the Noori Nastalique writing style. Text written in any other writing style will not be processed.
- Text from books printed in font sizes from 14 to 24 points will be recognized. Smaller or larger font sizes will not be processed.
- The application will process plain text only, and will not process advanced formatting such as italic, bold, or underlined text.
- The application will not process figures or multi-column text.
|
(Anticipated) Deliverables |
- Ligature-based recognizer at 14-point size
- Segmentation-based recognizer at 14-point size
- Font-size-independent system for 14 to 24-point sizes
- Ligature-to-word mapping system
|
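The font-size objective and the 14-point recognizers listed above suggest one possible approach: rescale each ligature image to a fixed reference height before recognition. The sketch below is only an illustration of that idea, not the project's actual algorithm. It assumes ligatures arrive as binary bitmaps (lists of 0/1 rows) and uses plain nearest-neighbour resampling; the names `points_to_pixels` and `scale_to_height` are hypothetical.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixels (assumed input format)


def points_to_pixels(points: float, dpi: int) -> int:
    """Convert a font size in points to a pixel height (1 point = 1/72 inch)."""
    return round(points * dpi / 72)


def scale_to_height(bitmap: Bitmap, target_height: int) -> Bitmap:
    """Resize a binary bitmap to target_height rows using nearest-neighbour
    sampling, keeping the aspect ratio. Hypothetical helper for illustration."""
    src_h = len(bitmap)
    src_w = len(bitmap[0])
    target_width = max(1, round(src_w * target_height / src_h))
    return [
        [
            bitmap[r * src_h // target_height][c * src_w // target_width]
            for c in range(target_width)
        ]
        for r in range(target_height)
    ]


# Example: shrink a 4x4 ligature patch to half its height.
patch = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
small = scale_to_height(patch, 2)
```

Under this assumption, a ligature scanned at any size from 14 to 24 points would be scaled to the pixel height that corresponds to 14 points at the scan resolution, so that a single 14-point recognizer could be reused. Nearest-neighbour sampling is chosen here only for brevity; a real system would likely need a method that better preserves thin Nastalique strokes.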