|
Dr. Qurat ul Ain Akram has more than 10 years of research experience in language processing, script processing, and machine learning. She recently completed her doctorate in computer science; her Ph.D. research is on text recognition of printed and handwritten document images. She currently leads the development of the first Urdu Nastalique Optical Character Recognition (OCR) system for typed images in font sizes 14 to 44. During this research, she has developed novel techniques for Urdu image segmentation and for segmentation-based and ligature-based recognition of Urdu ligatures. She has also been involved in developing a system for Latin and Nastalique script detection and recognition from document images. She manages the text and image corpora, covering font sizes 14 to 44, that are used to develop the OCR system. Within her OCR research, she has also designed and executed the process for developing ground truth data and tools for automatic accuracy computation of image segmentation, ligature recognition, and word formation. Her major research includes enhancing Tesseract (an OCR engine developed by Google) to support recognition of Urdu ligatures. She has also managed the word formation system, which uses a statistical model to form a sequence of words from the recognized sequence of ligatures, and has developed a stemmer using a rule-based technique. Previously, she developed an automatic system for detecting Urdu news captions in videos from different news channels. She also developed a spline-based, font-size-independent OCR system in her MS thesis research. |
|