Introduction
Access to online information has become a critical factor in the development of nations worldwide. Much local content is available in published form in local languages. Furthermore, much of the Urdu content being made available online is in the form of images, owing to the legacy technology used by the publishing industry; such content is bulky to transfer over low-bandwidth connections and cannot be searched. With growing access to the internet through broadband and mobile networks, there is an increasing need to bring relevant local-language content online so that these data channels can be used more effectively for the benefit of the public in Pakistan. Technologies such as Urdu Optical Character Recognition (OCR) are necessary to drive this change. The Urdu Nastalique OCR project processes images of Urdu documents written in the Noori Nastalique writing style, with font sizes ranging from 14 to 44, and outputs Urdu text in Unicode format. This project will accelerate the publishing of Urdu content online and make published Urdu material more accessible to the illiterate and blind communities. The specific objectives of Urdu OCR are