Company: IIT Kharagpur
Author(s): Subhasis Mahata
NI Product(s) Used: LabVIEW Vision Development Module
Industry: Big data analytics
The Challenge: The Election Commission of India stores voter lists as image-based (photo) PDF files to save disk space. For data analysis, however, recovering the text from these PDFs is very difficult, especially because they are published in many different languages.
The Solution: We developed an OCR-based LabVIEW program that directly generates an MS Excel sheet from an image-based voter-list PDF in any Indian language.
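The program itself is written in LabVIEW, whose graphical code cannot be reproduced as text. As a rough illustration of the final export step only, the Python sketch below writes already-recognized voter records to a CSV file that MS Excel can open. The column names are hypothetical (the case study does not list its Excel columns), and CSV is a stand-in for a native .xlsx workbook; a library such as openpyxl could produce that instead.

```python
import csv
import io

# Hypothetical column layout for one voter-list entry; the case study
# does not specify the actual Excel columns.
FIELDS = ["serial_no", "name", "relative_name", "age", "gender"]


def records_to_csv(records):
    """Serialize OCR'd voter records (a list of dicts) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def save_for_excel(records, path):
    """Write the CSV with a UTF-8 byte-order mark so Excel detects the
    encoding correctly when names are still in an Indian script."""
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(records_to_csv(records))
```

The `utf-8-sig` encoding is the main design choice here: without the byte-order mark, Excel often misreads UTF-8 text in Indian scripts.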
Introduction: I searched for the closest existing solutions in Python; unfortunately, they are not as accurate as this one, especially for Indian languages.
Application Description: The most challenging part of this project was training the OCR to read different languages. Each page of the image-based PDF is first converted into a high-resolution image with a white background, and the text on it is then recognized by the OCR. In the final stage, the recognized text in the various languages is converted into English, and the large volume of resulting data is saved to a database in real time for research and application purposes.
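The original program performs the white-background conversion inside LabVIEW, and the case study does not say which method it uses. One common way to get dark text on a clean white background before OCR is global thresholding with Otsu's method. The pure-Python sketch below assumes each page has already been rendered to an 8-bit grayscale pixel grid (a list of rows, values 0 to 255); the function names are hypothetical and the sketch only illustrates the idea.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a 2D list of 8-bit grayscale values:
    the level that maximizes the between-class variance of the histogram."""
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_dark = 0      # intensity sum of pixels <= t
    weight_dark = 0   # count of pixels <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels):
    """Map pixels at or below the Otsu threshold to black text (0)
    and everything brighter to a white background (255)."""
    t = otsu_threshold(pixels)
    return [[255 if v > t else 0 for v in row] for row in pixels]
```

For example, `binarize([[250, 245, 30], [240, 20, 255]])` turns the two dark pixels black and the rest white. Real scans would usually also need deskewing and noise removal, which this sketch omits.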
Impact/Results: Extracting data from image-based PDFs in different languages and then converting it into a single language such as English is the most difficult phase of the work for data entry operators.
Conclusion: This program could save manpower, as well as money and time, in this data entry work.
Author Contact Details: subhasismahata@yahoo.com