07-05-2022 06:20 AM
@wiebe: have you tried the PA-AI Builder text recognition?
07-05-2022 08:27 AM
@cy... wrote:
@wiebe: have you tried the PA-AI Builder text recognition?
No, but I have made a PDF toolkit 😋.
OCR would indeed be a way out, to get most text out of most PDFs. I don't think it's a strategy used by most text extraction tools.
07-09-2022 11:22 PM
I want to read text from a PDF.
07-09-2022 11:23 PM
I wanted to read text from a PDF.
07-09-2022 11:25 PM
No, please guide me on how to do this.
07-09-2022 11:26 PM
Please tell me how to use the PDF toolkit and where I can find it.
07-10-2022 06:03 AM
The LabVIEW PDF toolkits out there won't help you, not even if you plan to take them apart to learn how to parse a PDF file yourself in order to get that information.
They all do the opposite of what you want: they create a PDF document as the programmer calls specific functions to add text blocks, images, page and document formatting commands and more to it. This is already quite a lot of work, but relatively manageable, as you only need to concern yourself with the things you want to add.
Reading a PDF document consistently is a lot more complicated. PDF is a very rich document description language based on PostScript, which was designed to allow transferring any imaginable document to a printer and make the result look almost perfectly WYSIWYG. It also means that the resulting PDF document can be very different depending on the tool that created it, yet look exactly the same in a perfectly implemented renderer. Your PDF parser has to be prepared to face lots and lots of different possible syntax elements, and it's not always possible to skip unknown syntax elements, as that can mess up your parser state.
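To illustrate how differently the same visible text can be written, here is a rough Python sketch. It assumes three common ways a content stream can show the word "Hello": a literal string with `Tj`, an array with a kerning adjustment and `TJ`, and a hex string with `Tj`. The function names `decode_pdf_string` and `shown_text` are made up for this example. It ignores escape sequences, nested parentheses, and font encodings, all of which a real parser would have to handle.

```python
import re

# A literal string "(...)" without escapes or nesting, or a hex string "<...>".
STRING = rb"\([^()\\]*\)|<[0-9A-Fa-f\s]*>"

# Either one string followed by Tj, or an array [...] followed by TJ.
SHOW_OP = re.compile(rb"(" + STRING + rb")\s*Tj|\[(.*?)\]\s*TJ", re.DOTALL)


def decode_pdf_string(token: bytes) -> bytes:
    """Decode one literal or hex string token (hypothetical helper)."""
    if token.startswith(b"<"):
        hex_digits = re.sub(rb"\s+", b"", token[1:-1])
        if len(hex_digits) % 2:
            hex_digits += b"0"  # a missing final hex digit is treated as 0
        return bytes.fromhex(hex_digits.decode("ascii"))
    return token[1:-1]  # strip the surrounding parentheses


def shown_text(content: bytes) -> bytes:
    """Collect the raw string bytes of all Tj / TJ operators, in order."""
    parts = []
    for match in SHOW_OP.finditer(content):
        if match.group(1) is not None:
            parts.append(decode_pdf_string(match.group(1)))
        else:
            # In a TJ array, the numbers between strings are spacing tweaks.
            for token in re.findall(STRING, match.group(2)):
                parts.append(decode_pdf_string(token))
    return b"".join(parts)


variants = [
    b"BT /F1 12 Tf (Hello) Tj ET",
    b"BT /F1 12 Tf [(Hel) -20 (lo)] TJ ET",
    b"BT /F1 12 Tf <48656C6C6F> Tj ET",
]
for variant in variants:
    print(shown_text(variant))
```

Even this toy version needs three code paths for one word, and the bytes it returns are only readable text if the font happens to use a standard encoding.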
07-10-2022 07:07 AM
wiebe@CARYA wrote:
OCR would indeed be a way out, to get most text out of most PDFs. I don't think it's a strategy used by most text extraction tools.
Considering that PDF also allows compression of parts of the document, and also encryption, that “most” could still not be enough.
07-11-2022 03:13 AM
@rolfk wrote:
wiebe@CARYA wrote:
OCR would indeed be a way out, to get most text out of most PDFs. I don't think it's a strategy used by most text extraction tools.
Considering that PDF also allows compression of parts of the document, and also encryption, that “most” could still not be enough.
Well, I do mean specialized "text from PDF" extractors. These would parse the PDFs, including decompressing and even decrypting. Extracting or reading text from a PDF with Notepad, or with a tool that simply reads text from a file, has little chance.
The standard compression is a simple inflate (IIRC there are ways to specify a custom compression algorithm). It is so common in PDFs that without support for it you'd be lucky if you could parse 2% of the PDFs.
Encryption is more or less the same story. If you only protect printing and editing, there isn't really any encryption, and a tool could reverse it. I'm not sure how easy that is; I only did the decrypting (a while ago).
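For the inflate part, a minimal Python sketch could look like the one below. It assumes unencrypted streams compressed with the standard Flate filter, and it finds streams by simply searching for the `stream` / `endstream` keywords, which a real parser would not do (it would use the object's `/Length` entry). The function name `inflate_streams` and the sample document are invented for the example.

```python
import re
import zlib

# Naive stream finder: everything between "stream<EOL>" and "endstream".
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list:
    """Return the inflated payload of every stream zlib can decompress."""
    results = []
    for match in STREAM.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes that sit
            # between the compressed data and the "endstream" keyword.
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            # Not Flate data (other filter, encrypted, or uncompressed).
            continue
    return results


# Build a tiny stand-in for one compressed page content stream.
page_content = b"BT /F1 12 Tf (Hello PDF) Tj ET"
compressed = zlib.compress(page_content)
sample = (
    b"1 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream\nendobj\n"
)

for payload in inflate_streams(sample):
    print(payload)
```

The inflated payload is still page description syntax, not plain text, so the text operators have to be parsed afterwards.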
07-12-2022 07:28 PM
Message to OP: if you do not mind having a running cost in your resume processing, I would recommend another platform; read the earlier posts for reference. Otherwise, having the applicants fill in a standardized form could be more feasible.