Hello Forum, we are looking for a way to extract data from PDFs before converting them to images (via a Java action). The data we need is always in the same physical location on each sheet, and follows a unique numbering convention that we'd like to capture. Has anyone developed a way of doing something similar?
If the information consists of letters and numbers, I think you could accomplish this using the Lucene search module.
The PDFBox Java library can do this. I've used this library to write data onto PDFs in the past, in my PDF exporter module here.
You can find a tutorial for reading text from PDF files here.
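For the "same location on every sheet" case, PDFBox has a PDFTextStripperByArea class that only returns text inside a rectangle you give it. Below is a minimal sketch, assuming PDFBox 2.x (in 3.x, PDDocument.load is replaced by Loader.loadPDF) and a PDF that actually contains a text layer rather than a scanned image. The class name, the region name "sheetNumber", and the rectangle in main are hypothetical placeholders; you would measure the real position on your own sheets. Region coordinates are in points, measured from the top-left corner of the page.

```java
import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;

public class RegionExtractor {

    // PDF user space uses 72 points per inch, and one inch is 25.4 mm.
    static double mmToPoints(double mm) {
        return mm * 72.0 / 25.4;
    }

    // Returns the trimmed text found inside the region, one entry per page, in page order.
    static List<String> extract(File pdf, Rectangle2D region) throws IOException {
        List<String> results = new ArrayList<>();
        // PDFBox 2.x API; with 3.x use org.apache.pdfbox.Loader.loadPDF(pdf) instead.
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFTextStripperByArea stripper = new PDFTextStripperByArea();
            stripper.setSortByPosition(true);
            stripper.addRegion("sheetNumber", region); // hypothetical region name
            for (PDPage page : document.getPages()) {
                stripper.extractRegions(page);
                results.add(stripper.getTextForRegion("sheetNumber").trim());
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical box: 50 mm wide, 15 mm tall, 150 mm from the left and 10 mm from the top.
        Rectangle2D region = new Rectangle2D.Double(
                mmToPoints(150), mmToPoints(10), mmToPoints(50), mmToPoints(15));
        for (String text : extract(new File(args[0]), region)) {
            System.out.println(text);
        }
    }
}
```

You could call something like extract from your Java action before the image conversion step and store the returned strings. If a page comes back empty, the number is probably part of an image on that sheet, and this approach won't see it without OCR.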