I am trying to parse the content of a PDF file, such as text or table data, into objects that are stored in the Mendix database and can be edited and displayed. Ultimately, I want this to be automated: once a user uploads a PDF file to the Mendix app, the app should read and parse the file using a Java action or a similar tool, store the necessary data in the database, and display it. So far I have written Java code that extracts all the text and all the tables from a PDF file and writes them to an Excel file, which I then import with the Excel Importer. However, since the goal is to extract only the content the user needs, store it as editable objects in the database, and display it in Mendix as one automated process, parsing everything and going through the Excel Importer seems very inefficient. Is there another way to achieve this?
Have a look at the PDFReader module from the Marketplace; it does exactly what you are trying to achieve.
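If you end up keeping your own Java action instead, you can at least skip the Excel step by pulling out only the fields you need from the extracted text. Below is a minimal sketch of that selection step using plain Java regular expressions. The class name, the field names (`invoiceNumber`, `total`) and the patterns are hypothetical and only for illustration; it assumes your existing PDF code already gives you the page text as a `String`, and it does not show the Mendix side (creating and committing the objects from the resulting map).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfFieldExtractor {

    // Hypothetical rules: field name -> pattern whose first group captures the value.
    // Replace these with the labels that actually appear in your PDFs.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("invoiceNumber", Pattern.compile("Invoice No[.:]?\\s*(\\S+)"));
        RULES.put("total", Pattern.compile("Total:?\\s*([0-9]+(?:\\.[0-9]{2})?)"));
    }

    // Returns only the fields whose pattern matched; unmatched fields are left out.
    static Map<String, String> extractFields(String text, Map<String, Pattern> rules) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(text);
            if (m.find()) {
                fields.put(rule.getKey(), m.group(1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the text your PDF parsing code already produces.
        String text = "ACME Ltd\nInvoice No: INV-1001\nTotal: 249.50\n";
        System.out.println(extractFields(text, RULES));
    }
}
```

Inside a Java action, the returned map could then be used to fill the attributes of a new object directly, so nothing needs to be written to Excel first.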