pdfbox class instantiation issue (was: pdfbox-2.0.3.jar vs pdfbox-app-2.0.3.jar)
I'm building a module that reads the content of a PDF document (which is based on System.FileDocument). First I made a little setup that uses PDFBox to read a PDF directly from the filesystem (pdfbox-2.0.3.jar, retrieved via Maven):

    InputStream is = new FileInputStream("C:\\Users\\dude\\JaarverslagPensioenfonds.pdf");
    BufferedInputStream bis = new BufferedInputStream(is);
    PDDocument pdDocument = PDDocument.load(bis);
    PDFTextStripper textStripper = new PDFTextStripper();
    String text = textStripper.getText(pdDocument);

That works nicely. (I could have used a simpler load(), but for this example the above is better.)

So now I want to use the PDF library that shows up in userlib (pdfbox-app-2.0.3.jar). I guess this is a Mendix-specific tweak of pdfbox-2.0.3.jar? I use the above code, slightly tweaked, in a Java action:

    InputStream inputStream = Core.getFileDocumentContent(getContext(), __document);
    PDDocument pdDocument = PDDocument.load(inputStream);
    PDFTextStripper textStripper = new PDFTextStripper();
    String text = textStripper.getText(pdDocument);

(__document is an extension of System.FileDocument, passed as an IMendixObject.) Then I get this error:

    java.lang.ExceptionInInitializerError: null
        at amicosensoring.actions.DocumentToTextAction.executeAction(DocumentToTextAction.java:41)
        at amicosensoring.actions.DocumentToTextAction.executeAction(DocumentToTextAction.java:1)

Line 41 is the line where the PDFTextStripper is instantiated. On the line before it, the PDDocument is loaded correctly (the debugger in Eclipse confirms that). The source of the Mendix version of pdfbox is not available, but it appears that this stripper extends org.apache.pdfbox.text.LegacyPDFStreamEngine. Mind the 'Legacy' part; the 'regular' pdfbox-2.0.3.jar doesn't have that.

My questions are: what is the correct way of initialising this PDFTextStripper within Mendix Java actions, what is this 'Legacy' about, and is there a workaround?

Thanks, Nol
Nol de Wit
The pdfbox-app jar is a standalone version of PDFBox; both jars are available from their Maven site and download page.
According to its history, PDFStreamEngine became 'deprecated' in version 2.0.3 and was renamed to LegacyPDFStreamEngine.
This should be true for both 2.0.3 jars (are you sure both are that version?).
Do you know which module added the pdfbox-app jar to your userlib? (I'm guessing communitycommons.) Depending on where it is used, you might be able to just swap it with the pdfbox jar.
Hope that helps!
For readability, I'm copying my reply to Jeroen here as well:
My remark that the 'regular' pdfbox-2.0.3.jar does not contain the Legacy prefix is incorrect: it is the same as in pdfbox-app-2.0.3.jar.
I thought that pdfbox-app-xxx.jar was a Mendix product, but I realise now that that is not the case. Thanks for pointing that out.
Yes, it appears to be part of communitycommons: it is accompanied by a 0 KB file called 'pdfbox-app-2.0.3.jar.CommunityCommons.RequiredLib'.
I'm not using both jars at the same time.
I've replaced the -app jar with the 'regular' jar in the userlib, but the same error persists.
I didn't include the full error: I realise that there is a cause section in the stacktrace as well, which gives a little more info:
    Caused by: java.lang.NullPointerException: null
        at org.apache.pdfbox.text.PDFTextStripper.<clinit>(PDFTextStripper.java:1866)
        at amicosensoring.actions.DocumentToTextAction.executeAction(DocumentToTextAction.java:39)
Line 1866 refers to a static block in PDFTextStripper, which tries to close an input stream it has just opened. That stream is apparently null when the code runs within Mendix, but not when it runs in a standalone Java application.
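To illustrate the failure mode described above, here is a minimal sketch of how a null resource stream in a static block surfaces as an ExceptionInInitializerError with a NullPointerException as its cause. This is a stand-in, not the PDFBox source: the class names StaticInitDemo and NeedsResource and the resource name no/such/resource.txt are hypothetical, chosen only so the lookup fails.

```java
import java.io.IOException;
import java.io.InputStream;

final class StaticInitDemo {

    /** Hypothetical stand-in for a class that loads a bundled resource in a static block. */
    static final class NeedsResource {
        static {
            // Hypothetical resource name that is not on the classpath.
            // getResourceAsStream returns null for a missing resource instead of throwing.
            InputStream input = NeedsResource.class.getClassLoader()
                    .getResourceAsStream("no/such/resource.txt");
            try {
                // Closing without a null check throws a NullPointerException here.
                input.close();
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Forces initialisation of NeedsResource and returns the error it raises, or null. */
    static Throwable trigger() {
        try {
            new NeedsResource();
            return null;
        } catch (ExceptionInInitializerError e) {
            // The JVM wraps any exception thrown by a static initialiser in this error.
            return e;
        }
    }
}
```

Calling StaticInitDemo.trigger() once returns the wrapping error, and getCause() on it gives the NullPointerException. That is the same shape as the stack trace above, which suggests the real question is why the resource lookup returns null.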
The debugger in Eclipse shows that the InputStream 'input' is null, which causes an NPE when the code tries to close it on line 1866. I haven't yet managed to find out which class loader refuses to load this 'BidiMirroring.txt' file (which is present in the -app jar), but I presume it is a Mendix-specific one (which is perhaps not allowed to load resources like these?).
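One way to narrow this down might be a small probe that reports which class loader owns a given class and whether that loader can see a given resource. The sketch below uses only the standard ClassLoader API; ResourceProbe, locate and describe are hypothetical names introduced for illustration.

```java
import java.net.URL;

final class ResourceProbe {

    /** Returns the URL at which the class loader of owner finds path, or null if it cannot. */
    static URL locate(Class<?> owner, String path) {
        ClassLoader loader = owner.getClassLoader();
        // Classes loaded by the bootstrap loader report a null class loader,
        // so fall back to the system class loader lookup in that case.
        return (loader == null)
                ? ClassLoader.getSystemResource(path)
                : loader.getResource(path);
    }

    /** Builds a one-line report naming the loader and the resource location. */
    static String describe(Class<?> owner, String path) {
        ClassLoader loader = owner.getClassLoader();
        URL url = locate(owner, path);
        return "loader=" + (loader == null ? "bootstrap" : loader.getClass().getName())
                + ", resource=" + (url == null ? "NOT FOUND" : url.toString());
    }
}
```

In the Java action, logging ResourceProbe.describe(PDFTextStripper.class, "org/apache/pdfbox/resources/text/BidiMirroring.txt") just before the stripper is created should show the loader's class name and the jar the file is read from, if any. The resource path here is my assumption about where the file sits inside the jar; it is worth checking against the jar contents.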
Does anyone have an idea how to get around this, or how to instruct this (presumably) Mendix class loader to load this txt file?