Best way to deal with a huge list in microflows

13
I have a CSV importer that imports a CSV file with almost 260,000 rows (11 columns each). Importing goes fine, but when I try to run a microflow to process all rows I get the following stacktrace (due to iterating over a list of almost 260,000 objects):

`2009-09-08 13:21:40.140 ERROR - EXTERNALINTERFACE: java.lang.OutOfMemoryError: Java heap space`

What would be the best way to make sure that processing the rows doesn't run into heap space errors, without having to split the CSV file into multiple files?
asked
3 answers
20

Instead of splitting the CSV file, it's also possible to split the processing of the import rows in your microflow. In other words: process the import rows in batches.

Okay, sounds great. But how do you do that? For this you need Java, because it's currently not possible to specify an amount and offset on a list retrieve activity in a microflow. That's no problem, though, because you can retrieve the limited list from Java. You could also do the complete job in Java, but in my opinion a solution with as much logic as possible in the microflow is better.

So do the following:

  1. Create a Java action with input parameters amount (Integer) and offset (Integer)
    • Use Core.retrieveXPathQuery(IContext context, String xPathQuery, int amount, int offset, Map<String, String> sortMap) to retrieve the objects you want.
    • The third and fourth parameters are used to pass the amount and offset.
    • Pass a Map<String, String> as the last parameter to sort ascending on the CreateDate attribute.
    • Have the Java action return the retrieved list.
  2. Create a microflow that looks like the microflow at the bottom of my post. The necessary comments are included in the microflow image (excuse me for the bad layout, but I don't have enough space to make a horizontally oriented image).
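The batching loop those two steps describe can be sketched in plain Java. This is only an illustration of the pattern, not the Mendix API itself: `retrieveBatch` stands in for a Java action wrapping Core.retrieveXPathQuery with its amount and offset parameters, and the per-row processing is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchProcessor {
    // Stand-in for a Mendix Java action wrapping
    // Core.retrieveXPathQuery(context, xpath, amount, offset, sortMap):
    // returns at most `amount` items starting at `offset`.
    static <T> List<T> retrieveBatch(List<T> source, int amount, int offset) {
        int from = Math.min(offset, source.size());
        int to = Math.min(offset + amount, source.size());
        return new ArrayList<>(source.subList(from, to));
    }

    // The microflow loop: keep fetching fixed-size batches until one
    // comes back empty, so the full list is never in memory at once.
    static <T> int processInBatches(List<T> source, int batchSize) {
        int processed = 0;
        int offset = 0;
        while (true) {
            List<T> batch = retrieveBatch(source, batchSize, offset);
            if (batch.isEmpty()) {
                break;
            }
            for (T item : batch) {
                processed++; // replace with the real per-row processing
            }
            offset += batchSize;
        }
        return processed;
    }
}
```

In the actual microflow, the loop condition, offset counter, and batch-size constant live in microflow variables, and only the retrieve step is delegated to the Java action.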

Good luck!

(image: example batching microflow)

NOTE: When you have to deal with very many objects (> 10,000), consider using separate transactions for the processing. Otherwise you may run into database / memory trouble.

answered
4

Hi,

I've got the same scenario, except that I am dealing with up to 400,000 records.

I notice that this is a fairly old post, is it still the best solution for the problem?

Kind Regards

answered
2

With a little bit of Java knowledge, I think it is quite feasible to use stream processing to read your CSV file: open a file stream to the CSV file, read a couple of lines, open a new Mendix context, parse the lines into objects, commit, and close the context.

The advantage of reading from a file stream is that no data is kept in memory except for the current chunk of lines.
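A minimal sketch of that streaming approach, with the Mendix-specific parts replaced by stand-ins: `commit` represents opening a context, committing the chunk's objects, and closing the context, and the comma split is a naive placeholder for a real CSV parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class CsvStreamImporter {
    // Read the CSV line by line; only `chunkSize` parsed rows are held
    // in memory before they are committed and dropped.
    static int importCsv(Reader csv, int chunkSize) throws IOException {
        int committed = 0;
        List<String[]> chunk = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(csv)) {
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line.split(",")); // naive split; real CSV needs quoting support
                if (chunk.size() == chunkSize) {
                    committed += commit(chunk);
                }
            }
            committed += commit(chunk); // flush the final partial chunk
        }
        return committed;
    }

    // Stand-in for: open a new Mendix context, create and commit the
    // objects for this chunk, then close the context.
    static int commit(List<String[]> chunk) {
        int n = chunk.size();
        chunk.clear();
        return n;
    }
}
```

Because each chunk is committed and cleared before the next one is read, memory use stays flat regardless of how many rows the file contains.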

answered