Memory leak in async microflows

/** Last updated 15/05/18 **/

Hi all,

Recently I started a small side-project to explicitly test the performance of Mendix database/query behavior in different scenarios. The first step was to create some simple logic to populate my test database with a large amount of data.

My setup was quite straightforward. I created a manual invoke action that would start a number of asynchronous microflows through the CommunityCommons Java action 'RunAsyncInQueue'. Each ASYNC_CreateAll flow would create a list for each database entity, fill these lists with objects until the batch size was reached, and commit the batches, like so:

(Picture 1: part of main async flow)
(Picture 2: batch create flow)
(Picture 3: part of random data/object creation flow)

None of the above is anything I haven't done many times before, in some form or another, though not in Mendix 7. The idea of the setup was that I could, for example, have the system run 1000 batches of 1000 creates each, go for lunch, come back, and have a database with 1 million of each record.

This is where things got interesting. I'd given the system 2 GB of heap space, but despite this I rapidly ran into a 'GC Overhead Exceeded' error. I figured I'd tone it down a bit and try 100 batches of 100 creates each. This time the logic ran without error. I manually ran a few more batches to at least build up a bit of a database; on the fourth manual invoke, I ran into the GC overhead error again.

To see what was going on, I connected my Mendix environment to VisualVM to track memory and GC behavior:

(Picture 4: VisualVM output)

As you can see, the amount of used heap rises steadily, and while the garbage collector is doing something, the major GC events never bring the heap back to base levels. In effect, the entities created in memory in the async microflows don't seem to be released to the JVM GC.
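For readers unfamiliar with the pattern, the queued-async setup can be sketched in plain Java. This is only an illustration of the concept: names like `createBatch` are made up here, and CommunityCommons' RunAsyncInQueue uses Mendix's internal execution APIs, not a raw executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncBatchSketch {

    static final int BATCHES = 100;      // number of async "microflows" queued
    static final int BATCH_SIZE = 100;   // objects created per batch

    static final AtomicInteger created = new AtomicInteger();

    // Stand-in for one ASYNC_CreateAll run: build a list, "commit" it, drop it.
    static void createBatch() {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        for (int i = 0; i < BATCH_SIZE; i++) {
            batch.add("record-" + i);        // placeholder for a Mendix object
        }
        created.addAndGet(batch.size());     // placeholder for the commit
        // 'batch' goes out of scope here and should become eligible for GC
    }

    public static void main(String[] args) throws InterruptedException {
        // RunAsyncInQueue conceptually pushes each microflow onto a work queue
        ExecutorService queue = Executors.newSingleThreadExecutor();
        for (int b = 0; b < BATCHES; b++) {
            queue.submit(AsyncBatchSketch::createBatch);
        }
        queue.shutdown();
        queue.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(created.get()); // 10000
    }
}
```

If each batch really were released after its run, the expectation is a sawtooth memory profile, not the steady climb shown in the VisualVM graph.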
After the entire batch run ends (all 100 async flows) we still don't see any drop in used heap (the time between batches corresponds to the 'plateaus' in the memory graph).

(Picture 5: part of the heap dump)

The heap dump predictably shows a lot of objects retained in memory. Note also the char[] (String) at the top: as part of the test data, I create a 'long string' of 20,000 characters to simulate the use of 'Unlimited' text fields. All of these are clearly retained in memory and take up the largest share of the heap (71%). Since I tend to use async flows quite often in projects (though usually not quite so data-heavy), this behavior was cause for concern.

I checked whether I could find any obvious reason why objects would be kept in memory, but the flow does little more than create --> add to list --> commit list. Objects are not referenced to System.Session or System.User, and only a few are referenced to each other. I tried adding 'Start Transaction' and 'End Transaction' actions to the start and end of each async flow, but this changed nothing. Finally, I tried to explicitly terminate each session at the end of the microflows, through a custom Java action calling this.getContext().getSession().destroy().

(Picture 6: Mendix console output)

As you can see, the sessions appear to be destroyed, and each async flow runs under a different session. Again, this did nothing to change the observed behavior. As far as I can find, Mendix 7 no longer offers any way to explicitly release an object to the GC (ISession.release(IMendixObject) no longer exists), so I don't really see any way to resolve the issue.

This is only a test project, so I'm not so much looking for a solution as I am looking for a cause:

- Have I overlooked something that could/should cause this behavior?
- Is the issue caused by the way the CommunityCommons action works in Mendix 7?
- Is it a bug in the Mendix GC behavior for server-side memory usage?
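The retention pattern visible in the heap dump is the ordinary JVM rule: as long as any live structure holds a strong reference to those 20,000-character strings, no GC cycle can reclaim them. A minimal, self-contained demonstration (unrelated to Mendix internals) using a WeakReference as a probe:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class RetentionSketch {

    // Build a 20,000-char string, like the test data's 'Unlimited' text field
    static String longString() {
        StringBuilder sb = new StringBuilder(20_000);
        for (int i = 0; i < 20_000; i++) sb.append('x');
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> retained = new ArrayList<>();
        retained.add(longString());
        WeakReference<String> probe = new WeakReference<>(retained.get(0));

        // While the list holds the string, GC cannot reclaim it
        System.gc();
        System.out.println(probe.get() != null); // true: still strongly reachable

        // Drop the strong reference; the string becomes collectible
        retained.clear();
        for (int i = 0; i < 50 && probe.get() != null; i++) {
            System.gc(); // hint only, hence the retry loop
        }
        System.out.println(probe.get() == null); // typically true once GC runs
    }
}
```

The puzzle in the report is therefore not GC behavior itself, but what is still holding strong references to the created objects after the async flows finish.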
So, for those who bothered to read through the whole thing: have you experienced anything similar in Mendix 7? And have you managed to find a cause and/or resolution?

/** UPDATE **/

I've filed a support ticket and received the following from Mendix:

"Hi Christiaan, this issue is already being worked on by our R&D Runtime team as part of a larger set of improvements in the way the platform handles state. The current estimate is that a fix for this particular issue will be released as part of 7.14.0 (currently scheduled for the end of this month). This set of improvements is currently the top priority of this particular R&D team."

Guess we sit and wait, for the moment :)

/** UPDATE 2 **/

Received the following after submitting my project for testing:

"Thanks, R&D will take a look at it to determine whether this is a new case or part of the issue they are currently working on. One caveat: if it is confirmed to be the same issue, it looks like the full fix will not make it into 7.14.0 but 7.15.0 instead (again, current estimate)."

More information if and when I get it.

/** UPDATE 3 **/

Sorry that took a few days, guys. I just ran the same project on Postgres, and as the Mendix support team predicted, the problem was in the use of the built-in DB. I was theoretically aware of the fact that it ran in memory, but I always assumed it ran in its own container. I'd never really used it in such a way as to cause these problems, either.

As you can see in the graph below, running 1000 batches of 1000 creates each now works without any issue. GC is behaving as expected. So: no memory leak, no issue in Java, just me running the wrong DB.

Thanks @ Mendix for the support!

(Picture 7: JVM GC and memory usage whilst running on PostgreSQL)
asked
3 answers

We've researched the project: this is not a bug, but is caused by the use of the in-memory database HSQLDB. Thanks for the extensive report!

 

Update 1

Some feedback from our investigation:

* We see memory usage growing when testing with HSQLDB, but not with PostgreSQL. The reason is that HSQLDB runs as a library inside the Mendix runtime and is an in-memory database, meaning all data created in HSQLDB is stored in memory.

* Garbage collection is always done at the end of a request, not at the end of a batch commit or the end of a sub-microflow. This has not changed compared to Mendix 6, so it should not cause the problem. We are, however, optimizing garbage collection, so starting with Mendix 7.15 (planned) memory usage should be better than in Mendix 6.

* The planned fix for 7.14 as reported by Chris addresses a different issue, where too much client state was returned to the client at the end of a request. This is, however, not related to the behavior reported above. We checked the CommunityCommons executeAsyncInQueue implementation. This action runs your microflow in a new system context. When a context is closed, garbage collection is executed, so all memory used by the microflow will be cleaned up. We can confirm that this also happens for the project described here.
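The request-scoped cleanup described above can be modeled in a few lines. This is a toy model, not the real Mendix IContext: a context tracks every object created during a request, so dropping a per-batch list frees nothing until the context itself closes.

```java
import java.util.ArrayList;
import java.util.List;

public class RequestScopeSketch {

    // Toy stand-in for a runtime context that tracks every object
    // created during a request (illustrative only).
    static class Context implements AutoCloseable {
        final List<Object> tracked = new ArrayList<>();
        Object create(Object o) { tracked.add(o); return o; }
        @Override public void close() { tracked.clear(); } // the real GC point
    }

    public static void main(String[] args) {
        try (Context ctx = new Context()) {
            for (int batch = 0; batch < 10; batch++) {
                List<Object> local = new ArrayList<>();
                for (int i = 0; i < 100; i++) {
                    local.add(ctx.create(new byte[1024]));
                }
                local.clear(); // dropping the batch list frees nothing:
                               // every object is still held by ctx.tracked
            }
            System.out.println(ctx.tracked.size()); // 1000: all retained
        }
        // only once the context closes do the objects become collectible
    }
}
```

This is why memory climbs across batches within one request but is released once the request (or the async flow's system context) ends.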

 

So to summarize: at this moment we see no memory leak; all memory is cleaned up at the end of the request. We do see a need to increase how often garbage collection is done, and this is being implemented.

 

@Christiaan, can you validate this by running your test project on PostgreSQL, to see if memory is freed?

 

Update 2

The issue described by Chris below is not related. It's also not related to garbage collection of Mendix objects. This has not changed between Mendix 6 and 7: garbage collection of Mendix objects is always done at the end of a request. Using sub-microflows or batches does not change this behavior (again, Mx 6 and 7 are the same).

The issue reported by Chris is caused by the fact that all newly created objects were sent to the client, which causes a peak in memory usage at the end of the request. The algorithm that determines which new objects are required by the client will be optimized in Mendix 7.14, which solves the issue experienced by Chris.

 

answered

Great research!

Seeing as this is a combination of the Mendix runtime and actions from the Community Commons module, I suggest you create a support ticket for this issue and include the test project plus reproduction steps.

That way we can try to assess from which part this issue originates and try to fix that.

Edit: and about Start/End transaction: I believe 'Start transaction' just creates a nested transaction in that call, which you remove at the end, but you still end up with the original transaction (which is never ended). I did some experimenting with this yesterday as well.

From the Community Commons Documentation:

StartTransaction - Start a transaction, if a transaction is already started for this context, a savepoint will be added.

EndTransaction - Commit the transaction, this will end this transaction or remove a save point from the queue if the transaction is nested.

I suggest you try only adding an 'End transaction' call, without the Start call. This should finish the existing 'top-level' transaction for each separate microflow, and seems to still create a new one automagically for the next microflow call.
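The savepoint semantics quoted from the documentation can be modeled with a small stack. This is a toy model of the documented behavior, not CommunityCommons' actual implementation; it shows why a paired Start/End only adds and removes a savepoint, while a lone End commits the original transaction.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TransactionSketch {

    // Toy model of the documented semantics:
    // StartTransaction adds a savepoint if a transaction is already open;
    // EndTransaction removes a savepoint, or commits when none are left.
    static class TxContext {
        private boolean open = true;   // the request starts with a transaction
        private final Deque<String> savepoints = new ArrayDeque<>();
        int commits = 0;

        void startTransaction() {
            if (open) savepoints.push("sp" + savepoints.size());
            else open = true;
        }

        void endTransaction() {
            if (!savepoints.isEmpty()) savepoints.pop(); // nested: drop savepoint
            else { open = false; commits++; }            // top-level: commit
        }
    }

    public static void main(String[] args) {
        TxContext ctx = new TxContext();

        // Paired Start + End: a savepoint is pushed and popped,
        // but the original transaction never commits
        ctx.startTransaction();
        ctx.endTransaction();
        System.out.println(ctx.commits); // 0

        // A lone End commits the existing top-level transaction
        ctx.endTransaction();
        System.out.println(ctx.commits); // 1
    }
}
```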

answered

This bug is more general and will cause 'GC overhead limit exceeded' errors in batch and list processing. I reported this bug a couple of weeks ago to Mendix, and they admitted that the garbage collection in Mx 7 has changed compared to Mx 6.

In short: garbage collection is done at the end of the microflow, not during it. So all data collected in lists stays in memory until the microflow has ended, even if the list is cleared or recreated in the meantime.

Adding EndTransaction calls won't help.

My intermediate solution is to use the good old 'executeMicroflowInBatches'.
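The idea behind that workaround is to split the work so each chunk runs as its own microflow execution, keeping only one chunk live at a time. A generic sketch of the pattern (the method and names here are illustrative, not the CommunityCommons implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchProcessSketch {

    // Chunked processing in the spirit of executeMicroflowInBatches:
    // handle a limited slice at a time so only one chunk is reachable.
    static void processInBatches(int total, int batchSize,
                                 Consumer<List<Integer>> microflow) {
        for (int offset = 0; offset < total; offset += batchSize) {
            List<Integer> chunk = new ArrayList<>();
            int end = Math.min(offset + batchSize, total);
            for (int i = offset; i < end; i++) chunk.add(i);
            microflow.accept(chunk); // each call is its own "microflow" run
            // 'chunk' is unreachable after this iteration ends
        }
    }

    public static void main(String[] args) {
        int[] processed = {0};
        processInBatches(50_000, 1_000, chunk -> processed[0] += chunk.size());
        System.out.println(processed[0]); // 50000
    }
}
```

Because garbage collection happens per execution, splitting one huge microflow into many small ones bounds the peak heap usage by the batch size rather than the total data set.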

Question: how many of you process more than 50,000 objects in a microflow? Here at FlowFabric we have at least 20 projects that do.

answered