Each Entity in Mendix gets its corresponding table in the database. Each non-virtual attribute of that entity becomes a field in the corresponding table. In case of a generalization then the actual entity instance is partially stored in the generalization table and partially in the specialization table.
Let's take the example of the `Gen` and `Spec` entities (where `Spec` inherits from `Gen`).
If we would execute an XPath to get all the `Spec` instances (`//MyModule.Spec`) then we need to read from the `Spec` table the attributes defined on the `Spec` entity and from the `Gen` table the attributes defined on the `Gen` entity. We know that for each record in `Spec` there is a corresponding record in `Gen`. As such we can use a single query to read that information using an INNER JOIN.
If we would execute an XPath to get all the `Gen` instances, then we actually mean to say: give me all the `Gen` instances and all the `Spec` instances (that's the meaning of inheritance, a `Spec` instance is also a `Gen` instance). However, some records available in `Gen` might not be available in `Spec` (we can't invert that same rule to say that all `Gen` instances are also `Spec` instances). Also, there might be cases where there is a `Spec2` instance also inheriting from `Gen`. So that's why we need multiple queries. We start with querying `Gen`. This gives us all the `Gen` instances from which we can deduct which instances are of a more specific type (the `Spec` instances for example). For those instances we fire additional queries (one per specialization entity) to load the attributes defined on those entities. This results in those queries with an `IN` statement having a lot of ID's in it).
The main reason to do that directly is because you actually asked to load the full entity instance (by using the XPath query). The full entity instance is an instance with the attributes from all tables belonging to that entity and inheritance structure. If you would defer loading that, you might do it on demand, but that gets actually worse if you iterate over the entity instances (as it would have to query the `Spec` table per instance).
The chosen approach (INNER JOIN where possible, multiple queries where necessary) is done because usually when there are less records in the database, then the latency of multiple queries quickly outgrows the speed of separate queries. Although I have to admit that I am a bit surprised that the database is apparently more efficient in a big IN statement execution than an INNER JOIN. I'll have to do a bit more investigation into that why this is the case.
Does this answer your questions?
I did some investigations in the time spend on those queries. I created a situation with `Gen` and `Spec` entities containing 1000 `Spec` entities. When I query all `Gen` entity instances (so executing two queries) my query performs around twice as slow as when I query all `Spec` entity instances (executing one query with INNER JOIN). This is exactly the opposite of what you measured!
So I looked again at your example and noticed one important difference. You have a LIMIT of 1.000 records in your query, but also an OFFSET of 500.000 records. Now, this is interesting. When executing the INNER JOIN query, the database will most probably first join _all_ the records of `Gen` with the corresponding records of `Spec` and only then drop the first 500.000 records. This is obviously slower then loading 1000 specific records from the `Spec`table. I think this is the main reason for the performance difference. I think when you try this again without the OFFSET, you will see different numbers.
1) It usually does not. For this domain
The GEN OQL"SELECT MyFirstModule.gen/* FROM MyFirstModule.gen LIMIT 20" results in
ORDER BY "myfirstmodule$gen"."id" ASC LIMIT 20
The SPEC OQL "SELECT MyFirstModule.spec/* FROM MyFirstModule.spec LIMIT 20" results in
INNER JOIN "myfirstmodule$gen" "dj1myfirstmodule$gen" ON "dj1myfirstmodule$gen"."id" = "myfirstmodule$spec"."id"
ORDER BY "myfirstmodule$spec"."id" ASC LIMIT 20"
So there is something wrong with your test. Sure you got the correct logs? Picture 6 contains more then just the retrieval of GEN. The second part (
is retrieval of SPEC. Probably because your app requests it.
Try again with a simpler model and more separated testcases.
2) and 3) will probably become clearer with your simpler model
A little more detail on what I'm seeing, in response to @Tim van Steenbergen (thanks for pitching in btw). For the record; I'm pretty sure there's something wrong with my test. I'm just not sure what. Hence the post :)
Here's a step-by-step of what I've got.
The domain model (above). Other than having a few more attributes, it doesn't look different from what Tim is using.
The microflow (above). Called from a button. This is literally the entire test. No further actions are called, no pages are shown.
The retrieve action (above). RunCount constant is set to 1000. Offset is set to 500k (halfway the 1000k DB table).
So far, it all looks about as basic as I can make it.
The Mendix ConnectionBus_Retrieve shows me the following:
The log output for with ConnectionBus_Retrieve set to trace (above) shows six records that relate to my query.
The first log record (above) shows the output of the request analyzer. Nothing unexpected there.
The second log record (above) shows the incoming query, with what I interpret as a query ID: [071ab9c3-358f-3d20-8d42-7fadc1b7c6e0]. Still looks like a standard retrieve on the ParentGeneral table.
The third log record (above) shows the OQL translation of the Xpath query. Again, nothing unexpected.
The fourth log record (above) shows the SQL translation of the OQL query. Still nothing unexpected.
The fifth log record (above) shows a second SQL query for the same original query ID. This is the retrieve of the ChildSpecial table (or Tim's SPEC). As I read it, this query originates from the same Xpath / OQL, though I can't for the life of me figure out why.
The sixth log record (above) shows the outcome, similar to the one in my original post. It includes the attributes for both ParentGeneral and ChildSpecial.
I'll set up another test and try to further simplify things. Still, Apart from the limit/offset, it feels about as basic as it can be.
[Edit note: resized pictures]