Hi David,
Let me try to answer some of your questions.
First of all, it seems to me that the primary purpose of your 'high availability' setup is to be able to stop one of the Mendix processes without users losing the ability to log in and work in the application.
- Questions about configuration
- Procedures to update application definitions
- Questions about load balancer setup
- Questions about the URL rewriter
- Licensing questions
Using the Persistent Sessions option allows you to fail over an end user to another server instance without them having to log in again. However (speaking of the current 2.5.3 release):
Considering all of this, it would be possible to create a 'higher-availability' setup using multiple Mendix server processes: allow existing sessions to complete on one instance while redirecting new login sessions to another instance. Once all users are gone, you can take down the first one and do maintenance (well, in most cases, as long as it's not maintenance on your Mendix application model...).
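To make the drain-and-redirect idea concrete, here is a minimal sketch of the state a load balancer health probe could check. This is purely illustrative: the class and method names are hypothetical, not part of the Mendix platform, and a real setup would expose this via an HTTP endpoint the balancer polls.

```java
// Hypothetical sketch: an instance is UP (accepts new sessions), DRAINING
// (finishes existing sticky sessions only), or safe to stop for maintenance.
public class DrainableHealthCheck {
    public enum State { UP, DRAINING }

    private State state = State.UP;
    private int activeSessions = 0;

    public void sessionStarted() { activeSessions++; }
    public void sessionEnded()   { activeSessions = Math.max(0, activeSessions - 1); }

    // An operator flips the instance to DRAINING before maintenance: the load
    // balancer keeps routing existing (sticky) sessions here, but sends new
    // logins to the other instance.
    public void startDraining() { state = State.DRAINING; }

    // The load balancer's health probe: only an UP instance accepts new sessions.
    public boolean acceptsNewSessions() { return state == State.UP; }

    // Safe to stop the process once draining and no sessions remain.
    public boolean safeToStop() {
        return state == State.DRAINING && activeSessions == 0;
    }
}
```

Once `safeToStop()` holds, the first process can be shut down without kicking anyone out.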
Or, of course, next Sunday at 2 AM your database server, shared uploaded-files storage, or load balancer will break. :] Or, due to a bug in the application model or server, a user action will crash the first instance, and when the user hits reload because they get no response, the second one will break as well. :D
To answer your question "Is anyone using Mendix in a high-availability architecture?": There are a couple of multi-server-instance setups in production, but the ones I know of take quite a different approach to raising availability for end users: they split workloads over several independent instances. A common setup is to use one instance to handle all end-user interaction and one or more others to execute batch processing or scheduled events, or even to process jobs that are put into a queue by the end-user instance. This prevents end-user inconvenience when, for example, a batch job or scheduled event risks running the JVM out of heap space. When carefully set up, this can mean a lot less downtime for end users, because as long as the domain model structure remains intact, you can alter the application model of the 'back-end' servers and restart them.
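The queue-based split described above can be sketched as follows. This is a hypothetical in-memory illustration only; in a real Mendix setup the queue would be a persistent entity in the shared database, so the end-user instance and the back-end instance both see it.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: the end-user instance only enqueues work, while a
// separate back-end instance polls the queue and executes the jobs. If the
// back-end crashes (e.g. out of heap space), end users are unaffected.
public class JobQueueSplit {
    private final Queue<String> jobs = new ArrayDeque<>();

    // Called on the end-user instance: cheap, returns immediately.
    public void enqueue(String jobName) { jobs.add(jobName); }

    // Called by a scheduled event on the back-end instance.
    public String pollNextJob() { return jobs.poll(); } // null when the queue is empty

    public int pending() { return jobs.size(); }
}
```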
David, based on information from the Mendix cloud ops team, I'm also working on a documentation page that explains the requirements for an HA architecture in more detail; this page is still in draft. We will certainly include your questions and answers in it with the next updates. (this page)
Using scheduled events in an HA architecture is tricky and has some challenges. I have been working on a module that allows you to schedule microflows and is designed to work in an HA architecture: each microflow will be executed on only a single server.
The module is passing all tests so far, but it will take some time before all the special cases in dates and schedules have been verified (for example, I'm still waiting to see what happens during the daylight saving time transition). If you are interested in using this module, just send me an email (jasper.van.der.hoek [at] mendix.com).
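The "executed on only a single server" guarantee is typically achieved by letting every server try to claim the job atomically, so that exactly one wins. A minimal sketch, assuming a shared claim record (in a real implementation this would be an atomic UPDATE on a database row; an `AtomicReference` stands in for that row here, and all names are illustrative):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a single-executor claim for a scheduled microflow.
public class SingleExecutorLock {
    // null means "unclaimed"; otherwise holds the claiming server's id.
    private final AtomicReference<String> owner = new AtomicReference<>(null);

    // Each server calls this before running the scheduled microflow.
    // Only the first caller wins; the others skip this run.
    public boolean tryClaim(String serverId) {
        return owner.compareAndSet(null, serverId);
    }

    // Called by the winner once the run is finished.
    public void release() { owner.set(null); }
}
```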
Regarding your question on load balancing with a deeplink or a web service: from the load balancer's point of view it doesn't matter whether a user, a deeplink, or a service triggers the action; any request should be handled in exactly the same way. For example, if the same user triggers a different deeplink, the load balancer and browser should recognize the cookies and make sure the request is redirected to the same instance as was used before.
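The cookie-based affinity just described can be sketched like this. It is a simplified illustration, not a real balancer: the cookie name and round-robin choice for new sessions are assumptions, and in practice this logic lives in the load balancer itself (e.g. via a routing cookie it sets on the first response).

```java
import java.util.List;

// Hypothetical sketch of cookie-based session affinity: the first request
// picks a backend; later requests carrying the same routing cookie (user
// clicks, deeplinks, web service calls) go back to the same instance.
public class StickyRouter {
    private final List<String> backends;
    private int next = 0;

    public StickyRouter(List<String> backends) { this.backends = backends; }

    // routeCookie is the value of an assumed "ROUTE" cookie, or null on first visit.
    public String route(String routeCookie) {
        if (routeCookie != null && backends.contains(routeCookie)) {
            return routeCookie;              // stick to the previous instance
        }
        String chosen = backends.get(next);
        next = (next + 1) % backends.size(); // round-robin for new sessions
        return chosen;                       // balancer would then set ROUTE=chosen
    }
}
```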
In addition to Hans's comment about HA usage: the easiest scenario is to split front-end and back-end actions. But I have worked at customers who have done what you are asking, and I am currently working with a customer who wants to set up all their servers (even acceptance and pre-production) in an HA architecture.
David,
Regarding scheduled events: I can imagine setting up a heartbeat web service message between the two Mendix servers, plus a configuration entity stating which one should execute the scheduled events.
If one server goes down unexpectedly, the other one notices via the heartbeat and takes over the scheduled events.
For planned maintenance, the handover should be marked in the configuration entity.
This requires that the microflows handling the scheduled events first check the heartbeat/configuration before executing the actual work.
And of course this requires one of the servers to stay up during maintenance, assuming that changes made via the service console are read into memory on startup and can be altered without risk.
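The heartbeat-plus-handover check above can be sketched as follows. This is a minimal illustration under stated assumptions: the class, the timeout, and the maintenance flag are hypothetical stand-ins for the web service heartbeat and the configuration entity.

```java
// Hypothetical sketch: the secondary only takes over scheduled events when the
// primary's last heartbeat is stale, or when the configuration entity
// explicitly hands over for planned maintenance.
public class HeartbeatFailover {
    private final long timeoutMillis;
    private long lastPrimaryHeartbeat;
    private boolean maintenanceHandover = false;

    public HeartbeatFailover(long timeoutMillis, long now) {
        this.timeoutMillis = timeoutMillis;
        this.lastPrimaryHeartbeat = now;
    }

    // The primary pings this (e.g. via the heartbeat web service) on every beat.
    public void recordHeartbeat(long now) { lastPrimaryHeartbeat = now; }

    // Set via the configuration entity before planned maintenance.
    public void setMaintenanceHandover(boolean flag) { maintenanceHandover = flag; }

    // The secondary's scheduled-event microflow calls this first and only
    // executes the actual work when it returns true.
    public boolean secondaryShouldExecute(long now) {
        return maintenanceHandover || (now - lastPrimaryHeartbeat) > timeoutMillis;
    }
}
```

Note that clock skew between the two servers would need handling in a real implementation; this sketch assumes a single time source.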