Questions about Mendix and High Availability architecture

16
First, sorry for this being such a long post. I am about to set up a Mendix application in a high availability architecture and have a number of questions about how to do this, and about the recommended procedures for updating applications in this architecture.

Say I have two Windows Mendix application servers called MBS01.company.com and MBS02.company.com and a load balancer called LB.company.com. Would I be correct in the following assumptions? I should install the Mendix Windows Service on each application server with identical copies of the deployment files (Application/model and Application/web), and each server should be configured to connect to the same database (located on a separate database server).

Questions about configuration:

  1. Does the port need to be the same for each instance?
  2. Is the Admin Password the same for each instance?
  3. Should scheduled events be set to ALL on both instances, or just on one server?
  4. For images and uploaded files to be shared between instances, they must be stored in a shared location accessible from each instance, with the file location set as a Custom Mendix Setting in the Console (UploadedFilesPath). Is this correct?
  5. Application Root URL setting - see the questions below about the URL rewriter.

Procedures to update application definitions - could someone confirm that this would be the correct procedure:

  1. Stop the Mendix Service on both servers (oops - there goes the high availability).
  2. Update the deployment files on both servers.
  3. On one server only, run the Service Console to synchronize any database changes, then stop it.
  4. Restart the Mendix Service on each server.

Questions about Load Balancer setup:

  1. I assume a keep-alive port needs to be configured - should this use the same port that the Mendix servers are running on?
  2. Does the (F5) load balancer need to pass through the true end-client IP address? The default setting for F5 is that the destination application sees the client address as the back-end of the F5, although this behaviour can be changed.
  3. I assume we should configure sticky sessions - what timeout is recommended?

Questions about URL rewriter:

We would like to define a CNAME for the load balancer that will redirect to the application server group. So, for example, the load balancer LB.company.com has a CNAME defined of helpdesk.company.com, which redirects to MBS01.company.com:8085 and MBS02.company.com:8085.

  1. How does this CNAME interact with the Managed Fusion URL Rewriter that is used for a single server?
  2. The Application Root URL setting in the Service Console - should this be set to the CNAME of the load balancer, or do we need to configure the rewriter for each server?
  3. What is the effect of using a load balancer on deeplink URLs?
  4. What is the effect of using a load balancer on published web services?

Licensing Questions:

  1. I assume that the license should be placed on both application servers (the full number licensed), and that if one node goes down, all licenses will be available on the remaining nodes. Does the licensing model enforce the limit on the number of users logged in across the 2 servers?
  2. If Persistent Sessions is set to true, does the Application Active Sessions form show active sessions on both nodes, or just on the node you are connected to?

Is there any documentation about setting up and managing a high availability installation?

Thanks for any advice.

Edit: Does anyone have any answers to some of these questions? Is anyone using Mendix in a high-availability architecture?
asked
4 answers
26

Hi David,

Let me try to answer some of your questions.

First of all, it seems to me the primary purpose of your 'high availability' setup is to allow stopping one of the Mendix processes without losing the ability to log in and work in the application.

Questions about configuration:

  1. (port) No
  2. (password) No
  3. Scheduled event execution is configurable on each of the server instances, and neither of them will know about the other's settings. This way you can pin down the execution of specific events to a specific server instance. If you want to 'fail over' the execution of specific events to another instance, you will need to implement some additional functionality, like Bart suggests. However, doing this properly (that is, without introducing even more scenarios that could lead to downtime and data corruption) requires you to walk all the way into the 'world of pain' of quorum, split-brain scenarios, fencing, etc.
  4. (uploaded files) Yes, correct, preferably implemented on a file system with higher availability than a single shared disk.
  5. (root url) Same on each instance.

Procedures to update application definitions:

  1. If the application model changes contain form changes, domain model changes (which require database structure synchronization), or renames of publicly available microflows (e.g. ones used on buttons in forms), then you're best off updating all of the Mendix server instances that talk to the same backend database at the same time, yes.
  2. Sure.
  3. Yup.
  4. Indeed.

Questions about Load Balancer setup:

  1. (keep-alive port) Do you mean, like a health check?
  2. (ip address) You can either pass through the client IP address (which also requires you to use the load balancer as default gateway?) or put the client IP address into the X-Forwarded-For header when proxying the HTTP request (see the sketch after this list). End-user IP addresses are not used for anything other than logging them when a user logs in or fails to log in.
  3. (sticky sessions) Which implementation of sticky sessions are you planning to use (e.g. inserting cookies into the HTTP responses)? And what is the purpose of the timeout you're referring to?
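On the X-Forwarded-For option in point 2: the idea is that the proxy appends the original client address to that header, and the application picks the left-most entry out of it. The snippet below is only a generic illustration of that logic in plain servlet terms, not how the Mendix runtime handles it internally:

    // Illustration only: recovering the real client IP from X-Forwarded-For.
    // Plain servlet code, not Mendix internals.
    import javax.servlet.http.HttpServletRequest;

    public class ClientIpResolver {

        /** Returns the original client IP when the request passed through a proxy,
         *  falling back to the TCP peer address when no header is present. */
        public static String resolveClientIp(HttpServletRequest request) {
            String forwardedFor = request.getHeader("X-Forwarded-For");
            if (forwardedFor != null && !forwardedFor.isEmpty()) {
                // The header may contain a chain: "client, proxy1, proxy2";
                // the left-most entry is the original client.
                return forwardedFor.split(",")[0].trim();
            }
            // Without the header you only see the proxy (e.g. the F5 back-end) address.
            return request.getRemoteAddr();
        }
    }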

Questions about URL rewriter:

  1. I don't know Managed Fusion, but I assume a URL rewriter or reverse proxy looks at the Host: header inside the HTTP request. So whatever chain of DNS settings you configure, there is no need for headaches here.
  2. (app root url) AFAIK, the sole purpose of the Application Root URL setting is inserting a correct self-referencing application URL into the generated WSDL documentation at the /ws-doc/ location. You probably don't want to expose /ws-doc/ to the outside world, so there's no need to give this setting much attention.
  3. (deeplink) No effect, I guess.
  4. (web services) The web service-microflow will be executed on the server instance you redirect it to, just like normal json client-api requests on /xas/.

Licensing Questions:

  1. (I don't know the license details, perhaps someone from Mendix can correct me if I'm wrong.) When using non-persistent sessions, your two server processes won't know about each other, and both of them will accept the full number of concurrent users. Because the entire purpose of your high availability setup is to overcome an outage of one of the server processes, you will always want to make sure the total number of logged-in users in your cluster does not exceed the maximum number of concurrent users your license allows. ;-)
  2. When using persistent login sessions, both of the servers will know about the total number of logged-in users, because they can retrieve this information from the database. This also automatically prevents disappointing end users who suddenly cannot log in anymore when you lose half of your redundancy. ;-)
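To make point 2 a bit more concrete: with persistent sessions every active session ends up as a record in the shared database, so any node (or a monitoring script) can count them. The query below is only an illustration of that idea; the actual session storage is Mendix-internal, so the table name here is made up:

    // Illustration only: counting logged-in users across all nodes via the
    // shared database. The table name 'active_sessions' is hypothetical.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SessionCount {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://dbserver/helpdesk", "mendix", "secret");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM active_sessions")) {
                rs.next();
                System.out.println("Concurrent users across all nodes: " + rs.getInt(1));
            }
        }
    }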

Using the Persistent Sessions option allows you to fail over an end user to another server instance without having to log in again. However (speaking of the current 2.5.3 release):

  • Not all user state is stored in the backend when using persistent sessions. When you start editing an object in a form, switch your session to another server instance and click the save button, you will lose all changes you made! (The same holds for a 'normal' restart situation.)
  • Stopping a server instance will terminate running microflows.
  • The 'disallow concurrent execution' status on microflows is NOT shared between server instances(!), so you'd better not rely on this feature to e.g. generate unique IDs yourself when allowing multiple server instances to connect to the same backend database (a database-backed alternative is sketched after this list).
  • The web client won't notice altered forms and domain model object names after you switch the application model without a full clear-cache reload. This can lead to all sorts of unreproducible 'bugs' for the end user. Being forced to log on again after a server restart (and/or application model change) does fix some of these problems. User roles, navigation, domain model metadata and (except in IE) forms are more likely to be refreshed client-side after a new login.
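On the 'disallow concurrent execution' point: if you need cluster-safe unique numbers, let the shared database hand them out instead. A minimal sketch of that idea, assuming a counters table you create and maintain yourself (all names are made up, and in a Mendix app you would wrap this in a Java action):

    // Sketch: cluster-safe sequence numbers via the shared database instead of
    // relying on 'disallow concurrent execution'. Table/column names are hypothetical.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ClusterSafeSequence {

        /** Atomically increments and returns the named counter. The row lock taken
         *  by SELECT ... FOR UPDATE serialises callers across all server instances. */
        public static long next(Connection conn, String counterName) throws Exception {
            boolean oldAutoCommit = conn.getAutoCommit();
            conn.setAutoCommit(false);
            try (PreparedStatement select = conn.prepareStatement(
                     "SELECT value FROM app_counters WHERE name = ? FOR UPDATE");
                 PreparedStatement update = conn.prepareStatement(
                     "UPDATE app_counters SET value = value + 1 WHERE name = ?")) {
                select.setString(1, counterName);
                long current;
                try (ResultSet rs = select.executeQuery()) {
                    if (!rs.next()) {
                        throw new IllegalStateException("Unknown counter: " + counterName);
                    }
                    current = rs.getLong(1);
                }
                update.setString(1, counterName);
                update.executeUpdate();
                conn.commit();
                return current + 1;
            } catch (Exception e) {
                conn.rollback();
                throw e;
            } finally {
                conn.setAutoCommit(oldAutoCommit);
            }
        }
    }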

Considering all of this, it would be possible to create a 'higher-availability' setup using multiple Mendix server processes, if you allow existing sessions to complete on one instance while redirecting new login sessions to another instance. When all users are gone, you can take down the first one and do maintenance (well, in most cases, as long as it's not maintenance on your Mendix application model...).

Or, of course, next Sunday at 2 AM, your database server, shared uploaded-files storage or load balancer will break. :] Or, due to an application model or server bug, a user action will crash the first instance, and by hitting reload because the user does not get a response, the second one will also break. :D

To answer your question "Is anyone using Mendix in a high-availability architecture?": There are a couple of multi-server-instance setups being used in production, but the ones I know of take quite a different approach to raising the availability levels for end users, by splitting workloads over several independent instances. A common setup is to use one instance to handle all end-user interaction, and one or more others to execute batch processing or scheduled events, or even to process jobs that are put into a queue by the end-user instance (sketched below). This setup prevents end-user inconvenience when, for example, there's a risk that a batch job or scheduled event could result in a JVM Heap Space Out of Memory. When carefully set up, this can result in a lot less downtime for end users, because as long as the domain model structure remains intact, you can alter the application model of the 'backend' servers and restart them.
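A rough sketch of that queue idea: the end-user instance only inserts a 'job' record, and a backend instance periodically tries to claim the oldest queued job; the conditional UPDATE makes sure only one instance wins the claim. The jobs table and its columns are made up for this example; in a Mendix app this would be a normal entity plus a Java action or microflow:

    // Sketch of the 'backend instance processes queued jobs' pattern.
    // The jobs table and its columns are hypothetical, not a Mendix built-in.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class JobWorker {

        /** Claims the oldest queued job; returns its id, or -1 if nothing is queued.
         *  The conditional UPDATE only succeeds on one instance, so a job never runs twice. */
        public static long claimNextJob(Connection conn, String workerName) throws Exception {
            try (PreparedStatement find = conn.prepareStatement(
                     "SELECT id FROM jobs WHERE status = 'QUEUED' ORDER BY created_at");
                 ResultSet rs = find.executeQuery()) {
                while (rs.next()) {
                    long id = rs.getLong(1);
                    try (PreparedStatement claim = conn.prepareStatement(
                             "UPDATE jobs SET status = 'RUNNING', claimed_by = ? " +
                             "WHERE id = ? AND status = 'QUEUED'")) {
                        claim.setString(1, workerName);
                        claim.setLong(2, id);
                        if (claim.executeUpdate() == 1) {
                            return id; // we won the race: process this job
                        }
                        // another instance claimed it first; try the next candidate
                    }
                }
            }
            return -1;
        }
    }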

answered
5

David, based on information from the Mendix cloud ops team I'm also working on a documentation page to explain the requirements for an HA architecture in more detail; this page is still in draft. We will certainly include your questions and the answers here in the next updates. (this page)

Using scheduled events in an HA architecture is tricky and has some challenges. I have been working on a module that allows you to schedule microflows and is built to be used in an HA architecture. Each microflow will only be executed on a single server.

This module is passing all tests so far, but it will take some time before all the special cases in dates and schedules have been exercised (for example, I'm still waiting to see what happens during the daylight saving time switch). If you are interested in using this module, just send me an email (jasper.van.der.hoek [at] mendix.com).

Regarding your question on load balancing with a deeplink or a web service: from the load balancer's point of view it doesn't matter whether it is a user, a deeplink or a service that triggers the action; every request should be handled in exactly the same way. For example, if the same user triggers a different deeplink, the load balancer and browser should recognize the cookies and make sure the request is sent to the same instance as was used before.

In addition to Hans's comment about HA usage: the easiest scenario would be to split front-end and back-end actions. But I have worked at customers who have done what you are asking, and I am currently working with a customer who wants to set up all their servers (even acceptance and pre-production) in an HA architecture.

answered
3

David,

Regarding Scheduled Events: I can imagine setting up a heartbeat web service message between the 2 Mendix servers and a configuration entity stating which one should 'execute' the scheduled events.

If one server goes down unexpectedly, the other one notices via the heartbeat and takes over the scheduled events.

For planned maintenance, the configuration entity should be marked accordingly, so the remaining server knows it should take over the scheduled events.

This requires the microflows handling the scheduled events to first check the heartbeat/configuration before 'executing' the actual work (a rough sketch of such a guard follows below).

And of course this requires one of the servers to stay up during maintenance, assuming that changes made through the Service Console are read into memory on startup and can be altered without risk.
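To make this a bit more concrete, here is a rough sketch of such a guard in plain Java against a hypothetical configuration table. In a real Mendix app this would be a configuration entity plus a heartbeat web service and a Java action or microflow, not raw JDBC, and all names below are made up:

    // Rough sketch of the guard described above. All table/column names are made up;
    // a real implementation would use a Mendix configuration entity instead of JDBC.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Timestamp;

    public class ScheduledEventGuard {

        private static final long HEARTBEAT_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes, arbitrary

        /** Returns true when this node should execute scheduled events right now:
         *  either it is the designated executor, or the designated executor's
         *  heartbeat has gone stale and this node takes over. */
        public static boolean shouldExecute(Connection conn, String myNodeName) throws Exception {
            try (PreparedStatement stmt = conn.prepareStatement(
                     "SELECT designated_node, last_heartbeat FROM event_config");
                 ResultSet rs = stmt.executeQuery()) {
                if (!rs.next()) {
                    return false; // no configuration row: play it safe and do nothing
                }
                String designated = rs.getString("designated_node");
                Timestamp lastHeartbeat = rs.getTimestamp("last_heartbeat");

                if (myNodeName.equals(designated)) {
                    return true;
                }
                long age = System.currentTimeMillis() - lastHeartbeat.getTime();
                return age > HEARTBEAT_TIMEOUT_MS; // the other node looks dead: take over
            }
        }
    }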

answered
0

Thanks for sharing such a query, really helpful for others also.

answered