Build a load-balanced multi-master MySQL cluster

This howto will show you how to build a basic multi-master MySQL cluster (actually, a Percona XtraDB cluster) with a load balancer in front of it.

Software:

  • OS: Ubuntu 12.04 LTS (Precise Pangolin).
  • Database: Percona XtraDB Cluster. Percona XtraDB is a higher-performance, backwards-compatible drop-in replacement for the InnoDB storage engine for MySQL. Percona XtraDB Cluster is Percona Server with the Galera multi-master cluster libraries built in.
  • Load balancer: HAProxy. HAProxy is a very high-performance TCP/HTTP load balancer.

We will be building three database nodes, connected to one load balancer. Something like this:
Cluster Overview

The steps:

  1. Set up the machines
  2. Install Percona XtraDB Cluster on the database machines
  3. Configure the bootstrap node
  4. Configure the rest of the cluster
  5. Test the cluster
  6. Daemonise the clustercheck script on the database machines with xinetd
  7. Set the HAProxy load balancer up
  8. Win

Create your servers

I used DigitalOcean for this tutorial, but use whichever provider you like.


Install Percona XtraDB Cluster on the database servers

Log in to each of your database servers and set up Percona XtraDB Cluster.

First, add the key to apt:

$ apt-key adv --keyserver keys.gnupg.net \
 --recv-keys 1C4CBDCDCD2EFD2A

 

(Create and) Edit the /etc/apt/sources.list.d/percona.list file, adding the following:

 deb http://repo.percona.com/apt precise main
 deb-src http://repo.percona.com/apt precise main

 

Update apt:

$ apt-get update

 

Install Percona XtraDB Cluster Server:

$ apt-get install percona-xtradb-cluster-server-5.5

as well as the client:

$ apt-get install percona-xtradb-cluster-client-5.5

It will ask for a MySQL root password during each install. I kept it the same across all the nodes.

Configure the bootstrap node

Choose a machine that will create the initial cluster. Forming an initial cluster is called bootstrapping the cluster.

Stop MySQL on that machine:

$ service mysql stop

(Create &) Edit /etc/mysql/my.cnf on the bootstrap node.

Add the following:

[mysqld]

datadir=/var/lib/mysql
user=mysql

# Path to Galera library
wsrep_provider=/usr/lib/libgalera_smm.so

# Empty gcomm address is being used when cluster is getting bootstrapped
wsrep_cluster_address=gcomm://

# Cluster connection URL contains the IPs of node#1, node#2 and node#3
#wsrep_cluster_address=gcomm://192.168.10.11,192.168.10.12,192.168.10.13

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# This is a recommended tuning variable for performance
innodb_locks_unsafe_for_binlog=1

# This changes how InnoDB autoincrement locks are
# managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node #1 address
wsrep_node_address=192.168.10.11

# SST method
wsrep_sst_method=xtrabackup

# Cluster name
wsrep_cluster_name=my_ubuntu_cluster

# Authentication for SST method
wsrep_sst_auth="sstuser:s3cretPass"

wsrep stands for WriteSet REPlication, BTW. wsrep is a project that aims to develop a generic database replication plugin interface, and Galera is a specific wsrep provider.

Notice the wsrep_cluster_address directive, where the gcomm:// address is empty. This tells the node to bootstrap itself (create a new cluster) on startup. We will comment that line out and uncomment the full cluster address line below it later, when the rest of the cluster is ready. You’ll notice that the commented-out line specifies all the node addresses in the cluster – strictly speaking it is only necessary to specify one other node, but listing all of them is noted as a best practice in the Percona documentation.

Some notes on the settings
wsrep_cluster_name is the unique name of your cluster. It should be identical on all nodes that form part of the same cluster.

wsrep_sst_method specifies the way a State Snapshot Transfer (SST) is taken. Cluster nodes are kept in sync through SSTs and ISTs (Incremental State Transfers). An SST is a full data copy from a donor node to a joining node, and an IST is a way for a joining node to catch up through a transfer of partial state snapshots (writesets) from the donor node’s writeset cache.

The following SST options exist:

  • xtrabackup – uses Percona XtraBackup to take an SST. It’s useful because it doesn’t lock the donor database during an SST. We need to set up an SST user for xtrabackup, hence the values in wsrep_sst_auth.
  • rsync – uses rsync to transfer the relevant info. Donor node is locked as READ-ONLY during transfers.
  • mysqldump – uses mysqldump to take an SST. Also READ-ONLY locks the database during transfers. Needs a wsrep_sst_auth user, and that user needs MySQL root privileges.

You can also use a custom script to take an SST, but for our purposes we will be using xtrabackup.
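Once MySQL is up and running again, you can always confirm which SST method a node is actually configured with:

$ mysql -uroot -p -e "SHOW VARIABLES LIKE 'wsrep_sst_method';"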

Save the my.cnf file.

You can now start the MySQL service on the bootstrap node:

$ service mysql start

If all went well it should start up without a hitch.
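If it doesn’t, the MySQL error log is the first place to look – depending on your configuration the errors end up in /var/log/mysql/error.log or in the syslog:

$ tail -n 50 /var/log/mysql/error.log
$ grep -i mysql /var/log/syslog | tail -n 50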

We now need to add the SST user as identified in the wsrep_sst_auth directive.
Log into mysql on the bootstrap node:

$ mysql -uroot -p

 

Let’s check the node status first:

mysql> show status like 'wsrep%';
 +----------------------------+------------------------------------------+
 | Variable_name              | Value                                    |
 +----------------------------+------------------------------------------+
 | wsrep_local_state_uuid     | 844b3ad2-29ca-11e3-8586-e72851d735f1     |
  ...
 | wsrep_local_state          | 4                                        |
 | wsrep_local_state_comment  | Synced                                   |
  ...
 | wsrep_cluster_size         | 1                                        |
 | wsrep_cluster_status       | Primary                                  |
 | wsrep_connected            | ON                                       |
  ...
 | wsrep_ready                | ON                                       |
 +----------------------------+------------------------------------------+
 40 rows in set (0.01 sec)

Looking good – this output shows a successfully bootstrapped node.
Add the wsrep_sst_auth user:

mysql> CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 's3cretPass';

mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT
       ON *.* TO 'sstuser'@'localhost';

mysql> FLUSH PRIVILEGES;

You can use the MySQL root user for SSTs, but it’s not recommended.
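You can double-check the grants you just made before moving on:

mysql> SHOW GRANTS FOR 'sstuser'@'localhost';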

Time to configure the other nodes and get them to join the cluster.

Configure the rest of the cluster

Log into the other database cluster nodes, stop their MySQL services, and edit their /etc/mysql/my.cnf files:

[mysqld]

datadir=/var/lib/mysql
user=mysql

# Path to Galera library
wsrep_provider=/usr/lib/libgalera_smm.so

# Cluster connection URL contains IPs of node#1, node#2 and node#3
wsrep_cluster_address=gcomm://192.168.10.11,192.168.10.12,192.168.10.13

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# This is a recommended tuning variable for performance
innodb_locks_unsafe_for_binlog=1

# This changes how InnoDB autoincrement locks are
# managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node #2 address
wsrep_node_address=192.168.10.12

# Cluster name
wsrep_cluster_name=my_ubuntu_cluster

# SST method
wsrep_sst_method=xtrabackup

#Authentication for SST method
wsrep_sst_auth="sstuser:s3cretPass"

This cnf file differs from the bootstrap node’s cnf file in that the wsrep_cluster_address directive now lists all the cluster addresses. wsrep_node_address has to be set to the node’s own IP address.

Repeat for the last node, changing the node address comment and wsrep_node_address to reflect that node’s own address.

Start MySQL:

$ service mysql start

 

You should see something like:

 * Starting MySQL (Percona XtraDB Cluster) database server mysqld
 * SST in progress, setting sleep higher mysqld
 * Checking for corrupt, not cleanly closed, and upgrade needing tables.

That SST in progress line tells you that it connected to the server and is receiving an SST.
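While the other nodes are joining, you can keep an eye on the cluster from the bootstrap node:

$ mysql -uroot -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

The value should climb as each node finishes its SST.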

Once all the nodes are connected you can stop the bootstrap node’s MySQL service, comment out the empty wsrep_cluster_address=gcomm:// line in its my.cnf, and uncomment the line listing the full cluster addresses. Restart the MySQL service on it after saving the cnf file. Not fixing the wsrep_cluster_address directive on the bootstrap node will make it re-bootstrap itself (form a new, separate cluster) on restart.
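A minimal sketch of that switch, assuming the wsrep_cluster_address lines in /etc/mysql/my.cnf match the ones shown above exactly:

$ service mysql stop
$ sed -i 's|^wsrep_cluster_address=gcomm://$|#&|' /etc/mysql/my.cnf
$ sed -i 's|^#\(wsrep_cluster_address=gcomm://192.168.10.11,192.168.10.12,192.168.10.13\)$|\1|' \
   /etc/mysql/my.cnf
$ service mysql start

The first sed comments out the empty bootstrap address; the second uncomments the full cluster address list.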

Test the cluster

All three nodes should now be running happily connected to each other. Let’s test some things.

Look at the cluster size:

mysql> show status like 'wsrep%';
 +----------------------------+------------------------------------------+
 | Variable_name              | Value                                    |
 +----------------------------+------------------------------------------+
 | wsrep_local_state_uuid     | 844b3ad2-29ca-11e3-8586-e72851d735f1     |
  ...
 | wsrep_local_state          | 4                                        |
 | wsrep_local_state_comment  | Synced                                   |
  ...
 | wsrep_cluster_size         | 3                                        |
 | wsrep_cluster_status       | Primary                                  |
 | wsrep_connected            | ON                                       |
  ...
 | wsrep_ready                | ON                                       |
 +----------------------------+------------------------------------------+
 40 rows in set (0.01 sec)

Nice.

Try creating a database on one node, and running SHOW DATABASES; on another node. Should feel pretty cool to see the live replication 😉
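For example (the database name here is just an illustration):

$ mysql -uroot -p -e "CREATE DATABASE replication_test;"   # on node 1
$ mysql -uroot -p -e "SHOW DATABASES;"                     # on node 2

The second command should list replication_test almost immediately.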

Percona installs a script called clustercheck in /usr/bin/clustercheck. It’s a simple script that checks the node’s wsrep_local_state status variable and outputs an HTTP 200 response if the node is in sync, and an HTTP 503 if it isn’t (or if it couldn’t connect).

It needs a MySQL user called clustercheckuser with password clustercheckpassword!. You can change this user, BTW – look for the following lines in /usr/bin/clustercheck:

MYSQL_USERNAME="${1-clustercheckuser}"
MYSQL_PASSWORD="${2-clustercheckpassword!}"

Just change the default values there if you want to change the clustercheck user.

Add the clustercheck user to MySQL:

mysql> grant process on *.* to 'clustercheckuser'@'localhost'
       identified by 'clustercheckpassword!';

mysql> flush privileges;

You don’t have to repeat user addition on the other nodes as it will propagate through the cluster by itself. Just remember to edit the clustercheck scripts on the other nodes if you are not using the default username and password.
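You can verify the propagation by querying the user table on one of the other nodes:

$ mysql -uroot -p -e "SELECT User, Host FROM mysql.user WHERE User = 'clustercheckuser';"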

Now run clustercheck:

$ clustercheck
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40

Percona XtraDB Cluster Node is synced.

All our tests check out.

Daemonise the clustercheck script

We are going to set up the load balancer soon. But we need to find a way to make the load balancer aware that a node has gone out of sync. That’s why the clustercheck script exists.

To expose the clustercheck script we will be using xinetd. This is a daemon that listens on a port you specify, runs a specific command when something connects to that port, and returns the command’s output to the client.

Percona, nice people that they are, install an xinetd config file for clustercheck as part of the server install. It should be located at /etc/xinetd.d/mysqlchk, and the contents probably look something like:

service mysqlchk
{
# this is a config for xinetd, place it in /etc/xinetd.d/
        disable = no
        flags           = REUSE
        socket_type     = stream
        port            = 9200
        wait            = no
        user            = nobody
        server          = /usr/bin/clustercheck
        log_on_failure  += USERID
        only_from       = 0.0.0.0/0
        # recommended to put the IPs that need
        # to connect exclusively (security purposes)
        per_source      = UNLIMITED
}

As you can see, it specifies that xinetd should listen on port 9200, and return the output of /usr/bin/clustercheck if you connect to port 9200.

Edit /etc/services so that xinetd knows about the mysqlchk service; I added the following line right at the end:

# Local services
mysqlchk        9200/tcp          # mysqlchk

Install xinetd if it hasn’t been installed already:

$ apt-get install xinetd

If it has been installed already, just restart xinetd:

$ service xinetd restart

You should see it listening on port 9200 now:

$ netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address  ...   State     PID/Program name
tcp        0      0 0.0.0.0:3306   ...   LISTEN    2877/mysqld
tcp        0      0 0.0.0.0:9200   ...   LISTEN    3545/xinetd
tcp        0      0 0.0.0.0:22     ...   LISTEN    786/sshd
tcp        0      0 0.0.0.0:4567   ...   LISTEN    2877/mysqld
tcp6       0      0 :::22          ...   LISTEN    786/sshd

If you telnet to the address you should see the clustercheck output:

$ telnet 127.0.0.1 9200
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40

Percona XtraDB Cluster Node is synced.
Connection closed by foreign host.

Lather, rinse, repeat for the other database nodes.

Set the load balancer up

Log into your server that you want to use for load balancing, 192.168.10.10 in this example.

Install HAProxy:

$ apt-get install haproxy

Now, edit /etc/haproxy/haproxy.cfg:

global
  log 127.0.0.1   local0
  log 127.0.0.1   local1 notice
  maxconn 4096
  uid 99
  gid 99
  #daemon
  debug
  #quiet

defaults
  log     global
  mode    http
  option  tcplog
  option  dontlognull
  retries 3
  redispatch
  maxconn 2000
  contimeout      5000
  clitimeout      50000
  srvtimeout      50000

listen mysql-cluster 0.0.0.0:3306
  mode tcp
  balance roundrobin
  option  tcpka
  option  httpchk

  server db01 192.168.10.11:3306 check port 9200 inter 12000 rise 3 fall 3
  server db02 192.168.10.12:3306 check port 9200 inter 12000 rise 3 fall 3
  server db03 192.168.10.13:3306 check port 9200 inter 12000 rise 3 fall 3

The interesting detail to look at is the three server lines, where HAProxy’s httpchk functionality is used to check for an HTTP 200 status on port 9200 of the cluster nodes – thus running the clustercheck script via xinetd, and taking a node out of circulation while it is not synced. It’s a better way to manage nodes than just checking whether port 3306 is open.
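You can confirm from the load balancer that each node’s check port answers with a 200 before relying on the health checks (a quick sketch; it assumes curl is installed and that xinetd accepts connections from the load balancer’s address):

$ for ip in 192.168.10.11 192.168.10.12 192.168.10.13; do
>   curl -s -o /dev/null -w "$ip: %{http_code}\n" http://$ip:9200/
> done

Each synced node should report 200.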

The HAProxy service won’t start by default – as it has no sane default configuration – but now that we have provided a sane configuration we can set it to start. Edit /etc/default/haproxy, setting ENABLED to 1:

# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1
# Add extra flags here.
#EXTRAOPTS="-de -m 16"

 

Start HAProxy with:

$ service haproxy start

You will probably get a warning along the lines of <debug> mode incompatible with <quiet> and <daemon>. Keeping <debug> only. You can get rid of it by editing /etc/haproxy/haproxy.cfg, removing the debug option, and enabling the quiet and daemon options in the global section. I found it useful to mess around in debug mode at first, though.

You can check that it’s running by either looking for the process:

$ ps aux | grep -i haproxy
... /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid

or looking for the process listening on port 3306:

$ netstat -ntlp | grep 3306
tcp 0  0 0.0.0.0:3306 ... LISTEN  2783/haproxy

 

Unexpected results – Host Blocked Errors
I had a problem the first time I set this up. After a while of being connected to my cluster I got Error Code: 1129 Host ‘192.168.10.10’ is blocked because of many connection errors; unblock with ‘mysqladmin flush-hosts’. I’m still not sure why it happened, or that my actions cured it; I do know it hasn’t happened again after adding the option tcpka directive (enabling TCP keepalive) to the haproxy.cfg file. Not understanding it feels a bit like magic. Hopefully someone more clued up can weigh in with some ideas. The MySQL manual notes that the error appears after a certain number of interrupted connection requests, hence my trying to keep the connections alive. Since the setting does little for security, it may also make sense to set the max_connect_errors value in your MySQL configuration to some insanely high value.
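If you want to go the max_connect_errors route, something like this should do it (the value is arbitrary; add it under [mysqld] in my.cnf as well if you want it to survive restarts):

$ mysql -uroot -p -e "SET GLOBAL max_connect_errors = 999999999;"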

Now that HAProxy is running (and you are hopefully not dealing with unexpected results) you should be able to connect to MySQL through your load balancer, and it should be round-robining the requests to your cluster nodes.
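A quick way to see the round-robining in action – assuming you have created a MySQL user that is allowed to connect from the load balancer’s address (someuser here is just a placeholder) – is to ask each connection which node it landed on:

$ mysql -h 192.168.10.10 -u someuser -p -e "SHOW VARIABLES LIKE 'wsrep_node_address';"

Run it a few times and the reported address should cycle through your three nodes.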

Congrats! Have fun.


PS: Possible Tweaks

Security can be improved a bit – I would suggest at least locking down xinetd to only allow connections from your load balancer (look at the only_from directive in /etc/xinetd.d/mysqlchk – it also takes += as an operator, so you can list multiple IPs below each other).

You can also close ports on the cluster nodes to only allow MySQL and SSH access. The MySQL cluster needs 3306, 4444, 4567, and 4568 open, and SSH is typically running on port 22.

Using Uncomplicated Firewall:

$ ufw allow 22
$ ufw allow 3306
$ ufw allow 4444
...etc

$ ufw enable

 

You can fiddle with iptables or similar if ufw isn’t flexible enough, something like:

$ iptables -A INPUT -i eth0 -p tcp -m tcp \
  --source 192.168.10.11/24 --dport 3306 -j ACCEPT

should do to allow specific sources through to specific ports.

 

HAProxy also has quite a pretty, colourful, and useful stats module that you can enable by adding something like this:

listen stats :8080
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth Username:Password

to the /etc/haproxy/haproxy.cfg file. Restart HAProxy, and browse port 8080 on your load balancer’s address. Remember to change Username and Password to values that you like for HTTP authentication.

HAProxy Stats – very colourful
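Once HAProxy has been restarted you can also poke the stats page from the command line (assuming curl is available and you kept the example credentials):

$ curl -u Username:Password http://192.168.10.10:8080/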

 

Round-robining between the cluster nodes is an OK, but not great, strategy for the load balancer. If two nodes receive conflicting writes, rollbacks can occur. It would be better to only write to a specific node at a time, failing over to another node only if the active node goes offline.

You can achieve this by changing the mysql-cluster listen block in the haproxy.cfg file:

listen mysql-cluster 0.0.0.0:3306
  mode tcp
  balance leastconn
  option  tcpka
  option  httpchk

  server db01 192.168.10.11:3306 check port 9200 inter 12000 rise 3 fall 3
  server db02 192.168.10.12:3306 check port 9200 inter 12000 rise 3 fall 3 backup
  server db03 192.168.10.13:3306 check port 9200 inter 12000 rise 3 fall 3 backup

 

PPS: Further reading

Here are some of the articles I used to write this howto: