Replicating in Google Cloud SQL using Tungsten

While investigating alternatives for migrating to Google Cloud SQL, I ran into its lack of support for external masters. However, it is possible to overcome this limitation by replicating into Google Cloud SQL using Tungsten Replicator.

Cloud SQL is Google’s database-as-a-service offering, similar to Amazon Web Services’ RDS. You can get a fully managed database in only a few clicks (or API calls). At the time of writing, the only supported engines are MySQL and PostgreSQL.

Cloud SQL alternatives

Google offers two different options for MySQL deployments.

1st generation instances:

  • Only MySQL versions 5.5 and 5.6 can be provisioned
  • Maximum memory is limited to 16 GB
  • Maximum of 250 GB of storage (up to 500 GB with a Silver or higher support package)
  • MyISAM and InnoDB storage engines
  • Asynchronous replication for read replicas

2nd generation instances:

  • Only MySQL versions 5.6 and 5.7 can be deployed
  • Maximum memory is limited to 205 GB
  • Maximum of 10 TB of storage
  • InnoDB storage engine only
  • Semi-sync replication only
  • GTID replication only
  • No support for external master

There are some limitations that are common to both flavors:

  • No SUPER privilege
  • No triggers
  • No Performance Schema
  • No replication between 1st and 2nd generation instances

From the above, it is quite obvious that most production deployments would want to use 2nd generation instances.

The problem is that, because external masters are not supported on 2nd generation instances, there is no migration path that doesn’t involve stopping application activity in order to take a dump of the data.

So how do we migrate our database to Google Cloud SQL while keeping downtime as low as possible?

The solution

The answer is to use Tungsten Replicator, so that replication is completely external to the database.

Note that since we only need the replicator, the FOSS version available on GitHub is enough for our purposes. There is no need to buy the commercial Tungsten version, which includes the clustering functionality and official support.

We will need to install two different Tungsten processes: one will attach to the source database to read transactions from the binary logs, while the second will apply those transactions to the Cloud SQL instance.

This is what it looks like:

Preparing the environment

The first thing you will need is a place to install the replicator. I suggest provisioning a dedicated instance (instance-1 in the diagram) in the same zone as your Cloud SQL instance.

At a minimum you will want an n1-standard-1 machine type, as Tungsten does consume its fair share of memory.
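
If you are creating that host with the gcloud CLI, a minimal sketch could look like this (the zone and image values are assumptions; pick the zone that matches your Cloud SQL instance):

# create the dedicated replication host in the same zone as the Cloud SQL instance
gcloud compute instances create instance-1 \
    --machine-type=n1-standard-1 \
    --zone=us-central1-a \
    --image-family=debian-9 \
    --image-project=debian-cloud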

Tungsten needs some packages (do check the complete list of requirements in the Tungsten documentation), so let’s go ahead and install them:

apt-get install ruby default-jre
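
Once the packages are in place, it doesn’t hurt to verify that the Java and Ruby binaries Tungsten relies on are available on the PATH:

java -version
ruby --version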

Now we need a database user for Tungsten on the source and target databases:

GRANT ALL ON *.* TO tungsten@'%' IDENTIFIED BY 'secret';
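
On the Cloud SQL side you can also create this user through the gcloud CLI instead of connecting with a client; a minimal sketch, assuming the Cloud SQL instance is named cloudsql-target:

# create the tungsten user on the Cloud SQL instance (instance name is an example)
gcloud sql users create tungsten \
    --instance=cloudsql-target \
    --host=% \
    --password=secret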

It is usually a good idea to have a dedicated OS user as well:

useradd -m -d /opt/continuent tungsten

Get the replicator package from GitHub and extract the contents as the tungsten OS user:

wget http://ift.tt/2krtebg
tar zxf tungsten-replicator-5.2.1.tar.gz

Tungsten installation

Prepare the Tungsten config files on instance-1 as follows. instance-1 is the server where Tungsten will run from, and instance-2 is the server that has the source database.

Process that will read from MySQL
vi /etc/tungsten/tungsten-mysqlreader.ini

[defaults]
replication-user=tungsten
replication-password=secret
skip-validation-check=MySQLUnsupportedDataTypesCheck
skip-validation-check=MySQLPermissionsCheck
skip-validation-check=MySQLMyISAMCheck

[mysqlreader]
install-directory=/opt/continuent/mysqlreader
master=instance-1
members=instance-1
datasource-host=instance-2
datasource-user=tungsten
datasource-password=secret

Process that will write to Cloud SQL
vi /etc/tungsten/tungsten-writetocloudsql.ini

[defaults]
replication-user=tungsten
replication-password=secret

[writetocloudsql]
datasource-type=mysql
install-directory=/opt/continuent/writetocloudsql
master=instance-1
members=instance-1
topology=master-slave
datasource-host=cloudsql_ip_address
datasource-user=tungsten
datasource-password=secret
privileged-slave=false
skip-validation-check=InstallerMasterSlaveCheck
skip-validation-check=MySQLPermissionsCheck
skip-validation-check=MySQLBinaryLogsEnabledCheck
rmi-port=10002
master-thl-port=2112
master-thl-host=instance-1
thl-port=2113

Note the use of privileged-slave=false and the various validation checks that need to be skipped for the applier process. That allows us to get past the SUPER requirement and the other Cloud SQL limitations.

Since we are running both extractor and applier processes on the same instance, we need to manually specify the ports on the second process so that there are no conflicts.

Now we are ready to install the replicators, by running the following as tungsten OS user:

cd tungsten-replicator-5.2.1/tools
./tpm install
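
After installation you can quickly confirm that both replicator services were created, using the trepctl binary that lives under each install-directory:

# each installation ships its own trepctl; the paths follow the install-directory values above
/opt/continuent/mysqlreader/tungsten/tungsten-replicator/bin/trepctl services
/opt/continuent/writetocloudsql/tungsten/tungsten-replicator/bin/trepctl services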

Replicating into Google Cloud SQL using Tungsten

At this point you would start the reader process using the trepctl online command so it begins capturing events from the source instance, and use something like mysqldump to dump and load your dataset into the Google Cloud SQL instance. Make sure you have the binary log coordinates of that dump available.
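
A minimal sketch of that sequence (the schema name and dump flags are assumptions; adapt them to your dataset):

# bring the extractor online so it starts capturing events from the source binlogs
/opt/continuent/mysqlreader/tungsten/tungsten-replicator/bin/trepctl online

# take a consistent dump of the schemas to migrate, recording the binlog coordinates
# ("mydb" is a placeholder for your own schema list)
mysqldump -h instance-2 -u tungsten -p --single-transaction --master-data=2 --databases mydb > dump.sql

# the coordinates needed later by the applier are near the top of the dump
grep -m1 "CHANGE MASTER" dump.sql

# seed the Cloud SQL instance with the dump
mysql -h cloudsql_ip_address -u tungsten -p < dump.sql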

Once the Cloud SQL instance is properly seeded, we would start the applier process from the appropriate binlog position, e.g.:

/opt/continuent/writetocloudsql/tungsten/tungsten-replicator/bin/trepctl online -from-event 'mysql-bin.000011:0000000000002552;0'

Checking the status of each process:

tungsten@instance-1:~$ /opt/continuent/mysqlreader/tungsten/tungsten-replicator/bin/trepctl status
Processing status command...
NAME VALUE
---- -----
appliedLastEventId : mysql-bin.000007:0000000000000520;112
appliedLastSeqno : 1
appliedLatency : 0.355
autoRecoveryEnabled : false
autoRecoveryTotal : 0
channels : 1
clusterName : mysqlreader
currentEventId : mysql-bin.000007:0000000000000520
currentTimeMillis : 1511354659857
dataServerHost : instance-2
extensions :
host : instance-2
latestEpochNumber : 0
masterConnectUri : thl://localhost:/
masterListenUri : thl://instance-1:2112/
maximumStoredSeqNo : 1
minimumStoredSeqNo : 0
offlineRequests : NONE
pendingError : NONE
pendingErrorCode : NONE
pendingErrorEventId : NONE
pendingErrorSeqno : -1
pendingExceptionMessage: NONE
pipelineSource : jdbc:mysql:thin://instance-2:3306/tungsten_mysqlreader?noPrepStmtCache=true
relativeLatency : 3.857
resourcePrecedence : 99
rmiPort : 10000
role : master
seqnoType : java.lang.Long
serviceName : mysqlreader
serviceType : local
simpleServiceName : mysqlreader
siteName : default
sourceId : instance-2
state : ONLINE
timeInStateSeconds : 72.806
timezone : GMT
transitioningTo :
uptimeSeconds : 74.46
useSSLConnection : false
version : Tungsten Replicator 5.2.1
Finished status command...


tungsten@instance-1:/etc/tungsten$ /opt/continuent/writetocloudsql/tungsten/tungsten-replicator/bin/trepctl status
Processing status command...
NAME VALUE
---- -----
appliedLastEventId : NONE
appliedLastSeqno : -1
appliedLatency : -1.0
autoRecoveryEnabled : false
autoRecoveryTotal : 0
channels : -1
clusterName : writetocloudsql
currentEventId : NONE
currentTimeMillis : 1510939133227
dataServerHost : cloudsql
extensions :
host : cloudsql
latestEpochNumber : -1
masterConnectUri : thl://localhost:/
masterListenUri : thl://instance-1:2113/
maximumStoredSeqNo : -1
minimumStoredSeqNo : -1
offlineRequests : NONE
pendingError : Replicator configuration failed
pendingErrorCode : NONE
pendingErrorEventId : NONE
pendingErrorSeqno : -1
pendingExceptionMessage: Unable to translate property value: key=serverId value = 3555962359
pipelineSource : UNKNOWN
relativeLatency : -1.0
resourcePrecedence : 99
rmiPort : 10002
role : master
seqnoType : java.lang.Long
serviceName : writetocloudsql
serviceType : unknown
simpleServiceName : writetocloudsql
siteName : default
sourceId : 35.184.133.21
state : OFFLINE:ERROR
timeInStateSeconds : 431.65
timezone : GMT
transitioningTo :
uptimeSeconds : 433.596
useSSLConnection : false
version : Tungsten Replicator 5.2.1
Finished status command...

I’ve discovered there is a bug in Tungsten 5.2 where high values of the server-id parameter prevent the replicator from working. Cloud SQL sets very high server IDs by default, and they cannot be modified by the user.

I have already reported this to Continuent, so hopefully they will come up with a fix soon.

In the meantime, the only way to get past this is to open a ticket with Google support and have them modify the server-id on the Cloud SQL instance for you.
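
You can check the value currently set on your Cloud SQL instance (and confirm whether you are hitting this issue) with a simple query:

# check the server-id configured on the Cloud SQL instance
mysql -h cloudsql_ip_address -u tungsten -p -e "SELECT @@server_id;"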

Conclusion

Cloud SQL is a very interesting platform for those wanting a fully managed database solution.

Until 2nd gen instances have the ability to replicate from an external master, replicating into Google Cloud SQL using Tungsten after the initial load is one valid alternative. By doing so, you can keep data in sync until you are ready to do the cutover to the new platform.
