While investigating alternatives to migrate to Google Cloud SQL, I encountered a lack of support for external masters. However, it’s possible to overcome this limitation by replicating into Google Cloud SQL using Tungsten replicator.
Cloud SQL is Google’s database-as-a-service solution, similar to RDS for Amazon Web Services. You can get a fully managed database in only a few clicks (or API calls). At the time of writing this, the only supported databases are MySQL and Postgres.
Cloud SQL alternatives
Google offers two different options for MySQL deployments.
1st generation instances:
- Only MySQL versions 5.5 and 5.6 can provisioned
- Max memory is limited to 16 Gb
- Max of 250 Gb storage (up to 500 Gb with Silver or higher support package)
- MyISAM and InnoDB
- Asynchronous replication for read replicas
2nd generation instances:
- Only MySQL versions 5.6 and 5.7 can be deployed
- Maximum memory is limited to 205 Gb
- Maximum of 10 Tb storage
- InnoDB storage engine only
- Semi-sync replication only
- GTID replication only
- No support for external master
There are some limitations that are common to both flavors:
- no SUPER privilege
- no triggers
- no performance schema
- no replication between 1st and 2nd generation instances is possible
From the above, it is quite obvious most production deployments would want to use 2nd gen instances.
The problem is there is no migration path that doesn’t involve stopping application activity to be able to take a dump of the data, due to the fact that external masters are not supported on 2nd gen instances.
So how do we migrate our database to Google Cloud SQL while keeping downtime as low as possible?
The solution
The answer is to use Tungsten Replicator, so that replication is completely external to the database.
Note that since we only need the replicator, the FOSS version available on Github is enough for our purposes. There is no need to buy Tungsten commercial version, which includes the cluster functionality and official support.
We will need to install two different Tungsten processes: One will attach to the source database to read transactions from the binary logs, while the second will apply those transactions to the Cloud SQL instance.
This is what it looks like:
Preparing the environment
The first thing you will need is a place to install the replicator. I suggest to provision a dedicated instance (instance-1 in the diagram) in the same zone as your Cloud SQL instance.
At minimum you would want an n1-standard-1 size, as Tungsten does consume its fair share of memory.
Tungsten needs some packages (do check out the complete requirements list here) so let’s go ahead and install them:
apt-get install ruby default-jre
Now we need a database user for Tungsten on the source and target databases:
GRANT ALL ON *.* TO tungsten@'%' IDENTIFIED BY 'secret';
It is usually a good idea to have a dedicated OS user as well:
useradd -m -d /opt/continuent tungsten
Get the replicator package from GitHub and extract the contents as tungsten OS user:
wget http://ift.tt/2krtebg
tar zxf tungsten-replicator-5.2.1.tar.gz
Tungsten installation
Prepare the Tungsten config files on instance-1 as follows. instance-1 is the server where Tungsten will run from, and instance-2 is the server that has the source database.
Process that will read from MySQLvi /etc/tungsten/tungsten-mysqlreader.ini
[defaults] replication-user=tungsten replication-password=secret skip-validation-check=MySQLUnsupportedDataTypesCheck skip-validation-check=MySQLPermissionsCheck skip-validation-check=MySQLMyISAMCheck [mysqlreader] install-directory=/opt/continuent/mysqlreader master=instance-1 members=instance-1 datasource-host=instance-2 datasource-user=tungsten datasource-password=secret
Process that will write to Cloud SQLvi /etc/tungsten/tungsten-writetocloudsql.ini
[defaults] replication-user=tungsten replication-password=secret [writetocloudsql] datasource-type=mysql install-directory=/opt/continuent/writetocloudsql master=instance-1 members=instance-1 topology=master-slave datasource-host=cloudsql_ip_address datasource-user=tungsten datasource-password=secret privileged-slave=false skip-validation-check=InstallerMasterSlaveCheck skip-validation-check=MySQLPermissionsCheck skip-validation-check=MySQLBinaryLogsEnabledCheck rmi-port=10002 master-thl-port=2112 master-thl-host=instance-1 thl-port=2113
Note the use of privileged-slave=false and the various validation checks that need to be skipped for the applier process. That allows us to get past the SUPER requirement and the other Cloud SQL limitations.
Since we are running both extractor and applier processes on the same instance, we need to manually specify the ports on the second process so that there are no conflicts.
Now we are ready to install the replicators, by running the following as tungsten OS user:
cd tungsten-replicator-5.2.1/tools
tpm install
Replicating into Google Cloud SQL using Tungsten
At this point you would start the reader process using trepctl online command to start capturing events from the source instance, and use something like mysqldump to dump & load your dataset into a Google Cloud SQL instance. Make sure you have the binlog coordinates of that dump available.
Once the Cloud SQL instance is properly seeded, we would start the applier process from the appropriate position e.g.
/opt/continuent/writetocloudsql/tungsten/tungsten-replicator/bin/trepctl online -from-event 'mysql-bin.000011:0000000000002552;0'
Checking the status of each process:
tungsten@instance-1:~$ /opt/continuent/mysqlreader/tungsten/tungsten-replicator/bin/trepctl status
Processing status command...
NAME VALUE
---- -----
appliedLastEventId : mysql-bin.000007:0000000000000520;112
appliedLastSeqno : 1
appliedLatency : 0.355
autoRecoveryEnabled : false
autoRecoveryTotal : 0
channels : 1
clusterName : mysqlreader
currentEventId : mysql-bin.000007:0000000000000520
currentTimeMillis : 1511354659857
dataServerHost : instance-2
extensions :
host : instance-2
latestEpochNumber : 0
masterConnectUri : thl://localhost:/
masterListenUri : thl://instance-1:2112/
maximumStoredSeqNo : 1
minimumStoredSeqNo : 0
offlineRequests : NONE
pendingError : NONE
pendingErrorCode : NONE
ndingErrorEventId : NONE
pendingErrorSeqno : -1
pendingExceptionMessage: NONE
pipelineSource : jdbc:mysql:thin://instance-2:3306/tungsten_mysqlreader?noPrepStmtCache=true
relativeLatency : 3.857
resourcePrecedence : 99
rmiPort : 10000
role : master
seqnoType : java.lang.Long
serviceName : mysqlreader
serviceType : local
simpleServiceName : mysqlreader
siteName : default
sourceId : instance-2
state : ONLINE
timeInStateSeconds : 72.806
timezone : GMT
transitioningTo :
uptimeSeconds : 74.46
useSSLConnection : false
version : Tungsten Replicator 5.2.1
Finished status command...
tungsten@instance-1:/etc/tungsten$ /opt/continuent/writetocloudsql/tungsten/tungsten-replicator/bin/trepctl status
Processing status command...
NAME VALUE
---- -----
appliedLastEventId : NONE
appliedLastSeqno : -1
appliedLatency : -1.0
autoRecoveryEnabled : false
autoRecoveryTotal : 0
channels : -1
clusterName : writetocloudsql
currentEventId : NONE
currentTimeMillis : 1510939133227
dataServerHost : cloudsql
extensions :
host : cloudsql
latestEpochNumber : -1
masterConnectUri : thl://localhost:/
masterListenUri : thl://instance-1:2113/
maximumStoredSeqNo : -1
minimumStoredSeqNo : -1
offlineRequests : NONE
pendingError : Replicator configuration failed
pendingErrorCode : NONE
pendingErrorEventId : NONE
pendingErrorSeqno : -1
pendingExceptionMessage: Unable to translate property value: key=serverId value = 3555962359
pipelineSource : UNKNOWN
relativeLatency : -1.0
resourcePrecedence : 99
rmiPort : 10002
role : master
seqnoType : java.lang.Long
serviceName : writetocloudsql
serviceType : unknown
simpleServiceName : writetocloudsql
siteName : default
sourceId : 35.184.133.21
state : OFFLINE:ERROR
timeInStateSeconds : 431.65
timezone : GMT
transitioningTo :
uptimeSeconds : 433.596
useSSLConnection : false
version : Tungsten Replicator 5.2.1
Finished status command...
I’ve discovered there is a bug with Tungsten 5.2, where high values of server-id parameter prevent replicator from working. CloudSQL sets very high server-ids by default, and this cannot be modified by a user.
I have already reported this to Continuent so hopefully they will come up with a way to fix this soon.
In the meantime the only way to get past this is to open a ticket with Google support, and have them modify the server-id on Cloud SQL instance for you.
Conclusion
Cloud SQL is a very interesting platform for those wanting a fully managed database solution.
Until 2nd gen instances have the ability to replicate from an external master, replicating into Google Cloud SQL using Tungsten after the initial load is one valid alternative. By doing so, you can keep data in sync until you are ready to do the cutover to the new platform.
via Planet MySQL
Replicating in Google Cloud SQL using Tungsten