MySQL闪回原理与实战

2018-11-09 17:31:27 Mysql 2709 0

阅读前须知

对于truncatedrop删除的数据,不适用
但是你可以通过得到binlog2sql转化来的sql,全量跑一次到事发前的数据,以此来恢复

闪回原理

MySQL binlogevent的形式,记录了MySQL server从启用binlog以来所有的变更信息,能够帮助重现这之间的所有变化。
MySQL引入 binlog 主要有两个目的:

  • 为了主从复制
  • 某些备份还原操作后需要重新应用binlog

有三种可选的binlog格式,各有优缺点:

  • statement:基于SQL语句的模式,binlog数据量小,但是某些语句和函数在复制过程可能导致数据不一致甚至出错;
  • row:基于行的模式,记录的是行的完整变化。很安全,但是binlog会比其他两种模式大很多;
  • mixed:混合模式,根据语句来选用是statement还是row模式;

利用 binlog 闪回,需要将 binlog 格式设置为row,如果你直接用

mysqlbinlog  binlog日志文件

生成的则是对人不太友好的数据,如下

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#181105  3:09:25 server id 279528724  end_log_pos 120 CRC32 0x8aaa7b41  Start: binlog v 4, server v 5.6.16-log created 181105  3:09:25
# at 120
#181105  3:09:25 server id 279528724  end_log_pos 231 CRC32 0x097dd8dd  Previous-GTIDs
# db84042c-3686-11e8-8116-ecf4bbe91278:1-594666826,
# df57d3b5-2cc6-11e8-8182-ecf4bbe621e0:1-124110
# at 231
#181105  3:09:26 server id 279528724  end_log_pos 279 CRC32 0xfcde49e1  GTID    last_committed=0        sequence_number=0       rbr_only=no
SET @@SESSION.GTID_NEXT= 'db84042c-3686-11e8-8116-ecf4bbe91278:594666827'/*!*/;
# at 279
#181105  3:09:26 server id 279528724  end_log_pos 367 CRC32 0xea5c6617  Query   thread_id=50333784      exec_time=0     error_code=0
SET TIMESTAMP=1541358566/*!*/;
SET @@session.pseudo_thread_id=50333784/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN

那么,我们如何把拿到的 binlog 变成 SQL 呢

基础配置

Mysql配置

首先,咱们得配置 my.cnf

打开 binlog 功能

[mysqld]

symbolic-links=0

log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

# binlog 配置
log-bin=mysql-bin # binlog 前缀名
server-id=1
max_binlog_size = 1G # binlog 最大容量
binlog_format = row # binlog 存储的格式
binlog_row_image = full

# 如果是本地解析 binlog 请把这行也加上,不然解析过程会报错
gtid_mode=ON_PERMISSIVE

一般来说,我们在东窗事发的时候,是能够知道事情发生的大致时间
然后找到 binlog 存放目录,把那部分的 binlog 日志文件拿下来

mysql> show variables like '%log_bin_basename%';
+------------------+--------------------------+
| Variable_name    | Value                    |
+------------------+--------------------------+
| log_bin_basename | /var/lib/mysql/mysql-bin |
+------------------+--------------------------+
1 row in set (0.08 sec)

先把binlog放到本地(反正别拿到生产环境去玩就是了) Mysql(得配置my.cnf刚刚我给出的最后一行,并且重启)的对应 binlog 日志存储位置 比如,刚刚我拿了 mysql-bin.001281、mysql-bin.001282、mysql-bin.001283
得修改这个目录下的 `mysql-bin.index` 文件
把这三个 binlog 文件引进去

恢复工具

项目地址

binlog2sql

环境依赖

Python 2.7, 3.4+
MySQL 5.6, 5.7

工具安装

git clone https://github.com/danfengcao/binlog2sql.git && cd binlog2sql
pip install -r requirements.txt

恢复流程

因为 binlog 会把每次操作前后的单行数据都存储下来,所以可以很方便的回滚

# 通过工具,可以拿到事发时,执行前的语句
UPDATE `panda_avatar`.`user_credits` SET `created_at`='2018-08-13 23:29:19', `deleted_at`=NULL, `user_id`=1324086, `audit_amount`=1500.00, `obj_base`=3000.00, `audit_limit`=3110.00, `obj_temp`=0.00, `audit_base`=3000.00, `obj_raise`=0.00, `updated_at`='2018-11-05 09:09:27', `obj_limit`=3000.00, `audit_temp`=0.00, `audit_raise`=110.00, `obj_amount`=0.00, `id`=281967 WHERE `created_at`='2018-08-13 23:29:19' AND `deleted_at` IS NULL AND `user_id`=1324086 AND `audit_amount`=1500.00 AND `obj_base`=3000.00 AND `audit_limit`=3110.00 AND `obj_temp`=0.00 AND `audit_base`=3000.00 AND `obj_raise`=0.00 AND `updated_at`='2018-11-05 09:09:24' AND `obj_limit`=3000.00 AND `audit_temp`=0.00 AND `audit_raise`=110.00 AND `obj_amount`=0.00 AND `id`=281967 LIMIT 1; #start 135709993 end 135710370 time 2018-11-05 

# 通过工具,可以拿到用于回滚的语句
UPDATE `panda_avatar`.`user_credits` SET `created_at`='2018-08-13 23:29:19', `deleted_at`=NULL, `user_id`=1324086, `audit_amount`=1500.00, `obj_base`=3000.00, `audit_limit`=3110.00, `obj_temp`=0.00, `audit_base`=3000.00, `obj_raise`=0.00, `updated_at`='2018-11-05 09:09:24', `obj_limit`=3000.00, `audit_temp`=0.00, `audit_raise`=110.00, `obj_amount`=0.00, `id`=281967 WHERE `created_at`='2018-08-13 23:29:19' AND `deleted_at` IS NULL AND `user_id`=1324086 AND `audit_amount`=1500.00 AND `obj_base`=3000.00 AND `audit_limit`=3110.00 AND `obj_temp`=0.00 AND `audit_base`=3000.00 AND `obj_raise`=0.00 AND `updated_at`='2018-11-05 09:09:27' AND `obj_limit`=3000.00 AND `audit_temp`=0.00 AND `audit_raise`=110.00 AND `obj_amount`=0.00 AND `id`=281967 LIMIT 1; #start 135709993 end 135710370 time 2018-11-05 
具体操作流程,由工具作者阐述
背景

小明在11:44时误删了test库user表大批的数据,需要紧急回滚。
test库user表原有数据

mysql> select * from user;
+----+--------+---------------------+
| id | name   | addtime             |
+----+--------+---------------------+
|  1 | 小赵   | 2013-11-11 00:04:33 |
|  2 | 小钱   | 2014-11-11 00:04:48 |
|  3 | 小孙   | 2016-11-11 20:25:00 |
|  4 | 小李   | 2013-11-11 00:00:00 |
.........
+----+--------+---------------------+
16384 rows in set (0.04 sec)

11:44时,user表大批数据被误删除。与此同时,正常业务数据是在继续写入的
mysql> delete from user where addtime>'2014-01-01';
Query OK, 16128 rows affected (0.18 sec)

mysql> select count(*) from user;
+----------+
| count(*) |
+----------+
|      261 |
+----------+
恢复数据步骤

登录mysql,查看目前的binlog文件

mysql> show master logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000053 | 168652863 |
| mysql-bin.000054 |    504549 |
+------------------+-----------+

最新的binlog文件是mysql-bin.000054。我们的目标是筛选出需要回滚的SQL

由于误操作人只知道大致的误操作时间,我们首先根据时间做一次过滤。
只需要解析test库user表。
(注:如果有多个sql误操作,则生成的binlog可能分布在多个文件,需解析多个文件)

shell> python binlog2sql/binlog2sql.py -h127.0.0.1 -P3306 -uadmin -p'admin' -dtest -tuser --start-file='mysql-bin.000054' --start-datetime='2016-12-26 11:44:00' --stop-datetime='2016-12-26 11:50:00' > /tmp/raw.sql

raw.sql输出:
DELETE FROM `test`.`user` WHERE `addtime`='2014-11-11 00:04:48' AND `id`=2 AND `name`='小钱' LIMIT 1; #start 257427 end 265754 time 2016-12-26 11:44:56
DELETE FROM `test`.`user` WHERE `addtime`='2015-11-11 20:25:00' AND `id`=3 AND `name`='小孙' LIMIT 1; #start 257427 end 265754 time 2016-12-26 11:44:56
...
DELETE FROM `test`.`user` WHERE `addtime`='2016-12-14 23:09:07' AND `id`=24530 AND `name`='tt' LIMIT 1; #start 257427 end 504272 time 2016-12-26 11:44:56
INSERT INTO `test`.`user`(`addtime`, `id`, `name`) VALUES ('2016-12-10 00:04:33', 32722, '小王'); #start 504299 end 504522 time 2016-12-26 11:49:42
...

根据位置信息,我们确定了误操作sql来自同一个事务

准确位置在257427-504272之间(binlog2sql对于同一个事务会输出同样的start position)。
再根据位置过滤,使用 -B 选项生成回滚sql,检查回滚sql是否正确。
(注:真实场景下,生成的回滚SQL经常会需要进一步筛选。结合grep、编辑器等)

shell> python binlog2sql/binlog2sql.py -h127.0.0.1 -P3306 -uadmin -p'admin' -dtest -tuser --start-file='mysql-bin.000054' --start-position=257427 --stop-position=504272 -B > /tmp/rollback.sql

rollback.sql 输出:
INSERT INTO `test`.`user`(`addtime`, `id`, `name`) VALUES ('2016-12-14 23:09:07', 24530, 'tt'); #start 257427 end 504272 time 2016-12-26 11:44:56
INSERT INTO `test`.`user`(`addtime`, `id`, `name`) VALUES ('2016-12-12 00:00:00', 24529, '小李'); #start 257427 end 504272 time 2016-12-26 11:44:56
...
INSERT INTO `test`.`user`(`addtime`, `id`, `name`) VALUES ('2014-11-11 00:04:48', 2, '小钱'); #start 257427 end 265754 time 2016-12-26 11:44:56

shell> wc -l /tmp/rollback.sql
16128 /tmp/rollback.sql

与业务方确认回滚sql没问题,执行回滚语句。登录mysql,确认回滚成功。

shell> mysql -h127.0.0.1 -P3306 -uadmin -p'admin' < /tmp/rollback.sql

mysql> select count(*) from user;
+----------+
| count(*) |
+----------+
|    16389 |
+----------+
注:若无特殊说明,文章均为云天河原创,请尊重作者劳动成果,转载前请一定要注明出处