环境配置(官方推荐):
JDK(1.8以上,推荐1.8)
Python(推荐Python2.6.X)
下载datax工具:http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
解压后就能使用。
目录结构如下:
[root@rancher1 datax]# pwd /datax [root@rancher1 datax]# ls -lh total 4.0K drwxr-xr-x 2 62265 users 59 Jul 8 17:42 bin drwxr-xr-x 2 62265 users 68 Oct 12 2019 conf drwxr-xr-x 2 62265 users 85 Jul 8 18:39 job drwxr-xr-x 2 62265 users 4.0K Oct 12 2019 lib drwxr-xr-x 3 root root 24 Jul 8 17:22 log drwxr-xr-x 3 root root 24 Jul 8 17:22 log_perf drwxr-xr-x 4 62265 users 34 Oct 12 2019 plugin drwxr-xr-x 2 62265 users 23 Oct 12 2019 script drwxr-xr-x 2 62265 users 24 Oct 12 2019 tmp
ORACLE服务器信息:
监听信息:
[oracle@exam ~]$ lsnrctl status LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 08-JUL-2020 19:45:54 Copyright (c) 1991, 2013, Oracle. All rights reserved. Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=exam)(PORT=1521))) STATUS of the LISTENER ------------------------ Alias LISTENER Version TNSLSNR for Linux: Version 11.2.0.4.0 - Production Start Date 08-JUL-2020 16:51:50 Uptime 0 days 2 hr. 54 min. 4 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /u01/app/oracle/product/11.2.0/db_1/network/admin/listener.ora Listener Log File /u01/app/oracle/diag/tnslsnr/exam/listener/alert/log.xml Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=exam)(PORT=1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521))) Services Summary... Service "exam" has 1 instance(s). Instance "exam", status READY, has 1 handler(s) for this service... Service "examXDB" has 1 instance(s). Instance "exam", status READY, has 1 handler(s) for this service... The command completed successfully
表结构:
SQL> desc issue.db_hosts; Name Null? Type ----------------------------------------- -------- ---------------------------- DB_ID NUMBER(16) HOST_NAME VARCHAR2(256 CHAR) IP VARCHAR2(30 CHAR) RAC_ID NUMBER(16)
mysql服务器信息:
mysql 2569 1276 0 16:47 ? 00:00:11 /u01/mysql-5.7.27/bin/mysqld --basedir=/u01/mysql-5.7.27 --datadir=/u01/mysql-5.7.27/data --plugin-dir=/u01/mysql-5.7.27/lib/plugin --user=mysql --log-error=/u01/mysql-5.7.27/log/mysql_error.log --open-files-limit=65535 --pid-file=/u01/mysql-5.7.27/mysql.pid --socket=/u01/mysql-5.7.27/mysql.sock --port=3306
mysql> desc issue.db_hosts; +-----------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------+-------------+------+-----+---------+-------+ | db_id | bigint(20) | YES | | NULL | | | host_name | varchar(50) | YES | | NULL | | | ip | varchar(20) | YES | | NULL | | | rac_id | int(11) | YES | | NULL | | +-----------+-------------+------+-----+---------+-------+ 4 rows in set (0.00 sec)
配置数据抽取的json文件:
[root@rancher1 job]# cat db_host.json
{
"job": {
"content": [
{
"reader": {
"name": "oraclereader",
"parameter": {
"username": "issue",
"password": "issue",
"column": [
"*"
],
"connection": [
{
"table": [
"db_hosts"
],
"jdbcUrl": [
"jdbc:oracle:thin:@192.168.0.210:1521:exam"
]
}
]
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"writeMode": "insert",
"username": "root",
"password": "123456",
"column": [
"db_id","host_name","ip","rac_id"
],
"preSql": [
"truncate table db_hosts"
],
"connection": [
{
"jdbcUrl": "jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8",
"table": [
"db_hosts"
]
}
]
}
}
}
],
"setting": {
"speed": {
"channel": 5
}
}
}
}
执行抽取:
[root@rancher1 bin]# python datax.py ../job/db_host.json
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2020-07-08 19:56:11.019 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2020-07-08 19:56:11.029 [main] INFO Engine - the machine info =>
osInfo: Oracle Corporation 1.8 25.161-b14
jvmInfo: Linux amd64 5.5.10-1.el7.elrepo.x86_64
cpu num: 2
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB
2020-07-08 19:56:11.054 [main] INFO Engine -
{
"content":[
{
"reader":{
"name":"oraclereader",
"parameter":{
"column":[
"*"
],
"connection":[
{
"jdbcUrl":[
"jdbc:oracle:thin:@192.168.0.210:1521:exam"
],
"table":[
"db_hosts"
]
}
],
"password":"*****",
"username":"issue"
}
},
"writer":{
"name":"mysqlwriter",
"parameter":{
"column":[
"db_id",
"host_name",
"ip",
"rac_id"
],
"connection":[
{
"jdbcUrl":"jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8",
"table":[
"db_hosts"
]
}
],
"password":"******",
"preSql":[
"truncate table db_hosts"
],
"username":"root",
"writeMode":"insert"
}
}
}
],
"setting":{
"speed":{
"channel":5
}
}
}
2020-07-08 19:56:11.080 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2020-07-08 19:56:11.083 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2020-07-08 19:56:11.083 [main] INFO JobContainer - DataX jobContainer starts job.
2020-07-08 19:56:11.086 [main] INFO JobContainer - Set jobId = 0
2020-07-08 19:56:11.498 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@192.168.0.210:1521:exam.
2020-07-08 19:56:11.500 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2020-07-08 19:56:11.816 [job-0] INFO OriginalConfPretreatmentUtil - table:[db_hosts] all columns:[
db_id,host_name,ip,rac_id
].
2020-07-08 19:56:11.833 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
insert INTO %s (db_id,host_name,ip,rac_id) VALUES(?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2020-07-08 19:56:11.833 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2020-07-08 19:56:11.834 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] do prepare work .
2020-07-08 19:56:11.835 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2020-07-08 19:56:11.847 [job-0] INFO CommonRdbmsWriter$Job - Begin to execute preSqls:[truncate table db_hosts]. context info:jdbc:mysql://192.168.56.140:3306/issue?useUnicode=true&characterEncoding=utf8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2020-07-08 19:56:11.857 [job-0] INFO JobContainer - jobContainer starts to do split ...
2020-07-08 19:56:11.858 [job-0] INFO JobContainer - Job set Channel-Number to 5 channels.
2020-07-08 19:56:11.865 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks.
2020-07-08 19:56:11.865 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2020-07-08 19:56:11.887 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2020-07-08 19:56:11.891 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2020-07-08 19:56:11.895 [job-0] INFO JobContainer - Running by standalone Mode.
2020-07-08 19:56:11.909 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2020-07-08 19:56:11.923 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2020-07-08 19:56:11.923 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2020-07-08 19:56:11.939 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2020-07-08 19:56:11.947 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from db_hosts
] jdbcUrl:[jdbc:oracle:thin:@192.168.0.210:1521:exam].
2020-07-08 19:56:12.117 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select * from db_hosts
] jdbcUrl:[jdbc:oracle:thin:@192.168.0.210:1521:exam].
2020-07-08 19:56:12.376 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[442]ms
2020-07-08 19:56:12.377 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it‘s tasks.
2020-07-08 19:56:21.924 [job-0] INFO StandAloneJobContainerCommunicator - Total 667 records, 21266 bytes | Speed 2.08KB/s, 66 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.003s | All Task WaitReaderTime 0.140s | Percentage 100.00%
2020-07-08 19:56:21.925 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2020-07-08 19:56:21.925 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2020-07-08 19:56:21.925 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] do post work.
2020-07-08 19:56:21.926 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2020-07-08 19:56:21.927 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /datax/hook
2020-07-08 19:56:21.929 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 1 | 1 | 1 | 0.054s | 0.054s | 0.054s
PS Scavenge | 1 | 1 | 1 | 0.032s | 0.032s | 0.032s
2020-07-08 19:56:21.930 [job-0] INFO JobContainer - PerfTrace not enable!
2020-07-08 19:56:21.931 [job-0] INFO StandAloneJobContainerCommunicator - Total 667 records, 21266 bytes | Speed 2.08KB/s, 66 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.003s | All Task WaitReaderTime 0.140s | Percentage 100.00%
2020-07-08 19:56:21.932 [job-0] INFO JobContainer -
任务启动时刻 : 2020-07-08 19:56:11
任务结束时刻 : 2020-07-08 19:56:21
任务总计耗时 : 10s
任务平均流量 : 2.08KB/s
记录写入速度 : 66rec/s
读出记录总数 : 667
读写失败总数 : 0
[root@rancher1 bin]#
查看结果:
mysql> select * from issue.db_hosts; +------------+------------------------+----------------+--------+ | db_id | host_name | ip | rac_id | +------------+------------------------+----------------+--------+ | 500011196 | dxxxxxxxxx3 | 10.xxx.xxx.174 | 28 | | 500014437 | dxxxx | 10.xxx.xxx.30 | 10 | | 500014437 | dxxxx4 | 10.1xx.xxx.33 | 10 | | 500014437 | dxxxx | 10.xxx.xxx.39 | 10 | | 500014437 | dbxx | 10.xxx.xxx.38 | 10 | | 500014437 | dbxx | 10.xxx.xxx.37 | 10 |
感觉配置还是比较简单的,抽取大量数据库没有测试不晓得性能咋样;
大批量表数据迁移时候需要些脚本生成对应的json(配置json文件很容易出错),具体每个类型数据库的json写法参考https://github.com/alibaba/DataX
oracleread:https://github.com/alibaba/DataX/blob/master/oraclereader/doc
mysqlwrite:https://github.com/alibaba/DataX/tree/master/mysqlwriter/doc
表结构可以借助powerdesign工具实行转换,个别表单独修改;
原文:https://www.cnblogs.com/muzisanshi/p/13266194.html