Datax 安装及基本使用
作者:刘大猫
- 2025-07-04 黑龙江
本文字数:7865 字
阅读完需:约 26 分钟
@[TOC]
一、Datax 概述
1.概述
2.DataX 插件体系
3.DataX 核心架构
二、安装
2.1 下载并解压
源码地址: https://github.com/alibaba/DataX 这里我下载的是最新版本的 DataX3.0 。下载地址为:http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
# 下载后进行解压[xiaokang@hadoop ~]$ tar -zxvf datax.tar.gz -C /opt/software/
复制代码
2.2 运行自检脚本
[xiaokang@hadoop ~]$ cd /opt/software/datax/[xiaokang@hadoop datax]$ bin/datax.py job/job.json
复制代码
出现以下界面说明 DataX 安装成功
三、基本使用
3.1 从 stream 读取数据并打印到控制台
1. 查看官方 json 配置模板
[xiaokang@hadoop ~]$ python /opt/software/datax/bin/datax.py -r streamreader -w streamwriter
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
Please refer to the streamreader document: https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md
Please refer to the streamwriter document: https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md
Please save the following configuration as a json file and use python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json to run the job.
{ "job": { "content": [ { "reader": { "name": "streamreader", "parameter": { "column": [], "sliceRecordCount": "" } }, "writer": { "name": "streamwriter", "parameter": { "encoding": "", "print": true } } } ], "setting": { "speed": { "channel": "" } } }}
复制代码
2. 根据模板编写 json 文件
{ "job": { "content": [ { "reader": { "name": "streamreader", "parameter": { "column": [ { "type":"string", "value":"xiaokang-微信公众号:小康新鲜事儿" }, { "type":"string", "value":"你好,世界-DataX" } ], "sliceRecordCount": "10" } }, "writer": { "name": "streamwriter", "parameter": { "encoding": "utf-8", "print": true } } } ], "setting": { "speed": { "channel": "2" } } }}
复制代码
3. 运行 Job
[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./stream2stream.json
复制代码
3.2 Mysql 导入数据到 HDFS
示例:导出 MySQL 数据库中的 help_keyword 表到 HDFS 的 /datax 目录下(此目录必须提前创建)。
1. 查看官方 json 配置模板
[xiaokang@hadoop json]$ python /opt/software/datax/bin/datax.py -r mysqlreader -w hdfswriter
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
Please refer to the mysqlreader document: https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
Please refer to the hdfswriter document: https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md
Please save the following configuration as a json file and use python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json to run the job.
{ "job": { "content": [ { "reader": { "name": "mysqlreader", "parameter": { "column": [], "connection": [ { "jdbcUrl": [], "table": [] } ], "password": "", "username": "", "where": "" } }, "writer": { "name": "hdfswriter", "parameter": { "column": [], "compress": "", "defaultFS": "", "fieldDelimiter": "", "fileName": "", "fileType": "", "path": "", "writeMode": "" } } } ], "setting": { "speed": { "channel": "" } } }}
复制代码
2. 根据模板编写 json 文件
{ "job": { "content": [ { "reader": { "name": "mysqlreader", "parameter": { "column": [ "help_keyword_id", "name" ], "connection": [ { "jdbcUrl": [ "jdbc:mysql://192.168.1.106:3306/mysql" ], "table": [ "help_keyword" ] } ], "password": "xiaokang", "username": "root" } }, "writer": { "name": "hdfswriter", "parameter": { "column": [ { "name":"help_keyword_id", "type":"int" }, { "name":"name", "type":"string" } ], "defaultFS": "hdfs://hadoop:9000", "fieldDelimiter": "|", "fileName": "keyword.txt", "fileType": "text", "path": "/datax", "writeMode": "append" } } } ], "setting": { "speed": { "channel": "3" } } }}
复制代码
3. 运行 Job
[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./mysql2hdfs.json
复制代码
3.3 HDFS 数据导出到 Mysql
1. 将 3.2 中导入的文件重命名并在数据库创建表
[xiaokang@hadoop ~]$ hdfs dfs -mv /datax/keyword.txt__4c0e0d04_e503_437a_a1e3_49db49cbaaed /datax/keyword.txt
复制代码
表必须预先创建,建表语句如下:
CREATE TABLE help_keyword_from_hdfs_datax LIKE help_keyword;
复制代码
2. 查看官方 json 配置模板
[xiaokang@hadoop json]$ python /opt/software/datax/bin/datax.py -r hdfsreader -w mysqlwriter
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
Please refer to the hdfsreader document: https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md
Please refer to the mysqlwriter document: https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md
Please save the following configuration as a json file and use python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json to run the job.
{ "job": { "content": [ { "reader": { "name": "hdfsreader", "parameter": { "column": [], "defaultFS": "", "encoding": "UTF-8", "fieldDelimiter": ",", "fileType": "orc", "path": "" } }, "writer": { "name": "mysqlwriter", "parameter": { "column": [], "connection": [ { "jdbcUrl": "", "table": [] } ], "password": "", "preSql": [], "session": [], "username": "", "writeMode": "" } } } ], "setting": { "speed": { "channel": "" } } }}
复制代码
3. 根据模板编写 json 文件
{ "job": { "content": [ { "reader": { "name": "hdfsreader", "parameter": { "column": [ "*" ], "defaultFS": "hdfs://hadoop:9000", "encoding": "UTF-8", "fieldDelimiter": "|", "fileType": "text", "path": "/datax/keyword.txt" } }, "writer": { "name": "mysqlwriter", "parameter": { "column": [ "help_keyword_id", "name" ], "connection": [ { "jdbcUrl": "jdbc:mysql://192.168.1.106:3306/mysql", "table": ["help_keyword_from_hdfs_datax"] } ], "password": "xiaokang", "username": "root", "writeMode": "insert" } } } ], "setting": { "speed": { "channel": "3" } } }}
复制代码
4. 运行 Job
[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./hdfs2mysql.json
复制代码
3.4 mysql 同步到 mysql
{ "job": { "content": [{ "reader": { "name": "mysqlreader", "parameter": { "password": "gee123456", "username": "geespace", "connection": [{ "jdbcUrl": ["jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"], "querySql": ["SELECT id, name FROM test_test"] }] } }, "writer": { "name": "mysqlwriter", "parameter": { "column": ["id", "name"], "password": "gee123456", "username": "geespace", "writeMode": "insert", "connection": [{ "table": ["test_test_1"], "jdbcUrl": "jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev" }] } } }], "setting": { "speed": { "channel": 1 }, "errorLimit": { "record": 0, "percentage": 0.02 } } }}
复制代码
3.5 mysql 同步到 hbase
{ "job": { "content": [{ "reader": { "name": "mysqlreader", "parameter": { "password": "gee123456", "username": "geespace", "connection": [{ "jdbcUrl": ["jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"], "querySql": ["SELECT id, name FROM test_test"] }] } }, "writer": { "name": "hbase11xwriter", "parameter": { "mode": "normal", "table": "test_test_1", "column": [{ "name": "f:id", "type": "string", "index": 0 }, { "name": "f:name", "type": "string", "index": 1 }], "encoding": "utf-8", "hbaseConfig": { "hbase.zookeeper.quorum": "192.168.20.91:2181", "zookeeper.znode.parent": "/hbase" }, "rowkeyColumn": [{ "name": "f:id", "type": "string", "index": 0 }, { "name": "f:name", "type": "string", "index": 1 }] } } }], "setting": { "speed": { "channel": 1 }, "errorLimit": { "record": 0, "percentage": 0.02 } } }}
复制代码
3.6 hbase 同步到 hbase
{ "job": { "content": [{ "reader": { "name": "hbase11xreader", "parameter": { "mode": "normal", "table": "test_test", "column": [{ "name": "f:id", "type": "string" }, { "name": "f:name", "type": "string" }], "encoding": "utf-8", "hbaseConfig": { "hbase.zookeeper.quorum": "192.168.20.91:2181", "zookeeper.znode.parent": "/hbase" } } }, "writer": { "name": "hbase11xwriter", "parameter": { "mode": "normal", "table": "test_test_1", "column": [{ "name": "f:id", "type": "string", "index": 0 }, { "name": "f:name", "type": "string", "index": 1 }], "encoding": "utf-8", "hbaseConfig": { "hbase.zookeeper.quorum": "192.168.20.91:2181", "zookeeper.znode.parent": "/hbase" }, "rowkeyColumn": [{ "name": "f:id", "type": "string", "index": 0 }, { "name": "f:name", "type": "string", "index": 1 }] } } }], "setting": { "speed": { "channel": 1 }, "errorLimit": { "record": 0, "percentage": 0.02 } } }}
复制代码
3.7 hbase 同步到 mysql
{ "job": { "content": [{ "reader": { "name": "hbase11xreader", "parameter": { "mode": "normal", "table": "test_test_1", "column": [{ "name": "f:id", "type": "string" }, { "name": "f:name", "type": "string" }], "encoding": "utf-8", "hbaseConfig": { "hbase.zookeeper.quorum": "192.168.20.91:2181", "zookeeper.znode.parent": "/hbase" } } }, "writer": { "name": "mysqlwriter", "parameter": { "column": ["id", "name"], "password": "gee123456", "username": "geespace", "writeMode": "insert", "connection": [{ "table": ["test_test"], "jdbcUrl": "jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev" }] } } }], "setting": { "speed": { "channel": 1 }, "errorLimit": { "record": 0, "percentage": 0.02 } } }}
复制代码
四、辅助资料
划线
评论
复制
发布于: 刚刚阅读数: 5
刘大猫
关注
还未添加个人签名 2022-08-23 加入
还未添加个人简介









评论