Hi everyone, I'm 怀瑾握瑜, a big-data newbie and full-time dad to two little gold-swallowing beasts, 嘉 and 嘉 — able to code and teach alike.
If you enjoy my articles, please [follow ⭐] + [like 👍] + [comment 📃] — your support keeps me going, and I look forward to growing together with you~
1. Importing directly from another table at creation time
This method creates a new table while reading the columns and data of an existing table in one step. A common use case is quickly extracting a data sample for testing. For example:
CREATE TABLE if not exists testA(
  id string comment '',
  name string COMMENT ''
)
partitioned by (pdt int)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY");
0: jdbc:hive2://xxx.hadoop.com:2181,xxx2.h> create table testB as select id, name from testA;
INFO  : Compiling command(queryId=hive_20211123142151_32d40208-47c8-4f73-a9ec-de5aad51807c): create table testB as select id, name from testA
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:id, type:string, comment:null), FieldSchema(name:name, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20211123142151_32d40208-47c8-4f73-a9ec-de5aad51807c); Time taken: 0.414 seconds
INFO  : Executing command(queryId=hive_20211123142151_32d40208-47c8-4f73-a9ec-de5aad51807c): create table testB as select id, name from testA
INFO  : Query ID = hive_20211123142151_32d40208-47c8-4f73-a9ec-de5aad51807c
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20211123142151_32d40208-47c8-4f73-a9ec-de5aad51807c
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: create table testB as select id, nam...testA (Stage-1)
INFO  : Status: Running (Executing on YARN cluster with App id application_1637046596410_10638)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1             container     SUCCEEDED      0          0        0        0       0       0
Reducer 2 ......  container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/02  [==========================>>] 100%  ELAPSED TIME: 8.63 s
----------------------------------------------------------------------------------------------
INFO  : Status: DAG finished successfully in 7.96 seconds
INFO  :
INFO  : Query Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------------------------------------------------
INFO  : Compile Query                           0.41s
INFO  : Prepare Plan                            9.99s
INFO  : Get Query Coordinator (AM)              0.00s
INFO  : Submit Plan                             0.42s
INFO  : Start DAG                               1.16s
INFO  : Run DAG                                 7.96s
INFO  : ----------------------------------------------------------------------------------------------
INFO  :
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)   GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1              0.00              0             0               0                0
INFO  :  Reducer 2           2306.00          4,180            89               0                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OK
No rows affected (20.638 seconds)
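One caveat worth noting: in most Hive versions, CTAS creates the new table without the source's partitioning, and with the default storage format unless one is specified explicitly. To copy just the schema (including partitions and storage format) without any data, CREATE TABLE ... LIKE can be used instead — a sketch, where testC and testD are hypothetical table names:

```sql
-- Copy only the schema of testA (partitions and storage format included), no data
CREATE TABLE testC LIKE testA;

-- CTAS with an explicit storage format, since CTAS alone does not
-- inherit the source table's Parquet/Snappy settings
CREATE TABLE testD STORED AS PARQUET AS
SELECT id, name FROM testA;
```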
2. Importing from one Hive table into another
Import data from table A into table B:
INSERT INTO TABLE testB select id, name from testA where id = 1;
2.1 Tip 1: Inserting data
Keyword: insert into table
This method is typically used to append new data, e.g., when landing data on a schedule or in real time, or when migrating historical data in multiple batches.
INSERT INTO TABLE testB select id, name from testA where id = 1;
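A related Hive feature worth knowing when appending data: a single FROM clause can feed several INSERT statements, so the source table is scanned only once. A sketch using the tables from this article:

```sql
-- Multi-insert: one scan of testA, two appends into testB
FROM testA
INSERT INTO TABLE testB SELECT id, name WHERE id = 1
INSERT INTO TABLE testB SELECT id, name WHERE id = 2;
```

This matters when testA is large, since each separate INSERT ... SELECT would otherwise launch its own full scan.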
2.2 Tip 2: Overwriting data
Keyword: insert overwrite table
This method is typically used for daily tables and similar jobs that run on a fixed schedule and need to overwrite the existing data.
INSERT OVERWRITE TABLE testB select id, name from testA where id = 1;
2.3 Tip 3: Partitioned data
Keyword: insert overwrite table partition(xxx=xxx)
This method is typically used when writing into a specific partition, or when the partition value is derived from a field in the query, possibly after some processing such as a modulo operation.
INSERT OVERWRITE TABLE testB partition(pdt=20210101) select id, name from testA where id = 1;
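When the partition value should come from a field in the query rather than a fixed literal, Hive's dynamic partitioning can be used. A sketch, assuming testB is partitioned by pdt:

```sql
-- Enable dynamic partitioning for this session
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The last selected column (pdt) supplies the partition value for each row
INSERT OVERWRITE TABLE testB PARTITION (pdt)
SELECT id, name, pdt FROM testA WHERE id = 1;
```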
2.4 Tip 4: Importing compressed data (important)
In practice, most Hive tables are compressed, so plain external files cannot be loaded into them directly. A workaround is needed.
Key steps:
1. Use a temporary table
2. Include the partition as a regular column
3. Insert from the temporary table into the formal table, specifying the partition column
Let's walk through a detailed example.
Suppose the original table testA is partitioned by pdt and compressed with Snappy:
CREATE TABLE if not exists testA(
  id string comment '',
  name string COMMENT ''
)
partitioned by (pdt int)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY");
The data to import comes from an Excel sheet, or needs to be generated by hand. Inserting it row by row with INSERT is far too slow, especially for large batches.
So we use a temporary table, testA_tmp, in which the partition field pdt is just a regular column. Fields are separated by "," and stored as a text file, which makes loading easy:
CREATE TABLE if not exists testA_tmp(
  id string comment '',
  name string COMMENT '',
  pdt int COMMENT ''
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
Prepare a text file tmp.txt holding the data:
tmp.txt:
1,2,20210101
3,4,20210101
Upload it to HDFS and load it into the temporary table:
hadoop fs -put tmp.txt /dataTmp/
load data inpath '/dataTmp/tmp.txt' into table testA_tmp;
On the hive command line, enable dynamic partitioning and run the insert:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE testA PARTITION (pdt) SELECT * FROM testA_tmp;
And that's it — the data has been imported into the target table, complete with partitions.
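To confirm the import landed where expected, a quick sanity check with standard HiveQL:

```sql
-- List the partitions created by the dynamic-partition insert
SHOW PARTITIONS testA;

-- Inspect a few rows from the imported partition
SELECT * FROM testA WHERE pdt = 20210101 LIMIT 10;
```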
Closing remarks
You can follow my WeChat official account 【怀瑾握瑜的嘉与嘉】 to get download links for the resources.