写点什么

HiveSQL 高级进阶 10 大技巧

作者:编程江湖
  • 2022 年 2 月 14 日
  • 本文字数:1645 字

    阅读完需:约 5 分钟

直接上干货,HiveSQL 高级进阶技巧,重要性不言而喻。掌握这 10 个技巧,你的 SQL 水平将有一个质的提升,达到一个较高的层次!

1.删除:

insert overwrite tmp select * from tmp where id != '666';复制代码
复制代码

2.更新:

直接上干货,HiveSQL 高级进阶技巧,重要性不言而喻。掌握这 10 个技巧,你的 SQL 水平将有一个质的提升,达到一个较高的层次!

insert overwrite tmp select id,label,       if(id = '1' and label = 'grade','25',value) as value from tmp where id != '666';复制代码
复制代码

3.行转列:

-- Step03:最后将info的内容切分select id,split(info,':')[0] as label,split(info,':')[1] as valuefrom (-- Step01:先将数据拼接成“heit:180,weit:60,age:26”    select id,concat('heit',':',height,',','weit',':',weight,',','age',':',age) as value     from tmp) as tmp-- Step02:然后在借用explode函数将数据膨胀至多行lateral view explode(split(value,',')) mytable as info;复制代码
复制代码

4.列转行 1:

select tmp1.id as id,tmp1.value as height,tmp2.value as weight,tmp3.value as age from (select id,label,value from tmp2 where label = 'heit') as tmp1joinon tmp1.id = tmp2.id(select id,label,value from tmp2 where label = 'weit') as tmp2joinon tmp1.id = tmp2.id(select id,label,value from tmp2 where label = 'age') as tmp3on tmp1.id = tmp3.id;复制代码
复制代码

5.列转行 2:

selectid,tmpmap['height'] as height,tmpmap['weight'] as weight,tmpmap['age'] as agefrom (    select id,           str_to_map(concat_ws(',',collect_set(concat(label,':',value))),',',':') as tmpmap      from tmp2 group by id) as tmp1;复制代码
复制代码

6.分析函数 1:

select id,label,value,       lead(value,1,0)over(partition by id order by label) as lead,       lag(value,1,999)over(partition by id order by label) as lag,       first_value(value)over(partition by id order by label) as first_value,       last_value(value)over(partition by id order by label) as last_valuefrom tmp;复制代码
复制代码

7.分析函数 2:


select id,label,value, row_number()over(partition by id order by value) as row_number, rank()over(partition by id order by value) as rank, dense_rank()over(partition by id order by value) as dense_rankfrom tmp;复制代码
复制代码

8.多维分析 1:​​​​​​​

select col1,col2,col3,count(1),       Grouping__ID from tmp group by col1,col2,col3grouping sets(col1,col2,col3,(col1,col2),(col1,col3),(col2,col3),())复制代码
复制代码

9.多维分析 2:​​​​​​​

select col1,col2,col3,count(1),       Grouping__ID from tmp group by col1,col2,col3with cube;复制代码
复制代码

10.数据倾斜 groupby:​​​​​​​

select label,sum(cnt) as all from (    select rd,label,sum(1) as cnt from     (        select id,round(rand(),2) as rd,value from tmp1    ) as tmp    group by rd,label) as tmpgroup by label;复制代码
复制代码

​​​​​​​11.数据倾斜 join:​​​​​​​

select label,sum(value) as all from (    select rd,label,sum(value) as cnt from    (        select tmp1.rd as rd,tmp1.label as label,tmp1.value*tmp2.value as value         from         (            select id,round(rand(),1) as rd,label,value from tmp1        ) as tmp1        join        (            select id,rd,label,value from tmp2            lateral view explode(split('0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9',',')) mytable as rd        ) as tmp2        on tmp1.rd = tmp2.rd and tmp1.label = tmp2.label    ) as tmp1    group by rd,label) as tmp1group by label;复制代码
复制代码

关键词:大数据培训

用户头像

编程江湖

关注

IT技术分享 2021.11.23 加入

还未添加个人简介

评论

发布
暂无评论
HiveSQL高级进阶10大技巧