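Analyzing the MapReduce job generated from HiveQL
Map phase:
(Input: key: offset, value: one row of the table. In the example below the key is 1 for every page_view row and 2 for every user row, so it effectively identifies the source table.)
(Output: key: join field, value: offset + required field)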
page_view
key:1, value:1,111,9:08:01 => key:111, value:1-1
key:1, value:2,111,9:08:13 => key:111, value:1-2
key:1, value:1,222,9:03:14 => key:222, value:1-1
user
key:2, value:111, 25, female => key:111, value:2-25
key:2, value:222, 32, male => key:222, value:2-32
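
The Map step above can be simulated in a few lines of Python. This is only an illustrative sketch, not the code Hive actually generates: the function name map_row, the tag constants, and the column layouts (pageid, userid, time for page_view; userid, age, gender for user) are assumptions read off the example rows.

```python
# Table tags as they appear in the example: 1 = page_view, 2 = user (assumed meaning).
PAGE_VIEW_TAG = 1
USER_TAG = 2

def map_row(tag, row):
    """Emit (join key, "tag-field") for one input row (hypothetical helper)."""
    fields = [f.strip() for f in row.split(",")]
    if tag == PAGE_VIEW_TAG:
        pageid, userid, _time = fields      # assumed column order
        return userid, f"{tag}-{pageid}"
    if tag == USER_TAG:
        userid, age, _gender = fields       # assumed column order
        return userid, f"{tag}-{age}"
    raise ValueError(f"unknown table tag: {tag}")

inputs = [
    (1, "1,111,9:08:01"),
    (1, "2,111,9:08:13"),
    (1, "1,222,9:03:14"),
    (2, "111, 25, female"),
    (2, "222, 32, male"),
]

map_output = [map_row(tag, row) for tag, row in inputs]
for key, value in map_output:
    print(f"key:{key}, value:{value}")
```

Each emitted value keeps the tag in front so that the Reduce side can still tell which table a value came from.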
Shuffle result:
(Values emitted by Map under the same key are collected into one list)
key:111, value_list:<1-1,1-2,2-25>
key:222, value_list:<1-1,2-32>
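
As a rough sketch, the grouping done by the shuffle can be mimicked with an in-memory dictionary. The real shuffle partitions, sorts, and transfers data between nodes; the function name shuffle and the hard-coded Map output here are assumptions for illustration only.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group Map output values by key, ordered by key (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

# Map output copied from the example above.
map_output = [
    ("111", "1-1"),
    ("111", "1-2"),
    ("222", "1-1"),
    ("111", "2-25"),
    ("222", "2-32"),
]

shuffled = shuffle(map_output)
for key, values in shuffled.items():
    print(f"key:{key}, value_list:<{','.join(values)}>")
```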
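Reduce result:
(For each list in the shuffle result, take the Cartesian product of the values that carry different offsets, i.e. values that come from different tables, to get the final result)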
key: 111, value: 1, 25
key: 111, value: 2, 25
key: 222, value: 1, 32
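
The Reduce step can be sketched the same way: split each list by tag, then take the Cartesian product of the two sides. The helper name reduce_join is hypothetical, and the sketch assumes exactly two tables tagged "1" and "2" as in the example.

```python
from itertools import product

def reduce_join(key, values):
    """Join values of one key: Cartesian product of tag-1 and tag-2 fields."""
    left, right = [], []
    for v in values:
        tag, field = v.split("-", 1)
        # Tag "1" is assumed to be page_view, anything else is treated as user.
        (left if tag == "1" else right).append(field)
    return [(key, pageid, age) for pageid, age in product(left, right)]

# Shuffle result copied from the example above.
shuffled = {
    "111": ["1-1", "1-2", "2-25"],
    "222": ["1-1", "2-32"],
}

results = [row for key, values in shuffled.items()
           for row in reduce_join(key, values)]
for key, pageid, age in results:
    print(f"key: {key}, value: {pageid}, {age}")
```

A key that has values from only one table produces an empty product, which matches inner-join behavior.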