SQL 自连接的魅力
发布于: 2021 年 03 月 31 日
SQL 通常在不同的表或者视图间进行连接运算,包括 INNER JOIN、 LEFT JOIN、 RIGHT JOIN 或 CROSS JOIN
,但是,有一种特殊情况,需要将一个表连接到它自己,这就是所谓的自连接。
自连接的处理过程比较抽象,难以想象、难以理解,但是,一旦掌握其用法,我们便能快速的处理很多问题。
下面来看一个微信群里提到的问题,如下完成下面的转换?
乍一看,好像 CASE WHEN
就可以了,其实不然,这里的 item
都是一样的,使用 CASE WHEN
只会得到一行数据。
这里本质上是要进行行与行
之间的比较,将偶数行与奇数行合并,使用自连接再合适不过了,我们这里先将该表进行自连接(不加条件)。
mysql> SELECT * FROM A A1,A A2;
+------+-------+-------+------+-------+-------+
| item | type | time | item | type | time |
+------+-------+-------+------+-------+-------+
| A | start | 01:00 | A | start | 01:00 |
| A | end | 02:00 | A | start | 01:00 |
| A | start | 03:00 | A | start | 01:00 |
| A | end | 04:00 | A | start | 01:00 |
| A | start | 05:00 | A | start | 01:00 |
| A | end | 06:00 | A | start | 01:00 |
| A | start | 01:00 | A | end | 02:00 |
| A | end | 02:00 | A | end | 02:00 |
| A | start | 03:00 | A | end | 02:00 |
| A | end | 04:00 | A | end | 02:00 |
| A | start | 05:00 | A | end | 02:00 |
| A | end | 06:00 | A | end | 02:00 |
| A | start | 01:00 | A | start | 03:00 |
| A | end | 02:00 | A | start | 03:00 |
| A | start | 03:00 | A | start | 03:00 |
| A | end | 04:00 | A | start | 03:00 |
| A | start | 05:00 | A | start | 03:00 |
| A | end | 06:00 | A | start | 03:00 |
| A | start | 01:00 | A | end | 04:00 |
| A | end | 02:00 | A | end | 04:00 |
| A | start | 03:00 | A | end | 04:00 |
| A | end | 04:00 | A | end | 04:00 |
| A | start | 05:00 | A | end | 04:00 |
| A | end | 06:00 | A | end | 04:00 |
| A | start | 01:00 | A | start | 05:00 |
| A | end | 02:00 | A | start | 05:00 |
| A | start | 03:00 | A | start | 05:00 |
| A | end | 04:00 | A | start | 05:00 |
| A | start | 05:00 | A | start | 05:00 |
| A | end | 06:00 | A | start | 05:00 |
| A | start | 01:00 | A | end | 06:00 |
| A | end | 02:00 | A | end | 06:00 |
| A | start | 03:00 | A | end | 06:00 |
| A | end | 04:00 | A | end | 06:00 |
| A | start | 05:00 | A | end | 06:00 |
| A | end | 06:00 | A | end | 06:00 |
+------+-------+-------+------+-------+-------+
36 rows in set (0.00 sec)
复制代码
这里得到了该表进行自连接的笛卡尔积(36 行),根据查询的目标结果,这里需要过滤出 A1.type='start' AND A2.type='end'
,即可得到如下结果:
mysql> SELECT * FROM A A1,A A2
-> WHERE A1.type='start' AND A2.type='end';
+------+-------+-------+------+------+-------+
| item | type | time | item | type | time |
+------+-------+-------+------+------+-------+
| A | start | 01:00 | A | end | 02:00 |
| A | start | 03:00 | A | end | 02:00 |
| A | start | 05:00 | A | end | 02:00 |
| A | start | 01:00 | A | end | 04:00 |
| A | start | 03:00 | A | end | 04:00 |
| A | start | 05:00 | A | end | 04:00 |
| A | start | 01:00 | A | end | 06:00 |
| A | start | 03:00 | A | end | 06:00 |
| A | start | 05:00 | A | end | 06:00 |
+------+-------+-------+------+------+-------+
9 rows in set (0.00 sec)
复制代码
留下想要的列
mysql> SELECT A1.item,A1.time,A2.time FROM A A1,A A2
-> WHERE A1.type='start' AND A2.type='end';
+------+-------+-------+
| item | time | time |
+------+-------+-------+
| A | 01:00 | 02:00 | # 目标行 1
| A | 03:00 | 02:00 |
| A | 05:00 | 02:00 |
| A | 01:00 | 04:00 |
| A | 03:00 | 04:00 | # 目标行 2
| A | 05:00 | 04:00 |
| A | 01:00 | 06:00 |
| A | 03:00 | 06:00 |
| A | 05:00 | 06:00 | # 目标行 3
+------+-------+-------+
9 rows in set (0.00 sec)
复制代码
已经非常接近目标结果了,上面输出结果的第 1、5、9
行就是我们想要的结果。
目标结果里,开始时间小于结果时间,这里留下 A1.time < A2.time
的行
mysql> SELECT A1.item,A1.time,A2.time FROM A A1,A A2
-> WHERE A1.type='start' AND A2.type='end'
-> AND A1.time < A2.time;
+------+-------+-------+
| item | time | time |
+------+-------+-------+
| A | 01:00 | 02:00 |
| A | 01:00 | 04:00 |
| A | 03:00 | 04:00 |
| A | 01:00 | 06:00 |
| A | 03:00 | 06:00 |
| A | 05:00 | 06:00 |
+------+-------+-------+
6 rows in set (0.00 sec)
复制代码
通过观察可以发现,针对每一个 A1.item 和 A1.time
的组合,我们想要的就是 A2.time
的最小值。很自然就想到了聚合函数 MIN
mysql> SELECT A1.item,A1.time,MIN(A2.time) FROM A A1,A A2
-> WHERE A1.type='start' AND A2.type='end'
-> AND A1.time < A2.time
-> GROUP BY A1.item,A1.time;
+------+-------+--------------+
| item | time | MIN(A2.time) |
+------+-------+--------------+
| A | 01:00 | 02:00 |
| A | 03:00 | 04:00 |
| A | 05:00 | 06:00 |
+------+-------+--------------+
3 rows in set (0.00 sec)
复制代码
最后,设置下别名
mysql> SELECT A1.item,A1.time AS startdate,MIN(A2.time) AS enddate FROM A A1,A A2
-> WHERE A1.type='start' AND A2.type='end'
-> AND A1.time < A2.time
-> GROUP BY A1.item,A1.time;
+------+-----------+---------+
| item | startdate | enddate |
+------+-----------+---------+
| A | 01:00 | 02:00 |
| A | 03:00 | 04:00 |
| A | 05:00 | 06:00 |
+------+-----------+---------+
3 rows in set (0.00 sec)
复制代码
终于得到了目标结果。这就是自连接的一个应用,你 GET 到了吗?
划线
评论
复制
发布于: 2021 年 03 月 31 日阅读数: 10
版权声明: 本文为 InfoQ 作者【披头】的原创文章。
原文链接:【http://xie.infoq.cn/article/1950bb2bf2f57e7c6f86b9032】。文章转载请联系作者。
披头
关注
还未添加个人签名 2018.11.13 加入
数据科学爱好者,懂点SQL,会点Python,公众号【数据科学探究】
评论