经验首页 前端设计 程序设计 Java相关 移动开发 数据库/运维 软件/图像 大数据/云计算 其他经验
当前位置:技术经验 » 程序设计 » SQL语言 » 查看文章
烧死10亿脑细胞的SQL长啥样?
来源:cnblogs  作者:pgdba  时间:2023/7/24 10:22:36  对本文有异议

1 前言

今天在生产中碰到了一个让我十分费解的 SQL,十分有趣。

2 现象

SQL 很好复现,就是逻辑看起来有点唬人

  1. postgres=# create table test(id1 int,id2 int);
  2. CREATE TABLE
  3. postgres=# insert into test values(1,3),(2,1),(3,1),(3,3);
  4. INSERT 0 4
  5. postgres=# select * from test;
  6. id1 | id2
  7. -----+-----
  8. 1 | 3
  9. 2 | 1
  10. 3 | 1
  11. 3 | 3
  12. (4 rows)

业务 SQL 如下???? 此处用 test 表替代,真实情况表中字段存在一个父子关系,根据 parent_id 查找子 id

  1. postgres=# select (exists (select 1 as one from test a where (test.id1 = a.id2))) as b from test;
  2. b
  3. ---
  4. t
  5. f
  6. t
  7. t
  8. (4 rows)
  9. postgres=# explain select (exists (select 1 as one from test a where (test.id1 = a.id2))) as b from test;
  10. QUERY PLAN
  11. --------------------------------------------------------------
  12. Seq Scan on test (cost=0.00..3.14 rows=4 width=1)
  13. SubPlan 2
  14. -> Seq Scan on test a (cost=0.00..1.04 rows=4 width=4)
  15. (3 rows)

SQL 是 self-join ,a 是 test 表的一个别名。

让我们把子查询单独摘出来执行一下

  1. postgres=# select 1 as one from test a where (test.id1 = a.id2);
  2. ERROR: invalid reference to FROM-clause entry for table "test"
  3. LINE 1: select 1 as one from test a where (test.id1 = a.id2);
  4. ^
  5. HINT: Perhaps you meant to reference the table alias "a".

可以看到报错了,说明此处的 test 是取自外层的 test(即 from test),根据 test.id1 去判断 a.id2,于是返回如下结果

  1. postgres=# select * from test;
  2. id1 | id2
  3. -----+-----
  4. 1 | 3 ---true (id1=1id2里面有,遍历)
  5. 2 | 1 ---false(id1=2id2里面没有,遍历)
  6. 3 | 1 ---true (id1=3id2里面有,遍历)
  7. 3 | 3 ---true (id1=3id2里面有,遍历)
  8. (4 rows)

现在让我们改写一下 SQL,修改一下别名

  1. postgres=# select (exists (select 1 as one from test a where (a.id1 = test.id2))) as b from test;
  2. b
  3. ---
  4. t
  5. t
  6. t
  7. t
  8. (4 rows)
  9. postgres=# explain select (exists (select 1 as one from test a where (a.id1 = test.id2))) as b from test;
  10. QUERY PLAN
  11. --------------------------------------------------------------
  12. Seq Scan on test (cost=0.00..5.24 rows=4 width=1)
  13. SubPlan 2
  14. -> Seq Scan on test a (cost=0.00..1.04 rows=4 width=4)
  15. (3 rows)

这次可以看到,结果全部是真。老样子,也是相同的原理

  1. postgres=# select 1 as one from test a where (a.id1 = test.id2);
  2. ERROR: invalid reference to FROM-clause entry for table "test"
  3. LINE 1: select 1 as one from test a where (a.id1 = test.id2);
  4. ^
  5. HINT: Perhaps you meant to reference the table alias "a".

于是根据 test.id2 去探测 a.id1,于是返回如下结果

  1. postgres=# select * from test;
  2. id1 | id2
  3. -----+-----
  4. 1 | 3 ---true (id2=3id1里面有,遍历)
  5. 2 | 1 ---true (id2=1id1里面有,遍历)
  6. 3 | 1 ---true (id2=1id1里面有,遍历)
  7. 3 | 3 ---true (id2=3id1里面有,遍历)
  8. (4 rows)

让我们再改写一下 SQL

  1. postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test;
  2. b
  3. ---
  4. t
  5. t
  6. t
  7. t
  8. (4 rows)
  9. postgres=# explain select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test;
  10. QUERY PLAN
  11. --------------------------------------------------------------
  12. Seq Scan on test (cost=1.05..2.09 rows=4 width=1)
  13. InitPlan 1 (returns $0)
  14. -> Seq Scan on test a (cost=0.00..1.05 rows=1 width=0)
  15. Filter: (id1 = id2)
  16. (4 rows)

这次执行计划变了,变成了 InitPlan,执行计划和结构都有所差异。那么 InitPlan 是什么意思?

This plan happens whenever there is a part of your query that can (or have to) be calculated before anything else, and it doesn't depend on anything in the rest of your query.

只要查询的一部分可以(或必须)在其他任何内容之前计算,并且它不依赖于查询的其余部分中的任何内容,就会发生此计划。

A special case of SubPlan that only needs to run once.

SubPlan 的一种特殊情况,只需要运行一次。

这就有点像相关子连接和非相关子连接的说法,相关子连接在子查询语句中引用了外层表的列属性,这就导致外层表每获得一个元组,子查询就需要重新执行一次;而非相关子连接是指在子查询语句是独立的,和外层的表没有直接的关联,子查询可以单独执行一次,外层表可以重复利用子查询的执行结果。

因此上述执行计划就变成了 a 表先进行一次独立的子查询

  1. postgres=# select * from test where id1 = id2;
  2. id1 | id2
  3. -----+-----
  4. 3 | 3
  5. (1 row)
  6. postgres=# select exists (select 3,3) as b from test;
  7. b
  8. ---
  9. t
  10. t
  11. t
  12. t
  13. (4 rows)
  14. postgres=# delete from test;
  15. DELETE 4
  16. postgres=# insert into test values(5,4);
  17. INSERT 0 1
  18. postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test;
  19. b
  20. ---
  21. f
  22. (1 row)
  23. postgres=# insert into test values(3,4);
  24. INSERT 0 1
  25. postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test;
  26. b
  27. ---
  28. f
  29. f
  30. (2 rows)
  31. postgres=# insert into test values(4,4);
  32. INSERT 0 1
  33. postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test;
  34. b
  35. ---
  36. t
  37. t
  38. t
  39. (3 rows)

可以看到,只要结果中有相等的 id1 和 id2,结果就会全部返回真。

那让我们又双叒叕改写下 SQL

  1. postgres=# truncate table test;
  2. TRUNCATE TABLE
  3. postgres=# insert into test values(1,3),(2,1),(3,1),(3,3);
  4. INSERT 0 4
  5. postgres=# explain select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test;
  6. QUERY PLAN
  7. --------------------------------------------------------------------
  8. Seq Scan on test (cost=0.00..2.09 rows=4 width=1)
  9. SubPlan 1
  10. -> Result (cost=0.00..1.04 rows=4 width=0)
  11. One-Time Filter: (test.id1 = test.id2)
  12. -> Seq Scan on test a (cost=0.00..1.04 rows=4 width=0)
  13. (5 rows)
  14. postgres=# select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test;
  15. b
  16. ---
  17. f
  18. f
  19. f
  20. t
  21. (4 rows)

这次多了一个 One-Time Filter,那么这个又是什么玩意?

A qualification used by a Result operation. If it is false, an empty result set can be returned without further work.

如果为 false,则可以返回空结果集,无需进一步工作。

让我们瞅瞅代码,在代码中有这么一段注释

  1. * Result nodes are also used to optimise queries with constant
  2. * qualifications (ie, quals that do not depend on the scanned data),
  3. * such as:
  4. *
  5. * select * from emp where 2 > 1
  6. *
  7. * In this case, the plan generated is
  8. *
  9. * Result (with 2 > 1 qual)
  10. * /
  11. * SeqScan (emp.*)
  12. *
  13. * At runtime, the Result node evaluates the constant qual once,
  14. * which is shown by EXPLAIN as a One-Time Filter. If it's
  15. * false, we can return an empty result set without running the
  16. * controlled plan at all. If it's true, we run the controlled
  17. * plan normally and pass back the results.

逻辑很清晰,因此上述逻辑就好比这么一串 SQL

  1. postgres=# select * from test where 2 > 1;
  2. id1 | id2
  3. -----+-----
  4. 1 | 3
  5. 2 | 1
  6. 3 | 1
  7. 3 | 3
  8. (4 rows)
  9. postgres=# select * from test where 1 > 1;
  10. id1 | id2
  11. -----+-----
  12. (0 rows)
  13. postgres=# select exists(select 1 from test where 1 > 1)as b;
  14. b
  15. ---
  16. f
  17. (1 row)
  18. postgres=# select exists(select 1 from test where 1 > 1)as b from test;
  19. b
  20. ---
  21. f
  22. f
  23. f
  24. f
  25. (4 rows)
  26. postgres=# select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test;
  27. b
  28. ---
  29. f
  30. f
  31. f
  32. t
  33. (4 rows)

因此此时的 SQL 逻辑就变成了这样:遍历 test 表,判断 id1 = id2 的行,所以结果是 false、false、false、true

3 小结

真是一段烧死脑细胞的神奇 SQL。不知道其他数据库中这个 SQL 是否是类似结果?感兴趣的读者可以私信我。当然文章中可能也有错误,欢迎指正 ~

原文链接:https://www.cnblogs.com/pgdba/p/17576558.html

 友情链接:直通硅谷  点职佳  北美留学生论坛

本站QQ群:前端 618073944 | Java 606181507 | Python 626812652 | C/C++ 612253063 | 微信 634508462 | 苹果 692586424 | C#/.net 182808419 | PHP 305140648 | 运维 608723728

W3xue 的所有内容仅供测试,对任何法律问题及风险不承担任何责任。通过使用本站内容随之而来的风险与本站无关。
关于我们  |  意见建议  |  捐助我们  |  报错有奖  |  广告合作、友情链接(目前9元/月)请联系QQ:27243702 沸活量
皖ICP备17017327号-2 皖公网安备34020702000426号