Big Data Homework: Spark SQL
1. Add a custom command to Spark SQL
• SHOW VERSION; • should display the current Spark version and Java version
Answer
Clone the Spark source tree locally, then open
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
and add the new statement and its keyword to the grammar.
Then regenerate the parser with the Maven ANTLR plugin (run the antlr4 goal from the IDE's Maven panel).
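A minimal sketch of the grammar change, assuming the new statement is spelled SHOW SPARK_VERSION (the keyword is whatever you choose to register; the session log at the end of this section uses MYVERSION). Add an alternative to the statement rule:
| SHOW SPARK_VERSION                                                #showSparkVersion
define the token:
SPARK_VERSION: 'SPARK_VERSION';
and list SPARK_VERSION in the ansiNonReserved and nonReserved keyword lists. The #showSparkVersion label is what makes ANTLR generate ShowSparkVersionContext and the visitShowSparkVersion hook used in the next step.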
In
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
add the following visitor to SparkSqlAstBuilder:
override def visitShowSparkVersion(ctx: ShowSparkVersionContext): LogicalPlan = withOrigin(ctx) {
ShowSparkVersionCommand()
}
Create a new file
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ShowSparkVersionCommand.scala
with the following content:
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.spark.sql.execution.command
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
import org.apache.spark.sql.types.StringType
case class ShowSparkVersionCommand() extends LeafRunnableCommand {

  // Schema of the command's result: a single string column.
  override val output: Seq[Attribute] =
    Seq(AttributeReference("spark_version", StringType, nullable = true)())

  // Executed when the command runs: read the version string from the
  // SPARK_VERSION environment variable and return it as a single row.
  override def run(sparkSession: SparkSession): Seq[Row] = {
    val outputString = System.getenv("SPARK_VERSION")
    Seq(Row(outputString))
  }
}
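The assignment also asks for the Java version. A variant of the command (a sketch, not the code that produced the session below) could drop the environment variable and read both versions from the running process, using the same imports as above:
  // Two output columns instead of one.
  override val output: Seq[Attribute] = Seq(
    AttributeReference("spark_version", StringType, nullable = false)(),
    AttributeReference("java_version", StringType, nullable = false)())

  // org.apache.spark.SPARK_VERSION is the version string Spark was built with;
  // java.version is a standard JVM system property.
  override def run(sparkSession: SparkSession): Seq[Row] =
    Seq(Row(org.apache.spark.SPARK_VERSION, System.getProperty("java.version")))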
In the Spark source folder, build the packages:
build/sbt package -Phive -Phive-thriftserver
After the build succeeds, launch the SQL CLI with the SPARK_VERSION environment variable set (quote the value, since it contains parentheses and a space):
SPARK_VERSION='3.1.2(custom cmd)' bin/spark-sql
Then query:
show spark_version;
which outputs 3.1.2(custom cmd). (In the session log below the command is show MYVERSION; the keyword simply has to match what was added to the grammar.)
(base) xukaixuan@xukaixuandeMacBook-Pro spark % SPARK_VERSION=3.1.2(CUSTOM CMD) bin/spark-sql
21/09/04 23:41:40 WARN Utils: Your hostname, xukaixuandeMacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.31.85 instead (on interface en0)
21/09/04 23:41:40 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/09/04 23:41:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/09/04 23:41:43 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/09/04 23:41:43 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
21/09/04 23:41:45 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
21/09/04 23:41:45 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore xukaixuan@127.0.0.1
Spark master: local[*], Application Id: local-1630770102074
spark-sql> show MYVERSION;
3.1.2(CUSTOM CMD)
Time taken: 2.148 seconds, Fetched 1 row(s)
2. Build SQL that satisfies the following requirements
Verify with set spark.sql.planChangeLog.level=WARN;
Build one SQL statement that applies all three of these optimizer rules: CombineFilters, CollapseProject, BooleanSimplification
Build one SQL statement that applies all five of these optimizer rules: ConstantFolding, PushDownPredicates, ReplaceDistinctWithAggregate, ReplaceExceptWithAntiJoin, FoldablePropagation
Answer
Setup:
In spark-sql, create the table with
CREATE TEMPORARY TABLE finance USING org.apache.spark.sql.json OPTIONS (path 'finances-small.json');
where finances-small.json contains data like:
{"ID":1,"Account":{"Number":"123-ABC-789","FirstName":"Jay","LastName":"Smith"},"Date":"1/1/2015","Amount":1.23,"Description":"Drug Store"},
{"ID":2,"Account":{"Number":"456-DEF-456","FirstName":"Sally","LastName":"Fuller"},"Date":"1/3/2015","Amount":200.00,"Description":"Electronics"},
{"ID":3,"Account":{"Number":"333-XYZ-999","FirstName":"Brad","LastName":"Turner"},"Date":"1/4/2015","Amount":106.00,"Description":"Gas},
{"ID":4,"Account":{"Number":"987-CBA-321","FirstName":"Justin","LastName":"Pihony"},"Date":"1/4/2015","Amount":0.00,"Description":"Drug Store"},
...
In the SQL CLI, set: set spark.sql.planChangeLog.level=WARN;
Then run the SQL:
CREATE TEMPORARY TABLE finance USING org.apache.spark.sql.json OPTIONS (path 'finances-small.json');
-- 1. Build one SQL statement that applies all three of the following optimizer rules:
-- CombineFilters
-- CollapseProject
-- BooleanSimplification
select A + B + 1, ID, (case when true then "1" when false then "2" else "3" end) as c
from ( select Amount -1 AS A, Amount + 2 AS B, ID, `Date` from ( select * FROM finance where ID < 30 ) WHERE ID > 5 )
WHERE ID < 20
The resulting log (excerpts below) shows the rules being applied; in particular, the nested WHERE clauses end up as adjacent Filters that CombineFilters merges:
===================================
21/09/05 14:23:17 WARN PlanChangeLogger:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.CollapseProject ===
!Project [((A#47 + B#48) + cast(1 as double)) AS ((A + B) + CAST(1 AS DOUBLE))#50, ID#21L, CASE WHEN true THEN 1 WHEN false THEN 2 ELSE 3 END AS c#49] Project [(((Amount#18 - cast(1 as double)) + (Amount#18 + cast(2 as double))) + cast(1 as double)) AS ((A + B) + CAST(1 AS DOUBLE))#50, ID#21L, CASE WHEN true THEN 1 WHEN false THEN 2 ELSE 3 END AS c#49]
!+- Project [(Amount#18 - cast(1 as double)) AS A#47, (Amount#18 + cast(2 as double)) AS B#48, ID#21L]
+- Filter ((ID#21L > cast(5 as bigint)) AND (ID#21L < cast(20 as bigint)))
! +- Filter ((ID#21L > cast(5 as bigint)) AND (ID#21L < cast(20 as bigint)))
+- Project [Amount#18, ID#21L]
! +- Project [Amount#18, ID#21L]
+- Filter (ID#21L < cast(30 as bigint))
! +- Filter (ID#21L < cast(30 as bigint))
+- Relation[Account#17,Amount#18,Date#19,Description#20,ID#21L,_corrupt_record#22] json
! +- Relation[Account#17,Amount#18,Date#19,Description#20,ID#21L,_corrupt_record#22] json
===================================
===================================
21/09/05 14:23:17 WARN PlanChangeLogger:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals ===
!Project [(((Amount#18 - 1.0) + (Amount#18 + 2.0)) + 1.0) AS ((A + B) + CAST(1 AS DOUBLE))#50, ID#21L, CASE WHEN true THEN 1 WHEN false THEN 2 ELSE 3 END AS c#49] Project [(((Amount#18 - 1.0) + (Amount#18 + 2.0)) + 1.0) AS ((A + B) + CAST(1 AS DOUBLE))#50, ID#21L, CASE WHEN true THEN 1 ELSE 3 END AS c#49]
+- Filter ((ID#21L > 5) AND (ID#21L < 20))
+- Filter ((ID#21L > 5) AND (ID#21L < 20))
+- Project [Amount#18, ID#21L]
+- Project [Amount#18, ID#21L]
+- Filter (ID#21L < 30)
+- Filter (ID#21L < 30)
+- Relation[Account#17,Amount#18,Date#19,Description#20,ID#21L,_corrupt_record#22] json
+- Relation[Account#17,Amount#18,Date#19,Description#20,ID#21L,_corrupt_record#22] json
===================================
===================================
21/09/05 14:23:17 WARN PlanChangeLogger:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates ===
Project [(((Amount#18 - 1.0) + (Amount#18 + 2.0)) + 1.0) AS ((A + B) + CAST(1 AS DOUBLE))#50, ID#21L, CASE WHEN true THEN 1 ELSE 3 END AS c#49] Project [(((Amount#18 - 1.0) + (Amount#18 + 2.0)) + 1.0) AS ((A + B) + CAST(1 AS DOUBLE))#50, ID#21L, CASE WHEN true THEN 1 ELSE 3 END AS c#49]
!+- Filter ((ID#21L > 5) AND (ID#21L < 20))
+- Project [Amount#18, ID#21L]
! +- Project [Amount#18, ID#21L]
+- Filter ((ID#21L < 30) AND ((ID#21L > 5) AND (ID#21L < 20)))
! +- Filter (ID#21L < 30)
+- Relation[Account#17,Amount#18,Date#19,Description#20,ID#21L,_corrupt_record#22] json
! +- Relation[Account#17,Amount#18,Date#19,Description#20,ID#21L,_corrupt_record#22] json
===================================
Then run:
-- 2. Build one SQL statement that applies all five of the following optimizer rules:
-- ConstantFolding
-- PushDownPredicates
-- ReplaceDistinctWithAggregate
-- ReplaceExceptWithAntiJoin
-- FoldablePropagation
select A, ID
from ( select distinct ID, `Date`, Amount + 0.2 - 0.1 AS A, Amount + 2 AS B from finance WHERE ID > 5 )
WHERE ID < 20
except DISTINCT select Amount + 0.2 as A, ID from finance WHERE ID > 25
The resulting log is below. The last rule, FoldablePropagation, was not observed in the log; this is to be improved later (see the sketch after the log):
===================================
21/09/05 15:20:11 WARN PlanChangeLogger:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantFolding ===
Aggregate [ID#11L, Date#9, A#30, B#31], [A#30, ID#11L] Aggregate [ID#11L, Date#9, A#30, B#31], [A#30, ID#11L]
!+- Project [ID#11L, Date#9, ((Amount#8 + cast(0.2 as double)) - cast(0.1 as double)) AS A#30, (Amount#8 + cast(2 as double)) AS B#31] +- Project [ID#11L, Date#9, ((Amount#8 + 0.2) - 0.1) AS A#30, (Amount#8 + 2.0) AS B#31]
! +- Filter ((ID#11L > cast(5 as bigint)) AND (ID#11L < cast(20 as bigint))) +- Filter ((ID#11L > 5) AND (ID#11L < 20))
+- Relation[Account#7,Amount#8,Date#9,Description#10,ID#11L,_corrupt_record#12] json
===================================
===================================
21/09/05 15:20:11 WARN PlanChangeLogger:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates ===
Project [A#30, ID#11L] Project [A#30, ID#11L]
!+- Filter (ID#11L < cast(20 as bigint)) +- Aggregate [ID#11L, Date#9, A#30, B#31], [ID#11L, Date#9, A#30, B#31]
! +- Aggregate [ID#11L, Date#9, A#30, B#31], [ID#11L, Date#9, A#30, B#31] +- Project [ID#11L, Date#9, ((Amount#8 + cast(0.2 as double)) - cast(0.1 as double)) AS A#30, (Amount#8 + cast(2 as double)) AS B#31]
! +- Project [ID#11L, Date#9, ((Amount#8 + cast(0.2 as double)) - cast(0.1 as double)) AS A#30, (Amount#8 + cast(2 as double)) AS B#31] +- Filter ((ID#11L > cast(5 as bigint)) AND (ID#11L < cast(20 as bigint)))
! +- Filter (ID#11L > cast(5 as bigint)) +- Relation[Account#7,Amount#8,Date#9,Description#10,ID#11L,_corrupt_record#12] json
! +- Relation[Account#7,Amount#8,Date#9,Description#10,ID#11L,_corrupt_record#12] json
===================================
===================================
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate ===
Project [A#30, ID#11L] Project [A#30, ID#11L]
+- Filter (ID#11L < cast(20 as bigint)) +- Filter (ID#11L < cast(20 as bigint))
! +- Distinct +- Aggregate [ID#11L, Date#9, A#30, B#31], [ID#11L, Date#9, A#30, B#31]
+- Project [ID#11L, Date#9, ((Amount#8 + cast(0.2 as double)) - cast(0.1 as double)) AS A#30, (Amount#8 + cast(2 as double)) AS B#31] +- Project [ID#11L, Date#9, ((Amount#8 + cast(0.2 as double)) - cast(0.1 as double)) AS A#30, (Amount#8 + cast(2 as double)) AS B#31]
+- Filter (ID#11L > cast(5 as bigint)) +- Filter (ID#11L > cast(5 as bigint))
+- Relation[Account#7,Amount#8,Date#9,Description#10,ID#11L,_corrupt_record#12] json +- Relation[Account#7,Amount#8,Date#9,Description#10,ID#11L,_corrupt_record#12] json
21/09/05 15:20:11 WARN PlanChangeLogger:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin ===
!Except false Distinct
Project [A#30, ID#11L] Project [A#30, ID#11L]
+- Filter (ID#11L < cast(20 as bigint)) +- Filter (ID#11L < cast(20 as bigint))
! +- Distinct +- Aggregate [ID#11L, Date#9, A#30, B#31], [ID#11L, Date#9, A#30, B#31]
+- Project [ID#11L, Date#9, ((Amount#8 + cast(0.2 as double)) - cast(0.1 as double)) AS A#30, (Amount#8 + cast(2 as double)) AS B#31] +- Project [ID#11L, Date#9, ((Amount#8 + cast(0.2 as double)) - cast(0.1 as double)) AS A#30, (Amount#8 + cast(2 as double)) AS B#31]
+- Filter (ID#11L > cast(5 as bigint)) +- Filter (ID#11L > cast(5 as bigint))
+- Relation[Account#7,Amount#8,Date#9,Description#10,ID#11L,_corrupt_record#12] json
===================================
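A possible follow-up for the missing FoldablePropagation rule (a sketch, not verified against this dataset): FoldablePropagation replaces references to aliases of foldable (constant) expressions with the constants themselves, so ordering by a literal alias should trigger it, for example:
select ID, 'flag' as tag, Amount + 0.2 - 0.1 as A
from finance
where ID > 5
order by tag, ID;
Here the ORDER BY reference to tag can be rewritten to the literal 'flag' by FoldablePropagation.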
3. Exercise: implement a custom optimizer rule
Step 1: implement the custom rule:
case class MyPushDown(spark: SparkSession) extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform { …. }
}
Step 2: create your own Extension and inject the rule:
class MySparkSessionExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectOptimizerRule { session =>
      new MyPushDown(session)
    }
  }
}
Step 3: submit it via spark.sql.extensions:
bin/spark-sql --jars my.jar --conf spark.sql.extensions=com.jikeshijian.MySparkSessionExtension
Answer
Create a new project, com.xkx.sql.extension, with the following pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.xkx.sql.extension</groupId>
<artifactId>custom-spark-extention</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<!-- must be a 2.12.x Scala to match the _2.12 Spark artifacts below -->
<scala.version>2.12.10</scala.version>
<spark.version>2.4.5</spark.version>
<hadoop.version>2.9.2</hadoop.version>
<encoding>UTF-8</encoding>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>2.9.7</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.44</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.github.scopt/scopt -->
<dependency>
<groupId>com.github.scopt</groupId>
<artifactId>scopt_2.12</artifactId>
<version>3.5.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.scalatest/scalatest -->
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.12</artifactId>
<version>3.2.0</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<pluginManagement>
<plugins>
<!-- plugin for compiling Scala -->
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.2</version>
</plugin>
<!-- plugin for compiling Java -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.5.1</version>
</plugin>
</plugins>
</pluginManagement>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<executions>
<execution>
<id>scala-compile-first</id>
<phase>process-resources</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>scala-test-compile</id>
<phase>process-test-resources</phase>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<executions>
<execution>
<phase>compile</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
<!-- plugin for building the jar -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Under the source directory src/main/scala, create two new files:
MyPushDown.scala
MySparkSessionExtension.scala
├───src
│ ├───main
│ │ ├───java
│ │ ├───resources
│ │ └───scala
│ └───test
│ └───java
MyPushDown.scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, Sort}
import org.apache.spark.sql.catalyst.rules._

// A demo optimizer rule: it removes Sort nodes from the logical plan
// and logs whenever it is invoked.
case class MyPushDown(spark: SparkSession) extends Rule[LogicalPlan] {

  // Helper that strips a top-level Sort, also looking through Projects
  // (kept for reference; apply below matches Sort directly).
  private def removeTopLevelSort(plan: LogicalPlan): LogicalPlan = {
    plan match {
      case Sort(_, _, child) => child
      case Project(fields, child) => Project(fields, removeTopLevelSort(child))
      case other => other
    }
  }

  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // Drop any Sort node and keep its child.
    case Sort(_, _, child) => {
      print("custom MyPushDown")
      child
    }
    // For every other node, log that the rule was visited and leave it unchanged.
    case other => {
      print("custom MyPushDown")
      logWarning(s"Optimization batch is excluded from the MyPushDown optimizer")
      other
    }
  }
}
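Note that SELECT * FROM t1 (used in the test below) only exercises the catch-all branch; to see the Sort case fire and actually remove a sort, the query needs a top-level ORDER BY, for example:
SELECT * FROM t1 ORDER BY v;
Since the rule silently drops the Sort, the results come back unordered, which is why this is only a demonstration rule.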
MySparkSessionExtension.scala
import org.apache.spark.sql.SparkSessionExtensions

// Entry point referenced by spark.sql.extensions: injects MyPushDown into the optimizer.
class MySparkSessionExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectOptimizerRule { session =>
      new MyPushDown(session)
    }
  }
}
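For local experiments, the same extension can also be registered programmatically instead of via --conf. A minimal sketch (assuming both classes above are on the classpath):
// Build a local SparkSession with the extension applied directly.
val spark = org.apache.spark.sql.SparkSession.builder()
  .master("local[*]")
  .appName("my-pushdown-test")
  .withExtensions(new MySparkSessionExtension)
  .getOrCreate()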
Then package it with mvn package. After it finishes, run:
spark-sql --jars target/custom-spark-extention-1.0-SNAPSHOT.jar --conf spark.sql.extensions=MySparkSessionExtension
Once the SQL console is up, run:
set spark.sql.planChangeLog.level=WARN;
create temporary view t1 as select * from values
("one", 1),
("two", 2),
("three", 3),
("one", NULL)
as t1(k, v);
SELECT * FROM t1;
The custom optimizer rule shows up in the log as: MyPushDown: Optimization batch is excluded from the MyPushDown optimizer
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation ===
!Project [k#12, v#13] LocalRelation [k#12, v#13]
!+- LocalRelation [k#12, v#13]
21/09/05 18:45:45 WARN PlanChangeLogger:
=== Result of Batch LocalRelation early ===
!Project [k#12, v#13] LocalRelation [k#12, v#13]
!+- Project [cast(k#14 as string) AS k#12, cast(v#15 as int) AS v#13]
! +- Project [k#14, v#15]
! +- LocalRelation [k#14, v#15]
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Pullup Correlated Expressions has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Subquery has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Replace Operators has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Aggregate has no effect.
custom MyPushDown21/09/05 18:45:45 WARN MyPushDown: Optimization batch is excluded from the MyPushDown optimizer
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Operator Optimization before Inferring Filters has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Infer Filters has no effect.
custom MyPushDown21/09/05 18:45:45 WARN MyPushDown: Optimization batch is excluded from the MyPushDown optimizer
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Operator Optimization after Inferring Filters has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Push extra predicate through join has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Early Filter and Projection Push-Down has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Join Reorder has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Eliminate Sorts has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Decimal Optimizations has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Distinct Aggregate Rewrite has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Object Expressions Optimization has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch LocalRelation has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Check Cartesian Products has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch RewriteSubquery has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch NormalizeFloatingNumbers has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch ReplaceUpdateFieldsExpression has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Optimize Metadata Only Query has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch PartitionPruning has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Pushdown Filters from PartitionPruning has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Cleanup filters that cannot be pushed down has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Extract Python UDFs has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch User Provided Optimizers has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger:
=== Metrics of Executed Rules ===
Total number of runs: 157
Total time: 0.0271689 seconds
Total number of effective runs: 5
Total time of effective runs: 0.0231883 seconds
21/09/05 18:45:45 WARN PlanChangeLogger: Batch Preparations has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger: Batch CleanExpressions has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger:
=== Metrics of Executed Rules ===
Total number of runs: 1
Total time: 4.4E-6 seconds
Total number of effective runs: 0
Total time of effective runs: 0.0 seconds
21/09/05 18:45:45 WARN PlanChangeLogger: Batch CleanExpressions has no effect.
21/09/05 18:45:45 WARN PlanChangeLogger:
=== Metrics of Executed Rules ===
Total number of runs: 1
Total time: 8.8E-6 seconds
Total number of effective runs: 0
Total time of effective runs: 0.0 seconds
one 1
two 2
three 3
one NULL
Time taken: 0.142 seconds, Fetched 4 row(s)
spark-sql>