Component versions
Hive 2.3.7
Hadoop 2.7.2
Spark 2.4.3
Spark configuration
Spark on YARN configuration
The Spark installation used here is /Applications/bigsoft/spark-2.4.3-bin-hadoop2.7/.
On the Hadoop side, edit /Applications/bigsoft/hadoop-2.7.2/etc/hadoop/yarn-site.xml to enable log aggregation and point yarn.log.server.url at the MapReduce JobHistory server:
<property>
  <description>Whether to enable log aggregation</description>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://localhost:19888/jobhistory/logs</value>
</property>
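Once log aggregation is enabled, the logs of a finished application can be pulled back through the YARN CLI as a quick check (the application ID below is just a placeholder):

yarn logs -applicationId application_1590000000000_0001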
Modify YARN's capacity-scheduler.xml so that scheduling accounts for both CPU and memory; the default DefaultResourceCalculator considers memory only, while DominantResourceCalculator considers both:
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <!-- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> -->
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
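The difference shows up when an application asks for multi-core executors: with DominantResourceCalculator the --executor-cores request below influences container allocation, whereas the default calculator schedules on memory alone. For example, using the SparkPi job shipped with the pre-built 2.4.3 package (adjust the jar path to your install):

spark-submit --master yarn \
  --num-executors 2 --executor-cores 2 --executor-memory 1g \
  --class org.apache.spark.examples.SparkPi \
  ${SPARK_HOME}/examples/jars/spark-examples_2.11-2.4.3.jar 100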
Modify mapred-site.xml to set the MapReduce JobHistory addresses:
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>
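These addresses only answer once the JobHistory server is actually running; on Hadoop 2.7.x it is started with:

${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh start historyserver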
In Spark's conf directory, add the following to spark-env.sh:
export HADOOP_HOME=/Applications/bigsoft/hadoop-2.7.2
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SCALA_HOME=/Applications/bigsoft/scala-2.12.8
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_MEMORY=2g
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18018 -Dspark.history.fs.logDirectory=hdfs:///user/spark/applicationHistory"
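The event log directory referenced by SPARK_HISTORY_OPTS must exist in HDFS before the history server starts; create it once:

hdfs dfs -mkdir -p /user/spark/applicationHistory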
Modify spark-defaults.conf (note that spark.yarn.historyServer.address takes host:port, without an http:// scheme):
spark.eventLog.dir=hdfs:///user/spark/applicationHistory
spark.eventLog.enabled=true
spark.yarn.historyServer.address=localhost:18018
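With event logging pointed at HDFS, start the Spark history server and browse to the port configured above (http://localhost:18018):

${SPARK_HOME}/sbin/start-history-server.sh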
Start Spark
${SPARK_HOME}/sbin/start-all.sh
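If the standalone daemons came up cleanly, jps should show a Master and a Worker process alongside the Hadoop daemons:

jps | grep -E 'Master|Worker'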
Test run
spark-shell
val text = sc.textFile("/tmp/test/hive.log")
text.flatMap(s => s.split(" ")).map(s => (s, 1)).reduceByKey((x, y) => x + y).collect().foreach(kv => println(kv))
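The output is easier to read if only the most frequent tokens are kept; sortBy and take are standard RDD operations (the top-10 cutoff is arbitrary):

text.flatMap(s => s.split(" "))
    .map(s => (s, 1))
    .reduceByKey((x, y) => x + y)
    .sortBy(kv => kv._2, ascending = false)
    .take(10)
    .foreach(println)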
The completed job can then be inspected in the web UI.
Configure Hive on Spark
Copy Hive's hive-site.xml into Spark's conf directory so Spark can reach the Hive metastore:
cp /Applications/bigsoft/apache-hive-2.3.7-bin/conf/hive-site.xml /Applications/bigsoft/spark-2.4.3-bin-hadoop2.7/conf
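Copying the file alone does not switch Hive's execution engine. A minimal sketch of the hive-site.xml additions usually needed for Hive on Spark, assuming the YARN setup above:

<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn</value>
</property>

The engine can also be switched per session from the Hive CLI with: set hive.execution.engine=spark;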