
Big Data Hands-On Training Camp: Spark Core Assignment

Clarke
Published: 2 hours ago

1 Assignment 1: Implement an inverted index with term frequencies using the RDD API


import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object InvertedIndex {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("sparkbyexamples.com").setMaster("local[1]")
    val sc = new SparkContext(sparkConf)

    // Input format inferred from the parsing below: <bookName>. "<word> <word> ..."
    val bookwordRdd = sc.textFile("D:\\data\\index.txt")
      .flatMap { line =>
        val array = line.split("\\.", 2)   // split on the first '.' only
        val bookName = array(0)
        array(1).split("\"")(1).split(" ").map(word => (bookName, word))
      }

    // The chain ends in foreach, which returns Unit, so the result is not bound to a val.
    bookwordRdd
      .map(kv => (kv._2, kv._1))                     // (word, bookName)
      .map((_, 1L))                                  // ((word, bookName), 1)
      .reduceByKey((x, y) => x + y)                  // count per (word, bookName)
      .map { case ((k, v), cnt) => (k, (v, cnt)) }   // (word, (bookName, count))
      .groupByKey()
      .sortByKey()
      .collect()
      .foreach(println)

    sc.stop()
  }
}
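To make the data flow easier to follow without a Spark cluster, here is a minimal sketch of the same pipeline on plain Scala collections. It is not the assignment's solution: `InvertedIndexSketch`, `parseLine`, and `buildIndex` are hypothetical names, `groupBy` stands in for `reduceByKey` and `groupByKey`, and the three sample lines are made up to match the input format that the parsing code appears to expect (`<bookName>. "<words>"`).

```scala
object InvertedIndexSketch {
  // Parse one line of the assumed form: <bookName>. "<word> <word> ..."
  // Returns one (word, bookName) pair per word occurrence.
  def parseLine(line: String): Seq[(String, String)] = {
    val array = line.split("\\.", 2)
    val bookName = array(0).trim
    array(1).split("\"")(1).split(" ").toSeq.map(word => (word, bookName))
  }

  // Build word -> list of (bookName, count), sorted by word and then by bookName.
  def buildIndex(lines: Seq[String]): Seq[(String, Seq[(String, Long)])] =
    lines
      .flatMap(parseLine)
      .groupBy(identity)                 // (word, bookName) -> all occurrences
      .toSeq
      .map { case ((word, book), occ) => (word, (book, occ.size.toLong)) }
      .groupBy(_._1)                     // word -> its (bookName, count) entries
      .toSeq
      .map { case (word, entries) => (word, entries.map(_._2).sortBy(_._1)) }
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, not taken from the assignment's index.txt.
    val sample = Seq(
      "0. \"it is what it is\"",
      "1. \"what is it\"",
      "2. \"it is a banana\""
    )
    buildIndex(sample).foreach { case (word, postings) =>
      println(s"$word: ${postings.mkString(", ")}")
    }
    // Prints:
    // a: (2,1)
    // banana: (2,1)
    // is: (0,2), (1,1), (2,1)
    // it: (0,2), (1,1), (2,1)
    // what: (0,1), (1,1)
  }
}
```

Unlike the Spark version, this sketch sorts the postings within each word. In Spark, `groupByKey` does not guarantee the order of values within a key, so the per-word lists may come out in any order unless they are sorted explicitly.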



