
Spark Study, Day 02: Reading a File in Scala and Counting Word Frequencies

Date: 2019-06-09 00:09:01

1. Install the JDK and Scala locally
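
A quick way to confirm that both installs are visible is to ask the Scala REPL for its own version and the version of the JVM it runs on. This is only a minimal sketch; the names `scalaVersion` and `javaVersion` are made up for illustration, and the values printed depend on whatever is installed on your machine.

```scala
// Paste into the Scala REPL after installation.
// Both values are read from the running JVM, so nothing is hard-coded here.
val scalaVersion = scala.util.Properties.versionNumberString // Scala library version
val javaVersion  = System.getProperty("java.version")         // JDK/JRE version

println(s"Scala $scalaVersion running on Java $javaVersion")
```

If the REPL starts and both values print, the environment is ready for the steps below.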


2. Read a local file:

 

scala> import scala.io.Source
import scala.io.Source

scala> val lines=Source.fromFile("F:/ziyuan_badou/file.txt").getLines().toList
lines: List[String] = List("With the development of civilization, it is the childrens duty to study in school since they were small. As the young kids, it is their nature to hang out for fun. ", "", "While for them, most of the time have been limited in the class. So they feel frustrated and dont have much passion to study. It is of great importance to develop ", "", "interest. The first thing is to broaden vision. The students can read travel books or watch tourist show, for anyone who cannot resist the charm of beautiful scenery ", "", and delicious food. The second thing is taking the right attitude to exams. Never giving too much pressure on getting high marks. The only thing we should do is to enjoy gaining knowledge.)
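
The one-liner above never closes the `Source`, which is fine for a throwaway REPL session but leaks a file handle in longer-running code. Below is a minimal sketch of one way to close it, assuming a plain `try`/`finally` is acceptable. The helper name `readLines`, its `encoding` parameter, and the temporary demo file are all assumptions made for illustration; they are not part of the original post.

```scala
import scala.io.Source
import java.nio.file.Files
import java.nio.charset.StandardCharsets

// Hypothetical helper: read every line into memory, then release the file handle.
def readLines(path: String, encoding: String = "UTF-8"): List[String] = {
  val source = Source.fromFile(path, encoding)
  try source.getLines().toList // toList forces the lazy iterator before close() runs
  finally source.close()
}

// Demo on a temporary file so the sketch does not depend on a local path.
val tmp = Files.createTempFile("wordcount-demo", ".txt")
Files.write(tmp, "hello world\n\nhello scala".getBytes(StandardCharsets.UTF_8))
val demoLines = readLines(tmp.toString)
println(demoLines)
Files.delete(tmp)
```

With the file from this post, the call would be `readLines("F:/ziyuan_badou/file.txt")`. Passing the encoding explicitly avoids depending on the platform default.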

3. Compute the top-N word frequencies

scala> lines.map(x=>x.split(" ")).flatten.map(x=>(x,1)).groupBy(x=>x._1).map(x=>(x._1,x._2.map(x=>x._2).sum)).toList.sortBy(x=>x._2).reverse
res0: List[(String, Int)] = List((the,7), (to,7), (is,6), (of,4), (The,4), (thing,3), (for,3), ("",3), (and,2), (much,2), (they,2), (it,2), (have,2), (in,2), (only,1), (right,1), (show,,1), (exams.,1), (high,1), (since,1), (study,1), (study.,1), (great,1), (we,1), (interest.,1), (develop,1), (As,1), (passion,1), (were,1), (time,1), (them,,1), (childrens,1), (development,1), (knowledge.,1), (It,1), (anyone,1), (Never,1), (nature,1), (enjoy,1), (first,1), (taking,1), (frustrated,1), (books,1), (delicious,1), (So,1), (their,1), (resist,1), (should,1), (small.,1), (gaining,1), (While,1), (who,1), (on,1), (can,1), (been,1), (second,1), (travel,1), (most,1), (scenery,1), (getting,1), (attitude,1), (cannot,1), (civilization,,1), (broaden,1), (out,1), (food.,1), (dont,1), (importance,1), (kid...
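
The result above sorts every word but does not cut the list down to N entries. It also counts the empty string three times, because each blank line splits into a single empty token. The sketch below is one possible variation, not the original author's code: the helper name `topN` and the `sample` data are hypothetical. It drops empty tokens, sorts by count in descending order, and keeps only the first `n` pairs.

```scala
// Hypothetical helper: the n most frequent words across the given lines.
def topN(lines: Seq[String], n: Int): List[(String, Int)] =
  lines
    .flatMap(_.split("\\s+"))   // split each line on runs of whitespace
    .filter(_.nonEmpty)         // drop the empty tokens that blank lines produce
    .groupBy(identity)          // word -> all occurrences of that word
    .map { case (word, occurrences) => (word, occurrences.size) }
    .toList
    .sortBy { case (word, count) => (-count, word) } // count descending, ties by word
    .take(n)

// Small made-up input for illustration.
val sample = List("the cat and the hat", "", "the end")
println(topN(sample, 2))
```

Sorting on `(-count, word)` replaces `sortBy(...).reverse` and makes the order of ties deterministic. The counts are still case- and punctuation-sensitive, which is why the output above lists `the` and `The`, or `study` and `study.`, separately. Lowercasing each token and stripping punctuation before grouping would merge them, if that is what you want.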


Original post: https://www.cnblogs.com/students/p/10992149.html
