Preface:
A DataFrame is a Dataset organized into named columns.
A Dataset is a distributed collection of data.
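To make the two definitions above concrete, here is a minimal sketch showing that in Spark's Scala API `DataFrame` is a type alias for `Dataset[Row]`, while a `Dataset[T]` keeps a static element type. The `Person` case class, the `DataFrameVsDataset` object, and the sample rows are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object DataFrameVsDataset {
  // Returns the names read through the untyped and the typed API.
  def run(): (Array[String], Array[String]) = {
    val spark = SparkSession.builder()
      .appName("DataFrameVsDataset")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._
    try {
      // A typed Dataset: every element is a Person.
      val ds: Dataset[Person] = Seq(Person("lisi", 32), Person("wangwu", 15)).toDS()

      // toDF() drops the static element type; DataFrame is an alias for Dataset[Row].
      val df: DataFrame = ds.toDF()
      val alsoADataFrame: Dataset[Row] = df

      // Untyped access: the column name is resolved at runtime.
      val untyped = alsoADataFrame.select("name").as[String].collect()
      // Typed access: the field name is checked by the compiler.
      val typed = ds.map(_.name).collect()
      (untyped, typed)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (untyped, typed) = run()
    println(untyped.mkString(", "))
    println(typed.mkString(", "))
  }
}
```

A misspelled column name in `select("name")` only fails when the job runs, whereas a misspelled field in `_.name` fails at compile time.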
The code:
package february.sql
import org.apache.spark.sql.SparkSession
/**
  * Description:
  * Convert a DataFrame to a Dataset
  * Basic Dataset operations
  *
  * @Author: 留歌36
  * @Date: 2019/2/25 20:15
  */
object DatasetApp extends App {
  val spark = SparkSession.builder().appName(this.getClass.getSimpleName).master("local[2]").getOrCreate()
  // Note: the implicit conversions must be imported
  import spark.implicits._
  val path = "f:\\infos.csv"
  // Let Spark parse the CSV file: the first line is the header, and column types are inferred
  val DF = spark.read.option("header","true").option("inferSchema","true").csv(path)
  DF.show()
  // Convert the DataFrame to a Dataset
  val DS = DF.as[Infos]
  // Two common ways to output a column: untyped select and typed map
  DS.select(DS("name")).show()
  DS.map(line => line.name).show()
  // Stop the SparkSession and release its resources
  spark.stop()
  // Record type whose fields match the CSV columns id, name, age
  case class Infos(id: Int, name: String, age: Int)
}
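With `inferSchema` enabled, Spark makes an extra pass over the file to guess the column types. As an alternative, the schema can be declared up front. The sketch below is one possible variant, not the original author's code: the `ExplicitSchemaApp` object, the temporary file, and the `age >= 18` filter are assumptions made for illustration, and the data is the same four rows as in infos.csv.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Same fields as Infos in DatasetApp, declared at top level for this sketch.
case class Infos(id: Int, name: String, age: Int)

object ExplicitSchemaApp {
  // Explicit schema matching the CSV header id,name,age.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType),
    StructField("name", StringType),
    StructField("age", IntegerType)
  ))

  // Reads the sample data with the explicit schema and returns the names of rows with age >= 18.
  def run(): Array[String] = {
    // Write the sample data to a temporary file so the sketch is self-contained.
    val csv = "id,name,age\n1,zhangshan,21\n2,lisi,32\n3,wangwu,15\n4,haha,23\n"
    val tmp = Files.createTempFile("infos", ".csv")
    Files.write(tmp, csv.getBytes(StandardCharsets.UTF_8))

    val spark = SparkSession.builder()
      .appName("ExplicitSchemaApp")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._
    try {
      // No inferSchema option: the declared schema is applied directly.
      val ds: Dataset[Infos] = spark.read
        .option("header", "true")
        .schema(schema)
        .csv(tmp.toUri.toString)
        .as[Infos]

      // Typed filter and map on the case class fields (threshold is hypothetical).
      ds.filter(_.age >= 18).map(_.name).collect()
    } finally {
      spark.stop()
      Files.deleteIfExists(tmp)
    }
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(", "))
}
```

Declaring the schema also makes the column types independent of whatever values happen to be in the file, which keeps the `.as[Infos]` conversion predictable.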
A simple CSV file:
infos.csv
id,name,age
1,zhangshan,21
2,lisi,32
3,wangwu,15
4,haha,23
More related small demos, in the "One Program a Day" column: https://blog.csdn.net/liuge36/column/info/34094
Original post: https://www.cnblogs.com/liuge36/p/10443972.html