2016_spring_Spark_Tutorial

####Use Spark shell

After downloading spark-1.6.0-bin-hadoop2.6, change into its directory and start the shell:
cd ~/spark-1.6.0-bin-hadoop2.6/
./bin/spark-shell

Inside the Spark shell:

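// the relative path is resolved from the Spark directory, where the shell was started
val text = sc.textFile("README.md")
text.count()                                        // number of lines
text.filter(line => line.contains("Spark")).count() // number of lines containing "Spark"
text.first()                                        // first line
text.take(10)                                       // first 10 lines as an Array
text.collect()                                      // all lines as an Array; only for small files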

####Simple word count

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

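object test {
  def main(args: Array[String]): Unit = {
    // Windows only: point Hadoop at the directory that contains bin\winutils.exe
    System.setProperty("hadoop.home.dir", "E:\\hadoop-common-2.2.0-bin-master")
    val conf = new SparkConf().setAppName("test_mvn").setMaster("local")
    val sc = new SparkContext(conf)

    // Step by step
    val txt = sc.textFile("E:\\spark-1.6.0-bin-hadoop2.6\\README.md")
    val txt_split = txt.flatMap(_.split(" ")) // (line => line.split(" ")): one element per word
    val txt_map = txt_split.map((_, 1))       // (word => (word, 1))
    val txt_red = txt_map.reduceByKey(_ + _)  // ((a, b) => a + b): sum the counts per word

    // The same pipeline as a single chained expression
    val wordCount = txt.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

    // foreach is an action, so it runs the job; with master "local" the output
    // appears in this console. rdd.map(println) alone prints nothing because map is lazy.
    wordCount.foreach(println)

    sc.stop() // release the context when done
  }
}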
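
To check what each stage computes without starting Spark, the same steps can be mimicked on an ordinary Scala collection. This is only a sketch: the object name and sample lines are made up for illustration, and `groupBy` followed by a sum stands in for `reduceByKey`, which plain collections do not have.

```scala
object LocalWordCount {
  // Hypothetical sample input standing in for the lines of README.md
  val lines = Seq("Apache Spark", "Spark is fast", "fast and general")

  // Same shape as the RDD pipeline: split, pair with 1, sum per word
  def count(input: Seq[String]): Map[String, Int] =
    input
      .flatMap(_.split(" "))   // one element per word
      .map((_, 1))             // (word, 1)
      .groupBy(_._1)           // word -> all pairs for that word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    count(lines).foreach(println)
}
```

Pasting the object into spark-shell (or any Scala REPL) and calling `LocalWordCount.count(LocalWordCount.lines)` returns a `Map` from each word to its count.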

####How to use spark-submit with IntelliJ IDEA

Click "Terminal" at the bottom of the IDEA window, then run:
sbt package
The .jar is now in your project directory: target\scala-2.10\YOURPROJECT_2.10-1.0.jar
cd SPARK_DIRECTORY
.\bin\spark-submit --name "test" --master local \...\YOURPROJECT_2.10-1.0.jar
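
The `sbt package` step assumes the project root contains a build definition. A minimal `build.sbt` sketch that would fit the jar path shown above might look like the following. The project name is a placeholder, and the Scala patch version 2.10.6 is an assumption (any 2.10.x release produces the `scala-2.10` directory and `_2.10` suffix).

```scala
// build.sbt (hypothetical; adjust the name to your project)
name := "yourproject"

// Matches the "-1.0" suffix of the jar
version := "1.0"

// Assumption: a Scala 2.10.x release, matching the scala-2.10 output directory
scalaVersion := "2.10.6"

// Needed to compile against SparkContext and SparkConf
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
```

`sbt package` does not bundle dependencies into the jar, so the Spark classes come from the Spark installation when the jar is run through `spark-submit`.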
