Tutorial: How to load and save data from different data sources in Spark 2.0.2

In this blog we discuss Spark 2.0.2 and demonstrate its basic functionality, in particular how to load and save data. We cover the core read and write support for different data sources (CSV, JSON, and text).

Loading and saving a CSV file

As an example, the following creates a DataFrame from the content of a CSV file. We read a CSV document named team.csv with the following content and generate a table based on the header row of the file.

id,name,age,location
1,Mahendra Singh Dhoni ,38,ranchi
2,Virat Kohli,25,delhi
3,Shikhar Dhawan,25,mumbai
4,Rohit Sharma,33,mumbai
5,Stuart Binny,22,chennai

 

import java.util.UUID
import org.apache.spark.sql.SparkSession

object CsvReadWrite {
  def main(args: Array[String]): Unit = {
    // Create a SparkSession, the entry point to Spark 2.x
    val spark = SparkSession
      .builder()
      .master("local")
      .appName("Spark2.0")
      .getOrCreate()
    // Read the CSV file, treating the first line as a header
    val df = spark.read.option("header", "true")
      .csv("src/main/resources/team.csv")
    // Keep only the name and age columns
    val selectedData = df.select("name", "age")
    // Write the result back out as CSV (with a header row) to a unique output directory
    selectedData.write.option("header", "true")
      .format("csv")
      .save(s"src/main/resources/${UUID.randomUUID()}")
    println("OK")
  }
}
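
If you want to check what was loaded before writing it out, a quick sanity check (a minimal sketch, not part of the original example, assuming the df and selectedData values defined above) is to print the schema and a few rows:

// All CSV columns are read as strings unless the inferSchema option is enabled
df.printSchema()
// Show the full rows, then only the columns that will be written out
df.show()
selectedData.show()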

[Screenshots: csv1.png, csvsave.png]

Loading and saving a JSON file

Here we include a basic example of structured data processing using DataFrames. The following creates a DataFrame from the content of a JSON file. We read a JSON document named cars_price.json with the following content and generate a table based on the schema inferred from the JSON document.

[{"itemNo" : 1, "name" : "Ferrari", "price" : 52000000 , "kph": 6.1},  {"itemNo" : 2, "name" : "Jaguar", "price" : 15000000 , "kph": 3.4},  {"itemNo" : 3, "name" : "Mercedes", "price" : 10000000, "kph": 3}, {"itemNo" : 4, "name" : "Audi", "price" : 5000000 , "kph": 3.6}, {"itemNo" : 5, "name" : "Lamborghini", "price" : 5000000 , "kph": 2.9}]

import java.util.UUID
import org.apache.spark.sql.SparkSession

object JsonReadWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local")
      .appName("Spark2.0")
      .getOrCreate()
    // Read the JSON file; Spark infers the schema from the document itself
    val df = spark.read
      .json("src/main/resources/cars_price.json")
    // Keep only the name and price columns
    val selectedData = df.select("name", "price")
    // Write the result back out as JSON to a unique output directory
    selectedData.write
      .json(s"src/main/resources/${UUID.randomUUID()}")
    println("OK")
  }
}
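
Because the schema is inferred from the JSON document, the columns can be queried directly. A minimal sketch (not part of the original example, assuming the df loaded above):

// Print the inferred schema: itemNo, kph, name, price
df.printSchema()
// Structured query: cars priced above 10,000,000, most expensive first
df.filter(df("price") > 10000000).orderBy(df("price").desc).show()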

[Screenshots: savejson.png, jsonsave.png]

Loading and saving a text file

As an example, the following reads a text document named rklick.txt with the following content as a Dataset of lines, splits it into individual words, and writes the words back out.

Rklick creative technology company providing key digital services.
Focused on helping our clients to build a successful business on web and mobile.
We are mature and dynamic solution oriented IT full service product development and customized consulting services company. Since 2007 we're dedicated team of techno-whiz engaged in providing outstanding solutions to diverse group of business entities across North America.
Known as Professional Technologists, we help our customers enable latest technology stack or deliver quality cost effective products and applications

import java.util.UUID
import org.apache.spark.sql.SparkSession

object TextReadWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local")
      .appName("Spark2.0")
      .getOrCreate()
    import spark.implicits._
    // Read the text file as a Dataset[String], one element per line
    val rklickData = spark.read.text("src/main/resources/rklick.txt").as[String]
    // Split each line on whitespace to get a Dataset of individual words
    val rklickWords = rklickData.flatMap(value => value.split("\\s+"))
    // Write the words back out as text files in a unique output directory
    rklickWords.write.text(s"src/main/resources/${UUID.randomUUID()}")
    println("OK")
  }
}
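
Going one step further than simply writing the words back out, you could, for example, count how often each word occurs. A minimal sketch (not part of the original example, assuming the rklickWords Dataset defined above):

// rklickWords is a Dataset[String]; its single column is named "value"
val wordCounts = rklickWords.groupBy("value").count()
// Show the most frequent words first
wordCounts.orderBy(wordCounts("count").desc).show()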

[Screenshots: txtload.png, txtsave.png]

We will keep adding more useful tutorials like this one to grow the series. If you have any suggestions, feel free to share them with us 🙂 Stay tuned.
