Question

Please do everything as described. It is simple, but I need help with it.

Part 1: Spark Setup
In this exercise you will set up an Ubuntu virtual machine and install Spark on it.

Download and install VirtualBox and Ubuntu from the following sites, as we did in class.

https://www.virtualbox.org/wiki/Downloads
https://www.ubuntu.com/download/desktop

Once the installation is complete, you will need to install the latest version of Java. Issue the following commands:

sudo apt-get update

sudo apt-get install default-jre

After the installation is done, check the version using the following command:

java -version

Next, install Scala by downloading https://downloads.lightbend.com/scala/2.12.3/scala-2.12.3.tgz . The archive will be saved in the Downloads folder.

Decompress the tgz archive using the following command:

tar -xvzf scala-2.12.3.tgz

The archive will be decompressed into the scala-2.12.3 folder. Move this folder to /usr/local/scala using the following command:

sudo mv scala-2.12.3 /usr/local/scala

Add the Scala binaries to the PATH environment variable using the following command:

export PATH=$PATH:/usr/local/scala/bin

Test that the installation was successful by checking the version:

scala -version

Now install Spark by downloading it from https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz

Decompress it using:

tar -xvzf spark-2.2.0-bin-hadoop2.7.tgz

Then move it to /usr/local/spark using the following command:

sudo mv spark-2.2.0-bin-hadoop2.7 /usr/local/spark

Finally, add the Spark binaries to the PATH variable:

export PATH=$PATH:/usr/local/spark/bin

Now issue the following command to check that the installation was successful:

spark-shell

It will take some time, but you should see some messages and ASCII art showing Spark version 2.2.0, followed by the scala> prompt.

Part 2: Using Spark to Work with a Dataset

For this exercise, please read chapter 2 of the textbook and use the dataset available at

http://bit.ly/1Aoywaq.

Using the dataset, complete the following tasks.
1. Please create a raw RDD for all the CSV files.
2. Please remove all headers from the RDD.
3. Please convert each record in the RDD to a case class record.
4. Please sample 20 records from the RDD.

Answer #1

PART 1:

1. Installed VirtualBox and Ubuntu from the sites given.

2. Ran sudo apt-get update and sudo apt-get install default-jre, then checked the installed Java version with java -version.

3. Downloaded the Scala archive into the Downloads folder and decompressed it using the following command:

tar -xvzf scala-2.12.3.tgz

4. The archive is decompressed into the scala-2.12.3 folder. Moved this folder to /usr/local/scala with sudo mv scala-2.12.3 /usr/local/scala.

5. Added the Scala binaries to the PATH environment variable with export PATH=$PATH:/usr/local/scala/bin and tested the installation by checking the version with scala -version.

6. Downloaded Spark, decompressed it, and moved it to /usr/local/spark.

7. Finally, set the PATH variable with export PATH=$PATH:/usr/local/spark/bin and ran spark-shell to check that the installation was successful. It showed the scala> prompt.
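
Note that export only changes PATH for the current terminal session, so scala and spark-shell will not be found when a new terminal is opened. Below is a minimal sketch of one way to make the setting persist, assuming bash is the shell (the Ubuntu default) and that these lines are appended to ~/.bashrc. The add_to_path helper is a hypothetical name chosen for illustration, not part of the assignment.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already
# listed, so re-reading ~/.bashrc does not add duplicate entries.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already present, nothing to do
    *) PATH="$PATH:$1" ;;      # append to the end, as in the steps above
  esac
}

add_to_path /usr/local/scala/bin
add_to_path /usr/local/spark/bin
export PATH
```

After saving the file, run source ~/.bashrc (or open a new terminal) and check again with scala -version and spark-shell.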

Part 1 is now complete.

PART 2:

I do not know which book's chapter 2 is meant, but I used the dataset from the link above:

http://bit.ly/1Aoywaq.

1. Created a raw RDD for all the CSV files and removed all headers from the RDD.

2. Converted each record in the RDD to a case class record.

3. Sampled 20 records from the RDD.
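
The steps above can be sketched as a short Scala script for spark-shell. This is a rough sketch under stated assumptions, not the textbook's solution. It assumes the CSV files were unpacked into a directory named linkage (a hypothetical name), that every file starts with the same header line, and that each row holds two integer ids, nine numeric scores in which ? marks a missing value, and a final true/false flag. The MatchData case class and the file name part2.scala are illustrative names. The shell commands below only write the script to a file; sc is the SparkContext that spark-shell provides.

```shell
# Write the Scala steps to a script file (location and name are hypothetical).
script="${TMPDIR:-/tmp}/part2.scala"

cat > "$script" <<'EOF'
// Assumed layout: id1, id2, nine scores ("?" = missing), matched flag.
case class MatchData(id1: Int, id2: Int, scores: Array[Double], matched: Boolean)

// Treat "?" as a missing value.
def toDouble(s: String): Double = if (s == "?") Double.NaN else s.toDouble

def parse(line: String): MatchData = {
  val p = line.split(',')
  MatchData(p(0).toInt, p(1).toInt, p.slice(2, 11).map(toDouble), p(11).toBoolean)
}

// 1. Raw RDD over every file in the (assumed) linkage directory.
val raw = sc.textFile("linkage")

// 2. Drop header lines, assuming each file repeats the same first line.
val header = raw.first()
val noHeader = raw.filter(_ != header)

// 3. Convert each remaining line to a case class record.
val parsed = noHeader.map(parse)

// 4. Take 20 records at random, without replacement.
val sample = parsed.takeSample(false, 20)
sample.foreach(m => println(s"${m.id1},${m.id2},${m.scores.mkString(",")},${m.matched}"))
EOF

echo "In spark-shell, run:  :load $script"
```

Start spark-shell from the directory that contains linkage and enter the printed :load command. Because takeSample picks records at random, each run prints a different set of 20 unless a seed is passed as a third argument.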

Part 2 is now complete as well.

As described above, all parts have now been completed.
