Question

I need help writing the Pig command for this question: Question : Find closest city...

Question: Find the closest city for each tweet. Assume the dataset consists of two files: full_text_clean.txt (userid, lat, lon, tweet, modified_lat, modified_lon) and cities_clean.txt (city_name, lat, lon, modified_lat, modified_lon) [D2L -> Assignment 3 – Pig -> cities_clean.txt].

Hint: Both files include modified lat and lon columns (the last two columns of each file). For each geo-tagged tweet, first map it to multiple nearby cities by matching on these last two columns. Then, for each geo-tagged tweet, calculate the distance to each matched city using the actual lat/lon values and pick the closest city.

Calculating Euclidean Distance (pig example): SQRT((lat_1 - lat_2) * (lat_1 - lat_2) + (lon_1 - lon_2) * (lon_1 - lon_2))

lat_1/lon_1 refer to lat/lon in full_text_clean.txt. lat_2/lon_2 refer to lat/lon in cities_clean.txt.

Submit only the command.

Answer #1
######### 1 ################
grunt> a = load '/user/root/pig/full_text_clean.txt';
grunt> b= sample a 0.1;
grunt> store b into '/user/root/pig/full_text_small.txt';
########### 2 ##############
grunt> a = load '/user/root/pig/full_text_clean.txt' AS
(id:chararray,lat:float, lon:float,
tweet:chararray,modified_lat:float,modified_lon:float);
grunt> b = foreach a generate flatten(TOKENIZE(tweet)) as token;
grunt> c= group b by token;
grunt> d= foreach c generate flatten(group),COUNT(b) as cnt;
grunt> e= order d by cnt desc;
grunt> f= limit e 4;
grunt> dump f;
(I,109447)
(RT,78153)
(the,75595)
######### 3 ##########
grunt> a = load '/user/root/pig/full_text_clean.txt' AS
(id:chararray,lat:float, lon:float,
tweet:chararray,modified_lat:float,modified_lon:float);
grunt> b = GROUP a ALL;
grunt> c = foreach b generate COUNT_STAR(a);
grunt> dump c;
(377616)
###### 4 ##########
a = load '/user/root/pig/full_text_clean.txt' AS
(id:chararray,lat:float, lon:float,
tweet:chararray,modified_lat:float,modified_lon:float);
b = load '/user/root/pig/cities_clean.txt' AS
(city_name:chararray,lat:float,
lon:float,modified_lat:float,modified_lon:float);
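-- join each tweet to every city that shares its (modified_lat, modified_lon) bucket
c = join a by (modified_lat, modified_lon), b by (modified_lat, modified_lon);
-- compute the distance from the actual lat/lon values
d = foreach c generate a::id as id, a::tweet as tweet, b::city_name as city,
    SQRT((a::lat - b::lat) * (a::lat - b::lat) + (a::lon - b::lon) * (a::lon - b::lon)) as distance;
-- group by tweet only (not by tweet and city), so each group holds all candidate cities for one tweet
e = group d by (id, tweet);
f = foreach e {
    sortd = order d by distance asc;
    l = limit sortd 1;
    generate FLATTEN(group) AS (id, tweet), FLATTEN(l.(city, distance)) AS (city, distance);
};
g = limit f 3;
dump g;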
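The logic of part 4 (join on the modified lat/lon bucket, then keep the city with the smallest distance per tweet) can be sanity-checked on a few toy rows outside Pig. The following is a minimal Python sketch, not part of the Pig submission; the rows, coordinates, and the function name closest_cities are made up for illustration and do not come from the assignment files.

```python
import math
from collections import defaultdict

# Toy rows; every value here is invented for illustration.
# tweets: (userid, lat, lon, tweet, modified_lat, modified_lon)
tweets = [
    ("u1", 40.70, -74.00, "hello nyc", 41.0, -74.0),
    ("u2", 34.10, -118.30, "hello la", 34.0, -118.0),
]
# cities: (city_name, lat, lon, modified_lat, modified_lon)
cities = [
    ("New York", 40.71, -74.01, 41.0, -74.0),
    ("Newark", 40.74, -74.17, 41.0, -74.0),
    ("Los Angeles", 34.05, -118.24, 34.0, -118.0),
]


def closest_cities(tweets, cities):
    """Return {(userid, tweet): city_name} for the nearest city in the same bucket."""
    # Step 1: index cities by their (modified_lat, modified_lon) bucket,
    # which plays the role of the join key in the Pig script.
    buckets = defaultdict(list)
    for name, lat, lon, mlat, mlon in cities:
        buckets[(mlat, mlon)].append((name, lat, lon))

    result = {}
    for uid, lat, lon, text, mlat, mlon in tweets:
        candidates = buckets.get((mlat, mlon), [])
        if not candidates:
            continue  # like an inner join: tweets with no matching bucket are dropped
        # Step 2: Euclidean distance on the actual lat/lon values;
        # math.hypot(dx, dy) equals sqrt(dx*dx + dy*dy).
        best = min(candidates, key=lambda c: math.hypot(lat - c[1], lon - c[2]))
        result[(uid, text)] = best[0]
    return result


closest = closest_cities(tweets, cities)
for key, city in closest.items():
    print(key, city)
```

Here the tweet from u1 shares a bucket with two cities, so the minimum over the group is what selects one of them; grouping by both tweet and city, as in the original script, would leave one row per pair and never compare the candidates.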