We are interested in detecting communities in a social media dataset.
1. How would you mine this problem?
2. Choose a social network and explain the kind of data cleaning you need.
3. What kind of data mining algorithm can you use?
1. Social networking sites now offer a rich resource of heterogeneous data, and analyzing it can reveal previously unknown information and relationships in these networks. Detecting communities of 'similar' nodes is a challenging problem in social network analysis, and it has been widely studied from the point of view of the underlying graph structure. Beyond their graph structure, however, online social networks also contain useful information about their users, and exploiting it can improve the quality of the discovered communities. In this study, a community discovery method is proposed that uses content information in addition to the links between nodes. The approach is based on frequent patterns in users' actions, which suits social networking sites where users carry out their preferred activities. It works in two stages: first, small communities of similar users are discovered from their interests and activities; then, these communities are extended using the social relations between users. The F-measure is used to evaluate the results on two real-world datasets, and the results demonstrate that the proposed method improves community detection quality.
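The two-stage idea (frequent patterns first, social links second) can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the study's implementation: `user_actions` maps each user to a list of actions or interests, `friends` is a symmetric adjacency dict, patterns are limited to pairs found by brute-force counting (Apriori or FP-growth would be used at scale), the extension rule (join when at least half of a user's friends are already members) is hypothetical, and all function names and toy data are made up.

```python
from collections import Counter
from itertools import combinations

def frequent_interest_sets(user_actions, min_support, size=2):
    """Return every `size`-item combination of actions shared by at least
    `min_support` users (brute-force counting; Apriori/FP-growth scale better)."""
    counts = Counter()
    for actions in user_actions.values():
        for combo in combinations(sorted(set(actions)), size):
            counts[combo] += 1
    return [combo for combo, c in counts.items() if c >= min_support]

def seed_communities(user_actions, patterns):
    """Stage 1: each frequent pattern yields a small community of the users
    whose actions contain every item of the pattern."""
    seeds = []
    for pattern in patterns:
        members = {u for u, acts in user_actions.items() if set(pattern) <= set(acts)}
        if members not in seeds:          # skip duplicate communities
            seeds.append(members)
    return seeds

def extend_communities(seeds, friends, threshold=0.5):
    """Stage 2: in one pass (sorted order), add each friend of a member whose
    share of friends already inside the community is at least `threshold`."""
    extended = []
    for seed in seeds:
        community = set(seed)
        candidates = {f for u in seed for f in friends.get(u, ())} - community
        for v in sorted(candidates):
            nbrs = friends.get(v, set())
            if nbrs and len(nbrs & community) / len(nbrs) >= threshold:
                community.add(v)
        extended.append(community)
    return extended

def f_measure(found, truth):
    """F1 score (harmonic mean of precision and recall) of one detected
    community against one ground-truth community."""
    overlap = len(found & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(found)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)

# Toy data, made up for illustration.
user_actions = {
    "alice": ["hiking", "photo", "travel"],
    "bob": ["hiking", "photo"],
    "carol": ["hiking", "photo", "cooking"],
    "dave": ["gaming", "music"],
    "erin": ["gaming", "music", "travel"],
    "frank": ["cooking"],
}
friends = {
    "alice": {"bob", "frank"}, "bob": {"alice", "carol"}, "carol": {"bob", "frank"},
    "dave": {"erin"}, "erin": {"dave"}, "frank": {"alice", "carol"},
}
patterns = frequent_interest_sets(user_actions, min_support=2)
extended = extend_communities(seed_communities(user_actions, patterns), friends)
print([sorted(c) for c in extended])   # [['alice', 'bob', 'carol', 'frank'], ['dave', 'erin']]
```

Here Stage 1 finds {alice, bob, carol} (shared hiking and photo) and {dave, erin} (shared gaming and music), and Stage 2 pulls in frank because all of his friends are in the first group. Given ground-truth communities, `f_measure` can score each detected community against its best-matching true one, which is one common way to set up an F-measure evaluation.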
2. If you are analyzing your social media activities without cleaning your data first, you are wasting your time. Clean data is the key to getting your strategy right.
In social media, conversations aren't always that clean.
By unclean conversations we mean irrelevant ones. Given the huge volume of social conversations, many of them are naturally irrelevant.
For companies, irrelevant data means spam, ads, posts by the company itself or its employees, and posts unrelated to the brand: in other words, "noise". In addition, not every post comes from a real person; some are generated by automated accounts, so-called bots.
Say the ACME brand received 12,500 messages over the past seven days, and 77% of them were noise. That means only 23% of the conversation, 2,875 messages, is relevant and written by real people. So ask yourself: would you rather base your decisions on all 12,500 messages, spam included, or on the 2,875 messages actually written by real users?
Filtering out the noise is very time-consuming, yet vital. In marketing, as in product development, it is crucial to base your decisions on reliable, relevant data in order to understand what your customers really want.
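A first, rule-based pass over the noise categories listed above (spam and ads, company and employee posts, off-topic posts, bots) could look like the Python sketch below. Everything in it is an assumption for illustration: the `Post` record, the `is_bot` flag (taken to come from a separate bot detector), and the keyword and account lists. Keyword rules are crude, so in practice they could be combined with a trained spam or relevance classifier. The last line reproduces the ACME arithmetic: removing 77% noise from 12,500 messages leaves 2,875.

```python
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    text: str
    is_bot: bool = False   # assumed output of a separate bot-detection step

# Hypothetical lists for the ACME example; real ones must be curated and maintained.
BRAND_TERMS = {"acme"}
SPAM_TERMS = {"buy now", "free followers", "click here"}
COMPANY_ACCOUNTS = {"acme_official", "acme_support"}   # brand and employee accounts

def is_noise(post):
    """Apply the noise categories in turn: bots, company posts, spam/ads, off-topic."""
    text = post.text.lower()
    if post.is_bot:
        return True
    if post.author in COMPANY_ACCOUNTS:
        return True
    if any(term in text for term in SPAM_TERMS):
        return True
    return not any(term in text for term in BRAND_TERMS)   # not about the brand

def clean(posts):
    """Keep only relevant posts written by real users."""
    return [p for p in posts if not is_noise(p)]

def relevant_count(total, noise_percent):
    """Messages left after removing noise, using integer arithmetic."""
    return total * (100 - noise_percent) // 100

posts = [
    Post("u1", "Loving my new ACME blender!"),
    Post("acme_official", "ACME summer sale starts today"),
    Post("u2", "Click here for free followers, ACME fans"),
    Post("u3", "Traffic is terrible this morning"),
    Post("bot42", "ACME ACME ACME", is_bot=True),
]
print([p.author for p in clean(posts)])   # ['u1']
print(relevant_count(12_500, 77))         # 2875
```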
3. The Balanced Link Density Label Propagation (BLDLP) algorithm can be used. BLDLP replaces the random selection in the well-known Label Propagation Algorithm (LPA) with a rational choice. As a result, its results are more stable than those of LPA, and it improves on the time efficiency of the original algorithm.
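To illustrate what replacing LPA's random choices with fixed rules means, here is a minimal deterministic label propagation in Python. It is not BLDLP: the update order (descending degree, then node id), the vote weight (1 plus the number of common neighbours, so edges inside dense regions count more than bridging edges) and the smallest-label tie-break are assumptions chosen for illustration, and `deterministic_lpa` is a hypothetical name. Because nothing is random, repeated runs on the same graph return the same partition.

```python
from collections import defaultdict

def deterministic_lpa(adj, max_iter=100):
    """Label propagation with LPA's random choices replaced by fixed rules.

    adj maps each node to the *set* of its neighbours (undirected graph);
    node ids must be mutually comparable (e.g. all ints). Assumed rules:
      * nodes are updated in a fixed order: descending degree, then node id;
      * neighbour u votes for its label with weight 1 + |common neighbours|;
      * ties between labels go to the smallest label.
    """
    labels = {v: v for v in adj}          # each node starts in its own community
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    for _ in range(max_iter):
        changed = False
        for v in order:
            if not adj[v]:
                continue                  # isolated nodes keep their own label
            score = defaultdict(int)
            for u in adj[v]:
                score[labels[u]] += 1 + len(adj[v] & adj[u])
            best = max(score.values())
            new = min(lab for lab, s in score.items() if s == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:                   # a full pass without changes: converged
            break
    return labels

# Two 4-node cliques joined by a single bridge edge (3, 4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
labels = deterministic_lpa(dict(adj))
print(labels)   # {0: 0, 1: 0, 2: 0, 3: 0, 4: 5, 5: 5, 6: 5, 7: 5}
```

On this toy graph the sketch separates the two cliques, and rerunning it gives the same answer; plain LPA with random update order and random tie-breaking can return different partitions from run to run.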