Abstract:
Big Data refers to the large volumes of structured and unstructured data that organizations collect on a daily basis. The amount of data is not the important part; the information extracted from that data is the key. Collecting and analyzing Big Data gives organizations enhanced insight, better decision making, and process automation. Almost everyone can agree that big data has taken the business world by storm, but what's next? Will data continue to grow? What technologies will develop around it? Or will big data become a relic as soon as the next trend (cognitive technology? fast data?) appears on the horizon? I believe that big data is only going to get bigger, and that companies that ignore it will be left further and further behind. This paper studies what big data is, how it helps organizations extract information, its tools and technologies, and its future.
Introduction:
In this digital era, analysts have enormous amounts of data on hand. Big Data is the term for a collection of unstructured, semi-structured and structured datasets whose volume, complexity and rate of growth make them difficult to capture, manage, process or analyze using typical database software tools and technologies. The data comes in many varieties: text, video, image, audio, webpage log files, blogs, tweets, location information, sensor data, etc. Discovering useful insight from such huge datasets requires smart and scalable analytics services, programming tools and applications [1]. Data mining, also known as Knowledge Discovery in Databases (KDD), is an analytical process used in different disciplines to search for significant relationships among variables in large data sets. Analyzing fast and massive stream data may lead to valuable new knowledge and theoretical concepts. Big data has the potential to help organizations improve operations and make faster and more intelligent decisions.
Big Data mining:
Big data is a term for data sets that are so large or complex that traditional data processing and data management applications are inadequate to handle them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy. "Big" is, of course, subjective, so it is usually explained along three dimensions, the three Vs: volume, velocity, and variety. Two more Vs are often added: variability and value.
Global Pulse - Big Data for development:
Global Pulse is using Big Data to improve life in developing countries. It is a United Nations initiative, launched in 2009, that functions as an innovation lab and is based on mining Big Data for developing countries.
Its strategy consists of 1) researching innovative methods and techniques for analyzing real-time digital data to detect emerging vulnerabilities early; 2) assembling a free and open source technology toolkit for analyzing real-time data and sharing
Challenges and Opportunities:
- Early warning: develop fast responses in times of crisis by detecting anomalies in the usage of digital media
- Real-time awareness: design programs and policies with a more fine-grained representation of reality
- Real-time feedback: check which policies and programs fail by monitoring them in real time, and use this feedback to make the needed changes
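The early warning idea above, detecting anomalies in the usage of digital media, can be illustrated with a minimal sketch. It is not the method used by Global Pulse; it assumes a simple z-score test against a trailing window, and the function name `detect_anomalies`, its parameters, and the daily counts are all hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(counts, window=7, threshold=3.0):
    """Return the indices whose value deviates from the mean of the
    preceding `window` values by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Flat baseline: any change at all counts as an anomaly.
            if counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily counts of messages mentioning some keyword.
daily_mentions = [100, 104, 98, 101, 99, 103, 97, 102, 100, 350, 101]
print(detect_anomalies(daily_mentions))  # prints [9]
```

The spike on day 9 is flagged because it lies far outside the variation seen in the previous week. A real system would need to handle seasonality and much noisier signals.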
Contributed articles:
The articles contributed on Big Data mining are:
- Scaling Big Data Mining Infrastructure: The Twitter Experience
- Mining Heterogeneous Information Networks: A Structural Analysis Approach
- Big Graph Mining: Algorithms and Discoveries
- Mining Large Streams of User Data for Personalized Recommendations
Controversy about Big Data:
As Big Data is a new hot topic, there has been a lot of controversy about it:
There is no need to distinguish Big Data analytics from data analytics, as data will continue growing and will never be small again.
In real-time analytics, data may be changing. In that case, what is important is not the size of the data but its recency.
Limited access to Big Data creates new digital divides. There may be a digital divide between people or organizations being able to analyze Big Data or not.
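The point above about recency in real-time analytics can be made concrete with a small sketch. It assumes a sliding time window that forgets events older than a fixed span; the class name `RecentWindow`, the 60-second span, and the sample events are hypothetical and chosen only for illustration.

```python
from collections import deque


class RecentWindow:
    """Keep only the events from the last `span` seconds and report their average."""

    def __init__(self, span):
        self.span = span
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        self.total += value
        # Evict every event that has fallen out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.span:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def average(self):
        return self.total / len(self.events) if self.events else None


window = RecentWindow(span=60)
for ts, value in [(0, 10), (30, 20), (70, 30)]:
    window.add(ts, value)
print(window.average())  # prints 25.0, because the event at t=0 has expired
```

However much data has streamed past, the memory used depends only on how many events fall inside the window, which is why recency rather than total size drives this kind of analysis.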
Tools: Open Source Revolution:
Big Data infrastructure is built around Hadoop and other related software.
In Big Data Mining, there are many open source initiatives. The most popular are the following:
- Apache Mahout
- R
- MOA
- Vowpal Wabbit
Forecast to the future:
There are many important future challenges in Big Data management and analytics that arise from the nature of the data:
1. Data volumes will continue to grow. In today's Internet-driven world we will keep generating large amounts of data, especially as the number of handheld and Internet-connected devices grows exponentially.
2. Ways to analyze data will improve. According to Ovum, while SQL remains the standard, Spark is emerging as a complementary analytical tool and will continue to grow.
3. More tools for analysis (without the analyst) will emerge. Microsoft and Salesforce both recently announced features to let non-coders create apps to view business data.
4. Prescriptive analytics will be built into business analytics software. IDC predicts that by 2020 half of all business analytics software will include this intelligence. Users will want to be able to use data to make decisions in real time, with programs like Kafka and Spark.
5. Big data will face huge challenges around privacy, especially with the new privacy regulation by the European Union. Companies will be forced to address the "elephant in the room" around their privacy controls and procedures. Gartner predicts that by 2018 about 50% of business ethics violations will be related to data.
Conclusion:
Big Data is going to continue growing over the next years, and each data scientist will have to manage a much larger amount of data every year. This data is going to be more diverse, larger, and faster. In this paper we discussed some insights about the topic, along with what we consider to be the main concerns and the main challenges for the future.
References:
[1] SAMOA, http://samoa-project.net, 2013.
[2] C. C. Aggarwal, editor. Managing and Mining Sensor Data. Advances in Database Systems. Springer, 2013.
[3] Apache Hadoop, http://hadoop.apache.org.
[4] Apache Mahout, http://mahout.apache.org.
[5] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer. MOA: Massive Online Analysis, http://moa.cms.waikato.ac.nz/. Journal of Machine Learning Research (JMLR), 2010.
[6] C. Bockermann and H. Blom. The streams Framework. Technical Report 5, TU Dortmund University, 12 2012.