solution:
a)
Hadoop is an ideal platform to run ETL. You can feed the results into a traditional data warehouse, or better yet, simply use Hadoop itself as your warehouse: two for the price of one. Ingesting data from all sources into a centralized Hadoop repository is also future-proof; as your business scales and the data grows rapidly, the Hadoop infrastructure can scale easily.
ETL process in Hadoop:
Here are the typical steps to use Hadoop for ETL:
1. Set up a Hadoop cluster.
2. Connect data sources.
3. Define the metadata.
4. Create the ETL jobs.
5. Create the workflow.
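The shape of one ETL job (steps 2-4) can be sketched in plain Python. This is an illustrative sketch only; the source rows, field layout, and cleansing rules are invented for the example, and a real job would read from HDFS rather than an in-memory list:

```python
# Minimal ETL sketch: extract raw records, transform (cleanse/normalize),
# then load into a target store. All names and rules here are illustrative.

def extract():
    # Stand-in for reading from a connected data source (step 2).
    return ["alice,30", "BOB,  41", "carol,"]

def transform(raw_rows):
    # Apply metadata/cleansing rules (steps 3-4): lowercase names,
    # strip whitespace, and drop rows with a missing age.
    cleaned = []
    for row in raw_rows:
        name, _, age = row.partition(",")
        age = age.strip()
        if age:
            cleaned.append({"name": name.strip().lower(), "age": int(age)})
    return cleaned

def load(rows, warehouse):
    # Stand-in for writing into the centralized Hadoop repository.
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)  # [{'name': 'alice', 'age': 30}, {'name': 'bob', 'age': 41}]
```

Step 5 would then chain several such jobs into a scheduled workflow.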
b)
For creating a highly available environment, data cleansing and transformations are easier when multiple jobs cascade into a workflow, each performing a special task. Often data mappings/transformations need to be executed in a specific order, and/or there may be dependencies to check. These dependencies and sequences are captured in workflows; parallel flows allow parallel execution that can speed up the ETL process. Finally, the entire workflow needs to be scheduled; it may have to run weekly, nightly, or perhaps even hourly.
Although technologies such as Oozie provide some workflow management, it is typically insufficient, so many organizations create their own workflow management tools. This can be a complex process, as it is important to take care of failure scenarios and restart the workflow appropriately.
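The core job of such a workflow manager, running tasks in dependency order, can be sketched with Python's standard-library topological sorter. The task names and dependency graph below are invented for illustration and are not Oozie's API; note that the two transforms have no dependency on each other, so a real engine could run them in parallel:

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks mapped to their prerequisites: cleanse must finish
# before either transform, and both transforms must finish before the load.
deps = {
    "cleanse": set(),
    "transform_sales": {"cleanse"},
    "transform_stock": {"cleanse"},
    "load_warehouse": {"transform_sales", "transform_stock"},
}

def run_task(name):
    # Stand-in for submitting a Hadoop job; a real runner would also
    # detect failures and restart the workflow from the failed task.
    return f"{name}: ok"

order = list(TopologicalSorter(deps).static_order())
results = [run_task(task) for task in order]
print(order)
```

The sorter guarantees "cleanse" runs first and "load_warehouse" runs last, which is exactly the ordering constraint the prose describes.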
c)
Structured data is comprised of clearly defined data types whose patterns make them easily searchable, while unstructured data ("everything else") is comprised of data that is usually not as easily searchable, including formats like audio, video, and social media postings.
"Unstructured data vs. structured data" does not denote a conflict between the two. Customers select one or the other not based on their data structure, but on the applications that use them: relational databases for structured data, and most any other type of application for unstructured data.
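The searchability difference can be made concrete: typed, structured rows support precise predicate queries, while unstructured text falls back to keyword matching. The records below are invented purely for illustration:

```python
# Structured data: fixed, typed fields allow precise predicate queries.
orders = [
    {"id": 1, "amount": 250.0, "region": "EU"},
    {"id": 2, "amount": 90.0, "region": "US"},
]
big_eu = [o["id"] for o in orders if o["region"] == "EU" and o["amount"] > 100]

# Unstructured data: no schema, so search degrades to keyword matching.
posts = ["loved the new store layout!", "shipping was slow this week"]
mentions_shipping = [p for p in posts if "shipping" in p]

print(big_eu, mentions_shipping)
```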
d)
i. Click on the HDFS service, and under Quick Links choose "Replication". From the drop-down list choose HDFS Replication.
ii. Then fill in the replication form. You have to supply the source and destination clusters, which path to replicate (choose / for all), what kind of schedule to set (run once now, run once in the future, or a recurring schedule), and when. The default user to run the replication task is hdfs, so it's better to just leave it that way.
iii. If you want to change the default values, you can go to the Resources tab, where you can set how many MapReduce jobs will run concurrently (the default is 20) and how they will pick their work.
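That concurrency setting behaves like a bounded worker pool: no more than N copy tasks run at once. A rough sketch of the idea (the limit of 3, the task function, and the path list are illustrative assumptions, not Cloudera Manager internals):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # analogous to the replication concurrency setting (default 20)

def copy_chunk(path):
    # Stand-in for one MapReduce task copying a slice of the replicated tree.
    return f"copied {path}"

paths = [f"/data/part-{i:05d}" for i in range(8)]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    # At most MAX_CONCURRENT copies are in flight at any moment;
    # map() still returns results in submission order.
    results = list(pool.map(copy_chunk, paths))
print(len(results))
```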
e)
Do one of the following:
i. Select Backup > Replications.
ii. Click Schedule HDFS Replication.
or
i. Select Clusters > HDFS service name.
ii. Select Quick Links > Replication.
iii. Click Schedule HDFS Replication.
The Create Replication dialog box displays. Click the Source field and select the source HDFS service.
f)
It is easy to quickly get lost in the details when talking about information security. To minimize confusion we will focus on three fundamental areas:
1. How data is encrypted or otherwise protected while it is in storage (at rest) and when it is moving across the network (in motion).
2. How systems and users are authenticated before they access data in the Hadoop infrastructure.
3. How access to different data is managed within the environment.
The Hadoop ecosystem has resources to support security; Knox and Ranger are two important Apache open source projects.
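Point 3, managing who can access which data, reduces to policy lookups of the kind Ranger evaluates. A toy sketch of that idea in plain Python; the roles, path prefixes, and policy table are all invented for illustration and are not Ranger's actual policy model:

```python
# Toy access-control check: map (role, path-prefix) pairs to the
# operations that pair is allowed to perform.
POLICIES = {
    ("analyst", "/warehouse/sales"): {"read"},
    ("etl", "/warehouse"): {"read", "write"},
}

def is_allowed(role, path, op):
    # Grant access if any policy for this role covers a prefix of the
    # requested path and permits the requested operation.
    return any(
        r == role and path.startswith(prefix) and op in ops
        for (r, prefix), ops in POLICIES.items()
    )

print(is_allowed("analyst", "/warehouse/sales/2020", "read"))   # True
print(is_allowed("analyst", "/warehouse/sales/2020", "write"))  # False
```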
g)
SQL-on-Hadoop is a class of analytical application tools that combine established SQL-style querying with newer Hadoop data framework elements.
The different means of executing SQL in a Hadoop environment can be divided into (1) connectors that translate SQL into a MapReduce format, (2) "push-down" systems that forgo batch-oriented MapReduce and execute SQL within Hadoop clusters, and (3) systems that apportion SQL work between MapReduce-HDFS clusters or raw HDFS clusters, depending on the workload.
h)
Apache Hadoop is a comprehensive ecosystem which now features many open source components that can fundamentally change an enterprise's approach to storing, processing, and analyzing data.
Unlike traditional relational database management systems, Hadoop enables different types of analytical workloads to run on the same set of data, and can also manage data volumes at massive scale with industry-standard hardware and software. Many open source platforms are available as popular distributions of Hadoop.