Question

Pleaaase help me :((

I need new and unique answers, please. (Use your own words, don't copy and paste), Please Use your keyboard (Don't use handwriting) Thank you..

Q1: Describe the differences between structured and unstructured data. Explain structured data in big data environment and give one (1) example of machine generated structured data.

Q2: What does data pre-processing mean in Data Mining and why is it important? Explain the five (5) steps in data pre-processing?

Q3: What is an outlier? Why outlier detection is important in some of the data mining applications? Give examples of such applications.

Answer #1

Q1.

Describe the differences between structured and unstructured data.

Structured data is organised according to a predefined format, while unstructured data follows no such format. Here "format" means a pattern that machines can parse, rather than one that humans can read.

Structured data can be stored in databases with well-defined schemas, while unstructured data cannot be stored in tabular form with defined attributes.

Because its attributes are defined, structured data can be indexed, which makes it easy to search with standard algorithms. Searching unstructured data is more tedious: there are no predefined fields to index, so the search typically falls back on text-based (keyword or lexicographic) matching, much like looking up a word in a dictionary.

Transaction management and concurrency control are straightforward to implement over structured data, whereas such consistency and reliability guarantees are much harder to enforce over unstructured data.

Structured data tends to be harder to scale, because every record must fit a rigid schema, while unstructured data is generally easier to scale.
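The schema, indexing and transaction points above can be illustrated with a small sketch using Python's built-in sqlite3 module. The table name, column names and values are hypothetical and chosen only for illustration; this is one possible demonstration, not the only way to show these properties.

```python
import sqlite3

# In-memory database; the table and column names are made up for this sketch.
conn = sqlite3.connect(":memory:")

# A well-defined schema: every row must have these attributes and types.
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance REAL NOT NULL CHECK (balance >= 0))"
)

# Defined attributes can be indexed, which speeds up searches on that column.
conn.execute("CREATE INDEX idx_accounts_owner ON accounts (owner)")

conn.executemany(
    "INSERT INTO accounts (owner, balance) VALUES (?, ?)",
    [("alice", 100.0), ("bob", 50.0)],
)
conn.commit()

# Search by attribute.
alice_balance = conn.execute(
    "SELECT balance FROM accounts WHERE owner = ?", ("alice",)
).fetchone()[0]

# Transaction: both updates succeed together or neither is applied.
# The second update violates the CHECK constraint, so the whole transfer
# is rolled back when the 'with' block exits with an exception.
try:
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + 500 WHERE owner = 'bob'")
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE owner = 'alice'")
    transfer_ok = True
except sqlite3.IntegrityError:
    transfer_ok = False

balances = dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY owner"))
print(alice_balance, transfer_ok, balances)
```

After the failed transfer, both balances are unchanged, which is the kind of consistency guarantee that is hard to obtain over free-form files or documents.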

Explain structured data in big data environment

Big data refers to data that is generated in very large volumes. In a big data environment, structured data is generated continuously from two broad sources:

  1. Machine generated
  2. Human generated

Machine-generated data is produced by machines operating within a specific domain. These machines are usually fully automated, although some operational procedures may require partial human intervention.

Give one (1) example of machine generated structured data.

A sensor continuously measures the potassium level in the soil around a plant's roots. Each reading is captured in a fixed, structured form and sent to a computing system for analysis.
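As a rough sketch of what such sensor output might look like, the snippet below models each reading as a record with fixed fields and serialises a batch to CSV. The field names, sensor ID, units and values are all assumptions made up for illustration; a real sensor would define its own format.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class PotassiumReading:
    # Hypothetical fields; every reading has exactly these attributes.
    sensor_id: str
    timestamp: str  # ISO 8601 text
    potassium_mg_per_kg: float


# Made-up readings standing in for what the sensor would emit.
readings = [
    PotassiumReading("soil-01", "2024-01-01T06:00:00Z", 182.5),
    PotassiumReading("soil-01", "2024-01-01T06:10:00Z", 181.9),
    PotassiumReading("soil-01", "2024-01-01T06:20:00Z", 183.2),
]


def to_csv(rows):
    """Serialise readings to CSV text with one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(PotassiumReading)],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


csv_text = to_csv(readings)
print(csv_text)
```

Because every row has the same named columns, the output can be loaded directly into a table and queried, which is what makes it structured.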

Q2:

What does data pre-processing mean in Data Mining

Data pre-processing is the process of converting raw data into a format that an analytical system can read without inconsistency or bias, so that the analysis is more accurate.

Why is it important?

Raw data suffers from:

  1. incompleteness, such as missing attribute values or several values combined in one field
  2. inconsistency, such as conflicting values or formats, often because the sources did not enforce integrity guarantees (for example, ACID properties)
  3. noise, caused by erroneous values and outliers

Because of these problems, analysis of raw data will be biased, error-prone and inaccurate. For accurate analysis, the system must be given clean, processed, structured and error-free data.

Explain the five (5) steps in data pre-processing?

  • Data Cleaning: This step deals with missing values, which are filled in using a suitable statistical method. Noisy data, if any, is removed or reduced using data smoothing techniques. Inconsistencies in the raw data are also resolved.
  • Data Integration: This step combines raw data that arrives in different representations into one consistent format, so that the representations do not conflict.
  • Data Transformation: This step standardises, aggregates and generalises the integrated data.
  • Data Reduction: This step reduces the size of the data for efficient storage. This may be achieved through dimensionality reduction, compression algorithms etc.
  • Data Discretisation: This step groups values into a small number of discrete intervals called buckets. This reduces the number of distinct states, which makes the data easier to manage.
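The five steps above can be sketched on a toy dataset using only the Python standard library. Everything here is an assumption made for illustration: the two sources, the field names, the values, the choice of mean imputation, min-max scaling and the bucket thresholds. Real pipelines choose these techniques per dataset.

```python
from statistics import mean

# Two hypothetical sources describing the same plants with different key names.
source_a = [
    {"plant": "p1", "temp_c": 21.0, "potassium": 180.0},
    {"plant": "p2", "temp_c": None, "potassium": 200.0},  # missing value
    {"plant": "p3", "temp_c": 23.0, "potassium": 220.0},
]
source_b = [
    {"plant_id": "p1", "height_cm": 30.0},
    {"plant_id": "p2", "height_cm": 45.0},
    {"plant_id": "p3", "height_cm": 60.0},
]

# 1. Cleaning: fill the missing temperature with the mean of the known ones.
known_temps = [r["temp_c"] for r in source_a if r["temp_c"] is not None]
fill_value = mean(known_temps)
cleaned = [
    {**r, "temp_c": fill_value if r["temp_c"] is None else r["temp_c"]}
    for r in source_a
]

# 2. Integration: join the two sources on the plant identifier.
heights = {r["plant_id"]: r["height_cm"] for r in source_b}
integrated = [{**r, "height_cm": heights[r["plant"]]} for r in cleaned]

# 3. Transformation: min-max scale potassium into the range [0, 1].
low = min(r["potassium"] for r in integrated)
high = max(r["potassium"] for r in integrated)
transformed = [
    {**r, "potassium_scaled": (r["potassium"] - low) / (high - low)}
    for r in integrated
]

# 4. Reduction: keep only the columns needed downstream.
keep = ("plant", "temp_c", "potassium_scaled", "height_cm")
reduced = [{k: r[k] for k in keep} for r in transformed]


# 5. Discretisation: map continuous height into a few buckets.
def height_bucket(height_cm):
    if height_cm < 40:
        return "short"
    if height_cm < 55:
        return "medium"
    return "tall"


prepared = [{**r, "height_bucket": height_bucket(r["height_cm"])} for r in reduced]

for row in prepared:
    print(row)
```

Each step takes the previous step's output as its input, so the order matters: for example, scaling before filling the missing value would fail on the empty field.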

Q3:

What is an outlier?

An outlier is an observation that lies at an abnormal distance from the other values in a random sample, that is, a value markedly different from the rest of the observations.
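What counts as an "abnormal distance" has to be defined by the analyst. One common rule of thumb, used here purely as an example and not as the only definition, flags values lying more than 1.5 times the interquartile range (IQR) outside the quartiles. The sample values and the function name are made up.

```python
from statistics import quantiles


def find_outliers_iqr(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up sample: six similar readings and one extreme value.
sample = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers_iqr(sample)
print(outliers)
```

For this sample the quartiles are 11 and 13, so the fences sit at 8 and 16 and only 95 falls outside them.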

Why is outlier detection important in some data mining applications?

Outlier detection is important because outliers distort summary statistics, most strongly the mean (the median is far less affected), and this distortion can bias any analysis performed on the dataset. In some applications, the deviation itself, or a distorted cluster in the case of clustering techniques, is the signal of interest: it indicates observations that appear ‘unnatural’ against the general trend of the dataset.
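A small sketch of how a single extreme value shifts summary statistics, using made-up numbers. It is worth noting how differently the mean and the median respond.

```python
from statistics import mean, median

# Made-up readings; the second list adds one extreme value.
clean = [10, 11, 11, 12, 12, 13]
with_outlier = clean + [95]

clean_mean = mean(clean)
clean_median = median(clean)
skewed_mean = mean(with_outlier)
skewed_median = median(with_outlier)

print(f"mean:   {clean_mean:.2f} -> {skewed_mean:.2f}")
print(f"median: {clean_median:.2f} -> {skewed_median:.2f}")
```

Here the mean roughly doubles (from 11.5 to about 23.4) while the median moves only from 11.5 to 12, which is why a mean-based analysis is easily biased by undetected outliers.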

Give examples of such applications.

  1. Intrusion detection: Unwanted or harmful activity can be flagged as an outlier because it falls in an unusual position relative to the rest of the dataset.
  2. Fraud detection for credit cards, calling cards etc.
  3. Medical and health care: Identifying outliers in medical and health data helps to detect and locate disease.
  4. Industrial damage: An outlier in industrial sensor data can alert the team early, so that it can act proactively to avert probable damage.