Question


Discuss three (3) issues to consider during data cleaning. Give an example of a real-world problem for each issue.

Answer #1

Data cleansing, or scrubbing, is the process of detecting and removing inconsistencies and errors from data to improve its quality. The need for data cleansing increases significantly when multiple data sources are integrated. Making data accurate and consistent involves many problems, a few of which are described below:

  1. Misspellings:

    Misspellings are mostly caused by typing errors. Misspelled common words and grammatical errors can be detected and corrected, but because databases contain large amounts of unique data, spelling mistakes are hard to detect at the point of input. Spelling mistakes in data such as names and addresses are particularly difficult to identify and correct.

  2. Lexical Errors:

    Lexical errors are discrepancies between the structure of the data items and the specified format. For example, suppose a database record has attributes for name, age, sex, and height. If a person omits an intermediate value, such as age, the values for the following attributes shift into the wrong fields: the value for sex (say, male) is read as age, and the value for height is read as sex.

  3. Misfielded Value:

    A misfielded value is one that is correct as far as format is concerned but does not belong to the field in which it is recorded. For example, the value Germany is recorded in the city field.

  4. Domain Format Errors:

    Domain format errors occur when the value of an attribute is correct but does not comply with the format of its domain. For example, a NAME field requires the first name and surname to be separated by a comma, but the input omits the comma. The input may be correct, yet it does not comply with the domain format.

  5. Irregularities:

    Irregularities concern the non-uniform use of units or values. For example, employee salaries are entered in different currencies. Such data requires subjective interpretation and can easily lead to wrong results.

  6. Missing Values:

    Missing values result from omissions during data collection. They indicate that a value was unavailable at the time of data entry. Both null values and dummy values count as missing values; for example, 000-0000 or 999-9999 in a telephone number field.
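
As a rough illustration, several of the issues above could be flagged with simple checks. The following is a minimal Python sketch, not a complete cleaning pipeline. The field list, the country set, the dummy phone values, the comma-based name pattern, and all function names are assumptions made for this example only.

```python
import re

# Hypothetical reference data and formats, chosen only for illustration.
EXPECTED_FIELDS = ("name", "age", "sex", "height")
KNOWN_COUNTRIES = {"germany", "france", "india"}
DUMMY_PHONES = {"000-0000", "999-9999"}
NAME_FORMAT = re.compile(r"^[^,]+,\s*[^,]+$")  # two parts separated by one comma


def is_lexical_error(row):
    """A row with the wrong number of values cannot be mapped onto the schema."""
    return len(row) != len(EXPECTED_FIELDS)


def is_misfielded_city(city):
    """A well-formed value that belongs to another field (a country, not a city)."""
    return city.strip().lower() in KNOWN_COUNTRIES


def violates_name_format(name):
    """The name may be correct but lacks the comma the assumed format requires."""
    return NAME_FORMAT.match(name.strip()) is None


def is_missing_phone(phone):
    """Nulls, empty strings, and known dummy values all count as missing."""
    return phone is None or phone.strip() == "" or phone.strip() in DUMMY_PHONES


def has_mixed_currencies(salaries):
    """Salaries are (amount, currency_code) pairs; more than one code is an irregularity."""
    return len({currency for _, currency in salaries}) > 1
```

Each function only flags a suspect value; deciding how to repair it (for instance, converting salaries to one currency or asking for the missing age) still depends on the data source. Misspellings are left out because detecting them in names and addresses generally needs reference data that this sketch does not assume.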
