Question

Problem statement For this program, you are to implement a simple machine-learning algorithm that uses a rule-based classifier to predict whether or not a particular patient has diabetes. In order to do so, you will need to first train your program, using a provided data set, to recognize a disease. Once a program is capable of doing it, you will run it on new data sets and predict the existence or absence of a disease. While solving this problem, you are to use a procedural approach in Java. Decompose your program into methods and use small lists where appropriate, i.e. you are only to use data types covered in chapters 1-6 and chapter 10. Training In order to train your program to recognize diabetes, you need to use a data set that contains both data and diagnoses (this is the learning phase). Your program is to ask the user to enter the name of the training file. You may assume that the file will be a txt file. Each line on that file constitutes one patient record. Each patients record consists of 6 attributes and a numerical diagnosis. From our perspective, it does not matter much what these attributes signify. The only important observation here is that the attributes are floating point numbers delimited by tabs. If an attribute is missing, a question mark replaces a number. In your program, treat question marks as zeros. As far as diagnosis is concerned (the 7th value listed on a line), any value0.65 indicates a presence of a disease, while a value0.65 indicates its absence. You may assume that all data in the file will be valid, i.e there is no need to handle user error or file errors. Sample file train.txt is provided Here is a sample line from the input file 25 175.6 101.14 159.4 Since the last element is a zero, this patient has no diabetes. In case you are wondering, the first 6 attributes are: age, sex, random blood sugar level fasting blood sugar level, oral glucose tolerance level, hemoglobin A1c. Based on the input data that follows this format, the program is to create a classifier that will be used later on to process new patients and make diagnostic predictions for these new patients. The classifier is created in the following fashion. For all patients with no diabetes, for each attribute, we calculate the average value of that attribute. For all patients with diabetes, for each attribute, we calculate the average value of that attribute. After processing all the patients, we end up with two sets of averages-one set for healthy patients (use a list) and one set for ill patients (use a list). In short, as you process a line, you need to first determine whether you are dealing with a healthy or ill patient and update your averages accordingly The picture below illustrates the idea (one row in either table is one line in a file). Health ents Ill patients 13 51 10 4 2 16 61 13 54 11 52 14 | S| 3 4 300 115 0 3 10 0Here is a sample line from the input file 175.6 25 101.14 159.4 Since the last element is a zero, this patient has no diabetes. In case you are wondering, the first 6 attributes are: age, sex, random blood sugar level, fasting blood sugar level, oral glucose tolerance level, hemoglobin A1. Based on the input data that follows this format, the program is to create a classifier that will be used later on to process new patients and make diagnostic predictions for these new patients. The classifier is created in the following fashion. For all patients with no diabetes, for each attribute, we caethe average value of that attribute. For all patients with diabetes, for each attribute, wecalate the average value of that attribute. After processing all the patients, we end up with two sets of averages-one set for healthy patients (use a list) and one set for ill patients (use a list). In short, as you process a line, you need to first determine whether you are dealing with a healthy or ill patient and update your averages accordingly The picture below illustrates the idea (one row in either table is one line in a file). Healt atients Ill patients 13 | 51 10 42 16 6 1 13 54 11 5 2 14 5 3 4 300 1 15 |0 1 0 0 3 10 0 135 Average of healthy patients Average of ill patients Once you have these two averages calculated, you need to find a midpoint (mean) between them-these are called the class separation values. 13 5 Average of healthy patients Average of ill patients 8 7.5 Class separation valuesYour program needs to print the averages of healthy patients, ill patients,ad class separation values to the screen. This is the sample run of the beginning of the program (blue highlighting indicates user input) Enter the name of the training set file: train.txt Healthy patients averages: 37.50,0.00,182.80,107-79,164.47,3.65) Ill patients averages: [56.50,1.00,255.72,134.95,277.06, 8.15] Separator values: [47.00,0.50,219.26, 121.37,220.76,5.90] Run Classifier Once you know class separation values, you can run your program on new sets and make predictions for new patients that havent been diagnosed yet. This is where the second input file comes in. The file will be in a csv format and each line in that file will model one patient: his/her id and 6 attributes described above. The attributes will be in the same exact order as in the training file and will be separated by commas. You may assume that all data in the file will be valid, i.e. there is no need to handle user error or file errors. Sample file set1 . csv is provided For each of these patients, you need to compare his attributes to the separator values. If a patients attribute has a higher value than the corresponding separator value, then the attribute needs to be labeled as at risk. Overall, if a patient has more than half at risk attributes, the patient is classified as ill. In the picture below, we are dealing with an ill patient since this patient is above the norm on more than half of the attributes. 47 0.5 219.26 121.37 220.76 5.9 Class separator values 63 1230.45 127.9 203 4.2 Patient values Once you determine if a patient should be diagnosed as having diabetes, you need to print that patients id and diagnosis to results.csv (sample file provided) use 1, if 4 attributes are above average use 2, if 5 attributes are above average use 3, if all attributes ae above average Finally, print to the screen how many patients you processed, how many received diagnosis codes 1, 2, and 3 This is the sample run of the next phase of the program (blue highlighting indicates user input)Enter the name of the training set file: train.txt Healthy patients averages: 37.50,0.00,182.80,107.79,164.47,3.65 Ill patients averages: [56.50,1.00, 255.72,134.95, 277.06,8.15] Separator values: [47.00,0.50,219.26,121.37,220.76,5.90] Run Classifier Once you know class separation values, you can run your program on new sets and make predictions for new patients that havent been diagnosed yet. This is where the second input file comes in. The file will be in a csv format and each line in that file will model one patient: his/her id and 6 attributes described above. The attributes will be in the same exact order as in the training file and will be separated by commas. You may assume that all data in the file will be valid, i.e. there is no need to handle user error or file errors. Sample file setl.csv is provided For each of these patients, you need to compare his attributes to the separator values. If a patients attribute has a higher value than the corresponding separator value, then the attribute needs to be labeled asat risk. Overall, if a patient has more than half at risk attributes, the patient is classified as ill. In the picture below, we are dealing with an ill patient since this patient is above the norm on more than half of the attributes. 7 0.5 219.26 121.37 220.76 5.9 Class separator values 63 1230.45 127.9 203 4.2 Patient values Once you determine if a patient should be diagnosed as having diabetes, you need to print that patients id and diagnosis to results.csv (sample file provided) . use 1, if 4 attributes are above average . use 2, if 5 attributes are above average . use 3, if all attributes are above average Finally, print to the screen how many patients you processed, how many received diagnosis codes 1, 2, and 3 This is the sample run of the next phase of the program (blue highlighting indicates user input) Enter the name of file containing patient data: set1.csv Number of patients: Diagnosis code 1: Diagnosis code 2: Diagnosis code 3: Check results.csv for specific patients and their codes 303 79

0 0
Add a comment Improve this question Transcribed image text
Answer #1

Answer: fromfuture import with statement #COUNT FOR HEALTHY PATIENTS goodCnt0i att1=0 att2=0 att3=0 att4=0 att5=0 #COUNT FORbadCountt-1 #HEALTHY PATIENTS AVERAGE Hatt1-att1/goodCnt Hatt2-att2/goodCnt Hatt3-att 3/goodCnt Hatt Hatt5-att5/goodCnt Hatt6if float (st [6])-0: ng+=1 ntwnt+i1 else: Accuracy(ng/geodCnt) pant Accuracy : print Accuracy #DIAGNOSING TEST DATA SET lencode:

from __future__ import with_statement
#COUNT FOR HEALTHY PATIENTS
goodCnt=0;
att1=0
att2=0
att3=0
att4=0
att5=0
att6=0
#COUNT FOR ILL PATIENTS
badCount=0;
attA=0
attB=0
attC=0
attD=0
attE=0
attF=0
attG=0
#READING TRAINING DATA SET
filen=raw_input("enter the name of the training set file:")
with open(filen,'rb') as f:
for temp in f:
st=temp.split(",")
#HEALTHY PATIENTS
if float(st[6])==0:
att1+=(float(st[0]))
att2+=(float(st[1]))
att3+=(float(st[2]))
att4+=(float(st[3]))
att5+=(float(st[4]))
att6+=(float(st[5]))   
goodCnt+=1
#ILL PATIENTS
else:
attA+=(float(st[0]))
attB+=(float(st[1]))
attC+=(float(st[2]))
attD+=(float(st[3]))
attE+=(float(st[4]))
attF+=(float(st[5]))   
badCount+=1
#HEALTHY PATIENTS AVERAGE
Hatt1=att1/goodCnt
Hatt2=att2/goodCnt
Hatt3=att3/goodCnt
Hatt4=att4/goodCnt
Hatt5=att5/goodCnt
Hatt6=att6/goodCnt
#ILL PATIENTS AVERAGE
HattA=attA/badCount
HattB=attB/badCount
HattC=attC/badCount
HattD=attD/badCount
HattE=attE/badCount
HattF=attF/badCount

#PRINT HEALTHY AND ILL PATIENTS AVERAGE
print "healthy patients' averages:"
print Hatt1,Hatt2,Hatt3,Hatt4,Hatt5,Hatt6

print "Ill patients' averages:"
print HattA,HattB,HattC,HattD,HattE,HattF

#FIND SEPARATOR VALUES
satt1=(Hatt1+HattA)/2
satt2=(Hatt2+HattB)/2
satt3=(Hatt3+HattC)/2
satt4=(Hatt4+HattD)/2
satt5=(Hatt5+HattE)/2
satt6=(Hatt6+HattF)/2

#DISPLAY SEPARATOR VALUES
print "Separator values:"
print satt1,satt2,satt3,satt4,satt5,satt6

#FINDING ACCURACY
ng=0;
nt=0
with open(filen,'rb') as f:
for temp in f:
st=temp.split(",")
if float(st[6])==0:
ng+=1
nt=nt+1
else:
nt+=1
Accuracy=(ng/goodCnt)

print "Accuracy:"
print Accuracy

#DIAGNOSING TEST DATA SET   
infilename=raw_input("enter the name of the test set file:")
outfile=raw_input("Enter the name of the output file:")
ot=open(outfile,'wb')
with open(infilename,'rb') as f:
for temp in f:
st=temp.split(",")
att1=(int(st[0]))
att2=(float(st[1]))
att3=(float(st[2]))
att4=(float(st[3]))
att5=(float(st[4]))
att6=(float(st[5]))
att7=(float(st[6]))
if((att2>satt1) or (att3>satt2) or (att4>satt3) or (att5>satt4) or (att6>satt5) or (att7>satt6)):
st=str(att1)+",Ill"
else:
st=str(att1)+",Healthy"
ot.write(st) #WRITING TO FILE
ot.write(" ")
ot.close()#CLOSE OUTPUT FILE
print "Done"

Note: Due to time constraint i am not able to prepare the flowchart

Add a comment
Know the answer?
Add Answer to:
Problem statement For this program, you are to implement a simple machine-learning algorithm that uses a...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • Write a program that will take input from a file of numbers of type double and...

    Write a program that will take input from a file of numbers of type double and output the average of the numbers in the file to the screen. Output the file name and average. Allow the user to process multiple files in one run. Part A use an array to hold the values read from the file modify your average function so that it receives an array as input parameter, averages values in an array and returns the average Part...

  • In this assignment, you will write a program in C++ which uses files and nested loops...

    In this assignment, you will write a program in C++ which uses files and nested loops to create a file from the quiz grades entered by the user, then reads the grades from the file and calculates each student’s average grade and the average quiz grade for the class. Each student takes 6 quizzes (unknown number of students). Use a nested loop to write each student’s quiz grades to a file. Then read the data from the file in order...

  • Any little bit of information will be helpful. Thank you. Problem Statement You are to develop a program to read Mo...

    Any little bit of information will be helpful. Thank you. Problem Statement You are to develop a program to read MotoGP rider information from an input file. You will need a MotoGpRider class whose instances will be used to store the statistics for a MotoGP motorcycle racer. The Rider class will have attributes for last name, first name, racing number, nation, motorcycle make, world championship points, world championship position, and an array that stores the number of top finishes (number...

  • Problem: You will write a program to compute some statistics based on monthly average temperatures for...

    Problem: You will write a program to compute some statistics based on monthly average temperatures for a given month in each of the years 1901 to 2016. The data for the average August temperatures in the US has been downloaded from the Climate Change Knowledge Portal, and placed in a file named “tempAugData.txt”, available on the class website. The file contains a sequence of 116 values. The temperatures are in order, so that the first one is for 1901, the...

  • PLEASE DO THIS IN C#.Design and implement a program (name it ProcessGrades) that reads from the...

    PLEASE DO THIS IN C#.Design and implement a program (name it ProcessGrades) that reads from the user four integer values between 0 and 100, representing grades. The program then, on separate lines, prints out the entered grades followed by the highest grade, lowest grade, and averages of all four grades. Format the outputs following the sample runs below. Sample run 1: You entered: 95, 80, 100, 70 Highest grade: 100 Lowest grade: 70 Average grade: 86.25

  • I need this in JAVA please: Design and implement a program (name it MinMaxAvg) that defines...

    I need this in JAVA please: Design and implement a program (name it MinMaxAvg) that defines three methods as follows: Method max (int x, int y, int z) returns the maximum value of three integer values. Method min (int X, int y, int z) returns the minimum value of three integer values. Method average (int x, int y, int z) returns the average of three integer values. In the main method, test all three methods with different input value read...

  • HELP NEEDED in C++ (preferred) or any other language Write a simple program where you create...

    HELP NEEDED in C++ (preferred) or any other language Write a simple program where you create an array of single byte characters. Make the array 100 bytes long. In C this would be an array of char. Use pointers and casting to put INTEGER (4 byte) and CHARACTER (1 byte) data into the array and pull it out. Make sure you can access both character and integer data at any location in the array. Read data from a text file....

  • Topics: list, file input/output (Python) You will write a program that allows the user to read...

    Topics: list, file input/output (Python) You will write a program that allows the user to read grade data from a text file, view computed statistical values based on the data, and to save the computed statistics to a text file. You will use a list to store the data read in, and for computing the statistics. You must use functions. The data: The user has the option to load a data file. The data consists of integer values representing student...

  • In this assignment, you are to write a C++ program (using Visual C++ 2015) that reads...

    In this assignment, you are to write a C++ program (using Visual C++ 2015) that reads training data in WEKA arff format and generates ID3 decision tree in a format similar to that of the tree generated by Weka ID3. Please note the following: Your algorithm will use the entire data set to generate the tree. You may assume that the attributes (a) are of nominal type (i.e., no numeric data), and (b) have no missing values. In general, the...

  • Create the Python code for a program adhering to the following specifications. Write an Employee class...

    Create the Python code for a program adhering to the following specifications. Write an Employee class that keeps data attributes for the following pieces of information: - Employee Name (a string) - Employee Number (a string) Make sure to create all the accessor, mutator, and __str__ methods for the object. Next, write a class named ProductionWorker that is a subclass of the Employee class. The ProductionWorker class should keep data attributes for the following information: - Shift number (an integer,...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT