Need assistance with syntax errors in two Python scripts:
tester:
cat testdata2.txt | python kmeansMapper.py | sort | python kmeansReducer.py
#kmeansReducer.py
#!/usr/bin/python
import sys
currId = None # this is the "current" key
currXs = []
currYs = []
id = None
# The input comes from standard input (line by line)
for line in sys.stdin:
line = line.strip()
ln = line.split('\t')
id = ln[0]
if currId == id:
currXs.append(float(ln[1]))
currYs.append(float(ln[2]))
else:
if currId:
#calculate center
centerX = sum(currXs)/len(currXs)
centerY = sum(currYs)/len(currYs)
print '%s %s %s %s' % (centerX, centerY, currId, zip(currXs, currYs))
currXs = []
currYs = []
currId = id
currXs.append(float(ln[1]))
currYs.append(float(ln[2]))
# output the last key
if currId == id:
#calculate center
centerX = sum(currXs)/len(currXs)
centerY = sum(currYs)/len(currYs)
print '%s %s %s %s' % (centerX, centerY, currId, zip(currXs, currYs))
and
kmeansMapper.py
#!/usr/bin/python
import sys
import math
fd = open('centers.txt', 'r')
centers = []
for line in fd:
line = line.strip()
vals = line.split('')
centers.extend([vals])
fd.close()
for line in sys.stdin:
line = line.strip()
vals = line.split('')
clusterNum = None
distance = None
i = 0
#compare to each center and store the smallest distance
for center in centers:
euclidDist = math.sqrt((float(vals[0])-float(center[0]))**2 + (float(vals[1])
if clusterNum:
if euclidDist < distance:
clusterNum = i+1
distance = euclidDist
else: #always record the first cluster
clusterNum = i+1
distance = euclidDist
i += 1
print clusterNum, '\t', vals[0], '\t', vals[1]
Dataset testdata2.txt = {32 45, 23 67, 98 09, 56 87, 13 65, 87 67, 90 78,...}
Dataset centers.txt = {4 32, 55 20, 39 8, 17 11}
What's your question?
Yes, these are a mapper script and a reducer script.
Hi,
Hope you are enjoying coding. I am assuming you are using Python 2 and that the code otherwise works.
As far as I can see, kmeansReducer.py has some indentation errors, which I have corrected:
#!/usr/bin/python
import sys

currId = None  # this is the "current" key
currXs = []
currYs = []
id = None

# The input comes from standard input (line by line)
for line in sys.stdin:
    line = line.strip()
    ln = line.split('\t')
    id = ln[0]
    if currId == id:
        currXs.append(float(ln[1]))
        currYs.append(float(ln[2]))
    else:
        if currId:
            # calculate center of the cluster that just ended
            centerX = sum(currXs) / len(currXs)
            centerY = sum(currYs) / len(currYs)
            print '%s %s %s %s' % (centerX, centerY, currId, zip(currXs, currYs))
        # start collecting points for the new key
        currXs = []
        currYs = []
        currId = id
        currXs.append(float(ln[1]))
        currYs.append(float(ln[2]))

# output the last key (skipped when the input was empty)
if currId is not None:
    # calculate center
    centerX = sum(currXs) / len(currXs)
    centerY = sum(currYs) / len(currYs)
    print '%s %s %s %s' % (centerX, centerY, currId, zip(currXs, currYs))
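For comparison, here is a minimal sketch of the same reduce step written for Python 3 (an assumption; the scripts in this thread use Python 2 print statements). It relies on itertools.groupby instead of tracking the current key by hand, which removes the need for the separate "output the last key" block. The function name reduce_clusters and the sample lines are made up for illustration, and the input is assumed to be already sorted by key, as it is after the sort step in the pipeline.

```python
from itertools import groupby


def reduce_clusters(lines):
    """Yield (cluster_id, center_x, center_y, points) for key-sorted lines."""
    # Each record is "id<TAB>x<TAB>y"; strip whitespace around every field
    # and ignore blank lines.
    rows = ([field.strip() for field in line.split('\t')]
            for line in lines if line.strip())
    # groupby only merges adjacent rows, so the input must be sorted by id.
    for cluster_id, group in groupby(rows, key=lambda row: row[0]):
        points = [(float(row[1]), float(row[2])) for row in group]
        center_x = sum(x for x, _ in points) / len(points)
        center_y = sum(y for _, y in points) / len(points)
        yield cluster_id, center_x, center_y, points


# Hypothetical sample input: two points in cluster 1, one in cluster 2.
sample = ["1\t32\t45\n", "1\t23\t67\n", "2\t98\t09\n"]
for cluster_id, cx, cy, points in reduce_clusters(sample):
    print('%s %s %s %s' % (cx, cy, cluster_id, points))
```

To use it as a real reducer, pass sys.stdin instead of the sample list.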
Also, in kmeansMapper.py, a closing parenthesis is missing on the line that calculates euclidDist. I have corrected that as well.
#!/usr/bin/python
import sys
import math

# load the cluster centers, one "x y" pair per line
fd = open('centers.txt', 'r')
centers = []
for line in fd:
    line = line.strip()
    # split('') raises "ValueError: empty separator"; split() splits on whitespace
    vals = line.split()
    centers.extend([vals])
fd.close()

for line in sys.stdin:
    line = line.strip()
    vals = line.split()
    clusterNum = None
    distance = None
    i = 0
    # compare to each center and store the smallest distance
    for center in centers:
        # the original line was cut off after float(vals[1]); the y term below
        # is completed on the assumption that a plain 2-D Euclidean distance was intended
        euclidDist = math.sqrt((float(vals[0]) - float(center[0]))**2 +
                               (float(vals[1]) - float(center[1]))**2)
        if clusterNum:
            if euclidDist < distance:
                clusterNum = i+1
                distance = euclidDist
        else:  # always record the first cluster
            clusterNum = i+1
            distance = euclidDist
        i += 1
    print clusterNum, '\t', vals[0], '\t', vals[1]
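The nearest-center search in the mapper can also be pulled out into a small function that is easy to test on its own. This is only a sketch in Python 3 syntax (an assumption, since the thread targets Python 2); nearest_center is a hypothetical helper name, the centers are the four values listed for centers.txt, and the two points are taken from testdata2.txt.

```python
import math


def nearest_center(point, centers):
    """Return (1-based index of the closest center, distance to it)."""
    best_index = None
    best_distance = None
    for index, center in enumerate(centers, start=1):
        # math.hypot(dx, dy) is sqrt(dx**2 + dy**2), the 2-D Euclidean distance
        dist = math.hypot(point[0] - center[0], point[1] - center[1])
        if best_distance is None or dist < best_distance:
            best_index, best_distance = index, dist
    return best_index, best_distance


# Centers as listed for centers.txt in the question.
centers = [(4.0, 32.0), (55.0, 20.0), (39.0, 8.0), (17.0, 11.0)]
for point in [(32.0, 45.0), (98.0, 9.0)]:
    cluster, dist = nearest_center(point, centers)
    # Same "cluster<TAB>x<TAB>y" record shape that the reducer splits on.
    print('%d\t%s\t%s' % (cluster, point[0], point[1]))
```

Comparing against None rather than testing the cluster number for truthiness keeps the first-center case explicit and would still work if cluster numbering started at 0.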
It should work now. Please check and let me know.
Happy coding!
Thanks