Question

Need assistance with syntax errors in two Python scripts: tester: cat testdata2.txt | python kmeansMapper.py | sort...

Need assistance with syntax errors in two Python scripts:

tester:

cat testdata2.txt | python kmeansMapper.py | sort | python kmeansReducer.py
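The tester command appears to imitate a Hadoop Streaming-style job locally: the mapper writes tab-separated key/value lines, `sort` brings identical keys next to each other (standing in for the shuffle phase), and the reducer reads the sorted stream. A minimal in-process sketch of that flow, assuming the mapper and reducer are written as plain Python functions over lines; `emulate_pipeline`, `toy_mapper` and `toy_reducer` are hypothetical names, not part of the scripts:

```python
from typing import Callable, Iterable, Iterator, List


def emulate_pipeline(lines: Iterable[str],
                     mapper: Callable[[str], Iterable[str]],
                     reducer: Callable[[Iterable[str]], Iterator[str]]) -> List[str]:
    """Rough equivalent of: cat input | mapper | sort | reducer."""
    mapped = [out for line in lines for out in mapper(line)]  # map step
    mapped.sort()  # plain lexicographic sort, like the Unix `sort` step
    return list(reducer(mapped))  # reducer sees equal keys adjacent


def toy_mapper(line: str) -> Iterable[str]:
    # Hypothetical mapper: key each "x y" point by whether x is below 50.
    x, y = line.split()
    key = "low" if float(x) < 50 else "high"
    yield "%s\t%s\t%s" % (key, x, y)


def toy_reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    # Hypothetical reducer: count how many consecutive lines share a key.
    current, count = None, 0
    for line in sorted_lines:
        key = line.split("\t")[0]
        if key != current and current is not None:
            yield "%s\t%d" % (current, count)
            count = 0
        current = key
        count += 1
    if current is not None:
        yield "%s\t%d" % (current, count)


print(emulate_pipeline(["32 45", "98 09", "23 67", "56 87"], toy_mapper, toy_reducer))
```

The reducer only works because of the sort step: it emits a result each time the key changes, so equal keys must arrive together.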

#kmeansReducer.py

#!/usr/bin/python

import sys

currId = None # this is the "current" key

currXs = []

currYs = []

id = None

# The input comes from standard input (line by line)

for line in sys.stdin:

    line = line.strip()

    ln = line.split('\t')

    id = ln[0]

if currId == id:

    currXs.append(float(ln[1]))

    currYs.append(float(ln[2]))

else:

  if currId:

    #calculate center

    centerX = sum(currXs)/len(currXs)

    centerY = sum(currYs)/len(currYs)

    print '%s %s %s %s' % (centerX, centerY, currId, zip(currXs, currYs))

    currXs = []

    currYs = []

currId = id

currXs.append(float(ln[1]))

currYs.append(float(ln[2]))

# output the last key

if currId == id:

    #calculate center

    centerX = sum(currXs)/len(currXs)

    centerY = sum(currYs)/len(currYs)

    print '%s %s %s %s' % (centerX, centerY, currId, zip(currXs, currYs))

and

kmeansMapper.py

#!/usr/bin/python

import sys

import math

fd = open('centers.txt', 'r')

centers = []

for line in fd:

    line = line.strip()

    vals = line.split('')

    centers.extend([vals])

fd.close()

for line in sys.stdin:

    line = line.strip()

    vals = line.split('')

    clusterNum = None

    distance = None

    i = 0

    #compare to each center and store the smallest distance

    for center in centers:

        euclidDist = math.sqrt((float(vals[0])-float(center[0]))**2 + (float(vals[1])

if clusterNum:

            if euclidDist < distance:

                clusterNum = i+1

                distance = euclidDist

        else: #always record the first cluster

            clusterNum = i+1

            distance = euclidDist

        i += 1

print clusterNum, '\t', vals[0], '\t', vals[1]

testdata2.txt (input points): {32 45, 23 67, 98 09, 56 87, 13 65, 87 67, 90 78,...}

centers.txt (initial centers): {4 32, 55 20, 39 8, 17 11}
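To sanity-check what the mapper should emit for this data, here is a small worked sketch that assigns the first few listed points to the nearest of the four centers, numbering clusters from 1 as the mapper does. It assumes each point is an "x y" pair separated by whitespace; `nearest_cluster` and `CENTERS` are hypothetical names used only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Initial centers as listed for centers.txt
CENTERS: List[Point] = [(4, 32), (55, 20), (39, 8), (17, 11)]


def nearest_cluster(point: Point, centers: Sequence[Point]) -> int:
    """Return the 1-based index of the center closest to `point` (Euclidean distance)."""
    distances = [math.hypot(point[0] - cx, point[1] - cy) for cx, cy in centers]
    return distances.index(min(distances)) + 1


for line in ["32 45", "23 67", "98 09", "56 87"]:
    x, y = (float(v) for v in line.split())
    print(nearest_cluster((x, y), CENTERS), line)
# prints:
# 1 32 45
# 1 23 67
# 2 98 09
# 2 56 87
```

For example, for 98 09 the squared distances to the four centers are 9365, 1970, 3482 and 6565, so it goes to cluster 2.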

Comment: what's your question? Reply: yes, these are a mapper script and a reducer script.

Answer #1

Hi,

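Hope you are enjoying coding. I am assuming you are using Python 2, since your scripts use print statements.

In your kmeansReducer.py the indentation is wrong: the if currId == id: / else: block has to be indented inside the for line in sys.stdin: loop so that it runs once per input line, and the center calculation, the print and the list resets belong inside if currId:. I have corrected it: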

#!/usr/bin/python
import sys

currId = None  # this is the "current" key (cluster id)
currXs = []
currYs = []
id = None

# The input comes from standard input (line by line), already sorted by key
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue  # skip empty lines
    ln = line.split('\t')
    id = ln[0].strip()

    if currId == id:
        currXs.append(float(ln[1]))
        currYs.append(float(ln[2]))
    else:
        if currId:
            # key changed: calculate and emit the center of the previous cluster
            centerX = sum(currXs) / len(currXs)
            centerY = sum(currYs) / len(currYs)
            # print(...) with a single argument behaves the same in Python 2 and 3
            print('%s %s %s %s' % (centerX, centerY, currId, list(zip(currXs, currYs))))
            currXs = []
            currYs = []
        currId = id
        currXs.append(float(ln[1]))
        currYs.append(float(ln[2]))

# output the last key
if currId is not None:
    # calculate center
    centerX = sum(currXs) / len(currXs)
    centerY = sum(currYs) / len(currYs)
    print('%s %s %s %s' % (centerX, centerY, currId, list(zip(currXs, currYs))))
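Because sort guarantees that all lines with the same cluster id are adjacent, the manual currId bookkeeping can also be expressed with itertools.groupby. This is an alternative sketch, not a required change; it assumes the mapper's "id<TAB>x<TAB>y" line format, and reduce_clusters is a hypothetical name.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def reduce_clusters(sorted_lines: Iterable[str]) -> Iterator[Tuple[str, float, float, List[Tuple[float, float]]]]:
    """Yield (cluster_id, centerX, centerY, points) for each run of equal ids."""
    # Split each non-empty line into [id, x, y] on tabs.
    rows = (line.strip().split("\t") for line in sorted_lines if line.strip())
    # groupby only merges *consecutive* equal keys, which is why the input must be sorted.
    for cluster_id, group in itertools.groupby(rows, key=lambda r: r[0].strip()):
        points = [(float(r[1]), float(r[2])) for r in group]
        center_x = sum(p[0] for p in points) / len(points)
        center_y = sum(p[1] for p in points) / len(points)
        yield cluster_id, center_x, center_y, points
```

To use it as the reducer, loop over reduce_clusters(sys.stdin) and print each tuple in the same "centerX centerY id points" format.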

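Also, in your kmeansMapper.py, the line that calculates euclidDist is missing a closing bracket and, as posted, the y term (float(vals[1]) - float(center[1]))**2. Two more problems: line.split('') raises "ValueError: empty separator", so use line.split() to split on whitespace, and the if clusterNum: ... i += 1 block must be indented inside the for center in centers: loop so that every center is compared. I have corrected these as well: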

#!/usr/bin/python
import sys
import math

# Load the current centers: one "x y" pair per line
fd = open('centers.txt', 'r')
centers = []
for line in fd:
    vals = line.split()  # split on whitespace; line.split('') raises ValueError
    if vals:
        centers.append(vals)
fd.close()

for line in sys.stdin:
    vals = line.split()
    if not vals:
        continue  # skip empty lines
    clusterNum = None
    distance = None
    i = 0
    # compare to each center and store the smallest distance
    for center in centers:
        euclidDist = math.sqrt((float(vals[0]) - float(center[0]))**2 +
                               (float(vals[1]) - float(center[1]))**2)
        if clusterNum:
            if euclidDist < distance:
                clusterNum = i + 1
                distance = euclidDist
        else:  # always record the first cluster
            clusterNum = i + 1
            distance = euclidDist
        i += 1

    # emit "clusterNum<TAB>x<TAB>y" for the reducer
    print('%s\t%s\t%s' % (clusterNum, vals[0], vals[1]))
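Both mapper problems are easy to confirm in isolation: str.split('') raises ValueError because an empty separator is not allowed, while str.split() with no argument splits on any run of whitespace; and the Euclidean distance needs both squared differences inside the square root, which math.hypot computes directly. A small check, with hypothetical helper names:

```python
import math


def split_with_empty_separator(text: str) -> str:
    """Return the error message Python gives for text.split('')."""
    try:
        text.split('')
    except ValueError as exc:
        return str(exc)
    return "no error"


def euclid(x1: float, y1: float, x2: float, y2: float) -> float:
    # Both squared differences go inside the square root.
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


print(split_with_empty_separator("32 45"))  # empty separator
print("32 45".split())                      # ['32', '45']
print(euclid(32, 45, 4, 32), math.hypot(32 - 4, 45 - 32))
```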

Now it should work. Please check and let me know.

Happy coding!

Thanks
