Wednesday, April 3, 2013

Weird Combiner in Hadoop

I wrote my own MapReduce functions for the K-means problem.

When I tested my code on a small dataset (smaller than 1M), it worked very well. However, when I applied it to a larger one (more than 100M), it returned errors.

I am not completely sure about my explanation, but I am about 90% confident it is right:


  • When I test a small dataset, the combiner is called just once between the mapper and the reducer.

  • However, when I test a large dataset, the combiner is called more than once between one mapper and reducer pair. (This seems weird to me, and I am not certain, because I haven't studied the Hadoop source code.)
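
As far as I know, this fits how Hadoop treats combiners. They are an optimization that the framework may run zero, one, or several times on a map task's output, so the final result must not depend on how often the combiner runs. Below is a minimal sketch of why that matters for K-means. It is plain Java with no Hadoop dependency, and every name in it (CombinerSketch, Partial, point, combine, centroid, naiveMean) is hypothetical. It assumes, purely for illustration, a combiner that emits an averaged centroid, which is not necessarily what my repository does. Averaging an average gives the wrong answer, while passing (sum, count) pairs and dividing only in the reducer stays correct.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: hypothetical names, 1-D points, no Hadoop classes.
final class CombinerSketch {

    // Partial result for one cluster: running sum and number of points covered.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // What a mapper would emit for a single point.
    static Partial point(double x) {
        return new Partial(x, 1);
    }

    // Safe combine step: adding sums and counts is associative and commutative,
    // so applying it zero, one, or many times does not change the final result.
    static Partial combine(List<Partial> parts) {
        double sum = 0.0;
        long count = 0;
        for (Partial p : parts) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // Reduce step: the only place where the sum is divided by the count.
    static double centroid(List<Partial> parts) {
        Partial total = combine(parts);
        return total.sum / total.count;
    }

    // Unsafe combine step: emits a bare mean and forgets how many points it covers.
    static double naiveMean(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        // Points 1, 2, 3, 10 belong to one cluster; the true centroid is 4.0.
        // Suppose the combiner first sees {1, 2, 3} and its output is later
        // combined or reduced together with the remaining point 10.
        Partial first = combine(Arrays.asList(point(1.0), point(2.0), point(3.0)));
        double safe = centroid(Arrays.asList(first, point(10.0)));

        double firstMean = naiveMean(Arrays.asList(1.0, 2.0, 3.0));
        double unsafe = naiveMean(Arrays.asList(firstMean, 10.0));

        System.out.println("sum/count partials: " + safe);   // 4.0
        System.out.println("mean of means:      " + unsafe); // 6.0
    }
}
```

If a combiner behaves like naiveMean here, a small input that is combined only once can look correct while a larger input, whose map output is combined in several passes, produces wrong centroids or breaks the reducer's assumptions about its input.
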
You can find my source code for K-Means here:
https://github.com/zhouhao/Hadoop_KMeans_MapReduce_Java/
