When I test my code on a small dataset (smaller than 1M), it works very well. However, when I run it on a larger one (more than 100M), it returns errors.
I am not completely sure about my explanation, but I am about 90% confident it is right:
- When I test with the small dataset, the combiner is called just once between the mapper and the reducer.
- However, when I test with the large dataset, the combiner is called more than once between a single mapper and reducer pair. (This seems weird to me, and I am not certain, because I haven't studied the Hadoop source code.)
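For what it's worth, Hadoop does not promise how many times a combiner runs: it may be applied zero, one, or several times to a mapper's output. So a combiner has to emit values of the same type it receives, and running it again must not change the final result. Below is a minimal sketch of that idea for K-Means in plain Java, without the Hadoop API. The names `Partial`, `combine`, and `reduce` are hypothetical and not taken from the repository, and the points are one-dimensional only to keep it short. The combine step only adds up sums and counts; the division that produces the centroid happens in the reduce step alone.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {
    /** Partial aggregate for one cluster: coordinate sum plus point count (hypothetical type). */
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** A single mapped point is a partial with count 1. */
    static Partial point(double x) {
        return new Partial(x, 1);
    }

    /** Combine step: merges partials into one partial of the SAME type, so it can be re-applied. */
    static Partial combine(List<Partial> parts) {
        double sum = 0.0;
        long count = 0;
        for (Partial p : parts) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** Reduce step: the only place where the sum is divided by the count. */
    static double reduce(List<Partial> parts) {
        Partial total = combine(parts);
        return total.sum / total.count;
    }

    public static void main(String[] args) {
        List<Partial> mapped = Arrays.asList(point(1.0), point(2.0), point(3.0), point(6.0));

        // Case 1: the combiner never runs.
        double none = reduce(mapped);

        // Case 2: the combiner runs once over all mapper output.
        double once = reduce(Arrays.asList(combine(mapped)));

        // Case 3: the combiner runs on two separate chunks, then again on their results.
        Partial a = combine(mapped.subList(0, 1));
        Partial b = combine(mapped.subList(1, 4));
        double twice = reduce(Arrays.asList(combine(Arrays.asList(a, b))));

        System.out.println(none + " " + once + " " + twice); // prints "3.0 3.0 3.0"
    }
}
```

If the combiner instead emitted an already averaged value, or assumed that each input was a single raw point, the second pass in case 3 would produce a wrong centroid. That would match a job that works on small input and fails on large input.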
You can find my source code for K-Means here:
https://github.com/zhouhao/Hadoop_KMeans_MapReduce_Java/