RDD.countByKey
Count the number of elements for each key, and return the result to the driver as a dictionary mapping each key to its count. The values in the pairs are ignored; only the keys are counted.
Examples
>>> rdd = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
>>> sorted(rdd.countByKey().items())
[('a', 2), ('b', 1)]
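
The counting semantics can be illustrated without a Spark cluster. The following is a minimal pure-Python sketch, not PySpark's implementation: the helper name `count_by_key_local` is hypothetical and operates on an ordinary list of pairs rather than an RDD. It shows that the result counts occurrences of each key and does not sum the values.

```python
from collections import defaultdict


def count_by_key_local(pairs):
    """Hypothetical local stand-in for RDD.countByKey (not part of PySpark).

    Counts how many (key, value) pairs share each key; values are ignored.
    """
    counts = defaultdict(int)
    for key, _value in pairs:
        counts[key] += 1
    # Return a plain dict, mirroring the dictionary returned to the caller.
    return dict(counts)


# The value 5 is deliberate: the count for "a" is 2, not 1 + 5.
pairs = [("a", 1), ("b", 1), ("a", 5)]
result = count_by_key_local(pairs)
print(sorted(result.items()))  # [('a', 2), ('b', 1)]
```

Because the whole dictionary is returned to a single process, this operation is best suited to data with a modest number of distinct keys.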