RDD.fullOuterJoin(other, numPartitions=None)
Perform a full outer join of self and other.
For each element (k, v) in self, the resulting RDD will either contain all pairs (k, (v, w)) for w in other, or the pair (k, (v, None)) if no elements in other have key k.
Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (v, w)) for v in self, or the pair (k, (None, w)) if no elements in self have key k.
Hash-partitions the resulting RDD into the given number of partitions.
New in version 1.2.0.
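Parameters
other : RDD
    another RDD
numPartitions : int, optional
    the number of partitions in the new RDD
Returns
RDD
    an RDD containing all pairs of elements with matching keys, plus a pair padded with None for each element whose key appears in only one of the two RDDs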
See also
RDD.join()
RDD.leftOuterJoin()
RDD.rightOuterJoin()
pyspark.sql.DataFrame.join()
Examples
>>> rdd1 = sc.parallelize([("a", 1), ("b", 4)])
>>> rdd2 = sc.parallelize([("a", 2), ("c", 8)])
>>> sorted(rdd1.fullOuterJoin(rdd2).collect())
[('a', (1, 2)), ('b', (4, None)), ('c', (None, 8))]
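The matching rules described at the top of this page can also be sketched in plain Python, which may help when reasoning about duplicate keys. This is only an illustrative model, not how Spark implements the join: the helper name `full_outer_join` is hypothetical, it works on ordinary lists of (key, value) pairs, and it ignores partitioning entirely.

```python
from collections import defaultdict


def full_outer_join(left, right):
    """Model full outer join semantics on lists of (key, value) pairs.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    # Group the values of each side by key.
    left_by_key = defaultdict(list)
    right_by_key = defaultdict(list)
    for k, v in left:
        left_by_key[k].append(v)
    for k, w in right:
        right_by_key[k].append(w)

    result = []
    # Visit every key that appears on either side.
    for k in left_by_key.keys() | right_by_key.keys():
        vs = left_by_key.get(k)
        ws = right_by_key.get(k)
        if vs and ws:
            # Key on both sides: emit every (v, w) combination.
            result.extend((k, (v, w)) for v in vs for w in ws)
        elif vs:
            # Key only on the left: pad the right slot with None.
            result.extend((k, (v, None)) for v in vs)
        else:
            # Key only on the right: pad the left slot with None.
            result.extend((k, (None, w)) for w in ws)
    return result


pairs = full_outer_join([("a", 1), ("a", 3), ("b", 4)], [("a", 2), ("c", 8)])
print(sorted(pairs))
# [('a', (1, 2)), ('a', (3, 2)), ('b', (4, None)), ('c', (None, 8))]
```

Note that the key "a" appears twice on the left, so it yields two joined pairs, while "b" and "c" each appear on one side only and are padded with None.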