Class AbstractBulkWriterContext
- java.lang.Object
  - org.apache.cassandra.spark.bulkwriter.AbstractBulkWriterContext
- All Implemented Interfaces:
com.esotericsoftware.kryo.KryoSerializable, BulkWriterContext
- Direct Known Subclasses:
CassandraBulkWriterContext, CassandraCoordinatedBulkWriterContext
public abstract class AbstractBulkWriterContext extends java.lang.Object implements BulkWriterContext, com.esotericsoftware.kryo.KryoSerializable
Abstract base class for BulkWriterContext implementations.
Serialization Architecture:
This class is NOT serialized directly. Instead:
- The driver creates the BulkWriterContext using the driver constructor.
- The driver extracts a BulkWriterConfig in the CassandraBulkSourceRelation constructor.
- The BulkWriterConfig is broadcast to executors.
- Executors reconstruct the BulkWriterContext via BulkWriterContext.from(BulkWriterConfig).
Broadcastable wrappers used in BulkWriterConfig:
- IBroadcastableClusterInfo → reconstructs to CassandraClusterInfo or CassandraClusterInfoGroup
- BroadcastableJobInfo → reconstructs to CassandraJobInfo
- BroadcastableSchemaInfo → reconstructs to CassandraSchemaInfo
Implements KryoSerializable with a fail-fast approach to detect missing Kryo registration.
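A minimal sketch of the driver/executor split described above, under simplifying assumptions: `WriterConfig` and `WriterContext` are hypothetical stand-ins for BulkWriterConfig and BulkWriterContext, the `toConfig()` and `roundTrip()` helpers are hypothetical (in the real flow the config is extracted in the CassandraBulkSourceRelation constructor), and a Java-serialization round trip takes the place of Spark's broadcast.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for BulkWriterConfig: immutable, serializable, and
// holding only values that were already computed on the driver.
final class WriterConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    final String keyspace;
    final String lowestCassandraVersion;
    final int sparkDefaultParallelism;

    WriterConfig(String keyspace, String lowestCassandraVersion, int sparkDefaultParallelism) {
        this.keyspace = keyspace;
        this.lowestCassandraVersion = lowestCassandraVersion;
        this.sparkDefaultParallelism = sparkDefaultParallelism;
    }
}

// Hypothetical stand-in for BulkWriterContext. It is deliberately NOT Serializable.
final class WriterContext {
    final String keyspace;
    final String lowestCassandraVersion;
    final int sparkDefaultParallelism;

    // Driver path: build components fresh. "4.0.0" is a placeholder for a value
    // the real driver would discover by contacting the cluster.
    WriterContext(String keyspace, int sparkDefaultParallelism) {
        this.keyspace = keyspace;
        this.lowestCassandraVersion = "4.0.0";
        this.sparkDefaultParallelism = sparkDefaultParallelism;
    }

    // Executor path: rebuild purely from broadcast values, with no remote calls.
    private WriterContext(WriterConfig config) {
        this.keyspace = config.keyspace;
        this.lowestCassandraVersion = config.lowestCassandraVersion;
        this.sparkDefaultParallelism = config.sparkDefaultParallelism;
    }

    // Factory used on executors, analogous to BulkWriterContext.from(BulkWriterConfig).
    static WriterContext from(WriterConfig config) {
        return new WriterContext(config);
    }

    // Hypothetical helper: extract the broadcastable config on the driver.
    WriterConfig toConfig() {
        return new WriterConfig(keyspace, lowestCassandraVersion, sparkDefaultParallelism);
    }
}

final class BroadcastDemo {
    // Serialize and deserialize a value; this stands in for Spark's broadcast.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        WriterContext driverContext = new WriterContext("my_keyspace", 200); // 1. driver builds the context
        WriterConfig config = driverContext.toConfig();                      // 2. driver extracts the config
        WriterConfig received = roundTrip(config);                           // 3. config is "broadcast"
        WriterContext executorContext = WriterContext.from(received);        // 4. executor rebuilds the context
        System.out.println(executorContext.keyspace + " " + executorContext.sparkDefaultParallelism);
    }
}
```

In this sketch only the small immutable config is serialized; each executor builds its own context from it, so the context class never needs to be serializable.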
-
Field Summary
- static java.lang.String KRYO_REGISTRATION_WARNING: Use the implementation of the KryoSerializable interface as a detection device to make sure SbwKryoRegistrator is properly in place.
-
Constructor Summary
- protected AbstractBulkWriterContext(BulkSparkConf conf, org.apache.spark.sql.types.StructType structType, int sparkDefaultParallelism): Constructor for driver usage.
- protected AbstractBulkWriterContext(BulkWriterConfig config): Constructor for executor usage.
-
Method Summary
- org.apache.cassandra.bridge.CassandraBridge bridge()
- protected org.apache.cassandra.bridge.CassandraBridge buildCassandraBridge()
- protected abstract ClusterInfo buildClusterInfo()
- protected JobInfo buildJobInfo()
- protected org.apache.cassandra.spark.common.stats.JobStatsPublisher buildJobStatsPublisher()
- protected SchemaInfo buildSchemaInfo(org.apache.spark.sql.types.StructType structType)
- protected TransportContext buildTransportContext(boolean isOnDriver)
- BulkSparkConf bulkSparkConf()
- ClusterInfo cluster()
- protected TransportContext createTransportContext(boolean isOnDriver)
- protected java.lang.String findLowestCassandraVersion()
- protected abstract MultiClusterContainer<java.util.UUID> generateRestoreJobIds(): Generate the restore job IDs used in the receiving Cassandra Sidecar clusters.
- protected TableSchema initializeTableSchema(BulkSparkConf conf, org.apache.spark.sql.types.StructType dfSchema, TableInfoProvider tableInfoProvider, java.lang.String lowestCassandraVersion)
- JobInfo job()
- org.apache.cassandra.spark.common.stats.JobStatsPublisher jobStats()
- protected java.lang.String lowestCassandraVersion()
- void read(com.esotericsoftware.kryo.Kryo kryo, com.esotericsoftware.kryo.io.Input input)
- protected ClusterInfo reconstructClusterInfoOnExecutor(IBroadcastableClusterInfo clusterInfo): Reconstructs ClusterInfo on executors from its broadcastable form.
- protected JobInfo reconstructJobInfoOnExecutor(BroadcastableJobInfo jobInfo): Reconstructs JobInfo on executors from BroadcastableJobInfo.
- protected SchemaInfo reconstructSchemaInfoOnExecutor(BroadcastableSchemaInfo schemaInfo): Reconstructs SchemaInfo on executors from BroadcastableSchemaInfo.
- SchemaInfo schema()
- void shutdown()
- protected int sparkDefaultParallelism()
- TransportContext transportContext()
- protected abstract void validateKeyspaceReplication()
- void write(com.esotericsoftware.kryo.Kryo kryo, com.esotericsoftware.kryo.io.Output output)
-
Field Detail
-
KRYO_REGISTRATION_WARNING
public static final java.lang.String KRYO_REGISTRATION_WARNING
Use the implementation of the KryoSerializable interface as a detection device to make sure SbwKryoRegistrator is properly in place. If this class is serialized by Kryo, it means the job is not set up correctly, so we log and fail. This failure occurs early in the job and is very clear, so users can quickly fix their code and get up and running again, rather than hitting a seemingly random NullPointerException later in the job.
- See Also:
- Constant Field Values
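The detection-device idea can be illustrated without Kryo. This sketch swaps in plain Java serialization as the mechanism: the hypothetical `GuardedContext` fails as soon as anything tries to serialize it, mirroring the log-and-fail behavior described above for an unregistered Kryo path. The class names and the message text are illustrative assumptions, not the library's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical context class that must never be serialized directly, so its
// serialization hook fails immediately with an actionable message.
final class GuardedContext implements Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical message, modeled on the intent of KRYO_REGISTRATION_WARNING.
    static final String REGISTRATION_WARNING =
            "GuardedContext was serialized directly; the serializer registration is missing";

    private void writeObject(ObjectOutputStream out) throws IOException {
        // Reaching this hook means the wrong serialization path is in use: fail fast.
        throw new IllegalStateException(REGISTRATION_WARNING);
    }
}

final class FailFastDemo {
    // Returns the failure message, or null if serialization succeeded.
    static String trySerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trySerialize(new GuardedContext()));
    }
}
```

The failure surfaces at the first serialization attempt with a message that names the fix, instead of as a later, harder-to-trace error.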
-
Constructor Detail
-
AbstractBulkWriterContext
protected AbstractBulkWriterContext(@NotNull BulkSparkConf conf, @NotNull org.apache.spark.sql.types.StructType structType, @NotNull int sparkDefaultParallelism)
Constructor for driver usage. Builds all components fresh on the driver.
- Parameters:
conf - Bulk Spark configuration
structType - DataFrame schema
sparkDefaultParallelism - Spark default parallelism
-
AbstractBulkWriterContext
protected AbstractBulkWriterContext(@NotNull BulkWriterConfig config)
Constructor for executor usage. Reconstructs components on executors from the broadcast configuration. This is used by the factory method BulkWriterContext.from(BulkWriterConfig).
- Parameters:
config - immutable configuration for the bulk writer with pre-computed values
-
Method Detail
-
bulkSparkConf
public final BulkSparkConf bulkSparkConf()
-
sparkDefaultParallelism
protected final int sparkDefaultParallelism()
-
lowestCassandraVersion
protected java.lang.String lowestCassandraVersion()
-
buildClusterInfo
protected abstract ClusterInfo buildClusterInfo()
-
reconstructClusterInfoOnExecutor
protected ClusterInfo reconstructClusterInfoOnExecutor(IBroadcastableClusterInfo clusterInfo)
Reconstructs ClusterInfo on executors from its broadcastable form. This method is only called on executors when reconstructing a BulkWriterContext from the broadcast BulkWriterConfig. Each broadcastable type knows how to reconstruct itself into the appropriate full ClusterInfo implementation.
- Parameters:
clusterInfo - the IBroadcastableClusterInfo received from the broadcast
- Returns:
- reconstructed ClusterInfo (CassandraClusterInfo or CassandraClusterInfoGroup)
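"Each broadcastable type knows how to reconstruct itself" amounts to polymorphic dispatch, sketched here with hypothetical types: `ClusterView` stands in for ClusterInfo, `BroadcastableCluster` for IBroadcastableClusterInfo, and the single/group records for the single-cluster and cluster-group cases. The executor-side method needs no `instanceof` checks. Records assume Java 16 or later.

```java
import java.io.Serializable;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical runtime view of a cluster (stand-in for ClusterInfo).
interface ClusterView {
    String describe();
}

// Hypothetical broadcastable form (stand-in for IBroadcastableClusterInfo):
// each implementation knows which full type to rebuild.
interface BroadcastableCluster extends Serializable {
    ClusterView reconstruct();
}

record SingleCluster(String clusterId) implements ClusterView {
    public String describe() { return "single:" + clusterId; }
}

record ClusterGroup(List<ClusterView> members) implements ClusterView {
    public String describe() {
        return members.stream().map(ClusterView::describe)
                .collect(Collectors.joining(",", "group[", "]"));
    }
}

record BroadcastableSingle(String clusterId) implements BroadcastableCluster {
    public ClusterView reconstruct() { return new SingleCluster(clusterId); }
}

record BroadcastableGroup(List<BroadcastableSingle> members) implements BroadcastableCluster {
    public ClusterView reconstruct() {
        // Rebuild each member, then wrap them in a group view.
        List<ClusterView> rebuilt = members.stream()
                .map(BroadcastableSingle::reconstruct)
                .collect(Collectors.toList());
        return new ClusterGroup(rebuilt);
    }
}

final class ReconstructDemo {
    // Dynamic dispatch selects the right full type; no type switches needed.
    static ClusterView reconstructOnExecutor(BroadcastableCluster broadcast) {
        return broadcast.reconstruct();
    }

    public static void main(String[] args) {
        BroadcastableCluster single = new BroadcastableSingle("cluster1");
        BroadcastableCluster group = new BroadcastableGroup(
                List.of(new BroadcastableSingle("cluster1"), new BroadcastableSingle("cluster2")));
        System.out.println(reconstructOnExecutor(single).describe()); // single:cluster1
        System.out.println(reconstructOnExecutor(group).describe());  // group[single:cluster1,single:cluster2]
    }
}
```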
-
reconstructJobInfoOnExecutor
protected JobInfo reconstructJobInfoOnExecutor(BroadcastableJobInfo jobInfo)
Reconstructs JobInfo on executors from BroadcastableJobInfo. This method is only called on executors when reconstructing a BulkWriterContext from the broadcast BulkWriterConfig. It rebuilds CassandraJobInfo with a TokenPartitioner reconstructed from the broadcastable partition mappings.
- Parameters:
jobInfo - the BroadcastableJobInfo received from the broadcast
- Returns:
- reconstructed CassandraJobInfo
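To make "a TokenPartitioner reconstructed from the broadcastable partition mappings" concrete, here is a hedged sketch. It assumes the mapping is a plain map from each token range's start token to a Spark partition index; `PartitionMappings` and `SimpleTokenPartitioner` are hypothetical names, and the real TokenPartitioner may store its ranges quite differently. For simplicity, range starts are treated as inclusive here, whereas Cassandra token ranges are conventionally start-exclusive and end-inclusive.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical broadcastable mapping: start token of each range (inclusive in
// this sketch) -> index of the Spark partition that owns the range.
final class PartitionMappings implements Serializable {
    private static final long serialVersionUID = 1L;
    final Map<Long, Integer> rangeStartToPartition;

    PartitionMappings(Map<Long, Integer> rangeStartToPartition) {
        this.rangeStartToPartition = Map.copyOf(rangeStartToPartition);
    }
}

// Hypothetical executor-side partitioner rebuilt from the broadcast mapping.
final class SimpleTokenPartitioner {
    private final NavigableMap<Long, Integer> byRangeStart;

    SimpleTokenPartitioner(PartitionMappings mappings) {
        // Rebuild the sorted lookup structure locally; no cluster metadata is fetched.
        this.byRangeStart = new TreeMap<>(mappings.rangeStartToPartition);
        if (!byRangeStart.containsKey(Long.MIN_VALUE)) {
            throw new IllegalArgumentException("mappings must start at the minimum token");
        }
    }

    // The owning range is the one with the greatest start token <= token.
    int partitionFor(long token) {
        return byRangeStart.floorEntry(token).getValue();
    }

    public static void main(String[] args) {
        PartitionMappings broadcast = new PartitionMappings(Map.of(Long.MIN_VALUE, 0, 0L, 1, 1000L, 2));
        SimpleTokenPartitioner partitioner = new SimpleTokenPartitioner(broadcast);
        System.out.println(partitioner.partitionFor(42L)); // prints 1
    }
}
```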
-
reconstructSchemaInfoOnExecutor
protected SchemaInfo reconstructSchemaInfoOnExecutor(BroadcastableSchemaInfo schemaInfo)
Reconstructs SchemaInfo on executors from BroadcastableSchemaInfo. This method is only called on executors when reconstructing a BulkWriterContext from the broadcast BulkWriterConfig. It reconstructs CassandraSchemaInfo and TableSchema from the broadcast data (no Sidecar calls needed).
- Parameters:
schemaInfo - the BroadcastableSchemaInfo received from the broadcast
- Returns:
- reconstructed CassandraSchemaInfo
-
validateKeyspaceReplication
protected abstract void validateKeyspaceReplication()
-
buildJobInfo
protected JobInfo buildJobInfo()
-
generateRestoreJobIds
protected abstract MultiClusterContainer<java.util.UUID> generateRestoreJobIds()
Generate the restore job IDs used in the receiving Cassandra Sidecar clusters. In coordinated write mode, each cluster gets its own unique UUID; in single-cluster write mode, the MultiClusterContainer contains a single entry.
- Returns:
- restore job ids that are unique per cluster
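The per-cluster ID rule can be sketched with a plain `Map` standing in for `MultiClusterContainer<UUID>`. The `RestoreJobIds` class, its methods, and the `"default"` single-cluster key are hypothetical; the sketch only shows the shape of the result: one random UUID per cluster in a coordinated write, exactly one entry in a single-cluster write.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for MultiClusterContainer<UUID>: cluster id -> restore job id.
final class RestoreJobIds {
    // Hypothetical key used when only one cluster is involved.
    static final String SINGLE_CLUSTER_KEY = "default";

    // Coordinated write: one freshly generated UUID per participating cluster.
    static Map<String, UUID> forCoordinatedWrite(List<String> clusterIds) {
        Map<String, UUID> ids = new LinkedHashMap<>();
        for (String clusterId : clusterIds) {
            ids.put(clusterId, UUID.randomUUID());
        }
        return Collections.unmodifiableMap(ids);
    }

    // Single-cluster write: the container holds exactly one entry.
    static Map<String, UUID> forSingleClusterWrite() {
        return Map.of(SINGLE_CLUSTER_KEY, UUID.randomUUID());
    }

    public static void main(String[] args) {
        System.out.println(forCoordinatedWrite(List.of("cluster1", "cluster2")));
        System.out.println(forSingleClusterWrite());
    }
}
```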
-
buildCassandraBridge
protected org.apache.cassandra.bridge.CassandraBridge buildCassandraBridge()
-
buildTransportContext
protected TransportContext buildTransportContext(boolean isOnDriver)
-
buildJobStatsPublisher
protected org.apache.cassandra.spark.common.stats.JobStatsPublisher buildJobStatsPublisher()
-
findLowestCassandraVersion
protected java.lang.String findLowestCassandraVersion()
-
buildSchemaInfo
protected SchemaInfo buildSchemaInfo(org.apache.spark.sql.types.StructType structType)
-
job
public JobInfo job()
- Specified by:
job in interface BulkWriterContext
-
cluster
public ClusterInfo cluster()
- Specified by:
cluster in interface BulkWriterContext
-
schema
public SchemaInfo schema()
- Specified by:
schema in interface BulkWriterContext
-
bridge
public org.apache.cassandra.bridge.CassandraBridge bridge()
- Specified by:
bridge in interface BulkWriterContext
-
jobStats
public org.apache.cassandra.spark.common.stats.JobStatsPublisher jobStats()
- Specified by:
jobStats in interface BulkWriterContext
-
transportContext
public TransportContext transportContext()
- Specified by:
transportContext in interface BulkWriterContext
-
shutdown
public void shutdown()
- Specified by:
shutdown in interface BulkWriterContext
-
initializeTableSchema
@NotNull protected TableSchema initializeTableSchema(@NotNull BulkSparkConf conf, @NotNull org.apache.spark.sql.types.StructType dfSchema, TableInfoProvider tableInfoProvider, java.lang.String lowestCassandraVersion)
-
createTransportContext
@NotNull protected TransportContext createTransportContext(boolean isOnDriver)
-
write
public void write(com.esotericsoftware.kryo.Kryo kryo, com.esotericsoftware.kryo.io.Output output)
- Specified by:
write in interface com.esotericsoftware.kryo.KryoSerializable
-
read
public void read(com.esotericsoftware.kryo.Kryo kryo, com.esotericsoftware.kryo.io.Input input)
- Specified by:
read in interface com.esotericsoftware.kryo.KryoSerializable