pyspark.sql.streaming.DataStreamWriter¶
-
class
pyspark.sql.streaming.
DataStreamWriter
(df: DataFrame)[source]¶ Interface used to write a streaming
DataFrame
to external storage systems (e.g. file systems, key-value stores, etc). UseDataFrame.writeStream
to access this.New in version 2.0.0.
Changed in version 3.5.0: Supports Spark Connect.
Notes
This API is evolving.
Examples
The example below uses Rate source that generates rows continuously. After that, we operate a modulo by 3, and then writes the stream out to the console. The streaming query stops in 3 seconds.
>>> import time >>> df = spark.readStream.format("rate").load() >>> df = df.selectExpr("value % 3 as v") >>> q = df.writeStream.format("console").start() >>> time.sleep(3) >>> q.stop()
Methods
foreach
(f)Sets the output of the streaming query to be processed using the provided writer
f
.foreachBatch
(func)Sets the output of the streaming query to be processed using the provided function.
format
(source)Specifies the underlying output data source.
option
(key, value)Adds an output option for the underlying data source.
options
(**options)Adds output options for the underlying data source.
outputMode
(outputMode)Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
partitionBy
(*cols)Partitions the output by the given columns on the file system.
queryName
(queryName)Specifies the name of the
StreamingQuery
that can be started withstart()
.start
([path, format, outputMode, …])Streams the contents of the
DataFrame
to a data source.toTable
(tableName[, format, outputMode, …])Starts the execution of the streaming query, which will continually output results to the given table as new data arrives.
trigger
(*[, processingTime, once, …])Set the trigger for the stream query.