pyspark.sql.streaming.DataStreamReader.schema
DataStreamReader.schema(schema: Union[pyspark.sql.types.StructType, str]) → pyspark.sql.streaming.readwriter.DataStreamReader

Specifies the input schema.
Some data sources (e.g. JSON) can infer the input schema automatically from data. By specifying the schema here, the underlying data source can skip the schema inference step, and thus speed up data loading.
New in version 2.0.0.
Changed in version 3.5.0: Supports Spark Connect.
Parameters
- schema : pyspark.sql.types.StructType or str
  a pyspark.sql.types.StructType object or a DDL-formatted string (for example, col0 INT, col1 DOUBLE).
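To make the DDL format above concrete, here is a minimal pure-Python sketch (not part of PySpark, and far simpler than Spark's real DDL parser, which also handles nested structs, arrays, backticks, and comments) that splits such a string into (column name, type) pairs:

```python
def parse_ddl(schema: str) -> list[tuple[str, str]]:
    """Toy illustration: split a flat DDL-formatted schema string
    such as "col0 INT, col1 DOUBLE" into (name, type) pairs.
    Not Spark's parser; assumes simple "name TYPE" fields only."""
    pairs = []
    for field in schema.split(","):
        # Split each field on the first space: name, then type.
        name, _, dtype = field.strip().partition(" ")
        pairs.append((name, dtype.strip()))
    return pairs

print(parse_ddl("col0 INT, col1 DOUBLE"))
# [('col0', 'INT'), ('col1', 'DOUBLE')]
```

In real code you would pass the DDL string directly to `schema()` (or build a `pyspark.sql.types.StructType`) rather than parsing it yourself; this sketch only shows what shape of string the parameter expects.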
Notes
This API is evolving.
Examples
>>> from pyspark.sql.types import StructField, StructType, StringType
>>> spark.readStream.schema(StructType([StructField("data", StringType(), True)]))
<...streaming.readwriter.DataStreamReader object ...>
>>> spark.readStream.schema("col0 INT, col1 DOUBLE")
<...streaming.readwriter.DataStreamReader object ...>
The example below applies an explicit schema to a CSV file.
>>> import tempfile
>>> import time
>>> with tempfile.TemporaryDirectory() as d:
...     # Start a streaming query to read the CSV file.
...     spark.readStream.schema("col0 INT, col1 STRING").format("csv").load(d).printSchema()
root
 |-- col0: integer (nullable = true)
 |-- col1: string (nullable = true)