A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job subscribes to a different topic from an Apache Kafka source, but both write data with the same schema. To keep the directory structure simple, a data engineer has decided to nest a checkpoint directory to be shared by both streams. The proposed directory structure is displayed below: Which statement describes whether this checkpoint directory structure is valid for the given scenario, and why?
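For context, here is a minimal PySpark sketch of the supported pattern, in which each streaming query gets its own checkpoint directory even though both write to the same Delta table. The broker address, topic names, and paths are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def start_stream(topic: str, checkpoint_path: str):
    # Each query records its offsets and state in its own checkpoint
    # directory; two queries sharing a single checkpoint location would
    # clobber each other's offset tracking.
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", topic)
        .load()
        .writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .start("/mnt/bronze/events")  # both streams target the same Delta table
    )

q1 = start_stream("topic_a", "/mnt/bronze/events/_checkpoints/topic_a")
q2 = start_stream("topic_b", "/mnt/bronze/events/_checkpoints/topic_b")
```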
A Structured Streaming job deployed to production has been experiencing delays during peak hours of the day. At present, during normal execution, each microbatch of data is processed in less than 3 seconds. During peak hours, execution time for each microbatch becomes very inconsistent, sometimes exceeding 30 seconds. The streaming write is currently configured with a trigger interval of 10 seconds. Holding all other variables constant, and assuming records need to be processed in less than 10 seconds, which adjustment will meet the requirement?
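As a hedged sketch of the knob in question: the trigger interval is set on the streaming write, and shortening it produces smaller microbatches, which can keep per-batch processing time down during peak load. The 5-second value, paths, and the rate source below are illustrative only:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in streaming source; the real job would read from its production source.
df = spark.readStream.format("rate").load()

query = (
    df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/example")  # hypothetical path
    .trigger(processingTime="5 seconds")  # shorter interval -> smaller batches
    .start("/mnt/silver/example")         # hypothetical target path
)
```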
Which statement describes Delta Lake Auto Compaction?
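For reference, Auto Compaction is a Databricks Delta Lake feature that, after a write leaves behind many small files, runs a lightweight compaction to combine them into larger files. A sketch of the two common ways to enable it (the table name is hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Session-wide (Databricks-specific configuration):
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Or per table, via a Delta table property:
spark.sql("""
    ALTER TABLE bronze.events
    SET TBLPROPERTIES (delta.autoOptimize.autoCompact = 'true')
""")
```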
Which statement characterizes the general programming model used by Spark Structured Streaming?
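The model in question treats a live data stream as an unbounded, continuously growing table, queried with the same DataFrame API as a batch job. A minimal sketch using the built-in rate test source (query name and window size are arbitrary):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, count

spark = SparkSession.builder.getOrCreate()

# The "rate" source is a built-in test source emitting (timestamp, value) rows.
stream_df = spark.readStream.format("rate").load()

# Identical syntax to a batch aggregation; Spark executes it incrementally,
# treating the stream as an unbounded input table.
counts = stream_df.groupBy(window("timestamp", "1 minute")).agg(count("*").alias("n"))

query = (
    counts.writeStream
    .outputMode("complete")
    .format("memory")        # in-memory sink for demonstration only
    .queryName("rate_counts")
    .start()
)
```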
Which configuration parameter directly affects the size of a Spark partition upon ingestion of data into Spark?
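For file-based sources, the parameter that caps the bytes packed into a single partition at read time is spark.sql.files.maxPartitionBytes (128 MB by default). A short sketch with a hypothetical input path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Lowering the cap yields more, smaller partitions on read (64 MB here).
spark.conf.set("spark.sql.files.maxPartitionBytes", str(64 * 1024 * 1024))

df = spark.read.parquet("/mnt/raw/events")  # hypothetical path
print(df.rdd.getNumPartitions())
```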
A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes that the Min, Median, and Max Durations for tasks in a particular stage show that the minimum and median times to complete a task are roughly the same, but the maximum duration is roughly 100 times longer than the minimum. Which situation is causing the increased duration of the overall job?
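A task-duration profile like this, min and median roughly equal but max two orders of magnitude longer, is the classic signature of data skew: one partition holds far more data than the rest. As one mitigation sketch (Spark 3.x settings, shown for illustration rather than as the only fix), Adaptive Query Execution can split skewed shuffle partitions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# AQE can detect and split skewed partitions at shuffle boundaries,
# evening out task durations across the stage.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
```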