{{/* Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. */}}
Key | Default | Type | Description |
---|---|---|---|
end-input.watermark | (none) | Long | Optional endInput watermark used in batch mode or for bounded streams. |
lookup.async | false | Boolean | Whether to enable async lookup join (see the lookup join sketch after this table). |
lookup.async-thread-number | 16 | Integer | The number of threads used for async lookup. |
lookup.bootstrap-parallelism | 4 | Integer | The parallelism for bootstrap in a single task for lookup join. |
lookup.cache | AUTO | Enum | The cache mode of lookup join. Possible values: "AUTO", "FULL". |
lookup.dynamic-partition | (none) | String | Specify the dynamic partition for lookup; currently only 'max_pt()' is supported. |
lookup.dynamic-partition.refresh-interval | 1 h | Duration | The refresh interval of the dynamic partition for lookup; all partitions are scanned to determine the matching partition. |
lookup.refresh.async | false | Boolean | Whether to refresh the lookup table in an async thread. |
lookup.refresh.async.pending-snapshot-count | 5 | Integer | If the pending snapshot count exceeds this threshold, the lookup operator will refresh the table synchronously. |
partition.end-input-to-done | false | Boolean | Whether to mark the partition as done at end of input, indicating that its data is ready. |
partition.idle-time-to-done | (none) | Duration | If a partition has received no new data for this duration, mark it as done to indicate that its data is ready. |
partition.time-interval | (none) | Duration | The time interval of a partition, for example, '1 d' for daily partitions or '1 h' for hourly partitions. |
scan.infer-parallelism | true | Boolean | If false, the source parallelism is set by the global parallelism. Otherwise, it is inferred from the number of splits (batch mode) or the number of buckets (streaming mode). |
scan.infer-parallelism.max | 1024 | Integer | If scan.infer-parallelism is true, this option limits the inferred source parallelism. |
scan.parallelism | (none) | Integer | Defines a custom parallelism for the scan source (see the batch scan sketch after this table). By default, if this option is not defined, the planner derives the parallelism for each statement individually, also considering the global configuration. If scan.infer-parallelism is enabled, the planner uses the inferred parallelism instead. |
scan.push-down | true | Boolean | If true, Flink pushes projection, filters, and limit down to the source. The cost is that the source becomes hard to reuse within a job. With Flink 1.18 or later, the source can be reused even with projection push-down. |
scan.remove-normalize | false | Boolean | Whether to force removal of the normalize node in streaming read. Note: this is dangerous and is likely to cause data errors if downstream operators compute aggregations and the input is not a complete changelog. |
scan.split-enumerator.batch-size | 10 | Integer | How many splits are assigned to each subtask per batch in StaticFileStoreSplitEnumerator, to avoid exceeding the `akka.framesize` limit. |
scan.split-enumerator.mode | fair | Enum | The mode used by StaticFileStoreSplitEnumerator to assign splits. Possible values: "fair": distribute splits evenly when batch reading, to prevent a few tasks from reading all the data; "preemptive": distribute splits preemptively according to task consumption speed. |
scan.watermark.alignment.group | (none) | String | A group of sources to align watermarks (see the watermark alignment sketch after this table). |
scan.watermark.alignment.max-drift | (none) | Duration | Maximal drift to align watermarks before we pause consuming from the source/task/partition. |
scan.watermark.alignment.update-interval | 1 s | Duration | How often tasks should notify the coordinator about the current watermark, and how often the coordinator should announce the maximal aligned watermark. |
scan.watermark.emit.strategy | on-event | Enum | Emit strategy for watermark generation. Possible values: "on-event": emit a watermark per record; "on-periodic": emit watermarks periodically, with the interval controlled by Flink's 'pipeline.auto-watermark-interval'. |
scan.watermark.idle-timeout | (none) | Duration | If no records flow in a partition of a stream for that amount of time, the partition is considered "idle" and will not hold back the progress of watermarks in downstream operators. |
sink.clustering.by-columns | (none) | String | Specifies the column name(s) used for comparison during range partitioning, in the format 'columnName1,columnName2' (see the clustered write sketch after this table). If not set, or set to an empty string, range partitioning is not enabled. This option takes effect only for bucket-unaware tables without primary keys, in batch execution mode. |
sink.clustering.sample-factor | 100 | Integer | Specifies the sample factor. Let S be the total number of samples, F the sample factor, and P the sink parallelism; then S = F × P. The minimum allowed sample factor is 20. |
sink.clustering.sort-in-cluster | true | Boolean | Indicates whether to further sort the data belonging to each sink task after range partitioning. |
sink.clustering.strategy | "auto" | String | Specifies the comparison algorithm used for range partitioning: 'zorder', 'hilbert', or 'order', corresponding to the z-order curve algorithm, the Hilbert curve algorithm, and basic type comparison, respectively. When not configured, the algorithm is chosen automatically based on the number of columns in 'sink.clustering.by-columns': 'order' for 1 column, 'zorder' for fewer than 5 columns, and 'hilbert' for 5 or more columns. |
sink.committer-cpu | 1.0 | Double | Sink committer CPU, controlling the CPU cores of the global committer. |
sink.committer-memory | (none) | MemorySize | Sink committer memory, controlling the heap memory of the global committer. |
sink.committer-operator-chaining | true | Boolean | Allow the sink committer and writer operators to be chained together. |
sink.cross-partition.managed-memory | 256 mb | MemorySize | Weight of managed memory for RocksDB in cross-partition update; Flink computes the memory size according to the weight. The actual memory used depends on the running environment. |
sink.managed.writer-buffer-memory | 256 mb | MemorySize | Weight of the writer buffer in managed memory; Flink computes the memory size for the writer according to the weight. The actual memory used depends on the running environment. |
sink.parallelism | (none) | Integer | Defines a custom parallelism for the sink. By default, if this option is not defined, the planner derives the parallelism for each statement individually, also considering the global configuration. |
sink.savepoint.auto-tag | false | Boolean | If true, a tag will be automatically created for the snapshot created by a Flink savepoint. |
sink.use-managed-memory-allocator | false | Boolean | If true, the Flink sink will use managed memory for the merge tree; otherwise, it will create an independent memory allocator. |
source.checkpoint-align.enabled | false | Boolean | Whether to align Flink checkpoints with snapshots of the Paimon table. If true, a checkpoint is made only after a snapshot is consumed. |
source.checkpoint-align.timeout | 30 s | Duration | If no new snapshot has been generated when the checkpoint starts to trigger, the enumerator will block the checkpoint and wait for a new snapshot. This sets the maximum waiting time to avoid infinite waiting; if the timeout elapses, the checkpoint fails. Note that it should be set smaller than the checkpoint timeout. |
streaming-read.shuffle-bucket-with-partition | true | Boolean | Whether to shuffle by partition and bucket in streaming read. |
unaware-bucket.compaction.parallelism | (none) | Integer | Defines a custom parallelism for the unaware-bucket table compaction job. By default, if this option is not defined, the planner derives the parallelism for each statement individually, also considering the global configuration. |
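
The options above can be set in a table's WITH clause at creation time or applied per statement through Flink SQL dynamic table options (`/*+ OPTIONS(...) */` hints). The sketches below use the hint form; all table and column names in them are hypothetical. First, a minimal lookup join sketch that enables async lookup with 32 threads and a full cache:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LookupJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical tables: 'orders' is a streaming table with a
        // processing-time attribute 'proc_time'; 'dim' is a Paimon table.
        tEnv.executeSql(
                "SELECT o.order_id, d.name "
                        + "FROM orders AS o JOIN dim "
                        + "/*+ OPTIONS('lookup.async'='true', "
                        + "'lookup.async-thread-number'='32', "
                        + "'lookup.cache'='FULL') */ "
                        + "FOR SYSTEM_TIME AS OF o.proc_time AS d "
                        + "ON o.id = d.id");
    }
}
```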
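
The scan options work the same way. A batch scan sketch, assuming a hypothetical Paimon table `t`, that disables parallelism inference, pins the source to 8 parallel tasks, and turns off push-down so the source stays reusable within the job:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BatchScanSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // 'scan.parallelism' takes effect once inference is disabled.
        tEnv.executeSql(
                "SELECT * FROM t /*+ OPTIONS("
                        + "'scan.infer-parallelism'='false', "
                        + "'scan.parallelism'='8', "
                        + "'scan.push-down'='false') */")
                .print();
    }
}
```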
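
The three scan.watermark.alignment.* options are meant to be used together. A watermark alignment sketch (same hypothetical table `t`) of a streaming read that places the source in alignment group 'g1', pauses consumption when it runs more than one minute ahead, and emits a watermark per record:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class WatermarkAlignmentSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Sources sharing the group name 'g1' align their watermarks.
        tEnv.executeSql(
                "SELECT * FROM t /*+ OPTIONS("
                        + "'scan.watermark.alignment.group'='g1', "
                        + "'scan.watermark.alignment.max-drift'='1 min', "
                        + "'scan.watermark.alignment.update-interval'='1 s', "
                        + "'scan.watermark.emit.strategy'='on-event') */");
    }
}
```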
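
Finally, a clustered write sketch. To make the sampling arithmetic of sink.clustering.sample-factor concrete: with the default factor F = 100 and a sink parallelism of P = 4, S = F × P = 400 rows are sampled to compute the range boundaries. The sketch assumes hypothetical tables `t` (bucket-unaware, no primary key) and `src`, and runs in batch mode as range partitioning requires:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ClusteredWriteSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Two clustering columns, so the 'auto' strategy would pick
        // 'zorder' anyway; it is spelled out here for clarity.
        tEnv.executeSql(
                "INSERT INTO t /*+ OPTIONS("
                        + "'sink.clustering.by-columns'='c1,c2', "
                        + "'sink.clustering.strategy'='zorder', "
                        + "'sink.clustering.sample-factor'='100') */ "
                        + "SELECT * FROM src");
    }
}
```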