gigl.common.beam.better_tfrecordio

Internal fork of WriteToTFRecord with an improved TFRecord sink. Specifically, it adds the ability to cap the maximum number of bytes per shard, a feature that file-based sinks support but that is not implemented for TensorFlow sinks. It also supports a deferred tft.tf_metadata.dataset_metadata.DatasetMetadata, so it can be used in pipelines where the DatasetMetadata is derived at runtime.
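To make the per-shard byte cap concrete, the following is a minimal, standalone sketch of the general idea and not this module's implementation. The function name `split_into_shards` and its `max_bytes_per_shard` parameter are hypothetical, chosen for illustration only. It assumes the standard TFRecord framing, which adds 16 bytes per record (an 8-byte length, a 4-byte length CRC, and a 4-byte data CRC), and it assumes that a record larger than the cap is placed in a shard of its own.

```python
from typing import Iterable, Iterator, List

# TFRecord framing per record: 8-byte length + 4-byte length CRC + 4-byte data CRC.
TFRECORD_OVERHEAD_BYTES = 16


def split_into_shards(
    records: Iterable[bytes], max_bytes_per_shard: int
) -> Iterator[List[bytes]]:
    """Group serialized records into shards that stay under a byte cap.

    Hypothetical helper for illustration; a record larger than the cap
    still gets written, in a shard containing only that record.
    """
    shard: List[bytes] = []
    shard_bytes = 0
    for record in records:
        record_bytes = len(record) + TFRECORD_OVERHEAD_BYTES
        # Start a new shard if adding this record would exceed the cap.
        if shard and shard_bytes + record_bytes > max_bytes_per_shard:
            yield shard
            shard, shard_bytes = [], 0
        shard.append(record)
        shard_bytes += record_bytes
    if shard:
        yield shard


# Five 100-byte records cost 116 bytes each on disk, so a 300-byte cap
# fits two records per shard.
shard_sizes = [len(s) for s in split_into_shards([b"a" * 100] * 5, 300)]
print(shard_sizes)  # [2, 2, 1]
```

Capping shards by bytes rather than by record count keeps output files at a predictable size even when record sizes vary widely.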

Classes

BetterWriteToTFRecord

Transform for writing to TFRecord sinks.