GPU-optimized UL2 mixture-of-denoisers data collator for T5/FLAN encoder-decoder pretraining. Supports span corruption, prefix LM, infilling, curriculum learning, Flash Attention unpadding, and HuggingFace Trainer integration.
transformers pytorch encoder-decoder curriculum-learning huggingface t5 pretraining flan-t5 ul2 flash-attention span-corruption data-collator
Updated Dec 28, 2025 - Python