SeqPacker
High-performance sequence packing for LLM training, written in Rust with Python bindings.
Training LLMs on variable-length sequences? Naive padding wastes 20-40% of GPU compute. SeqPacker packs sequences into fixed-size bins, achieving 95-99% utilization with 11 bin-packing algorithms — from O(n) streaming to near-optimal offline.
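To make the padding-versus-packing gap concrete, here is a minimal pure-Python sketch of First-Fit Decreasing (FFD), one of the offline algorithms named on this page. It does not use SeqPacker's API: the function names `first_fit_decreasing` and `utilization`, the sample lengths, and the 2048-token capacity are illustrative assumptions, and "naive padding" is assumed to mean padding every sequence to the full capacity.

```python
from typing import List


def first_fit_decreasing(lengths: List[int], capacity: int) -> List[List[int]]:
    """Pack sequence lengths into bins of `capacity` tokens (illustrative FFD)."""
    bins: List[List[int]] = []  # lengths placed in each bin
    free: List[int] = []        # remaining room in each bin
    # Sort longest first, then place each sequence in the first bin it fits.
    for length in sorted(lengths, reverse=True):
        if length > capacity:
            raise ValueError(f"length {length} exceeds capacity {capacity}")
        for i, room in enumerate(free):
            if length <= room:
                bins[i].append(length)
                free[i] -= length
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([length])
            free.append(capacity - length)
    return bins


def utilization(total_tokens: int, num_rows: int, capacity: int) -> float:
    """Fraction of the token slots in `num_rows` rows that hold real tokens."""
    return total_tokens / (num_rows * capacity)


# Hypothetical sequence lengths, chosen only for illustration.
lengths = [1800, 1500, 1200, 900, 700, 500, 300, 200]
capacity = 2048

packed = first_fit_decreasing(lengths, capacity)
padded_util = utilization(sum(lengths), len(lengths), capacity)  # one row per sequence
packed_util = utilization(sum(lengths), len(packed), capacity)   # one row per bin

print(f"padded: {len(lengths)} rows, {padded_util:.1%} utilization")
print(f"packed: {len(packed)} rows, {packed_util:.1%} utilization")
```

On this toy input the eight sequences fit into four bins, so the same tokens occupy half as many rows. Real utilization depends on the length distribution and the algorithm chosen.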
Key Features
- 11 algorithms — NF, FF, BF, WF, FFD, BFD, FFS, MFFD, OBFD, OBFDP, HK
- Streaming API — bounded-space packing with incremental output
- HuggingFace integration — one-call pack_dataset for SFTTrainer / TRL
- PyTorch integration — GPU-ready tensors out of the box
- NumPy zero-copy — pass arrays directly, no conversion overhead
- Cross-platform — Linux, macOS, Windows; Python 3.9-3.13
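As a rough illustration of what bounded-space packing with incremental output means, the sketch below implements Next-Fit (NF) as a Python generator. It is not SeqPacker's streaming API: the name `next_fit_stream` and the sample lengths are hypothetical. Only one bin is open at a time, and each bin is emitted as soon as the next sequence no longer fits, which is what makes the approach O(n) with constant extra memory.

```python
from typing import Iterable, Iterator, List


def next_fit_stream(lengths: Iterable[int], capacity: int) -> Iterator[List[int]]:
    """Yield packed bins one at a time, keeping a single open bin in memory."""
    current: List[int] = []  # the only open bin
    used = 0                 # tokens already placed in the open bin
    for length in lengths:
        if length > capacity:
            raise ValueError(f"length {length} exceeds capacity {capacity}")
        if used + length > capacity:
            # The open bin cannot take this sequence: emit it and start fresh.
            yield current
            current, used = [], 0
        current.append(length)
        used += length
    if current:
        # Flush the last, possibly partially filled, bin.
        yield current


# Hypothetical lengths arriving in stream order.
stream_bins = list(next_fit_stream([900, 700, 600, 1500, 400, 100], capacity=2048))
print(stream_bins)
```

Because Next-Fit never revisits a closed bin, it typically packs less tightly than the offline algorithms; that is the trade-off for being able to consume an unbounded stream.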
Quick Example
Next Steps
- Getting Started — installation and first steps
- User Guide — algorithm selection and usage patterns
- Python API Reference — full Python API documentation
- Rust API Reference — auto-generated on docs.rs
- Performance — benchmarks and comparisons