FlashAttention: Fast Transformer training with long sequences https://www.adept.ai/blog/flashier-attention https://news.ycombinator.com/item?id=37724861