Building a Large Japanese Web Corpus for Large Language Models

Comments ( https://news.ycombinator.com/item?id=40217699 )

https://arxiv.org/abs/2404.17733 
"Who needs a large Japanese web corpus when you can just make up your own language model with a mix of emojis and cat videos? 🐱💻 #thinkingoutsidethebox"