> Second, clean data. MAI-Thinking-1 was trained on clean and appropriately licensed data, with AI-generated content excluded from pre-training. This matters for quality, provenance, and control. If we cannot account for what shaped a model, we cannot fully understand its behavior or credibly improve it.
Shots fired?
It would be interesting to see how far "clean data" can go on the scaling laws.
Maybe, but Microsoft, through its partnership with OpenAI, is already involved in major copyright lawsuits. That is probably a driving force behind this move, actually... I doubt they would want to tempt fate while those lawsuits are ongoing.
Looks like the OAI divergence is finally taking place. Seems like the comparisons are mainly with Opus 4.6 and GPT 5.4 though. Still, exciting to see a new frontier player.
They've hijacked scrolling. They've hijacked the spacebar. It flickers like crazy when I try to move through the article. Trying to get through it is an exercise in madness.
"Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."
> without distillation from third-party models
sounds like zero unless they are lying.
Why are pre-training and post-training ethically different, though?
Though this is largely impossible these days, unless they pre-trained on pre-AI era data.
At least when you define benchmaxxed as "good in benchmarks but not human preference".
About time Microsoft joined the fray. After the OpenAI divorce, it really looked like Microsoft was going to become another Uber.
"Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."
https://news.ycombinator.com/newsguidelines.html