Wednesday, April 1, 2026

more parameters at low-precision is better than less parameters at high-precision

https://x.com/i/status/2039207501477097541

Reasoning ability comes from the structure of billions of connections, not the precision of each one. an 8b model with coarse weights still captures more of the original model's knowledge than a small model with precise weights.