https://x.com/i/status/2039207501477097541
Reasoning ability comes from the structure of billions of connections, not the precision of each one. An 8B model with coarse weights still captures more of the original model's knowledge than a small model with precise weights.
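To make the trade-off concrete, here is a minimal sketch of what "coarse weights" can mean in practice. Everything in it is an assumption for illustration: the post does not name a bit width, a quantization scheme, or the size of the "small model", so the 4-bit and 16-bit figures, the 2B comparison size, and the helper names `weight_memory_gb` and `quantize` are hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize(weights: list[float], bits: int) -> list[float]:
    """Symmetric round-to-nearest quantization (one simple scheme, assumed here).

    Each weight is snapped to the nearest multiple of a shared step size, so
    the per-weight error is at most half a step.
    """
    levels = 2 ** (bits - 1) - 1              # e.g. 7 positive levels at 4 bits
    scale = max(abs(w) for w in weights) / levels
    if scale == 0:                            # all-zero input: nothing to round
        return list(weights)
    return [round(w / scale) * scale for w in weights]


# Assumed comparison: an 8B model at 4 bits and a 2B model at 16 bits
# need the same storage for their weights.
coarse_8b = weight_memory_gb(8e9, 4)
precise_2b = weight_memory_gb(2e9, 16)

# Each quantized weight loses precision, but the number of weights is unchanged.
original = [0.7, -0.35, 0.1, 0.0]
coarse = quantize(original, bits=4)
```

Under these assumed numbers both models occupy the same weight storage, and the claim in the post is that the larger, coarser one retains more of what the original model knew. The sketch only shows the storage arithmetic and the rounding step; it does not measure model quality.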