• realharo@lemm.ee
    9 months ago

    As far as I know, that is mainly used when a bigger, better model generates training data for a smaller, more efficient model, to bring the smaller model a bit closer to the bigger one's level.

    Have there been any cases of an already state-of-the-art model using this method to improve itself?