With a self-hosted LLM, that loop happens locally. The model is downloaded to your machine, loaded into memory, and runs directly on your CPU or GPU. So you’re not dependent on an internet connection ...
This is because the different variants all weigh in at roughly 60GB to 65GB, from which we subtract approximately 18GB to 24GB (depending on ...
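As a quick back-of-envelope check, the subtraction above gives a range rather than a single number. A minimal sketch (the variable names are illustrative; the assumption here is that the 18GB to 24GB figure is deducted from the total variant size):

```python
# Hypothetical sanity check of the sizes quoted above.
# Model variants: roughly 60-65 GB; deduction: roughly 18-24 GB.
model_size_gb = (60, 65)   # smallest and largest variant
deduction_gb = (18, 24)    # low and high end of the amount subtracted

# Tightest case: largest deduction from the smallest variant.
# Loosest case: smallest deduction from the largest variant.
remaining_min = model_size_gb[0] - deduction_gb[1]  # 60 - 24 = 36
remaining_max = model_size_gb[1] - deduction_gb[0]  # 65 - 18 = 47

print(f"Remaining footprint: {remaining_min}-{remaining_max} GB")
```

So under these assumptions, the figure you actually end up working with falls somewhere in the mid-30GB to high-40GB range.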