
keep your models close

it's not quite 8am on a Saturday, i'm up early and enjoying the quiet of the morning (okay, i say up, i'm still in bed but on my computer). let's try to have a quick thought about AI! but also have enough nuance that no-one yells at me.

first i should say that i wrote a long and thoughtful talk about AI (when i say AI in this post i mean LLMs, and that talk was also about LLMs). so go read that for some well-considered thoughts, rather than these less considered ones. tl;dr: things have second and third order impacts, and it's worth thinking about those.

so with all that out of the way, here's the thought:

i feel very differently about an AI model which runs locally than one which runs remotely

why do i feel this way? well, fundamentally there's a way in which AI models distil down books, transcripts, websites, Everything, into a big file made of parameters. you run the model and you can reconstruct versions of the things it was trained on. it's a type of lossy compression. the model is a representation of the corpus. and when that corpus was Everything, it can (re)produce a lot of types of things.

now, as i wrote in my longer and more thoughtful thing from a few years ago, that Everything, that significant-proportion-of-human-culture, that has value. we built that together, and long may we continue to do so!

if we're talking about a model that runs remotely, on someone else's server: we're looking at taking Everything and gating it behind a monthly subscription fee to OpenAI. or credits to Anthropic. or giving it away for free in the Google search summaries that a team of highly paid engineers are feverishly working to inject advertising into. it sucks! it's the enclosure of the mind.

but in the local case, where it runs on the machine we're using... sure, we're taking Everything from all of the individual people who have contributed to it. and some nerds with ulterior motives somewhere have figured out how to squish it all into a multi-gigabyte file. but... i can use it however i want. however much i want. i can change it, i can think of new ways to use it. i don't need permission, i don't need to pay rent. it's kind of beautiful, just as an object? like carrying around a dump of wikipedia, it just feels good to be able to touch a crystallised representation of human knowledge.

i guess the contrast here is between the Pirate Bay and a SaaS. they come from very different places, ideologically. one is a gift, the other is rent. now, it's reasonable to be a creator and to be mad at the Pirate Bay! they take a thing you worked hard on and they let people circumvent paying for it. the gift was not theirs to give! but the musicians i know are so much madder at the way Spotify operates, taking a monthly subscription and giving legal access to music... but somehow letting only a trickle of that money get back to the artists.

a confession

i do actually have a local model that sits on my computer, and that i occasionally use. it's MacWhisper, a nice app that wraps OpenAI's Whisper model. you feed it a recording of speech and it gives you a text file with a transcript in it. it works pretty well - if you're publishing something you probably want to clean it up and check it over, but it's a pretty good replacement for manual transcription and a great starting place when doing subtitles. MacWhisper has paid options, but the free version works fine. i like it! and a big part of that is because it's a program that sits in my Applications folder, not a service i'm subscribing to. it's not going anywhere! it can't go bust or change its business model. and it can't use my data in ways i don't like, which is especially important for something like transcription, where the thing being transcribed might be sensitive info. i'm not uploading that into some cloud service with whatever data retention policies, whatever partners, and whatever plans for using the data for other purposes.
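(to make that concrete: here's roughly what local transcription looks like with the open-source whisper python package that MacWhisper builds on. a minimal sketch, assuming you have ffmpeg installed - the filename is a placeholder, and MacWhisper's internals will surely differ:)

```python
# a minimal local transcription sketch using the open-source
# whisper package (pip install openai-whisper); needs ffmpeg.
# "recording.mp3" is a placeholder filename
import whisper

model = whisper.load_model("base")          # weights download once, then live on disk
result = model.transcribe("recording.mp3")  # the audio never leaves your machine
print(result["text"])
```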

the fact that the thing i downloaded is not made by the people who made the model also points to the potential in local models. sure, there's tons of people making products that are interesting interfaces wrapped around an OpenAI API call. but the fact of every query having a cost limits the forms those products can take. it increases the latency, makes downtime possible, means those interfaces also have to be a live service, maintained, running the red queen's race. whereas, if you can download the model, lots more shapes of software become possible. you can provide access as a live service - and as a provider you can pick whatever hosting you like to run the models on, so you're not locked into a single expensive provider. you can make it standalone, as with MacWhisper. or you can just embed it within your products as a little corner, a single feature. and you can also edit the model. you don't have to use it exactly how it came out of the box: you can finetune it to work a different way, to express a particular corner of the latent space more, to have a different personality. you are part of the development process, you are on a level with the people who trained the model in a way you never can be if they control the model and access to it.
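(here's a sketch of that "download once, do what you like" shape, using the hugging face transformers library - gpt2 here is just a stand-in for whichever open-weights model you've got, and these same few lines are what you'd wrap in a service, a standalone app, or a single embedded feature:)

```python
# a sketch of running an open-weights model locally via hugging face
# transformers (pip install transformers torch). "gpt2" is a stand-in
# for any model you have downloaded - swap in whatever you like
from transformers import pipeline

# the first run fetches the weights; after that, no network needed
generate = pipeline("text-generation", model="gpt2")

print(generate("local models are nice because", max_new_tokens=40)[0]["generated_text"])
```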

(a sidebar to get on a little hobby horse of mine: a local model also gives the possibility of really low latencies. reducing the latency between input and output can lead to a step change in the experience of using a tool. now, lots of local models are heavyweight enough that they'll run slowly on a normal computer, whereas remote ones might be running on hyper-specialised hardware and will run faster - but remote models will always have the latency involved in communicating over the network, and a round trip over the internet typically costs tens of milliseconds, more than a whole frame at 60fps, before the model does anything at all. only local ones offer the possibility of running in proper realtime, of getting quick enough to feel transparent. i should write more on latency in tooling.)

if you're an AI sceptic, you might be saying here "wow, V, what's up with you? i know you talked about how AI is stealing IP, but what about the energy cost? what about the water? AI is burning the planet!". and... yeah, locally run models make that part better too! sure, maybe the training was big and expensive and cost a lot to do. but if the model can run locally, then the power consumption is... i mean, maybe your computer spins up its fans, maybe your phone gets a bit hotter, i'm not saying there's no additional energy consumption. but it's limited by your mains electrics. it's within your control. you know how much energy it's using (you're paying for it, you can look at the electricity bill when it gets delivered).

(and flipping to the developer perspective again - internet businesses are based on the idea that running the servers is very cheap, and development costs only have to be paid once, so you can offer services to people for something like zero marginal cost. this is why you don't have to pay for WhatsApp. but when AI models start to really take a lot of money to run, that stops working. local models, or models that are lightweight enough they could run locally - they change that back to the zero marginal cost model. i would like to add a feature to Downpour where it can automatically cut subjects out of photos. i will not add this feature if i have to pay every time a user does this. i would add this feature if it could run on the device.)
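(to be concrete about what such a feature could look like: something like the rembg package wraps an open background-removal model and runs entirely on-device. a sketch of the idea, not a claim about how Downpour would actually build it - filenames are placeholders:)

```python
# an illustration of the zero-marginal-cost point: local subject
# cutout with the rembg package (pip install rembg), which wraps an
# open background-removal model and runs entirely on-device
from rembg import remove
from PIL import Image

photo = Image.open("photo.jpg")
cutout = remove(photo)       # no API call, no per-use fee, works offline
cutout.save("cutout.png")    # png, to keep the transparent background
```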

so why aren't they all local?

this does bring us to the flipside, and why this isn't the way that all AI models are run. the first reason is: you can only run stuff locally if it's a small enough and fast enough model. if it can run on whatever hardware you have, and doesn't need a ruinously expensive Nvidia chip to finish in a reasonable amount of time. and that's... some of them? not all of them? definitely not the cutting edge, best performance, state of the art ones. but these big new expensive models have a tendency to get optimised, to get distilled, to get replicated smaller and cheaper (quantise a 7-billion-parameter model down to 4 bits per weight and it's roughly a 3.5GB file, something a recent laptop can manage). i'm hopeful!

the second reason you can't run a model locally is: you need to download it first. so, someone needs to put it online - either in a leak, or, more often, by licensing it openly. why would they put all this time and money (lots of money) into making a good model and then give it away? well, sometimes they're academic researchers, but mainly it's because they're a big tech company who are not currently in the lead in terms of making AI models, and they see a future where every model is owned by a few companies who charge rent for accessing them, and they want to make a strategic play to try to keep stuff open and competitive. this has been Meta, this was most recently DeepSeek... even in this world of local models, we're still beholden to the manoeuvrings of the giants as they sumo wrestle, hoping not to get squished by their feet as they shuffle around.

but lots of models are not available for download. the tech companies are keeping them close to their chests. sometimes if you ask them why, they will say the reason is safety, which... yeah, okay, there's something to that. not so much the risk that the AI will turn out to be incredibly smart and take over the world, i don't think that's a real thing. but what is real is: download an image model locally and you can finetune it to generate deepfake porn and child abuse images and other awful stuff. download an LLM and you can finetune it or change the prompts and get it to tell you how to build bombs, or be your fake girlfriend (who is actually allowed to sext you), or sweet-talk you into weird and awful ideologies.

but also the main reason you can't download them is: they have spent a lot of money to train their models. they do not want their rivals to see how their model operates and catch up with them. instead, they want to become the monopoly provider of AI services. they are looking forward to a world where they get a cut whenever anyone does anything with a computer.

in conclusion

i don't know if i like AI, as practiced today. but i do know i feel differently about local models versus remote ones. they feel like different visions of the future. neither vision is utopian... but i know which one i prefer.


wow, i tried to write a quick thought and ended up with a 2000 word essay. also it's 2pm now. i'm bad at this. worse thoughts next time, i promise. maybe on a less contentious topic.