Privacy-Preserving AI: Running Local LLMs Without Compromising Your Data
After spending years building privacy-first applications in Web3, I've turned my attention to the AI side of privacy. Here's my practical guide to running powerful LLMs locally — and why it matters more than most developers think.

Every time you paste proprietary code into ChatGPT, you're making a trust decision. Every time your AI-powered IDE sends your codebase to a cloud API, data leaves your control. For most developers working on personal projects, this is a reasonable trade-off. But if you're working with sensitive code, client data, or in a regulated industry, the calculus changes entirely. Having spent significant time building privacy-first applications at COTI using garbled circuits and other cryptographic techniques, I've become increasingly focused on bringing that same privacy-first mindset to AI workflows.
The Privacy Problem with Cloud AI
Let me be clear: I use cloud-based AI tools daily. Claude, GPT-4, Cursor — they're incredible productivity multipliers. But there are scenarios where sending data to external servers is unacceptable:
- Proprietary algorithms: The core logic that gives your company a competitive advantage
- Customer data: PII, health records, financial information — anything subject to GDPR, HIPAA, or similar regulations
- Security-sensitive code: Authentication systems, encryption implementations, vulnerability details
- Pre-disclosure work: Patent applications, unreleased product features, M&A due diligence
The standard response is "trust the provider's data handling policies." Having worked in privacy infrastructure, I know that policies are not guarantees. Data minimization — not sending the data in the first place — is always the stronger privacy posture.
The State of Local LLMs in 2025
The good news is that running capable LLMs locally has gone from "technically possible but painful" to "genuinely practical" in the span of a year. Several developments converged:
Model quality at smaller sizes. Llama 3.1 8B, Mistral 7B, and Phi-3 deliver remarkably good performance for their size. For code completion, summarization, and structured data extraction, these models handle 80-90% of tasks that previously required GPT-4.
Apple Silicon changed everything. M-series chips with unified memory architecture made running 7B-13B parameter models on a laptop not just possible but fast. A MacBook Pro with 32GB of unified memory can run a 13B model at usable speeds without buying a discrete GPU: at 4-bit quantization, 13B parameters take roughly 6.5GB for weights, leaving plenty of headroom for the KV cache and everything else you have open.
Tooling matured. Ollama made running local models as simple as ollama run llama3.1. LM Studio provides a GUI. llama.cpp continues to push performance boundaries with quantization improvements.
My Local AI Setup
Here's what I'm currently running and how it fits into my workflow:
Hardware
- Daily driver: MacBook Pro M3 Pro with 36GB unified memory
- Home server: Mac Mini M2 Pro with 32GB — runs models 24/7 for background tasks
Software Stack
- Ollama for model management and serving
- Open WebUI for a ChatGPT-like interface pointing at local models
- Continue (VS Code extension) configured to use local Ollama models for code completion
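Both Open WebUI and Continue are just clients of Ollama's local HTTP API. A quick sanity check that the server is running, and a way to see exactly which models those tools will be offered, is to query the tags endpoint (11434 is Ollama's default port):

```bash
# List the models the local Ollama server is currently exposing.
# Nothing in this request leaves the machine.
curl -s http://localhost:11434/api/tags
```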
Model Selection
Different tasks get different models:
| Task | Model | Why |
|---|---|---|
| Code completion | CodeLlama 13B | Fast, trained specifically on code |
| Code review | Llama 3.1 8B | Good reasoning, fast enough for interactive use |
| Documentation | Mistral 7B | Strong writing quality at low resource cost |
| Data analysis | Llama 3.1 70B (quantized) | Complex reasoning tasks need the larger model — runs on the Mac Mini overnight |
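As a rough illustration of how the table above plays out against Ollama's native generate endpoint, here's a minimal sketch. The helper function and the prompt are hypothetical; the model tags are the ones from the table and assume they've already been pulled:

```bash
# Hypothetical task-to-model mapping, mirroring the table above.
model_for() {
  case "$1" in
    complete) echo "codellama:13b" ;;
    review)   echo "llama3.1:8b"  ;;
    docs)     echo "mistral:7b"   ;;
    analyze)  echo "llama3.1:70b" ;;
  esac
}

# Send a code-review prompt to the model chosen for that task.
curl -s http://localhost:11434/api/generate -d "{
  \"model\": \"$(model_for review)\",
  \"prompt\": \"Review this function for potential bugs: ...\",
  \"stream\": false
}"
```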
The Hybrid Approach
In practice, I don't run everything locally. My workflow is hybrid:
- Local by default for code completion and quick questions about code I'm actively working on
- Cloud for complex reasoning when I need frontier-model intelligence and the data isn't sensitive
- Local for sensitive work — anything involving client data, proprietary logic, or pre-disclosure work goes through local models exclusively
```bash
# My typical setup — start Ollama with the models I need
ollama pull llama3.1:8b
ollama pull codellama:13b
# Ollama serves on localhost:11434 by default
# Configure your tools to point there instead of cloud APIs
```
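That last comment is usually just a base-URL swap. Recent Ollama versions also expose an OpenAI-compatible endpoint alongside the native API, so tools that expect an OpenAI-style chat API can often be redirected by changing only the base URL and model name. A minimal sketch (the prompt is illustrative, and older Ollama builds may only have the /api routes):

```bash
# Chat request against Ollama's OpenAI-compatible endpoint; no data leaves localhost.
# Assumes a recent Ollama version with the /v1 routes enabled.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {"role": "user", "content": "Explain what this regex does: ^\\d{4}-\\d{2}$"}
    ]
  }'
```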
Practical Limitations
I want to be honest about the trade-offs:
Quality gap is real. For complex, multi-step reasoning tasks, frontier cloud models (Claude 3.5 Sonnet, GPT-4) are still meaningfully better than anything you can run locally. The gap is closing, but it exists.
Context windows are smaller in practice. Even when a local model nominally supports a long context, memory limits and conservative serving defaults mean most local setups top out at 8K-32K usable tokens, compared to 100K+ for cloud models. For large codebase analysis, this is a real constraint.
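One partial mitigation: Ollama serves models with a fairly conservative default context length, and you can raise it per request with the num_ctx option, at the cost of more memory. A sketch, assuming the underlying model actually supports the larger window:

```bash
# Ask for a larger context window for this request via Ollama's num_ctx option.
# Memory use grows with context; the model's own limit still applies.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Summarize the architecture of the files below: ...",
  "options": { "num_ctx": 16384 },
  "stream": false
}'
```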
No tool use (mostly). The sophisticated tool-calling capabilities of cloud models — browsing, code execution, MCP integration — aren't available with most local setups. You get the raw language model, not the full agent framework.
Setup isn't zero. While Ollama has made things much easier, there's still more setup and maintenance than a cloud API key. Model updates, quantization choices, memory management — it's manageable but not invisible.
The Privacy-AI Intersection
What fascinates me is the convergence of my two areas of focus — Web3 privacy and AI privacy. The cryptographic techniques I worked with at COTI (garbled circuits, secure multi-party computation) are starting to find applications in AI:
- Federated learning allows model training on distributed data without centralizing it
- Homomorphic encryption enables computation on encrypted data — imagine querying an AI model without the model ever seeing your plaintext input
- TEE-based inference (similar to what Coinbase uses for CDP Wallets) can provide hardware-backed guarantees that your data isn't exposed during AI processing
These are still largely research-stage for LLM-scale models, but the trajectory is clear. Privacy-preserving AI isn't just about running models locally — it's about building cryptographic guarantees into the inference pipeline itself.
Getting Started
If you want to start running models locally:
- Install Ollama: `curl -fsSL https://ollama.com/install.sh | sh`
- Pull a model: `ollama run llama3.1:8b`
- Configure your tools: Point your IDE's AI features at `localhost:11434`
- Experiment: Try your typical AI tasks locally and note where the quality gap matters
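For the experiment step, an easy way to feel out the quality gap is to send the same prompt to a couple of local models and compare the answers. A rough sketch (model tags assume you've pulled both):

```bash
# Run one prompt against several local models and compare the answers.
# The generated text is in the "response" field of each JSON reply.
prompt="Explain the difference between a mutex and a semaphore in two sentences."
for model in llama3.1:8b mistral:7b; do
  echo "=== $model ==="
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\": \"$model\", \"prompt\": \"$prompt\", \"stream\": false}"
  echo
done
```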
You'll likely find that local models handle more than you expect. And for the tasks where they fall short, you now have a clear decision framework: is this data sensitive enough to warrant the quality trade-off?
Privacy isn't a binary choice between "use AI" and "protect data." A hybrid approach — local for sensitive work, cloud for everything else — gives you the best of both worlds. Start local, and you'll be surprised how much you can keep on-device.