Sovereign AGI/ASI · Wiki

Wiki

The technical knowledge base for running frontier-adjacent AI on hardware you own.

Architecture

The One-Mac-Three-Domains Pattern

A single Next.js codebase serves multiple independent public websites from one Mac, routed by HTTP host header and exposed through one Cloudflare tunnel. Each domain is a tenant; the machine is the whole datacenter.
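The host-header routing described above can be illustrated with a minimal sketch. It is written in Python purely to show the lookup logic; in the actual Next.js codebase the equivalent would presumably live in middleware, which is an assumption. The domain names and tenant keys below are hypothetical placeholders, not the real sites.

```python
# Hypothetical tenant table: these domains and tenant keys are placeholders.
TENANTS = {
    "site-one.example": "site-one",
    "site-two.example": "site-two",
    "site-three.example": "site-three",
}


def resolve_tenant(host_header, default=None):
    """Map an HTTP Host header value to a tenant key, or return `default`."""
    if not host_header:
        return default
    host = host_header.strip().lower()
    # Drop an optional port suffix such as ":3000" (IPv6 literals not handled).
    host = host.split(":", 1)[0]
    # Treat "www.<domain>" and "<domain>" as the same tenant.
    if host.startswith("www."):
        host = host[4:]
    return TENANTS.get(host, default)
```

A request arriving through the tunnel with `Host: www.site-two.example` would resolve to the `site-two` tenant, while an unrecognized host falls through to the default.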

Capabilities

Local Embeddings & Semantic Search

Embeddings turn text into vectors so meaning can be compared by distance, enabling semantic search and retrieval without keyword matching. A small sentence-transformer running locally delivers this at zero marginal cost.
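As a rough illustration of comparing meaning by distance, the sketch below ranks a few items by cosine similarity. The three-dimensional vectors are made-up toy values standing in for real model output; a sentence-transformer would produce much longer vectors, but the comparison works the same way.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values, not produced by any real model).
docs = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

# Sort document names from most to least similar to the query.
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
```

Here `ranked` places the two animal terms ahead of "invoice" even though none of the strings share a keyword with one another.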

Vision Models Run Locally (Qwen2.5-VL)

Qwen2.5-VL is an open vision-language model that reads images and answers questions about them. Run locally via MLX or LM Studio, it provides private, zero-marginal-cost image tagging, captioning, and visual analysis.

Concepts

Sovereign AI

Sovereign AI is the practice of running inference, embeddings, and AI agents on hardware you own and control, with no per-token cloud dependency in the default path. It treats the model as a fixed asset rather than a metered utility.

Economics

The Economics of $0-Marginal Inference

Once you own the hardware, each additional inference call costs only electricity, so the marginal per-token price collapses toward zero. This inverts the cloud's cost curve, where building more always means a bigger metered bill.
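The cost inversion can be made concrete with a small break-even model. This is a sketch under stated assumptions: every parameter (hardware price, cloud price per million tokens, power draw, throughput, electricity rate) is a hypothetical input you would supply yourself, and the model ignores depreciation, idle power, and maintenance.

```python
def cloud_cost(tokens, price_per_million):
    """Metered cost: grows linearly with usage, with no upper bound."""
    return tokens / 1_000_000 * price_per_million


def owned_cost(tokens, hardware_cost, watts, tokens_per_second, price_per_kwh):
    """Fixed hardware cost plus the electricity used to generate `tokens`."""
    seconds = tokens / tokens_per_second
    kwh = watts * seconds / 3_600_000  # watt-seconds to kilowatt-hours
    return hardware_cost + kwh * price_per_kwh


def break_even_tokens(hardware_cost, price_per_million, watts,
                      tokens_per_second, price_per_kwh):
    """Token count at which owned hardware becomes cheaper than the cloud.

    Returns None if electricity per token is not cheaper than the cloud price.
    """
    cloud_per_token = price_per_million / 1_000_000
    power_per_token = watts / tokens_per_second / 3_600_000 * price_per_kwh
    if cloud_per_token <= power_per_token:
        return None
    return hardware_cost / (cloud_per_token - power_per_token)
```

Past the break-even point, each additional token on owned hardware adds only the electricity term, while the cloud line keeps climbing at the full per-token price.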

Infrastructure

DeepSeek-R1 on a Windows GPU Box

DeepSeek-R1-Distill-Qwen-7B runs as the local reasoning brain on the CyberPower Windows GPU box on port 1234, served by llama.cpp under a reboot-survivable NSSM LlamaServer service. The IQ3_XS imatrix quant (3.19 GB) fully offloads all 29 layers onto the 4 GB AMD GPU via Vulkan and generates at 31.9 tok/s. Model weights occupy 2962 MB and the KV cache 238 MB, for roughly 3.2 GB of total residency.
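The residency figures quoted above can be checked with a few lines of arithmetic. The weights and KV cache sizes come from the entry; treating the "4 GB" card as 4096 MB is an assumption, and the helper names are illustrative only.

```python
WEIGHTS_MB = 2962        # model weights on the GPU, from the entry above
KV_CACHE_MB = 238        # KV cache, from the entry above
GPU_VRAM_MB = 4 * 1024   # assumption: "4 GB" taken as 4096 MB


def residency_mb(weights_mb, kv_cache_mb):
    """Total GPU memory held by the weights and the KV cache."""
    return weights_mb + kv_cache_mb


def headroom_mb(vram_mb, weights_mb, kv_cache_mb):
    """VRAM left over for compute buffers and the display, if any."""
    return vram_mb - residency_mb(weights_mb, kv_cache_mb)


total = residency_mb(WEIGHTS_MB, KV_CACHE_MB)              # 3200 MB, i.e. ~3.2 GB
spare = headroom_mb(GPU_VRAM_MB, WEIGHTS_MB, KV_CACHE_MB)  # 896 MB under the assumption
```

The same check is a quick way to judge whether a larger quant or a longer context (which grows the KV cache) would still fit on the card.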

Networking

Cloudflare Named Tunnels (Dark Origin)

A Cloudflare named tunnel publishes a service to the public internet through an outbound-only connection from the origin, so the origin Mac needs no open inbound ports, no static IP, and no exposed address. The origin stays dark.

Runtime

Local Inference on Apple Silicon (MLX)

MLX is Apple's array framework for machine learning on Apple Silicon, exploiting unified memory so a single M-series Mac can hold and serve large language and vision models with no discrete GPU. It is the runtime backbone of a Mac-based sovereign stack.

Quantization for Home Inference

Quantization shrinks a model's weights from 16-bit floats to lower-precision integers, cutting memory footprint several-fold so large models fit on consumer hardware. 4-bit is the workhorse precision for sovereign home setups.
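To make the memory saving and the precision trade-off concrete, here is a minimal sketch of both. The footprint function is plain arithmetic; the quantizer is the simplest symmetric absmax scheme with a single scale, shown for illustration only. Real formats (grouped scales, k-quants, imatrix quants) are more elaborate and store extra metadata, so their effective bits per weight sit somewhat above the nominal figure.

```python
def weight_footprint_gb(n_params, bits_per_weight):
    """Approximate weight storage in decimal GB, ignoring per-block metadata."""
    return n_params * bits_per_weight / 8 / 1e9


def quantize_absmax(values, bits=4):
    """Symmetric absmax quantization to signed integers with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    # Round to the nearest integer code and clamp into [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


fp16_gb = weight_footprint_gb(7e9, 16)  # a 7B-parameter model at 16-bit
int4_gb = weight_footprint_gb(7e9, 4)   # the same model at a nominal 4-bit
```

For a 7B-parameter model the nominal footprint drops from 14 GB to 3.5 GB, a fourfold reduction, at the cost of a per-weight rounding error of at most half the scale.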

Tooling

The Sovereign Twin — Grounded RAG

DAJAI's sovereign AI twin is a local brain that answers in DJ's voice and is grounded by semantic RAG over roughly 395 private memory files, which are split into 4,622 chunks. Those chunks are embedded on a local all-MiniLM server with 384-dimensional vectors and retrieved by cosine similarity, then fed into a SuperGrok-primary generation chain that falls back to local MLX when needed. The result is a twin that stays locked to real private context instead of generic model knowledge.
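The retrieve-then-generate flow with a fallback can be sketched as follows. This is not the twin's actual code: the function names, the prompt wording, and the `primary` and `fallback` callables are hypothetical stand-ins for the SuperGrok and local MLX generators, and toy three-dimensional vectors replace the 384-dimensional embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=3):
    """Return the texts of the k chunks most similar to the query.

    `chunks` is a list of (text, vector) pairs.
    """
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in scored[:k]]


def build_prompt(question, context_chunks):
    """Assemble a grounded prompt (hypothetical wording)."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def generate_with_fallback(prompt, primary, fallback):
    """Try the primary generator; on any error, use the local fallback."""
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)
```

Because the context is retrieved before either generator runs, the answer stays grounded in the same private chunks regardless of which model ends up producing it.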