👻

Semantic Brain Setup

Build an AI-powered semantic memory system with Supabase + pgvector

Store thoughts as vector embeddings. Search by meaning, not keywords. This guide walks you through the complete setup from zero to semantic search.

⏱️ ~15 minutes

🏗️ Step 1: Create Supabase Project

Create a new free Supabase project. This will be your brain's backend.

Go to supabase.com

Click "New Project"

Choose a name (e.g., "ghost-brain")

Wait for the project to finish being created

🗄️ Step 2: Get Your Credentials

Navigate to your project's API settings to get the credentials you'll need.

In your Supabase dashboard, go to Settings → API

Copy Project URL (looks like: https://your-project.supabase.co)

Copy Service Role Key (starts with eyJ...)

For OpenAI, go to platform.openai.com → API Keys

Create a new API key or copy your existing one

🗃️ Step 3: Enable pgvector Extension

pgvector is a PostgreSQL extension that adds a vector column type and similarity search operators. Without it, your database can't store or compare embeddings. Run the following in the Supabase SQL Editor:

CREATE EXTENSION IF NOT EXISTS vector;

What this does: Enables the pgvector extension in your database if it isn't already enabled. The IF NOT EXISTS part makes it safe to run multiple times without errors.

📊 Step 4: Create the memories Table

This table stores your thoughts as both text and vector embeddings. Each column has a specific purpose.

CREATE TABLE memories (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(1536),
  metadata JSONB DEFAULT '{}'::jsonb,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  importance VARCHAR(20) CHECK (importance IN ('low', 'medium', 'high'))
);

id SERIAL PRIMARY KEY — Auto-incrementing unique ID

content TEXT NOT NULL — The actual text content

embedding vector(1536) — 1536-dimensional vector from OpenAI (semantic fingerprint)

metadata JSONB — Flexible data storage (topics, tags, custom fields)

created_at TIMESTAMPTZ — Timestamp when memory was created

importance VARCHAR — Priority level with validation (low/medium/high)
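As a rough illustration of how a client might prepare a row for this table, the Python sketch below checks the fields against the column constraints before anything is sent to the database. The helper name build_memory_row and the constants are hypothetical, not part of any library. It assumes the embedding has already been generated by a model that outputs 1536 dimensions, and it writes the vector in pgvector's bracketed text form. The id and created_at columns are left out because the database fills them in.

```python
EMBEDDING_DIM = 1536  # must match vector(1536) in the table definition
ALLOWED_IMPORTANCE = {"low", "medium", "high"}  # mirrors the CHECK constraint


def build_memory_row(content, embedding, importance="medium", metadata=None):
    """Hypothetical helper: validate fields and return a dict ready to insert."""
    if not content:
        raise ValueError("content must be non-empty (the column is NOT NULL)")
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}"
        )
    if importance not in ALLOWED_IMPORTANCE:
        raise ValueError(
            f"importance must be one of {sorted(ALLOWED_IMPORTANCE)}"
        )
    return {
        "content": content,
        # pgvector accepts a bracketed, comma-separated text literal
        "embedding": "[" + ",".join(str(float(x)) for x in embedding) + "]",
        "metadata": metadata or {},
        "importance": importance,
    }
```

Validating on the client gives a clearer error message than the one the database would raise for a wrong vector length or an unknown importance value.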

🔧 Step 5: Create the RPC Search Function

This function, exposed as a Remote Procedure Call (RPC), handles vector similarity search. It takes a query embedding as text, casts it to a vector, and finds the most similar memories using cosine similarity.

CREATE OR REPLACE FUNCTION search_memories(query_embedding text)
RETURNS TABLE (
  id INTEGER,
  content TEXT,
  metadata JSONB,
  created_at TIMESTAMPTZ,
  importance VARCHAR,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    m.id,
    m.content,
    m.metadata,
    m.created_at,
    m.importance,
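    -- <=> is pgvector's cosine distance operator; 1 - distance = cosine similarity
    1 - (m.embedding <=> query_embedding::vector) AS similarity
  FROM memories m
  WHERE m.embedding IS NOT NULL
  -- smallest distance first, so the most similar memories come first
  ORDER BY m.embedding <=> query_embedding::vector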
  LIMIT 10;
END;
$$;

What this does: Takes your query embedding, compares it to all stored memories using cosine distance, and returns the top 10 matches sorted by similarity.
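One way to call this function from a client is through Supabase's REST endpoint for RPC, which accepts a POST to /rest/v1/rpc/<function name> with the arguments as a JSON body. The sketch below only builds the request, using Python's standard library. The helper names to_vector_literal and build_search_request are hypothetical, and the query embedding is assumed to come from the same embedding model used when the memories were stored.

```python
import json
import urllib.request


def to_vector_literal(embedding):
    """Serialize a list of floats as pgvector text, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_search_request(supabase_url, service_key, query_embedding):
    """Hypothetical helper: build (but do not send) the RPC request."""
    # The JSON key must match the SQL parameter name, query_embedding.
    body = json.dumps({"query_embedding": to_vector_literal(query_embedding)})
    return urllib.request.Request(
        url=f"{supabase_url.rstrip('/')}/rest/v1/rpc/search_memories",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "apikey": service_key,
            "Authorization": f"Bearer {service_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen sends the request. The response body should be a JSON array of up to 10 rows, each with the columns listed in RETURNS TABLE.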

⚙️ Step 6: Configure Your Environment

Create a .env file in your brain skill directory with these credentials.

SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_KEY=eyJ...your-service-role-key...
OPENAI_API_KEY=sk-proj-...your-openai-key...

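Security note: Never commit .env files to git. Add .env to your .gitignore file. The service role key bypasses Row Level Security, so keep it on the server and never ship it in client-side code.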
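To show how a script might read these values, here is a minimal Python sketch that parses simple KEY=VALUE lines and fails early when a credential is missing. The function names parse_env and load_config are hypothetical, and the parser handles only the plain format shown above, not the full syntax that dedicated dotenv libraries support.

```python
REQUIRED_KEYS = ("SUPABASE_URL", "SUPABASE_SERVICE_KEY", "OPENAI_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(path=".env"):
    """Read the file and raise if a required key is missing or empty."""
    with open(path, encoding="utf-8") as f:
        config = parse_env(f.read())
    missing = [k for k in REQUIRED_KEYS if not config.get(k)]
    if missing:
        raise KeyError(f"missing in {path}: {', '.join(missing)}")
    return config
```

Checking for all three keys at startup avoids a confusing authentication error later, when the first request is made.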

Setup flow:

Create Supabase Project (get project ID + keys)

→ Enable pgvector Extension (run SQL to install extension)

→ Create memories Table (id, content, embedding vector(1536), metadata)

→ Create RPC Function (search_memories with similarity)

→ Configure .env File (SUPABASE_URL + SUPABASE_SERVICE_KEY + OPENAI_API_KEY)

→ Capture Memories (generate embeddings + store as vectors)

→ Search Semantically (find by meaning, not keywords)

💡 How Semantic Search Works:

Each embedding is a list of 1,536 numbers that represents the meaning of a piece of text. When you search, the system compares your query's embedding to all stored memories using cosine similarity. This finds results even when the words don't match: "database config" finds memories about "setting up PostgreSQL" because the meanings are similar.

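Similarity as percentage: The expression 1 - cosine_distance turns pgvector's cosine distance (0 to 2) into cosine similarity (-1 to 1), which is shown as a percentage by multiplying by 100. 100% means the two embeddings point in the same direction (effectively an exact match), 50% means somewhat similar, 10% means not very similar. Treat these thresholds as rough guides rather than fixed cutoffs.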
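The relationship between cosine distance and cosine similarity can be checked by hand on small vectors. The sketch below is plain Python with hypothetical function names. It computes the same quantity that the search function derives from pgvector's distance operator, using 2- and 3-dimensional toy vectors instead of real 1,536-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance as pgvector defines it: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Same direction, different length: similarity is close to 1.0 (about 100%)
same = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Perpendicular vectors: similarity is 0.0 (0%)
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

# Opposite directions: similarity is -1.0
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

Because only direction matters, a long note and a short note about the same topic can still score as highly similar.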