Database Schema - Learn English Zero

Design Document Date: 2026-01-09 Status: Complete - Ready for Implementation Database: Supabase PostgreSQL Author: Claude (via /brainstorm interactive session)

Constants Reference: Xem definitions.md cho tất cả constants và enums.


Table of Contents

  1. Architecture Overview
  2. Schema Diagram
  3. Core Tables
  4. Indexes Strategy
  5. Row Level Security (RLS)
  6. Migration Scripts
  7. Data Validation

Architecture Overview

Database Choice: Supabase PostgreSQL

Rationale:

  • Built-in authentication (auth.users) with Google OAuth
  • Row Level Security (RLS) for data protection
  • Real-time subscriptions for progress tracking
  • Free tier: 500MB database, 2GB bandwidth/month (sufficient for MVP)

Schema Organization

Supabase Database Structure:
├── auth schema (managed by Supabase)
│   └── users                    # OAuth, email/password, tokens
│
├── public schema (custom tables)
│   ├── user_profiles            # User info & preferences
│   ├── diagnosis_tests          # Template for diagnosis tests
│   ├── user_diagnosis_results   # User test results & prescriptions
│   ├── words                    # Oxford 3000 vocabulary
│   ├── daily_lessons            # 120-day lesson plans
│   ├── user_word_progress       # SM-2 spaced repetition state
│   ├── user_daily_progress      # Daily completion tracking
│   ├── checkpoint_tests         # Milestone test templates
│   ├── user_checkpoint_results  # User checkpoint results
│   ├── ai_conversations         # AI Coach chat history
│   └── ai_feedback              # User feedback on AI responses

Schema Diagram

erDiagram
    AUTH_USERS ||--o{ USER_PROFILES : "has"
    AUTH_USERS ||--o{ USER_DIAGNOSIS_RESULTS : "takes"
    AUTH_USERS ||--o{ USER_WORD_PROGRESS : "learns"
    AUTH_USERS ||--o{ USER_DAILY_PROGRESS : "completes"
    AUTH_USERS ||--o{ USER_CHECKPOINT_RESULTS : "attempts"
    AUTH_USERS ||--o{ AI_CONVERSATIONS : "chats"

    DIAGNOSIS_TESTS ||--o{ USER_DIAGNOSIS_RESULTS : "generates"
    WORDS ||--o{ USER_WORD_PROGRESS : "tracked_in"
    DAILY_LESSONS ||--o{ USER_DAILY_PROGRESS : "completed_in"
    CHECKPOINT_TESTS ||--o{ USER_CHECKPOINT_RESULTS : "graded_in"
    AI_CONVERSATIONS ||--o{ AI_FEEDBACK : "rated_in"

Core Tables

1. User Management

1.1 auth.users (Supabase Managed)

Purpose: Core authentication, managed automatically by Supabase Auth.

Key fields:

  • id (uuid, PK): User unique identifier
  • email (text): User email
  • email_confirmed_at (timestamp)
  • encrypted_password (text)
  • provider (text): oauth provider (google, email, etc.)
  • last_sign_in_at (timestamp)
  • created_at, updated_at

⚠️ DO NOT modify this table directly.


1.2 public.user_profiles

Purpose: User profile information, preferences, and context for AI personalization.

CREATE TABLE public.user_profiles (
    -- Primary key
    user_id UUID PRIMARY KEY REFERENCES auth.users(id) ON DELETE CASCADE,

    -- Basic profile
    nickname TEXT,
    avatar_url TEXT,

    -- AI Coach personalization
    preferred_ai_name TEXT,                    -- Tên user đặt cho AI Coach (e.g., "Bee", "Coach")

    -- Context for AI personalization (from additional questions)
    profession TEXT,                           -- Nghề nghiệp (e.g., "kỹ sư", "sinh viên")
    work_environment TEXT,                     -- Môi trường làm việc (e.g., "văn phòng", "remote")
    uses_english_at_work BOOLEAN DEFAULT false, -- Có dùng tiếng Anh trong công việc không
    learning_goals TEXT[],                     -- Array of goals (e.g., ["travel", "work", "study"])
    available_study_time_minutes INTEGER,      -- Thời gian có thể học mỗi ngày (15, 30, 45, 60)
    previous_learning_experience TEXT,         -- "school" | "center" | "self" | "never"

    -- Progress & gamification
    current_day INTEGER DEFAULT 1,             -- Ngày hiện tại trong lộ trình (1-120)
    streak_current INTEGER DEFAULT 0,          -- Streak hiện tại (số ngày liên tiếp)
    streak_longest INTEGER DEFAULT 0,          -- Streak dài nhất từng đạt
    last_studied_date DATE,                    -- Ngày học cuối cùng

    -- Statistics (denormalized for performance)
    total_words_learned INTEGER DEFAULT 0,     -- Tổng số từ đã học (status != 'new')
    total_words_mastered INTEGER DEFAULT 0,    -- Tổng số từ đã thành thạo
    total_study_minutes INTEGER DEFAULT 0,     -- Tổng thời gian học (phút)

    -- Preferences
    daily_reminder_time TIME,                  -- Thời gian nhắc nhở hàng ngày (e.g., 19:00)
    notification_enabled BOOLEAN DEFAULT true,
    audio_autoplay BOOLEAN DEFAULT true,

    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    -- Constraints
    CHECK (available_study_time_minutes IN (15, 30, 45, 60) OR available_study_time_minutes IS NULL),
    CHECK (previous_learning_experience IN ('school', 'center', 'self', 'never') OR previous_learning_experience IS NULL)
);

-- Indexes
CREATE INDEX idx_user_profiles_current_day ON public.user_profiles(current_day);
CREATE INDEX idx_user_profiles_last_studied ON public.user_profiles(last_studied_date);

-- RLS Policies
ALTER TABLE public.user_profiles ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own profile"
    ON public.user_profiles FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can update own profile"
    ON public.user_profiles FOR UPDATE
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own profile"
    ON public.user_profiles FOR INSERT
    WITH CHECK (auth.uid() = user_id);

Key Design Decisions:

  • Why denormalize stats? Frequent queries for dashboard, JOIN-heavy otherwise
  • Why TEXT[] for learning_goals? Flexible, can add new goals without migration
  • Why separate work_environment? AI uses this to personalize vocabulary examples

2. Diagnosis System

2.1 public.diagnosis_tests

Purpose: Template for diagnosis tests (4 loại bệnh: Mù/Điếc/Câm/Yếu phản xạ).

CREATE TABLE public.diagnosis_tests (
    -- Primary key
    id SERIAL PRIMARY KEY,

    -- Test metadata
    test_name TEXT NOT NULL,                   -- e.g., "Initial Diagnosis Test v1.0"
    version TEXT NOT NULL,                     -- Versioning for future improvements
    is_active BOOLEAN DEFAULT true,            -- Only one active version at a time

    -- Test structure (JSONB for flexibility)
    sections JSONB NOT NULL,                   -- Array of test sections
    /*
    Example structure:
    {
        "reading": {
            "name": "Đọc - Mù",
            "questions": [...],
            "weight": 0.25
        },
        "listening": {
            "name": "Nghe - Điếc",
            "questions": [...],
            "weight": 0.25
        },
        "speaking": {
            "name": "Nói - Câm",
            "questions": [...],
            "weight": 0.25
        },
        "reflex": {
            "name": "Phản xạ - Yếu phản xạ",
            "questions": [...],
            "weight": 0.25
        }
    }
    */

    -- Prescription logic
    prescription_rules JSONB NOT NULL,         -- Rules to determine which prescription
    /*
    Example:
    {
        "mu": { "threshold": 50, "prescription": "don_mu" },
        "diec": { "threshold": 50, "prescription": "don_diec" },
        "cam": { "threshold": 50, "prescription": "don_cam" },
        "yeu_phan_xa": { "threshold": 50, "prescription": "don_yeu_phan_xa" }
    }
    */

    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    -- Unique constraint
    UNIQUE(version)
);

-- Indexes
CREATE INDEX idx_diagnosis_tests_active ON public.diagnosis_tests(is_active) WHERE is_active = true;

Key Design Decisions:

  • Why JSONB? Test structure may evolve, JSONB allows flexibility without schema migration
  • Why version field? Can A/B test different diagnosis approaches

2.2 public.user_diagnosis_results

Purpose: Store user diagnosis results and prescribed "đơn thuốc".

CREATE TABLE public.user_diagnosis_results (
    -- Primary key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

    -- Foreign keys
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
    diagnosis_test_id INTEGER NOT NULL REFERENCES public.diagnosis_tests(id),

    -- Test results (%)
    percent_mu NUMERIC(5,2) NOT NULL,          -- % Mù (0-100)
    percent_diec NUMERIC(5,2) NOT NULL,        -- % Điếc (0-100)
    percent_cam NUMERIC(5,2) NOT NULL,         -- % Câm (0-100)
    percent_yeu_phan_xa NUMERIC(5,2) NOT NULL, -- % Yếu phản xạ (0-100)

    -- Prescribed "đơn thuốc"
    prescription_type TEXT NOT NULL,           -- "don_mu" | "don_diec" | "don_cam" | "don_yeu_phan_xa" | "don_foundation"

    -- Raw answers (for future analysis)
    raw_answers JSONB,                         -- User's answers to each question

    -- Metadata
    completed_at TIMESTAMPTZ DEFAULT NOW(),
    created_at TIMESTAMPTZ DEFAULT NOW(),

    -- Constraints
    CHECK (percent_mu >= 0 AND percent_mu <= 100),
    CHECK (percent_diec >= 0 AND percent_diec <= 100),
    CHECK (percent_cam >= 0 AND percent_cam <= 100),
    CHECK (percent_yeu_phan_xa >= 0 AND percent_yeu_phan_xa <= 100),
    CHECK (prescription_type IN ('don_mu', 'don_diec', 'don_cam', 'don_yeu_phan_xa', 'don_foundation'))
);

-- Indexes
CREATE INDEX idx_user_diagnosis_user ON public.user_diagnosis_results(user_id);
CREATE INDEX idx_user_diagnosis_completed ON public.user_diagnosis_results(completed_at DESC);

-- RLS Policies
ALTER TABLE public.user_diagnosis_results ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own diagnosis results"
    ON public.user_diagnosis_results FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own diagnosis results"
    ON public.user_diagnosis_results FOR INSERT
    WITH CHECK (auth.uid() = user_id);

Key Design Decisions:

  • Why UUID for id? Prevent enumeration attacks, more secure
  • Why separate percent columns? Easy to query and visualize progress over time
  • Why store raw_answers? Can re-score if prescription_rules change

3. Vocabulary System (from don-mu-vocabulary.md)

3.1 public.words

Purpose: Oxford 3000 vocabulary with metadata.

CREATE TABLE public.words (
    -- Primary key
    id SERIAL PRIMARY KEY,

    -- Word data
    word TEXT NOT NULL,                        -- English word (e.g., "apple")
    vietnamese_meaning TEXT NOT NULL,          -- Vietnamese translation
    part_of_speech TEXT NOT NULL,              -- "noun" | "verb" | "adj" | "adv" | etc.

    -- CEFR & difficulty
    difficulty_level TEXT NOT NULL,            -- "A1" | "A2" | "B1"
    day_introduced INTEGER NOT NULL,           -- Day 1-120
    frequency_rank INTEGER,                    -- Oxford frequency ranking (lower = more common)

    -- Pronunciation
    ipa TEXT,                                  -- IPA pronunciation (optional)
    pronunciation_difficulty SMALLINT DEFAULT 1, -- 1=easy, 2=medium, 3=hard for Vietnamese learners

    -- Learning aids
    is_concrete BOOLEAN DEFAULT true,          -- Has physical form (needs image)
    image_url TEXT,                            -- Unsplash image URL
    audio_url TEXT NOT NULL,                   -- Azure TTS audio URL

    -- Example sentences
    example_sentence_en TEXT,
    example_sentence_vi TEXT,

    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    -- Constraints
    UNIQUE(word, part_of_speech),              -- Same word can be noun & verb
    CHECK (difficulty_level IN ('A1', 'A2', 'B1')),
    CHECK (day_introduced >= 1 AND day_introduced <= 120),
    CHECK (pronunciation_difficulty >= 1 AND pronunciation_difficulty <= 3)
);

-- Indexes
CREATE INDEX idx_words_day ON public.words(day_introduced);
CREATE INDEX idx_words_level ON public.words(difficulty_level);
CREATE INDEX idx_words_word ON public.words(word);
CREATE INDEX idx_words_concrete ON public.words(is_concrete) WHERE is_concrete = true;

Key Design Decisions:

  • Why UNIQUE(word, part_of_speech)? "run" can be both noun and verb
  • Why pronunciation_difficulty? AI Coach can prioritize hard-to-pronounce words

3.2 public.user_word_progress

Purpose: SM-2 spaced repetition state for each user-word pair.

CREATE TABLE public.user_word_progress (
    -- Primary key
    id BIGSERIAL PRIMARY KEY,

    -- Foreign keys
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
    word_id INTEGER NOT NULL REFERENCES public.words(id) ON DELETE CASCADE,

    -- Learning status
    status TEXT DEFAULT 'new',                 -- "new" | "learning" | "mastered" | "forgotten"

    -- SM-2 Algorithm parameters
    ease_factor NUMERIC(3,2) DEFAULT 2.5,      -- 1.3 - 2.5+ (determines interval growth)
    repetition_count INTEGER DEFAULT 0,        -- Number of successful reviews
    interval_days INTEGER DEFAULT 1,           -- Days until next review
    next_review_date DATE,                     -- Next scheduled review

    -- User performance
    last_quality_score SMALLINT,               -- 0-5 (last self-rating)
    correct_count INTEGER DEFAULT 0,
    incorrect_count INTEGER DEFAULT 0,

    -- Timestamps
    last_reviewed_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    -- Constraints
    UNIQUE(user_id, word_id),
    CHECK (status IN ('new', 'learning', 'mastered', 'forgotten')),
    CHECK (ease_factor >= 1.3),
    CHECK (interval_days >= 1 AND interval_days <= 180), -- Cap at 6 months
    CHECK (last_quality_score >= 0 AND last_quality_score <= 5)
);

-- Indexes (critical for performance)
CREATE INDEX idx_user_word_progress_user ON public.user_word_progress(user_id);
CREATE INDEX idx_user_word_progress_review_date ON public.user_word_progress(user_id, next_review_date);
CREATE INDEX idx_user_word_progress_status ON public.user_word_progress(user_id, status);

-- Composite index for daily lesson query
CREATE INDEX idx_user_word_progress_due ON public.user_word_progress(user_id, next_review_date, status)
    WHERE next_review_date IS NOT NULL;

-- RLS Policies
ALTER TABLE public.user_word_progress ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own word progress"
    ON public.user_word_progress FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can update own word progress"
    ON public.user_word_progress FOR UPDATE
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own word progress"
    ON public.user_word_progress FOR INSERT
    WITH CHECK (auth.uid() = user_id);

Key Design Decisions:

  • Why BIGSERIAL for id? Each user × 3000 words = millions of rows
  • Why cap interval_days at 180? Prevent runaway intervals
  • Why composite index on (user_id, next_review_date, status)? Optimizes daily lesson query

3.3 public.daily_lessons

Purpose: Pre-defined lesson plans for 120 days.

CREATE TABLE public.daily_lessons (
    -- Primary key
    id SERIAL PRIMARY KEY,

    -- Lesson metadata
    day_number INTEGER UNIQUE NOT NULL,        -- 1-120
    level TEXT NOT NULL,                       -- "A1" | "A2" | "B1" (source of truth)

    -- Lesson content
    new_word_ids JSONB NOT NULL,               -- Array of 25 word IDs
    /*
    Example: [1, 2, 3, ..., 25]
    */

    estimated_duration_minutes INTEGER DEFAULT 25,

    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    -- Constraints
    CHECK (day_number >= 1 AND day_number <= 120),
    CHECK (level IN ('A1', 'A2', 'B1'))
);

-- Indexes
CREATE INDEX idx_daily_lessons_day ON public.daily_lessons(day_number);
CREATE INDEX idx_daily_lessons_level ON public.daily_lessons(level);

Key Design Decisions:

  • Why JSONB for new_word_ids? Array of integers, easy to query with @> operator
  • Why level column? Frontend MUST NOT derive level from day_number

3.4 public.user_daily_progress

Purpose: Track daily lesson completion.

CREATE TABLE public.user_daily_progress (
    -- Primary key
    id BIGSERIAL PRIMARY KEY,

    -- Foreign keys
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
    daily_lesson_id INTEGER NOT NULL REFERENCES public.daily_lessons(id) ON DELETE CASCADE,

    -- Progress
    status TEXT DEFAULT 'not_started',         -- "not_started" | "in_progress" | "completed"
    words_learned_count INTEGER DEFAULT 0,     -- New words completed
    words_reviewed_count INTEGER DEFAULT 0,    -- Review words completed
    accuracy_rate NUMERIC(5,2),                -- % correct answers (0-100)
    time_spent_minutes INTEGER,

    -- Timestamps
    completed_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    -- Constraints
    UNIQUE(user_id, daily_lesson_id),
    CHECK (status IN ('not_started', 'in_progress', 'completed')),
    CHECK (accuracy_rate >= 0 AND accuracy_rate <= 100)
);

-- Indexes
CREATE INDEX idx_user_daily_progress_user ON public.user_daily_progress(user_id);
CREATE INDEX idx_user_daily_progress_completed ON public.user_daily_progress(user_id, completed_at DESC);

-- RLS Policies
ALTER TABLE public.user_daily_progress ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own daily progress"
    ON public.user_daily_progress FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can update own daily progress"
    ON public.user_daily_progress FOR UPDATE
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own daily progress"
    ON public.user_daily_progress FOR INSERT
    WITH CHECK (auth.uid() = user_id);

3.5 public.checkpoint_tests

Purpose: Milestone test templates (Day 40, 80, 120).

CREATE TABLE public.checkpoint_tests (
    -- Primary key
    id SERIAL PRIMARY KEY,

    -- Test metadata
    day_number INTEGER UNIQUE NOT NULL,        -- 40 | 80 | 120
    level TEXT NOT NULL,                       -- "A1" | "A2" | "B1"
    test_type TEXT DEFAULT 'mixed',            -- "recognition" | "recall" | "mixed"

    -- Test parameters
    total_words_tested INTEGER NOT NULL,       -- 1000 (day 40/80) or 500 (day 120)
    pass_threshold NUMERIC(3,2) DEFAULT 0.70,  -- 70% to pass

    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    -- Constraints
    CHECK (day_number IN (40, 80, 120)),
    CHECK (level IN ('A1', 'A2', 'B1')),
    CHECK (test_type IN ('recognition', 'recall', 'mixed')),
    CHECK (pass_threshold >= 0 AND pass_threshold <= 1)
);

-- Indexes
CREATE INDEX idx_checkpoint_tests_day ON public.checkpoint_tests(day_number);

Key Design Decisions:

  • Why separate test_type? May experiment with different test formats
  • Why pass_threshold as NUMERIC? Can adjust difficulty per checkpoint

3.6 public.user_checkpoint_results

Purpose: Store user checkpoint test results.

CREATE TABLE public.user_checkpoint_results (
    -- Primary key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

    -- Foreign keys
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
    checkpoint_test_id INTEGER NOT NULL REFERENCES public.checkpoint_tests(id),

    -- Results
    total_questions INTEGER NOT NULL,
    correct_answers INTEGER NOT NULL,
    score_percentage NUMERIC(5,2) NOT NULL,
    passed BOOLEAN NOT NULL,

    -- Detailed answers (for analysis)
    answers JSONB,                             -- Array of {word_id, user_answer, is_correct}

    -- Timestamps
    completed_at TIMESTAMPTZ DEFAULT NOW(),
    created_at TIMESTAMPTZ DEFAULT NOW(),

    -- Constraints
    CHECK (score_percentage >= 0 AND score_percentage <= 100),
    CHECK (correct_answers >= 0 AND correct_answers <= total_questions)
);

-- Indexes
CREATE INDEX idx_user_checkpoint_user ON public.user_checkpoint_results(user_id);
CREATE INDEX idx_user_checkpoint_test ON public.user_checkpoint_results(checkpoint_test_id);
CREATE INDEX idx_user_checkpoint_completed ON public.user_checkpoint_results(completed_at DESC);

-- RLS Policies
ALTER TABLE public.user_checkpoint_results ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own checkpoint results"
    ON public.user_checkpoint_results FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own checkpoint results"
    ON public.user_checkpoint_results FOR INSERT
    WITH CHECK (auth.uid() = user_id);

4. AI Coach System

4.1 public.ai_conversations

Purpose: Store full conversation history with AI Coach.

CREATE TABLE public.ai_conversations (
    -- Primary key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

    -- Foreign key
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,

    -- Conversation metadata
    conversation_type TEXT NOT NULL,           -- See definitions.md Section 6.2 for full list
    context TEXT,                              -- What triggered this conversation (e.g., "learning word: apple")

    -- Messages
    user_message TEXT NOT NULL,
    ai_response TEXT NOT NULL,

    -- AI metadata (for debugging/optimization)
    model_used TEXT,                           -- "gpt-4o-mini" | "gpt-4o" | etc.
    prompt_tokens INTEGER,
    completion_tokens INTEGER,
    response_time_ms INTEGER,

    -- Timestamps
    created_at TIMESTAMPTZ DEFAULT NOW(),

    -- Constraints
    CHECK (conversation_type IN (
        'vocabulary_question',
        'motivation',
        'scenario_practice',
        'general',
        'milestone_celebration',
        'missed_day_nudge',
        'boundary_redirect',
        'pronunciation_feedback',
        'quiz_feedback'
    ))
);

-- Indexes
CREATE INDEX idx_ai_conversations_user ON public.ai_conversations(user_id);
CREATE INDEX idx_ai_conversations_created ON public.ai_conversations(created_at DESC);
CREATE INDEX idx_ai_conversations_type ON public.ai_conversations(conversation_type);

-- RLS Policies
ALTER TABLE public.ai_conversations ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own AI conversations"
    ON public.ai_conversations FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own AI conversations"
    ON public.ai_conversations FOR INSERT
    WITH CHECK (auth.uid() = user_id);

Key Design Decisions:

  • Why conversation_type? Analytics: what do users ask AI most?
  • Why store tokens? Monitor costs per user
  • Why response_time_ms? Identify slow prompts

4.2 public.ai_feedback

Purpose: User feedback on AI responses for continuous improvement.

CREATE TABLE public.ai_feedback (
    -- Primary key
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

    -- Foreign key
    conversation_id UUID NOT NULL REFERENCES public.ai_conversations(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,

    -- Feedback
    is_helpful BOOLEAN NOT NULL,               -- Thumbs up/down
    rating SMALLINT,                           -- 1-5 stars (optional)
    feedback_text TEXT,                        -- Optional written feedback

    -- Categorization (optional)
    issue_type TEXT,                           -- "incorrect_info" | "not_relevant" | "confusing" | "other"

    -- Timestamps
    created_at TIMESTAMPTZ DEFAULT NOW(),

    -- Constraints
    UNIQUE(conversation_id),                   -- One feedback per conversation
    CHECK (rating >= 1 AND rating <= 5),
    CHECK (issue_type IN ('incorrect_info', 'not_relevant', 'confusing', 'other'))
);

-- Indexes
CREATE INDEX idx_ai_feedback_user ON public.ai_feedback(user_id);
CREATE INDEX idx_ai_feedback_helpful ON public.ai_feedback(is_helpful);
CREATE INDEX idx_ai_feedback_created ON public.ai_feedback(created_at DESC);

-- RLS Policies
ALTER TABLE public.ai_feedback ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own AI feedback"
    ON public.ai_feedback FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own AI feedback"
    ON public.ai_feedback FOR INSERT
    WITH CHECK (auth.uid() = user_id);

Key Design Decisions:

  • Why separate table? Keeps ai_conversations clean, not all conversations get feedback
  • Why UNIQUE(conversation_id)? One feedback per conversation prevents spam

Indexes Strategy

Indexing Philosophy

Principle: Index queries you run frequently, not every column.

Strategy:

  1. User-scoped queries: Always index user_id for user-specific tables
  2. Date-based queries: Index next_review_date, created_at for sorting
  3. Status filters: Index status columns for filtering
  4. Composite indexes: For queries filtering multiple columns simultaneously

Performance Targets

Query Type Target Time Index Used
Get user profile <10ms PK lookup
Get daily lesson <50ms idx_daily_lessons_day
Get due reviews <100ms idx_user_word_progress_due
Get user stats <200ms Denormalized in user_profiles
Checkpoint test <500ms Sampling with idx_words_level

Row Level Security (RLS)

RLS Philosophy

Principle: User data is isolated at database level, not application level.

Benefits:

  • Even if application logic has bugs, users can't access others' data
  • Supabase client queries are automatically filtered
  • Reduces boilerplate authorization code

Policy Patterns

Pattern 1: User-owned data (most tables)

CREATE POLICY "Users can view own {resource}"
    ON public.{table} FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can update own {resource}"
    ON public.{table} FOR UPDATE
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own {resource}"
    ON public.{table} FOR INSERT
    WITH CHECK (auth.uid() = user_id);

Tables: user_profiles, user_word_progress, user_daily_progress, user_diagnosis_results, user_checkpoint_results, ai_conversations, ai_feedback

Pattern 2: Public read-only data

CREATE POLICY "Anyone can read {resource}"
    ON public.{table} FOR SELECT
    USING (true);

Tables: words, daily_lessons, checkpoint_tests, diagnosis_tests

⚠️ Note: No INSERT/UPDATE/DELETE policies = only backend can modify.


Migration Scripts

Migration Order

IMPORTANT: Run migrations in this exact order to satisfy foreign key constraints.

-- Migration 1: User profiles (depends on auth.users)
-- Migration 2: Diagnosis system
-- Migration 3: Vocabulary system
-- Migration 4: AI system

Migration 1: User Profiles

-- File: migrations/001_create_user_profiles.sql

CREATE TABLE public.user_profiles (
    user_id UUID PRIMARY KEY REFERENCES auth.users(id) ON DELETE CASCADE,
    nickname TEXT,
    avatar_url TEXT,

    -- AI Coach personalization
    preferred_ai_name TEXT,

    -- Context for AI personalization
    profession TEXT,
    work_environment TEXT,
    uses_english_at_work BOOLEAN DEFAULT false,
    learning_goals TEXT[],
    available_study_time_minutes INTEGER,
    previous_learning_experience TEXT,

    -- Progress & gamification
    current_day INTEGER DEFAULT 1,
    streak_current INTEGER DEFAULT 0,
    streak_longest INTEGER DEFAULT 0,
    last_studied_date DATE,

    -- Statistics
    total_words_learned INTEGER DEFAULT 0,
    total_words_mastered INTEGER DEFAULT 0,
    total_study_minutes INTEGER DEFAULT 0,

    -- Preferences
    daily_reminder_time TIME,
    notification_enabled BOOLEAN DEFAULT true,
    audio_autoplay BOOLEAN DEFAULT true,

    -- Metadata
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    -- Constraints
    CHECK (available_study_time_minutes IN (15, 30, 45, 60) OR available_study_time_minutes IS NULL),
    CHECK (previous_learning_experience IN ('school', 'center', 'self', 'never') OR previous_learning_experience IS NULL)
);

CREATE INDEX idx_user_profiles_current_day ON public.user_profiles(current_day);
CREATE INDEX idx_user_profiles_last_studied ON public.user_profiles(last_studied_date);

ALTER TABLE public.user_profiles ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own profile"
    ON public.user_profiles FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can update own profile"
    ON public.user_profiles FOR UPDATE
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own profile"
    ON public.user_profiles FOR INSERT
    WITH CHECK (auth.uid() = user_id);

-- Trigger: Auto-create profile on user signup
CREATE OR REPLACE FUNCTION public.handle_new_user()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO public.user_profiles (user_id)
    VALUES (NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

CREATE TRIGGER on_auth_user_created
    AFTER INSERT ON auth.users
    FOR EACH ROW EXECUTE FUNCTION public.handle_new_user();

-- Trigger: Auto-update updated_at
CREATE OR REPLACE FUNCTION public.update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_user_profiles_updated_at
    BEFORE UPDATE ON public.user_profiles
    FOR EACH ROW EXECUTE FUNCTION public.update_updated_at_column();

Migration 2: Diagnosis System

-- File: migrations/002_create_diagnosis_system.sql

CREATE TABLE public.diagnosis_tests (
    id SERIAL PRIMARY KEY,
    test_name TEXT NOT NULL,
    version TEXT NOT NULL UNIQUE,
    is_active BOOLEAN DEFAULT true,
    sections JSONB NOT NULL,
    prescription_rules JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_diagnosis_tests_active ON public.diagnosis_tests(is_active) WHERE is_active = true;

CREATE TABLE public.user_diagnosis_results (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
    diagnosis_test_id INTEGER NOT NULL REFERENCES public.diagnosis_tests(id),
    percent_mu NUMERIC(5,2) NOT NULL,
    percent_diec NUMERIC(5,2) NOT NULL,
    percent_cam NUMERIC(5,2) NOT NULL,
    percent_yeu_phan_xa NUMERIC(5,2) NOT NULL,
    prescription_type TEXT NOT NULL,
    raw_answers JSONB,
    completed_at TIMESTAMPTZ DEFAULT NOW(),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    CHECK (percent_mu >= 0 AND percent_mu <= 100),
    CHECK (percent_diec >= 0 AND percent_diec <= 100),
    CHECK (percent_cam >= 0 AND percent_cam <= 100),
    CHECK (percent_yeu_phan_xa >= 0 AND percent_yeu_phan_xa <= 100),
    CHECK (prescription_type IN ('don_mu', 'don_diec', 'don_cam', 'don_yeu_phan_xa', 'don_foundation'))
);

CREATE INDEX idx_user_diagnosis_user ON public.user_diagnosis_results(user_id);
CREATE INDEX idx_user_diagnosis_completed ON public.user_diagnosis_results(completed_at DESC);

ALTER TABLE public.user_diagnosis_results ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own diagnosis results"
    ON public.user_diagnosis_results FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own diagnosis results"
    ON public.user_diagnosis_results FOR INSERT
    WITH CHECK (auth.uid() = user_id);

Migration 3: Vocabulary System

-- File: migrations/003_create_vocabulary_system.sql

-- 3.1 Words table
CREATE TABLE public.words (
    id SERIAL PRIMARY KEY,
    word TEXT NOT NULL,
    vietnamese_meaning TEXT NOT NULL,
    part_of_speech TEXT NOT NULL,
    difficulty_level TEXT NOT NULL,
    day_introduced INTEGER NOT NULL,
    frequency_rank INTEGER,
    ipa TEXT,
    pronunciation_difficulty SMALLINT DEFAULT 1,
    is_concrete BOOLEAN DEFAULT true,
    image_url TEXT,
    audio_url TEXT NOT NULL,
    example_sentence_en TEXT,
    example_sentence_vi TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(word, part_of_speech),
    CHECK (difficulty_level IN ('A1', 'A2', 'B1')),
    CHECK (day_introduced >= 1 AND day_introduced <= 120),
    CHECK (pronunciation_difficulty >= 1 AND pronunciation_difficulty <= 3)
);

CREATE INDEX idx_words_day ON public.words(day_introduced);
CREATE INDEX idx_words_level ON public.words(difficulty_level);
CREATE INDEX idx_words_word ON public.words(word);
CREATE INDEX idx_words_concrete ON public.words(is_concrete) WHERE is_concrete = true;

-- 3.2 Daily lessons
CREATE TABLE public.daily_lessons (
    id SERIAL PRIMARY KEY,
    day_number INTEGER UNIQUE NOT NULL,
    level TEXT NOT NULL,
    new_word_ids JSONB NOT NULL,
    estimated_duration_minutes INTEGER DEFAULT 25,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    CHECK (day_number >= 1 AND day_number <= 120),
    CHECK (level IN ('A1', 'A2', 'B1'))
);

CREATE INDEX idx_daily_lessons_day ON public.daily_lessons(day_number);
CREATE INDEX idx_daily_lessons_level ON public.daily_lessons(level);

-- 3.3 User word progress
CREATE TABLE public.user_word_progress (
    id BIGSERIAL PRIMARY KEY,
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
    word_id INTEGER NOT NULL REFERENCES public.words(id) ON DELETE CASCADE,
    status TEXT DEFAULT 'new',
    ease_factor NUMERIC(3,2) DEFAULT 2.5,
    repetition_count INTEGER DEFAULT 0,
    interval_days INTEGER DEFAULT 1,
    next_review_date DATE,
    last_quality_score SMALLINT,
    correct_count INTEGER DEFAULT 0,
    incorrect_count INTEGER DEFAULT 0,
    last_reviewed_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(user_id, word_id),
    CHECK (status IN ('new', 'learning', 'mastered', 'forgotten')),
    CHECK (ease_factor >= 1.3),
    CHECK (interval_days >= 1 AND interval_days <= 180),
    CHECK (last_quality_score >= 0 AND last_quality_score <= 5)
);

CREATE INDEX idx_user_word_progress_user ON public.user_word_progress(user_id);
CREATE INDEX idx_user_word_progress_review_date ON public.user_word_progress(user_id, next_review_date);
CREATE INDEX idx_user_word_progress_status ON public.user_word_progress(user_id, status);
CREATE INDEX idx_user_word_progress_due ON public.user_word_progress(user_id, next_review_date, status)
    WHERE next_review_date IS NOT NULL;

ALTER TABLE public.user_word_progress ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own word progress"
    ON public.user_word_progress FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can update own word progress"
    ON public.user_word_progress FOR UPDATE
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own word progress"
    ON public.user_word_progress FOR INSERT
    WITH CHECK (auth.uid() = user_id);

-- 3.4 User daily progress
CREATE TABLE public.user_daily_progress (
    id BIGSERIAL PRIMARY KEY,
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
    daily_lesson_id INTEGER NOT NULL REFERENCES public.daily_lessons(id) ON DELETE CASCADE,
    status TEXT DEFAULT 'not_started',
    words_learned_count INTEGER DEFAULT 0,
    words_reviewed_count INTEGER DEFAULT 0,
    accuracy_rate NUMERIC(5,2),
    time_spent_minutes INTEGER,
    completed_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(user_id, daily_lesson_id),
    CHECK (status IN ('not_started', 'in_progress', 'completed')),
    CHECK (accuracy_rate >= 0 AND accuracy_rate <= 100)
);

CREATE INDEX idx_user_daily_progress_user ON public.user_daily_progress(user_id);
CREATE INDEX idx_user_daily_progress_completed ON public.user_daily_progress(user_id, completed_at DESC);

ALTER TABLE public.user_daily_progress ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own daily progress"
    ON public.user_daily_progress FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can update own daily progress"
    ON public.user_daily_progress FOR UPDATE
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own daily progress"
    ON public.user_daily_progress FOR INSERT
    WITH CHECK (auth.uid() = user_id);

-- 3.5 Checkpoint tests
CREATE TABLE public.checkpoint_tests (
    id SERIAL PRIMARY KEY,
    day_number INTEGER UNIQUE NOT NULL,
    level TEXT NOT NULL,
    test_type TEXT DEFAULT 'mixed',
    total_words_tested INTEGER NOT NULL,
    pass_threshold NUMERIC(3,2) DEFAULT 0.70,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    CHECK (day_number IN (40, 80, 120)),
    CHECK (level IN ('A1', 'A2', 'B1')),
    CHECK (test_type IN ('recognition', 'recall', 'mixed')),
    CHECK (pass_threshold >= 0 AND pass_threshold <= 1)
);

CREATE INDEX idx_checkpoint_tests_day ON public.checkpoint_tests(day_number);

-- 3.6 User checkpoint results
CREATE TABLE public.user_checkpoint_results (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
    checkpoint_test_id INTEGER NOT NULL REFERENCES public.checkpoint_tests(id),
    total_questions INTEGER NOT NULL,
    correct_answers INTEGER NOT NULL,
    score_percentage NUMERIC(5,2) NOT NULL,
    passed BOOLEAN NOT NULL,
    answers JSONB,
    completed_at TIMESTAMPTZ DEFAULT NOW(),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    CHECK (score_percentage >= 0 AND score_percentage <= 100),
    CHECK (correct_answers >= 0 AND correct_answers <= total_questions)
);

CREATE INDEX idx_user_checkpoint_user ON public.user_checkpoint_results(user_id);
CREATE INDEX idx_user_checkpoint_test ON public.user_checkpoint_results(checkpoint_test_id);
CREATE INDEX idx_user_checkpoint_completed ON public.user_checkpoint_results(completed_at DESC);

ALTER TABLE public.user_checkpoint_results ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own checkpoint results"
    ON public.user_checkpoint_results FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own checkpoint results"
    ON public.user_checkpoint_results FOR INSERT
    WITH CHECK (auth.uid() = user_id);

Migration 4: AI System

-- File: migrations/004_create_ai_system.sql

-- 4.1 AI conversations
CREATE TABLE public.ai_conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
    conversation_type TEXT NOT NULL,
    context TEXT,
    user_message TEXT NOT NULL,
    ai_response TEXT NOT NULL,
    model_used TEXT,
    prompt_tokens INTEGER,
    completion_tokens INTEGER,
    response_time_ms INTEGER,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    CHECK (conversation_type IN (
        'vocabulary_question',
        'motivation',
        'scenario_practice',
        'general',
        'milestone_celebration',
        'missed_day_nudge',
        'boundary_redirect',
        'pronunciation_feedback',
        'quiz_feedback'
    ))
);

CREATE INDEX idx_ai_conversations_user ON public.ai_conversations(user_id);
CREATE INDEX idx_ai_conversations_created ON public.ai_conversations(created_at DESC);
CREATE INDEX idx_ai_conversations_type ON public.ai_conversations(conversation_type);

ALTER TABLE public.ai_conversations ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own AI conversations"
    ON public.ai_conversations FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own AI conversations"
    ON public.ai_conversations FOR INSERT
    WITH CHECK (auth.uid() = user_id);

-- 4.2 AI feedback
CREATE TABLE public.ai_feedback (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID NOT NULL REFERENCES public.ai_conversations(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
    is_helpful BOOLEAN NOT NULL,
    rating SMALLINT,
    feedback_text TEXT,
    issue_type TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(conversation_id),
    CHECK (rating >= 1 AND rating <= 5),
    CHECK (issue_type IN ('incorrect_info', 'not_relevant', 'confusing', 'other'))
);

CREATE INDEX idx_ai_feedback_user ON public.ai_feedback(user_id);
CREATE INDEX idx_ai_feedback_helpful ON public.ai_feedback(is_helpful);
CREATE INDEX idx_ai_feedback_created ON public.ai_feedback(created_at DESC);

ALTER TABLE public.ai_feedback ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own AI feedback"
    ON public.ai_feedback FOR SELECT
    USING (auth.uid() = user_id);

CREATE POLICY "Users can insert own AI feedback"
    ON public.ai_feedback FOR INSERT
    WITH CHECK (auth.uid() = user_id);

Data Validation

Post-Migration Checks

-- File: scripts/validate_schema.sql

-- Check 1: All tables created
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY table_name;

-- Expected output:
-- ai_conversations
-- ai_feedback
-- checkpoint_tests
-- daily_lessons
-- diagnosis_tests
-- user_checkpoint_results
-- user_daily_progress
-- user_diagnosis_results
-- user_profiles
-- user_word_progress
-- words

-- Check 2: RLS enabled
SELECT tablename, rowsecurity
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY tablename;

-- All user-owned tables should have rowsecurity = true

-- Check 3: Indexes created
SELECT
    schemaname,
    tablename,
    indexname
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;

-- Check 4: Foreign keys
SELECT
    tc.table_name,
    kcu.column_name,
    ccu.table_name AS foreign_table_name,
    ccu.column_name AS foreign_column_name
FROM
    information_schema.table_constraints AS tc
    JOIN information_schema.key_column_usage AS kcu
      ON tc.constraint_name = kcu.constraint_name
      AND tc.table_schema = kcu.table_schema
    JOIN information_schema.constraint_column_usage AS ccu
      ON ccu.constraint_name = tc.constraint_name
      AND ccu.table_schema = tc.table_schema
WHERE tc.constraint_type = 'FOREIGN KEY'
  AND tc.table_schema = 'public'
ORDER BY tc.table_name;

Implementation Notes

Supabase Setup

  1. Create Supabase project

  2. Run migrations

    • Use Supabase SQL Editor or CLI
    • Run migrations in order (001 → 002 → 003 → 004)
  3. Configure RLS

    • Verify RLS policies in Supabase Dashboard > Authentication > Policies
  4. Setup auth providers

    • Enable Google OAuth in Supabase Dashboard > Authentication > Providers
    • Add Google OAuth credentials

Environment Variables

# .env.local
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key  # Backend only

Next Steps

After schema is implemented:

  1. Seed static data:

    • diagnosis_tests: Create initial diagnosis test
    • checkpoint_tests: Create 3 checkpoint tests (day 40, 80, 120)
    • words: Migrate 3,000 Oxford words (see don-mu-vocabulary.md)
    • daily_lessons: Generate 120 lesson plans
  2. Test with sample users:

    • Create test accounts
    • Verify RLS policies work
    • Check performance of queries
  3. Implement service layer:

    • VocabularyService
    • SpacedRepetitionService
    • DiagnosisService
    • AICoachService

End of Database Schema Design

This schema is ready for implementation. All tables include proper constraints, indexes, and RLS policies for security and performance.