Database Schema - Learn English Zero
Design Document Date: 2026-01-09 Status: Complete - Ready for Implementation Database: Supabase PostgreSQL Author: Claude (via /brainstorm interactive session)
Constants Reference: Xem
definitions.mdcho tất cả constants và enums.
Table of Contents
- Architecture Overview
- Schema Diagram
- Core Tables
- Indexes Strategy
- Row Level Security (RLS)
- Migration Scripts
- Data Validation
Architecture Overview
Database Choice: Supabase PostgreSQL
Rationale:
- Built-in authentication (
auth.users) with Google OAuth - Row Level Security (RLS) for data protection
- Real-time subscriptions for progress tracking
- Free tier: 500MB database, 2GB bandwidth/month (sufficient for MVP)
Schema Organization
Supabase Database Structure:
├── auth schema (managed by Supabase)
│ └── users # OAuth, email/password, tokens
│
├── public schema (custom tables)
│ ├── user_profiles # User info & preferences
│ ├── diagnosis_tests # Template for diagnosis tests
│ ├── user_diagnosis_results # User test results & prescriptions
│ ├── words # Oxford 3000 vocabulary
│ ├── daily_lessons # 120-day lesson plans
│ ├── user_word_progress # SM-2 spaced repetition state
│ ├── user_daily_progress # Daily completion tracking
│ ├── checkpoint_tests # Milestone test templates
│ ├── user_checkpoint_results # User checkpoint results
│ ├── ai_conversations # AI Coach chat history
│ └── ai_feedback # User feedback on AI responses
Schema Diagram
erDiagram
AUTH_USERS ||--o{ USER_PROFILES : "has"
AUTH_USERS ||--o{ USER_DIAGNOSIS_RESULTS : "takes"
AUTH_USERS ||--o{ USER_WORD_PROGRESS : "learns"
AUTH_USERS ||--o{ USER_DAILY_PROGRESS : "completes"
AUTH_USERS ||--o{ USER_CHECKPOINT_RESULTS : "attempts"
AUTH_USERS ||--o{ AI_CONVERSATIONS : "chats"
DIAGNOSIS_TESTS ||--o{ USER_DIAGNOSIS_RESULTS : "generates"
WORDS ||--o{ USER_WORD_PROGRESS : "tracked_in"
DAILY_LESSONS ||--o{ USER_DAILY_PROGRESS : "completed_in"
CHECKPOINT_TESTS ||--o{ USER_CHECKPOINT_RESULTS : "graded_in"
AI_CONVERSATIONS ||--o{ AI_FEEDBACK : "rated_in"
Core Tables
1. User Management
1.1 auth.users (Supabase Managed)
Purpose: Core authentication, managed automatically by Supabase Auth.
Key fields:
id(uuid, PK): User unique identifieremail(text): User emailemail_confirmed_at(timestamp)encrypted_password(text)provider(text): oauth provider (google, email, etc.)last_sign_in_at(timestamp)created_at,updated_at
⚠️ DO NOT modify this table directly.
1.2 public.user_profiles
Purpose: User profile information, preferences, and context for AI personalization.
CREATE TABLE public.user_profiles (
-- Primary key
user_id UUID PRIMARY KEY REFERENCES auth.users(id) ON DELETE CASCADE,
-- Basic profile
nickname TEXT,
avatar_url TEXT,
-- AI Coach personalization
preferred_ai_name TEXT, -- Tên user đặt cho AI Coach (e.g., "Bee", "Coach")
-- Context for AI personalization (from additional questions)
profession TEXT, -- Nghề nghiệp (e.g., "kỹ sư", "sinh viên")
work_environment TEXT, -- Môi trường làm việc (e.g., "văn phòng", "remote")
uses_english_at_work BOOLEAN DEFAULT false, -- Có dùng tiếng Anh trong công việc không
learning_goals TEXT[], -- Array of goals (e.g., ["travel", "work", "study"])
available_study_time_minutes INTEGER, -- Thời gian có thể học mỗi ngày (15, 30, 45, 60)
previous_learning_experience TEXT, -- "school" | "center" | "self" | "never"
-- Progress & gamification
current_day INTEGER DEFAULT 1, -- Ngày hiện tại trong lộ trình (1-120)
streak_current INTEGER DEFAULT 0, -- Streak hiện tại (số ngày liên tiếp)
streak_longest INTEGER DEFAULT 0, -- Streak dài nhất từng đạt
last_studied_date DATE, -- Ngày học cuối cùng
-- Statistics (denormalized for performance)
total_words_learned INTEGER DEFAULT 0, -- Tổng số từ đã học (status != 'new')
total_words_mastered INTEGER DEFAULT 0, -- Tổng số từ đã thành thạo
total_study_minutes INTEGER DEFAULT 0, -- Tổng thời gian học (phút)
-- Preferences
daily_reminder_time TIME, -- Thời gian nhắc nhở hàng ngày (e.g., 19:00)
notification_enabled BOOLEAN DEFAULT true,
audio_autoplay BOOLEAN DEFAULT true,
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
CHECK (available_study_time_minutes IN (15, 30, 45, 60) OR available_study_time_minutes IS NULL),
CHECK (previous_learning_experience IN ('school', 'center', 'self', 'never') OR previous_learning_experience IS NULL)
);
-- Indexes
CREATE INDEX idx_user_profiles_current_day ON public.user_profiles(current_day);
CREATE INDEX idx_user_profiles_last_studied ON public.user_profiles(last_studied_date);
-- RLS Policies
ALTER TABLE public.user_profiles ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own profile"
ON public.user_profiles FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can update own profile"
ON public.user_profiles FOR UPDATE
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own profile"
ON public.user_profiles FOR INSERT
WITH CHECK (auth.uid() = user_id);
Key Design Decisions:
- Why denormalize stats? Frequent queries for dashboard, JOIN-heavy otherwise
- Why TEXT[] for learning_goals? Flexible, can add new goals without migration
- Why separate work_environment? AI uses this to personalize vocabulary examples
2. Diagnosis System
2.1 public.diagnosis_tests
Purpose: Template for diagnosis tests (4 loại bệnh: Mù/Điếc/Câm/Yếu phản xạ).
CREATE TABLE public.diagnosis_tests (
-- Primary key
id SERIAL PRIMARY KEY,
-- Test metadata
test_name TEXT NOT NULL, -- e.g., "Initial Diagnosis Test v1.0"
version TEXT NOT NULL, -- Versioning for future improvements
is_active BOOLEAN DEFAULT true, -- Only one active version at a time
-- Test structure (JSONB for flexibility)
sections JSONB NOT NULL, -- Array of test sections
/*
Example structure:
{
"reading": {
"name": "Đọc - Mù",
"questions": [...],
"weight": 0.25
},
"listening": {
"name": "Nghe - Điếc",
"questions": [...],
"weight": 0.25
},
"speaking": {
"name": "Nói - Câm",
"questions": [...],
"weight": 0.25
},
"reflex": {
"name": "Phản xạ - Yếu phản xạ",
"questions": [...],
"weight": 0.25
}
}
*/
-- Prescription logic
prescription_rules JSONB NOT NULL, -- Rules to determine which prescription
/*
Example:
{
"mu": { "threshold": 50, "prescription": "don_mu" },
"diec": { "threshold": 50, "prescription": "don_diec" },
"cam": { "threshold": 50, "prescription": "don_cam" },
"yeu_phan_xa": { "threshold": 50, "prescription": "don_yeu_phan_xa" }
}
*/
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
-- Unique constraint
UNIQUE(version)
);
-- Indexes
CREATE INDEX idx_diagnosis_tests_active ON public.diagnosis_tests(is_active) WHERE is_active = true;
Key Design Decisions:
- Why JSONB? Test structure may evolve, JSONB allows flexibility without schema migration
- Why version field? Can A/B test different diagnosis approaches
2.2 public.user_diagnosis_results
Purpose: Store user diagnosis results and prescribed "đơn thuốc".
CREATE TABLE public.user_diagnosis_results (
-- Primary key
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Foreign keys
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
diagnosis_test_id INTEGER NOT NULL REFERENCES public.diagnosis_tests(id),
-- Test results (%)
percent_mu NUMERIC(5,2) NOT NULL, -- % Mù (0-100)
percent_diec NUMERIC(5,2) NOT NULL, -- % Điếc (0-100)
percent_cam NUMERIC(5,2) NOT NULL, -- % Câm (0-100)
percent_yeu_phan_xa NUMERIC(5,2) NOT NULL, -- % Yếu phản xạ (0-100)
-- Prescribed "đơn thuốc"
prescription_type TEXT NOT NULL, -- "don_mu" | "don_diec" | "don_cam" | "don_yeu_phan_xa" | "don_foundation"
-- Raw answers (for future analysis)
raw_answers JSONB, -- User's answers to each question
-- Metadata
completed_at TIMESTAMPTZ DEFAULT NOW(),
created_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
CHECK (percent_mu >= 0 AND percent_mu <= 100),
CHECK (percent_diec >= 0 AND percent_diec <= 100),
CHECK (percent_cam >= 0 AND percent_cam <= 100),
CHECK (percent_yeu_phan_xa >= 0 AND percent_yeu_phan_xa <= 100),
CHECK (prescription_type IN ('don_mu', 'don_diec', 'don_cam', 'don_yeu_phan_xa', 'don_foundation'))
);
-- Indexes
CREATE INDEX idx_user_diagnosis_user ON public.user_diagnosis_results(user_id);
CREATE INDEX idx_user_diagnosis_completed ON public.user_diagnosis_results(completed_at DESC);
-- RLS Policies
ALTER TABLE public.user_diagnosis_results ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own diagnosis results"
ON public.user_diagnosis_results FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own diagnosis results"
ON public.user_diagnosis_results FOR INSERT
WITH CHECK (auth.uid() = user_id);
Key Design Decisions:
- Why UUID for id? Prevent enumeration attacks, more secure
- Why separate percent columns? Easy to query and visualize progress over time
- Why store raw_answers? Can re-score if prescription_rules change
3. Vocabulary System (from don-mu-vocabulary.md)
3.1 public.words
Purpose: Oxford 3000 vocabulary with metadata.
CREATE TABLE public.words (
-- Primary key
id SERIAL PRIMARY KEY,
-- Word data
word TEXT NOT NULL, -- English word (e.g., "apple")
vietnamese_meaning TEXT NOT NULL, -- Vietnamese translation
part_of_speech TEXT NOT NULL, -- "noun" | "verb" | "adj" | "adv" | etc.
-- CEFR & difficulty
difficulty_level TEXT NOT NULL, -- "A1" | "A2" | "B1"
day_introduced INTEGER NOT NULL, -- Day 1-120
frequency_rank INTEGER, -- Oxford frequency ranking (lower = more common)
-- Pronunciation
ipa TEXT, -- IPA pronunciation (optional)
pronunciation_difficulty SMALLINT DEFAULT 1, -- 1=easy, 2=medium, 3=hard for Vietnamese learners
-- Learning aids
is_concrete BOOLEAN DEFAULT true, -- Has physical form (needs image)
image_url TEXT, -- Unsplash image URL
audio_url TEXT NOT NULL, -- Azure TTS audio URL
-- Example sentences
example_sentence_en TEXT,
example_sentence_vi TEXT,
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
UNIQUE(word, part_of_speech), -- Same word can be noun & verb
CHECK (difficulty_level IN ('A1', 'A2', 'B1')),
CHECK (day_introduced >= 1 AND day_introduced <= 120),
CHECK (pronunciation_difficulty >= 1 AND pronunciation_difficulty <= 3)
);
-- Indexes
CREATE INDEX idx_words_day ON public.words(day_introduced);
CREATE INDEX idx_words_level ON public.words(difficulty_level);
CREATE INDEX idx_words_word ON public.words(word);
CREATE INDEX idx_words_concrete ON public.words(is_concrete) WHERE is_concrete = true;
Key Design Decisions:
- Why UNIQUE(word, part_of_speech)? "run" can be both noun and verb
- Why pronunciation_difficulty? AI Coach can prioritize hard-to-pronounce words
3.2 public.user_word_progress
Purpose: SM-2 spaced repetition state for each user-word pair.
CREATE TABLE public.user_word_progress (
-- Primary key
id BIGSERIAL PRIMARY KEY,
-- Foreign keys
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
word_id INTEGER NOT NULL REFERENCES public.words(id) ON DELETE CASCADE,
-- Learning status
status TEXT DEFAULT 'new', -- "new" | "learning" | "mastered" | "forgotten"
-- SM-2 Algorithm parameters
ease_factor NUMERIC(3,2) DEFAULT 2.5, -- 1.3 - 2.5+ (determines interval growth)
repetition_count INTEGER DEFAULT 0, -- Number of successful reviews
interval_days INTEGER DEFAULT 1, -- Days until next review
next_review_date DATE, -- Next scheduled review
-- User performance
last_quality_score SMALLINT, -- 0-5 (last self-rating)
correct_count INTEGER DEFAULT 0,
incorrect_count INTEGER DEFAULT 0,
-- Timestamps
last_reviewed_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
UNIQUE(user_id, word_id),
CHECK (status IN ('new', 'learning', 'mastered', 'forgotten')),
CHECK (ease_factor >= 1.3),
CHECK (interval_days >= 1 AND interval_days <= 180), -- Cap at 6 months
CHECK (last_quality_score >= 0 AND last_quality_score <= 5)
);
-- Indexes (critical for performance)
CREATE INDEX idx_user_word_progress_user ON public.user_word_progress(user_id);
CREATE INDEX idx_user_word_progress_review_date ON public.user_word_progress(user_id, next_review_date);
CREATE INDEX idx_user_word_progress_status ON public.user_word_progress(user_id, status);
-- Composite index for daily lesson query
CREATE INDEX idx_user_word_progress_due ON public.user_word_progress(user_id, next_review_date, status)
WHERE next_review_date IS NOT NULL;
-- RLS Policies
ALTER TABLE public.user_word_progress ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own word progress"
ON public.user_word_progress FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can update own word progress"
ON public.user_word_progress FOR UPDATE
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own word progress"
ON public.user_word_progress FOR INSERT
WITH CHECK (auth.uid() = user_id);
Key Design Decisions:
- Why BIGSERIAL for id? Each user × 3000 words = millions of rows
- Why cap interval_days at 180? Prevent runaway intervals
- Why composite index on (user_id, next_review_date, status)? Optimizes daily lesson query
3.3 public.daily_lessons
Purpose: Pre-defined lesson plans for 120 days.
CREATE TABLE public.daily_lessons (
-- Primary key
id SERIAL PRIMARY KEY,
-- Lesson metadata
day_number INTEGER UNIQUE NOT NULL, -- 1-120
level TEXT NOT NULL, -- "A1" | "A2" | "B1" (source of truth)
-- Lesson content
new_word_ids JSONB NOT NULL, -- Array of 25 word IDs
/*
Example: [1, 2, 3, ..., 25]
*/
estimated_duration_minutes INTEGER DEFAULT 25,
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
CHECK (day_number >= 1 AND day_number <= 120),
CHECK (level IN ('A1', 'A2', 'B1'))
);
-- Indexes
CREATE INDEX idx_daily_lessons_day ON public.daily_lessons(day_number);
CREATE INDEX idx_daily_lessons_level ON public.daily_lessons(level);
Key Design Decisions:
- Why JSONB for new_word_ids? Array of integers, easy to query with
@>operator - Why level column? Frontend MUST NOT derive level from day_number
3.4 public.user_daily_progress
Purpose: Track daily lesson completion.
CREATE TABLE public.user_daily_progress (
-- Primary key
id BIGSERIAL PRIMARY KEY,
-- Foreign keys
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
daily_lesson_id INTEGER NOT NULL REFERENCES public.daily_lessons(id) ON DELETE CASCADE,
-- Progress
status TEXT DEFAULT 'not_started', -- "not_started" | "in_progress" | "completed"
words_learned_count INTEGER DEFAULT 0, -- New words completed
words_reviewed_count INTEGER DEFAULT 0, -- Review words completed
accuracy_rate NUMERIC(5,2), -- % correct answers (0-100)
time_spent_minutes INTEGER,
-- Timestamps
completed_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
UNIQUE(user_id, daily_lesson_id),
CHECK (status IN ('not_started', 'in_progress', 'completed')),
CHECK (accuracy_rate >= 0 AND accuracy_rate <= 100)
);
-- Indexes
CREATE INDEX idx_user_daily_progress_user ON public.user_daily_progress(user_id);
CREATE INDEX idx_user_daily_progress_completed ON public.user_daily_progress(user_id, completed_at DESC);
-- RLS Policies
ALTER TABLE public.user_daily_progress ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own daily progress"
ON public.user_daily_progress FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can update own daily progress"
ON public.user_daily_progress FOR UPDATE
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own daily progress"
ON public.user_daily_progress FOR INSERT
WITH CHECK (auth.uid() = user_id);
3.5 public.checkpoint_tests
Purpose: Milestone test templates (Day 40, 80, 120).
CREATE TABLE public.checkpoint_tests (
-- Primary key
id SERIAL PRIMARY KEY,
-- Test metadata
day_number INTEGER UNIQUE NOT NULL, -- 40 | 80 | 120
level TEXT NOT NULL, -- "A1" | "A2" | "B1"
test_type TEXT DEFAULT 'mixed', -- "recognition" | "recall" | "mixed"
-- Test parameters
total_words_tested INTEGER NOT NULL, -- 1000 (day 40/80) or 500 (day 120)
pass_threshold NUMERIC(3,2) DEFAULT 0.70, -- 70% to pass
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
CHECK (day_number IN (40, 80, 120)),
CHECK (level IN ('A1', 'A2', 'B1')),
CHECK (test_type IN ('recognition', 'recall', 'mixed')),
CHECK (pass_threshold >= 0 AND pass_threshold <= 1)
);
-- Indexes
CREATE INDEX idx_checkpoint_tests_day ON public.checkpoint_tests(day_number);
Key Design Decisions:
- Why separate test_type? May experiment with different test formats
- Why pass_threshold as NUMERIC? Can adjust difficulty per checkpoint
3.6 public.user_checkpoint_results
Purpose: Store user checkpoint test results.
CREATE TABLE public.user_checkpoint_results (
-- Primary key
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Foreign keys
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
checkpoint_test_id INTEGER NOT NULL REFERENCES public.checkpoint_tests(id),
-- Results
total_questions INTEGER NOT NULL,
correct_answers INTEGER NOT NULL,
score_percentage NUMERIC(5,2) NOT NULL,
passed BOOLEAN NOT NULL,
-- Detailed answers (for analysis)
answers JSONB, -- Array of {word_id, user_answer, is_correct}
-- Timestamps
completed_at TIMESTAMPTZ DEFAULT NOW(),
created_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
CHECK (score_percentage >= 0 AND score_percentage <= 100),
CHECK (correct_answers >= 0 AND correct_answers <= total_questions)
);
-- Indexes
CREATE INDEX idx_user_checkpoint_user ON public.user_checkpoint_results(user_id);
CREATE INDEX idx_user_checkpoint_test ON public.user_checkpoint_results(checkpoint_test_id);
CREATE INDEX idx_user_checkpoint_completed ON public.user_checkpoint_results(completed_at DESC);
-- RLS Policies
ALTER TABLE public.user_checkpoint_results ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own checkpoint results"
ON public.user_checkpoint_results FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own checkpoint results"
ON public.user_checkpoint_results FOR INSERT
WITH CHECK (auth.uid() = user_id);
4. AI Coach System
4.1 public.ai_conversations
Purpose: Store full conversation history with AI Coach.
CREATE TABLE public.ai_conversations (
-- Primary key
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Foreign key
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
-- Conversation metadata
conversation_type TEXT NOT NULL, -- See definitions.md Section 6.2 for full list
context TEXT, -- What triggered this conversation (e.g., "learning word: apple")
-- Messages
user_message TEXT NOT NULL,
ai_response TEXT NOT NULL,
-- AI metadata (for debugging/optimization)
model_used TEXT, -- "gpt-4o-mini" | "gpt-4o" | etc.
prompt_tokens INTEGER,
completion_tokens INTEGER,
response_time_ms INTEGER,
-- Timestamps
created_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
CHECK (conversation_type IN (
'vocabulary_question',
'motivation',
'scenario_practice',
'general',
'milestone_celebration',
'missed_day_nudge',
'boundary_redirect',
'pronunciation_feedback',
'quiz_feedback'
))
);
-- Indexes
CREATE INDEX idx_ai_conversations_user ON public.ai_conversations(user_id);
CREATE INDEX idx_ai_conversations_created ON public.ai_conversations(created_at DESC);
CREATE INDEX idx_ai_conversations_type ON public.ai_conversations(conversation_type);
-- RLS Policies
ALTER TABLE public.ai_conversations ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own AI conversations"
ON public.ai_conversations FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own AI conversations"
ON public.ai_conversations FOR INSERT
WITH CHECK (auth.uid() = user_id);
Key Design Decisions:
- Why conversation_type? Analytics: what do users ask AI most?
- Why store tokens? Monitor costs per user
- Why response_time_ms? Identify slow prompts
4.2 public.ai_feedback
Purpose: User feedback on AI responses for continuous improvement.
CREATE TABLE public.ai_feedback (
-- Primary key
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Foreign key
conversation_id UUID NOT NULL REFERENCES public.ai_conversations(id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
-- Feedback
is_helpful BOOLEAN NOT NULL, -- Thumbs up/down
rating SMALLINT, -- 1-5 stars (optional)
feedback_text TEXT, -- Optional written feedback
-- Categorization (optional)
issue_type TEXT, -- "incorrect_info" | "not_relevant" | "confusing" | "other"
-- Timestamps
created_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
UNIQUE(conversation_id), -- One feedback per conversation
CHECK (rating >= 1 AND rating <= 5),
CHECK (issue_type IN ('incorrect_info', 'not_relevant', 'confusing', 'other'))
);
-- Indexes
CREATE INDEX idx_ai_feedback_user ON public.ai_feedback(user_id);
CREATE INDEX idx_ai_feedback_helpful ON public.ai_feedback(is_helpful);
CREATE INDEX idx_ai_feedback_created ON public.ai_feedback(created_at DESC);
-- RLS Policies
ALTER TABLE public.ai_feedback ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own AI feedback"
ON public.ai_feedback FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own AI feedback"
ON public.ai_feedback FOR INSERT
WITH CHECK (auth.uid() = user_id);
Key Design Decisions:
- Why separate table? Keeps ai_conversations clean, not all conversations get feedback
- Why UNIQUE(conversation_id)? One feedback per conversation prevents spam
Indexes Strategy
Indexing Philosophy
Principle: Index queries you run frequently, not every column.
Strategy:
- User-scoped queries: Always index
user_idfor user-specific tables - Date-based queries: Index
next_review_date,created_atfor sorting - Status filters: Index
statuscolumns for filtering - Composite indexes: For queries filtering multiple columns simultaneously
Performance Targets
| Query Type | Target Time | Index Used |
|---|---|---|
| Get user profile | <10ms | PK lookup |
| Get daily lesson | <50ms | idx_daily_lessons_day |
| Get due reviews | <100ms | idx_user_word_progress_due |
| Get user stats | <200ms | Denormalized in user_profiles |
| Checkpoint test | <500ms | Sampling with idx_words_level |
Row Level Security (RLS)
RLS Philosophy
Principle: User data is isolated at database level, not application level.
Benefits:
- Even if application logic has bugs, users can't access others' data
- Supabase client queries are automatically filtered
- Reduces boilerplate authorization code
Policy Patterns
Pattern 1: User-owned data (most tables)
CREATE POLICY "Users can view own {resource}"
ON public.{table} FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can update own {resource}"
ON public.{table} FOR UPDATE
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own {resource}"
ON public.{table} FOR INSERT
WITH CHECK (auth.uid() = user_id);
Tables: user_profiles, user_word_progress, user_daily_progress, user_diagnosis_results, user_checkpoint_results, ai_conversations, ai_feedback
Pattern 2: Public read-only data
CREATE POLICY "Anyone can read {resource}"
ON public.{table} FOR SELECT
USING (true);
Tables: words, daily_lessons, checkpoint_tests, diagnosis_tests
⚠️ Note: No INSERT/UPDATE/DELETE policies = only backend can modify.
Migration Scripts
Migration Order
IMPORTANT: Run migrations in this exact order to satisfy foreign key constraints.
-- Migration 1: User profiles (depends on auth.users)
-- Migration 2: Diagnosis system
-- Migration 3: Vocabulary system
-- Migration 4: AI system
Migration 1: User Profiles
-- File: migrations/001_create_user_profiles.sql
CREATE TABLE public.user_profiles (
user_id UUID PRIMARY KEY REFERENCES auth.users(id) ON DELETE CASCADE,
nickname TEXT,
avatar_url TEXT,
-- AI Coach personalization
preferred_ai_name TEXT,
-- Context for AI personalization
profession TEXT,
work_environment TEXT,
uses_english_at_work BOOLEAN DEFAULT false,
learning_goals TEXT[],
available_study_time_minutes INTEGER,
previous_learning_experience TEXT,
-- Progress & gamification
current_day INTEGER DEFAULT 1,
streak_current INTEGER DEFAULT 0,
streak_longest INTEGER DEFAULT 0,
last_studied_date DATE,
-- Statistics
total_words_learned INTEGER DEFAULT 0,
total_words_mastered INTEGER DEFAULT 0,
total_study_minutes INTEGER DEFAULT 0,
-- Preferences
daily_reminder_time TIME,
notification_enabled BOOLEAN DEFAULT true,
audio_autoplay BOOLEAN DEFAULT true,
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
-- Constraints
CHECK (available_study_time_minutes IN (15, 30, 45, 60) OR available_study_time_minutes IS NULL),
CHECK (previous_learning_experience IN ('school', 'center', 'self', 'never') OR previous_learning_experience IS NULL)
);
CREATE INDEX idx_user_profiles_current_day ON public.user_profiles(current_day);
CREATE INDEX idx_user_profiles_last_studied ON public.user_profiles(last_studied_date);
ALTER TABLE public.user_profiles ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own profile"
ON public.user_profiles FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can update own profile"
ON public.user_profiles FOR UPDATE
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own profile"
ON public.user_profiles FOR INSERT
WITH CHECK (auth.uid() = user_id);
-- Trigger: Auto-create profile on user signup
CREATE OR REPLACE FUNCTION public.handle_new_user()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO public.user_profiles (user_id)
VALUES (NEW.id);
RETURN NEW;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
CREATE TRIGGER on_auth_user_created
AFTER INSERT ON auth.users
FOR EACH ROW EXECUTE FUNCTION public.handle_new_user();
-- Trigger: Auto-update updated_at
CREATE OR REPLACE FUNCTION public.update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER update_user_profiles_updated_at
BEFORE UPDATE ON public.user_profiles
FOR EACH ROW EXECUTE FUNCTION public.update_updated_at_column();
Migration 2: Diagnosis System
-- File: migrations/002_create_diagnosis_system.sql
CREATE TABLE public.diagnosis_tests (
id SERIAL PRIMARY KEY,
test_name TEXT NOT NULL,
version TEXT NOT NULL UNIQUE,
is_active BOOLEAN DEFAULT true,
sections JSONB NOT NULL,
prescription_rules JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_diagnosis_tests_active ON public.diagnosis_tests(is_active) WHERE is_active = true;
CREATE TABLE public.user_diagnosis_results (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
diagnosis_test_id INTEGER NOT NULL REFERENCES public.diagnosis_tests(id),
percent_mu NUMERIC(5,2) NOT NULL,
percent_diec NUMERIC(5,2) NOT NULL,
percent_cam NUMERIC(5,2) NOT NULL,
percent_yeu_phan_xa NUMERIC(5,2) NOT NULL,
prescription_type TEXT NOT NULL,
raw_answers JSONB,
completed_at TIMESTAMPTZ DEFAULT NOW(),
created_at TIMESTAMPTZ DEFAULT NOW(),
CHECK (percent_mu >= 0 AND percent_mu <= 100),
CHECK (percent_diec >= 0 AND percent_diec <= 100),
CHECK (percent_cam >= 0 AND percent_cam <= 100),
CHECK (percent_yeu_phan_xa >= 0 AND percent_yeu_phan_xa <= 100),
CHECK (prescription_type IN ('don_mu', 'don_diec', 'don_cam', 'don_yeu_phan_xa', 'don_foundation'))
);
CREATE INDEX idx_user_diagnosis_user ON public.user_diagnosis_results(user_id);
CREATE INDEX idx_user_diagnosis_completed ON public.user_diagnosis_results(completed_at DESC);
ALTER TABLE public.user_diagnosis_results ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own diagnosis results"
ON public.user_diagnosis_results FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own diagnosis results"
ON public.user_diagnosis_results FOR INSERT
WITH CHECK (auth.uid() = user_id);
Migration 3: Vocabulary System
-- File: migrations/003_create_vocabulary_system.sql
-- 3.1 Words table
CREATE TABLE public.words (
id SERIAL PRIMARY KEY,
word TEXT NOT NULL,
vietnamese_meaning TEXT NOT NULL,
part_of_speech TEXT NOT NULL,
difficulty_level TEXT NOT NULL,
day_introduced INTEGER NOT NULL,
frequency_rank INTEGER,
ipa TEXT,
pronunciation_difficulty SMALLINT DEFAULT 1,
is_concrete BOOLEAN DEFAULT true,
image_url TEXT,
audio_url TEXT NOT NULL,
example_sentence_en TEXT,
example_sentence_vi TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(word, part_of_speech),
CHECK (difficulty_level IN ('A1', 'A2', 'B1')),
CHECK (day_introduced >= 1 AND day_introduced <= 120),
CHECK (pronunciation_difficulty >= 1 AND pronunciation_difficulty <= 3)
);
CREATE INDEX idx_words_day ON public.words(day_introduced);
CREATE INDEX idx_words_level ON public.words(difficulty_level);
CREATE INDEX idx_words_word ON public.words(word);
CREATE INDEX idx_words_concrete ON public.words(is_concrete) WHERE is_concrete = true;
-- 3.2 Daily lessons
CREATE TABLE public.daily_lessons (
id SERIAL PRIMARY KEY,
day_number INTEGER UNIQUE NOT NULL,
level TEXT NOT NULL,
new_word_ids JSONB NOT NULL,
estimated_duration_minutes INTEGER DEFAULT 25,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
CHECK (day_number >= 1 AND day_number <= 120),
CHECK (level IN ('A1', 'A2', 'B1'))
);
CREATE INDEX idx_daily_lessons_day ON public.daily_lessons(day_number);
CREATE INDEX idx_daily_lessons_level ON public.daily_lessons(level);
-- 3.3 User word progress
CREATE TABLE public.user_word_progress (
id BIGSERIAL PRIMARY KEY,
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
word_id INTEGER NOT NULL REFERENCES public.words(id) ON DELETE CASCADE,
status TEXT DEFAULT 'new',
ease_factor NUMERIC(3,2) DEFAULT 2.5,
repetition_count INTEGER DEFAULT 0,
interval_days INTEGER DEFAULT 1,
next_review_date DATE,
last_quality_score SMALLINT,
correct_count INTEGER DEFAULT 0,
incorrect_count INTEGER DEFAULT 0,
last_reviewed_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(user_id, word_id),
CHECK (status IN ('new', 'learning', 'mastered', 'forgotten')),
CHECK (ease_factor >= 1.3),
CHECK (interval_days >= 1 AND interval_days <= 180),
CHECK (last_quality_score >= 0 AND last_quality_score <= 5)
);
CREATE INDEX idx_user_word_progress_user ON public.user_word_progress(user_id);
CREATE INDEX idx_user_word_progress_review_date ON public.user_word_progress(user_id, next_review_date);
CREATE INDEX idx_user_word_progress_status ON public.user_word_progress(user_id, status);
CREATE INDEX idx_user_word_progress_due ON public.user_word_progress(user_id, next_review_date, status)
WHERE next_review_date IS NOT NULL;
ALTER TABLE public.user_word_progress ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own word progress"
ON public.user_word_progress FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can update own word progress"
ON public.user_word_progress FOR UPDATE
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own word progress"
ON public.user_word_progress FOR INSERT
WITH CHECK (auth.uid() = user_id);
-- 3.4 User daily progress
CREATE TABLE public.user_daily_progress (
id BIGSERIAL PRIMARY KEY,
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
daily_lesson_id INTEGER NOT NULL REFERENCES public.daily_lessons(id) ON DELETE CASCADE,
status TEXT DEFAULT 'not_started',
words_learned_count INTEGER DEFAULT 0,
words_reviewed_count INTEGER DEFAULT 0,
accuracy_rate NUMERIC(5,2),
time_spent_minutes INTEGER,
completed_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(user_id, daily_lesson_id),
CHECK (status IN ('not_started', 'in_progress', 'completed')),
CHECK (accuracy_rate >= 0 AND accuracy_rate <= 100)
);
CREATE INDEX idx_user_daily_progress_user ON public.user_daily_progress(user_id);
CREATE INDEX idx_user_daily_progress_completed ON public.user_daily_progress(user_id, completed_at DESC);
ALTER TABLE public.user_daily_progress ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own daily progress"
ON public.user_daily_progress FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can update own daily progress"
ON public.user_daily_progress FOR UPDATE
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own daily progress"
ON public.user_daily_progress FOR INSERT
WITH CHECK (auth.uid() = user_id);
-- 3.5 Checkpoint tests
CREATE TABLE public.checkpoint_tests (
id SERIAL PRIMARY KEY,
day_number INTEGER UNIQUE NOT NULL,
level TEXT NOT NULL,
test_type TEXT DEFAULT 'mixed',
total_words_tested INTEGER NOT NULL,
pass_threshold NUMERIC(3,2) DEFAULT 0.70,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
CHECK (day_number IN (40, 80, 120)),
CHECK (level IN ('A1', 'A2', 'B1')),
CHECK (test_type IN ('recognition', 'recall', 'mixed')),
CHECK (pass_threshold >= 0 AND pass_threshold <= 1)
);
CREATE INDEX idx_checkpoint_tests_day ON public.checkpoint_tests(day_number);
-- 3.6 User checkpoint results
CREATE TABLE public.user_checkpoint_results (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
checkpoint_test_id INTEGER NOT NULL REFERENCES public.checkpoint_tests(id),
total_questions INTEGER NOT NULL,
correct_answers INTEGER NOT NULL,
score_percentage NUMERIC(5,2) NOT NULL,
passed BOOLEAN NOT NULL,
answers JSONB,
completed_at TIMESTAMPTZ DEFAULT NOW(),
created_at TIMESTAMPTZ DEFAULT NOW(),
CHECK (score_percentage >= 0 AND score_percentage <= 100),
CHECK (correct_answers >= 0 AND correct_answers <= total_questions)
);
CREATE INDEX idx_user_checkpoint_user ON public.user_checkpoint_results(user_id);
CREATE INDEX idx_user_checkpoint_test ON public.user_checkpoint_results(checkpoint_test_id);
CREATE INDEX idx_user_checkpoint_completed ON public.user_checkpoint_results(completed_at DESC);
ALTER TABLE public.user_checkpoint_results ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own checkpoint results"
ON public.user_checkpoint_results FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own checkpoint results"
ON public.user_checkpoint_results FOR INSERT
WITH CHECK (auth.uid() = user_id);
Migration 4: AI System
-- File: migrations/004_create_ai_system.sql
-- 4.1 AI conversations
CREATE TABLE public.ai_conversations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
conversation_type TEXT NOT NULL,
context TEXT,
user_message TEXT NOT NULL,
ai_response TEXT NOT NULL,
model_used TEXT,
prompt_tokens INTEGER,
completion_tokens INTEGER,
response_time_ms INTEGER,
created_at TIMESTAMPTZ DEFAULT NOW(),
CHECK (conversation_type IN (
'vocabulary_question',
'motivation',
'scenario_practice',
'general',
'milestone_celebration',
'missed_day_nudge',
'boundary_redirect',
'pronunciation_feedback',
'quiz_feedback'
))
);
CREATE INDEX idx_ai_conversations_user ON public.ai_conversations(user_id);
CREATE INDEX idx_ai_conversations_created ON public.ai_conversations(created_at DESC);
CREATE INDEX idx_ai_conversations_type ON public.ai_conversations(conversation_type);
ALTER TABLE public.ai_conversations ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own AI conversations"
ON public.ai_conversations FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own AI conversations"
ON public.ai_conversations FOR INSERT
WITH CHECK (auth.uid() = user_id);
-- 4.2 AI feedback
CREATE TABLE public.ai_feedback (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID NOT NULL REFERENCES public.ai_conversations(id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
is_helpful BOOLEAN NOT NULL,
rating SMALLINT,
feedback_text TEXT,
issue_type TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(conversation_id),
CHECK (rating >= 1 AND rating <= 5),
CHECK (issue_type IN ('incorrect_info', 'not_relevant', 'confusing', 'other'))
);
CREATE INDEX idx_ai_feedback_user ON public.ai_feedback(user_id);
CREATE INDEX idx_ai_feedback_helpful ON public.ai_feedback(is_helpful);
CREATE INDEX idx_ai_feedback_created ON public.ai_feedback(created_at DESC);
ALTER TABLE public.ai_feedback ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own AI feedback"
ON public.ai_feedback FOR SELECT
USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own AI feedback"
ON public.ai_feedback FOR INSERT
WITH CHECK (auth.uid() = user_id);
Data Validation
Post-Migration Checks
-- File: scripts/validate_schema.sql
-- Check 1: All tables created
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY table_name;
-- Expected output:
-- ai_conversations
-- ai_feedback
-- checkpoint_tests
-- daily_lessons
-- diagnosis_tests
-- user_checkpoint_results
-- user_daily_progress
-- user_diagnosis_results
-- user_profiles
-- user_word_progress
-- words
-- Check 2: RLS enabled
SELECT tablename, rowsecurity
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY tablename;
-- All user-owned tables should have rowsecurity = true
-- Check 3: Indexes created
SELECT
schemaname,
tablename,
indexname
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;
-- Check 4: Foreign keys
SELECT
tc.table_name,
kcu.column_name,
ccu.table_name AS foreign_table_name,
ccu.column_name AS foreign_column_name
FROM
information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
ON tc.constraint_name = kcu.constraint_name
AND tc.table_schema = kcu.table_schema
JOIN information_schema.constraint_column_usage AS ccu
ON ccu.constraint_name = tc.constraint_name
AND ccu.table_schema = tc.table_schema
WHERE tc.constraint_type = 'FOREIGN KEY'
AND tc.table_schema = 'public'
ORDER BY tc.table_name;
Implementation Notes
Supabase Setup
-
Create Supabase project
- Go to https://supabase.com/dashboard
- Create new project (choose region closest to Vietnam: Singapore)
-
Run migrations
- Use Supabase SQL Editor or CLI
- Run migrations in order (001 → 002 → 003 → 004)
-
Configure RLS
- Verify RLS policies in Supabase Dashboard > Authentication > Policies
-
Setup auth providers
- Enable Google OAuth in Supabase Dashboard > Authentication > Providers
- Add Google OAuth credentials
Environment Variables
# .env.local
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key # Backend only
Next Steps
After schema is implemented:
-
Seed static data:
diagnosis_tests: Create initial diagnosis testcheckpoint_tests: Create 3 checkpoint tests (day 40, 80, 120)words: Migrate 3,000 Oxford words (seedon-mu-vocabulary.md)daily_lessons: Generate 120 lesson plans
-
Test with sample users:
- Create test accounts
- Verify RLS policies work
- Check performance of queries
-
Implement service layer:
- VocabularyService
- SpacedRepetitionService
- DiagnosisService
- AICoachService
End of Database Schema Design
This schema is ready for implementation. All tables include proper constraints, indexes, and RLS policies for security and performance.