---
allowed-tools: Read, Write, Edit, MultiEdit, Bash
description: Set up RAG (Retrieval-Augmented Generation) system
argument-hint: "[basic|advanced|conversational|agentic]"
---
## Set up RAG (Retrieval-Augmented Generation) System
Create a comprehensive RAG implementation with embeddings, vector storage, and retrieval: $ARGUMENTS
### Current Project Analysis
Existing database setup: !`find . -name "*schema*" -o -name "*migration*" -o -name "drizzle.config.*" | head -5`
Vector database configuration: !`grep -r "vector\|embedding" . --include="*.ts" --include="*.sql" | head -5`
AI SDK integration: !`grep -r "embed\|embedMany" . --include="*.ts" | head -5`
### RAG Implementation Types
**Basic RAG**: Simple query → retrieve → generate pipeline
**Advanced RAG**: Multi-query, re-ranking, hybrid search, filtering
**Conversational RAG**: Context-aware retrieval with chat history
**Agentic RAG**: Tool-based retrieval with dynamic knowledge access
### Your Task
1. **Analyze current data infrastructure** and vector storage capabilities
2. **Design embedding and chunking strategy** for optimal retrieval
3. **Set up vector database** with proper indexing and search
4. **Implement embedding pipeline** with batch processing
5. **Create retrieval system** with similarity search and ranking
6. **Build RAG generation pipeline** with context injection
7. **Add evaluation metrics** for retrieval and generation quality
8. **Implement comprehensive testing** for all RAG components
### Implementation Requirements
#### Data Processing Pipeline
- Document ingestion and preprocessing
- Intelligent chunking strategies (sentence, semantic, sliding window)
- Metadata extraction and enrichment
- Batch embedding generation with rate limiting
- Deduplication and quality filtering
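Batch embedding with rate limiting starts with splitting chunks into fixed-size batches before each provider call; `toBatches` below is a hypothetical helper (the embedding call itself, e.g. the AI SDK's `embedMany`, is left out):

```typescript
// Split an array into fixed-size batches so each embedding request
// stays under provider rate/size limits (batch size is illustrative).
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize <= 0) throw new Error('batchSize must be positive');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Five chunks with batch size 2 yield three batches
const batches = toBatches(['a', 'b', 'c', 'd', 'e'], 2);
```

Each batch can then be embedded in sequence (or with a small concurrency limit) and a delay inserted between calls when the provider rate-limits.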
#### Vector Storage and Search
- Database setup (PostgreSQL + pgvector, Pinecone, Supabase, etc.)
- Proper indexing (HNSW, IVFFlat) for performance
- Similarity search with filtering and ranking
- Hybrid search combining vector and text search
- Metadata filtering and faceted search
#### RAG Generation
- Context selection and ranking
- Prompt engineering for RAG scenarios
- Context window management
- Response grounding and source attribution
- Quality control and relevance scoring
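Context injection with source attribution can be sketched as a prompt builder; the function name, interface, and citation format below are illustrative, not a fixed API:

```typescript
interface RetrievedChunk {
  content: string;
  source: string; // e.g. document title or URL
}

// Build a grounded prompt: number each chunk so the model can cite
// sources as [1], [2], ... and instruct it to stay within the context.
function buildRagPrompt(chunks: RetrievedChunk[], query: string): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source})\n${c.content}`)
    .join('\n\n');
  return [
    'Answer using ONLY the context below. Cite sources as [n].',
    'If the context is insufficient, say so.',
    '',
    `Context:\n${context}`,
    '',
    `Question: ${query}`,
  ].join('\n');
}

const prompt = buildRagPrompt(
  [{ content: 'pgvector adds a VECTOR column type.', source: 'docs' }],
  'What does pgvector add?',
);
```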
### Expected Deliverables
1. **Document processing pipeline** with chunking and embedding
2. **Vector database setup** with optimized indexing
3. **Retrieval system** with advanced search capabilities
4. **RAG generation API** with streaming support
5. **Evaluation framework** for quality measurement
6. **Admin interface** for content management
7. **Comprehensive documentation** and examples
### Database Schema Design
#### PostgreSQL with pgvector
```sql
-- Enable vector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Documents table
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title VARCHAR(255),
content TEXT NOT NULL,
metadata JSONB,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Chunks table
CREATE TABLE document_chunks (
id SERIAL PRIMARY KEY,
document_id INTEGER REFERENCES documents(id) ON DELETE CASCADE,
content TEXT NOT NULL,
chunk_index INTEGER,
metadata JSONB,
embedding VECTOR(1536),
created_at TIMESTAMP DEFAULT NOW()
);
-- Indexes for performance
CREATE INDEX ON document_chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON document_chunks (document_id);
CREATE INDEX ON documents USING gin (metadata);
```
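With this schema, a basic similarity search is a single query. The `<=>` operator is pgvector's cosine distance; `$1` stands for the query embedding parameter, and the threshold and limit below are illustrative:

```sql
-- Top-5 chunks by cosine similarity to the query embedding ($1)
SELECT
  dc.id,
  dc.content,
  1 - (dc.embedding <=> $1) AS similarity
FROM document_chunks dc
WHERE 1 - (dc.embedding <=> $1) > 0.7
ORDER BY dc.embedding <=> $1
LIMIT 5;
```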
#### Drizzle ORM Schema
```typescript
export const documents = pgTable('documents', {
id: serial('id').primaryKey(),
title: varchar('title', { length: 255 }),
content: text('content').notNull(),
metadata: jsonb('metadata'),
createdAt: timestamp('created_at').defaultNow(),
updatedAt: timestamp('updated_at').defaultNow(),
});
export const documentChunks = pgTable(
'document_chunks',
{
id: serial('id').primaryKey(),
documentId: integer('document_id').references(() => documents.id, {
onDelete: 'cascade',
}),
content: text('content').notNull(),
chunkIndex: integer('chunk_index'),
metadata: jsonb('metadata'),
embedding: vector('embedding', { dimensions: 1536 }),
createdAt: timestamp('created_at').defaultNow(),
},
(table) => ({
embeddingIndex: index('embedding_idx').using(
'hnsw',
table.embedding.op('vector_cosine_ops'),
),
documentIdIndex: index('document_id_idx').on(table.documentId),
}),
);
```
### Embedding Strategy
#### Chunking Algorithms
- **Sentence-based**: Split on sentence boundaries for coherent chunks
- **Semantic**: Use NLP models to identify semantic boundaries
- **Sliding window**: Overlapping chunks to preserve context
- **Recursive**: Hierarchical chunking for different granularities
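The sliding-window strategy can be sketched in a few lines; this version is character-based for brevity, while production chunkers typically measure windows in tokens or sentences:

```typescript
// Split text into overlapping windows so content near a chunk
// boundary appears in two chunks and context is preserved.
function slidingWindowChunks(
  text: string,
  windowSize: number,
  overlap: number,
): string[] {
  if (overlap >= windowSize) throw new Error('overlap must be < windowSize');
  const step = windowSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + windowSize));
    if (start + windowSize >= text.length) break;
  }
  return chunks;
}

// Windows of 4 characters with a 2-character overlap
const chunks = slidingWindowChunks('abcdefghij', 4, 2);
```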
#### Model Selection
- **OpenAI**: text-embedding-3-small/large for versatility
- **Cohere**: embed-english-v3.0 for specialized domains
- **Local models**: Sentence-transformers for privacy/cost
- **Multilingual**: Support for multiple languages
### Advanced RAG Patterns
#### Multi-Query RAG
```typescript
async function multiQueryRAG(userQuery: string) {
// Generate multiple query variants
const queryVariants = await generateQueryVariants(userQuery);
// Retrieve for each variant
const retrievalResults = await Promise.all(
queryVariants.map(query => retrieveDocuments(query))
);
// Combine and re-rank results
const combinedResults = combineAndRerankResults(retrievalResults);
return combinedResults;
}
```
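The helpers above are application-specific. One common way to implement `combineAndRerankResults` is reciprocal rank fusion (RRF), sketched here over document IDs; `k = 60` is the constant commonly used in the literature:

```typescript
// Reciprocal rank fusion: score each document by the sum of
// 1 / (k + rank) across all result lists, then sort by score.
function reciprocalRankFusion(
  resultLists: string[][], // each list: document IDs, best first
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const list of resultLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// 'b' ranks highly in both lists, so it fuses to the top
const fused = reciprocalRankFusion([
  ['a', 'b', 'c'],
  ['b', 'd'],
]);
```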
#### Conversational RAG
```typescript
async function conversationalRAG(messages: Message[], query: string) {
// Extract conversation context
const conversationContext = extractContext(messages);
// Generate context-aware query
const contextualQuery = await generateContextualQuery(query, conversationContext);
// Retrieve with conversation awareness
const documents = await retrieveWithContext(contextualQuery, conversationContext);
return documents;
}
```
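`extractContext` above is also application-defined; a minimal version keeps only the most recent turns as a plain-text transcript, so the rewritten query reflects the current topic rather than the whole conversation (the `Message` shape here is an assumption):

```typescript
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

// Keep the last few turns as "role: content" lines for use in
// a query-rewriting prompt.
function extractContext(messages: Message[], maxTurns = 4): string {
  return messages
    .slice(-maxTurns)
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n');
}

const ctx = extractContext([
  { role: 'user', content: 'What is pgvector?' },
  { role: 'assistant', content: 'A Postgres extension for vectors.' },
  { role: 'user', content: 'How do I index it?' },
]);
```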
### Quality Evaluation
#### Retrieval Metrics
- **Precision@K**: Relevant documents in top-K results
- **Recall@K**: Coverage of relevant documents
- **MRR**: Mean Reciprocal Rank of first relevant document
- **NDCG**: Normalized Discounted Cumulative Gain
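Given relevance labels, Precision@K and MRR are straightforward to compute; the signatures below are illustrative:

```typescript
// Fraction of the top-K retrieved IDs that are labeled relevant.
function precisionAtK(
  retrieved: string[],
  relevant: Set<string>,
  k: number,
): number {
  const topK = retrieved.slice(0, k);
  if (topK.length === 0) return 0;
  const hits = topK.filter((id) => relevant.has(id)).length;
  return hits / topK.length;
}

// 1 / rank of the first relevant result (0 if none retrieved).
function reciprocalRank(retrieved: string[], relevant: Set<string>): number {
  const idx = retrieved.findIndex((id) => relevant.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

const relevant = new Set(['d2', 'd5']);
const p = precisionAtK(['d1', 'd2', 'd3'], relevant, 3); // 1 relevant in top-3
const rr = reciprocalRank(['d1', 'd2', 'd3'], relevant); // first hit at rank 2
```

MRR for an evaluation set is then just the mean of `reciprocalRank` over all queries.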
#### Generation Metrics
- **Faithfulness**: Response grounded in retrieved context
- **Relevance**: Response relevance to user query
- **Completeness**: Coverage of important information
- **Coherence**: Logical flow and readability
### Testing and Validation
#### Unit Testing
- Embedding generation accuracy
- Chunking algorithm correctness
- Similarity search precision
- Database operations integrity
#### Integration Testing
- End-to-end RAG pipeline
- Performance under load
- Quality with various document types
- Scalability testing
#### Evaluation Testing
- Golden dataset evaluation
- A/B testing with different strategies
- User feedback collection
- Continuous quality monitoring
### Performance Optimization
#### Database Optimization
- Proper indexing strategies (HNSW vs IVFFlat)
- Connection pooling and caching
- Query optimization and profiling
- Horizontal scaling considerations
#### Embedding Optimization
- Batch processing for efficiency
- Caching frequently used embeddings
- Model quantization for speed
- Parallel processing pipelines
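Caching embeddings by a content hash avoids re-embedding unchanged chunks on re-ingestion. The class below is a minimal in-memory sketch (a production system would persist the cache and bound its size); the embedding call is injected as `embedFn` so any provider can be plugged in:

```typescript
import { createHash } from 'node:crypto';

// Memoize embeddings keyed by a SHA-256 of the chunk text, so
// re-ingesting unchanged documents skips the embedding API call.
class EmbeddingCache {
  private cache = new Map<string, number[]>();
  hits = 0;

  async embed(
    text: string,
    embedFn: (t: string) => Promise<number[]>,
  ): Promise<number[]> {
    const key = createHash('sha256').update(text).digest('hex');
    const cached = this.cache.get(key);
    if (cached) {
      this.hits += 1;
      return cached;
    }
    const embedding = await embedFn(text);
    this.cache.set(key, embedding);
    return embedding;
  }
}
```

In the pipeline, `embedFn` would wrap the real embedding API call; repeated chunks then cost one hash instead of one request.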
Focus on building a production-ready RAG system that provides accurate, relevant, and fast retrieval-augmented generation with proper evaluation and optimization strategies.