# Architecture Patterns

**Domain:** AI-powered Social Media Management SaaS

**Researched:** 2026-01-31

**Confidence:** HIGH

## Executive Summary

AI-powered social media management platforms in 2026 are built on **headless, microservices-based architectures** that decouple frontend experiences from backend logic. The dominant pattern is a **chat-first interface** with real-time bidirectional communication (WebSockets/SSE), orchestrating multiple specialized backend services: AI provider abstraction layer, social media API gateway, background job queue, persistent user context store, and multi-tenant data isolation.

For Leopost specifically, the recommended architecture follows a **modular monolith transitioning to microservices** approach, prioritizing rapid iteration in early phases while maintaining clear component boundaries for future scaling.

---

## Recommended Architecture

### High-Level System Diagram

```
┌──────────────────────────────────────────────────────────────────────────┐
│                               CLIENT LAYER                               │
├──────────────────────────────────────────────────────────────────────────┤
│ Web App (Next.js)    │ Telegram Bot  │ WhatsApp Bot (future)             │
│ - Chat Interface     │ - Webhook     │ - Twilio/Cloud API                │
│ - Real-time Updates  │ - Commands    │ - Message Forwarding              │
└───────────┬──────────┴───────┬───────┴───────────────────────────────────┘
            │                  │
            │ WebSocket/SSE    │ HTTPS Webhook
            │                  │
┌───────────▼──────────────────▼───────────────────────────────────────────┐
│                               API GATEWAY                                │
│ - Authentication (JWT)                                                   │
│ - Rate Limiting (per tenant)                                             │
│ - Request Routing                                                        │
│ - WebSocket Connection Manager                                           │
└───────────┬──────────────────────────────────────────────────────────────┘
            │
            ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                             BACKEND SERVICES                             │
├───────────────────┬─────────────────┬───────────────┬────────────────────┤
│ Chat Service      │ AI Orchestrator │ Social API    │ Job Queue          │
│                   │                 │ Gateway       │ Service            │
│ - Message         │ - Provider      │               │                    │
│   handling        │   routing       │ - Meta        │ - Scheduling       │
│ - Context         │ - Streaming     │ - LinkedIn    │ - Publishing       │
│   injection       │ - Retry logic   │ - X/Twitter   │ - Retries          │
│ - SSE emit        │                 │ - Rate mgmt   │ - Analytics sync   │
│                   │ ┌──────────┐    │               │                    │
│                   │ │ OpenAI   │    │               │                    │
│                   │ │ Anthropic│    │               │                    │
│                   │ │ Google   │    │               │                    │
│                   │ └──────────┘    │               │                    │
└───────────────────┴─────────────────┴───────────────┴────────────────────┘
         │                   │                 │
         ▼                   ▼                 ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                                DATA LAYER                                │
├───────────────────┬─────────────────┬───────────────┬────────────────────┤
│ PostgreSQL        │ Redis           │ S3/Storage    │ Vector DB          │
│                   │                 │               │ (future)           │
│ - Users/Tenants   │ - Sessions      │ - Generated   │                    │
│ - Posts           │ - Job Queue     │   images      │ - User context     │
│ - Social Auth     │ - Cache         │ - Uploads     │   embeddings       │
│ - Analytics       │ - Rate limits   │               │ - Semantic search  │
│                   │                 │               │                    │
│ Multi-tenant:     │                 │               │                    │
│ tenant_id on      │                 │               │                    │
│ all tables        │                 │               │                    │
└───────────────────┴─────────────────┴───────────────┴────────────────────┘
```

---

## Component Boundaries

### 1. **API Gateway / Authentication Layer**

**Responsibility:**
- Single entry point for all client requests (web, Telegram, WhatsApp)
- JWT-based authentication and tenant identification
- Rate limiting per tenant/user (see the sketch at the end of this section)
- WebSocket connection lifecycle management
- Request routing to appropriate backend service

**Communicates With:**
- **Inbound:** Web app (Next.js), Telegram Bot webhook, WhatsApp Bot webhook
- **Outbound:** Chat Service, AI Orchestrator, Social API Gateway, Job Queue Service
- **Data:** Redis (session cache, rate limiting counters)

**Technology Recommendation:**
- Next.js API routes for initial implementation (monolith)
- Future: Nginx/Kong API Gateway or AWS API Gateway (microservices transition)

**Build Order Implication:**
- Build FIRST (Phase 1) - MVP needs basic auth + routing
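
The per-tenant rate limiting responsibility above can start as a fixed-window counter in Redis keyed by tenant. A minimal sketch, assuming an `ioredis` client and a 60-second window; a production gateway would more likely use a sliding-window or token-bucket algorithm, as noted in the Scalability section.

```typescript
import Redis from 'ioredis'

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379')

// Fixed-window limiter: allow `limit` requests per tenant per 60-second window.
export async function isRateLimited(tenantId: string, limit = 120): Promise<boolean> {
  const windowKey = `ratelimit:${tenantId}:${Math.floor(Date.now() / 60_000)}`
  const count = await redis.incr(windowKey)            // count this request
  if (count === 1) await redis.expire(windowKey, 60)   // window cleans itself up
  return count > limit
}

// Gateway usage (tenant id comes from the JWT, see Pattern 5):
//   if (await isRateLimited(req.tenantId)) return res.status(429).json({ error: 'rate_limited' })
```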

---

### 2. **Chat Service**

**Responsibility:**
- Handle incoming user messages from web/Telegram/WhatsApp
- Retrieve user context (brand info, preferences, conversation history)
- Inject context into AI prompt
- Orchestrate AI response streaming (see the handler sketch at the end of this section)
- Emit real-time updates via WebSocket/SSE
- Store conversation history

**Communicates With:**
- **Inbound:** API Gateway (user messages)
- **Outbound:** AI Orchestrator (prompt + context), PostgreSQL (context retrieval/storage), Redis (session state)
- **Streams:** Real-time responses to connected clients via SSE/WebSocket

**Technology Recommendation:**
- Node.js/Express or Next.js API routes
- Socket.io for WebSocket management OR SSE for simpler one-way streaming
- LangChain/LangGraph for conversation chain management

**Build Order Implication:**
- Build SECOND (Phase 1) - Core product experience depends on this
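
To make the flow above concrete, here is a minimal Express-style handler sketch. `ContextStore`, `AIOrchestrator`, and `saveTurn` are stand-ins for the components described in this document, not real APIs, and the prompt format is illustrative only.

```typescript
import type { Request, Response } from 'express'

// Dependency shapes assumed for this sketch.
interface ContextStore {
  getUserContext(tenantId: string): Promise<{ brandInfo: { name: string; voice: string } }>
}
interface AIOrchestrator {
  streamText(prompt: string): AsyncIterable<string>
}

export function makeChatHandler(
  contextStore: ContextStore,
  ai: AIOrchestrator,
  saveTurn: (tenantId: string, question: string, answer: string) => Promise<void>
) {
  return async (req: Request, res: Response) => {
    const tenantId = (req as any).tenantId        // set by the tenant middleware (Pattern 5)
    const userMessage: string = req.body.message

    // 1. Retrieve tenant context and inject it into the prompt (Pattern 2)
    const ctx = await contextStore.getUserContext(tenantId)
    const prompt = `You write for ${ctx.brandInfo.name} in a ${ctx.brandInfo.voice} voice.\nUser: ${userMessage}`

    // 2. Stream the AI response back to the client as SSE
    res.setHeader('Content-Type', 'text/event-stream')
    let full = ''
    for await (const token of ai.streamText(prompt)) {
      full += token
      res.write(`data: ${JSON.stringify({ token })}\n\n`)
    }
    res.end()

    // 3. Persist the conversation turn so future prompts have history
    await saveTurn(tenantId, userMessage, full)
  }
}
```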

---

### 3. **AI Orchestrator**

**Responsibility:**
- Abstract multiple AI providers (OpenAI, Anthropic, Google) behind unified interface
- Provider routing based on task type (text generation, image analysis, etc.)
- Streaming response handling (SSE from AI → SSE to client)
- Retry logic and fallback to alternative providers
- Cost tracking per provider/tenant
- Token counting and budget enforcement (see the budget sketch at the end of this section)

**Communicates With:**
- **Inbound:** Chat Service, Job Queue Service (for scheduled AI tasks)
- **Outbound:** OpenAI API, Anthropic API, Google Gemini API
- **Data:** Redis (cache frequent prompts), PostgreSQL (usage logs)

**Technology Recommendation:**
- LiteLLM (multi-provider abstraction library) or custom adapter pattern
- OpenAI SDK, Anthropic SDK, Google AI SDK
- Implement **Multi-Provider Gateway pattern** (single unified API)

**Architecture Pattern:**

```typescript
// Unified interface
interface AIProvider {
  generateText(prompt: string, options: GenerationOptions): Promise<Stream>
  generateImage(prompt: string): Promise<ImageURL>
}

// Provider implementations
class OpenAIProvider implements AIProvider { ... }
class AnthropicProvider implements AIProvider { ... }
class GoogleProvider implements AIProvider { ... }

// Router selects provider based on task/cost/availability
class AIRouter {
  selectProvider(task: AITask): AIProvider
}
```

**Build Order Implication:**
- Build SECOND (Phase 1) - Can start with single provider (OpenAI), add multi-provider in Phase 2
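
Cost tracking and budget enforcement are much easier to wire in from the start than to retrofit. A rough sketch of the idea: record spend per tenant and refuse new requests past a monthly cap. The `UsageStore` shape and the per-1K-token prices are illustrative assumptions, not current provider pricing.

```typescript
// Illustrative prices only; read the real numbers from each provider's price sheet.
const PRICE_PER_1K_TOKENS: Record<string, number> = {
  'gpt-4o-mini': 0.0006,
  'claude-sonnet': 0.003,
}

interface UsageStore {
  add(tenantId: string, costUsd: number): Promise<void>
  monthToDate(tenantId: string): Promise<number>
}

// Call before dispatching a request to any provider.
export async function enforceBudget(usage: UsageStore, tenantId: string, monthlyBudgetUsd: number) {
  const spent = await usage.monthToDate(tenantId)
  if (spent >= monthlyBudgetUsd) throw new Error('AI budget exceeded for this tenant')
}

// Call after each completion, using the token count reported by the provider.
export async function recordUsage(usage: UsageStore, tenantId: string, model: string, totalTokens: number) {
  const costUsd = (totalTokens / 1000) * (PRICE_PER_1K_TOKENS[model] ?? 0)
  await usage.add(tenantId, costUsd)
}
```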

---

### 4. **Social API Gateway**

**Responsibility:**
- Centralize authentication with social platforms (OAuth 2.0 flows)
- Abstract platform-specific APIs (Meta Graph API, LinkedIn API, X API) behind unified interface
- Normalize data formats across platforms (posts, analytics, media)
- Rate limiting per platform (respects API quotas)
- Retry logic with exponential backoff
- Credential storage and refresh token management (see the token refresh sketch at the end of this section)

**Communicates With:**
- **Inbound:** Chat Service (publish request), Job Queue Service (scheduled posts, analytics sync)
- **Outbound:** Facebook/Instagram Graph API, LinkedIn API, X/Twitter API
- **Data:** PostgreSQL (social account credentials, encrypted tokens), Redis (rate limit tracking)

**Technology Recommendation:**
- Consider unified API platforms: **Outstand**, **Sociality.io**, **Ayrshare** (reduces integration complexity)
- Alternative: Custom adapter pattern with individual SDKs
- OAuth library: Passport.js or NextAuth.js

**Architecture Pattern:**

```typescript
// Unified social media interface
interface SocialPlatform {
  publish(post: UnifiedPost): Promise<PublishResult>
  getAnalytics(postId: string): Promise<Analytics>
  schedulePost(post: UnifiedPost, date: Date): Promise<ScheduleResult>
}

// Platform-specific implementations
class MetaAdapter implements SocialPlatform { ... }
class LinkedInAdapter implements SocialPlatform { ... }
class XAdapter implements SocialPlatform { ... }

// Unified post format
interface UnifiedPost {
  text: string
  media?: MediaFile[]
  platforms: Platform[]
  scheduledTime?: Date
}
```

**Build Order Implication:**
- Build THIRD (Phase 2) - Not needed for MVP chat experience, add once publishing is prioritized
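
Refresh token management in practice means checking expiry before every publish and exchanging the stored refresh token when the access token is stale. A generic sketch of that flow using the standard OAuth 2.0 `refresh_token` grant; the token URL, client credentials, and field names are placeholders, and each platform's real OAuth parameters differ.

```typescript
interface StoredCredential {
  accessToken: string
  refreshToken: string
  expiresAt: number // epoch milliseconds
}

// Standard OAuth 2.0 refresh_token grant against a platform's token endpoint.
async function refreshAccessToken(
  tokenUrl: string, clientId: string, clientSecret: string, cred: StoredCredential
): Promise<StoredCredential> {
  const res = await fetch(tokenUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      grant_type: 'refresh_token',
      refresh_token: cred.refreshToken,
      client_id: clientId,
      client_secret: clientSecret,
    }),
  })
  if (!res.ok) throw new Error(`Token refresh failed: ${res.status}`)
  const data = await res.json()
  return {
    accessToken: data.access_token,
    refreshToken: data.refresh_token ?? cred.refreshToken, // some platforms rotate, some do not
    expiresAt: Date.now() + data.expires_in * 1000,
  }
}

// Adapters call this before publishing; refresh with a small safety margin.
export async function ensureFreshToken(
  tokenUrl: string, clientId: string, clientSecret: string, cred: StoredCredential
): Promise<StoredCredential> {
  const margin = 5 * 60 * 1000
  return cred.expiresAt - Date.now() > margin
    ? cred
    : refreshAccessToken(tokenUrl, clientId, clientSecret, cred)
}
```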

---

### 5. **Job Queue Service**

**Responsibility:**
- Schedule posts for future publishing (cron-based or specific time)
- Background analytics sync from social platforms
- Retry failed publish attempts (exponential backoff)
- Image generation queue (async processing)
- Bulk operations (multi-platform publishing)
- Email notifications (scheduled, event-triggered)

**Communicates With:**
- **Inbound:** Chat Service (enqueue publish), Social API Gateway (enqueue analytics sync)
- **Outbound:** Social API Gateway (execute publish), AI Orchestrator (image generation), Email Service
- **Data:** Redis (job queue storage), PostgreSQL (job history, status)

**Technology Recommendation:**
- **BullMQ** (most popular, Redis-backed, excellent for Node.js)
- Alternative: **Trigger.dev** (managed service, no infra), **Inngest** (event-driven, no queue setup)
- Avoid: Temporal (overkill for this use case), Bee-Queue (less feature-rich)

**Architecture Pattern:**

```typescript
// Job types
enum JobType {
  PUBLISH_POST = 'publish_post',
  SYNC_ANALYTICS = 'sync_analytics',
  GENERATE_IMAGE = 'generate_image',
  SEND_NOTIFICATION = 'send_notification'
}

// Enqueue a scheduled publish job
const scheduledTime = '2026-02-01T10:00:00Z'

queue.add(JobType.PUBLISH_POST, {
  tenantId: '...',
  postId: '...',
  platforms: ['facebook', 'linkedin'],
  scheduledTime
}, {
  delay: calculateDelay(scheduledTime), // ms until the scheduled time
  attempts: 3,
  backoff: { type: 'exponential', delay: 2000 }
})
```

**Build Order Implication:**
- Build FOURTH (Phase 2-3) - Essential for scheduling feature, but not for MVP

---

### 6. **User Context Store**

**Responsibility:**
- Store user/tenant-specific information (brand voice, target audience, preferences)
- Persist conversation history for AI context
- Learn from user feedback (thumbs up/down on AI responses)
- Retrieve relevant context for AI prompt injection
- Future: Vector embeddings for semantic search over past posts

**Communicates With:**
- **Inbound:** Chat Service (store/retrieve context), AI Orchestrator (context injection)
- **Outbound:** PostgreSQL (structured context), Vector DB (embeddings - future phase)
- **Data:** PostgreSQL (brand info, user preferences), Redis (session cache)

**Technology Recommendation:**
- PostgreSQL (JSON columns) for structured context (Phase 1-2)
- Future: Pinecone, Qdrant, or Supabase Vector (pgvector) for semantic search (Phase 3+)
- LangChain Memory classes for conversation chain management

**Architecture Pattern:**

```typescript
// Context storage
interface UserContext {
  tenantId: string
  brandInfo: {
    name: string
    voice: string // "Professional", "Casual", "Humorous"
    targetAudience: string
    industry: string
  }
  preferences: {
    defaultPlatforms: Platform[]
    postingSchedule: Schedule
    aiProvider: 'openai' | 'anthropic' | 'google'
  }
  conversationHistory: Message[] // Last N messages
}

// Context retrieval
async function getContextForPrompt(tenantId: string): Promise<string> {
  const context = await db.getUserContext(tenantId)
  return `
    Brand: ${context.brandInfo.name}
    Voice: ${context.brandInfo.voice}
    Target Audience: ${context.brandInfo.targetAudience}
    Recent conversations: ${formatHistory(context.conversationHistory)}
  `
}
```

**Build Order Implication:**
- Build SECOND (Phase 1) - Basic brand info storage needed for MVP
- Extend in Phase 3 with vector search for advanced AI memory

---

### 7. **Image Generation Pipeline**

**Responsibility:**
- Generate images via AI (DALL-E, Midjourney API, Stable Diffusion)
- Process/resize/optimize images for social platform requirements
- Store generated images in cloud storage
- Track generation costs per tenant
- Handle async generation (enqueue job, notify when ready)

**Communicates With:**
- **Inbound:** Chat Service (user request), Job Queue Service (async generation)
- **Outbound:** AI Orchestrator (image model API), S3/Cloud Storage (upload), Chat Service (completion notification)
- **Data:** S3 (image storage), PostgreSQL (image metadata)

**Technology Recommendation:**
- OpenAI DALL-E 3, Stability AI, Midjourney API (via third-party)
- Image processing: Sharp (Node.js), Pillow (Python)
- Storage: AWS S3, Cloudflare R2, or Supabase Storage

**Architecture Pattern:**

```typescript
// Async image generation workflow

// 1. Enqueue the job (called from the Chat Service)
async function requestImage(prompt: string, tenantId: string) {
  const job = await queue.add('generate_image', {
    prompt,
    tenantId,
    provider: 'openai-dalle3'
  })
  return job.id
}

// 2. Job processor (runs in the background worker, registered once at startup)
queue.process('generate_image', async (job) => {
  const imageUrl = await aiOrchestrator.generateImage(job.data.prompt)
  const optimizedUrl = await processAndUpload(imageUrl, job.data.tenantId)
  await notifyUser(job.data.tenantId, optimizedUrl)
})
```

**Build Order Implication:**
- Build FIFTH (Phase 3) - Nice-to-have enhancement, not core MVP

---

### 8. **Multi-Tenant Data Isolation**

**Responsibility:**
- Ensure tenant A cannot access tenant B's data
- Apply tenant_id filter to all database queries
- Enforce row-level security (RLS) at database level
- Isolate file storage per tenant (S3 paths)

**Communicates With:**
- **All services** that access PostgreSQL or S3 must enforce tenant isolation

**Technology Recommendation:**
- **Shared Database, Shared Schema (Pool Model)** - Most cost-effective for micro-SaaS
- PostgreSQL Row-Level Security (RLS) for defense-in-depth
- Supabase RLS policies (if using Supabase Cloud)
- Application-level enforcement: Always filter by `tenant_id` in WHERE clauses

**Architecture Pattern:**

```sql
-- PostgreSQL RLS example
ALTER TABLE posts ENABLE ROW LEVEL SECURITY;  -- policies have no effect until RLS is enabled

CREATE POLICY tenant_isolation ON posts
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Application sets tenant context per request (inside the request's transaction)
SET LOCAL app.current_tenant = 'tenant-uuid';
SELECT * FROM posts; -- Automatically filtered
```

**Best Practice:**
- Middleware extracts `tenant_id` from JWT at API Gateway
- All downstream services receive `tenant_id` in request context
- Database queries ALWAYS include `WHERE tenant_id = ?`

**Build Order Implication:**
- Build FIRST (Phase 1) - Critical security foundation, implement from day 1

---

## Data Flow

### 1. **User Sends Chat Message**

```
User (Web/Telegram)
  → API Gateway (auth, tenant_id extraction)
  → Chat Service (retrieve context)
  → PostgreSQL (load brand info, conversation history)
  → AI Orchestrator (inject context, call AI provider)
  → OpenAI/Anthropic API (stream response)
  → Chat Service (emit SSE to client)
  → PostgreSQL (store conversation turn)
```

**Key Decisions:**
- Use **SSE** (Server-Sent Events) for one-way AI streaming (simpler than WebSocket); a client-side consumption sketch follows below
- Use **WebSocket** if bidirectional communication needed (e.g., typing indicators)
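
On the browser side, consuming that SSE stream does not require a library; reading the response body via `fetch` works with a POST request (the built-in `EventSource` API only supports GET). A minimal sketch: the `/api/chat/stream` route name is an assumption, and a production client should buffer partial chunks rather than assume each read ends on an event boundary.

```typescript
// Browser-side consumption of the Chat Service's SSE stream.
export async function streamChat(message: string, onToken: (token: string) => void) {
  const res = await fetch('/api/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  })

  const reader = res.body!.getReader()
  const decoder = new TextDecoder()

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // Each event arrives as "data: {...}\n\n"; simplistic line split for the sketch.
    for (const line of decoder.decode(value).split('\n')) {
      if (line.startsWith('data: ')) onToken(JSON.parse(line.slice(6)).token)
    }
  }
}
```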

---

### 2. **User Publishes Post to Social Media**

```
User (chat: "Publish this to LinkedIn and Facebook")
  → Chat Service (parse intent)
  → AI Orchestrator (generate post content if needed)
  → Chat Service (return preview to user)
  → User confirms
  → Social API Gateway (authenticate, publish)
  → LinkedIn API + Facebook Graph API (post content)
  → Social API Gateway (return post IDs)
  → PostgreSQL (store post record)
  → Chat Service (notify user: "Published!")
```

**Alternative: Scheduled Publish**

```
User: "Schedule this for tomorrow 10am"
  → Chat Service (parse schedule time)
  → Job Queue Service (enqueue publish job with delay)
  → Redis (store job)
  → [Wait until scheduled time]
  → BullMQ Worker (process job)
  → Social API Gateway (publish)
  → Email Service (notify user of success/failure)
```

---

### 3. **Telegram Bot Message**

```
Telegram Server (webhook POST to /api/telegram/webhook)
  → API Gateway (validate webhook signature)
  → Chat Service (same logic as web chat)
  → AI Orchestrator (generate response)
  → Chat Service (format for Telegram)
  → Telegram Bot API (send message)
```

**Key Decision:**
- Reuse Chat Service for all channels (web, Telegram, WhatsApp); see the webhook sketch below
- Channel-specific adapters only handle message formatting (Markdown vs HTML)
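
A minimal sketch of the webhook entry point for this flow. The secret-token check relies on the header Telegram sends when the webhook was registered with a `secret_token`; `chatService.handleMessage` is a placeholder for the shared Chat Service, and the environment variable names are assumptions.

```typescript
import type { Request, Response } from 'express'

// Placeholder for the shared Chat Service described above.
declare const chatService: {
  handleMessage(input: { channel: string; externalUserId: string; text: string }): Promise<string>
}

const TELEGRAM_API = `https://api.telegram.org/bot${process.env.TELEGRAM_BOT_TOKEN}`

export async function telegramWebhook(req: Request, res: Response) {
  // Telegram echoes the secret_token passed to setWebhook back in this header.
  if (req.header('X-Telegram-Bot-Api-Secret-Token') !== process.env.TELEGRAM_WEBHOOK_SECRET) {
    return res.sendStatus(403)
  }

  const message = req.body?.message
  if (message?.text) {
    // Same Chat Service as the web channel; only the formatting differs.
    const reply = await chatService.handleMessage({
      channel: 'telegram',
      externalUserId: String(message.from.id),
      text: message.text,
    })

    await fetch(`${TELEGRAM_API}/sendMessage`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ chat_id: message.chat.id, text: reply, parse_mode: 'Markdown' }),
    })
  }

  // Acknowledge quickly so Telegram does not keep retrying the update.
  res.sendStatus(200)
}
```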

---

### 4. **AI Provider Failover**

```
Chat Service → AI Orchestrator (request: OpenAI GPT-4)
  → OpenAI API (500 error or rate limit)
  → AI Orchestrator (detect failure, retry logic)
  → [Attempt 1 failed]
  → AI Orchestrator (fallback to Anthropic Claude)
  → Anthropic API (success)
  → Return response
```

**Architecture Pattern:**
- Primary provider (cheapest/fastest): OpenAI GPT-4o-mini
- Fallback provider: Anthropic Claude Sonnet
- Last resort: Google Gemini

---

## Patterns to Follow

### Pattern 1: **Multi-Provider Gateway (AI Abstraction)**

**What:** Single unified interface abstracting multiple AI providers (OpenAI, Anthropic, Google).

**When:** When building AI features that need cost optimization, redundancy, or best-model-for-task routing.

**Example:**

```typescript
// libs/ai/provider-gateway.ts
export class AIProviderGateway {
  private providers: Map<ProviderType, AIProvider>

  async generateText(
    prompt: string,
    options: {
      preferredProvider?: ProviderType,
      fallback?: boolean
    }
  ): Promise<string> {
    const provider = this.selectProvider(options.preferredProvider)

    try {
      return await provider.generate(prompt)
    } catch (error) {
      if (options.fallback) {
        const fallbackProvider = this.getNextProvider()
        return await fallbackProvider.generate(prompt)
      }
      throw error
    }
  }

  private selectProvider(preferred?: ProviderType): AIProvider {
    // Cost-based routing: cheap tasks → OpenAI, reasoning → Anthropic
    // Or user preference from tenant settings
  }
}
```

**Benefits:**
- Cost optimization (route simple tasks to cheaper models)
- High availability (auto-failover)
- Easy migration (swap providers without changing application code)

---

### Pattern 2: **Context Injection (User Memory)**

**What:** Retrieve user-specific context (brand info, past conversations) and inject into AI prompts.

**When:** Building personalized AI experiences that "remember" user preferences.

**Example:**

```typescript
// services/chat/context-injector.ts
export class ContextInjector {
  async buildPrompt(userMessage: string, tenantId: string): Promise<string> {
    const context = await this.getUserContext(tenantId)

    const systemPrompt = `
      You are a social media assistant for ${context.brandInfo.name}.

      Brand Voice: ${context.brandInfo.voice}
      Target Audience: ${context.brandInfo.targetAudience}
      Industry: ${context.brandInfo.industry}

      Recent conversation:
      ${this.formatConversationHistory(context.conversationHistory)}

      User's new message: ${userMessage}

      Generate a helpful response that maintains brand voice and leverages past context.
    `
    return systemPrompt
  }
}
```

**Benefits:**
- Personalized AI responses
- Consistency across conversations
- Foundation for long-term AI memory

---

### Pattern 3: **Unified Social Media Adapter**

**What:** Abstract platform-specific APIs (Facebook, LinkedIn, X) behind a common interface.

**When:** Integrating multiple social platforms without scattering platform logic across the codebase.

**Example:**

```typescript
// libs/social/unified-adapter.ts
export interface SocialPost {
  text: string
  media?: MediaFile[]
  platforms: Platform[]
}

export interface SocialAdapter {
  publish(post: SocialPost): Promise<PublishResult>
  getAnalytics(postId: string): Promise<Analytics>
}

export class MetaAdapter implements SocialAdapter {
  async publish(post: SocialPost): Promise<PublishResult> {
    // Facebook Graph API specific logic; `token` comes from the tenant's stored credentials
    const response = await fetch('https://graph.facebook.com/v18.0/me/feed', {
      method: 'POST',
      headers: { Authorization: `Bearer ${token}` },
      body: JSON.stringify({ message: post.text })
    })
    return this.normalizeResponse(response)
  }
  // getAnalytics omitted for brevity
}

export class SocialMediaGateway {
  private adapters: Map<Platform, SocialAdapter>

  async publishToAll(post: SocialPost): Promise<PromiseSettledResult<PublishResult>[]> {
    const results = await Promise.allSettled(
      post.platforms.map(platform =>
        this.adapters.get(platform)!.publish(post) // assumes an adapter is registered per platform
      )
    )
    return results
  }
}
```

**Benefits:**
- Add new platforms without changing core logic
- Centralized error handling and retry logic
- Easier testing (mock adapters)

---

### Pattern 4: **Background Job Queue (Scheduling)**

**What:** Decouple long-running tasks (scheduled posts, image generation) from synchronous request handling.

**When:** Tasks that take >2 seconds, need retries, or are scheduled for future execution.

**Example:**

```typescript
// services/queue/post-scheduler.ts
import { Queue, Worker } from 'bullmq'

const postQueue = new Queue('social-posts', { connection: redis })

// Enqueue job
export async function schedulePost(
  post: SocialPost,
  scheduledTime: Date,
  tenantId: string
) {
  await postQueue.add('publish', {
    post,
    tenantId
  }, {
    delay: scheduledTime.getTime() - Date.now(),
    attempts: 3,
    backoff: { type: 'exponential', delay: 2000 }
  })
}

// Worker processes jobs
const worker = new Worker('social-posts', async (job) => {
  const { post, tenantId } = job.data

  try {
    const result = await socialGateway.publishToAll(post)
    await db.posts.update({ id: post.id, status: 'published' })
    await notifyUser(tenantId, 'Post published successfully!')
  } catch (error) {
    await notifyUser(tenantId, `Post failed: ${error.message}`)
    throw error // Triggers retry
  }
}, { connection: redis })
```

**Benefits:**
- Reliable delivery (survives server restarts)
- Automatic retries with exponential backoff
- Horizontal scaling (add more worker processes)

---

### Pattern 5: **Tenant Context Middleware**

**What:** Extract tenant_id from JWT at API Gateway, pass to all services, enforce in all database queries.

**When:** Building multi-tenant SaaS with shared database (pool model).

**Example:**

```typescript
// middleware/tenant-context.ts
export async function tenantContextMiddleware(req, res, next) {
  // Extract tenant_id from JWT
  const token = req.headers.authorization?.split(' ')[1]
  const decoded = jwt.verify(token, SECRET)

  // Attach to request
  req.tenantId = decoded.tenantId

  // Set PostgreSQL session variable (for RLS); parameterized, and scoped to the
  // current transaction, so it must run on the same connection/transaction as
  // the queries that follow
  await db.query("SELECT set_config('app.current_tenant', $1, true)", [req.tenantId])

  next()
}

// All queries automatically filtered by RLS
app.get('/api/posts', async (req, res) => {
  // No need to manually filter by tenant_id - RLS does it
  const posts = await db.query('SELECT * FROM posts')
  res.json(posts)
})
```

**Benefits:**
- Drastically reduced risk of cross-tenant data leakage
- Defense-in-depth (app + database enforce isolation)
- Simpler query code (no WHERE tenant_id everywhere)

---

## Anti-Patterns to Avoid

### Anti-Pattern 1: **Tight Coupling to AI Provider**

**What goes wrong:** Hardcoding OpenAI SDK calls throughout the codebase.

**Why bad:**
- Vendor lock-in (can't switch providers without a massive refactor)
- No fallback when the provider is down
- Difficult to A/B test different models

**Instead:** Use the AI Provider Gateway pattern (see above).

**Warning Signs:**

```typescript
// BAD - OpenAI SDK scattered everywhere
import OpenAI from 'openai'

async function handleChat(message: string) {
  const openai = new OpenAI({ apiKey: process.env.OPENAI_KEY })
  const response = await openai.chat.completions.create({ ... })
  return response.choices[0].message.content
}
```

**Fix:**

```typescript
// GOOD - Abstract provider behind interface
import { aiGateway } from '@/libs/ai/provider-gateway'

async function handleChat(message: string) {
  return await aiGateway.generateText(message, {
    preferredProvider: 'openai',
    fallback: true
  })
}
```

---

### Anti-Pattern 2: **Missing Tenant Isolation**

**What goes wrong:** Forgetting to filter queries by `tenant_id`.

**Why bad:**
- Data leakage between customers (catastrophic security breach)
- Compliance violations (GDPR, SOC2)
- Potential lawsuits and business destruction

**Instead:**
- Use the Tenant Context Middleware pattern
- Enable PostgreSQL Row-Level Security
- Write integration tests that verify isolation

**Warning Signs:**

```typescript
// BAD - No tenant filtering
app.get('/api/posts', async (req, res) => {
  const posts = await db.query('SELECT * FROM posts')
  res.json(posts) // Returns ALL tenants' posts!
})
```

**Fix:**

```typescript
// GOOD - Explicit filtering + RLS
app.get('/api/posts', tenantContextMiddleware, async (req, res) => {
  const posts = await db.query(
    'SELECT * FROM posts WHERE tenant_id = $1',
    [req.tenantId]
  )
  res.json(posts)
})
```

---

### Anti-Pattern 3: **Synchronous Long-Running Tasks**

**What goes wrong:** Publishing to social media inside a synchronous API request.

**Why bad:**
- Request timeouts (platforms can take 5-10 seconds)
- No retry on failure
- Poor user experience (blocked waiting)

**Instead:** Use the Background Job Queue pattern.

**Warning Signs:**

```typescript
// BAD - Blocking request until all platforms publish
app.post('/api/publish', async (req, res) => {
  await publishToFacebook(post) // 5 seconds
  await publishToLinkedIn(post) // 3 seconds
  await publishToTwitter(post)  // 2 seconds
  res.json({ success: true })   // User waits 10+ seconds!
})
```

**Fix:**

```typescript
// GOOD - Enqueue job, return immediately
app.post('/api/publish', async (req, res) => {
  const job = await postQueue.add('publish', { post, tenantId })
  res.json({ jobId: job.id, status: 'queued' }) // Returns in <100ms
  // User gets real-time update via WebSocket when done
})
```

---

### Anti-Pattern 4: **No AI Streaming**

**What goes wrong:** Waiting for the entire AI response before showing anything to the user.

**Why bad:**
- Poor UX (5-10 second blank screen while the AI generates)
- Users think the app is broken
- Modern AI chat UX expectation is streaming

**Instead:** Stream AI responses token-by-token via SSE.

**Warning Signs:**

```typescript
// BAD - Wait for full response
const completion = await openai.chat.completions.create({ ... })
res.json({ message: completion.choices[0].message.content })
```

**Fix:**

```typescript
// GOOD - Stream tokens as they arrive
const stream = await openai.chat.completions.create({
  stream: true,
  ...
})

res.setHeader('Content-Type', 'text/event-stream') // SSE response header

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content
  if (content) {
    res.write(`data: ${JSON.stringify({ content })}\n\n`)
  }
}
res.end()
```

---

### Anti-Pattern 5: **Platform-Specific Logic in Core Services**

**What goes wrong:** Chat Service has `if (platform === 'facebook') { ... }` logic.

**Why bad:**
- Core service becomes bloated with platform quirks
- Adding new platforms requires modifying core logic
- Difficult to test

**Instead:** Use Social Media Adapter pattern with platform-specific implementations.

**Warning Signs:**

```typescript
// BAD - Platform logic in core service
async function publishPost(post, platforms) {
  for (const platform of platforms) {
    if (platform === 'facebook') {
      // Facebook-specific logic
    } else if (platform === 'linkedin') {
      // LinkedIn-specific logic
    } // ...grows forever
  }
}
```

**Fix:**

```typescript
// GOOD - Adapters encapsulate platform logic
const adapters = {
  facebook: new MetaAdapter(),
  linkedin: new LinkedInAdapter()
}

async function publishPost(post, platforms) {
  return Promise.all(
    platforms.map(p => adapters[p].publish(post))
  )
}
```

---

## Scalability Considerations

| Concern | At 100 users | At 10K users | At 100K users |
|---------|--------------|--------------|---------------|
| **Database** | Single Postgres (Supabase free tier) | Postgres with read replicas | Connection pooling (PgBouncer), read replicas, partitioning |
| **AI API Costs** | $50-200/month | $2K-5K/month (need cost controls) | $20K+/month (cache frequent responses, use cheaper models for simple tasks) |
| **WebSocket Connections** | Single Node.js server | 2-3 servers with sticky sessions | Redis Pub/Sub for cross-server messaging, dedicated WebSocket servers (see the sketch after this table) |
| **Job Queue** | Single BullMQ worker | 3-5 workers (horizontal scaling) | Worker autoscaling based on queue depth |
| **File Storage** | Supabase Storage (free tier) | S3/R2 with CDN | CDN + image optimization pipeline |
| **Rate Limiting** | In-memory (simple) | Redis-backed rate limiting | Distributed rate limiting with sliding window |
| **Social API Quotas** | Single app credentials | Per-user OAuth (distributes quota) | Enterprise API access + request batching |
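
Redis Pub/Sub appears in the 100K-user column because the server holding a client's SSE/WebSocket connection is often not the server (or worker) that produced an event; publishing events to a shared channel lets every app server forward them to its own connected clients. A rough sketch with `ioredis`; the channel name, payload shape, and `localConnections` registry are assumptions.

```typescript
import Redis from 'ioredis'

// Separate connections: a subscribed ioredis client cannot issue other commands.
const pub = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379')
const sub = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379')

// This server's registry of tenantId -> open SSE/WebSocket handles (placeholder).
declare const localConnections: Map<string, Set<{ send(data: string): void }>>

// Worker side: announce job completion to every app server.
export async function announceJobDone(tenantId: string, payload: Record<string, unknown>) {
  await pub.publish('job-events', JSON.stringify({ tenantId, ...payload }))
}

// App-server side: forward events to whichever clients are connected locally.
sub.subscribe('job-events')
sub.on('message', (_channel, raw) => {
  const event = JSON.parse(raw)
  localConnections.get(event.tenantId)?.forEach(conn => conn.send(JSON.stringify(event)))
})
```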

### Key Scaling Milestones

**Phase 1 (MVP, 0-100 users):**
- Monolithic Next.js app
- Supabase Cloud (free tier)
- Single AI provider (OpenAI)
- No job queue (simple setTimeout)

**Phase 2 (1K users):**
- Separate BullMQ worker process
- Multi-AI provider support
- Redis for session/cache
- Social API integration (1-2 platforms)

**Phase 3 (10K users):**
- Horizontal scaling (2-3 servers)
- Read replicas for database
- CDN for static assets
- Advanced AI memory (vector search)

**Phase 4 (100K+ users):**
- Microservices architecture
- Dedicated WebSocket servers
- Database partitioning by tenant
- Enterprise social API access

---

## Build Order and Dependencies

### Dependency Graph

```
Phase 1 (Core Chat Experience):
  1. Multi-tenant Auth + API Gateway ───┐
  2. User Context Store (basic)         │
  3. Chat Service                       │
  4. AI Orchestrator (single provider)  │
                                        └──> MVP: Web chat with AI responses

Phase 2 (Social Publishing):
  5. Social API Gateway (1 platform - LinkedIn)
  6. Job Queue Service (BullMQ)
  7. AI Orchestrator (multi-provider)
       └──> Feature: Schedule + publish posts

Phase 3 (Multi-Channel):
  8. Telegram Bot integration
  9. Image Generation Pipeline
  10. Advanced User Context (vector DB)
       └──> Feature: Cross-channel, rich media

Phase 4 (Scale + Polish):
  11. WhatsApp Bot integration
  12. Analytics dashboard
  13. Performance optimizations
       └──> Production-ready SaaS
```

### Critical Path

**Must Build First:**
1. Multi-tenant authentication (foundation for all tenant isolation)
2. API Gateway (single entry point, tenant context middleware)
3. User Context Store (AI needs brand info to personalize responses)
4. Chat Service (core product experience)

**Can Build Later:**
- Job Queue (use simple setTimeout for MVP scheduling)
- Image Generation (nice-to-have, not core)
- Multiple social platforms (start with 1, add more incrementally)

### Parallel Tracks

**Track A (Chat Experience):**
- Auth → Context Store → Chat Service → AI Orchestrator

**Track B (Publishing):**
- Social API Gateway → Job Queue

**Can develop independently, integrate when both ready.**

---

## Technology Stack Recommendations

| Component | Recommended | Alternative | Why |
|-----------|-------------|-------------|-----|
| **Frontend** | Next.js 14+ (App Router) | Remix, SvelteKit | Best DX for React SSR, API routes, built-in streaming |
| **Backend** | Next.js API Routes (Phase 1) → Microservices (Phase 3+) | Express, Fastify | Start monolith, extract services later |
| **Database** | PostgreSQL (Supabase Cloud) | Neon, AWS RDS | Built-in auth, storage, RLS, real-time subscriptions |
| **Cache/Queue** | Redis (Upstash for serverless) | Valkey, KeyDB | Standard for session, cache, BullMQ backend |
| **Job Queue** | BullMQ | Trigger.dev, Inngest | Most mature Node.js queue, Redis-backed |
| **AI Providers** | OpenAI (primary), Anthropic (fallback) | Google Gemini, Groq | OpenAI most reliable, Anthropic best reasoning |
| **AI Abstraction** | LiteLLM or custom gateway | LangChain | LiteLLM simpler for multi-provider, LangChain for complex chains |
| **Social APIs** | Outstand (unified API) | Ayrshare, Sociality.io | Reduces integration complexity, faster iteration |
| **Real-time Transport** | Server-Sent Events (SSE) | Socket.io, Pusher | Simpler for one-way streaming, built into HTTP |
| **File Storage** | Supabase Storage or Cloudflare R2 | AWS S3, Vercel Blob | R2 zero egress fees, Supabase integrated with DB |
| **Vector DB** | Supabase pgvector | Pinecone, Qdrant | Same database as core data, simpler architecture |
| **Image Processing** | Sharp (Node.js) | Jimp, ImageMagick | Fastest, native performance |
| **Auth** | NextAuth.js v5 or Supabase Auth | Clerk, Auth0 | NextAuth flexible, Supabase integrated |
| **Deployment** | Vercel (frontend) + VPS (workers) | Railway, Fly.io | Vercel best Next.js DX, VPS for background workers |

---

## Sources

### AI Social Media Management Architecture
- [AI-Powered Social Media Management in 2026](https://www.socialnewsdesk.com/blog/ai-powered-social-media-management-in-2026/)
- [How to Make a Social Media App in 2026](https://tech-stack.com/blog/how-to-make-a-social-media-app-complete-guide-for-2025/)
- [Best AI Tools for Social Media in 2026](https://www.supergrow.ai/blog/ai-tools-social-media)
- [Scaling Social Media in 2026](https://www.prnewswire.com/news-releases/scaling-social-media-in-2026-why-cloud-based-mobile-automation-is-the-next-big-leap-302674955.html)

### Headless Architecture
- [Headless CMS: A game-changer for social media](https://www.contentstack.com/blog/all-about-headless/headless-cms-a-game-changer-for-social-media)
- [Headless Web Development Guide 2026](https://blog.singsys.com/web-development-guide-2026/)
- [What are SaaS Headless Tools?](https://payproglobal.com/answers/what-are-saas-headless-tools/)

### Multi-AI Provider Integration
- [Google's Eight Essential Multi-Agent Design Patterns](https://www.infoq.com/news/2026/01/multi-agent-design-patterns/)
- [AI Agent Orchestration Patterns - Azure](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns)
- [MCP Gateways: Developer's Guide](https://composio.dev/blog/mcp-gateways-guide)
- [Multi-Provider Generative AI Gateway on AWS](https://aws-solutions-library-samples.github.io/ai-ml/guidance-for-multi-provider-generative-ai-gateway-on-aws.html)

### Real-Time Chat Architecture
- [Building Real-Time AI Chat Infrastructure](https://render.com/articles/real-time-ai-chat-websockets-infrastructure)
- [WebSocket Architecture Best Practices](https://ably.com/topic/websocket-architecture-best-practices)
- [Advanced Backend System Architecture and Scaling WebSockets](https://medium.com/@ankitjaat24u/advanced-backend-system-architecture-and-scaling-websockets-f2a5637ca1ab)

### Social Media API Integration
- [Top 10 Unified Social Media APIs for Developers in 2026](https://www.outstand.so/blog/best-unified-social-media-apis-for-devs)
- [Social Media APIs - API7.ai](https://api7.ai/learning-center/api-101/social-media-apis)
- [Social Media API Integration Services](https://www.planeks.net/api-integration/social-media/)

### Background Job Queues
- [BullMQ - Background Jobs for NodeJS](https://bullmq.io/)
- [Building a Job Queue System with Node.js](https://neon.com/guides/nodejs-queue-system)
- [Trigger.dev - AI Workflows](https://trigger.dev)
- [No workers necessary - Simple background jobs](https://www.inngest.com/blog/no-workers-necessary-nodejs-express)

### User Context & Memory
- [Building a persistent conversational AI chatbot](https://temporal.io/blog/building-a-persistent-conversational-ai-chatbot-with-temporal)
- [Context Engineering for Personalization - OpenAI](https://cookbook.openai.com/examples/agents_sdk/context_personalization)
- [AI Trend 2026: Context Replaces Compute](https://fourweekmba.com/ai-trend-2026-context-replaces-compute-as-the-new-bottleneck/)
- [Why AI Agents Need Persistent Memory](https://memmachine.ai/blog/2025/09/beyond-the-chatbot-why-ai-agents-need-persistent-memory/)

### Multi-Tenant Architecture
- [Designing Multi-tenant SaaS Architecture on AWS 2026](https://www.clickittech.com/software-development/multi-tenant-architecture/)
- [Multi-Tenant Database Architecture Patterns](https://www.bytebase.com/blog/multi-tenant-database-architecture-patterns-explained/)
- [How to Design a Multi-Tenant SaaS Architecture](https://clerk.com/blog/how-to-design-multitenant-saas-architecture)
- [Architecting Secure Multi-Tenant Data Isolation](https://medium.com/@justhamade/architecting-secure-multi-tenant-data-isolation-d8f36cb0d25e)

### AI Image Generation
- [Complete Guide to AI Image Generation APIs in 2026](https://wavespeed.ai/blog/posts/complete-guide-ai-image-apis-2026/)
- [GLM-Image: Auto-regressive Image Generation](https://z.ai/blog/glm-image)
- [Best AI Image Generators in 2026](https://www.template.net/business/best-ai-image-generators-in-2026/)

### Telegram/WhatsApp Integration
- [Integrate WhatsApp and Telegram - BuildShip](https://buildship.com/integrations/apps/whatsapp-and-telegram)
- [Telegram Bot + WhatsApp - Pipedream](https://pipedream.com/apps/whatsapp-business/integrations/telegram-bot-api)
- [Telegram APIs Official](https://core.telegram.org)