leopost/.planning/research/ARCHITECTURE.md
Michele dc3ea1cf58 docs: complete domain research
Research dimensions:
- STACK.md: Technology stack recommendations (Next.js 15, Supabase, Vercel AI SDK, BullMQ)
- FEATURES.md: Feature landscape analysis (table stakes vs differentiators)
- ARCHITECTURE.md: System architecture design (headless, multi-tenant, job queue)
- PITFALLS.md: Common mistakes to avoid (rate limits, AI slop, cost control)
- SUMMARY.md: Synthesized findings with roadmap implications

Key findings:
- Stack: Next.js 15 + Supabase Cloud + Vercel AI SDK (multi-provider)
- Architecture: Modular monolith → microservices, headless pattern
- Critical pitfall: API rate limits (Meta reduced by 96%), AI cost explosion

Phase recommendations:
1. Core Scheduling Foundation (6-8 weeks)
2. Reliability & Differentiation (4-6 weeks)
3. Advanced Innovation (8-12 weeks)
4. Scale & Polish (ongoing)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-31 02:08:10 +01:00


Architecture Patterns

Domain: AI-powered Social Media Management SaaS
Researched: 2026-01-31
Confidence: HIGH

Executive Summary

AI-powered social media management platforms in 2026 are built on headless, microservices-based architectures that decouple frontend experiences from backend logic. The dominant pattern is a chat-first interface with real-time communication (WebSockets where bidirectional messaging is needed, SSE for server-to-client streaming), orchestrating multiple specialized backend services: an AI provider abstraction layer, a social media API gateway, a background job queue, a persistent user context store, and multi-tenant data isolation.

For Leopost specifically, the recommended architecture is a modular monolith that transitions to microservices: it prioritizes rapid iteration in early phases while maintaining clear component boundaries for future scaling.


High-Level System Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                        CLIENT LAYER                                  │
├─────────────────────────────────────────────────────────────────────┤
│  Web App (Next.js)    │  Telegram Bot  │  WhatsApp Bot (future)    │
│  - Chat Interface     │  - Webhook     │  - Twilio/Cloud API       │
│  - Real-time Updates  │  - Commands    │  - Message Forwarding     │
└──────────────┬────────┴────────┬───────┴────────────────────────────┘
               │                 │
               │  WebSocket/SSE  │  HTTPS Webhook
               │                 │
┌──────────────▼─────────────────▼────────────────────────────────────┐
│                        API GATEWAY                                   │
│  - Authentication (JWT)                                              │
│  - Rate Limiting (per tenant)                                        │
│  - Request Routing                                                   │
│  - WebSocket Connection Manager                                      │
└──────────────┬──────────────────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     BACKEND SERVICES                                 │
├─────────────────┬───────────────┬──────────────┬────────────────────┤
│ Chat Service    │ AI Orchestrator│ Social API  │ Job Queue          │
│                 │                │  Gateway     │  Service           │
│ - Message       │ - Provider     │              │                    │
│   handling      │   routing      │ - Meta       │ - Scheduling       │
│ - Context       │ - Streaming    │ - LinkedIn   │ - Publishing       │
│   injection     │ - Retry logic  │ - X/Twitter  │ - Retries          │
│ - SSE emit      │                │ - Rate mgmt  │ - Analytics sync   │
│                 │ ┌──────────┐   │              │                    │
│                 │ │ OpenAI   │   │              │                    │
│                 │ │ Anthropic│   │              │                    │
│                 │ │ Google   │   │              │                    │
│                 │ └──────────┘   │              │                    │
└─────────────────┴───────────────┴──────────────┴────────────────────┘
               │                 │                │
               ▼                 ▼                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        DATA LAYER                                    │
├─────────────────┬───────────────┬──────────────┬────────────────────┤
│ PostgreSQL      │ Redis         │ S3/Storage   │ Vector DB          │
│                 │               │              │  (future)          │
│ - Users/Tenants │ - Sessions    │ - Generated  │                    │
│ - Posts         │ - Job Queue   │   images     │ - User context     │
│ - Social Auth   │ - Cache       │ - Uploads    │   embeddings       │
│ - Analytics     │ - Rate limits │              │ - Semantic search  │
│                 │               │              │                    │
│ Multi-tenant:   │               │              │                    │
│ tenant_id on    │               │              │                    │
│ all tables      │               │              │                    │
└─────────────────┴───────────────┴──────────────┴────────────────────┘

Component Boundaries

1. API Gateway / Authentication Layer

Responsibility:

  • Single entry point for all client requests (web, Telegram, WhatsApp)
  • JWT-based authentication and tenant identification
  • Rate limiting per tenant/user
  • WebSocket connection lifecycle management
  • Request routing to appropriate backend service

Communicates With:

  • Inbound: Web app (Next.js), Telegram Bot webhook, WhatsApp Bot webhook
  • Outbound: Chat Service, AI Orchestrator, Social API Gateway, Job Queue Service
  • Data: Redis (session cache, rate limiting counters)

Technology Recommendation:

  • Next.js API routes for initial implementation (monolith)
  • Future: Nginx/Kong API Gateway or AWS API Gateway (microservices transition)
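
Per-tenant rate limiting can start as a simple fixed window. The sketch below (illustrative names, in-memory state) shows the shape; in production the counter would live in Redis (INCR plus EXPIRE) so every gateway instance shares it:

```typescript
// Hypothetical fixed-window limiter; Redis-backed in production.
type WindowState = { count: number; resetAt: number }

class TenantRateLimiter {
  private windows = new Map<string, WindowState>()

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if this tenant may make another request right now.
  allow(tenantId: string, now = Date.now()): boolean {
    const state = this.windows.get(tenantId)
    if (!state || now >= state.resetAt) {
      // New window: reset the counter for this tenant
      this.windows.set(tenantId, { count: 1, resetAt: now + this.windowMs })
      return true
    }
    if (state.count >= this.limit) return false
    state.count++
    return true
  }
}
```

A fixed window is the simplest variant; the scalability table later in this document suggests moving to a sliding window once traffic is distributed across servers.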

Build Order Implication:

  • Build FIRST (Phase 1) - MVP needs basic auth + routing

2. Chat Service

Responsibility:

  • Handle incoming user messages from web/Telegram/WhatsApp
  • Retrieve user context (brand info, preferences, conversation history)
  • Inject context into AI prompt
  • Orchestrate AI response streaming
  • Emit real-time updates via WebSocket/SSE
  • Store conversation history

Communicates With:

  • Inbound: API Gateway (user messages)
  • Outbound: AI Orchestrator (prompt + context), PostgreSQL (context retrieval/storage), Redis (session state)
  • Streams: Real-time responses to connected clients via SSE/WebSocket

Technology Recommendation:

  • Node.js/Express or Next.js API routes
  • Socket.io for WebSocket management OR SSE for simpler one-way streaming
  • LangChain/LangGraph for conversation chain management
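
The "SSE emit" responsibility reduces to writing correctly framed text/event-stream chunks. A minimal encoder sketch (these helper names are ours, not a library API):

```typescript
// Encode one chat event as an SSE frame: "event:" line, "data:" line,
// blank line terminator.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`
}

// Pipe an AI token stream into SSE frames; a route handler would
// res.write() each yielded frame on a text/event-stream response.
async function* toSSE(tokens: AsyncIterable<string>): AsyncGenerator<string> {
  for await (const token of tokens) yield sseFrame('token', { token })
  yield sseFrame('done', {})
}
```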

Build Order Implication:

  • Build SECOND (Phase 1) - Core product experience depends on this

3. AI Orchestrator

Responsibility:

  • Abstract multiple AI providers (OpenAI, Anthropic, Google) behind unified interface
  • Provider routing based on task type (text generation, image analysis, etc.)
  • Streaming response handling (SSE from AI → SSE to client)
  • Retry logic and fallback to alternative providers
  • Cost tracking per provider/tenant
  • Token counting and budget enforcement
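
Budget enforcement can be a small guard consulted before each AI call. A hedged sketch with hypothetical names; real token counts would come from the provider's usage metadata (e.g. the usage field on a completion response), and the counters would be persisted rather than held in memory:

```typescript
// Per-tenant token accounting with a hard monthly ceiling.
class TokenBudget {
  private used = new Map<string, number>()

  constructor(private monthlyLimit: number) {}

  // Record actual usage reported by the provider after a call.
  record(tenantId: string, tokens: number): void {
    this.used.set(tenantId, (this.used.get(tenantId) ?? 0) + tokens)
  }

  // Call before dispatching a request; throws if it would bust the budget.
  assertWithinBudget(tenantId: string, estimatedTokens: number): void {
    const total = (this.used.get(tenantId) ?? 0) + estimatedTokens
    if (total > this.monthlyLimit) {
      throw new Error(`Tenant ${tenantId} would exceed token budget`)
    }
  }
}
```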

Communicates With:

  • Inbound: Chat Service, Job Queue Service (for scheduled AI tasks)
  • Outbound: OpenAI API, Anthropic API, Google Gemini API
  • Data: Redis (cache frequent prompts), PostgreSQL (usage logs)

Technology Recommendation:

  • LiteLLM (multi-provider abstraction library) or custom adapter pattern
  • OpenAI SDK, Anthropic SDK, Google AI SDK
  • Implement Multi-Provider Gateway pattern (single unified API)

Architecture Pattern:

// Unified interface
interface AIProvider {
  generateText(prompt: string, options: GenerationOptions): Promise<Stream>
  generateImage(prompt: string): Promise<ImageURL>
}

// Provider implementations
class OpenAIProvider implements AIProvider { ... }
class AnthropicProvider implements AIProvider { ... }
class GoogleProvider implements AIProvider { ... }

// Router selects provider based on task/cost/availability
class AIRouter {
  selectProvider(task: AITask): AIProvider
}

Build Order Implication:

  • Build SECOND (Phase 1) - Can start with single provider (OpenAI), add multi-provider in Phase 2

4. Social API Gateway

Responsibility:

  • Centralize authentication with social platforms (OAuth 2.0 flows)
  • Abstract platform-specific APIs (Meta Graph API, LinkedIn API, X API) behind unified interface
  • Normalize data formats across platforms (posts, analytics, media)
  • Rate limiting per platform (respects API quotas)
  • Retry logic with exponential backoff
  • Credential storage and refresh token management

Communicates With:

  • Inbound: Chat Service (publish request), Job Queue Service (scheduled posts, analytics sync)
  • Outbound: Facebook/Instagram Graph API, LinkedIn API, X/Twitter API
  • Data: PostgreSQL (social account credentials, encrypted tokens), Redis (rate limit tracking)

Technology Recommendation:

  • Consider unified API platforms: Outstand, Sociality.io, Ayrshare (reduces integration complexity)
  • Alternative: Custom adapter pattern with individual SDKs
  • OAuth library: Passport.js or NextAuth.js

Architecture Pattern:

// Unified social media interface
interface SocialPlatform {
  publish(post: UnifiedPost): Promise<PublishResult>
  getAnalytics(postId: string): Promise<Analytics>
  schedulePost(post: UnifiedPost, date: Date): Promise<ScheduleResult>
}

// Platform-specific implementations
class MetaAdapter implements SocialPlatform { ... }
class LinkedInAdapter implements SocialPlatform { ... }
class XAdapter implements SocialPlatform { ... }

// Unified post format
interface UnifiedPost {
  text: string
  media?: MediaFile[]
  platforms: Platform[]
  scheduledTime?: Date
}

Build Order Implication:

  • Build THIRD (Phase 2) - Not needed for MVP chat experience, add once publishing is prioritized

5. Job Queue Service

Responsibility:

  • Schedule posts for future publishing (cron-based or specific time)
  • Background analytics sync from social platforms
  • Retry failed publish attempts (exponential backoff)
  • Image generation queue (async processing)
  • Bulk operations (multi-platform publishing)
  • Email notifications (scheduled, event-triggered)

Communicates With:

  • Inbound: Chat Service (enqueue publish), Social API Gateway (enqueue analytics sync)
  • Outbound: Social API Gateway (execute publish), AI Orchestrator (image generation), Email Service
  • Data: Redis (job queue storage), PostgreSQL (job history, status)

Technology Recommendation:

  • BullMQ (most popular, Redis-backed, excellent for Node.js)
  • Alternative: Trigger.dev (managed service, no infra), Inngest (event-driven, no queue setup)
  • Avoid: Temporal (overkill for this use case), Bee-Queue (less feature-rich)

Architecture Pattern:

// Job types
enum JobType {
  PUBLISH_POST = 'publish_post',
  SYNC_ANALYTICS = 'sync_analytics',
  GENERATE_IMAGE = 'generate_image',
  SEND_NOTIFICATION = 'send_notification'
}

// Job queue interface
const scheduledTime = new Date('2026-02-01T10:00:00Z')

queue.add(JobType.PUBLISH_POST, {
  tenantId: '...',
  postId: '...',
  platforms: ['facebook', 'linkedin'],
  scheduledTime: scheduledTime.toISOString()
}, {
  delay: scheduledTime.getTime() - Date.now(),
  attempts: 3,
  backoff: { type: 'exponential', delay: 2000 }
})

Build Order Implication:

  • Build FOURTH (Phase 2-3) - Essential for scheduling feature, but not for MVP

6. User Context Store

Responsibility:

  • Store user/tenant-specific information (brand voice, target audience, preferences)
  • Persist conversation history for AI context
  • Learn from user feedback (thumbs up/down on AI responses)
  • Retrieve relevant context for AI prompt injection
  • Future: Vector embeddings for semantic search over past posts

Communicates With:

  • Inbound: Chat Service (store/retrieve context), AI Orchestrator (context injection)
  • Outbound: PostgreSQL (structured context), Vector DB (embeddings - future phase)
  • Data: PostgreSQL (brand info, user preferences), Redis (session cache)

Technology Recommendation:

  • PostgreSQL (JSON columns) for structured context (Phase 1-2)
  • Future: Pinecone, Qdrant, or Supabase Vector (pgvector) for semantic search (Phase 3+)
  • LangChain Memory classes for conversation chain management

Architecture Pattern:

// Context storage
interface UserContext {
  tenantId: string
  brandInfo: {
    name: string
    voice: string // "Professional", "Casual", "Humorous"
    targetAudience: string
    industry: string
  }
  preferences: {
    defaultPlatforms: Platform[]
    postingSchedule: Schedule
    aiProvider: 'openai' | 'anthropic' | 'google'
  }
  conversationHistory: Message[] // Last N messages
}

// Context retrieval
async function getContextForPrompt(tenantId: string): Promise<string> {
  const context = await db.getUserContext(tenantId)
  return `
    Brand: ${context.brandInfo.name}
    Voice: ${context.brandInfo.voice}
    Target Audience: ${context.brandInfo.targetAudience}
    Recent conversations: ${formatHistory(context.conversationHistory)}
  `
}

Build Order Implication:

  • Build SECOND (Phase 1) - Basic brand info storage needed for MVP
  • Extend in Phase 3 with vector search for advanced AI memory

7. Image Generation Pipeline

Responsibility:

  • Generate images via AI (DALL-E, Midjourney API, Stable Diffusion)
  • Process/resize/optimize images for social platform requirements
  • Store generated images in cloud storage
  • Track generation costs per tenant
  • Handle async generation (enqueue job, notify when ready)

Communicates With:

  • Inbound: Chat Service (user request), Job Queue Service (async generation)
  • Outbound: AI Orchestrator (image model API), S3/Cloud Storage (upload), Chat Service (completion notification)
  • Data: S3 (image storage), PostgreSQL (image metadata)

Technology Recommendation:

  • OpenAI DALL-E 3, Stability AI, Midjourney API (via third-party)
  • Image processing: Sharp (Node.js), Pillow (Python)
  • Storage: AWS S3, Cloudflare R2, or Supabase Storage

Architecture Pattern:

// Async image generation workflow

// Request path: enqueue and return immediately
async function generateImage(prompt: string, tenantId: string) {
  const job = await queue.add('generate_image', {
    prompt,
    tenantId,
    provider: 'openai-dalle3'
  })
  return job.id
}

// Background worker, registered once at process startup (BullMQ),
// not inside the request handler
const imageWorker = new Worker('generate_image', async (job) => {
  const imageUrl = await aiOrchestrator.generateImage(job.data.prompt)
  const optimizedUrl = await processAndUpload(imageUrl, job.data.tenantId)
  await notifyUser(job.data.tenantId, optimizedUrl)
}, { connection: redis })
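
The "resize/optimize for social platform requirements" step needs a spec table before any Sharp call (e.g. sharp(buf).resize(w, h, { fit: 'cover' })). The dimensions below are commonly cited values and should be verified against current platform documentation; the lookup itself is the pattern:

```typescript
// Illustrative per-platform image specs; verify against current platform docs.
const PLATFORM_IMAGE_SPECS: Record<string, { width: number; height: number }> = {
  instagram_feed: { width: 1080, height: 1080 }, // square feed post
  linkedin: { width: 1200, height: 627 },        // link/share image
  x: { width: 1600, height: 900 },               // 16:9 timeline image
}

// Resolve the target size for a platform, failing loudly on unknown keys.
function targetSize(platform: string): { width: number; height: number } {
  const spec = PLATFORM_IMAGE_SPECS[platform]
  if (!spec) throw new Error(`No image spec for platform: ${platform}`)
  return spec
}
```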

Build Order Implication:

  • Build FIFTH (Phase 3) - Nice-to-have enhancement, not core MVP

8. Multi-Tenant Data Isolation

Responsibility:

  • Ensure tenant A cannot access tenant B's data
  • Apply tenant_id filter to all database queries
  • Enforce row-level security (RLS) at database level
  • Isolate file storage per tenant (S3 paths)

Communicates With:

  • All services that access PostgreSQL or S3 must enforce tenant isolation

Technology Recommendation:

  • Shared Database, Shared Schema (Pool Model) - Most cost-effective for micro-SaaS
  • PostgreSQL Row-Level Security (RLS) for defense-in-depth
  • Supabase RLS policies (if using Supabase Cloud)
  • Application-level enforcement: Always filter by tenant_id in WHERE clauses

Architecture Pattern:

-- PostgreSQL RLS example (policies have no effect until RLS is enabled)
ALTER TABLE posts ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON posts
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Application sets tenant context per request
-- (SET LOCAL is transaction-scoped, so it must run inside a transaction)
BEGIN;
SET LOCAL app.current_tenant = 'tenant-uuid';
SELECT * FROM posts; -- Automatically filtered
COMMIT;

Best Practice:

  • Middleware extracts tenant_id from JWT at API Gateway
  • All downstream services receive tenant_id in request context
  • Database queries ALWAYS include WHERE tenant_id = ?

Build Order Implication:

  • Build FIRST (Phase 1) - Critical security foundation, implement from day 1

Data Flow

1. User Sends Chat Message

User (Web/Telegram)
  → API Gateway (auth, tenant_id extraction)
  → Chat Service (retrieve context)
  → PostgreSQL (load brand info, conversation history)
  → AI Orchestrator (inject context, call AI provider)
  → OpenAI/Anthropic API (stream response)
  → Chat Service (emit SSE to client)
  → PostgreSQL (store conversation turn)

Key Decisions:

  • Use SSE (Server-Sent Events) for one-way AI streaming (simpler than WebSocket)
  • Use WebSocket if bidirectional communication needed (e.g., typing indicators)
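
On the receiving side, a client that POSTs a chat message must read the SSE stream manually, since the browser EventSource API only supports GET. A minimal frame parser sketch, assuming the standard "event:"/"data:" framing with blank-line separators:

```typescript
// Parse a text/event-stream chunk into (event, data) pairs.
function parseSSE(chunk: string): Array<{ event: string; data: string }> {
  return chunk
    .split('\n\n')          // frames are separated by a blank line
    .filter(Boolean)        // drop the trailing empty segment
    .map(frame => {
      const event = frame.match(/^event: (.*)$/m)?.[1] ?? 'message'
      const data = frame.match(/^data: (.*)$/m)?.[1] ?? ''
      return { event, data }
    })
}
```

A production client would additionally buffer partial frames across network chunks; this sketch assumes each chunk ends on a frame boundary.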

2. User Publishes Post to Social Media

User (chat: "Publish this to LinkedIn and Facebook")
  → Chat Service (parse intent)
  → AI Orchestrator (generate post content if needed)
  → Chat Service (return preview to user)
  → User confirms
  → Social API Gateway (authenticate, publish)
  → LinkedIn API + Facebook Graph API (post content)
  → Social API Gateway (return post IDs)
  → PostgreSQL (store post record)
  → Chat Service (notify user: "Published!")

Alternative: Scheduled Publish

User: "Schedule this for tomorrow 10am"
  → Chat Service (parse schedule time)
  → Job Queue Service (enqueue publish job with delay)
  → Redis (store job)
  → [Wait until scheduled time]
  → BullMQ Worker (process job)
  → Social API Gateway (publish)
  → Email Service (notify user of success/failure)

3. Telegram Bot Message

Telegram Server (webhook POST to /api/telegram/webhook)
  → API Gateway (validate webhook signature)
  → Chat Service (same logic as web chat)
  → AI Orchestrator (generate response)
  → Chat Service (format for Telegram)
  → Telegram Bot API (send message)

Key Decision:

  • Reuse Chat Service for all channels (web, Telegram, WhatsApp)
  • Channel-specific adapters only handle message formatting (Markdown vs HTML)
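
A formatter-per-channel sketch (hypothetical interface; only the bold rule is mapped here, and a real Telegram adapter must follow its full MarkdownV2 escaping rules):

```typescript
// Core Chat Service emits one canonical Markdown string; each channel
// adapter only rewrites surface syntax.
interface ChannelFormatter {
  format(markdown: string): string
}

// Telegram bold is *text* rather than **text**.
const telegramFormatter: ChannelFormatter = {
  format: (md) => md.replace(/\*\*(.+?)\*\*/g, '*$1*'),
}

// The web client renders Markdown directly, so it passes through unchanged.
const webFormatter: ChannelFormatter = {
  format: (md) => md,
}
```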

4. AI Provider Failover

Chat Service → AI Orchestrator (request: OpenAI GPT-4)
  → OpenAI API (500 error or rate limit)
  → AI Orchestrator (detect failure, retry logic)
  → [Attempt 1 failed]
  → AI Orchestrator (fallback to Anthropic Claude)
  → Anthropic API (success)
  → Return response

Architecture Pattern:

  • Primary provider (cheapest/fastest): OpenAI GPT-4o-mini
  • Fallback provider: Anthropic Claude Sonnet
  • Last resort: Google Gemini

Patterns to Follow

Pattern 1: Multi-Provider Gateway (AI Abstraction)

What: Single unified interface abstracting multiple AI providers (OpenAI, Anthropic, Google).

When: When building AI features that need cost optimization, redundancy, or best-model-for-task routing.

Example:

// libs/ai/provider-gateway.ts
export class AIProviderGateway {
  private providers: Map<ProviderType, AIProvider>

  async generateText(
    prompt: string,
    options: {
      preferredProvider?: ProviderType,
      fallback?: boolean
    }
  ): Promise<string> {
    const provider = this.selectProvider(options.preferredProvider)

    try {
      return await provider.generate(prompt)
    } catch (error) {
      if (options.fallback) {
        const fallbackProvider = this.getNextProvider()
        return await fallbackProvider.generate(prompt)
      }
      throw error
    }
  }

  private selectProvider(preferred?: ProviderType): AIProvider {
    // Cost-based routing: cheap tasks → OpenAI, reasoning → Anthropic
    // Or user preference from tenant settings
  }
}

Benefits:

  • Cost optimization (route simple tasks to cheaper models)
  • High availability (auto-failover)
  • Easy migration (swap providers without changing application code)

Pattern 2: Context Injection (User Memory)

What: Retrieve user-specific context (brand info, past conversations) and inject into AI prompts.

When: Building personalized AI experiences that "remember" user preferences.

Example:

// services/chat/context-injector.ts
export class ContextInjector {
  async buildPrompt(userMessage: string, tenantId: string): Promise<string> {
    const context = await this.getUserContext(tenantId)

    const systemPrompt = `
You are a social media assistant for ${context.brandInfo.name}.

Brand Voice: ${context.brandInfo.voice}
Target Audience: ${context.brandInfo.targetAudience}
Industry: ${context.brandInfo.industry}

Recent conversation:
${this.formatConversationHistory(context.conversationHistory)}

User's new message: ${userMessage}

Generate a helpful response that maintains brand voice and leverages past context.
`
    return systemPrompt
  }
}

Benefits:

  • Personalized AI responses
  • Consistency across conversations
  • Foundation for long-term AI memory

Pattern 3: Unified Social Media Adapter

What: Abstract platform-specific APIs (Facebook, LinkedIn, X) behind a common interface.

When: Integrating multiple social platforms without scattering platform logic across codebase.

Example:

// libs/social/unified-adapter.ts
export interface SocialPost {
  text: string
  media?: MediaFile[]
  platforms: Platform[]
}

export interface SocialAdapter {
  publish(post: SocialPost): Promise<PublishResult>
  getAnalytics(postId: string): Promise<Analytics>
}

export class MetaAdapter implements SocialAdapter {
  async publish(post: SocialPost): Promise<PublishResult> {
    // Facebook Graph API specific logic
    const response = await fetch('https://graph.facebook.com/v18.0/me/feed', {
      method: 'POST',
      headers: { Authorization: `Bearer ${token}` },
      body: JSON.stringify({ message: post.text })
    })
    return this.normalizeResponse(response)
  }
}

export class SocialMediaGateway {
  private adapters: Map<Platform, SocialAdapter>

  async publishToAll(
    post: SocialPost
  ): Promise<PromiseSettledResult<PublishResult>[]> {
    // allSettled: one platform failing must not abort the others
    return Promise.allSettled(
      post.platforms.map(platform => {
        const adapter = this.adapters.get(platform)
        if (!adapter) throw new Error(`No adapter for platform: ${platform}`)
        return adapter.publish(post)
      })
    )
  }
}

Benefits:

  • Add new platforms without changing core logic
  • Centralized error handling and retry logic
  • Easier testing (mock adapters)

Pattern 4: Background Job Queue (Scheduling)

What: Decouple long-running tasks (scheduled posts, image generation) from synchronous request handling.

When: Tasks that take >2 seconds, need retries, or are scheduled for future execution.

Example:

// services/queue/post-scheduler.ts
import { Queue, Worker } from 'bullmq'

const postQueue = new Queue('social-posts', { connection: redis })

// Enqueue job
export async function schedulePost(
  post: SocialPost,
  scheduledTime: Date,
  tenantId: string
) {
  await postQueue.add('publish', {
    post,
    tenantId
  }, {
    delay: scheduledTime.getTime() - Date.now(),
    attempts: 3,
    backoff: { type: 'exponential', delay: 2000 }
  })
}

// Worker processes jobs
const worker = new Worker('social-posts', async (job) => {
  const { post, tenantId } = job.data

  try {
    // publishToAll uses Promise.allSettled, so per-platform failures
    // surface as rejected entries rather than a thrown error
    const results = await socialGateway.publishToAll(post)
    const failed = results.filter(r => r.status === 'rejected')
    if (failed.length > 0) {
      throw new Error(`${failed.length} platform(s) failed to publish`)
    }
    await db.posts.update({ id: post.id, status: 'published' })
    await notifyUser(tenantId, 'Post published successfully!')
  } catch (error) {
    await notifyUser(tenantId, `Post failed: ${error.message}`)
    throw error // Triggers retry
  }
}, { connection: redis })

Benefits:

  • Reliable delivery (survives server restarts)
  • Automatic retries with exponential backoff
  • Horizontal scaling (add more worker processes)

Pattern 5: Tenant Context Middleware

What: Extract tenant_id from JWT at API Gateway, pass to all services, enforce in all database queries.

When: Building multi-tenant SaaS with shared database (pool model).

Example:

// middleware/tenant-context.ts
export async function tenantContextMiddleware(req, res, next) {
  // Extract tenant_id from JWT
  const token = req.headers.authorization?.split(' ')[1]
  if (!token) return res.status(401).json({ error: 'Missing token' })
  const decoded = jwt.verify(token, SECRET)

  // Attach to request
  req.tenantId = decoded.tenantId

  // Set the PostgreSQL variable for RLS with a bound parameter; never
  // interpolate the tenant id into SQL. set_config(..., true) is
  // transaction-scoped, equivalent to SET LOCAL.
  await db.query(
    `SELECT set_config('app.current_tenant', $1, true)`,
    [req.tenantId]
  )

  next()
}

// All queries automatically filtered by RLS
app.get('/api/posts', tenantContextMiddleware, async (req, res) => {
  // No need to manually filter by tenant_id - RLS does it
  const posts = await db.query('SELECT * FROM posts')
  res.json(posts)
})

Benefits:

  • Zero chance of cross-tenant data leakage
  • Defense-in-depth (app + database enforce isolation)
  • Simpler query code (no WHERE tenant_id everywhere)

Anti-Patterns to Avoid

Anti-Pattern 1: Tight Coupling to AI Provider

What goes wrong: Hardcoding OpenAI SDK calls throughout codebase.

Why bad:

  • Vendor lock-in (can't switch providers without massive refactor)
  • No fallback when provider is down
  • Difficult to A/B test different models

Instead: Use AI Provider Gateway pattern (see above).

Warning Signs:

// BAD - OpenAI SDK scattered everywhere
import OpenAI from 'openai'

async function handleChat(message: string) {
  const openai = new OpenAI({ apiKey: process.env.OPENAI_KEY })
  const response = await openai.chat.completions.create({ ... })
  return response.choices[0].message.content
}

Fix:

// GOOD - Abstract provider behind interface
import { aiGateway } from '@/libs/ai/provider-gateway'

async function handleChat(message: string) {
  return await aiGateway.generateText(message, {
    preferredProvider: 'openai',
    fallback: true
  })
}

Anti-Pattern 2: Missing Tenant Isolation

What goes wrong: Forgetting to filter queries by tenant_id.

Why bad:

  • Data leakage between customers (catastrophic security breach)
  • Compliance violations (GDPR, SOC2)
  • Potential lawsuit and business destruction

Instead:

  • Use Tenant Context Middleware pattern
  • Enable PostgreSQL Row-Level Security
  • Write integration tests that verify isolation

Warning Signs:

// BAD - No tenant filtering
app.get('/api/posts', async (req, res) => {
  const posts = await db.query('SELECT * FROM posts')
  res.json(posts) // Returns ALL tenants' posts!
})

Fix:

// GOOD - Explicit filtering + RLS
app.get('/api/posts', tenantContextMiddleware, async (req, res) => {
  const posts = await db.query(
    'SELECT * FROM posts WHERE tenant_id = $1',
    [req.tenantId]
  )
  res.json(posts)
})
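
The isolation guarantee deserves an automated check. This sketch runs it against an in-memory stand-in for the posts table; a real integration test would issue the same query as two different tenants against the actual database with RLS enabled:

```typescript
type Post = { id: string; tenantId: string; text: string }

// Seed data spanning two tenants
const posts: Post[] = [
  { id: '1', tenantId: 'tenant-a', text: 'A post' },
  { id: '2', tenantId: 'tenant-b', text: 'B post' },
]

// Stand-in for "SELECT * FROM posts WHERE tenant_id = $1"
function listPosts(tenantId: string): Post[] {
  return posts.filter(p => p.tenantId === tenantId)
}

// The invariant: tenant A must never see tenant B's rows.
function assertIsolation(): void {
  const seenByA = listPosts('tenant-a')
  if (seenByA.some(p => p.tenantId !== 'tenant-a')) {
    throw new Error('Cross-tenant leakage detected')
  }
}
```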

Anti-Pattern 3: Synchronous Long-Running Tasks

What goes wrong: Publishing to social media in synchronous API request.

Why bad:

  • Request timeouts (platforms can take 5-10 seconds)
  • No retry on failure
  • Poor user experience (blocked waiting)

Instead: Use Background Job Queue pattern.

Warning Signs:

// BAD - Blocking request until all platforms publish
app.post('/api/publish', async (req, res) => {
  await publishToFacebook(post) // 5 seconds
  await publishToLinkedIn(post) // 3 seconds
  await publishToTwitter(post) // 2 seconds
  res.json({ success: true }) // User waits 10+ seconds!
})

Fix:

// GOOD - Enqueue job, return immediately
app.post('/api/publish', async (req, res) => {
  const jobId = await postQueue.add('publish', { post, tenantId })
  res.json({ jobId, status: 'queued' }) // Returns in <100ms
  // User gets real-time update via WebSocket when done
})

Anti-Pattern 4: No AI Streaming

What goes wrong: Waiting for entire AI response before showing anything to user.

Why bad:

  • Poor UX (5-10 second blank screen while AI generates)
  • Users think app is broken
  • Modern AI chat UX expectation is streaming

Instead: Stream AI responses token-by-token via SSE.

Warning Signs:

// BAD - Wait for full response
const completion = await openai.chat.completions.create({ ... })
res.json({ message: completion.choices[0].message.content })

Fix:

// GOOD - Stream tokens as they arrive
res.setHeader('Content-Type', 'text/event-stream')
res.setHeader('Cache-Control', 'no-cache')

const stream = await openai.chat.completions.create({
  stream: true,
  ...
})

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content
  if (content) {
    res.write(`data: ${JSON.stringify({ content })}\n\n`)
  }
}
res.end()

Anti-Pattern 5: Platform-Specific Logic in Core Services

What goes wrong: Chat Service has if (platform === 'facebook') { ... } logic.

Why bad:

  • Core service becomes bloated with platform quirks
  • Adding new platforms requires modifying core logic
  • Difficult to test

Instead: Use Social Media Adapter pattern with platform-specific implementations.

Warning Signs:

// BAD - Platform logic in core service
async function publishPost(post, platforms) {
  for (const platform of platforms) {
    if (platform === 'facebook') {
      // Facebook-specific logic
    } else if (platform === 'linkedin') {
      // LinkedIn-specific logic
    } // ...grows forever
  }
}

Fix:

// GOOD - Adapters encapsulate platform logic
const adapters = {
  facebook: new MetaAdapter(),
  linkedin: new LinkedInAdapter()
}

async function publishPost(post, platforms) {
  return Promise.all(
    platforms.map(p => adapters[p].publish(post))
  )
}

Scalability Considerations

| Concern | At 100 users | At 10K users | At 100K users |
|---|---|---|---|
| Database | Single Postgres (Supabase free tier) | Postgres with read replicas | Connection pooling (PgBouncer), read replicas, partitioning |
| AI API Costs | $50-200/month | $2K-5K/month (need cost controls) | $20K+/month (cache frequent responses, use cheaper models for simple tasks) |
| WebSocket Connections | Single Node.js server | 2-3 servers with sticky sessions | Redis Pub/Sub for cross-server messaging, dedicated WebSocket servers |
| Job Queue | Single BullMQ worker | 3-5 workers (horizontal scaling) | Worker autoscaling based on queue depth |
| File Storage | Supabase Storage (free tier) | S3/R2 with CDN | CDN + image optimization pipeline |
| Rate Limiting | In-memory (simple) | Redis-backed rate limiting | Distributed rate limiting with sliding window |
| Social API Quotas | Single app credentials | Per-user OAuth (distributes quota) | Enterprise API access + request batching |
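
One of the cost controls the table above mentions, caching frequent AI responses, can be sketched as a hash-keyed cache (in-memory here; in production a Redis store with a TTL, and only for prompts that are safe to reuse across requests):

```typescript
import { createHash } from 'node:crypto'

// Cache AI responses keyed by a hash of the full prompt.
class PromptCache {
  private store = new Map<string, string>()

  private key(prompt: string): string {
    return createHash('sha256').update(prompt).digest('hex')
  }

  get(prompt: string): string | undefined {
    return this.store.get(this.key(prompt))
  }

  set(prompt: string, response: string): void {
    this.store.set(this.key(prompt), response)
  }
}
```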

Key Scaling Milestones

Phase 1 (MVP, 0-100 users):

  • Monolithic Next.js app
  • Supabase Cloud (free tier)
  • Single AI provider (OpenAI)
  • No job queue (simple setTimeout)

Phase 2 (1K users):

  • Separate BullMQ worker process
  • Multi-AI provider support
  • Redis for session/cache
  • Social API integration (1-2 platforms)

Phase 3 (10K users):

  • Horizontal scaling (2-3 servers)
  • Read replicas for database
  • CDN for static assets
  • Advanced AI memory (vector search)

Phase 4 (100K+ users):

  • Microservices architecture
  • Dedicated WebSocket servers
  • Database partitioning by tenant
  • Enterprise social API access

Build Order and Dependencies

Dependency Graph

Phase 1 (Core Chat Experience):
  1. Multi-tenant Auth + API Gateway ───┐
  2. User Context Store (basic)        │
  3. Chat Service                      │
  4. AI Orchestrator (single provider) │
  └──> MVP: Web chat with AI responses

Phase 2 (Social Publishing):
  5. Social API Gateway (1 platform - LinkedIn)
  6. Job Queue Service (BullMQ)
  7. AI Orchestrator (multi-provider)
  └──> Feature: Schedule + publish posts

Phase 3 (Multi-Channel):
  8. Telegram Bot integration
  9. Image Generation Pipeline
  10. Advanced User Context (vector DB)
  └──> Feature: Cross-channel, rich media

Phase 4 (Scale + Polish):
  11. WhatsApp Bot integration
  12. Analytics dashboard
  13. Performance optimizations
  └──> Production-ready SaaS

Critical Path

Must Build First:

  1. Multi-tenant authentication (foundation for all tenant isolation)
  2. API Gateway (single entry point, tenant context middleware)
  3. User Context Store (AI needs brand info to personalize responses)
  4. Chat Service (core product experience)

Can Build Later:

  • Job Queue (use simple setTimeout for MVP scheduling)
  • Image Generation (nice-to-have, not core)
  • Multiple social platforms (start with 1, add more incrementally)
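
The setTimeout-based MVP scheduler mentioned above can be this small, with the caveat that pending jobs vanish on restart (the gap BullMQ later closes):

```typescript
// Milliseconds until the scheduled time, clamped so past times run now.
function computeDelay(runAt: Date, now = Date.now()): number {
  return Math.max(0, runAt.getTime() - now)
}

// MVP-only in-process scheduling: fine for a demo, lost on restart.
function scheduleForMvp(
  runAt: Date,
  task: () => Promise<void>
): ReturnType<typeof setTimeout> {
  return setTimeout(() => {
    task().catch(err => console.error('Scheduled task failed:', err))
  }, computeDelay(runAt))
}
```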

Parallel Tracks

Track A (Chat Experience):

  • Auth → Context Store → Chat Service → AI Orchestrator

Track B (Publishing):

  • Social API Gateway → Job Queue

Can develop independently, integrate when both ready.


Technology Stack Recommendations

| Component | Recommended | Alternative | Why |
|---|---|---|---|
| Frontend | Next.js 14+ (App Router) | Remix, SvelteKit | Best DX for React SSR, API routes, built-in streaming |
| Backend | Next.js API Routes (Phase 1) → Microservices (Phase 3+) | Express, Fastify | Start monolith, extract services later |
| Database | PostgreSQL (Supabase Cloud) | Neon, AWS RDS | Built-in auth, storage, RLS, real-time subscriptions |
| Cache/Queue | Redis (Upstash for serverless) | Valkey, KeyDB | Standard for sessions, cache, BullMQ backend |
| Job Queue | BullMQ | Trigger.dev, Inngest | Most mature Node.js queue, Redis-backed |
| AI Providers | OpenAI (primary), Anthropic (fallback) | Google Gemini, Groq | OpenAI most reliable, Anthropic best reasoning |
| AI Abstraction | LiteLLM or custom gateway | LangChain | LiteLLM simpler for multi-provider, LangChain for complex chains |
| Social APIs | Outstand (unified API) | Ayrshare, Sociality.io | Reduces integration complexity, faster iteration |
| WebSocket | Server-Sent Events (SSE) | Socket.io, Pusher | Simpler for one-way streaming, built into HTTP |
| File Storage | Supabase Storage or Cloudflare R2 | AWS S3, Vercel Blob | R2 zero egress fees, Supabase integrated with DB |
| Vector DB | Supabase pgvector | Pinecone, Qdrant | Same database as core data, simpler architecture |
| Image Processing | Sharp (Node.js) | Jimp, ImageMagick | Fastest, native performance |
| Auth | NextAuth.js v5 or Supabase Auth | Clerk, Auth0 | NextAuth flexible, Supabase integrated |
| Deployment | Vercel (frontend) + VPS (workers) | Railway, Fly.io | Vercel best Next.js DX, VPS for background workers |

Sources

  • AI Social Media Management Architecture
  • Headless Architecture
  • Multi-AI Provider Integration
  • Real-Time Chat Architecture
  • Social Media API Integration
  • Background Job Queues
  • User Context & Memory
  • Multi-Tenant Architecture
  • AI Image Generation
  • Telegram/WhatsApp Integration