# Pitfalls Research

**Domain:** LLM-powered bulk content generation / CSV output for design tools
**Project:** PostGenerator — Instagram carousel bulk generation for B2B Italian SME marketing
**Researched:** 2026-03-07
**Confidence:** HIGH (most pitfalls verified against official docs or multiple confirmed sources)

---

## Critical Pitfalls

### Pitfall 1: LLM Output That "Looks Valid" But Isn't (Soft Failures)

**What goes wrong:**
Claude returns HTTP 200 with a valid response, JSON parses successfully, but the content is wrong: truncated mid-sentence, repeated slide text, missing slides in a carousel, or Italian that reads as translated-from-English. These "soft failures" never trigger error handling logic because there is no exception — only bad output silently written to files.

**Why it happens:**
Developers test the happy path, see valid JSON, and assume correctness. Validation is wired only for parse errors, not for semantic/structural correctness. In bulk mode, one bad generation gets buried in a batch of 20 and is only caught by the end user.

**How to avoid:**
- Implement a two-level validation layer: (1) JSON schema validation via Pydantic on parse, (2) business rule validation on content — check slide count matches requested count, each slide has non-empty text fields, text lengths are within Canva field limits, no repeated content across slides.
- Log structured validation failures separately from API errors so you can track quality drift.
- For Italian content: detect obvious English leakage by checking for common English stop words ("the", "and", "of") in output fields that should be Italian.
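The second validation level described above can be sketched in plain Python. This is a minimal illustration, not the project's actual validator: the `Slide` shape, the character limits, and the stop-word list are all assumptions, and level 1 (schema parsing, e.g. with Pydantic) is assumed to have already succeeded.

```python
from dataclasses import dataclass

# Hypothetical limits; the real values depend on the Canva template.
MAX_TITLE_CHARS = 60
MAX_BODY_CHARS = 220
ENGLISH_STOP_WORDS = {"the", "and", "of", "with", "your"}


@dataclass
class Slide:
    titolo: str
    testo: str


def validate_carousel(slides: list[Slide], expected_count: int) -> list[str]:
    """Return business-rule violations; an empty list means the carousel passed."""
    problems = []
    if len(slides) != expected_count:
        problems.append(f"expected {expected_count} slides, got {len(slides)}")
    seen = set()
    for i, slide in enumerate(slides, start=1):
        if not slide.titolo.strip() or not slide.testo.strip():
            problems.append(f"slide {i}: empty field")
        if len(slide.titolo) > MAX_TITLE_CHARS or len(slide.testo) > MAX_BODY_CHARS:
            problems.append(f"slide {i}: text exceeds field limit")
        key = slide.testo.strip().lower()
        if key and key in seen:
            problems.append(f"slide {i}: duplicate text")
        seen.add(key)
        # Crude heuristic: two or more English stop words suggest leakage.
        words = set(slide.testo.lower().split())
        if len(words & ENGLISH_STOP_WORDS) >= 2:
            problems.append(f"slide {i}: possible English leakage")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log every violation for a carousel in a single structured record.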

**Warning signs:**
- Output carousels where multiple slides have identical or near-identical text.
- Generated `titolo` fields that are suspiciously short (< 5 chars) or over Canva limits.
- CSV rows where some cells are empty when the schema required content.
- User reports "Canva mappa un campo ma c'è testo mancante" ("Canva maps a field but text is missing").

**Phase to address:** Phase 1 (Core generation pipeline) — build validation alongside the first API call, not as a later addition.

---

### Pitfall 2: Canva Bulk Create CSV — Column Names Must Exactly Match Template Placeholders

**What goes wrong:**
Canva Bulk Create requires CSV column headers to exactly match the placeholder names defined in the Canva template design. A mismatch (case, space, accent, or encoding difference) causes the field to silently fail to map — the placeholder stays empty in the generated design. There is no error message; the design just looks incomplete.

**Why it happens:**
Developers generate CSV column names programmatically from the content schema, but the Canva template was built manually with slightly different placeholder names. The user then has to manually re-map every field in the Canva UI, defeating the automation benefit.

**How to avoid:**
- Define placeholder names as a project-level constant (e.g. `CANVA_FIELDS = ["titolo", "sottotitolo", "testo_1", ...]`) that is shared between the template documentation and the CSV generator code.
- Include a "validate against template" step in the generation pipeline: before writing the CSV, verify every column name matches the expected constant list.
- Document exact Canva placeholder names in the project (e.g., in a `CANVA_TEMPLATE.md` or in the prompt templates themselves).
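A "validate against template" step could look like the following sketch. The field list is a hypothetical example (the real one must be copied from the Canva template), and `check_headers` is an illustrative name.

```python
# Hypothetical field list; copy the real placeholder names from the Canva template.
CANVA_FIELDS = ["titolo", "sottotitolo", "testo_1", "testo_2"]


def check_headers(headers: list[str]) -> None:
    """Raise if the CSV headers differ from the locked Canva field names."""
    # Exact string comparison on purpose: case, spaces, and accents all matter.
    missing = [f for f in CANVA_FIELDS if f not in headers]
    unexpected = [h for h in headers if h not in CANVA_FIELDS]
    if missing or unexpected:
        raise ValueError(
            f"CSV header mismatch: missing={missing}, unexpected={unexpected}"
        )
```

Calling `check_headers` just before writing the file turns a silent mapping failure in Canva into an explicit error in the pipeline.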

**Warning signs:**
- User says "Canva non collega automaticamente i campi, devo farlo a mano ogni volta" ("Canva does not link the fields automatically, I have to do it by hand every time").
- CSV preview in the application shows correct data but Canva designs come out blank.
- After a prompt template change, the column name changes and existing Canva templates break.

**Phase to address:** Phase 1 — define and lock the Canva field schema before writing any generation code.

---

### Pitfall 3: CSV UTF-8 BOM Encoding Breaks Canva / Excel Import

**What goes wrong:**
The CSV is correctly UTF-8 encoded, but Canva or the user's spreadsheet tool misinterprets it because a BOM (Byte Order Mark) is present or absent. Italian accented characters (à, è, é, ì, ò, ù) then appear as mojibake such as `Ã `, `Ã¨` in the design. Alternatively, the file is generated without a BOM, and when the user opens it in Excel on Windows before uploading to Canva, Excel decodes the bytes as Windows-1252 and re-saves the file, corrupting the accented characters before they reach Canva.

**Why it happens:**
Writing a CSV in Python through a file opened with `encoding='utf-8'` produces UTF-8 without a BOM (the encoding is an argument of `open()`, not of the `csv` module itself). Excel on Windows expects UTF-8 with BOM (`utf-8-sig`) to auto-detect encoding. Canva itself expects plain UTF-8. This creates a two-failure-mode trap: BOM for Excel users, no-BOM for Canva direct upload.

**How to avoid:**
- Generate CSV with `encoding='utf-8-sig'` (UTF-8 with BOM). Canva ignores the BOM; Excel on Windows correctly reads it. This is the "safe default" for this use case.
- Always include a charset declaration in the HTTP response header when serving the download: `Content-Type: text/csv; charset=utf-8`.
- Test the download with actual Italian content containing `àèéìòù` before declaring the feature complete.
- Add a note in the UI: "Apri il CSV in Google Sheets, non in Excel, per evitare problemi di encoding" ("Open the CSV in Google Sheets, not in Excel, to avoid encoding problems").
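As a rough sketch of the `utf-8-sig` recommendation, the CSV can be built in memory and encoded once at the end. The function name and row shape are assumptions for illustration.

```python
import csv
import io


def build_csv_bytes(rows: list[dict[str, str]], fieldnames: list[str]) -> bytes:
    """Serialize rows to CSV bytes encoded as UTF-8 with a leading BOM."""
    # newline="" lets the csv module control line endings (\r\n by default).
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    # "utf-8-sig" prepends the BOM bytes EF BB BF to the encoded output.
    return buffer.getvalue().encode("utf-8-sig")
```

The returned bytes can be served directly as the download body. Checking that they start with `b"\xef\xbb\xbf"` is a cheap regression test for this pitfall.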

**Warning signs:**
- Any accented character appearing as multiple garbled characters in Canva designs.
- User reports "i caratteri speciali sono sbagliati" ("the special characters are wrong").
- CSV looks fine in a code editor but breaks when opened in Excel.

**Phase to address:** Phase 1 — encoding must be correct from the first CSV output, not retrofitted later.

---

### Pitfall 4: FastAPI `root_path` Double-Path Bug Behind Nginx Subpath Proxy

**What goes wrong:**
FastAPI deployed at `lab.mlhub.it/postgenerator/` requires `root_path` configuration to generate correct OpenAPI/Swagger URLs. However, if `root_path` is set both in `FastAPI()` constructor AND in Uvicorn, the prefix is applied twice, producing paths like `/postgenerator/postgenerator/openapi.json` which returns 404. The API may work correctly while the docs are broken, masking the configuration error.

**Why it happens:**
The FastAPI `root_path` mechanism is designed for "stripping path proxies" that remove the prefix before forwarding. If the nginx config forwards the full path (including `/postgenerator/`), FastAPI needs root_path to know its prefix but must NOT double-apply it. The interaction between FastAPI app-level root_path and Uvicorn-level root_path is a confirmed multi-year bug with many GitHub issues.

**How to avoid:**
- Set `root_path` only via Uvicorn (`--root-path /postgenerator`) or as an environment variable, NOT in the `FastAPI()` constructor.
- Configure nginx to strip the prefix before forwarding to the container: `proxy_pass http://container:8000/;` (note trailing slash — nginx strips the location prefix).
- Test OpenAPI docs at `lab.mlhub.it/postgenerator/docs` explicitly during initial deploy, not just API calls.
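- If nginx stripping is not feasible, pass the prefix in an `X-Forwarded-Prefix` header instead of a static `root_path`. Neither FastAPI nor Uvicorn reads this header by default, so this needs a small middleware that copies it into the ASGI `root_path`.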
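For the `X-Forwarded-Prefix` option, one possible approach is a small pure-ASGI middleware that copies the header into the request's `root_path`. This is a sketch under assumptions: nginx is assumed to set the header (for example with `proxy_set_header X-Forwarded-Prefix /postgenerator;`), nothing else in the stack is assumed to consume it, and `ForwardedPrefixMiddleware` is an illustrative name rather than a FastAPI or Uvicorn class.

```python
class ForwardedPrefixMiddleware:
    """Copy the X-Forwarded-Prefix header into the ASGI scope's root_path."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] in ("http", "websocket"):
            # ASGI headers are (name, value) byte pairs with lowercased names.
            headers = dict(scope.get("headers") or [])
            raw = headers.get(b"x-forwarded-prefix", b"")
            prefix = raw.decode("latin-1").rstrip("/")
            if prefix:
                # Work on a copy so the caller's scope is left untouched.
                scope = dict(scope, root_path=prefix)
        await self.app(scope, receive, send)
```

Only trust this header when the app is reachable exclusively through the proxy; otherwise a client could set an arbitrary prefix. With FastAPI, a class of this shape would typically be registered with `app.add_middleware(ForwardedPrefixMiddleware)`.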

**Warning signs:**
- Swagger UI loads but API calls from the docs return 404.
- `/openapi.json` returns 404 while direct API endpoint calls work.
- Browser network tab shows requests to `../postgenerator/postgenerator/...`.

**Phase to address:** Phase 1 (deployment scaffolding) — verify proxy configuration before any feature work.

---

### Pitfall 5: Bulk Generation Without Per-Item State — All-or-Nothing Failure

**What goes wrong:**
A user requests 20 carousels. The backend loops through 20 API calls sequentially. The 15th call hits a rate limit or network timeout. The entire batch fails, the partial results are lost, and the user must start over from scratch.

**Why it happens:**
Simple sequential loop with a try/except at the top level. No intermediate state is persisted. This is acceptable for a single-item generation but catastrophic for bulk operations where each item has real API cost.

**How to avoid:**
- Implement per-item status tracking: each carousel gets a status record (pending / processing / success / failed) stored in the file system or an in-memory dict with a job ID.
- On failure, mark the item as `failed` and continue processing the rest. Allow retry of only failed items.
- Return partial results: if 15/20 succeed, deliver those 15 in the CSV.
- Persist the job state to disk (JSON file per job) so that server restart does not lose progress.
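Per-item isolation with on-disk state might be sketched as follows. The record layout, the `generate` callable, and the single JSON file per job are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Callable


def run_batch(
    topics: list[str], generate: Callable[[str], str], state_path: Path
) -> list[dict]:
    """Generate one item per topic, isolating failures and persisting state."""
    items = [
        {"topic": t, "status": "pending", "result": None, "error": None}
        for t in topics
    ]
    for item in items:
        item["status"] = "processing"
        try:
            item["result"] = generate(item["topic"])
            item["status"] = "success"
        except Exception as exc:  # broad on purpose: one item must not kill the batch
            item["status"] = "failed"
            item["error"] = str(exc)
        # Persist after every item so a crash or restart keeps finished work.
        state_path.write_text(
            json.dumps(items, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return items
```

A retry pass can then reload the JSON file and re-run only the records whose status is `failed`, and the CSV can be built from the `success` records alone.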

**Warning signs:**
- Bulk requests timeout and the user sees no output.
- Backend logs show one exception that kills the loop.
- User has to re-enter all 20 topics to retry because no intermediate state exists.

**Phase to address:** Phase 1 for basic per-item error isolation; Phase 2+ for full job persistence and retry UI.

---

### Pitfall 6: Rate Limit Errors Treated as Generic Errors — No Backoff

**What goes wrong:**
Claude API returns HTTP 429 (rate limit exceeded). The backend catches `Exception`, logs "generation failed", and either retries immediately (hammering the API) or returns an error to the user. In Tier 1 (50 RPM, 8,000 OTPM for Sonnet), a bulk batch of 10-20 carousels can easily hit the OTPM ceiling.

**Why it happens:**
Developers test with 1-2 items and never hit rate limits. Error handling is added generically. The `retry-after` header in the 429 response — which tells you exactly how long to wait — is ignored.

**How to avoid:**
- Implement specific 429 handling that reads the `retry-after` response header and waits that exact duration before retrying.
- Use exponential backoff with jitter for other transient errors (5xx), but honor `retry-after` precisely for 429.
- For bulk jobs: add a configurable delay between consecutive API calls (e.g. 2-3 seconds) to stay within OTPM limits even in Tier 1.
- Monitor `anthropic-ratelimit-output-tokens-remaining` response header to throttle proactively.
- Consider the Batches API (50% cheaper, separate rate limits) for non-interactive bulk generation.
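The retry policy in the first two bullets can be sketched independently of any SDK. `ApiError` below is a hypothetical stand-in for whatever exception the HTTP client raises; real client libraries expose status codes and headers differently, and some already retry on their own.

```python
import random
import time


class ApiError(Exception):
    """Hypothetical stand-in for the HTTP client's error type."""

    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after  # seconds, parsed from `retry-after`


def call_with_retry(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run `call`, retrying on 429 and 5xx and re-raising everything else."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            retryable = err.status == 429 or 500 <= err.status < 600
            if last_attempt or not retryable:
                raise
            if err.status == 429 and err.retry_after is not None:
                delay = err.retry_after  # honor the server's hint exactly
            else:
                # Exponential backoff with full jitter for transient errors.
                delay = random.uniform(0, base_delay * 2 ** attempt)
            sleep(delay)
```

Passing `sleep` as a parameter keeps the function testable without real waiting. A fixed inter-request delay for bulk jobs would sit outside this function, in the batch loop.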

**Warning signs:**
- Backend logs full of 429 errors during any batch larger than 5 items.
- Generation speed suddenly drops to zero mid-batch.
- Costs appear lower than expected (actually requests are failing, not succeeding).

**Phase to address:** Phase 1 — implement from the first API integration, not as a later optimization.

---

### Pitfall 7: Prompt Templates With Hard-Coded Assumptions That Break Silently

**What goes wrong:**
The prompt template instructs Claude to "generate 5 slides". Later, the user configures 7 slides per carousel. The template still says 5, but the schema expects 7. Claude generates 5 slides. The schema validator accepts it (minimum not enforced). CSV has 5 slide columns with data and 2 empty. Canva design has 2 blank slides.

**Why it happens:**
Prompt template is written with specific numbers embedded as literals rather than as injected variables. When configuration changes, only the schema/code is updated, not the template. File-based prompt management makes this disconnect invisible — there is no compile-time check.

**How to avoid:**
- Make all variable parameters injectable: `Genera {{num_slides}} slide`, not "Genera 5 slide".
- Implement a template validation step at startup: parse all templates, identify all `{{variable}}` placeholders, and verify every placeholder has a corresponding runtime value.
- Use a template rendering test in CI: render each template with test values and verify output matches expected format.
- Keep a `TEMPLATE_VARIABLES.md` that documents every variable each template expects.
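A startup or CI check for placeholder coverage could be as small as the sketch below. It assumes the `{{variable}}` syntax shown above and a flat dictionary of runtime values; `render_template` is an illustrative helper, not an existing library function.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, values: dict[str, object]) -> str:
    """Fill every {{name}} placeholder, failing loudly if a value is missing."""
    needed = set(PLACEHOLDER.findall(template))
    missing = needed - set(values)
    if missing:
        raise KeyError(f"template variables without a value: {sorted(missing)}")
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)
```

For example, `render_template("Genera {{num_slides}} slide", {"num_slides": 7})` yields `"Genera 7 slide"`, while an empty values dictionary raises `KeyError` instead of sending an unrendered placeholder to the model.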

**Warning signs:**
- Generated carousels have inconsistent slide counts.
- Template file was last modified weeks ago but configuration was changed last week.
- New team member added a config option but forgot to update the prompt template.

**Phase to address:** Phase 1 (prompt system foundation) — variable injection must be architectural, not an afterthought.

---

### Pitfall 8: Italian Language — Prompts Written in English Produce "Translated" Italian

**What goes wrong:**
The system prompt is written in English and asks Claude to "generate content in Italian". Claude generates Italian that is grammatically correct but sounds translated: unnatural phrasing, English idioms translated literally, a formal register too stiff for SME B2B social media, or English business loanwords where native Italian business vocabulary exists.

**Why it happens:**
Claude's training data is predominantly English. When given English instructions to produce Italian output, it tends to write text that reads as if it were drafted in English and then translated. The result passes spell-check but fails the native speaker smell test. B2B content for Italian entrepreneurs has specific vocabulary and register expectations (professional but not academic, action-oriented, concrete benefits).

**How to avoid:**
- Write the system prompt IN Italian (not "write in Italian" from an English prompt). Italian-language instructions produce more natural Italian output.
- Provide Italian examples in the few-shot section: show Claude an example carousel with the exact tone, vocabulary, and structure you want.
- Define explicit tone guidelines in Italian: e.g. "Usa il tu, sii diretto, parla di benefici concreti, evita il gergo tecnico" ("Use the informal 'tu', be direct, talk about concrete benefits, avoid technical jargon").
- Include a list of Italian B2B vocabulary to use/avoid.
- Note: per the cited source, Italian prompts can use roughly twice as many tokens as their English equivalents, so factor this into cost estimates and measure it on real prompts.

**Warning signs:**
- Generated Italian text uses "fare business" where a native speaker would say "fare affari".
- Carousel titles that sound like SEO headlines rather than Instagram hooks.
- Formality that oscillates randomly (tu/lei mixed, formal/informal register).
- User feedback: "sembra scritto con Google Translate" ("it sounds like it was written with Google Translate").

**Phase to address:** Phase 1 — prompt language and tone are foundational decisions, hard to fix post-deployment without regenerating all content.

---

### Pitfall 9: React Frontend API URL Hardcoded to Absolute Path — Breaks Behind Subpath Proxy

**What goes wrong:**
React frontend makes API calls to `/api/generate`. Behind nginx at `lab.mlhub.it/postgenerator/`, the actual path is `lab.mlhub.it/postgenerator/api/generate`. The frontend sends requests to `lab.mlhub.it/api/generate` which is a different nginx location (or 404). Works in local development, breaks immediately in production.

**Why it happens:**
Developers use Create React App / Vite proxy in development where `/api` works fine. In production, the subpath prefix is not accounted for. `REACT_APP_API_URL` environment variable is set to `/api` (absolute) instead of a relative or base-path-aware URL.

**How to avoid:**
- Read the API base URL from a build-time env var (`VITE_API_BASE_URL` in Vite, `REACT_APP_API_URL` in Create React App): set it to `/api` for development (the dev-server proxy handles it) and to `/postgenerator/api` for production builds.
- Alternatively: serve the React app and FastAPI from the same container with nginx internal routing — browser sees single origin, no CORS, no subpath math.
- Test the production build (not dev server) against the nginx proxy during Phase 1 deployment, before any feature work.
- In nginx config, ensure `/postgenerator/api/` proxies to FastAPI and `/postgenerator/` serves the React static files.

**Warning signs:**
- API calls return 404 or hit the wrong service in production but work in `npm run dev`.
- Browser network tab shows requests to `/api/generate` without the `/postgenerator` prefix.
- CORS errors in production (requests going to wrong origin because URL resolution failed).

**Phase to address:** Phase 1 deployment scaffolding — must be tested before any other development.

---

## Technical Debt Patterns

| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|----------|-------------------|----------------|-----------------|
| Store all files in flat directory without job subdirectories | Simple to implement | Hundreds of files mixed together, no cleanup possible | Never — use job-scoped directories from day 1 |
| Hardcode slide count (5) in prompt template | Fast to write | Config changes break output silently | Never — always inject as variable |
| Single try/except around entire generation loop | Simple error handling | One failure kills entire batch | Never for bulk — per-item isolation is required |
| Write CSV as UTF-8 without BOM | Python default | Italian accents corrupt in Excel | Never — use `utf-8-sig` always |
| Set `root_path` in both FastAPI and Uvicorn | Seems comprehensive | Double-path 404 bug in docs and some calls | Never — set in one place only |
| Generate all carousels before validating any | Defers complexity | User waits for all, then gets bulk failure | Acceptable in MVP if per-item failure isolation exists |
| English-language system prompt with "write in Italian" | Easier to write | Translated-sounding Italian output | Never for public-facing product |

---

## Integration Gotchas

| Integration | Common Mistake | Correct Approach |
|-------------|----------------|------------------|
| Claude API structured outputs | Using schema with `minimum`/`maximum` constraints and expecting them to be enforced | Constraints are stripped from schema sent to Claude; use Pydantic validation post-response to enforce ranges |
| Claude API rate limits | Treating 429 as generic error, retrying immediately | Read `retry-after` header; honor exactly; add inter-request delay for bulk jobs |
| Canva Bulk Create | Column names generated from code schema diverge from template placeholders | Lock column names as constants shared between code and Canva template documentation |
| Canva Bulk Create | Uploading CSV with row count > 300 | Canva Bulk Create supports max 300 rows per upload batch; split large batches |
| Claude API OTPM | Generating 20 carousels at full speed in Tier 1 (8,000 OTPM) | Add configurable delay between calls; consider Batches API for non-interactive generation |
| FastAPI + nginx subpath | Setting `root_path` in FastAPI constructor when nginx does NOT strip prefix | Set root_path only in Uvicorn; configure nginx to strip prefix OR forward full path but not both |
| React + FastAPI in same Docker container | Adding CORS configuration to FastAPI on the assumption that it is needed even for same-origin requests | Use nginx internal routing so browser sees single origin; CORS becomes irrelevant |
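Splitting a large result set into upload-sized files is a small helper. The sketch below takes the 300-row limit from the table above as given; `split_rows` is an illustrative name.

```python
MAX_ROWS_PER_UPLOAD = 300  # limit as stated in the table above


def split_rows(rows: list, size: int = MAX_ROWS_PER_UPLOAD) -> list[list]:
    """Split rows into consecutive chunks of at most `size` items."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]
```

Each chunk would then be written to its own CSV file, with the header row repeated in every file.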

---

## Performance Traps

| Trap | Symptoms | Prevention | When It Breaks |
|------|----------|------------|----------------|
| Sequential API calls for bulk | 20 carousels take 20x single time; UI shows "loading" with no feedback | Add progress tracking per item; consider async generation with status polling | Any bulk request > 3 items |
| No prompt caching for repeated system prompt | Each call sends full system prompt, consuming ITPM quota | Mark the system prompt block with `cache_control: {"type": "ephemeral"}`; the same prompt stays cached for ~5 min | At Tier 1 ITPM limits with 5+ items in batch |
| CSV generated in memory for large batches | Memory spike, potential OOM in 256MB container | Stream CSV rows to disk as each carousel is generated | Batches > 50 carousels |
| File storage without cleanup | Disk fills up on VPS over time | Implement TTL-based cleanup for generated files (e.g., delete after 24h) | After ~1000 generation jobs depending on file size |

---

## Security Mistakes

| Mistake | Risk | Prevention |
|---------|------|------------|
| Exposing Claude API key in frontend environment variables | Key leaked via browser DevTools, used for unauthorized API calls | Keep API key server-side only; frontend never sees it |
| No validation of user-provided topic/industry input before injecting into prompt | Prompt injection: user crafts input that overrides system prompt instructions | Sanitize and length-limit user inputs; wrap user content in explicit delimiters in prompt |
| Storing generated CSV files accessible at predictable URLs | Users can download others' generated content | Use UUID-based job IDs for file paths; validate job ownership before serving |
| No API rate limiting on the FastAPI endpoint | Unlimited calls = unlimited Claude API cost | Implement per-IP or per-session rate limiting on the generation endpoint |
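For the prompt-injection row, a minimal sketch of "length-limit and wrap in delimiters" is shown below. The tag name and the 200-character limit are hypothetical, and delimiters reduce the risk rather than eliminate it.

```python
import html

MAX_TOPIC_CHARS = 200  # hypothetical limit; tune to the real input field


def wrap_user_input(text: str, tag: str = "user_topic") -> str:
    """Length-limit user text and wrap it in explicit delimiters for the prompt."""
    # Drop control characters but keep newlines and tabs.
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    cleaned = cleaned.strip()[:MAX_TOPIC_CHARS]
    # Escape angle brackets so the input cannot close the delimiter itself.
    cleaned = html.escape(cleaned, quote=False)
    return f"<{tag}>\n{cleaned}\n</{tag}>"
```

The system prompt would then instruct the model to treat everything inside the delimiters as data, not as instructions.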

---

## UX Pitfalls

| Pitfall | User Impact | Better Approach |
|---------|-------------|-----------------|
| No progress feedback during bulk generation | User sees spinner for 60+ seconds, assumes crash, refreshes, loses job | Show per-carousel progress (item 3/10 completed) with estimated time remaining |
| "Generation failed" error with no actionable info | User does not know if it was their input, the API, or a bug | Distinguish: "API limit reached, retry in X seconds" vs "Input too long" vs "Unexpected error (ID: xxx)" |
| CSV download triggers browser "open with" dialog | User opens CSV in Excel, encoding corrupts Italian text, blames the tool | Set `Content-Disposition: attachment; filename="carousels.csv"` and add note about Google Sheets |
| Generated content not previewable before CSV download | User discovers quality issues only after importing to Canva | Show a text preview of at least the first carousel before offering CSV download |
| No way to regenerate a single carousel | User must redo entire batch to fix one bad result | Allow per-item regeneration from the results view |

---

## "Looks Done But Isn't" Checklist

- [ ] **CSV encoding:** Contains actual Italian accented characters (àèéìòù) — verify download in both browser and Excel, not just code editor
- [ ] **Canva field mapping:** Test actual Canva import with the generated CSV, not just "the columns look right"
- [ ] **Rate limit handling:** Test with a batch of 10+ items to trigger actual rate limits in Tier 1
- [ ] **Subpath routing:** Test production build (Docker container) at nginx subpath, not local `npm run dev`
- [ ] **FastAPI docs:** Verify Swagger UI at `/postgenerator/docs` works AND API calls from within Swagger work
- [ ] **Italian quality:** Have a native Italian speaker review generated content, not just verify it is grammatically Italian
- [ ] **Partial failure:** Kill the process mid-batch and verify partial results are not lost
- [ ] **Large batch:** Test with 20 carousels (near Tier 1 limits) for both correctness and timing
- [ ] **Prompt variable injection:** Change slide count in config, verify prompt template reflects the change, verify output slide count matches

---

## Recovery Strategies

| Pitfall | Recovery Cost | Recovery Steps |
|---------|---------------|----------------|
| CSV encoding corruption | LOW | Change Python writer to `utf-8-sig`, regenerate affected CSVs |
| Canva field name mismatch | MEDIUM | Redefine constant, update all templates referencing old names, regenerate |
| FastAPI double root_path | LOW | Remove `root_path` from FastAPI constructor, set only in Uvicorn, redeploy |
| Italian quality issues in prompt | HIGH | Rewrite system prompt in Italian, add few-shot examples, re-evaluate all previously generated content |
| Bulk pipeline data loss (no per-item state) | HIGH | Requires architectural change to generation loop; cannot fix without refactor |
| Prompt template variable mismatch | MEDIUM | Add validation step, fix template, test all templates in sequence |

---

## Pitfall-to-Phase Mapping

| Pitfall | Prevention Phase | Verification |
|---------|------------------|--------------|
| Soft failures in LLM output (Pitfall 1) | Phase 1: Generation pipeline | Validation rejects malformed output in unit tests |
| Canva column name mismatch (Pitfall 2) | Phase 1: Define Canva field constants | Manual Canva import test with generated CSV |
| CSV UTF-8 BOM encoding (Pitfall 3) | Phase 1: First CSV output | Open downloaded CSV in Excel on Windows — no garbled chars |
| FastAPI root_path double bug (Pitfall 4) | Phase 1: Deployment scaffolding | Swagger UI works at `/postgenerator/docs` in Docker |
| Bulk batch all-or-nothing failure (Pitfall 5) | Phase 1: Per-item error isolation | Force mid-batch failure, verify partial results saved |
| Rate limit no-backoff (Pitfall 6) | Phase 1: API client layer | Batch of 10 items completes without 429 crashing the job |
| Prompt template hardcoded values (Pitfall 7) | Phase 1: Prompt template system | Change slide count config, verify template output changes |
| Italian quality / translated-sounding (Pitfall 8) | Phase 1: Prompt engineering | Native Italian speaker review before any user testing |
| React API URL subpath bug (Pitfall 9) | Phase 1: Deployment scaffolding | Production build test at nginx subpath URL |

---

## Sources

- [Claude API Rate Limits — official documentation](https://platform.claude.com/docs/en/api/rate-limits) (HIGH confidence — official, accessed 2026-03-07)
- [Claude Structured Outputs — official documentation](https://platform.claude.com/docs/en/build-with-claude/structured-outputs) (HIGH confidence — official, accessed 2026-03-07)
- [Canva Bulk Create — Help Center](https://www.canva.com/help/bulk-create/) (MEDIUM confidence — official product page, row/column limits confirmed)
- [Canva Bulk Create Data Autofill — Help Center](https://www.canva.com/help/bulk-create-data-autofill/) (MEDIUM confidence — 150 field limit confirmed)
- [FastAPI Behind a Proxy — official docs](https://fastapi.tiangolo.com/advanced/behind-a-proxy/) (HIGH confidence — official)
- [FastAPI root_path double-path issue — GitHub Discussion #9018](https://github.com/fastapi/fastapi/discussions/9018) (HIGH confidence — confirmed bug with multiple occurrences)
- [FastAPI Incorrect root_path duplicated prefixes — GitHub Discussion #11977](https://github.com/fastapi/fastapi/discussions/11977) (HIGH confidence)
- [Building Reliable LLM Pipelines: Error Handling Patterns — ilovedevops](https://ilovedevops.substack.com/p/building-reliable-llm-pipelines-error) (MEDIUM confidence — community, corroborates official rate limit docs)
- [Retries, fallbacks, circuit breakers in LLM apps — Portkey](https://portkey.ai/blog/retries-fallbacks-and-circuit-breakers-in-llm-apps/) (MEDIUM confidence — practitioner guide)
- [Non-English Languages Prompt Engineering Trade-offs — LinkedIn](https://www.linkedin.com/pulse/non-english-languages-prompt-engineering-trade-offs-giorgio-robino) (MEDIUM confidence — Italian token cost 2x confirmed)
- [Evalita-LLM: Benchmarking LLMs on Italian — arXiv](https://arxiv.org/html/2502.02289v1) (MEDIUM confidence — research confirming Italian LLM quality issues)
- [Canva CSV upload tips — Create Stimulate](https://createstimulate.com/blogs/news/canva-tips-for-uploading-csv-files-using-bulk-create) (LOW confidence — community blog, limited technical detail)
- [Opening CSV UTF-8 files in Excel — Microsoft Support](https://support.microsoft.com/en-us/office/opening-csv-utf-8-files-correctly-in-excel-8a935af5-3416-4edd-ba7e-3dfd2bc4a032) (HIGH confidence — official Microsoft, confirms BOM requirement)
