ScriptBuddyAI
Voice memos in → platform-shaped scripts out. Streaming, multi-format, opinionated.
PROBLEM
Anyone who talks for a living has hours of voice notes and zero time to reshape them. Off-the-shelf Whisper gives you transcripts. Nobody publishes transcripts.
DECISIONS
- OpenAI Whisper over self-hosted: faster TTM, predictable cost at this volume.
- Vercel Edge for the streaming endpoint — SSE all the way to the browser.
- Prompt-template chain instead of fine-tuning: cheaper to iterate, easier to audit.
- Supabase RLS so user audio is isolated without writing an auth service.
ARCHITECTURE
browser │ upload .m4a (chunked, resumable) ▼ Vercel Edge fn ── signed URL ──> S3 │ ▼ Whisper API ── transcript ──> prompt chain │ ┌───────────┼───────────┐ ▼ ▼ ▼ X thread LinkedIn YouTube (SSE) (SSE) script ▼ Supabase ─ RLS per user ─ history
OUTCOME
- Public product live with auth, pricing, and script generation flows.
- Typical 5-minute memo to multi-format drafts: under ~30 seconds in normal runs.
- Generation cost kept to low cents per script at current usage patterns.