AI Workflow · 5 of 6

AI Product Features

Embedding AI in the product, not just the dev process. Chatbots, semantic search, summarization, recommendations, generation — the layer where AI actually meets the end user.

Quick Facts

At a Glance

Basic Concepts

  • "AI feature" rarely means a raw chatbot — usually it's a focused capability inside an existing flow.
  • Latency & cost are first-class design concerns, not afterthoughts.
  • Evals before launch. If you can't measure it, don't ship it.
  • Streaming UX hides latency. Most user-facing AI features stream tokens.
  • Have a non-AI fallback — model APIs go down or get rate-limited.
Patterns

The Common AI Features

Chatbots & Assistants

The default first AI feature most products ship. Variants:

  • Support bot — RAG over docs + ticket history; deflects tier-1 tickets.
  • In-product assistant — knows the user's account context, can take actions.
  • Persona / character chatbots — entertainment, companionship.

Hard parts: scoping ("don't answer off-topic"), safety, escalation to humans, conversation memory.
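The scoping and escalation concerns above can be sketched as a pre-generation router. This is a minimal illustration, not a production design: the topic and trigger lists are made up, and in practice the in-scope check would be a cheap classifier or LLM call rather than a keyword heuristic.

```python
# Sketch of support-bot scoping and escalation. ALLOWED_TOPICS and
# ESCALATION_TRIGGERS are illustrative placeholders; a real deployment
# would use a classifier instead of keyword matching.

ALLOWED_TOPICS = {"billing", "login", "api", "export"}        # illustrative
ESCALATION_TRIGGERS = {"refund", "legal", "angry", "cancel"}  # illustrative

def route_message(message: str) -> str:
    """Return 'answer', 'escalate', or 'decline' for a support message."""
    words = set(message.lower().split())
    if words & ESCALATION_TRIGGERS:
        return "escalate"   # hand off to a human agent
    if words & ALLOWED_TOPICS:
        return "answer"     # proceed to RAG + generation
    return "decline"        # politely refuse off-topic requests
```

The key design point is that routing happens before any expensive model call, so off-topic and escalation cases stay cheap and predictable.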

Semantic / "Natural Language" Search

Replace keyword search with embeddings — users ask questions, you return relevant docs/products/items. Often combined with re-ranking and metadata filters. Backed by a vector DB.

Best as hybrid (keyword + semantic) — pure semantic loses on rare exact terms (SKU, error code, names).
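One common way to build that hybrid is Reciprocal Rank Fusion (RRF), which merges the keyword and semantic result lists by rank rather than by incomparable scores. A minimal sketch, assuming the two ranked doc-id lists come from your keyword engine and vector DB respectively:

```python
# Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document,
# so items ranked highly in either list surface near the top of the merge.

def rrf_merge(keyword_hits: list[str], semantic_hits: list[str], k: int = 60) -> list[str]:
    """Merge two ranked doc-id lists; smaller k weights top ranks more heavily."""
    scores: dict[str, float] = {}
    for hits in (keyword_hits, semantic_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears in both lists ("b" below) beats one ranked first in only one list, which is exactly the behavior you want for rare exact terms plus fuzzy matches.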

Summarization
  • Long documents → executive summary.
  • Meeting transcripts → action items.
  • Email threads → "what changed?"
  • Comments / reviews → sentiment + themes.

Often the easiest AI win — high user value, simple prompt, predictable cost.
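For inputs that exceed the context window, the usual pattern is map-reduce: summarize chunks, then summarize the summaries. A sketch, where `call_llm` is a hypothetical stand-in for your provider call:

```python
# Map-reduce summarization sketch. Only the chunking is concrete;
# `call_llm` is a placeholder for a real provider SDK call.

def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split on paragraph boundaries, packing paragraphs up to max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def summarize(text: str, call_llm) -> str:
    partials = [call_llm(f"Summarize:\n{c}") for c in chunk_text(text)]
    return call_llm("Combine into one summary:\n" + "\n".join(partials))
```

Chunking on paragraph (or section) boundaries rather than fixed character offsets keeps each chunk coherent, which noticeably improves partial summaries.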

Recommendations & Personalization

Two flavors today:

  • Classical: matrix factorization / two-tower / gradient-boosted trees — still wins on most marketplaces.
  • LLM-augmented: generate explanations ("Because you liked X…"), re-rank with reasoning, conversational discovery.

Generation (Text, Image, Audio, Code)

  • Marketing copy & product descriptions.
  • Image generation — banners, thumbnails, style transfer (Stable Diffusion, Flux, DALL-E, Imagen).
  • Voice / TTS — ElevenLabs, OpenAI TTS, Cartesia.
  • Code-as-feature — Notion AI in formulas, Excel Copilot.

Classification & Extraction

Often the most cost-effective LLM use:

  • Tag tickets, route emails, label content.
  • Extract structured data from PDFs / contracts / invoices.
  • Score sentiment, urgency, intent.

Where a classical model would need labeled data, a small, cheap LLM with a good prompt often gets you to "good enough" on day one.

In-Product Agents

"Do this for me." The agent navigates the app, fills forms, calls APIs, shows the user the result. New, risky, and high-leverage. Examples: Linear's draft-issue agent, GitHub Copilot Workspace, Notion AI.
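The risk is mostly on the write path, so a common pattern is to auto-run read-only tools and gate write tools behind user confirmation. A minimal sketch with made-up tool names:

```python
# Scoped tool execution for an in-product agent. The allowlists and tool
# names are illustrative; the point is the read/write asymmetry.

READ_ONLY_TOOLS = {"search_issues", "get_account"}  # safe to auto-run
WRITE_TOOLS = {"create_issue", "send_email"}        # require confirmation

def execute_tool(name: str, args: dict, tools: dict, user_confirmed: bool = False):
    """Run a tool the model requested, enforcing the write-path gate."""
    if name in READ_ONLY_TOOLS:
        return tools[name](**args)
    if name in WRITE_TOOLS and user_confirmed:
        return tools[name](**args)
    raise PermissionError(f"tool {name!r} requires user confirmation")
```

The model can request anything; authorization lives in your code, not in the prompt.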

Architecture

Building Blocks & Concerns

The Typical Stack
User → UI (streaming) → Your API
              → Prompt template + context
              → Framework orchestration
              → Retrieval from Vector DB
              → Tool calls (your APIs, search, DB)
              → Provider (Claude / GPT / Gemini)
              → Stream tokens back
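The final "stream tokens back" step can be sketched as a generator your API layer forwards as server-sent events; `provider_stream` is a hypothetical stand-in for a streaming SDK iterator of token strings:

```python
# Wrap a token iterator as SSE "data:" frames, with a [DONE] sentinel the
# client uses to close the stream. The sentinel convention is illustrative.

def sse_events(provider_stream):
    for token in provider_stream:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"
```

Any SSE-capable HTTP framework can iterate this generator directly into the response body.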

Latency & UX
  • Stream tokens — the user sees output start in <500ms, hiding the rest.
  • Optimistic UI — show "thinking…" with shape indicators.
  • Progressive disclosure — outline first, details on demand.
  • Cache aggressively — repeated questions / system prompts benefit from prompt caching.
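Provider-side prompt caching handles the repeated system prompt; an application-level cache for repeated identical questions is a common complement. A minimal sketch, with `call_llm` as a hypothetical provider call:

```python
# Application-level response cache keyed on a hash of (model, prompt).
# Only appropriate for deterministic, non-personalized queries.
import hashlib

class ResponseCache:
    def __init__(self):
        self._store: dict[str, str] = {}

    def _key(self, model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_call(self, model: str, prompt: str, call_llm) -> str:
        key = self._key(model, prompt)
        if key not in self._store:
            self._store[key] = call_llm(prompt)  # miss: pay for one model call
        return self._store[key]                  # hit: free and instant
```

In production you would add a TTL and skip caching anything that depends on per-user context.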

Safety & Guardrails
  • Input filters — strip PII, block prompt injection patterns.
  • Output validators — JSON schema, allowed-topic checks, profanity filters.
  • Tool authorization — never let the model call write APIs unscoped.
  • Rate limits per user — both abuse prevention and cost control.
  • Tools: NeMo Guardrails, Guardrails AI, Lakera, Protect AI.
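The input-filter layer can be as simple as a pattern screen in front of a dedicated classifier. A sketch with an illustrative, deliberately incomplete pattern list — pattern matching alone is easy to evade, which is why products layer it with tools like Lakera:

```python
# First-pass prompt-injection screen. The patterns are illustrative;
# a real pipeline adds a trained classifier behind this check.
import re

INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
]

def is_suspicious(user_input: str) -> bool:
    """Flag inputs matching known injection phrasings for extra scrutiny."""
    text = user_input.lower()
    return any(re.search(p, text) for p in INJECTION_PATTERNS)
```

Flagged inputs need not be hard-blocked; routing them to a stricter prompt or human review avoids false-positive frustration.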

Evals & Quality

Before launch, build a small (50–500 examples) golden dataset. Score offline:

  • Exact match for structured tasks.
  • LLM-as-judge for free-form (with caveats).
  • Human review for nuanced subjective quality.
  • Tools: Promptfoo, Braintrust, Langfuse, OpenAI Evals, DeepEval.
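The exact-match case reduces to a tiny loop the listed tools elaborate on. A sketch, where `predict` is a hypothetical stand-in for your prompted model:

```python
# Offline exact-match eval over a golden dataset of (input, expected) pairs.

def run_eval(golden: list[tuple[str, str]], predict) -> float:
    """Return accuracy of predict() against the golden pairs."""
    correct = sum(1 for x, expected in golden if predict(x) == expected)
    return correct / len(golden)
```

Run this in CI on every prompt change; a score drop is a regression exactly like a failing unit test.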

Cost & Pricing
  • Track tokens per user / per feature — surprises are common.
  • Tier model use — Haiku/Flash/Mini for simple, Sonnet/Pro for hard.
  • Cap per request — max_tokens protects against runaway responses.
  • Prompt caching can cut bills 70–90% on repeated context.
  • Pricing model — pass-through, included in plan, usage-based add-on.
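Per-request cost tracking is simple arithmetic once you log token counts. A sketch with illustrative placeholder prices (not real provider rates):

```python
# Cost per request from token counts. PRICES holds (input, output) rates
# in USD per 1M tokens — illustrative placeholders, not real pricing.

PRICES = {
    "small": (0.25, 1.25),
    "large": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

Aggregating this per user and per feature is what makes tiering decisions ("move this flow to the small model") data-driven instead of guesswork.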