AI Workflow · 5 of 6

AI Product Features

Embedding AI in the product, not just the dev process. Chatbots, semantic search, summarization, recommendations, generation — the layer where AI actually meets the end user.

Quick Facts

At a Glance

Basic Concepts

  • "AI feature" rarely means a raw chatbot — usually it's a focused capability inside an existing flow.
  • Latency & cost are first-class design concerns, not afterthoughts.
  • Evals before launch. If you can't measure it, don't ship it.
  • Streaming UX hides latency. Most user-facing AI features stream tokens.
  • Have a non-AI fallback — model APIs go down or get rate-limited.
Patterns

The Common AI Features

Chatbots & Assistants

The default first AI feature most products ship. Variants:

  • Support bot — RAG over docs + ticket history; deflects tier-1 tickets.
  • In-product assistant — knows the user's account context, can take actions.
  • Persona / character chatbots — entertainment, companionship.

Hard parts: scoping ("don't answer off-topic"), safety, escalation to humans, conversation memory.
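The scoping and escalation concerns above can be sketched as a pre-generation router. This is a minimal illustration, not a production design: the topic and trigger lists are made up, and in practice the in-scope check would be a cheap classifier or LLM call rather than a keyword heuristic.

```python
# Sketch of support-bot scoping and escalation. ALLOWED_TOPICS and
# ESCALATION_TRIGGERS are illustrative placeholders; a real deployment
# would use a classifier instead of keyword matching.

ALLOWED_TOPICS = {"billing", "login", "api", "export"}        # illustrative
ESCALATION_TRIGGERS = {"refund", "legal", "angry", "cancel"}  # illustrative

def route_message(message: str) -> str:
    """Return 'answer', 'escalate', or 'decline' for a support message."""
    words = set(message.lower().split())
    if words & ESCALATION_TRIGGERS:
        return "escalate"   # hand off to a human agent
    if words & ALLOWED_TOPICS:
        return "answer"     # proceed to RAG + generation
    return "decline"        # politely refuse off-topic requests
```

The key design point is that routing happens before any expensive model call, so off-topic and escalation cases stay cheap and predictable.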

Semantic / "Natural Language" Search

Replace keyword search with embeddings — users ask questions, you return relevant docs/products/items. Often combined with re-ranking and metadata filters. Backed by a vector DB.

Best as hybrid (keyword + semantic) — pure semantic loses on rare exact terms (SKU, error code, names).
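One common way to build that hybrid is Reciprocal Rank Fusion (RRF), which merges the keyword and semantic result lists by rank rather than by incomparable scores. A minimal sketch, assuming the two ranked doc-id lists come from your keyword engine and vector DB respectively:

```python
# Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document,
# so items ranked highly in either list surface near the top of the merge.

def rrf_merge(keyword_hits: list[str], semantic_hits: list[str], k: int = 60) -> list[str]:
    """Merge two ranked doc-id lists; smaller k weights top ranks more heavily."""
    scores: dict[str, float] = {}
    for hits in (keyword_hits, semantic_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears in both lists ("b" below) beats one ranked first in only one list, which is exactly the behavior you want for rare exact terms plus fuzzy matches.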

Summarization
  • Long documents → executive summary.
  • Meeting transcripts → action items.
  • Email threads → "what changed?"
  • Comments / reviews → sentiment + themes.

Often the easiest AI win — high user value, simple prompt, predictable cost.
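For inputs that exceed the context window, the usual pattern is map-reduce: summarize chunks, then summarize the summaries. A sketch, where `call_llm` is a hypothetical stand-in for your provider call:

```python
# Map-reduce summarization sketch. Only the chunking is concrete;
# `call_llm` is a placeholder for a real provider SDK call.

def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split on paragraph boundaries, packing paragraphs up to max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def summarize(text: str, call_llm) -> str:
    partials = [call_llm(f"Summarize:\n{c}") for c in chunk_text(text)]
    return call_llm("Combine into one summary:\n" + "\n".join(partials))
```

Chunking on paragraph (or section) boundaries rather than fixed character offsets keeps each chunk coherent, which noticeably improves partial summaries.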

Recommendations & Personalization

Two flavors today:

  • Classical: matrix factorization / two-tower / gradient-boosted trees — still wins on most marketplaces.
  • LLM-augmented: generate explanations ("Because you liked X…"), re-rank with reasoning, conversational discovery.

Generation (Text, Image, Audio, Code)

  • Marketing copy & product descriptions.
  • Image generation — banners, thumbnails, style transfer (Stable Diffusion, Flux, DALL-E, Imagen).
  • Voice / TTS — ElevenLabs, OpenAI TTS, Cartesia.
  • Code-as-feature — Notion AI in formulas, Excel Copilot.

Classification & Extraction

Often the most cost-effective LLM use:

  • Tag tickets, route emails, label content.
  • Extract structured data from PDFs / contracts / invoices.
  • Score sentiment, urgency, intent.

Where a classical model would need labeled data, a small, cheap LLM with a good prompt often gets you to "good enough" on day one.

In-Product Agents

"Do this for me." The agent navigates the app, fills forms, calls APIs, shows the user the result. New, risky, and high-leverage. Examples: Linear's draft-issue agent, GitHub Copilot Workspace, Notion AI.
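The risk is mostly on the write path, so a common pattern is to auto-run read-only tools and gate write tools behind user confirmation. A minimal sketch with made-up tool names:

```python
# Scoped tool execution for an in-product agent. The allowlists and tool
# names are illustrative; the point is the read/write asymmetry.

READ_ONLY_TOOLS = {"search_issues", "get_account"}  # safe to auto-run
WRITE_TOOLS = {"create_issue", "send_email"}        # require confirmation

def execute_tool(name: str, args: dict, tools: dict, user_confirmed: bool = False):
    """Run a tool the model requested, enforcing the write-path gate."""
    if name in READ_ONLY_TOOLS:
        return tools[name](**args)
    if name in WRITE_TOOLS and user_confirmed:
        return tools[name](**args)
    raise PermissionError(f"tool {name!r} requires user confirmation")
```

The model can request anything; authorization lives in your code, not in the prompt.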

Architecture

Building Blocks & Concerns

The Typical Stack
User → UI (streaming) → Your API
              → Prompt template + context
              → Framework orchestration
              → Retrieval from Vector DB
              → Tool calls (your APIs, search, DB)
              → Provider (Claude / GPT / Gemini)
              → Stream tokens back
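The final "stream tokens back" step can be sketched as a generator your API layer forwards as server-sent events; `provider_stream` is a hypothetical stand-in for a streaming SDK iterator of token strings:

```python
# Wrap a token iterator as SSE "data:" frames, with a [DONE] sentinel the
# client uses to close the stream. The sentinel convention is illustrative.

def sse_events(provider_stream):
    for token in provider_stream:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"
```

Any SSE-capable HTTP framework can iterate this generator directly into the response body.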

Latency & UX
  • Stream tokens — the user sees output start in <500ms, hiding the rest.
  • Optimistic UI — show "thinking…" with shape indicators.
  • Progressive disclosure — outline first, details on demand.
  • Cache aggressively — repeated questions / system prompts benefit from prompt caching.
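Provider-side prompt caching handles the repeated system prompt; an application-level cache for repeated identical questions is a common complement. A minimal sketch, with `call_llm` as a hypothetical provider call:

```python
# Application-level response cache keyed on a hash of (model, prompt).
# Only appropriate for deterministic, non-personalized queries.
import hashlib

class ResponseCache:
    def __init__(self):
        self._store: dict[str, str] = {}

    def _key(self, model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_call(self, model: str, prompt: str, call_llm) -> str:
        key = self._key(model, prompt)
        if key not in self._store:
            self._store[key] = call_llm(prompt)  # miss: pay for one model call
        return self._store[key]                  # hit: free and instant
```

In production you would add a TTL and skip caching anything that depends on per-user context.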

Safety & Guardrails
  • Input filters — strip PII, block prompt injection patterns.
  • Output validators — JSON schema, allowed-topic checks, profanity filters.
  • Tool authorization — never let the model call write APIs unscoped.
  • Rate limits per user — both abuse prevention and cost control.
  • Tools: NeMo Guardrails, Guardrails AI, Lakera, Protect AI.
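The input-filter layer can be as simple as a pattern screen in front of a dedicated classifier. A sketch with an illustrative, deliberately incomplete pattern list — pattern matching alone is easy to evade, which is why products layer it with tools like Lakera:

```python
# First-pass prompt-injection screen. The patterns are illustrative;
# a real pipeline adds a trained classifier behind this check.
import re

INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
]

def is_suspicious(user_input: str) -> bool:
    """Flag inputs matching known injection phrasings for extra scrutiny."""
    text = user_input.lower()
    return any(re.search(p, text) for p in INJECTION_PATTERNS)
```

Flagged inputs need not be hard-blocked; routing them to a stricter prompt or human review avoids false-positive frustration.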

Evals & Quality

Before launch, build a small (50–500 examples) golden dataset. Score offline:

  • Exact match for structured tasks.
  • LLM-as-judge for free-form (with caveats).
  • Human review for nuanced subjective quality.
  • Tools: Promptfoo, Braintrust, Langfuse, OpenAI Evals, DeepEval.
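The exact-match case reduces to a tiny loop the listed tools elaborate on. A sketch, where `predict` is a hypothetical stand-in for your prompted model:

```python
# Offline exact-match eval over a golden dataset of (input, expected) pairs.

def run_eval(golden: list[tuple[str, str]], predict) -> float:
    """Return accuracy of predict() against the golden pairs."""
    correct = sum(1 for x, expected in golden if predict(x) == expected)
    return correct / len(golden)
```

Run this in CI on every prompt change; a score drop is a regression exactly like a failing unit test.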

Cost & Pricing
  • Track tokens per user / per feature — surprises are common.
  • Tier model use — Haiku/Flash/Mini for simple, Sonnet/Pro for hard.
  • Cap per request — max_tokens protects against runaway responses.
  • Prompt caching can cut bills 70–90% on repeated context.
  • Pricing model — pass-through, included in plan, usage-based add-on.
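Per-request cost tracking is simple arithmetic once you log token counts. A sketch with illustrative placeholder prices (not real provider rates):

```python
# Cost per request from token counts. PRICES holds (input, output) rates
# in USD per 1M tokens — illustrative placeholders, not real pricing.

PRICES = {
    "small": (0.25, 1.25),
    "large": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

Aggregating this per user and per feature is what makes tiering decisions ("move this flow to the small model") data-driven instead of guesswork.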