Knowledge base
Wilow answers from your content. There are three ways to feed it that content: pick whichever matches your situation, or mix them freely. The bot doesn't care where a fact came from; it just retrieves the relevant pieces and grounds its reply on them.
Three sources, one bot
| Source | Where in admin | What it's good for |
|---|---|---|
| Snippets | Knowledge → ⊕ New snippet | Short, hand-curated facts. Hours, return policy, shipping rules, common one-liners. |
| Documents | Documents | PDFs / Markdown / plain text. Manuals, T&Cs, FAQs you already have written. |
| Crawls | Website crawl | Pull from a public URL — your help center, marketing pages, blog. Hands-off but coarser than the other two. |
All three flow into the same retrieval pipeline. When a visitor asks something, Wilow pulls the most relevant pieces from across all three and uses them to answer. You don't have to pick one.
When to use which
The decision tree:
- One-sentence answer? Snippet. Fastest to author, easiest to edit, and the bot grounds on it cleanly.
- You already have it as a PDF / .md file? Document. We chunk it for you.
- It's a public web page that's already correct? Crawl. Set it and forget it: the auto-sync option re-pulls on a schedule.
- It's longer than a paragraph but shorter than a doc? Either snippet or document — author preference. Many people paste a 500-word section straight into a snippet.
If the same fact lives in two places (a snippet and a doc, say), the bot may cite either — both are valid sources. That's fine, but if one is stale, prune it.
Snippets
The fastest path. Knowledge → ⊕ New snippet opens a title + content editor. The title is mainly a label for your admin view; the bot grounds on the content and treats the title only as a soft hint.
Active snippets are searched at every turn. Inactive ones stay in the list but don't get retrieved — useful for seasonal content (holiday hours) you want to keep around but not surface today.
A snippet works best when it answers one question. "Refund policy" with the actual policy text inside beats "Everything about returns" with seven sub-topics — the retrieval pipeline scores each piece independently, and a focused snippet scores higher on the matching question than a kitchen-sink one does.
Documents
Upload PDFs, Markdown, or plain text under Documents → ⊕ Upload. We extract text, chunk it into ~500-token pieces with overlap, and index each chunk. The bot retrieves chunks, not whole documents — so a 50-page manual is fine; the bot pulls just the relevant 1–2 chunks per turn.
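To make "chunks with overlap" concrete, here is a minimal sketch of a sliding-window chunker. It is an illustration, not Wilow's actual implementation: the function name `chunk_tokens` and the 50-token overlap are assumptions (the text above only says "~500-token pieces with overlap"), and the caller is expected to supply an already-tokenized list.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size` tokens.

    Consecutive windows share `overlap` tokens, so a sentence that
    straddles a boundary still appears whole in at least one chunk.
    Both defaults are illustrative assumptions.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks
```

Calling `chunk_tokens(text.split())` approximates tokens with whitespace-separated words; a real tokenizer would count differently, so treat the sizes as rough.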
Files we accept: .pdf, .docx, .doc, .txt, .md, .html. Limits: 20 MB per file; the per-account file count cap depends on your workspace — talk to us if you need it raised. Scanned-image PDFs without an OCR layer won't extract text — run them through OCR first.
If you spot extraction artifacts (page-header noise, broken hyphenation, etc.), you can edit individual chunks from Knowledge → Documents: click into a row to see its chunks. Disabling a chunk leaves the document intact but removes that one piece from retrieval.
Crawls
Website crawl → ⊕ New crawl takes a starting URL, walks links within the same registrable domain, extracts the readable content of each page, and turns it into chunks indexed the same way as documents. Good for pulling a help center or marketing site without copy-pasting page by page.
Sensible defaults out of the box: respects robots.txt, follows internal links only, throttles to one request per second, caps total pages.
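As a rough picture of what those defaults mean in practice, the sketch below decides whether a discovered link should be fetched. Everything named here is hypothetical (`USER_AGENT`, `MAX_PAGES`, `DELAY_SECONDS`, `should_visit`), and it compares exact hostnames, which is a stricter simplification of the "same registrable domain" rule. It uses Python's standard `urllib.robotparser` for the robots.txt check.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-crawler"  # hypothetical user agent name
MAX_PAGES = 200                 # hypothetical page cap
DELAY_SECONDS = 1.0             # a crawl loop would sleep this long between requests


def load_robots(robots_txt):
    """Build a robots.txt parser from the file's text (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def should_visit(url, start_url, robots, pages_fetched):
    """Apply the page cap, the internal-links-only rule, and robots.txt."""
    if pages_fetched >= MAX_PAGES:
        return False
    # Simplification: exact hostname match instead of registrable domain.
    if urlparse(url).hostname != urlparse(start_url).hostname:
        return False
    return robots.can_fetch(USER_AGENT, url)
```

A full crawler would also need a public-suffix lookup to treat subdomains of the same registrable domain as internal; that part is left out here.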
Auto-sync re-runs the crawl on a schedule (daily/weekly). Pages that changed get re-indexed; pages that vanished from the source get pruned from your knowledge base. Turn it on if your help center is a living thing, off if you crawled once and never want it touched again.
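One way to picture a sync pass is as a diff between what is indexed and what the latest crawl returned. The sketch below assumes change detection by content hash, which is our illustration rather than a documented detail; `plan_sync` and `content_hash` are hypothetical names.

```python
import hashlib


def content_hash(text):
    """Fingerprint a page's extracted text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(indexed, crawled):
    """Work out what a sync pass needs to do.

    indexed: dict of url -> hash stored at the last sync
    crawled: dict of url -> page text from the current crawl
    Returns (reindex, prune) as sorted lists of URLs.
    """
    # New or changed pages: no stored hash, or the hash no longer matches.
    reindex = sorted(
        url for url, text in crawled.items()
        if indexed.get(url) != content_hash(text)
    )
    # Pages that vanished from the source get pruned.
    prune = sorted(url for url in indexed if url not in crawled)
    return reindex, prune
```

Unchanged pages appear in neither list, so a sync on a mostly static site touches very little.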
The chunk-level edit affordance applies to crawled pages too — if a page is mostly noise (cookie banner text, footer chrome) you can disable that chunk.
How retrieval picks what to cite
For each visitor turn, Wilow pulls roughly the top 5 most relevant chunks across snippets, document chunks, and crawled pages, sends them to the model alongside the question, and asks the model to ground its reply on them. The citations the visitor sees are only the chunks that a cross-encoder relevance check rates as actually answering the question, not all 5 retrieved.
Two consequences worth knowing:
- More content isn't always better. If your KB is full of marginally related material, retrieval has to fight harder to find the right chunk and the model may ground on the wrong one. Prune ruthlessly.
- The bot can refuse. If the top retrieved pieces aren't relevant enough, the model declines politely and (when configured) suggests handoff. That's the desired behavior: better an "I don't know, let me connect you" than a confidently wrong answer pulled from a tangentially related chunk.
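The top-k-then-maybe-refuse behavior described above can be sketched in a few lines. This is a toy model under stated assumptions: cosine similarity over small hand-made vectors stands in for the real scoring, the `min_score` threshold of 0.3 is invented for illustration, and the cross-encoder step is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, k=5, min_score=0.3):
    """Return the top-k (score, chunk_id) pairs, best first.

    chunks: list of (chunk_id, vector) from any source
    (snippets, document chunks, crawled pages).
    An empty list signals "nothing relevant enough", i.e. refuse.
    """
    scored = sorted(
        ((cosine(query_vec, vec), cid) for cid, vec in chunks),
        reverse=True,
    )
    top = scored[:k]
    if not top or top[0][0] < min_score:
        return []
    return top
```

Note how every extra marginal chunk is one more candidate competing for the k slots, which is the mechanical reason pruning helps.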
When the bot doesn't seem to use your content
The most common cause is wording: you wrote the snippet using your internal vocabulary, the visitor asks using theirs, and the embedding similarity is too low. Two cheap fixes:
- Rewrite the snippet with the visitor's likely phrasing. If visitors say "is shipping free over $50", don't title the snippet "Threshold conditions for unconditional logistical fulfilment". Use their words.
- Add a few rephrasings to the content. "Free shipping over $50. Yes, shipping is free if your order totals more than $50." Costs 20 tokens and rescues the retrieval on the alternate phrasing.
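To see why wording matters, here is a deliberately crude stand-in for embedding similarity: the fraction of the visitor's words that also appear in the snippet. Real embeddings capture much more than shared words, so treat this only as a directional illustration; `overlap` and `words` are hypothetical helpers.

```python
import re


def words(text):
    """Lowercased word set; keeps '$' so that '$50' stays one token."""
    return set(re.findall(r"[a-z0-9$]+", text.lower()))


def overlap(question, snippet):
    """Fraction of the question's words that appear in the snippet."""
    q, s = words(question), words(snippet)
    return len(q & s) / len(q) if q else 0.0


question = "is shipping free over $50"
jargon = "Threshold conditions for unconditional logistical fulfilment"
plain = ("Free shipping over $50. Yes, shipping is free "
         "if your order totals more than $50.")

jargon_score = overlap(question, jargon)
plain_score = overlap(question, plain)
```

The jargon version shares no words with the visitor's question, while the plain version covers all of them.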
The Knowledge gaps page (admin) flags questions Wilow couldn't answer well — start there if you're hunting holes.
Pitfalls
- Embeddings rebuild on edit. When you edit a snippet or chunk, we re-embed it on save. There's a brief window (seconds) where retrieval may still hit the old embedding; refresh once to confirm the change took effect.
- Inactive ≠ deleted. Disabling something keeps the row but removes it from retrieval. Deleting is permanent.
- Don't paste secrets. Snippets are visible to anyone with admin access to your account. API keys, internal URLs, real customer data — keep those out.