Documents — upload PDFs, DOCX, Markdown, HTML, text

The Documents page is for everything that isn't on your website but is a source of truth — internal SOPs, contracts, policy PDFs, warranty terms, technical specs. Drop the file in, we parse it into text chunks, embed each chunk for retrieval, and from then on Wilow treats the contents as searchable knowledge alongside your snippets and crawled pages.

What you can upload

We accept these formats:

  • PDF (.pdf) — text-based PDFs work best. Scanned image-only PDFs need OCR upstream; we don't OCR for you.
  • Word (.docx, .doc) — modern Word docs preferred (.docx).
  • Plain text (.txt)
  • Markdown (.md)
  • HTML (.html)

If you upload anything else, the file is rejected on the spot; the parsers don't try to guess the format.
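If you're sorting a folder of files before uploading them, a quick extension check tells you which ones would be turned away. The sketch below is illustrative only: `SUPPORTED_EXTENSIONS`, `is_supported`, and `split_by_support` are hypothetical names, not part of Wilow. It looks at the file extension alone, so it says nothing about size limits or whether a file will actually parse.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Extensions copied from the supported-formats list; the constant name is illustrative.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".md", ".html"}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is in the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition filenames into (accepted, rejected) by extension only."""
    accepted: List[str] = []
    rejected: List[str] = []
    for name in filenames:
        (accepted if is_supported(name) else rejected).append(name)
    return accepted, rejected
```

For example, `split_by_support(["Warranty.PDF", "prices.xlsx"])` puts the PDF in the accepted list and the spreadsheet in the rejected list, because the comparison is case-insensitive and `.xlsx` is not a supported format.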

How to upload

  1. Admin → Knowledge → Documents.
  2. Upload document.
  3. Drag a file onto the dropzone, or click to pick from your filesystem.
  4. Wait. The status goes uploaded → processing → ready (or failed).

You can upload multiple files in a row; each one is processed independently. The list polls every few seconds while anything is processing so you don't have to refresh.
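The polling behavior can be pictured roughly as follows. This is a sketch, not Wilow's code: `fetch_statuses` is a hypothetical callable standing in for however the page learns the current statuses, and the 3-second interval is an assumed value for "every few seconds".

```python
import time
from typing import Callable, Dict

# A document is settled once it reaches one of these statuses.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_until_settled(
    fetch_statuses: Callable[[], Dict[str, str]],
    interval_seconds: float = 3.0,  # assumed value for "every few seconds"
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until every document is ready or failed, or max_polls is reached.

    Each document is tracked independently: one file failing does not stop
    the others from finishing.
    """
    statuses = fetch_statuses()
    for _ in range(max_polls - 1):
        if all(status in TERMINAL_STATUSES for status in statuses.values()):
            break
        sleep(interval_seconds)
        statuses = fetch_statuses()
    return statuses
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting, and `max_polls` stops it from running forever if a document never settles.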

What the pipeline does

Once a file lands:

  1. Extract — pull text out (PDF text layer, DOCX content, Markdown/HTML body). Layout is discarded; we keep paragraph structure but not visual styling.
  2. Chunk — split into retrieval-sized pieces (usually a few hundred tokens each). Chunk boundaries respect headings and paragraph breaks so a chunk reads as a coherent excerpt.
  3. Embed — every chunk gets an embedding vector for semantic search. This is what lets Wilow find the right paragraph when a visitor asks an indirect question.

When the file shows ready, every chunk is live. Wilow can pull from it on the next visitor message.
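To make the chunking step concrete, here is a minimal sketch of one way to split text on paragraph breaks while starting a fresh chunk at each heading. It is an illustration under assumptions, not the pipeline's actual implementation: `chunk_text` and `max_words` are hypothetical names, word counts stand in for tokens, headings are recognized only as Markdown-style lines starting with `#`, and extraction and embedding are left out.

```python
import re
from typing import List


def chunk_text(text: str, max_words: int = 200) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_words words.

    A heading paragraph (starting with '#') always opens a new chunk.
    A single paragraph longer than max_words is kept whole, not split.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    current_words = 0
    for para in paragraphs:
        word_count = len(para.split())
        starts_section = para.startswith("#")
        # Close the running chunk at a heading or when the budget would overflow.
        if current and (starts_section or current_words + word_count > max_words):
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(para)
        current_words += word_count
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Because chunks are only ever closed between paragraphs, no chunk starts or ends mid-sentence, which is the property that lets each one read as a coherent excerpt.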

Editing chunks

Click into a document to see its chunks. Each one is editable inline — fix a typo, trim a noisy header/footer that the PDF parser pulled in, or rewrite for clarity. Saving an edit re-embeds just that chunk; nothing else moves.

Use this sparingly. The pipeline mostly gets it right, and editing hundreds of chunks per document is busywork. It's worth doing when a big document has one or two specific paragraphs that the bot is quoting wrong, or one chunk whose wording is causing confusion.
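The "re-embeds just that chunk" behavior can be sketched like this. Everything here is hypothetical and for illustration only: `Chunk`, `save_chunk_edit`, and the `embed` callable are stand-ins, and a real embedding call would go to whatever provider the pipeline uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Chunk:
    text: str
    vector: List[float]


def save_chunk_edit(
    chunks: Dict[str, Chunk],
    chunk_id: str,
    new_text: str,
    embed: Callable[[str], List[float]],
) -> None:
    """Replace one chunk's text and recompute only that chunk's vector."""
    if chunk_id not in chunks:
        raise KeyError(f"unknown chunk: {chunk_id}")
    # Every other entry in `chunks` is left exactly as it was.
    chunks[chunk_id] = Chunk(text=new_text, vector=embed(new_text))
```

The point of the sketch is the cost model: one edit means one embedding call, no matter how many chunks the document has.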

When uploads fail

A failed status surfaces the parser error inline. The common cases:

  • Scanned PDF — no text layer. Re-export as text-based PDF, or pre-OCR with your tool of choice.
  • Encrypted PDF — password-protected. Decrypt before uploading.
  • Corrupted file — the PDF was truncated mid-export. Re-export and try again.
  • .doc with weird embedded objects — old Word docs with embedded Excel sheets often confuse the parser. Save as .docx or paste the content into a fresh file.

Hit Reprocess to retry — useful if the failure was transient (e.g. the embedding provider hiccupped).

Replacing or deleting

  • Replace — delete the old document and upload the new file. The pipeline doesn't do in-place updates; replacement is a delete+upload.
  • Delete — confirmation dialog tells you how many chunks will be lost. Permanent; no undo. The bot stops referencing the document on next request.

Documents vs. snippets vs. crawl

  • Snippets — short, hand-curated Q&As. You write them; you own the wording. Best for the questions you know visitors ask.
  • Documents — long-form source-of-truth files. The bot extracts facts from them on demand.
  • Crawl — your public website, scraped page-by-page.

You don't have to pick one. Most accounts use all three; the bot weights what it returns based on the question.

Common questions

  • What formats can I upload? PDF, DOCX, DOC, TXT, MD, HTML.
  • My document says "failed". Click the row to see the parser error. Most failures are scanned, image-only PDFs or password-protected PDFs; see When uploads fail above.
  • How long does processing take? Seconds for small text files; a minute or two for a 200-page PDF. Chunking and embedding are the slow steps.
  • How do I edit a chunk? Click into the document → click the chunk → edit → save. Re-embedding is automatic.
  • Does deleting a document delete leads/conversations that referenced it? No. Past conversations stay intact; the document just stops appearing as a source for new questions.
  • Can I bulk-upload? Drag multiple files in one go — each is processed independently.
  • Why is my file rejected? It's not in the supported list (PDF, DOCX, DOC, TXT, MD, HTML) or it exceeds the per-file size limit (which you'll see on the upload prompt).

See also knowledge base for snippets, crawl for your website, and merge proposals for the dedup queue.

Where to find us

Stuck? Email [email protected].