Merge proposals — review snippet dedup suggestions
When the crawl or another ingestion path tries to add a new snippet that looks a lot like one you already have, Wilow doesn't silently overwrite or silently double-store. Instead it queues a merge proposal: side-by-side existing vs. incoming, with a similarity score, and a decision for you to make.
Without this queue, repeated re-crawls or document re-uploads would gradually fill your knowledge base with near-duplicate snippets. With it, every dedup decision is yours.
When proposals show up
The most common triggers:
- Re-crawling your site after some pages changed — the new text is similar to the existing snippet but not identical.
- Uploading documents whose content overlaps with already-crawled pages.
- LLM-extracted snippets from a fresh crawl that overlap with hand-written snippets you wrote earlier.
Pending proposals are flagged with a count badge in the sidebar, so you don't lose track of them.
What you see per row
- Existing snippet — what's in your knowledge base today.
- Incoming snippet — what the new pipeline wants to add.
- Similarity — a percentage. Higher = more alike. The threshold for triggering a proposal is tuned to surface real overlap and avoid false positives.
- Source — where the incoming snippet came from (crawl URL, document name, etc.).
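To make the similarity figure more concrete, here is a minimal sketch of how a percentage could be derived from two embedding vectors. It assumes cosine similarity and a 0.85 cutoff; both are illustrative guesses, as are all function names. Wilow's actual formula and threshold are not documented here.

```python
import math

# Hypothetical cutoff for illustration only; not Wilow's real threshold.
PROPOSAL_THRESHOLD = 0.85


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as "not alike".
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_percent(a, b):
    """Similarity as a percentage, clamped to the 0..100 range."""
    return max(0.0, min(100.0, cosine_similarity(a, b) * 100))


def should_propose_merge(a, b, threshold=PROPOSAL_THRESHOLD):
    """True when two snippets are alike enough to queue a proposal."""
    return cosine_similarity(a, b) >= threshold
```

Under these assumptions, a higher cutoff would queue fewer proposals and miss more near-duplicates, while a lower one would queue more proposals, including more false positives.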
Your options per proposal
Pick one — they're mutually exclusive:
- Keep existing — discard the incoming snippet. Use when your curated wording is better and the incoming version doesn't add new information.
- Replace with new — discard the existing snippet and store the incoming one. Use when the new content is fresher or more correct.
- Merge — concatenate the two. Use when both have unique bits worth keeping. The merged snippet will be longer; check it's not becoming a wall of text that loses focus.
- Reject incoming — same effect as Keep existing. Surfaces in the audit log as an explicit "no" rather than an implicit one.
After deciding, the proposal disappears from the queue.
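The four options can be summarised as a small decision function. This is only a sketch of the behaviour described above, not Wilow's implementation: the type names, the audit-log strings, and the blank-line separator used for Merge are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    KEEP_EXISTING = "keep_existing"
    REPLACE_WITH_NEW = "replace_with_new"
    MERGE = "merge"
    REJECT_INCOMING = "reject_incoming"


@dataclass
class Resolution:
    stored_text: str   # what remains in the knowledge base afterwards
    audit_entry: str   # hypothetical audit-log label


def resolve(existing: str, incoming: str, decision: Decision) -> Resolution:
    """Apply one decision to an existing/incoming snippet pair."""
    if decision is Decision.KEEP_EXISTING:
        return Resolution(existing, "kept existing")
    if decision is Decision.REJECT_INCOMING:
        # Same stored result as Keep existing, but logged as an explicit "no".
        return Resolution(existing, "rejected incoming")
    if decision is Decision.REPLACE_WITH_NEW:
        return Resolution(incoming, "replaced with new")
    if decision is Decision.MERGE:
        # Concatenation; the blank-line separator is an assumption.
        return Resolution(f"{existing}\n\n{incoming}", "merged")
    raise ValueError(f"unknown decision: {decision!r}")
```

Note that in this sketch Keep existing and Reject incoming differ only in the audit entry, which matches the description above.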
When to merge vs. replace
Rule of thumb:
- Replace — the page got updated; the new content supersedes the old. Default for re-crawls of pages whose facts changed (pricing page, address change).
- Merge — two different angles on the same topic, both useful. E.g. the existing snippet is "what we do" and the incoming is "how it works".
- Keep existing — you carefully hand-wrote the existing snippet and don't want LLM-extracted prose to overwrite your voice.
If you can't decide, Keep existing is the safer default — you're preserving curation, and you can always edit the existing snippet manually if you want to fold in the new content.
What if I just ignore the queue?
Nothing breaks. Pending proposals stay pending: the existing snippet keeps serving, and the incoming snippet is parked. The downside is that the queue grows over time, and a useful signal ("this content is a near-duplicate") goes unused. A weekly sweep is frequent enough.
Common questions
- What's a merge proposal? A suggestion to dedup a near-duplicate snippet that just got ingested.
- Where do proposals come from? Mostly re-crawls and document uploads. See crawl and documents.
- What does the similarity percentage mean? How alike the two snippets are, measured by the distance between their embeddings. Higher means more alike. The threshold is tuned to flag real overlap.
- Can I review proposals in bulk? No — each is a discrete decision because the right answer differs per pair. Walk the list one at a time.
- Does ignoring the queue break anything? No, but pending proposals don't go away — your queue builds up, and the dedup signal goes unused.
- Can I undo a decision? No. Once you click an action, the proposal is resolved. If you change your mind, you can edit the merged or replaced snippet manually from the knowledge base page.
See also knowledge base for direct snippet editing, crawl, and documents.