Why every PDF tool on Persimmon runs in your browser
When you upload a PDF to a stranger's server, you're handing them every word, signature, and number on the page. Here's why we refused to build it that way.
The PDFs you handle in a given month are often among the most sensitive documents you own — tax returns, signed contracts, scanned IDs, medical bills, lease agreements. The default way to merge or split one of those is to upload it to a server you have never heard of, owned by a company you cannot find on a map.
Persimmon does not work that way. Every PDF tool here — merge, split, convert to images, build from images — runs entirely inside the tab you have open right now. The file never leaves your machine. There is no upload, no temporary server-side copy, no cleanup job that may or may not have actually run. The processing happens in JavaScript and WebAssembly, on your CPU, using your memory. We could not access your document if we wanted to.
That decision is not a marketing line. It shaped how the entire site is built, and it ruled out a handful of features we would otherwise have shipped. The rest of this post is the honest reasoning behind it.
The default trust model is broken
Search for “merge PDF online” and the first ten results will all do the same thing. You drag a file in. It hits an HTTP endpoint. A server somewhere reads the file, processes it, writes the output to a temp directory, returns a download URL, and — according to the privacy policy — deletes everything an hour later.
The deletion claim is the load-bearing piece of trust in that whole flow, and there is no way to verify it. You are taking the operator at their word that:
- The processing host actually runs the deletion job on schedule.
- Backups, replicas, and CDN caches do not retain a copy beyond the stated window.
- No employee, contractor, or third-party SDK with file system access ever read the document while it was on disk.
- The transfer itself was not silently logged by an analytics or advertising script running in the same tab.
Most PDF-tool sites are funded by ads, and the ad networks they embed are extremely aggressive about profiling. The combination of “here is a file with your full legal name and address” and “here is an ad SDK” on the same page is a privacy posture that should make anyone uneasy. Even if no individual actor has bad intent, the architecture stacks the risks against the person uploading the file.
For a free, ad-supported tool site, the only honest answer is to design the upload out of the system entirely. If the file never moves, none of those failure modes can happen.
What changed: WebAssembly and pdf-lib
Browser-side PDF manipulation was not seriously practical until pretty recently. PDFs are a structured binary format with their own internal compression, font tables, and reference graphs. Doing real work on them — not just rendering, but editing — used to require either a native binary or a server-side process running PDFium, Ghostscript, or Apache PDFBox.
Two things changed that. First, WebAssembly stabilized in browsers around 2019, which gave the JavaScript runtime a near-native execution target for the kind of buffer-heavy work PDF manipulation requires. Second, a small handful of well-maintained pure-JS libraries reached production quality:
- pdf-lib — pure JavaScript, no WASM, handles structural manipulation: merging files, removing pages, embedding fonts and images, filling form fields. Roughly 4MB unminified, ships as a single npm package.
- pdfjs-dist — the library Mozilla uses to render PDFs inside Firefox itself. Does the heavy lifting when we need to rasterize a page to an image. Much larger, but only loaded on the tools that actually need it.
- jsPDF — the older library; still useful for generating PDFs from scratch on the client.
With those three, every PDF feature on Persimmon is reachable. None of them require a server. None of them require a plugin. They run on the same JavaScript engine that ran every other tab you have open.
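Loading the heavier renderer only on the tools that need it, as described above, comes down to a plain dynamic import. A minimal sketch — the module constant, function name, and caching structure here are illustrative, not Persimmon's actual code:

```typescript
// Hypothetical lazy loader: the heavy rendering library is only
// fetched when a user opens a tool that actually rasterizes pages.
const RENDERER_MODULE = "pdfjs-dist"; // resolved by the bundler at build time

let rendererPromise: Promise<unknown> | null = null;

function loadRenderer(): Promise<unknown> {
  // Reuse the in-flight promise so repeated calls cause one download.
  rendererPromise ??= import(RENDERER_MODULE);
  return rendererPromise;
}
```

The merge and split tools never call this, so visitors who only merge files never pay the download cost of the renderer.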
What an in-browser merge actually looks like
The full merge implementation is small. After the user drops their files, the relevant logic looks something like this:
import { PDFDocument } from "pdf-lib";

async function mergePdfs(files: File[]): Promise<Blob> {
  const merged = await PDFDocument.create();
  for (const file of files) {
    const bytes = await file.arrayBuffer();
    const src = await PDFDocument.load(bytes, { ignoreEncryption: false });
    const pages = await merged.copyPages(src, src.getPageIndices());
    pages.forEach((p) => merged.addPage(p));
  }
  const out = await merged.save();
  return new Blob([out], { type: "application/pdf" });
}

Every line of that runs locally. file.arrayBuffer() reads the bytes off disk through the browser's File API. PDFDocument.load parses the cross-reference table and object stream in memory. copyPages and addPage wire up the new document graph. The resulting Blob gets handed to URL.createObjectURL, which generates a local URL the browser can serve as a download. Nothing in that path crosses a network boundary.
Open the network tab
Go to persimmon.tools/pdf-merge, drop in a file, and run the merge. You will see exactly zero outbound requests with the file as a payload. The same is true for split, convert, and every other PDF tool on the site.

The honest trade-offs
Refusing to use a server is not free. There are real things that client-side PDF tooling cannot do well in 2026, and we would rather be specific about them than pretend the trade does not exist.
Memory ceilings. Most browsers cap a single typed array at around 2GB, and practical performance starts to degrade well before that. A 1.5GB scanned-document PDF will work; a 4GB file will not. For most personal documents, this is a non-issue. For some archival workflows, it is genuinely a wall.
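That ceiling is something a tool can check up front rather than letting the browser fail mid-parse. A hedged sketch — the 1.5 GB threshold is illustrative, not a number from Persimmon's code:

```typescript
// Illustrative guard: reject files likely to blow past what a single
// browser tab can comfortably hold in memory during a parse + rewrite.
const CLIENT_SIDE_LIMIT_BYTES = 1.5 * 1024 ** 3; // ~1.5 GB, an assumed threshold

function exceedsClientSideLimit(byteLength: number): boolean {
  return byteLength > CLIENT_SIDE_LIMIT_BYTES;
}
```

A tool would call this with file.size before reading anything into memory, and show a friendly error instead of a crashed tab.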
OCR is heavy. Tesseract.js exists and runs in the browser, but the model files are tens of megabytes, the cold start is slow, and accuracy on photographed documents is well below what a cloud service can offer. We have chosen not to build a client-side OCR tool yet; if you need to extract text from scanned pages, a purpose-built service is currently the better answer.
No background processing. When a server processes a merge, it can email you the result an hour later. The browser cannot. If you close the tab, the work is gone. For 99% of the use cases these tools cover, this is fine — the operations finish in seconds.
Old browsers. We do not support Internet Explorer, which Microsoft retired in 2022, or browsers that lack a working WebAssembly runtime and the modern File API. That covers roughly 0.1% of global traffic.
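Detecting those requirements is a one-time check a site can run before showing a tool. A minimal sketch, assuming the tools need WebAssembly, Blob, and the File constructor — the function name is hypothetical:

```typescript
// Illustrative feature check: all three capabilities the PDF tools rely
// on have shipped in evergreen browsers for years, so this almost
// always returns true outside of very old or unusual environments.
function supportsClientSidePdf(): boolean {
  return (
    typeof WebAssembly === "object" &&
    typeof Blob === "function" &&
    typeof (globalThis as any).File === "function"
  );
}
```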
Privacy as architecture, not policy
The thing we want to be clear about is that this is structural, not promised. A privacy policy is a written guarantee that an operator will behave a certain way; the operator can change it, and you have no real way to audit whether they followed it last Tuesday.
An architectural guarantee is different. It is a property of how the code runs. When the file never leaves your device, no future change to a privacy policy can retroactively expose what you uploaded yesterday, because there is no “uploaded” at all. The deletion job cannot fail because there is nothing to delete.
If the document is sensitive enough that you would not put it on a cloud drive without a serious reason, you should also not be uploading it to a free PDF tool. Whatever the homepage says about encryption, the file landed on someone's disk for at least long enough to be processed, and the operator chose to write the code so that this had to happen. That choice is reversible. We reversed it.
What this means in practice
Pick any of the PDF tools on the site, drop in something you would rather not share with strangers, and run it. The file does not move. The result is built on your machine, downloaded directly from your machine, and forgotten the moment you close the tab. That is the whole product.
If you find a case where this guarantee breaks down — a tool that does make a network call with your file as the payload — we would consider that a bug, not a feature, and we would like to hear about it. The contact page has an email. We read every message.