Speak rambling. Ship clean prompts.
Locally.

Your voice is yours. Local Whisper transcribes; an on-device LLM reshapes the rambling into clean prose, bullets, or Markdown — all on your GPU.

next  ↗

Speak on one machine. Paste into the agent on another.

LAN-only · multi-device · v1.x

Voice to text · on device · private

How it works

From hotkey to clean paste, in seconds.

The overlay shows up wherever you're typing. Speech recognition runs on your own GPU. A second hotkey reshapes the raw transcript — and turns it into a real prompt for whichever AI you're building with.

login.tsx · Ready
Ctrl+Shift+Space: press to record
F9: restructure
coder template · ready to paste into Cursor

Privacy isn't a setting. It's the path your audio takes.

Most dictation tools route your voice through their servers, their models, and their disks. VocaPulse doesn't. Compare the two paths — the difference is structural, not a policy you're asked to trust.

Cloud dictation

typical path

  1. You speak

  2. Audio uploaded

    leaves your machine

  3. Vendor server

    us-east-1 / unknown region

  4. Their transcription model

  5. Their LLM cleanup

  6. Stored on their disk

    subject to their retention

  7. Text returned to you

VocaPulse

local path

  1. You speak

  2. Audio in RAM

    volatile, never written

  3. Whisper · on your GPU

  4. Structuring LLM · on your GPU

  5. Text in your editor

  6. Audio discarded

    frame-by-frame, in memory

Network is used elsewhere — app updates, model downloads, license check, and (opt-in) end-to-end-encrypted device sync.
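
The local path above keeps audio in volatile memory and discards it frame by frame. As a rough illustration of that idea, and not VocaPulse's actual code, here is a minimal Python sketch. The class name `VolatileAudioBuffer`, its methods, and the `max_frames` limit are all hypothetical, and zeroing a `bytearray` in Python is best-effort only, since the runtime may hold other copies.

```python
from collections import deque
from typing import Iterator


class VolatileAudioBuffer:
    """Hypothetical in-memory frame queue; nothing here touches the disk."""

    def __init__(self, max_frames: int) -> None:
        if max_frames < 1:
            raise ValueError("max_frames must be at least 1")
        self._max_frames = max_frames
        self._frames = deque()  # holds mutable bytearray frames

    def __len__(self) -> int:
        return len(self._frames)

    def push(self, frame: bytes) -> None:
        """Store a captured frame, wiping the oldest one if the buffer is full."""
        if len(self._frames) >= self._max_frames:
            self._wipe(self._frames.popleft())
        self._frames.append(bytearray(frame))

    def drain(self) -> Iterator[bytearray]:
        """Hand frames to the consumer one at a time, zeroing each afterwards."""
        while self._frames:
            frame = self._frames.popleft()
            try:
                yield frame
            finally:
                # Runs when the consumer asks for the next frame or stops early.
                self._wipe(frame)

    @staticmethod
    def _wipe(frame: bytearray) -> None:
        # Overwrite in place so the samples do not linger in this object.
        frame[:] = bytes(len(frame))
```

A transcriber would iterate over `drain()` and must not keep its own copy of a frame after it has processed it, otherwise the wipe has no effect on that copy.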

From the founder

Manuel Evers

Nürnberg, 2026

I caught myself dictating something private into a tool whose privacy posture I didn't actually understand. Audio was clearly leaving my machine. The music I had on auto-paused mid-sentence and never resumed. The cherry on top: it pasted into whichever window happened to have focus — except the way I work is Claude Code on the laptop, Cursor on the desktop, half a dozen agents in flight, voice and hands constantly switching frames. The tool wasn't built for that. It was built to ship.

I've spent twenty years close to enterprise software — Atlassian deployments, GDPR audits, the meetings where someone always asks where the data sits. The lesson I took wasn't “enterprises are a market.” It was: privacy as a policy you trust on faith breaks. Privacy as architecture doesn't. That difference is structural, and you can feel it the moment a tool is built one way versus the other.

Modern laptops can run Whisper and a small structuring model on the GPU without breaking a sweat. The cloud round-trip is a business decision, not an engineering one. So I built the version I wanted to use. Audio stays in RAM. Transcription happens on your machine. Structuring happens on your machine. And the vision I'm building toward — because I need it personally — is dictating into the laptop you carry and having the structured output land in the agent on the desktop you're sitting at: peer-to-peer over your LAN, encrypted, no cloud relay.

Honest reason I care: I don't trust how data ages. Not the company that holds it today, not the regulation around it tomorrow, not the model it might quietly train next year. If you flow between agents the way I do — and you'd rather your voice didn't follow — this is for you.

What makes it different

Three reasons it isn't just another voice tool.

Privacy. Structure. Cross-device routing. The first two ship today. The third turns this from a single-machine dictation tool into a voice layer that follows you between your laptop and your desktop — without the cloud round-trip.

Privacy first

Your voice never leaves your machine.

Whisper, the structuring LLM, and the prompt templating all run on your hardware. There is no cloud route. We can't see what you say. We don't want to.

  • No telemetry, no analytics
  • Audio held in volatile memory only
  • Encrypted local history (AES-256-GCM)
verify yourself · audio pipeline
  • Your audio is never transmitted.

    $ tcpdump -i any host vocapulse.app
    0 packets during recording
  • Speech recognition runs on your hardware.

    $ nvidia-smi
    whisper model on your GPU
  • Structuring runs on your hardware.

    $ nvidia-smi
    structure model on your GPU

Verifiable on your own machine. Not a privacy policy.

Network used elsewhere: app updates, model downloads, license check, and (opt-in) end-to-end-encrypted device sync.

Speak rambling. Ship structured.

An on-device LLM rewrites the way you actually talk.

You stop mid-sentence. You self-correct. You think out loud. The structuring model turns all of it into prose, bullet points, or Markdown — before it lands in your editor.

  • Three output templates today
  • Selectable per hotkey
  • Runs on consumer GPUs (4 GB+ VRAM), falls back to CPU

You said

"Um so I think we should probably refactor the auth thing and like add rate limiting maybe, and yeah we definitely should update the tests too"

You get

  • Refactor auth module
  • Add rate limiting
  • Update tests

template: bullets · prose · markdown

Multi-device routing

Coming in v1.x

Speak on the laptop you carry. Paste into the agent on the desktop you sit at.

The audio, the transcript, and the structured output never leave your local network. Peer-to-peer, encrypted, no relay through anyone's cloud — yours included. Pair the devices once; pick the destination per hotkey from then on.

  • Pick the destination device per hotkey
  • End-to-end encrypted on your LAN
  • macOS ↔ Linux pairing

your network · two devices

MacBook · dictating (recording, ⌘ + ⇧ + space) → LAN, encrypted → Desktop · pasting (Claude Code, structured paste)

Audio captured on one machine, structured on its GPU, pasted on the other — over your LAN, never the cloud. Pick the destination per hotkey.

You already think out loud. Now ship the result.

Hit a hotkey. Talk through what you want — fillers, false starts, mid-sentence corrections and all. The on-device LLM turns it into the format you actually need.

What you said

42 seconds · raw

okay so I'm thinking about the auth refactor — the one we talked about last week. I want to, um, pull the rate limiting out of the middleware because right now it's hardcoded and we can't… yeah we can't test it in isolation. So step one is extract that into its own module, then add a config object so the limits are overridable per route, and uhhh I guess the third thing is we need to actually write tests for the new module because the old ones are bound to the middleware shape.

Hotkey released — structuring on your GPU

bullets → paste into Linear, Notion, your own notes
  • Extract rate limiting from auth middleware into its own module
  • Add config object: per-route limit overrides
  • Rewrite tests against the new module shape

prose → paste into a PR description, an email, a doc

Refactor the auth module by extracting rate limiting into a standalone module, exposing a config object so per-route limits can be overridden, then rewriting the tests against the new module shape rather than the middleware.

coder prompt → paste into Cursor, Claude, Gemini, ChatGPT

You are an expert TypeScript engineer. Refactor the auth middleware: (1) extract rate limiting into a standalone module, (2) expose a config object for per-route overrides, (3) rewrite tests against the new module shape. Keep the public middleware API stable.

Three templates ship today. More are coming, and the persona-templated prompts planned for v1.x will turn this into the voice layer for whatever AI you build with.
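
For readers curious how a template could steer an on-device model, here is a minimal Python sketch of one plausible approach: prepend a per-template instruction to the raw transcript. The template keys, the instruction wording, and the function `build_structuring_prompt` are assumptions made for illustration; the page does not describe VocaPulse's real prompts or model interface.

```python
# Hypothetical instructions, keyed by the template names used in the example above.
TEMPLATES = {
    "bullets": "Rewrite the transcript as a short bullet list of action items.",
    "prose": "Rewrite the transcript as one clear paragraph.",
    "coder": "Rewrite the transcript as an instruction prompt for a coding assistant.",
}

# Shared guidance appended to every template instruction (also an assumption).
COMMON_RULES = "Drop fillers and false starts. Keep every concrete decision."


def build_structuring_prompt(transcript: str, template: str) -> str:
    """Combine a template instruction with the raw transcript."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    # Collapse runs of whitespace and line breaks left over from dictation.
    cleaned = " ".join(transcript.split())
    return f"{TEMPLATES[template]}\n{COMMON_RULES}\n\nTranscript:\n{cleaned}"


if __name__ == "__main__":
    print(build_structuring_prompt("um so  refactor the auth thing", "bullets"))
```

The resulting string would be handed to whatever local model does the structuring. Keeping the templates in a plain dictionary makes it cheap to bind a different one to each hotkey.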

Join the waitlist

Get in before public launch.

Two ways in. Pick one — or just join the list and decide later.

  • Beta · 25 spots · free

    Pre-release builds in Q3 2026. You test; we listen and ship fixes weekly. Free during beta; six months Pro on the house when you graduate.

  • Early access · €7.99 / month

    Paid product before public launch. Locked at €7.99 for the first 12 months — even after public launch raises the price to €9.99.

  • Public launch · €9.99 / month

    Q3 / Q4 2026. Sign up after launch and this is your starting price.

No card required to join. We email you once when beta opens, once when early access opens, and never for anything else without asking.

Frequently asked

Answers, before you join the list.

No. Microphone capture, speech recognition, and structuring all run locally on your machine. Voice samples exist only in volatile memory during processing and are discarded immediately after. Your audio and your transcripts never reach any server.

The desktop app does open a small set of explicitly scoped network connections, none of which carry voice or transcript content:

  • License verification (about once a day; sends a peppered hash of your machine ID and an opaque device token)
  • Device activation when you set up a new machine
  • Application update checks
  • On-demand language-model downloads
  • Support tickets you actively submit
  • End-to-end-encrypted device sync (planned, opt-in)

Each is documented in /datenschutz §8 and §9.
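
The license check mentions a peppered hash of the machine ID. One common way to build such a value is a keyed hash (HMAC) with the pepper as the key, so the raw ID never appears on the wire. The Python sketch below assumes HMAC-SHA-256 and a hypothetical function name, `peppered_machine_hash`; the construction VocaPulse actually uses is not stated here.

```python
import hashlib
import hmac


def peppered_machine_hash(machine_id: str, pepper: bytes) -> str:
    """Return a hex digest that hides the machine ID behind a secret pepper.

    Assumption: HMAC-SHA-256 keyed with the pepper. Other constructions
    (for example hashing pepper + ID directly) would also count as "peppered".
    """
    return hmac.new(pepper, machine_id.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Both values below are made-up examples.
    print(peppered_machine_hash("example-machine-id", b"example-pepper"))
```

The same ID and pepper always give the same digest, which lets a server recognize a device again, while anyone without the pepper cannot confirm a guessed ID by hashing it themselves.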