Speak rambling. Ship clean prompts.
Locally.
Your voice is yours. Local Whisper transcribes; an on-device LLM reshapes the rambling into clean prose, bullets, or Markdown — all on your GPU.
Speak on one machine. Paste into the agent on another.
LAN-only · multi-device · v1.X
Voice to text · on device · private
How it works
From hotkey to clean paste, in seconds.
The overlay shows up wherever you're typing. Speech recognition runs on your own GPU. A second hotkey reshapes the raw transcript — and turns it into a real prompt for whichever AI you're building with.
Privacy isn't a setting. It's the path your audio takes.
Most dictation tools route your voice through their servers, their models, and their disks. VocaPulse doesn't. Compare the two paths — the difference is structural, not a policy you're asked to trust.
Cloud dictation
typical path
You speak
Audio uploaded
leaves your machine
Vendor server
us-east-1 / unknown region
Their transcription model
Their LLM cleanup
Stored on their disk
subject to their retention
Text returned to you
VocaPulse
local path
You speak
Audio in RAM
volatile, never written
Whisper · on your GPU
Structuring LLM · on your GPU
Text in your editor
Audio discarded
frame-by-frame, in memory
Network is used elsewhere — app updates, model downloads, license check, and (opt-in) end-to-end-encrypted device sync.
From the founder
Manuel Evers
Nürnberg, 2026
I caught myself dictating something private into a tool whose privacy posture I didn't actually understand. Audio was clearly leaving my machine. The music I had on auto-paused mid-sentence and never resumed. The cherry on top: it pasted into whichever window happened to have focus — except the way I work is Claude Code on the laptop, Cursor on the desktop, half a dozen agents in flight, voice and hands constantly switching frames. The tool wasn't built for that. It was built to ship.
I've spent twenty years close to enterprise software — Atlassian deployments, GDPR audits, the meetings where someone always asks where the data sits. The lesson I took wasn't “enterprises are a market.” It was: privacy as a policy you trust on faith breaks. Privacy as architecture doesn't. That difference is structural, and you can feel it the moment a tool is built one way versus the other.
Modern laptops can run Whisper and a small structuring model on the GPU without breaking a sweat. The cloud round-trip is a business decision, not an engineering one. So I built the version I wanted to use. Audio stays in RAM. Transcription happens on your machine. Structuring happens on your machine. And the vision I'm building toward — because I need it personally — is dictating into the laptop you carry and having the structured output land in the agent on the desktop you're sitting at: peer-to-peer over your LAN, encrypted, no cloud relay.
Honest reason I care: I don't trust how data ages. Not the company that holds it today, not the regulation around it tomorrow, not the model it might quietly train next year. If you flow between agents the way I do — and you'd rather your voice didn't follow — this is for you.
What makes it different
Three reasons it isn't just another voice tool.
Privacy. Structure. Cross-device routing. The first two ship today. The third turns this from a single-machine dictation tool into a voice layer that follows you between your laptop and your desktop — without the cloud round-trip.
Privacy first
Your voice never leaves your machine.
Whisper, the structuring LLM, and the prompt templating all run on your hardware. There is no cloud route. We can't see what you say. We don't want to.
- No telemetry, no analytics
- Audio held in volatile memory only
- Encrypted local history (AES-256-GCM)
Your audio is never transmitted.
$ tcpdump -i any host vocapulse.app → 0 packets during recording
Speech recognition runs on your hardware.
$ nvidia-smi → whisper model on your GPU
Structuring runs on your hardware.
$ nvidia-smi → structure model on your GPU
Verifiable on your own machine. Not a privacy policy.
Network used elsewhere: app updates, model downloads, license check, and (opt-in) end-to-end-encrypted device sync.
Speak rambling. Ship structured.
An on-device LLM rewrites the way you actually talk.
You stop mid-sentence. You self-correct. You think out loud. The structuring model turns all of it into prose, bullet points, or Markdown — before it lands in your editor.
- Three output templates today
- Selectable per hotkey
- Runs on consumer GPUs (4 GB+ VRAM), falls back to CPU
You said
"Um so I think we should probably refactor the auth thing and like add rate limiting maybe, and yeah we definitely should update the tests too"
You get
- Refactor auth module
- Add rate limiting
- Update tests
template: bullets · prose · markdown
Multi-device routing
Coming v1.X
Speak on the laptop you carry. Paste into the agent on the desktop you sit at.
The audio, the transcript, and the structured output never leave your local network. Peer-to-peer, encrypted, no relay through anyone's cloud — yours included. Pair the devices once; pick the destination per hotkey from then on.
- Pick the destination device per hotkey
- End-to-end encrypted on your LAN
- macOS ↔ Linux pairing
MacBook · dictating
recording
⌘ + ⇧ + space
Desktop · pasting
Claude Code
▎structured paste
Audio captured on one machine, structured on its GPU, pasted on the other — over your LAN, never the cloud. Pick the destination per hotkey.
You already think out loud. Now ship the result.
Hit a hotkey. Talk through what you want — fillers, false starts, mid-sentence corrections and all. The on-device LLM turns it into the format you actually need.
What you said
42 seconds · raw
okay so I'm thinking about the auth refactor — the one we talked about last week. I want to, um, pull the rate limiting out of the middleware because right now it's hardcoded and we can't… yeah we can't test it in isolation. So step one is extract that into its own module, then add a config object so the limits are overridable per route, and uhhh I guess the third thing is we need to actually write tests for the new module because the old ones are bound to the middleware shape.
Hotkey released — structuring on your GPU
- Extract rate limiting from auth middleware into its own module
- Add config object: per-route limit overrides
- Rewrite tests against the new module shape
Refactor the auth module by extracting rate limiting into a standalone module, exposing a config object so per-route limits can be overridden, then rewriting the tests against the new module shape rather than the middleware.
You are an expert TypeScript engineer. Refactor the auth middleware: (1) extract rate limiting into a standalone module, (2) expose a config object for per-route overrides, (3) rewrite tests against the new module shape. Keep the public middleware API stable.
Three templates ship today. More are coming — and the persona-templated prompts in v1.X turn this into the voice layer for whatever AI you build with.
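For readers curious how one structured result could feed more than one output format, here is a minimal sketch. It is an illustration only, not VocaPulse's implementation: `StructuredNote`, `render_bullets`, and `render_persona_prompt` are hypothetical names, and the sketch assumes the structuring model has already reduced the transcript to a subject plus a list of steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredNote:
    """Hypothetical container for what the structuring step extracted."""
    subject: str
    steps: tuple[str, ...]


def render_bullets(note: StructuredNote) -> str:
    """Render one bullet per step."""
    return "\n".join(f"- {step}" for step in note.steps)


def render_persona_prompt(note: StructuredNote, persona: str, constraint: str = "") -> str:
    """Wrap the same steps in a persona-style prompt with numbered tasks."""
    numbered = ", ".join(f"({i}) {step}" for i, step in enumerate(note.steps, start=1))
    prompt = f"You are {persona}. {note.subject}: {numbered}."
    if constraint:
        # Optional trailing instruction, appended as its own sentence.
        prompt += f" {constraint}"
    return prompt


note = StructuredNote(
    subject="Refactor the auth middleware",
    steps=(
        "extract rate limiting into a standalone module",
        "expose a config object for per-route overrides",
        "rewrite tests against the new module shape",
    ),
)

print(render_bullets(note))
print(render_persona_prompt(
    note,
    persona="an expert TypeScript engineer",
    constraint="Keep the public middleware API stable.",
))
```

The point of the sketch is that the template is only a rendering choice: the extracted steps stay the same, and the hotkey would decide which renderer runs.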
Join the waitlist
Get in before public launch.
Two ways in. Pick one — or just join the list and decide later.
Beta · 25 spots · free
Pre-release builds in Q3 2026. You test; we listen and ship fixes weekly. Free during beta; six months Pro on the house when you graduate.
Early access · €7.99 / month
Paid product before public launch. Locked at €7.99 for the first 12 months — even after public launch raises the price to €9.99.
Public launch · €9.99 / month
Q3 / Q4 2026. Sign up after launch and this is your starting price.
No card required to join. We email you once when beta opens, once when early access opens, and never for anything else without asking.
Frequently asked
Answers, before you join the list.
No. Microphone capture, speech recognition, and structuring all run locally on your machine. Voice samples exist only in volatile memory during processing and are discarded immediately after. Your audio and your transcripts never reach any server.
The desktop app does open a small set of explicitly scoped network connections, none of which carry voice or transcript content:
- License verification (about once a day; sends a peppered hash of your machine ID and an opaque device token)
- Device activation when you set up a new machine
- Application update checks
- On-demand language-model downloads
- Support tickets you actively submit
- End-to-end-encrypted device sync (planned, opt-in)
Each is documented in /datenschutz §8 and §9.
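To make "peppered hash" concrete, here is a minimal sketch of one common construction. The page does not say which algorithm VocaPulse uses, so HMAC-SHA256 is an assumption, as are the function name `peppered_machine_hash`, the sample machine ID, and the sample pepper value.

```python
import hashlib
import hmac


def peppered_machine_hash(machine_id: str, pepper: bytes) -> str:
    """Return a hex digest of the machine ID keyed with a secret pepper.

    Assumption: HMAC-SHA256 stands in for whatever construction the app
    actually uses. The raw machine ID never appears in the output.
    """
    return hmac.new(pepper, machine_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical values, for illustration only.
digest = peppered_machine_hash("example-machine-id", pepper=b"example-pepper")
print(len(digest))  # 64 hex characters for SHA-256
```

The same machine ID and pepper always produce the same digest, so a license server could recognize a returning device without receiving the identifier itself. A different pepper yields an unrelated digest.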