How your data flows, and who else touches it
A note on honesty: this page describes what the product does today. Where a safeguard or option is not built yet, we say so rather than implying it exists. If anything here is unclear, email demo@lexicanon.com.
Plain-language glossary
- Sub-processor: An outside company we pass some of your data to so that we can provide the service, for example the speech-to-text service that turns audio into words.
- Transcription (speech-to-text): Turning the recorded audio into written text and labelling who spoke.
- AI model (LLM): The "large language model" that reads the transcript and writes the summary, decisions and action items.
- BYOK ("Bring Your Own Key"): You use your own account with a transcription or AI provider. The data goes to your account, under your contract with that provider, not ours.
- Self-hosted: You run Lexicanon on your own servers instead of ours, so storage and most processing stay inside your own infrastructure.
- Voiceprint: A set of numbers (a mathematical "fingerprint" of a voice) used to recognise the same speaker across meetings. It is not a recording, and it never leaves your workspace.
- Workspace (organisation): Your company's private area. Data in one workspace is never visible to another.
The three ways Lexicanon can run
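How much data leaves your control depends on which setup you choose: the managed service (hosted by us, using our provider accounts), BYOK (using your own provider accounts) or self-hosted (running on your own servers).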
The steps below are the same in every mode; what changes is where they run and whose provider accounts are used.
The journey of your data, step by step
- 1 · Capture. Your browser or desktop app records the meeting audio. No bot joins the call.
- 2 · Transcribe. A speech-to-text service turns the audio into text and labels who spoke. In a self-hosted setup this can run on your own server, so the audio never leaves it.
- 3 · Live insights. While the meeting runs, the transcript so far is sent to the AI model, which produces a short, rolling summary so you can follow along.
- 4 · Write-up (when you stop). The finished transcript is sent to the AI model once more to produce the structured result — the summary, decisions and action items.
- 5 · Store. The transcript, the result, and (optionally) the compressed audio are saved inside your workspace. Only members of your workspace can see them.
- 6 · Read & export. You open or export the result from the app whenever you need it.
Who else touches your data (sub-processors)
Below is every outside service, grouped by what it does. "Trains on your data?" means: could this provider use your content to improve its own AI? "Where it runs" (or "Where") is where the data is processed. "How to avoid it" shows how to keep your data out of our account with that provider, either by using your own account (BYOK) or by choosing a different option.
Transcription (speech-to-text)
| Service | Where it runs | Trains on your data? | How to avoid it |
|---|---|---|---|
| Speechmatics | European Union (Ireland) | Not publicly stated. Acts as a data processor under GDPR; we rely on their DPA. | Use BYOK or a different provider. |
| Microsoft Azure Speech | EU region (selectable) | No. Microsoft does not use your audio or transcripts to train its models. | BYOK; pick an EU region. |
| Soniox | EU region available | No, and it stores nothing by default. | BYOK. |
| Deepgram | European Union (EU endpoint) | No — we switch off their model-improvement program on every request. | BYOK. |
| AssemblyAI | European Union (EU endpoint) | By default, yes. We are still completing the opt-out (a manual, account-level request); until that is done, prefer a no-training provider (Azure, Soniox, Deepgram or local transcription) for sensitive content. | BYOK; or use a no-training provider. |
| Local transcription (runs on the server hosting Lexicanon) | Your own server (self-hosted) | No. The audio never leaves your server. | Keeps audio away from every external transcription service. |
AI models (live insights and write-up)
| Service | Where it runs | Trains on your data? | How to avoid it |
|---|---|---|---|
| Anthropic (Claude) | United States (EU contracting entity for EEA customers) | No — contractually prohibited from training on what you send via the API. | BYOK; or pick another model. |
| OpenAI | United States (an EU endpoint exists under additional agreement) | No — API data is not used to train its models by default. | BYOK; or pick another model. |
| OpenRouter (a router that forwards to a model you choose) | Depends on the model it routes to | OpenRouter itself: no, by default. Whether the underlying model provider trains on your data depends on your routing settings. | Use a direct provider (Anthropic or OpenAI) instead. |
Hosting, DNS and email
| Service | What it does | Where |
|---|---|---|
| Hetzner | Hosts our managed service and stores your data. | Germany (EEA) |
| Cloudflare | DNS only — it resolves our domain names. It does not sit in front of your traffic or see meeting content. | Global DNS |
| Resend | Sends account and notification emails (e.g. invitations, alerts). Sees names, email addresses and message text. | European Union (Ireland) |
Where your data is processed (EU residency)
Honest summary: not every provider is EU-based today.
- Already in the EU: our hosting (Germany), email (Resend, Ireland), and all of our cloud transcription options (Speechmatics, Azure, Soniox, Deepgram and AssemblyAI) through their EU endpoints or regions.
- Currently outside the EU: the Anthropic and OpenAI AI models run in the US (transfers are covered by Standard Contractual Clauses), and OpenRouter's location depends on the model it routes to. Using BYOK with an EU-resident AI endpoint, such as OpenAI's EU endpoint (which needs an additional agreement), can change this. Cloudflare's DNS is global, but it does not see meeting content.
- Your options for EU-only processing: choose the EU-based transcription providers above, use BYOK to route through your own EU accounts, or self-host so audio and storage stay on your infrastructure. Note: the live-insights and write-up steps still call an external AI provider; there is no fully local AI model option yet.
How your data is protected
The measures below are built into the product today:
- Walled-off workspaces. Every request and every stored record is tied to your organisation. People in another workspace cannot reach your data, and the server refuses any request that crosses that line.
- Encrypted connections. All traffic to and from the service is encrypted (TLS).
- Locked-down servers. The application runs as a non-administrator user with its operating-system privileges reduced to the minimum (no extra permissions, no privilege escalation, and a standard kernel sandbox).
- Encrypted keys. When you bring your own provider keys, they are encrypted before they are stored (AES-256-GCM), as long as the deployment has its encryption key configured.
- Audit trail. Sensitive actions — sign-ins, deletions, member and settings changes — are recorded per workspace and visible to your administrators.
- Voiceprints stay put. Voice recognition uses a numeric fingerprint, never the audio itself, and it never leaves your workspace.
Where your data lives and how long
- Location. In your workspace — on our servers in Germany for the managed service, or on your own servers if you self-host. Only your workspace members can see it.
- How long. We keep your data until you delete it. There is no automatic deletion schedule today.
- Deleting. Permanently deleting a meeting erases it completely, including the transcript, the analysis, the audio recordings and every related record; it does not merely mark the meeting as hidden. To erase an entire workspace at once, contact us.
- Export. You can export any meeting as PDF, Word/HTML, Markdown or plain text, with or without the transcript.
Open items we're being transparent about
Rather than hide them, here are the gaps between what's ideal and what's built today:
- AssemblyAI training opt-out. AssemblyAI now runs in the EU for us, but it still trains on submitted audio by default. Opting out is a manual, account-level request that we are in the process of completing. Until it is done, prefer a no-training provider (Azure, Soniox, Deepgram, or local transcription on a self-hosted server) for sensitive content.
- Automatic deletion. Time-based retention and expiry are not built yet; data is kept until you delete it. Whole-workspace erasure is handled on request.
- Fully offline mode. A mode with a local AI model and zero external calls does not exist yet; the live-insights and write-up steps always use an external AI provider.