INSIGHT
How Journalia ensures accurate transcription: from audio to text
We work in two steps: first we transcribe, then we generate the note. Here we describe the first, and perhaps the most important, step.
Leon Sandøy
Founding Engineer · 22 March 2026
When people ask what Journalia does, the simple answer is that we create clinical notes from conversations. Beneath the surface, though, two separate processes are at work: first we convert speech to text (transcription), then we use that text to generate structured clinical notes. Both steps matter, but this post is about the first, because however good the note generation is, one basic principle applies: a note can never be better than the transcription it is built from.
Why transcription quality matters most
There's an old saying in computing: “garbage in, garbage out.” It applies in full force here. If the transcription hears “paracetamol” when the doctor says “codeine,” or mixes up who said what in a consultation, then it doesn't matter how advanced the note generation is. The result will be wrong.
AI-powered speech recognition offers several technical paths, and the field is evolving quickly: new models and new approaches bring new possibilities, but also new challenges. Our goal is to get transcription right every time, for every consultation.
The challenges we face
Audio quality starts with the microphone
The most basic factor is the audio recording itself. If the microphone doesn't capture the conversation properly (because it's poorly positioned, the room has excessive echo, or there's background noise), the AI simply doesn't have enough to work with. It's like asking someone to read a text that's half covered: even the most attentive reader will guess wrong.
Clinical vocabulary is its own language
Healthcare professionals speak a specialised language that is extremely demanding for general-purpose speech recognition systems. The difficulty isn't only Latin medical terms like “pneumothorax” or “myocardial infarction”; just as important are the brand names and profession-specific terminology used daily in clinical practice.
- Medications: brand names that are unique to each market and language, rarely appearing in standard training data.
- Terms that vary by profession: a physiotherapist discusses mobilisation, traction tests and trigger-point treatment. A GP talks about spirometry, audiometry and differential diagnoses. A psychologist about mentalisation-based therapy and affect consciousness.
Who said what?
In a clinical consultation, it's not enough to know what was said; it's equally important to know who said it. When the doctor asks “Do you have chest pain?” and the patient responds “Yes, it's been like that for three weeks,” that information must be attributed to the right person for the note to be correct.
Multiple languages, one consultation
Many consultations aren't conducted in a single language. The patient may speak a different language from the clinician, and some consultations involve an interpreter, so the conversation switches back and forth between languages. Automatic language detection is far from trivial, especially with accented speech or switches in the middle of a sentence.
Speed versus accuracy
Not every situation calls for the same trade-off. Sometimes the clinician needs the note quickly, between patients with five minutes until the next consultation. At other times, accuracy matters more than speed. No single model excels at everything.
Privacy — for the patient, not for us
Perhaps the most important aspect is the least technical: patients' right to privacy. People don't want audio recordings of their most vulnerable moments stored anywhere. This is a non-negotiable requirement.
How we address it
We include a suitable microphone
Because audio quality is so important, we recommend a specific microphone and include it for new users. We want the experience to be as good as possible from day one. Beyond that, Journalia monitors audio quality in real time and alerts users when something is off. This alone has had a measurable impact on transcription quality.
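To make real-time monitoring more concrete, here is a minimal sketch of one way an audio-quality check could work, assuming the audio arrives as frames of 16-bit PCM samples held in Python lists. It flags frames that are too quiet (suggesting a distant or muted microphone) or clipped (suggesting the input gain is too high). The function `check_frame` and both thresholds are hypothetical illustrations, not Journalia's actual implementation or settings.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only.
QUIET_DBFS = -45.0   # frames quieter than this suggest a distant or muted microphone
CLIP_RATIO = 0.01    # more than 1% clipped samples suggests the input gain is too high
INT16_MAX = 32767


@dataclass
class FrameReport:
    rms_dbfs: float
    clipped_ratio: float
    warnings: list[str]


def check_frame(samples: list[int]) -> FrameReport:
    """Assess one frame of 16-bit PCM samples and return any quality warnings."""
    if not samples:
        return FrameReport(float("-inf"), 0.0, ["empty frame"])
    # Root-mean-square level relative to full scale (0 dBFS = maximum amplitude).
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    rms_dbfs = 20 * math.log10(rms / INT16_MAX) if rms > 0 else float("-inf")
    # Fraction of samples sitting at the int16 limits, i.e. clipped.
    clipped = sum(1 for s in samples if abs(s) >= INT16_MAX) / len(samples)
    warnings = []
    if rms_dbfs < QUIET_DBFS:
        warnings.append("signal too quiet: check microphone position")
    if clipped > CLIP_RATIO:
        warnings.append("clipping detected: reduce input gain")
    return FrameReport(rms_dbfs, clipped, warnings)
```

A real monitor would likely smooth these reports over several frames before alerting, so that a natural pause in the conversation doesn't trigger a warning.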
Custom clinical vocabulary
We tune our speech recognition with clinical vocabulary adapted to each profession. A physiotherapist gets a different vocabulary profile from a GP, which in turn differs from a psychologist's. The result is high accuracy on medical terminology and a system that genuinely “understands” what you're talking about.
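One simple way to represent per-profession vocabulary is a mapping from profession to terms, merged with a shared clinical list into a single hint list for a speech recognition engine that accepts phrase hints. The sketch below assumes such an engine; the profiles, `build_vocabulary_hints` and `MAX_HINTS` are hypothetical, and the terms are borrowed from the examples earlier in this post.

```python
# Hypothetical vocabulary profiles built from the examples in this post;
# real profiles would be far larger.
SHARED_TERMS = ["pneumothorax", "myocardial infarction", "paracetamol", "codeine"]

PROFESSION_TERMS = {
    "physiotherapist": ["mobilisation", "traction test", "trigger-point treatment"],
    "gp": ["spirometry", "audiometry", "differential diagnosis"],
    "psychologist": ["mentalisation-based therapy", "affect consciousness"],
}

MAX_HINTS = 500  # assumed cap; engines that accept phrase hints typically limit their number


def build_vocabulary_hints(profession: str) -> list[str]:
    """Merge profession-specific and shared terms into one de-duplicated hint list."""
    if profession not in PROFESSION_TERMS:
        raise ValueError(f"unknown profession: {profession!r}")
    hints: list[str] = []
    seen: set[str] = set()
    # Profession terms come first, so truncation drops shared terms before specialised ones.
    for term in PROFESSION_TERMS[profession] + SHARED_TERMS:
        key = term.casefold()
        if key not in seen:  # skip case-insensitive duplicates
            seen.add(key)
            hints.append(term)
    return hints[:MAX_HINTS]
```

For example, `build_vocabulary_hints("gp")` starts with spirometry, audiometry and differential diagnosis, followed by the shared terms, and contains none of the physiotherapy vocabulary.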
Speaker identification — your voice, 30 seconds
We use two technologies working together. Diarisation analyses the audio stream in real time and distinguishes between different voices. Speaker identification goes further: you spend 30 seconds training Journalia on your voice, and from then on the system recognises you across all consultations, clearly separating clinician from patient.
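A common way to combine diarisation with speaker identification is to compare voice embeddings: the enrolment recording yields a voiceprint, and the diarised cluster whose average embedding is closest to it (by cosine similarity, above a threshold) is labelled as the clinician. The sketch below assumes an upstream model that turns audio segments into fixed-length embedding vectors; `enrol`, `label_speakers` and the 0.7 threshold are hypothetical, not Journalia's method. For simplicity, every cluster that doesn't match is labelled “patient”, although a consultation with an interpreter would need a further category.

```python
import math

Vector = list[float]


def _normalise(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _mean(vectors: list[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: 1.0 for identical directions, lower for dissimilar voices."""
    return sum(x * y for x, y in zip(_normalise(a), _normalise(b)))


def enrol(embeddings: list[Vector]) -> Vector:
    """Build a voiceprint from embeddings of the ~30-second enrolment recording."""
    return _normalise(_mean(embeddings))


def label_speakers(clusters: dict[str, list[Vector]], voiceprint: Vector,
                   threshold: float = 0.7) -> dict[str, str]:
    """Map diarisation cluster ids to "clinician" or "patient"."""
    if not clusters:
        return {}
    # Compare each cluster's average embedding with the enrolled voiceprint.
    scores = {cid: cosine(_mean(embs), voiceprint) for cid, embs in clusters.items()}
    best = max(scores, key=scores.get)
    # Only the single best match, and only above the threshold, counts as the clinician.
    return {
        cid: "clinician" if cid == best and scores[cid] >= threshold else "patient"
        for cid in scores
    }
```

Requiring both the best score and a minimum similarity means a consultation in which the enrolled clinician never speaks is not forced to contain a “clinician”.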
Language selection gives control
Rather than relying on automatic language detection, we give users full control: choose from 50+ languages before starting transcription. This simple choice delivers far better results than letting the machine guess.
Multiple models, chosen by situation
Instead of relying on a single speech recognition model, we use several and choose between them based on the situation. As they say in AI research, “there's no free lunch”, but by choosing the right tool for each job we give users the best of both worlds: speed when the clinician needs it, and accuracy when it matters most.
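A situation-based choice between models can be framed as a small selection rule: among the models that satisfy the hard constraints (for example, streaming output), pick the most accurate one whose estimated processing time fits the latency budget, and fall back to the fastest if none fits. In this sketch the model names, real-time factors and word error rates are placeholders rather than measurements, and `choose_model` is a hypothetical helper, not Journalia's actual selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsrModel:
    name: str
    realtime_factor: float  # processing time / audio duration (lower is faster)
    expected_wer: float     # expected word error rate (lower is better)
    streaming: bool         # can emit text while the consultation is still going


# Hypothetical catalogue: names and numbers are placeholders, not real measurements.
MODELS = [
    AsrModel("fast-streaming", realtime_factor=0.1, expected_wer=0.12, streaming=True),
    AsrModel("balanced", realtime_factor=0.3, expected_wer=0.09, streaming=False),
    AsrModel("high-accuracy", realtime_factor=1.0, expected_wer=0.06, streaming=False),
]


def choose_model(audio_seconds: float, latency_budget_s: float,
                 need_streaming: bool = False) -> AsrModel:
    """Pick the most accurate model whose estimated processing time fits the budget."""
    candidates = [m for m in MODELS if m.streaming or not need_streaming]
    if not candidates:
        raise ValueError("no model satisfies the streaming requirement")
    fitting = [m for m in candidates
               if m.realtime_factor * audio_seconds <= latency_budget_s]
    if fitting:
        return min(fitting, key=lambda m: m.expected_wer)
    # Nothing fits the budget: fall back to the fastest candidate.
    return min(candidates, key=lambda m: m.realtime_factor)
```

For a ten-minute consultation (600 s of audio) with a 200 s budget, this rule picks `balanced`: both `fast-streaming` (about 60 s) and `balanced` (about 180 s) fit, and `balanced` has the lower expected error rate.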
Audio that's not stored
Privacy isn't a feature we added; it's a basic design principle. The audio stream is encrypted immediately and sent directly from the user's browser for processing. Audio is processed in real time and never stored, neither by us nor by any third party. All processing happens within the EU, in compliance with the GDPR and healthcare regulations. Read more on our security page.
It's about how we work
Technology alone isn't enough. Much of what makes Journalia's transcription good comes down to how we work as a team: continuous monitoring, regular model evaluation and ongoing feedback from clinicians who use the system every day.
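Model evaluation in speech recognition commonly relies on word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the system's output into a reference transcript, divided by the number of words in the reference. The sketch below computes it with the standard edit-distance dynamic programme. It illustrates the metric rather than describing how Journalia evaluates its models, and it only lowercases and splits on whitespace instead of normalising punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev holds the previous row of the edit-distance table:
    # prev[j] = edits needed to turn the first i-1 reference words into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Hearing “paracetamol” instead of “codeine” in the six-word sentence “the patient takes codeine twice daily” gives a WER of 1/6, which looks modest on paper even though the resulting note would be clinically wrong; a single aggregate score is rarely enough on its own.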
Transcription isn't a “solved problem.” It's a field in constant evolution, and our job is to stay at the forefront.