Introduction
We measured the effect of Minikai on the quality of notes written by nurses working in aged care. We analysed notes from 14 nurses at one aged residential care provider, comparing October 2025 (before any Minikai use) with April 2026 (after several months of use), and comparing heavy users with light users and non-users.
Notes from nurses who used Minikai heavily improved meaningfully relative to notes from light users and non-users. Gains in length and SBAR completeness were statistically significant. Specificity also improved, but the gain fell just short of statistical significance (p = 0.057).
Minikai is an AI-native clinical support system built for the disability and aged care sector. It holds context for each participant, including their history, preferences, goals, and the care they have received, and uses that context to assist the clinical and support team. Providers across Australia and New Zealand use it across roles including support workers, care managers, quality and governance teams, and clinical leads, and for a wider set of use cases than those discussed here.
What notes are and why they are important in aged care
A note is the core record of what happens with a participant over time. The next nurse on shift reads it to know how to look after the person. It is drawn on to update the doctor on what has been happening clinically and to update family. It is the evidence of care needs that funding relies on, and auditors use it to understand the care delivered. A richer and better-structured note gives every downstream reader more to work from.
For the nurses doing the writing.
Minikai helps them include the relevant details in the note, saving them time. As the provider put it: “We have consistently seen nurses finishing their shifts on time.”
For evidence of care needs in funding.
Funding decisions in aged care rely on documented evidence of each participant's care needs and how those needs change over time. In Australia, the Australian National Aged Care Classification (AN-ACC) uses participant records to inform funding classification and reassessment. Richer, more specific notes give these funding decisions a stronger evidence base.
For visibility of care quality.
Better quality notes give providers a clearer view of the care being delivered. The Australian Aged Care Quality and Safety Commission also looks at notes during audits, so the same documentation supports the Commission's assessment.
What existing research has shown
Existing research on AI and the quality of clinical documentation has largely focused on doctors in outpatient and ambulatory care settings.
Sasseville et al. (2025) reviewed eight AI scribe studies and found documentation efficiency and quality benefits, though accuracy and performance varied significantly across systems. The included studies were largely in outpatient or primary care, with doctors as the primary users.
Etagiuri et al. (2026) ran a randomised controlled trial of an AI scribe with a single adult reconstruction surgeon. Surgeon time per encounter fell from 13.90 to 11.76 minutes, and patient letters written with AI assistance were rated as more complete, professional, and clear.
In this article we look at Minikai's impact on the quality of notes written by nurses in residential aged care. Note quality also underpins other workflows Minikai supports, like funding workflows that draw on documented care needs, and compliance and audit workflows.
How we measured the change
Cohort
Six nurses in the treatment group were heavy users, each having completed 300 or more tasks with Minikai. Eight nurses in the control group were non-users or very light users, each having completed 27 or fewer tasks with Minikai. All 14 worked at the same provider.
Sample
For each nurse we randomly sampled 30 notes from October 2025 (before any Minikai exposure) and 30 from April 2026 (after several months of Minikai availability), giving 840 notes in total. We then restricted the analysis to the 286 of these that were event-driven. We classify a note as event-driven if it documents something that required clinical judgement or intervention: a fall, a new symptom, a behavioural incident, or a clinical assessment finding. All other notes are routine notes describing business-as-usual care, for example “Checks maintained by staff. Nil new concerns reported.”
We focus on event-driven notes because they demand more detail. A note about a participant found on the floor at 2am should give a full account of what happened, concrete observations such as times and numbers, and a clear escalation plan.
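The sampling design above amounts to a stratified random draw with one stratum per nurse and period. The sketch below is a minimal illustration under assumptions: the note store, the note ID format, the `sample_notes` and `keep_event_driven` helpers and the seed are all hypothetical, and the article does not say how the random draw was made. The event-driven filter accepts any predicate, standing in for the large language model classifier.

```python
import random

NOTES_PER_CELL = 30  # notes drawn per nurse per period
PERIODS = ("2025-10", "2026-04")  # pre and post periods

def sample_notes(notes_by_cell, seed=0):
    """Simple random sample of NOTES_PER_CELL notes for each (nurse, period) cell.

    notes_by_cell: hypothetical mapping of (nurse_id, period) -> list of note IDs.
    Returns a list of (nurse_id, period, note_id) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    sampled = []
    for (nurse_id, period), note_ids in sorted(notes_by_cell.items()):
        # Sampling without replacement; assumes every cell holds at least 30 notes
        for note_id in rng.sample(note_ids, NOTES_PER_CELL):
            sampled.append((nurse_id, period, note_id))
    return sampled

def keep_event_driven(sampled, is_event_driven):
    """Keep notes flagged as event-driven by a classifier (here, any predicate on the ID)."""
    return [row for row in sampled if is_event_driven(row[2])]

# Toy data: 14 nurses x 2 periods, 100 placeholder note IDs per cell
toy = {
    (f"nurse_{i:02d}", period): [f"{i}-{period}-{j}" for j in range(100)]
    for i in range(14)
    for period in PERIODS
}
sampled = sample_notes(toy)
print(len(sampled))  # 840 = 14 nurses x 2 periods x 30 notes
```

Stratifying by nurse and period guarantees every nurse contributes equally to both periods, so no single prolific note-writer dominates either side of the comparison.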
Measures
Each note was scored against three measures:
- Length: the word count of the note, taken as a proxy for how much information was recorded.
- Specificity: a 1 to 5 score for how much the note uses concrete observations like times, numbers, specific symptoms and specific interventions rather than vague phrases such as “some distress” or “family informed”.
- SBAR: a score of 0 (not at all), 0.5 (partial), or 1 (aligned) for how closely the note follows the four-part Situation, Background, Assessment, Recommendation framework used in nursing for structured communication. A review of nursing documentation frameworks found that “SBAR was particularly effective in high-acuity settings, while SOAP and PIE were beneficial in routine care and ongoing patient assessment” (Hidalgo Tapia et al., 2025).
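To make the three measures concrete, the sketch below records one note's scores and enforces the allowed values for each. The `NoteScores` class and `word_count` helper are hypothetical, and counting words by splitting on whitespace is an assumption; the article does not say how words were tokenised.

```python
from dataclasses import dataclass

# Allowed SBAR scores and their meaning, as defined above
SBAR_LEVELS = {0.0: "not at all", 0.5: "partial", 1.0: "aligned"}

def word_count(text: str) -> int:
    """Length measure: number of whitespace-separated tokens (assumed tokenisation)."""
    return len(text.split())

@dataclass(frozen=True)
class NoteScores:
    length: int       # word count of the note
    specificity: int  # 1 (vague phrases) to 5 (concrete times, numbers, symptoms, interventions)
    sbar: float       # 0, 0.5 or 1 for alignment with SBAR

    def __post_init__(self):
        if self.length < 0:
            raise ValueError("length must be non-negative")
        if self.specificity not in range(1, 6):
            raise ValueError("specificity must be an integer from 1 to 5")
        if self.sbar not in SBAR_LEVELS:
            raise ValueError("sbar must be 0, 0.5 or 1")

routine = "Checks maintained by staff. Nil new concerns reported."
# Specificity and SBAR values here are illustrative only, not scores from the study
scores = NoteScores(length=word_count(routine), specificity=1, sbar=0.0)
print(scores.length)  # 8
```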
A large language model, blind to group and period, classified each note as event-driven or routine and scored its specificity and SBAR alignment. A human then sense-checked its output.
Statistical method
Nurses who use Minikai heavily may differ from those who don't, and underlying trends may affect both groups over the period. Difference-in-differences addresses both, provided the two groups would otherwise have followed parallel trends. We compare how much the heavy users' notes changed between October and April with how much the control group's notes changed over the same period. If both groups improved by the same amount, the improvement cannot be attributed to Minikai. The larger the treatment group's improvement relative to the control group's, the larger the estimated Minikai effect.
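The comparison described above reduces to one line of arithmetic on four group means: the treatment group's change minus the control group's change. The means below are hypothetical, chosen only to illustrate the calculation; they are not the study's figures.

```python
def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Difference-in-differences: (treatment change) minus (control change)."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical mean word counts, for illustration only
effect = diff_in_diff(treat_pre=60.0, treat_post=100.0, control_pre=55.0, control_post=65.0)
print(effect)  # 30.0: treatment gained 40 words, control gained 10

# If both groups improve by the same amount, the estimated effect is zero
print(diff_in_diff(50, 70, 40, 60))  # 0
```

Subtracting the control group's change removes any improvement that both groups share, such as an organisation-wide push on documentation over the same months.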
To assess statistical significance, we fit an ordinary least squares (OLS) regression and apply a wild cluster bootstrap at the nurse level (999 iterations, Rademacher weights). With only 14 clusters, the standard cluster-robust formula understates uncertainty; the wild cluster bootstrap is designed to correct for this small-sample bias. We reject the null hypothesis of no Minikai effect when p < 0.05.
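The regression and bootstrap can be sketched with NumPy. The article does not give the regression specification, so this sketch assumes the standard two-group, two-period form y = b0 + b1 * treat + b2 * post + b3 * treat * post + error, where b3 is the difference-in-differences estimate. It also assumes the common restricted variant of the wild cluster bootstrap-t (the null b3 = 0 is imposed when generating bootstrap samples), CR1 cluster-robust standard errors and a symmetric p-value; the study's actual implementation may differ. The data are simulated and all function names are hypothetical.

```python
import numpy as np

def design_matrix(treat, post):
    """Columns: intercept, treat, post, and the treat x post interaction (the DiD term)."""
    treat = np.asarray(treat, dtype=float)
    post = np.asarray(post, dtype=float)
    return np.column_stack([np.ones_like(treat), treat, post, treat * post])

def cluster_se(X, resid, clusters, XtX_inv):
    """CR1 cluster-robust standard errors (sandwich estimator with small-sample scaling)."""
    n, k = X.shape
    groups = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in groups:
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]  # summed score over one nurse's notes
        meat += np.outer(score, score)
    G = len(groups)
    scale = G / (G - 1) * (n - 1) / (n - k)
    return np.sqrt(np.diag(scale * XtX_inv @ meat @ XtX_inv))

def wild_cluster_bootstrap_did(y, treat, post, clusters, reps=999, seed=0):
    """Return the DiD estimate and its restricted wild cluster bootstrap-t p-value."""
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    X = design_matrix(treat, post)
    XtX_inv = np.linalg.inv(X.T @ X)

    beta = XtX_inv @ X.T @ y
    t_obs = beta[3] / cluster_se(X, y - X @ beta, clusters, XtX_inv)[3]

    # Restricted fit imposing the null of no Minikai effect (interaction column dropped)
    X0 = X[:, :3]
    fitted0 = X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]
    resid0 = y - fitted0

    rng = np.random.default_rng(seed)
    groups, cluster_idx = np.unique(clusters, return_inverse=True)
    t_boot = np.empty(reps)
    for b in range(reps):
        # One Rademacher weight (+1 or -1) per nurse, shared by all of that nurse's notes
        w = rng.choice([-1.0, 1.0], size=len(groups))
        y_star = fitted0 + resid0 * w[cluster_idx]
        beta_star = XtX_inv @ X.T @ y_star
        se_star = cluster_se(X, y_star - X @ beta_star, clusters, XtX_inv)[3]
        # Bootstrap data satisfy the null, so these t-statistics form the reference distribution
        t_boot[b] = beta_star[3] / se_star
    p_value = np.mean(np.abs(t_boot) >= np.abs(t_obs))
    return beta[3], p_value

# Simulated data: 14 nurses, 6 in the treatment group, 10 notes per nurse per period
sim = np.random.default_rng(42)
nurse = np.repeat(np.arange(14), 20)
treat = (nurse < 6).astype(float)
post = np.tile(np.repeat([0.0, 1.0], 10), 14)
y = 50 + 5 * treat + 10 * post + 30 * treat * post + sim.normal(0, 15, nurse.size)
effect, p = wild_cluster_bootstrap_did(y, treat, post, nurse)
print(f"DiD estimate: {effect:.1f}, bootstrap p-value: {p:.3f}")
```

In this saturated two-by-two model, the OLS coefficient b3 equals the four-means difference-in-differences exactly; the regression exists to attach a standard error and p-value to it. With 14 clusters there are only 2^14 = 16,384 distinct Rademacher weight vectors, so some implementations enumerate them all instead of drawing 999 at random.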
A selection-bias check
If the large language model classified notes written with Minikai as event-driven at a different rate, this could change which notes entered the analysis sample and distort all three measures. We checked the share of notes flagged as event-driven in each group in each period. If Minikai use were shifting event classification, we would expect the treatment share to move over the period while the control share held steady. Instead, both shares appear stable (treatment 45% to 42%, control 26% to 28%), so we find no sign of this.
| Group | Oct 2025 (pre) | Apr 2026 (post) |
|---|---|---|
| Treatment | 45% | 42% |
| Control | 26% | 28% |
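The shares in the table can be cross-checked against the sample design: 6 treatment and 8 control nurses, each with 30 sampled notes per period. Multiplying each share by its cell size gives implied event-driven counts that sum to about 286, consistent with the analysis sample. This is a back-of-envelope sketch; the shares are rounded to whole percentages, so the implied counts are approximate.

```python
# Sample design from the Cohort and Sample sections
NURSES = {"Treatment": 6, "Control": 8}
NOTES_PER_NURSE_PER_PERIOD = 30

# Rounded event-driven shares from the table above
SHARES = {
    "Treatment": {"Oct 2025": 0.45, "Apr 2026": 0.42},
    "Control": {"Oct 2025": 0.26, "Apr 2026": 0.28},
}

# Implied number of event-driven notes in each group and period
implied_counts = {
    (group, period): share * NURSES[group] * NOTES_PER_NURSE_PER_PERIOD
    for group, by_period in SHARES.items()
    for period, share in by_period.items()
}
total = sum(implied_counts.values())
print(f"Implied event-driven notes: {total:.1f}")  # 286.2, close to the 286 analysed

# Change in each group's share, in percentage points
for group, by_period in SHARES.items():
    change_pts = 100 * (by_period["Apr 2026"] - by_period["Oct 2025"])
    print(f"{group}: {change_pts:+.0f} percentage points")  # Treatment -3, Control +2
```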
Where Minikai had an effect
The charts below show each group's average score on each of the three measures in the pre-period and the post-period. The Minikai effect is the difference between how much the teal line (treatment) and the grey line (control) change from October to April.
Two of the three measures showed statistically significant gains under Minikai (shown in bold below): length and SBAR completeness. Specificity also improved but didn't reach statistical significance.
| Measure | Minikai effect (95% CI) | p-value |
|---|---|---|
| **Length (words)** | **+37.8 (2.2 to 73.4)** | **0.033** |
| **SBAR (0–1 scale)** | **+0.37 (0.17 to 0.58)** | **0.007** |
| Specificity (1–5 scale) | +0.46 (−0.02 to 0.94) | 0.057 |
Limitations
- SBAR and specificity are ordinal scores, but we analysed them with OLS so that effects can be read in the original score units.
- Classifying notes as event-driven and scoring specificity and SBAR involve judgement. A large language model performed the scoring with human review, but we did not formally quantify inter-rater agreement (for example via weighted Cohen's kappa), and LLM scores may vary across runs.
- We did not test the parallel trends assumption underlying difference-in-differences; with a single pre-period, the data cannot show pre-trends directly. Both groups worked in the same organisational context, however, which makes diverging pre-trends less likely.
- We do not link individual notes to specific Minikai interactions. The analysis measures the change in note quality among heavy users relative to light users and non-users, not the per-note effect of AI assistance.
- The findings come from a single provider and 14 nurses.
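Weighted Cohen's kappa, named above as the missing agreement check, could be computed for the large language model scorer against a human reviewer as sketched below. The function name, the choice of quadratic weights (linear weights are also common) and the example scores are assumptions for illustration; no such agreement figures were produced in this analysis. scikit-learn's `cohen_kappa_score` with `weights="quadratic"` and `labels` set to the full scale computes the same statistic.

```python
import numpy as np

def weighted_kappa(rater_a, rater_b, categories, weighting="quadratic"):
    """Weighted Cohen's kappa for two raters scoring the same notes on an ordinal scale.

    categories: ordered score levels, e.g. [1, 2, 3, 4, 5] for specificity
    or [0, 0.5, 1] for SBAR. Undefined if every rating falls in one category.
    """
    k = len(categories)
    position = {c: i for i, c in enumerate(categories)}
    observed = np.zeros((k, k))
    for a, b in zip(rater_a, rater_b):
        observed[position[a], position[b]] += 1
    observed /= observed.sum()  # joint distribution of (rater A, rater B) scores

    # Agreement expected by chance if the two raters scored independently
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Disagreement weights grow with the distance between score levels
    i, j = np.indices((k, k))
    distance = np.abs(i - j) / (k - 1)
    weights = distance ** 2 if weighting == "quadratic" else distance

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Hypothetical specificity scores for six notes: LLM scorer vs human reviewer
llm_scores = [2, 3, 4, 4, 5, 1]
human_scores = [2, 3, 3, 4, 5, 2]
print(round(weighted_kappa(llm_scores, human_scores, [1, 2, 3, 4, 5]), 2))
```

Weighting matters for ordinal scores: a 4 scored as 3 counts as a smaller disagreement than a 4 scored as 1, which unweighted kappa would treat identically.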
References
- Etagiuri, A., Chaudhry, D., van der Merwe, J., Nickol, M. & van der Merwe, J.M. (2026). Evaluating the Impact of Artificial Intelligence Scribes on Clinical Workflow and Documentation Quality: A Randomized Controlled Trial. The Journal of Arthroplasty. doi.org/10.1016/j.arth.2026.03.069
- Hidalgo Tapia, E.C., León Yosa, J., Olalla García, M.H., Clavijo Morocho, N.J. & Sanmartín Calle, Y.A. (2025). Effectiveness of Nursing Documentation Frameworks (SBAR, SOAP, and PIE) in Enhancing Clinical Handoffs and Patient Safety. Cureus, 17(8), e89957. doi.org/10.7759/cureus.89957
- Sasseville, M., Yousefi, F., Ouellet, S., Naye, F., Stefan, T., Carnovale, V., Bergeron, F., Ling, L., Gheorghiu, B., Hagens, S., Gareau-Lajoie, S. & LeBlanc, A. (2025). The Impact of AI Scribes on Streamlining Clinical Documentation: A Systematic Review. Healthcare, 13(12), 1447. doi.org/10.3390/healthcare13121447
