Real-time Digital Human for Dementia Care

End-to-end AI companion system with an STT + GPT-4o + TTS pipeline, RAG-enhanced dialogue, and an NVIDIA Audio2Face + UE5 avatar. Finalist in the 2025 Cougar Investigator Grant Challenge.

Problem

People living with dementia often lack consistent, responsive companionship. Caregivers face burnout, and existing digital solutions cannot sustain natural, multi-turn voice interactions with emotional expressiveness.

Action

Built an end-to-end real-time pipeline: STT for speech recognition (92% average accuracy), GPT-4o for contextual dialogue generation, and TTS for natural voice output. Integrated fine-tuning and RAG, improving context retention and recall accuracy by 25%. Combined NVIDIA Audio2Face with Unreal Engine 5 for lip-sync and facial expression animation, improving avatar realism and immersion by 30%.
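
A minimal sketch of how one conversational turn in such a pipeline might be wired together. This is not the project's actual code: the `Companion` class, the stage callables, and the word-overlap retrieval are all hypothetical stand-ins. A real system would call an STT service, GPT-4o, and a TTS engine, and would typically retrieve with embeddings rather than keyword overlap.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


def _words(text: str) -> set:
    # Lowercased word set; a crude stand-in for an embedding model.
    return set(re.findall(r"[a-z']+", text.lower()))


@dataclass
class Companion:
    # Each stage is injected as a plain callable (hypothetical interfaces).
    transcribe: Callable[[bytes], str]   # STT: audio -> text
    generate: Callable[[str], str]       # LLM: prompt -> reply
    synthesize: Callable[[str], bytes]   # TTS: text -> audio
    memory: List[str] = field(default_factory=list)   # facts available to RAG
    history: List[str] = field(default_factory=list)  # running dialogue

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        # Rank stored facts by word overlap with the query, keep the top k.
        q = _words(query)
        scored = [(len(q & _words(m)), m) for m in self.memory]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [m for _, m in scored[:k]]

    def build_prompt(self, user_text: str) -> str:
        # Prepend retrieved facts and recent turns so the model keeps context.
        lines = ["Known facts:"]
        lines += [f"- {fact}" for fact in self.retrieve(user_text)]
        lines += ["Conversation so far:"] + self.history[-6:]
        lines.append(f"User: {user_text}")
        return "\n".join(lines)

    def turn(self, audio: bytes) -> bytes:
        # STT -> retrieval + prompt -> LLM -> TTS, then record the exchange.
        user_text = self.transcribe(audio)
        reply = self.generate(self.build_prompt(user_text))
        self.history += [f"User: {user_text}", f"Companion: {reply}"]
        return self.synthesize(reply)
```

Injecting the stages as callables keeps the turn logic testable without network access: stubs can replace the speech and language services while the retrieval and prompt assembly are exercised directly. The audio returned by `turn` is what would then be handed to the avatar layer for lip-sync.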

Result

Achieved smooth multi-turn voice interaction with a response latency of 3 to 4 seconds. Context retention and recall accuracy improved by 25% with fine-tuning and RAG. Avatar realism and immersion improved by 30% with Audio2Face + UE5. Selected as a finalist in the 2025 Cougar Investigator Grant Challenge (Shark Tank format).
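
End-to-end response latency in a staged pipeline is the sum of the per-stage times, so a per-stage breakdown shows where the seconds go. The sketch below is one way such a measurement could be taken; `timed_turn` and the stage names are illustrative assumptions, not the instrumentation used in the project.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def timed_turn(
    stages: List[Tuple[str, Callable[[Any], Any]]], payload: Any
) -> Tuple[Any, Dict[str, float]]:
    """Run the stages in order, recording wall-clock seconds for each."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)  # output of one stage feeds the next
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings.values())
    return payload, timings


# Usage with placeholder stages; real ones would call the STT, LLM, and TTS services.
result, timings = timed_turn(
    [
        ("stt", lambda audio: audio.decode()),
        ("llm", lambda text: text.upper()),
        ("tts", lambda text: text.encode()),
    ],
    b"hello",
)
```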

Learnings

Deepened understanding of multimodal AI pipeline integration (STT → LLM → TTS → Avatar), real-time streaming architectures, and designing empathetic systems for vulnerable populations.
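
One common streaming technique for an STT → LLM → TTS chain is to forward the model's output to speech synthesis sentence by sentence instead of waiting for the full reply. The sketch below illustrates that idea only; `sentences_from_tokens` is a hypothetical helper, and the source does not state whether the project streamed this way.

```python
import re
from typing import Iterable, Iterator

# A sentence is treated as complete once ., ! or ? is followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]\s")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Group a stream of text fragments into complete sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit every complete sentence currently sitting in the buffer.
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


# Usage: each yielded sentence could be sent to TTS while the model keeps generating.
stream = ["Hello", " there", ".", " How", " are", " you", "?", " I am", " here"]
chunks = list(sentences_from_tokens(stream))
```

With this approach the first sentence can start playing while later ones are still being generated, which shortens the wait before the listener hears anything even if total generation time is unchanged.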

Tech Stack

GPT-4o

RAG

Fine-tuning

STT

TTS

NVIDIA Audio2Face

Unreal Engine 5

Python