Abstract

With the advent of large language models (LLMs), human-computer interfaces for these AI agents have garnered considerable interest. In this paper, we present a human-like interface that allows users to hold face-to-face conversations with photorealistic avatars in real time. Given a single image, our system reconstructs a high-quality avatar that can be controlled by 52 blendshape weights. Given a question or statement from the user, the system then responds with synthesized speech and synchronized motion of the reconstructed avatar. The pipeline also supports emotional interaction, and the entire process runs in real time. Our experiments and user studies demonstrate that our system generates high-fidelity, human-like virtual avatars that allow users to interact and engage with AI systems.
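The pipeline the abstract describes (single-image avatar reconstruction, an LLM reply stage, then speech synthesis with per-frame blendshape animation) can be sketched at a high level as below. This is a minimal illustration, not the paper's implementation: every function name, the 25 fps frame rate, and the 16 kHz audio rate are hypothetical placeholders; only the 52-weight blendshape control comes from the abstract.

```python
NUM_BLENDSHAPES = 52  # the abstract specifies 52 blendshape weights per frame
FPS = 25              # assumed animation frame rate, for illustration only
SAMPLE_RATE = 16000   # assumed audio sample rate, for illustration only

def reconstruct_avatar(image_bytes):
    """Stand-in for the single-image avatar reconstruction stage."""
    return {"source_image": image_bytes, "num_blendshapes": NUM_BLENDSHAPES}

def generate_reply(user_text):
    """Stand-in for the LLM that answers the user's question or statement."""
    return f"(reply to: {user_text})"

def synthesize_speech_and_motion(reply_text, duration_s=1.0):
    """Stand-in for TTS plus speech-driven animation: returns audio samples
    and one 52-dim blendshape weight vector per video frame."""
    audio = [0.0] * int(SAMPLE_RATE * duration_s)  # placeholder silent audio
    n_frames = int(FPS * duration_s)
    frames = [[0.0] * NUM_BLENDSHAPES for _ in range(n_frames)]
    return audio, frames

def converse(avatar, user_text):
    """One conversational turn: user text in; reply text, speech audio,
    and synchronized avatar animation frames out."""
    reply = generate_reply(user_text)
    audio, frames = synthesize_speech_and_motion(reply)
    return reply, audio, frames

avatar = reconstruct_avatar(b"<single portrait image>")
reply, audio, frames = converse(avatar, "Hello, how are you?")
```

In a real-time system the last two stages would run in a streaming fashion, emitting audio chunks and frames as they are produced rather than returning whole buffers.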

1. Introduction

However,

In this paper,

2. Methodology

2-1. Text-Speech Interaction

2-2. Facial Avatar Reconstruction

2-3. Speech Driven Facial Animation

2-4. Emotion

Prompting LLM

Emotional TTS

Emotional Talking Face