Please list the AI features, models, and techniques at play end to end to make Mirror Mirror happen.
Here's the full inventory of AI features, models, and techniques powering Mirror Mirror end-to-end:
- face_landmarker.task (float16), loaded from @mediapipe/tasks-vision
- depth-anything/Depth-Anything-V2-Small-hf via HuggingFace Transformers + PyTorch
- FACE_TRIANGLES + EYE_FILL_TRIANGLES).
- face.vert.wgsl, face.frag.wgsl) rendering the textured face mesh with per-vertex normals and lighting.
- POST /v1/voices/add
- voice_id. This voice is then used for all TTS responses, so the mirror literally speaks back in your voice.
- eleven_multilingual_v2
- POST /v1/text-to-speech/{voice_id}/with-timestamps
- scribe_v1
- POST /v1/speech-to-text
- claude-opus-4-6
- POST /v1/messages)
- AnalyserNode computes RMS energy from the microphone stream in real time. Once speech is detected (RMS > 0.01 threshold), a 1.5s silence timer triggers automatic recording stop. This gives natural conversational turn-taking: the mirror knows when you've stopped speaking.
- 1 - e^(-speed·dt)) for fluid interpolation, with per-source lerp speeds and priority arbitration.

In summary: The user looks into their camera. MediaPipe tracks their face. Depth Anything gives it depth. WebGPU renders a 3D replica. They speak an incantation: ElevenLabs clones their voice while Procrustes extracts their mouth shapes. They ask a question: Scribe transcribes it, Claude answers, ElevenLabs speaks the answer in their own cloned voice, and the 3D face lip-syncs using the captured viseme shapes. The mirror talks back.
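To make the two numeric pieces above concrete, here is a minimal TypeScript sketch of the RMS-based end-of-speech detection and the `1 - e^(-speed·dt)` smoothing factor. It is an illustration under assumptions, not the project's actual code: the helper names (`rms`, `createSilenceDetector`, `smoothingAlpha`, `smoothToward`) are hypothetical, and only the 0.01 threshold and the 1.5s timeout come from the description. Per-source lerp speeds and priority arbitration are not modelled.

```typescript
/** RMS energy of one block of time-domain samples in the range [-1, 1]. */
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

/**
 * Hypothetical turn-taking detector. Feed it one RMS reading per tick;
 * it returns true once speech has been heard and the level has then
 * stayed at or below the threshold for `silenceSeconds`.
 */
function createSilenceDetector(threshold = 0.01, silenceSeconds = 1.5) {
  let speechStarted = false;
  let silentFor = 0;
  return function update(level: number, dtSeconds: number): boolean {
    if (level > threshold) {
      // Speech detected: remember it and restart the silence timer.
      speechStarted = true;
      silentFor = 0;
      return false;
    }
    // Silence before any speech never stops the recording.
    if (!speechStarted) return false;
    silentFor += dtSeconds;
    return silentFor >= silenceSeconds;
  };
}

/** Frame-rate-independent blend factor: 1 - e^(-speed * dt). */
function smoothingAlpha(speed: number, dtSeconds: number): number {
  return 1 - Math.exp(-speed * dtSeconds);
}

/** Move `current` toward `target` by the exponential blend factor. */
function smoothToward(
  current: number,
  target: number,
  speed: number,
  dtSeconds: number
): number {
  return current + (target - current) * smoothingAlpha(speed, dtSeconds);
}
```

In a browser, the samples would typically come from `AnalyserNode.getFloatTimeDomainData`, called once per animation frame. The exponential form matters because two half-length frames then produce the same result as one full-length frame, so the animation looks the same at any frame rate.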