The End of Awkward AI Transcriptions - Travis Bartley and Myungjong Kim

Introduction to NVIDIA Speech AI 00:00

  • Travis and Myungjong introduce the discussion on ending awkward AI transcriptions with NVIDIA's advancements in speech AI.
  • They outline the focus on model architectures, development processes, deployment, and customization for enterprise-level applications.

Key Focus Areas in Model Development 00:14

  • Robustness: Ensuring models perform well in both noisy and clean environments.
  • Coverage: Addressing customer domain needs such as medical, entertainment, and call center applications, while considering multilingual and dialect factors.
  • Personalization: Tailoring models to meet specific customer requirements, including target speaker AI and text normalization.
  • Deployment: Balancing speed and accuracy based on customer needs.

Model Architectures and Techniques 02:49

  • Use of CTC (Connectionist Temporal Classification) models for high-speed inference in streaming environments.
  • Introduction of RNN-T (Recurrent Neural Network Transducer) models for improved accuracy in non-streaming scenarios.
  • Attention-based encoder-decoder models for handling multiple tasks like speech translation and language identification.
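To make the CTC mention concrete, here is a minimal greedy CTC decoding sketch: pick the highest-probability token per frame, collapse repeats, and drop the blank symbol. This illustrates the general technique only; it is not NVIDIA's decoder, and the vocabulary and probabilities are made up for illustration.

```python
BLANK = "_"  # CTC blank symbol (placed last in the vocabulary here)

def ctc_greedy_decode(frame_probs, vocab):
    """Greedy CTC decoding.

    frame_probs: list of per-frame probability lists over vocab.
    vocab: list of symbols, with the blank symbol included.
    """
    # Argmax token index per frame.
    labels = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    symbols = [vocab[i] for i in labels]
    # Collapse consecutive repeats, then remove blanks.
    out, prev = [], None
    for s in symbols:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return "".join(out)
```

For example, a frame sequence whose argmaxes read `a a _ b b` decodes to `ab`: the repeated `a`s and `b`s collapse, and the blank separates them without emitting anything.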

Fast Conformer Architecture 04:31

  • Fast Conformer is identified as the backbone of NVIDIA's offerings, allowing for efficient training and faster inference due to more aggressive downsampling of the audio input.
  • Models fall into two families: Parakeet for streaming applications and Canary for high-accuracy multitask models.
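A back-of-envelope sketch of why the reduced input size matters, assuming the 8x convolutional subsampling described in the Fast Conformer paper (versus 4x in the original Conformer) and a standard 10 ms feature frame shift; the numbers below are illustrative, not from the talk.

```python
def frames_after_subsampling(audio_seconds, frame_shift_ms=10, factor=8):
    """Number of encoder frames left after convolutional subsampling."""
    input_frames = int(audio_seconds * 1000 / frame_shift_ms)
    return input_frames // factor

def relative_attention_cost(factor_a, factor_b):
    """Self-attention scales quadratically with sequence length, so the
    cost under subsampling factor_a relative to factor_b is (b/a)^2."""
    return (factor_b / factor_a) ** 2

# 10 s of audio -> 1000 feature frames -> 125 encoder frames at 8x.
print(frames_after_subsampling(10))
# Attention cost at 8x relative to 4x subsampling: 0.25 (4x cheaper).
print(relative_attention_cost(8, 4))
```

Halving the sequence length again (4x to 8x) cuts the quadratic self-attention cost to a quarter, which is where much of the training and inference speedup comes from.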

Customization and Additional Features 08:32

  • Voice activity detection for improved noise robustness and better speech segment identification.
  • Integration of language models and text normalization for enhanced transcription accuracy and readability.
  • Speaker identification features for multi-speaker scenarios.
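To illustrate what voice activity detection does, here is a minimal energy-based VAD sketch: frame the signal, measure per-frame energy, and keep contiguous runs of high-energy frames as speech segments. This is a textbook baseline for illustration only; Riva's VAD is a trained neural model, and the frame length and threshold here are arbitrary choices.

```python
def frame_energy(samples, frame_len):
    """Mean squared energy of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_speech(samples, frame_len=160, threshold=0.01):
    """Return (start_frame, end_frame) pairs of contiguous high-energy frames."""
    energies = frame_energy(samples, frame_len)
    segments, start = [], None
    for i, e in enumerate(energies):
        if e >= threshold and start is None:
            start = i                      # speech onset
        elif e < threshold and start is not None:
            segments.append((start, i))    # speech offset
            start = None
    if start is not None:                  # speech runs to end of audio
        segments.append((start, len(energies)))
    return segments
```

Feeding the recognizer only the detected segments is what improves noise robustness: silence and background-only regions never reach the acoustic model.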

Training and Data Development 11:06

  • Emphasis on sourcing diverse and high-quality data for robust model training.
  • Use of both open source and proprietary data, combined with pseudo labeling for improved model performance.
  • NVIDIA's NeMo toolkit is utilized for efficient training practices.
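The pseudo-labeling step can be sketched as follows: a trained teacher model transcribes unlabeled audio, and only sufficiently confident transcripts are kept as additional training data. This is a generic illustration of the technique, not NVIDIA's pipeline; the `transcribe` callable and the 0.9 threshold are hypothetical.

```python
def select_pseudo_labels(utterance_ids, transcribe, threshold=0.9):
    """Keep (utterance, transcript) pairs whose confidence clears the bar.

    utterance_ids: iterable of audio identifiers.
    transcribe: callable returning (text, confidence) for an utterance.
    threshold: illustrative confidence cutoff, not a value from the talk.
    """
    selected = []
    for utt in utterance_ids:
        text, confidence = transcribe(utt)
        if confidence >= threshold:
            selected.append((utt, text))
    return selected
```

The filtered pairs are then mixed with the human-labeled open-source and proprietary data for the next training round, which is how pseudo-labeling extends coverage without extra annotation cost.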

Deployment Strategies and Flexibility 13:26

  • Models are deployed via NVIDIA Riva, optimized for low latency and high throughput using NVIDIA TensorRT and the Triton Inference Server.
  • Customization options are available to cater to specific application needs, including various industry terminologies.

Getting Started with NVIDIA Riva 15:32

  • Users are encouraged to explore NVIDIA Riva models through the NVIDIA website, which provides resources for developers, guides for fine-tuning models, and access to community forums.