Audio SNR improvement utilizing video analysis

Researcher:

Prof. Yoav Schechner | Electrical and Computer Engineering

Categories:

Information and Computer Science | Physics and Electro-Optics

The Technology

This work deals with complex scenarios such as a ‘cocktail party’, in which multiple sources are active simultaneously in every modality, making each source harder to interpret. Cross-modal analysis offers information beyond what can be extracted from the individual modalities. Consider a camcorder with a single microphone at a cocktail party: it captures several moving visual objects that emit sounds. Audiovisual analysis should identify the number of independent audio-associated visual objects (AVOs), pinpoint the AVOs’ spatial locations in the video, and isolate each corresponding audio component. Some of these problems were considered in prior studies, but those were limited to simple cases, e.g., a single AVO or stationary sounds. The approach described here overcomes these limitations. It emphasizes temporal features that are based on significant changes in each modality. A probabilistic formalism identifies temporal coincidences between these features, yielding cross-modal association and visual localization.
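The idea of matching temporal coincidences can be illustrated with a minimal sketch. It is not the researchers' probabilistic formalism: it swaps in a simple tolerance-window count, and it assumes the onset times (instants of significant change) have already been extracted from the audio track and from each visual object's motion. The function names, the `tolerance` parameter, and the sample data are all hypothetical.

```python
from typing import Dict, Sequence, Tuple


def coincidence_score(
    audio_onsets: Sequence[float],
    visual_onsets: Sequence[float],
    tolerance: float = 0.05,
) -> float:
    """Fraction of visual onsets with an audio onset within `tolerance` seconds.

    A simplified stand-in for a probabilistic coincidence measure (assumption).
    """
    if not visual_onsets:
        return 0.0
    matched = sum(
        1
        for v in visual_onsets
        if any(abs(v - a) <= tolerance for a in audio_onsets)
    )
    return matched / len(visual_onsets)


def best_matching_object(
    audio_onsets: Sequence[float],
    visual_tracks: Dict[str, Sequence[float]],
    tolerance: float = 0.05,
) -> Tuple[str, Dict[str, float]]:
    """Score every visual object and return the one whose onsets coincide most."""
    scores = {
        name: coincidence_score(audio_onsets, onsets, tolerance)
        for name, onsets in visual_tracks.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical onset times in seconds (illustrative data, not measurements).
audio = [0.50, 1.20, 2.00, 2.75]
tracks = {
    "guitar": [0.52, 1.18, 2.03],  # motion changes close to the audio onsets
    "violin": [0.90, 1.60, 2.40],  # motion changes unrelated to this audio
}
best, scores = best_matching_object(audio, tracks)
print(best, scores)
```

In this toy setup the object whose motion onsets line up with the audio onsets receives the highest score, which is the intuition behind associating a sound with a location in the video.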

Advantages

Does not require a microphone array
Deals with several simultaneous sounds
Applies to a variety of sound-producing objects
Enables enumeration of the number of sound sources
Separately decomposes sound and motion into discrete events, which applies to a variety of sound-producing objects

Applications and Opportunities

Removing noise in phone conversations using a device with a built-in camera
Removing communication disruptions in noisy environments
Isolating a sound source
Tracking applications in various fields

Business Development Contacts

Oz Mahlebani

Business Development Manager, Engineering

oz.m@trdf.technion.ac.il
