The Technology
This work deals with complex scenarios such as a ‘cocktail party’, in which multiple sources are active simultaneously in every modality, hindering the interpretation of each one. Cross-modal analysis offers information beyond that extracted from individual modalities. Consider a camcorder with a single microphone at a cocktail party: it captures several moving visual objects that emit sounds. Audiovisual analysis should identify the number of independent audio-associated visual objects (AVOs), pinpoint the AVOs’ spatial locations in the video, and isolate each corresponding audio component. Some of these problems were considered in prior studies, which were limited to simple cases, e.g., a single AVO or stationary sounds. The approach described here overcomes these limitations. It relies on temporal features that are based on significant changes in each modality. A probabilistic formalism identifies temporal coincidences between these features, yielding cross-modal association and visual localization.
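The idea of matching significant changes across modalities can be illustrated with a minimal sketch. It assumes each modality has already been reduced to a binary onset sequence (1 where a significant change occurs in a frame, 0 otherwise) and scores agreement with a simple Bernoulli model. The function names, the `p_match` parameter, and the example data are hypothetical and chosen for illustration only; this is not the formalism of the described technology.

```python
import math

def coincidence_log_likelihood(audio_onsets, visual_onsets, p_match=0.9):
    """Score how well a visual onset sequence coincides with audio onsets.

    Assumption (illustrative only): each frame independently agrees with
    the audio with probability p_match and disagrees otherwise.
    """
    if len(audio_onsets) != len(visual_onsets):
        raise ValueError("onset sequences must have the same length")
    agree = sum(1 for a, v in zip(audio_onsets, visual_onsets) if a == v)
    disagree = len(audio_onsets) - agree
    return agree * math.log(p_match) + disagree * math.log(1.0 - p_match)

def best_matching_feature(audio_onsets, visual_features):
    """Return the name of the visual feature whose onsets best match the audio."""
    scores = {
        name: coincidence_log_likelihood(audio_onsets, onsets)
        for name, onsets in visual_features.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical data: one audio track and two tracked visual features.
audio = [0, 1, 0, 0, 1, 0, 0, 1]
visual = {
    "hand": [0, 1, 0, 0, 1, 0, 0, 1],     # changes coincide with the audio
    "curtain": [1, 0, 0, 1, 0, 0, 1, 0],  # changes do not coincide
}
best, scores = best_matching_feature(audio, visual)
print(best)
```

In this toy setting the feature whose onsets line up with the audio onsets receives the higher score, which is the sense in which temporal coincidence can drive both association and localization. Handling several simultaneous sources would require more than this single-source comparison.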
Advantages
- Does not require a microphone array
- Deals with several simultaneous sounds
- Applies to a variety of sound-producing objects
- Enables enumeration of the number of sound sources
- Separately decomposes sound and motion into discrete events, which makes it applicable to a variety of sound-producing objects
Applications and Opportunities
- Removing noise in phone conversations using a device with a built-in camera
- Removing communication disruptions in noisy environments
- Isolating a sound source
- Tracking applications in various fields