shape lab - Stanford University

Tactile effects can enhance the user experience of multimedia content. However, generating appropriate tactile stimuli without any human intervention remains a challenge. While visual or audio information has been used to generate tactile effects automatically, combining cross-modal information may further improve both the spatiotemporal synchronization of the tactile effects and the user’s experience of them. In this paper, we present a pipeline that automatically generates vibrotactile effects by extracting both visual and audio features from a video. Two neural network models are used to extract the diegetic audio content and to localize a sounding object in the scene. Their outputs then determine the spatial distribution and the intensity of the tactile effects. To evaluate our method, we conducted a user study comparing videos with tactile effects generated by our method against two conditions: the original videos without any tactile stimuli, and videos with tactile effects generated from visual features only. The results demonstrate that our cross-modal method produces tactile effects with better spatiotemporal synchronization than the existing visual-based method and provides a more immersive user experience.
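This page does not describe how the localization output is turned into a spatial tactile pattern. As a minimal sketch, assuming the localization model yields a per-frame 2D heatmap of sounding-object likelihood and the tactile display is a rows x cols grid of vibrotactile actuators (both hypothetical here, not taken from the paper), the heatmap can be average-pooled onto the actuator grid and scaled so that its peak equals that frame's intensity. The functions `pool_heatmap` and `actuator_levels` are illustrative names only.

```python
from typing import List

Grid = List[List[float]]


def pool_heatmap(heatmap: Grid, rows: int, cols: int) -> Grid:
    """Average-pool an H x W localization heatmap onto a rows x cols actuator grid.

    Hypothetical simplification: H must be divisible by rows and W by cols.
    """
    if not heatmap or not heatmap[0]:
        raise ValueError("heatmap must be non-empty")
    h, w = len(heatmap), len(heatmap[0])
    if h % rows or w % cols:
        raise ValueError("heatmap size must be divisible by the actuator grid")
    bh, bw = h // rows, w // cols  # heatmap cells covered by one actuator
    pooled = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                heatmap[y][x]
                for y in range(r * bh, (r + 1) * bh)
                for x in range(c * bw, (c + 1) * bw)
            ]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def actuator_levels(heatmap: Grid, rows: int, cols: int, intensity: float) -> Grid:
    """Scale the pooled map so the most active actuator receives `intensity` (0..1)."""
    pooled = pool_heatmap(heatmap, rows, cols)
    peak = max(max(row) for row in pooled)
    if peak <= 0:
        # No sounding object localized in this frame: all actuators stay off.
        return [[0.0] * cols for _ in range(rows)]
    return [[intensity * v / peak for v in row] for row in pooled]
```

For example, a 4 x 4 heatmap whose mass lies in the top-left quadrant, pooled onto a 2 x 2 grid with intensity 0.8, drives only the top-left actuator at level 0.8. Peak normalization is one design choice among several; normalizing to unit sum would instead spread a fixed energy budget across actuators.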

Spatial tactile mappings generated from example videos downloaded from YouTube.

The automatic tactile-effect generation pipeline uses both visual and audio features to separate the diegetic audio signal and to determine the location of the tactile stimuli. The intensity of the generated haptic effects is determined solely by the diegetic audio signal.
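Diegetic audio is sound whose source exists within the scene (for example, an impact or an engine), as opposed to non-diegetic sound such as background music or narration. Since only this signal drives intensity, one plausible mapping, sketched here under our own assumptions rather than taken from the paper, is a frame-wise RMS envelope of the separated diegetic track, normalized to [0, 1] with one value per video frame. The function name `intensity_envelope` and the peak normalization are hypothetical choices.

```python
import math
from typing import List, Sequence


def intensity_envelope(diegetic: Sequence[float], sample_rate: int, fps: float) -> List[float]:
    """Per-video-frame vibration intensity from a separated diegetic audio track.

    Assumption (not from the paper): intensity is the RMS of the audio samples
    that fall within each video frame, normalized so the loudest frame is 1.0.
    """
    hop = int(round(sample_rate / fps))  # audio samples per video frame
    if hop <= 0:
        raise ValueError("sample_rate / fps must be at least one sample")
    rms = []
    for start in range(0, len(diegetic), hop):
        chunk = diegetic[start:start + hop]  # the last chunk may be shorter
        rms.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    peak = max(rms, default=0.0)
    if peak == 0.0:
        # Silent (or empty) diegetic track: no vibration in any frame.
        return [0.0] * len(rms)
    return [v / peak for v in rms]
```

With an 8 Hz toy signal at 2 fps, four samples map to each frame, so a silent frame, a full-scale frame, and a half-scale frame yield intensities of 0.0, 1.0, and 0.5. Because non-diegetic content such as a music score is assumed to have been removed upstream, it cannot trigger vibrations in this sketch.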

Papers

Kai Zhang, Lawrence H Kim, Yipeng Guo, and Sean Follmer. 2020. Automatic Generation of Spatial Tactile Effects by Analyzing Cross-modality Features of a Video. In Symposium on Spatial User Interaction. 1–10.
