HzSpeech

Visualization of speech emotion with infrasound

In my bachelor thesis "HzSpeech", I examined the phenomenon of infrasound and the recognition of emotions in speech through the analysis of acoustic features. The focus was on exploring the potential of infrasound as an information medium and on how the interaction between speech and emotion can be analyzed and visualized.

Infrasound refers to sound waves with frequencies below the lower limit of human hearing, in the range of about 1-20 Hz. Natural processes such as volcanic eruptions, earthquakes and typhoons generate infrasound. Wind turbines and everyday devices and machines also produce low-frequency vibrations, which some people perceive as a disturbance. The issue is controversial, and various theories and myths about the effect of infrasound on human well-being circulate on the Internet. It is particularly striking that elephants, blue whales and spiders perceive these frequencies and probably use them to interact. How this language is structured and which emotions it expresses remains largely unexplored. In human communication, both words and speech serve to express emotions and are an integral part of communication. In addition to the meaning of words, people can convey further messages such as joy, anger or fear through their tone of voice. This raises the question of how these messages can be analyzed computationally and displayed using infrasound.

Rotary Subwoofer

Generating infrasound is a challenging task. In my research, I came across the principle of the TRW-17 rotary subwoofer by Eminent Technology. Its rotor head is mounted so that it can rotate and is connected to the voice coil of a subwoofer. A three-phase motor drives the rotation, while the voice coil is driven by an audio amplifier. When a playback device outputs an audio signal, the voice coil changes the pitch angle of the rotor blades. Installed in a sealed enclosure, this change in blade pitch compresses the air and generates infrasound.
One advantage of such subwoofers is that they move and compress the air more efficiently. Another is that the design achieves a higher sound pressure level and a higher resolution of low-frequency sounds. To keep the build reproducible and low-budget, components were modified, recycled, or newly made.

Rotary Subwoofer – functional case

Rotary Subwoofer – final prototype

Interface

Speech Emotion Recognition (SER) is one of the most challenging tasks in speech signal analysis. It is a research area that tries to infer emotion from speech signals. An integral feature of the interface is its ability to distinguish between different emotions in the speech signal. The project builds on essential findings from J. Wagner's work on the SSI framework.

In a first step, voice recordings were analyzed on the basis of their acoustic properties. For this purpose, recordings from the Emo-DB and SAVEE databases were used. They are visualized as spectrograms, tagged and collected on a website. The tool is used to analyze differences and similarities, which serve as a starting point for further processing and traceability.

tool for visualization of speech data
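
As a rough illustration of what such a spectrogram view is built on, the following Python sketch computes a short-time magnitude spectrum using only the standard library. It is not the code behind the website; the frame length, hop size and all function names are assumptions made for this example, and a real tool would more likely use an FFT library.

```python
import cmath
import math


def hann(n):
    """Return a Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_len=64, hop=32):
    """Short-time magnitude spectrum: one list of bin magnitudes per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        # Naive DFT, keeping only the non-negative frequency bins.
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def dominant_frequency(mags, sample_rate, frame_len):
    """Frequency in Hz of the strongest bin in one frame."""
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate / frame_len
```

Each inner list is one column of the spectrogram. Plotting the columns side by side, with magnitude mapped to color, gives the familiar time-frequency image in which recordings of different emotions can be compared.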

The second step was to create an environment consisting of several tools. A Python script detects speech input, extracts it, analyzes its acoustic variables and evaluates them with the OpenVokaturi API. The resulting values are mapped to a specific emotion, which in turn influences frequency, amplitude and rotation speed. The values are sent via OSC to a Pure Data patch, which controls the speed of the blade rotation and the infrasound frequencies output by the rotary subwoofer.
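
The mapping and OSC step can be sketched roughly as follows. This is a minimal illustration, not the thesis script. The emotion labels, the parameter table, the OSC address "/hzspeech" and the host and port are all hypothetical placeholders, and the emotion scores are assumed to arrive as a dictionary of probabilities. The OSC message is encoded by hand with the standard library, so the sketch does not depend on any particular OSC package.

```python
import socket
import struct

# Hypothetical emotion-to-parameter table; the values are placeholders,
# not the ones used in the thesis. Frequencies stay in the 1-20 Hz range.
EMOTION_PARAMS = {
    "neutral": {"freq_hz": 8.0, "amplitude": 0.25, "rotor_rpm": 300.0},
    "happy": {"freq_hz": 14.0, "amplitude": 0.75, "rotor_rpm": 500.0},
    "sad": {"freq_hz": 4.0, "amplitude": 0.25, "rotor_rpm": 200.0},
    "angry": {"freq_hz": 18.0, "amplitude": 1.0, "rotor_rpm": 600.0},
    "fear": {"freq_hz": 16.0, "amplitude": 0.5, "rotor_rpm": 550.0},
}


def dominant_emotion(scores):
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


def params_for(scores):
    """Map emotion scores to (frequency, amplitude, rotor speed)."""
    p = EMOTION_PARAMS[dominant_emotion(scores)]
    return p["freq_hz"], p["amplitude"], p["rotor_rpm"]


def _osc_string(text):
    """Encode an OSC string: ASCII, null-terminated, padded to 4 bytes."""
    raw = text.encode("ascii") + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)


def osc_message(address, *values):
    """Build a binary OSC 1.0 message carrying float32 arguments only."""
    type_tags = "," + "f" * len(values)
    payload = b"".join(struct.pack(">f", v) for v in values)
    return _osc_string(address) + _osc_string(type_tags) + payload


def send_osc(packet, host="127.0.0.1", port=9000):
    """Send one OSC packet over UDP (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

A call such as send_osc(osc_message("/hzspeech", *params_for(scores))) would then deliver the three values in one message. On the receiving side, a Pure Data patch could listen for such UDP packets and route the three floats to the oscillator and the motor control.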

Installation

Small plastic balls are placed on a flexible surface and set in motion by signals from the rotary subwoofer. The recognized emotion of the speech controls the intensity of the infrasound waves, so that a particular behavior of the balls represents the emotion of the speech signal.

The installation consists of several parts. The rotary subwoofer is installed inside the pedestal. Through a circular opening on top of the pedestal, it moves the air between the chambers. The generated infrasound waves set the flexible surface, and the balls on it, in motion.

Credits

bachelor thesis, 2019
supervisors: Prof. Hannes Nehls, Prof. Bastian Beate