by alan.delpiccolo murivan roc sdellemonache | updated February 08, 2016

miMic is an augmented microphone. It is an ordinary microphone embedding an Inertial Measurement Unit and two buttons. It senses voice and gesture under four modes. miMic is the "pencil" for sketching sound.



A need and an idea: February 13, 2015, on a plane to Marseille.

Sketching sound by voice and gesture: An activity that needs a tool as immediate as a pencil is for sketching on paper.
It would be a microphone, with two buttons and a camera. Two buttons would afford four modes, but only two are described here: Select and Play.

  • Select by imitation. It returns a sound model for the chosen sound category;
  • Play with the given sound model by vocal imitation and gesture.

The camera captures gestural actions that accompany vocal sketching.

Is a camera adequate to capture gestural actions with a microphone?
February 19, 2015 at 8:00 AM
Created by roc

Bodystorming session: StefanoDM performs vocal sketching while manipulating a fake microphone and a fake iron, and StefanoB plays the role of the synthesizer. First, the sound model is selected. Second, sound synthesis from the selected model is controlled.

February 19, 2015 at 11:00 PM
Created by murivan and roc

A camera is probably not a suitable means of detecting the gestures associated with sound sketching, for the following reasons:

  1. It would point to the face of the sketcher, thus capturing only facial gestures;
  2. It would be difficult to deduce gestural manipulations of the microphone by analyzing a video stream;
  3. It would produce a lot of data to process, adding complexity and latency.

Instead, an Inertial Measurement Unit (IMU) would allow detecting movement and orientation of the microphone. 

February 24, 2015 at 10:00 AM
Created by roc

These are papers that propose sensor-augmented microphones. They can serve as a background for the design of miMic, the sketching microphone.

February 27, 2015 at 11:00 PM
Created by roc

The main questions here are:

  • What kind of microphone shape?
  • Where to put the buttons?

To answer these questions, we decided that the microphone should be graspable with a single hand (as a pencil is) and actuated with a couple of fingers. Given this requirement, attention is focused on stage microphones, thus excluding studio configurations that are not meant to be manipulated. The two possible shapes are:

  1. "gelato"
  2. classic Shure-55 and the like

In the initial idea and sketch, the buttons were put on the stand or on a separate button pad. However, we soon realized that it is much better to have everything in one hand. For the gelato shape, there is ample room to accommodate buttons on the stick, as in the Sennheiser prototype described in the literature research. However, the classic roundish shape is preferred because

  • it compactly fits one hand,
  • it is a sort of visual icon,
  • it can stand upright on a flat surface,
  • it can fit two large visible buttons on top and electronics inside.



February 28, 2015 at 9:00 AM
Created by roc

Microphone (to be hacked): http://www.soundsationmusic.com/?p=25891

Pushbutton, latching with light (one white, one blue): https://www.sparkfun.com/products/11975

IMU Adafruit LSM9DS0: http://www.adafruit.com/products/2021

Arduino Nano: http://arduino.cc/en/Main/arduinoBoardNano

Two 220ohm resistors.

Jumpers and wires.

Segments of metal tube.

March 10, 2015 at 10:00 AM
Created by roc

The two buttons have been put on top of the frontal shell. Two holes have been drilled and pieces of metal tube have been used to raise the buttons a bit (construction, Silvano Rocchesso).

The microcontroller+IMU+button combination has been tested with breadboard+clips wiring.

The Adafruit tutorial contains all information for wiring: https://learn.adafruit.com/adafruit-lsm9ds0-accelerometer-gyro-magnetometer-9-dof-breakouts/overview. The modification to Arduino code to handle the buttons is trivial.

In the microphone shell there is just enough space to host the two buttons, the IMU, and the Arduino Nano. For the latter, it is convenient to use one of the two holes of the plastic board that keeps the microphone capsule suspended. The Nano can be embedded in such hole, perpendicular to the plastic board (see picture).

The buttons have five pins each, numbered 0 to 4 in the depicted schematics. Pins 0 and 1 are short-circuited (little red wire) and connected to a digital input of the Nano, pin 2 goes to +5V, and pin 0 is grounded.

It is necessary to drill a rectangular hole in the bottom part of the back shell to plug in the USB cable.

To keep the parts easily removable, soldering has been limited to a minimum, and jumper wires have been used, although this causes quite a bit of clutter.

March 27, 2015 at 7:27 PM
Created by roc

To test that the buttons and the motion sensor work properly while the microphone is manipulated, I used the Processing sample code from the tutorial on how to make an Attitude and Heading Reference System (AHRS): https://learn.adafruit.com/ahrs-for-adafruits-9-dof-10-dof-breakout/introduction

I associated a white light to the white button and a blue light to the blue button to illuminate the rabbit.

For this test, the AHRS was not calibrated, and the IMU board was simply placed inside the microphone shells without regard to its orientation or firm mounting. That is why the rabbit is not axis-aligned with the microphone.

March 30, 2015 at 12:49 PM
Created by roc

A sound model for friction (Sound Design Toolkit) is driven by gesture (microphone tilt) and voice.

Friction modeling: paper

Sound Design Toolkit: paper, site
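
As a rough illustration of how microphone tilt could become a control signal, the sketch below estimates pitch and roll from a single accelerometer reading. It assumes the microphone is held nearly still, so that gravity dominates the measurement. The axis convention, the function names, and the mapping of tilt to a 0..1 control value are assumptions made for illustration, not the mapping used in the demo.

```python
import math

def tilt_from_accel(ax, ay, az):
    """Return (pitch, roll) in degrees from one accelerometer reading.

    Only the direction of the vector matters, so any unit works.
    Assumed convention: z points out of the board when it lies flat.
    """
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll

def tilt_to_control(angle_deg, max_angle=90.0):
    """Map |tilt| in [0, max_angle] degrees to a control value in [0, 1]."""
    return min(abs(angle_deg), max_angle) / max_angle

# Hypothetical reading with the board lying flat: no tilt on either axis.
pitch, roll = tilt_from_accel(0.0, 0.0, 1.0)
control = tilt_to_control(pitch)
```

A calibrated AHRS, as in the Adafruit tutorial, would give a more stable orientation than this accelerometer-only estimate, which is disturbed by any fast movement.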


April 2, 2015 at 11:28 AM
Created by roc

This demonstration is "semi-fake". Friction sound synthesis is actually being controlled by voice and gesture, in real time. However, selection of the appropriate sound model is fictitious.

April 14, 2015 at 7:02 AM
Created by roc

Selection of one or more sound models is operated by automatic classification of vocal imitations. There are two different approaches to the design of this function:

  • People centered: The tool is supposed to work for the casual user, based on what the classifier learned from many imitations provided by a large pool of subjects.
  • Individual centered: The classifier is trained to recognize the imitations of a specific user. 

May 22, 2015 at 11:54 AM
Created by murivan and roc

To demonstrate the "Select" mode of miMic we implement a basic model selector based on a classification tree. The construction of such a classifier is based on the following steps:

  1. Collect examples
  2. Train a classifier
  3. Implement an online recognition system

Consider the following classes of sounds (models):

  • Wind (261 examples)
  • Liquid (275 examples)
  • Saw (271 examples)
  • DC motor (265 examples)
  • Engine (257 examples)

Extract 1329 examples of these classes from the Ircam imitation database (ref.).
Condition the extracted examples so that they are normalized to -1 dBFS and are at least 4 seconds long.
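
The conditioning step could look roughly like the sketch below. It assumes peak normalization, samples as floats in the range -1..1, a sample rate of 44100 Hz, and zero-padding for examples shorter than 4 seconds. None of these choices is stated in the log, and the function name `condition` is hypothetical.

```python
def condition(samples, sample_rate=44100, peak_dbfs=-1.0, min_seconds=4.0):
    """Peak-normalize to peak_dbfs and make the example at least min_seconds long."""
    peak = max((abs(s) for s in samples), default=0.0)
    target = 10.0 ** (peak_dbfs / 20.0)   # -1 dBFS is about 0.891 of full scale
    gain = target / peak if peak > 0 else 1.0
    out = [s * gain for s in samples]
    min_len = int(round(min_seconds * sample_rate))
    if len(out) < min_len:
        # Assumption: short examples are padded with silence, not discarded.
        out.extend([0.0] * (min_len - len(out)))
    return out
```

For instance, `condition([0.5, -0.25], sample_rate=10, min_seconds=1.0)` returns 10 samples whose largest absolute value is about 0.891.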

May 27, 2015 at 11:26 AM
Created by roc and murivan

Consider the following set of feature extractors that are part of the Sound Design Toolkit (sdt.spectralfeats~, sdt.pitch~, and sdt.envelope~):

  1. Centroid
  2. Variance
  3. Skewness
  4. Kurtosis
  5. Flatness
  6. Flux
  7. Onset
  8. Pitch
  9. Envelope
  10. RMS

For each feature, computed on windows of 4096 samples with an overlap of 75%, compute the median and the interquartile range (IQR) over the length of 4 seconds. For the sdt.spectralfeats~ object, the parameters minFreq and maxFreq are set to 50 Hz and 5000 Hz. sdt.pitch~ has an additional tolerance parameter set to 0.2, while sdt.envelope~ has attack and release set to 10 ms and 1000 ms, respectively. For the Envelope and RMS features, the ratio between the IQR and the median is computed, since these features depend on the signal level. Values are sampled every 20 ms. The Max/MSP patch produces a line of text for each imitation example, which includes a label and the sequence of feature values.
The MATLAB script derives the binary classification tree.
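
The per-feature statistics can be sketched in a few lines of Python. This is only an illustration of the arithmetic. The actual analysis runs in the Max/MSP patch, the quartile method used there may differ from the one chosen here, and the names `summarize` and `feature_line`, as well as the text format of the output line, are assumptions. The sketch also assumes that the ratio replaces the median and IQR for the two level-dependent features.

```python
import statistics

WINDOW = 4096                        # analysis window, in samples
OVERLAP = 0.75                       # 75% overlap between windows
HOP = int(WINDOW * (1 - OVERLAP))    # 1024 samples between window starts

LEVEL_DEPENDENT = {"envelope", "rms"}

def summarize(name, values):
    """Return the descriptors for one feature track (values over 4 seconds)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if name in LEVEL_DEPENDENT:
        # Level-dependent features: keep only the IQR/median ratio.
        return {"ratio": iqr / median if median else 0.0}
    return {"median": median, "iqr": iqr}

def feature_line(label, tracks):
    """Format one imitation example as 'label v1 v2 ...' (assumed layout)."""
    fields = [label]
    for name, values in tracks.items():
        fields.extend(f"{v:.4f}" for v in summarize(name, values).values())
    return " ".join(fields)
```

With values sampled every 20 ms, each track passed to `summarize` would hold about 200 values for a 4-second example.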

May 28, 2015 at 4:03 AM
Created by murivan

To demonstrate the "Play" mode of miMic we use a Max patch that collects the five sound models in use.

For each sound model, a control layer has been built and tuned by Stefano Delle Monache, in such a way that the vocalizations get immediately interpreted as control signals. The assumption is that if a user selects a sound model (e.g., wind) then she will start controlling the model by producing wind-like sounds with the voice. So, the control layer associated with the model must be ready to interpret this kind of control sound. Only at a later stage might the user want to explore different vocal emissions and to tune parameters by hand. Both of these actions are made possible by the graphical interface, where detailed maps between vocal features and parameters can be drawn, and each individual model parameter can be manually set.

In the construction of the control layer, we must consider the limits of humans in controlling the dimensions of timbre, as shown by the following study:

169th Meeting Acoustical Society of America
Pittsburgh, Pennsylvania
18–22 May 2015

Vocal imitations of basic auditory features.  

Guillaume Lemaitre, Ali Jabbari, Olivier Houix, Nicolas Misdariis, and
Patrick Susini

We recently showed that vocal imitations are effective descriptions of a variety of sounds (Lemaitre and Rocchesso, 2014). The current study investigated the mechanisms of effective vocal imitations by studying if speakers could accurately reproduce basic auditory features. It focused on four features: pitch, tempo (basic musical features), sharpness, and onset (basic dimensions of timbre). It used two sets of 16 referent sounds (modulated narrow-band noises and pure tones), each crossing two of the four features. Dissimilarity rating experiments and multidimensional scaling analyses confirmed that listeners could easily discriminate the 16 sounds based on the four features. Two expert and two lay participants recorded vocal imitations of the 32 sounds. Individual analyses highlighted that participants could reproduce accurately pitch and tempo of the referent sounds (experts being more accurate). There were larger differences of strategy for sharpness and onset. Participants matched the sharpness of the referent sounds either to the frequency of one particular formant or to the overall spectral balance of their voice. Onsets were ignored or imitated with crescendos. Overall, these results show that speakers may not imitate accurately absolute dimensions of timbre, hence suggesting that other features (such as dynamic patterns) may be more effective for sound recognition.



June 19, 2015 at 2:36 AM
Created by roc and murivan

Models for sound synthesis:

Voice-driven sound synthesis in miMic is achieved through a subset of the sound models palette available in the Sound Design Toolkit (SDT). The SDT is a software package providing advanced, perception-oriented and physically-consistent sound synthesis models that cover a mixture of acoustic phenomena, basic mechanical interactions (i.e., everyday sounds) and machines.

The software package and documentation can be downloaded at https://github.com/SkAT-VG/SDT/releases

Control and Mapping:

Each sound model is provided with a customizable control layer that connects the vocal and gestural descriptors to the interactive parameters. While one-to-one and one-to-many mappings are trivial to set up, many-to-one associations are achieved through specific JS functions that can be recalled and edited directly in the control layer. Finally, a control module per parameter smooths, scales, and possibly distorts the audio feature value into the range that is meaningful for the control parameter.
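
To make the role of the per-parameter control module concrete, here is a minimal Python sketch of a smooth, scale, and distort chain, plus a weighted average as one possible many-to-one combination. The real control layer lives in the Max patch and its JS functions; the class name, the one-pole smoother, the power-law curve, and the weighted average are all assumptions chosen for illustration.

```python
class ControlModule:
    """Smooth, scale, and distort one feature value into a parameter range."""

    def __init__(self, in_lo, in_hi, out_lo, out_hi, smoothing=0.9, curve=1.0):
        self.in_lo, self.in_hi = in_lo, in_hi
        self.out_lo, self.out_hi = out_lo, out_hi
        self.smoothing = smoothing   # 0 = no smoothing, close to 1 = heavy
        self.curve = curve           # 1 = linear, other values bend the map
        self._state = None

    def __call__(self, x):
        # Smooth: one-pole low-pass filter on the incoming feature value.
        if self._state is None:
            self._state = x
        else:
            self._state = self.smoothing * self._state + (1 - self.smoothing) * x
        # Scale: normalize to 0..1 and clip values outside the input range.
        t = (self._state - self.in_lo) / (self.in_hi - self.in_lo)
        t = min(max(t, 0.0), 1.0)
        # Distort: apply a power-law curve, then map to the output range.
        return self.out_lo + (t ** self.curve) * (self.out_hi - self.out_lo)

def many_to_one(values, weights):
    """Combine several descriptors into one control value (weighted mean)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

For example, `ControlModule(0, 1, 100, 200, smoothing=0.0, curve=2.0)` maps a feature value of 0.5 to 125.0, and any value above 1 to 200.0.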

June 19, 2015 at 2:37 AM
Created by murivan and sdellemonache

miMic: The microphone as a pencil, D. Rocchesso, D.A. Mauro, and S. Delle Monache. The ACM International Conference on Tangible, Embedded and Embodied Interaction, Feb. 14-17, 2016.


November 5, 2015 at 3:37 AM
Created by roc
Comments (1)
Sweet, congrats on the TEI acceptance!
about 1 year ago

November 19, 2015 at 4:41 PM
Created by roc

December 2, 2015 at 2:18 AM
Created by roc

The opposite, individual-centered approach is to train a classifier on examples provided by a specific user. We tested this approach by using MuBu objects for content-based real-time interactive audio processing, under Cycling '74 Max.

The MuBu.gmm object extracts Mel-Frequency Cepstral Coefficients (MFCC) from audio examples (at least one example per sound class), and models each sound class as a mixture of Gaussian distributions. In the recognition phase, the likelihood for each class is estimated, and it can be used as a mixing weight for the corresponding sound model.
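
One way to turn per-class likelihoods into mixing weights is to normalize them so that they sum to 1, as in the sketch below. It assumes the recognizer reports a log-likelihood per class, which is an assumption about the data and not a description of the MuBu output format. The function names are hypothetical.

```python
import math

def mixing_weights(log_likelihoods):
    """Turn per-class log-likelihoods into weights that sum to 1."""
    peak = max(log_likelihoods.values())
    # Subtract the maximum before exponentiating, for numerical stability.
    unnorm = {c: math.exp(ll - peak) for c, ll in log_likelihoods.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}

def mix(weights, model_outputs):
    """Weighted sum of one output sample from each sound model."""
    return sum(weights[c] * model_outputs[c] for c in weights)
```

With this scheme, an imitation that is three times as likely to be wind as saw would give the wind model a weight of 0.75 and the saw model a weight of 0.25.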

December 2, 2015 at 8:00 AM
Created by murivan

The classification tree is implemented in a Max/MSP patch for online recognition.

The vocal input undergoes the same processing adopted for the offline training: each imitation is analyzed, and the resulting median and IQR values are fed to the classification tree. The classifier selects one of the classes, and a synthetic example of that class is then played back to the user.
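
Evaluating a binary classification tree at run time amounts to a chain of threshold comparisons. The sketch below shows the idea in Python with a toy tree. The feature names, thresholds, and tree shape are invented for illustration and do not come from the trained MATLAB tree.

```python
def classify(node, features):
    """Walk a binary tree of threshold tests until a class label is reached."""
    while isinstance(node, dict):
        go_left = features[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node

# Hypothetical toy tree: internal nodes are dicts, leaves are class labels.
TOY_TREE = {
    "feature": "flatness_median",
    "threshold": 0.5,
    "left": {
        "feature": "pitch_median",
        "threshold": 200.0,
        "left": "Engine",
        "right": "DC motor",
    },
    "right": "Wind",
}

label = classify(TOY_TREE, {"flatness_median": 0.8, "pitch_median": 0.0})
```

In the Max/MSP patch the same logic would be expressed with comparison objects, and the selected label would trigger playback of the corresponding synthetic example.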

December 2, 2015 at 9:51 AM
Created by murivan

The individual-centered selection of sound models, when included in miMic, implies a further sub-mode of use, namely the Train procedure. In our realization, the Train procedure is activated when both buttons of miMic are pushed. To avoid using the GUI for this personalization stage, the user is asked to produce one vocal imitation for each of the five sound classes in a precise order, with audible feedback marking the start and end of each imitation.
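
The way two buttons yield several modes can be sketched as a simple decoding of their joint state. Only the "both buttons pushed" case (Train) is stated in this log. The assignment of Select to the white button, Play to the blue button, and an idle state when neither is pushed are assumptions made for illustration.

```python
def mode(white_pressed, blue_pressed):
    """Decode the current mode from the state of the two latching buttons."""
    if white_pressed and blue_pressed:
        return "train"     # stated in the log: both buttons pushed
    if white_pressed:
        return "select"    # assumption: white button selects a sound model
    if blue_pressed:
        return "play"      # assumption: blue button plays the selected model
    return "idle"          # assumption: nothing pushed means no active mode
```

Because the buttons are latching and illuminated, the lights would show the current mode at a glance under this mapping.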

December 3, 2015 at 5:34 AM
Created by murivan


Presentation given as part of the seminar "The Skat-VG project : a move to a new sound design tool", organized at Ircam, Paris, France, in January 2016.

February 8, 2016 at 4:47 AM
Created by sdellemonache