Voice recognition



This was an experiment to see how well Mel-Frequency Cepstral Coefficients (MFCCs) and chroma analysis perform at extracting features from audio signals. As a first task, I wanted to detect whether a given song was by Chet Baker or Beyoncé - clearly two very different genres.
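To give a feel for what chroma analysis does, here is a minimal numpy sketch (not Librosa's implementation, and not the project's code): it folds FFT magnitude energy into the 12 pitch classes, so a pure A4 tone lights up the "A" bin.

```python
import numpy as np

def chroma_vector(signal, sr):
    """Fold FFT magnitude energy into 12 pitch classes (C=0 ... B=11).
    A simplified sketch of chroma analysis, not Librosa's algorithm."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    chroma = np.zeros(12)
    for f, mag in zip(freqs[1:], spectrum[1:]):  # skip the DC bin
        if f < 27.5:  # ignore bins below A0
            continue
        midi = 69 + 12 * np.log2(f / 440.0)      # map frequency to MIDI pitch
        chroma[int(round(midi)) % 12] += mag     # fold octaves together
    return chroma / (chroma.sum() + 1e-12)

sr = 22050
t = np.arange(sr) / sr
a4 = np.sin(2 * np.pi * 440.0 * t)               # one second of a pure A4 tone
print(np.argmax(chroma_vector(a4, sr)))          # → 9, the pitch class of A
```

Because chroma discards octave information, it captures harmony rather than timbre - one reason it pairs well with MFCCs, which capture the spectral envelope instead.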

This turned out to be very difficult to accomplish (especially given the lack of prior work in this area), so I moved on to a simpler task: classifying a speaker's identity from raw audiobooks. First, I cleaned and transformed the preprocessed audio snippets, then fed the resulting features into a neural network to classify the different voices. Surprisingly, even a shallow network did a great job, reaching an impressive 84% accuracy.

Credits to my friend Sean Lee, with whom I initially explored JavaScript's WebAudio API and built interactive visualizations to music. Even though the transformations for my project were done with Librosa, we also implemented the MFCC algorithm from scratch.
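For the curious, the standard MFCC recipe is short enough to write out. This is a from-scratch sketch of that recipe for a single frame (power spectrum → mel filterbank → log → DCT); it follows the textbook formulation, not our exact implementation:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_filters=26, n_coeffs=13):
    """MFCCs for one frame: power spectrum -> mel filterbank -> log -> DCT.
    A from-scratch sketch of the standard recipe, not the project's code."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fbank[i, l:c] = (np.arange(l, c) - l) / (c - l)  # rising edge
        if r > c:
            fbank[i, c:r] = (r - np.arange(c, r)) / (r - c)  # falling edge
    log_energy = np.log(fbank @ power + 1e-10)
    # Type-II DCT decorrelates the log filterbank energies
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return dct @ log_energy

sr = 16000
frame = np.sin(2 * np.pi * 300 * np.arange(512) / sr)  # a 300 Hz test frame
print(mfcc_frame(frame, sr).shape)                     # → (13,)
```

The mel spacing mimics the ear's frequency resolution, and the final DCT keeps only the smooth shape of the spectrum - which is why MFCCs describe a voice's timbre rather than its pitch.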

For technical details and collaboration, please see the GitHub repo above.

Handcrafted in New York City, design & code by
Dora Jambor