This was an experiment to see how well Mel-Frequency Cepstral Coefficients (MFCCs) and chroma analysis perform at extracting features from audio signals. As a test case, I tried to detect whether a given song was by Chet Baker or Beyoncé, two artists from clearly very different genres.
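The write-up doesn't include the extraction code, but the MFCC pipeline it relies on (framing, windowing, mel filterbank, log, DCT) can be sketched in plain NumPy. This is a minimal illustrative implementation, not the project's actual code; the parameter values (16 kHz sample rate, 512-sample frames, 26 mel bands, 13 coefficients) are common defaults, assumed here for concreteness.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    # Slice the signal into overlapping frames and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hanning(n_fft)
    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filterbank spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T

# Example: one second of a 440 Hz tone -> a (frames x 13) feature matrix
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
```

In practice a library such as librosa provides this (and chroma features) out of the box; the sketch above just makes the transformation explicit.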
This turned out to be very difficult to accomplish (especially given the lack of prior work on this exact problem), so I moved on to a simpler task: classifying a speaker's identity from raw audiobook recordings. First, I cleaned and transformed the preprocessed audio snippets, then fed the resulting features into a neural network to classify the different voices. Surprisingly, even a shallow network did a great job, reaching an impressive 84% accuracy.
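The shallow classifier described above can be sketched as a single-hidden-layer network trained with plain gradient descent. The data here is a synthetic stand-in (two Gaussian clusters playing the role of per-speaker feature vectors), and the layer sizes and learning rate are illustrative assumptions, not the project's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: 2 "speakers", 20-dim feature vectors (e.g. averaged MFCCs)
X = np.vstack([rng.normal(0.0, 1.0, (100, 20)),
               rng.normal(1.5, 1.0, (100, 20))])
y = np.array([0] * 100 + [1] * 100)
onehot = np.eye(2)[y]

# One hidden layer of 32 ReLU units, softmax output
W1 = rng.normal(0, 0.1, (20, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (32, 2));  b2 = np.zeros(2)
lr = 0.1

for _ in range(200):
    h = np.maximum(X @ W1 + b1, 0)              # ReLU hidden layer
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)                # softmax probabilities
    # Backpropagate the cross-entropy loss
    d_logits = (p - onehot) / len(X)
    dW2 = h.T @ d_logits; db2 = d_logits.sum(0)
    dh = d_logits @ W2.T * (h > 0)
    dW1 = X.T @ dh;       db1 = dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

acc = (p.argmax(1) == y).mean()   # training accuracy on the toy data
```

Even a network this small separates well-clustered voice features, which is consistent with the result above: speaker identity is a much easier target than artist or genre.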
For technical details and collaboration, please see the GitHub repo above.