Steadiness or dynamicity of the feature
Time extent of the description provided by the feature
Global
Instantaneous
Abstractness of the feature
i.e. what the feature represents
is the feature abstract or concrete?
Is it a mathematical process with little concrete meaning
or does it represent something we can understand from experience?
Extraction process of the feature
directly computed from waveform
i.e. in the time domain
after a transform
spectral characteristics
from a model (e.g. sinusoidal model)
e.g. for harmonicity/noisiness
that include psychoacoustic/physical models
Feature groups
Temporal shape
computed from the waveform or amplitude envelope
attack time
temporal increase/decrease
effective duration
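The temporal shape descriptors above can be sketched from an amplitude envelope. This is a minimal illustration, not a reference implementation. The 20%/80% attack thresholds and the 40% effective-duration threshold are assumptions chosen for the example, and the function names are hypothetical.

```python
def attack_time(envelope, sample_rate, lo=0.2, hi=0.8):
    """Seconds the envelope takes to rise from lo*peak to hi*peak.

    The lo/hi fractions are illustrative assumptions.
    """
    peak = max(envelope)
    start = next(i for i, v in enumerate(envelope) if v >= lo * peak)
    stop = next(i for i, v in enumerate(envelope) if v >= hi * peak)
    return (stop - start) / sample_rate


def effective_duration(envelope, sample_rate, threshold=0.4):
    """Seconds during which the envelope is at or above threshold*peak.

    The threshold fraction is an illustrative assumption.
    """
    peak = max(envelope)
    return sum(1 for v in envelope if v >= threshold * peak) / sample_rate


# Toy non-negative envelope: a rise to the peak followed by a fast decay.
envelope = [0.0, 0.05, 0.15, 0.25, 0.35, 0.45, 0.55,
            0.65, 0.75, 0.85, 1.0, 0.5, 0.1]
print(attack_time(envelope, sample_rate=100))
print(effective_duration(envelope, sample_rate=100))
```

Both functions expect a non-negative envelope with a positive peak. The temporal increase/decrease could be estimated in a similar way from the slope of the envelope before and after the peak.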
Temporal features
auto-correlation coefficients
zero-crossing rate
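Both temporal features listed above are computed directly on a frame of samples. The sketch below is one possible formulation. Treating zero as positive and normalising the autocorrelation by its lag-0 value are assumptions of this example.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Samples equal to zero are treated as positive (an assumption).
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def autocorrelation(frame, max_lag):
    """Autocorrelation coefficients for lags 0..max_lag, divided by lag 0."""
    r0 = sum(x * x for x in frame)
    return [
        sum(frame[n] * frame[n + k] for n in range(len(frame) - k)) / r0
        for k in range(max_lag + 1)
    ]


frame = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(zero_crossing_rate(frame))
print(autocorrelation(frame, max_lag=2))
```

The frame must contain at least two samples and must not be all zeros, otherwise the divisions fail.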
Energy features
global energy
harmonic energy
noise energy
Spectral shape (timbral texture) features
centroid
spread
skewness
kurtosis
spectral roll-off
the frequency Rn below which 85% of the energy distribution of the magnitude spectrum is concentrated
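The spectral shape descriptors above can be read as statistical moments of the spectrum treated as a distribution over frequency. The sketch below assumes the magnitudes are normalised to sum to one and used as weights, and that the roll-off is accumulated over magnitudes. Other definitions weight by power instead, so treat this as one illustrative choice. The function names are hypothetical.

```python
import math


def spectral_moments(freqs, mags):
    """Return (centroid, spread, skewness, kurtosis) of a magnitude spectrum."""
    total = sum(mags)
    weights = [m / total for m in mags]  # spectrum as a probability distribution
    centroid = sum(f * w for f, w in zip(freqs, weights))
    spread = math.sqrt(sum((f - centroid) ** 2 * w
                           for f, w in zip(freqs, weights)))
    skewness = sum((f - centroid) ** 3 * w
                   for f, w in zip(freqs, weights)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w
                   for f, w in zip(freqs, weights)) / spread ** 4
    return centroid, spread, skewness, kurtosis


def spectral_rolloff(freqs, mags, fraction=0.85):
    """Lowest frequency at which the cumulative magnitude reaches fraction*total."""
    target = fraction * sum(mags)
    cumulative = 0.0
    for f, m in zip(freqs, mags):
        cumulative += m
        if cumulative >= target:
            return f
    return freqs[-1]


freqs = [100.0, 200.0, 300.0]
mags = [1.0, 2.0, 1.0]
print(spectral_moments(freqs, mags))
print(spectral_rolloff(freqs, mags))
```

A spectrum with all its energy in a single bin has zero spread, so skewness and kurtosis are undefined in that case and the division fails.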
MFCC
Mel-Frequency Cepstral Coefficients
Three steps:
(1) Mel-scale filterbank
The most common MFCC implementation uses 13 linearly spaced filters separated by 133.33 Hz between their center frequencies, followed by 27 log-spaced filters (separated by a factor of 1.0711703 in frequency), resulting in 40 filterbank values for each STFT frame.
(2) Log energy computation
The next step consists of computing the logarithm of the magnitude of each of the filterbank outputs. This can be viewed as a simple step of dynamic compression, making feature extraction less sensitive to variations in dynamics.
(3) Discrete cosine transform
The final step reduces the dimensionality of the 40 filterbank outputs by performing a discrete cosine transform (DCT).
From: Li, T., Ogihara, M., & Tzanetakis, G. (Eds.). Music Data Mining. Chapter 2.
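The three MFCC steps can be sketched end to end on a single magnitude-spectrum frame. This is a rough illustration under several assumptions: the filter spacing follows the figures quoted above, while the lowest center frequency (133.33 Hz), the triangular filter shape, the natural logarithm, the unnormalised DCT-II and the choice to keep 13 coefficients are all choices made for this example. The function names are hypothetical.

```python
import math


def center_frequencies(first_hz=133.33, n_linear=13, linear_step_hz=133.33,
                       n_log=27, log_factor=1.0711703):
    """Step 1a: linearly spaced centers followed by log-spaced centers."""
    centers = [first_hz + i * linear_step_hz for i in range(n_linear)]
    for _ in range(n_log):
        centers.append(centers[-1] * log_factor)
    return centers


def triangular_filterbank(centers, bin_freqs):
    """Step 1b: one triangular weight vector per filter (assumed shape)."""
    lower = 2 * centers[0] - centers[1]       # extrapolated lower edge
    upper = centers[-1] ** 2 / centers[-2]    # extrapolated upper edge
    edges = [lower] + list(centers) + [upper]
    bank = []
    for i in range(1, len(edges) - 1):
        lo, mid, hi = edges[i - 1], edges[i], edges[i + 1]
        weights = []
        for f in bin_freqs:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc(magnitudes, bank, n_coeffs=13, floor=1e-10):
    """Steps 2 and 3: log of the filterbank outputs, then a DCT-II."""
    outputs = [sum(w * m for w, m in zip(weights, magnitudes))
               for weights in bank]
    log_outputs = [math.log(max(v, floor)) for v in outputs]  # floor avoids log(0)
    n = len(log_outputs)
    return [
        sum(v * math.cos(math.pi * k * (j + 0.5) / n)
            for j, v in enumerate(log_outputs))
        for k in range(n_coeffs)
    ]


# Usage on a flat toy spectrum (sample rate and FFT size are assumptions).
sample_rate, n_fft = 44100, 1024
bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
bank = triangular_filterbank(center_frequencies(), bin_freqs)
coeffs = mfcc([1.0] * len(bin_freqs), bank)
print(len(bank), len(coeffs))
```

In practice the magnitudes would come from each STFT frame, and the filterbank would be built once and reused across frames.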
Delta and delta-delta coefficients (first- and second-order time derivatives of the MFCCs)
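Delta features approximate the frame-to-frame time derivative of a feature sequence, and applying the same operation twice gives the delta-delta. The sketch below uses a central difference with edge padding, which is an assumption of this example. Regression over a wider window is another common choice.

```python
def delta(frames):
    """Central-difference derivative along time of a list of feature vectors.

    The first and last frames are repeated as padding (an assumption).
    """
    padded = [frames[0]] + list(frames) + [frames[-1]]
    return [
        [(nxt - prv) / 2 for prv, nxt in zip(padded[t - 1], padded[t + 1])]
        for t in range(1, len(padded) - 1)
    ]


# One coefficient per frame, rising linearly over four frames.
frames = [[0.0], [1.0], [2.0], [3.0]]
d1 = delta(frames)   # delta
d2 = delta(d1)       # delta-delta
print(d1)
print(d2)
```

The output has the same number of frames as the input, so the deltas can be appended to the static coefficients frame by frame.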
MPEG-7 Low level audio descriptors
spectral flatness
crest factors
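Spectral flatness and crest both compare a summary of the spectrum against its arithmetic mean. The sketch below computes them over a whole power spectrum. Applying them per frequency band, and the small floor used to avoid taking the logarithm of zero, are left as assumptions of this example.

```python
import math


def spectral_flatness(power, floor=1e-12):
    """Geometric mean divided by arithmetic mean (near 1 for a flat spectrum)."""
    n = len(power)
    geometric = math.exp(sum(math.log(max(p, floor)) for p in power) / n)
    arithmetic = sum(power) / n
    return geometric / arithmetic


def spectral_crest(power):
    """Maximum divided by arithmetic mean (large for a peaky spectrum)."""
    return max(power) / (sum(power) / len(power))


flat = [2.0, 2.0, 2.0, 2.0]
peaky = [1.0, 1.0, 1.0, 97.0]
print(spectral_flatness(flat), spectral_crest(flat))
print(spectral_flatness(peaky), spectral_crest(peaky))
```

Noise-like spectra give a flatness close to one and a low crest, while tonal spectra give a flatness close to zero and a high crest.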
Harmonic features
Instantaneous features calculated from sinusoidal modeling
harmonic/noise ratio
odd-to-even harmonic energy ratio and tristimulus
harmonic deviation
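Given the harmonic amplitudes estimated by a sinusoidal model, the energy ratios listed above reduce to simple sums. The sketch below assumes the amplitudes are ordered from the first harmonic upward, that the odd-to-even ratio uses squared amplitudes, and that the tristimulus uses plain amplitudes. Definitions vary between sources, so these are illustrative choices, and the function names are hypothetical.

```python
def odd_to_even_ratio(amps):
    """Energy of odd harmonics (1, 3, 5, ...) over energy of even harmonics."""
    odd = sum(a * a for a in amps[0::2])    # index 0 holds harmonic 1
    even = sum(a * a for a in amps[1::2])   # index 1 holds harmonic 2
    return odd / even


def tristimulus(amps):
    """Shares of harmonic 1, harmonics 2 to 4, and harmonics 5 and above."""
    total = sum(amps)
    return (amps[0] / total,
            sum(amps[1:4]) / total,
            sum(amps[4:]) / total)


amps = [4.0, 2.0, 2.0, 1.0, 1.0]   # toy amplitudes for harmonics 1 to 5
print(odd_to_even_ratio(amps))
print(tristimulus(amps))
```

The three tristimulus values always sum to one, which makes them convenient to compare across sounds of different loudness.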
Perceptual features
relative specific loudness
sharpness
spread
From: Peeters, G. (2004). A large set of audio features for sound description (similarity and classification) in the CUIDADO project (pp. 1–25).
Other perspective
Statistical properties
Spectral shape
Technical/Signal Properties
Intensity properties
From: Lerch, A. (n.d.). Chapter 3: Instantaneous features. In An Introduction to Audio Content Analysis (pp. 31–69).
Pope
Time-domain Low level
e.g. windowed RMS
Time-domain high level
e.g. Tempo, beat structure
Frequency domain low level
e.g. pitch tracking, spectral peaks
Frequency domain high level
e.g. instrument identification
From: Pope's BigMATBook
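Of the four classes above, the time-domain low-level example (windowed RMS) is the simplest to sketch. The window and hop sizes below are arbitrary values chosen for illustration, and a rectangular window is assumed.

```python
import math


def windowed_rms(signal, window, hop):
    """Root-mean-square level of each full window, advancing by hop samples."""
    return [
        math.sqrt(sum(x * x for x in signal[start:start + window]) / window)
        for start in range(0, len(signal) - window + 1, hop)
    ]


signal = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
print(windowed_rms(signal, window=4, hop=4))
```

Trailing samples that do not fill a complete window are ignored in this version.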
Feature classification in the MPEG-7 standard
MPEG-7 is a multimedia content description standard
Part 4 of the standard deals with audio
basic
Instantaneous waveform and power values
basic spectral
Log-frequency power spectrum and spectral features (e.g. spectral centroid, spectral spread, spectral flatness)
signal parameters
fundamental frequency and harmonicity of signals
temporal timbral
Log attack time and temporal centroid
spectral timbral
specialized spectral features in a linear frequency space…
spectral basis representations
a number of features used in conjunction for sound recognition, based on projections of the spectrum into a low-dimensional space
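The two temporal timbral descriptors named above (log attack time and temporal centroid) can be sketched from an amplitude envelope. This is not the normative MPEG-7 extraction procedure. Taking the attack start at 2% of the peak, the attack end at the envelope maximum, and a one-sample lower bound on the attack length are assumptions of this example.

```python
import math


def temporal_centroid(envelope, sample_rate):
    """Envelope-weighted mean time, in seconds."""
    total = sum(envelope)
    return sum(n * v for n, v in enumerate(envelope)) / total / sample_rate


def log_attack_time(envelope, sample_rate, start_fraction=0.02):
    """log10 of the time from attack start to the envelope maximum.

    start_fraction is an illustrative assumption; the attack is taken to be
    at least one sample long so the logarithm stays defined.
    """
    peak = max(envelope)
    stop = envelope.index(peak)
    start = next(i for i, v in enumerate(envelope)
                 if v > start_fraction * peak)
    return math.log10(max(stop - start, 1) / sample_rate)


envelope = [0.0, 0.0, 1.0, 2.0, 4.0, 3.0, 2.0, 1.0]
print(temporal_centroid(envelope, sample_rate=10))
print(log_attack_time(envelope, sample_rate=10))
```

Both functions expect the envelope as a list of non-negative values with a positive peak.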