Pitch Estimation using Models of Voiced Speech on Three Levels

Dominik Joho, Maren Bennewitz, Sven Behnke

Abstract. We present an algorithm for estimating the fundamental frequency in speech signals. Our approach incorporates models of voiced speech on three levels. First, we estimate the pitch of each time frame from its harmonic structure using non-negative matrix factorization. Second, we exploit temporal pitch continuity to extract partial pitch contours. Third, we use statistics on the succession of voiced segments to aggregate the partial contours into the final contour of an utterance. We evaluate our approach on the Keele database. The experimental results demonstrate the robustness of our method on noisy speech and its good performance on clean speech in comparison with state-of-the-art algorithms.
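The first two levels can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every modeling choice in it is an assumption made for illustration: the dictionary of harmonic templates (Gaussian peaks at integer multiples of each candidate F0, amplitudes decaying as 1/k), the Euclidean multiplicative-update rule with the dictionary held fixed, the candidate grid, and the hypothetical helper `partial_contours`, which starts a new contour at unvoiced frames or when the pitch jumps by more than 10% between neighbouring frames. The third level is not sketched because the abstract does not specify the statistics it uses.

```python
import numpy as np


def harmonic_template(f0, freqs, n_harmonics=20, sigma=5.0):
    """Hypothetical idealized magnitude spectrum of a voiced frame with fundamental f0 (Hz).

    Assumption: Gaussian peaks of width sigma (Hz) at k * f0, amplitude 1/k,
    normalized to unit L2 norm.
    """
    t = np.zeros_like(freqs, dtype=float)
    for k in range(1, n_harmonics + 1):
        fk = k * f0
        if fk >= freqs[-1]:  # stop at the top of the analysed band
            break
        t += np.exp(-0.5 * ((freqs - fk) / sigma) ** 2) / k
    return t / np.linalg.norm(t)


def nmf_activations(v, W, n_iter=2000, eps=1e-12):
    """Fit v ~= W @ h with h >= 0 while keeping the dictionary W fixed.

    Lee-Seung multiplicative update for the Euclidean cost; only the
    activations h are updated, so this is an iterative non-negative
    least-squares fit.
    """
    h = np.full(W.shape[1], 1.0 / W.shape[1])  # positive initialization
    WtV = W.T @ v
    WtW = W.T @ W
    for _ in range(n_iter):
        h *= WtV / (WtW @ h + eps)
    return h


def estimate_frame_pitch(v, freqs, candidates):
    """Level 1 (sketch): pick the candidate F0 whose template gets the largest activation."""
    candidates = np.asarray(candidates)
    W = np.column_stack([harmonic_template(f0, freqs) for f0 in candidates])
    h = nmf_activations(v, W)
    return float(candidates[int(np.argmax(h))])


def partial_contours(track, max_rel_jump=0.1):
    """Level 2 (sketch): group per-frame estimates into partial contours.

    Frames marked 0 are treated as unvoiced. A contour is closed at an
    unvoiced frame or when the pitch changes by more than max_rel_jump
    (relative to the previous frame).
    """
    contours, current = [], []
    for f in track:
        if f <= 0:  # unvoiced frame ends any open contour
            if current:
                contours.append(current)
            current = []
        elif current and abs(f - current[-1]) > max_rel_jump * current[-1]:
            contours.append(current)  # pitch jump: close contour, start a new one
            current = [f]
        else:
            current.append(f)  # continuous pitch: extend the open contour
    if current:
        contours.append(current)
    return contours


if __name__ == "__main__":
    freqs = np.arange(800) * 5.0           # 0 ... 3995 Hz in 5 Hz bins
    candidates = np.arange(80, 401, 10)    # candidate F0 grid (Hz)
    frame = harmonic_template(200, freqs)  # synthetic voiced frame, F0 = 200 Hz
    print(estimate_frame_pitch(frame, freqs, candidates))
    print(partial_contours([200, 204, 0, 0, 150, 152, 300, 298]))
```

Holding the dictionary fixed reduces the factorization to a non-negative fit of the activations, so a frame's pitch is read off as the candidate with the largest activation. A real front end would feed magnitude spectra of windowed speech frames rather than the synthetic template used in the usage example.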

BibTeX

@InProceedings{joho07icassp,
  author    = {Dominik Joho and Maren Bennewitz and Sven Behnke},
  title     = {Pitch Estimation using Models of Voiced Speech on
               Three Levels},
  booktitle = {Proceedings of the {IEEE} International Conference on
               Acoustics, Speech and Signal Processing {(ICASSP)}},
  month     = apr,
  year      = {2007},
  volume    = {4},
  pages     = {1077--1080},
  address   = {Honolulu, Hawaii, USA},
  doi       = {10.1109/ICASSP.2007.367260},
  issn      = {1520-6149},
  isbn      = {1-4244-0728-1}
}