Dominik Joho, Maren Bennewitz, Sven Behnke
Pitch Estimation using Models of Voiced Speech on Three Levels
Abstract. We present an algorithm for estimating the
fundamental frequency in speech signals. Our approach
incorporates models of voiced speech on three levels. First, we estimate the pitch for each time
frame based on its harmonic structure using non-negative matrix factorization. The second level
uses temporal pitch continuity to extract partial pitch contours. Third, we incorporate
statistics of the succession of voiced segments to aggregate the partial contours into the final contour
of an utterance. We evaluate our approach on the Keele database. The experimental results show that
our method is robust on noisy speech and performs well on clean speech in comparison
with state-of-the-art algorithms.
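To make the first level more concrete, the following is a minimal sketch of frame-wise pitch estimation by non-negative decomposition of a magnitude spectrum onto harmonic templates. It is not the paper's implementation: the basis is fixed and only the activations are updated (a simplification of full non-negative matrix factorization), and the sampling rate, frame length, number of harmonics, 1/k partial amplitudes, candidate grid, and all function names are assumptions chosen for illustration.

```python
import numpy as np

FS = 16000     # sampling rate in Hz (assumed)
N_FFT = 1024   # frame length in samples (assumed)
N_HARM = 10    # harmonics per template (assumed)

def harmonic_frame(f0, fs=FS, n=N_FFT, n_harm=N_HARM):
    """Synthesize one frame of a harmonic tone with 1/k partial amplitudes."""
    t = np.arange(n) / fs
    k = np.arange(1, n_harm + 1)[:, None]
    return np.sum(np.sin(2 * np.pi * k * f0 * t) / k, axis=0)

def magnitude_spectrum(frame):
    """Hann-windowed magnitude spectrum of a single frame."""
    return np.abs(np.fft.rfft(frame * np.hanning(len(frame))))

def build_templates(candidates):
    """One unit-norm harmonic spectrum per candidate pitch (columns of W)."""
    W = np.stack([magnitude_spectrum(harmonic_frame(f)) for f in candidates], axis=1)
    return W / np.linalg.norm(W, axis=0, keepdims=True)

def estimate_pitch(v, W, candidates, n_iter=300, eps=1e-12):
    """Fit v ~ W @ h with h >= 0 using multiplicative updates for the
    Euclidean cost, then return the candidate with the largest activation."""
    h = np.ones(W.shape[1])
    WtV = W.T @ v
    WtW = W.T @ W
    for _ in range(n_iter):
        h *= WtV / (WtW @ h + eps)
    return candidates[int(np.argmax(h))], h

# Candidate pitches from 80 Hz to 400 Hz in 5 Hz steps (assumed grid).
candidates = np.arange(80.0, 401.0, 5.0)
W = build_templates(candidates)

# Synthetic test frame: a 200 Hz harmonic tone with mild additive noise.
rng = np.random.default_rng(0)
frame = harmonic_frame(200.0) + 0.05 * rng.standard_normal(N_FFT)
f0_hat, activations = estimate_pitch(magnitude_spectrum(frame), W, candidates)
print(f0_hat)
```

Because the activations are constrained to be non-negative, a subharmonic template (for example 100 Hz) cannot cancel its own unmatched partials, which discourages octave errors in this toy setting. Real speech frames would need templates that better reflect the spectral envelope, and the per-frame estimates would still have to be linked over time as the second and third levels describe.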
BibTeX
@InProceedings{joho07icassp,
  author    = {Dominik Joho and Maren Bennewitz and Sven Behnke},
  title     = {Pitch Estimation using Models of Voiced Speech on Three Levels},
  booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech and Signal Processing {(ICASSP)}},
  month     = apr,
  year      = {2007},
  volume    = {4},
  pages     = {1077--1080},
  address   = {Honolulu, Hawaii, USA},
  doi       = {10.1109/ICASSP.2007.367260},
  issn      = {1520-6149},
  isbn      = {1-4244-0728-1}
}