Objective: To introduce the fundamentals of speech signal processing and related applications. This course will present the basic principles of speech analysis and speech synthesis, and it will cover several applications including speech enhancement, speech coding and speech recognition.
T. Quatieri, Discrete-Time Speech Signal Processing, Prentice Hall, 2001.
P. Loizou, Speech Enhancement: Theory and Practice, CRC Press, 2007.
J. Deller, J. Proakis and J. Hansen, Discrete-Time Processing of Speech Signals, Prentice Hall, 1987.
L. Rabiner and R. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
MATLAB will be used for all computer assignments and projects. The MATLAB speech processing toolbox COLEA will also be used for some assignments, and can be downloaded from: http://www.utdallas.edu/~loizou/speech/colea.htm
Homework and Projects:
Computer homework (in MATLAB) will be assigned on a bi-weekly basis and will be graded. Late homework will not be accepted. There will also be three regular projects and one final project. Each project will involve the MATLAB implementation of a speech analysis, speech enhancement, speech coding, or speech recognition algorithm. The final project will be presented in class and graded as follows: 10% for the presentation, 10% for the written report, and 80% for merit and functionality. The final project will be assigned towards the end of the semester. Alternatively, students may submit a 3-page proposal for a speech project of their own choosing.
Course Grading Policy:
Computer Homework: 20%
Final project: 20%
Course Outline:
1. Speech production model (source-system model)
2. Speech perception
i. Classes of speech sounds (consonants, vowels, etc.)
ii. Spectral characteristics of consonants and vowels, formants
3. Speech analysis techniques
i. Pitch detection
ii. Endpoint detection
iii. Voiced/Unvoiced detection
4. Speech synthesis techniques
i. Formant-based speech synthesizers (e.g., Klatt synthesizer)
ii. Articulatory speech synthesizers
5. Speech recognition
i. Feature extraction algorithms (e.g., mel-frequency cepstral coefficients)
ii. Dynamic time warping
iii. Hidden Markov models
6. Speech enhancement
i. Spectral subtraction methods
ii. Wiener filtering
7. Speech compression
ii. Linear-predictive coders, analysis-by-synthesis techniques (e.g., CELP)
iii. Speech and audio coding standards (e.g., VSELP, MPEG)
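
To give a flavor of the analysis techniques listed under topic 3, here is a minimal sketch of frame-based voiced/unvoiced/silence classification using short-time energy and zero-crossing rate. It is an illustration only and not part of the course material. It is written in Python with the standard library, whereas the course assignments use MATLAB. The function names, the 30 ms frame length, the 8 kHz sampling rate, and both thresholds are assumptions chosen for this example, and the test signals are synthetic stand-ins for real speech.

```python
import math
import random

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Label a frame; both thresholds are illustrative and need tuning on real data."""
    if short_time_energy(frame) < energy_thresh:
        return "silence"
    # Voiced speech is dominated by low frequencies, so it crosses zero rarely;
    # noise-like unvoiced speech crosses zero often.
    return "voiced" if zero_crossing_rate(frame) < zcr_thresh else "unvoiced"

# Synthetic 30 ms frames at an assumed sampling rate of 8 kHz.
fs = 8000
n = 240
rng = random.Random(0)  # fixed seed so the demo is repeatable
voiced_like = [0.5 * math.sin(2 * math.pi * 100 * k / fs) for k in range(n)]
unvoiced_like = [rng.uniform(-0.5, 0.5) for _ in range(n)]
silent = [0.0] * n

labels = [classify_frame(f) for f in (voiced_like, unvoiced_like, silent)]
print(labels)
```

On recorded speech, `frame_signal` would first cut the waveform into frames (for example, 240 samples with a hop of 80), and each frame would then be passed to `classify_frame`. Practical detectors usually combine these two features with others and adapt the thresholds to the background noise level.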