Objective: To introduce the fundamentals of speech signal processing and related applications. This course will present the basic principles of speech analysis and speech synthesis, and it will cover several applications including speech enhancement, speech coding and speech recognition.
Textbooks:
Required:
T. Quatieri, Discrete-Time Speech Signal Processing, Prentice Hall, 2001.
P. Loizou, Speech Enhancement: Theory and Practice, CRC Press, 2007.
Optional:
J. Deller, J. Proakis and J. Hansen, Discrete-Time Processing of Speech Signals, Prentice Hall, 1987.
L. Rabiner and R. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
Software Tools:
MATLAB will be used for all computer assignments and projects. The MATLAB speech processing toolbox COLEA will also be used for some assignments, and can be downloaded from:
http://www.utdallas.edu/~loizou/speech/colea.htm
Homework and Projects:
Computer homework (in MATLAB) will be assigned on a bi-weekly basis and will be graded. Late homework will not be accepted. There will also be three projects and one final project. Each project will involve the MATLAB implementation of a speech analysis, speech enhancement, speech coding or speech recognition algorithm. The final project will be presented in class and will be graded as follows: 10% for the presentation, 10% for the written report and 80% for merit and functionality. The final project will be assigned towards the end of the semester. Alternatively, students may submit a 3-page proposal for a speech project of their own choosing.
Course Grading Policy:
Computer Homework: 20%
Projects: 60%
Final project: 20%
TOPICS
1. Speech production model (source-system model)
2. Speech perception
i. Classes of speech sounds (consonants, vowels, etc.)
ii. Spectral characteristics of consonants and vowels, formants
3. Speech analysis techniques
i. Pitch detection
ii. Endpoint detection
iii. Voiced/Unvoiced detection
4. Speech synthesis techniques
i. Formant-based speech synthesizers (e.g., Klatt synthesizer)
ii. Articulatory speech synthesizers
5. Speech recognition
i. Feature extraction algorithms (e.g., mel-frequency cepstral coefficients)
ii. Dynamic time warping
iii. Hidden Markov models
6. Speech enhancement
i. Spectral subtraction methods
ii. Wiener filtering
7. Speech compression
i. ADPCM
ii. Linear-predictive coders, analysis-by-synthesis techniques (e.g., CELP)
iii. Speech and audio coding standards (e.g., VSELP, MPEG)