SPEECH SIGNAL PROCESSING (EE 6362)


Prerequisites: EE 6360 (DSP I) and EE 6349 (Random Processes)

Objective: To introduce the fundamentals of speech signal processing and related applications. This course will present the basic principles of speech analysis and speech synthesis, and it will cover several applications including speech enhancement, speech coding and speech recognition.

Textbooks:
Required:
T. Quatieri, Discrete-Time Speech Signal Processing, Prentice Hall, 2001.
P. Loizou, Speech Enhancement: Theory and Practice, CRC Press, 2007.

Optional:
J. Deller, J. Proakis and J. Hansen, Discrete-Time Processing of Speech Signals, Prentice Hall, 1987.
L. Rabiner and R. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.

Software Tools:
MATLAB will be used for all computer assignments and projects. The MATLAB speech processing toolbox COLEA will also be used for some assignments, and can be downloaded from: http://www.utdallas.edu/~loizou/speech/colea.htm

Homework and Projects:
Computer homework (in MATLAB) will be assigned on a bi-weekly basis and will be graded. Late homework will not be accepted. There will also be three projects and one final project. The projects will involve the MATLAB implementation of a speech analysis, speech enhancement, speech coding or speech recognition algorithm. The final project will be assigned towards the end of the semester; alternatively, instead of the assigned final project, students may submit a 3-page proposal for a speech project of their own choosing. The final project will be presented in class and graded as follows: 10% for the presentation, 10% for the written report and 80% for merit and functionality.

Course Grading Policy:
        Computer homework:   20%
        Projects:            60%
        Final project:       20%
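The grading scheme is a two-level weighted average: the final project score combines presentation, written report and merit/functionality at 10%/10%/80%, and that score then enters the course grade at 20%. The Python sketch below illustrates the arithmetic. It is a hypothetical helper, not an official grade calculator: the names `final_project_score` and `course_grade` are made up for illustration, every score is assumed to be on a 0 to 100 scale, and because the syllabus does not say how individual homework assignments or the three projects are weighted, equal weighting is assumed.

```python
# A minimal sketch of the weighted course grade described above.
# Assumptions (not stated in the syllabus): all scores are on a 0-100 scale,
# and individual homework assignments and the three projects count equally.
from statistics import mean

# Top-level course weights (from the grading policy).
COURSE_WEIGHTS = {"homework": 0.20, "projects": 0.60, "final_project": 0.20}

# Weights inside the final project grade (presentation / report / merit).
FINAL_PROJECT_WEIGHTS = {"presentation": 0.10, "report": 0.10, "merit": 0.80}


def final_project_score(presentation: float, report: float, merit: float) -> float:
    """Combine the three final-project components into one 0-100 score."""
    return (FINAL_PROJECT_WEIGHTS["presentation"] * presentation
            + FINAL_PROJECT_WEIGHTS["report"] * report
            + FINAL_PROJECT_WEIGHTS["merit"] * merit)


def course_grade(homework: list[float], projects: list[float],
                 final_project: float) -> float:
    """Weighted course grade; homework and project scores are averaged equally (assumed)."""
    return (COURSE_WEIGHTS["homework"] * mean(homework)
            + COURSE_WEIGHTS["projects"] * mean(projects)
            + COURSE_WEIGHTS["final_project"] * final_project)


if __name__ == "__main__":
    fp = final_project_score(presentation=90, report=80, merit=85)
    print(round(fp, 2))     # 0.1*90 + 0.1*80 + 0.8*85 -> 85.0
    grade = course_grade(homework=[95, 85], projects=[80, 90, 70], final_project=fp)
    print(round(grade, 2))  # 0.2*90 + 0.6*80 + 0.2*85 -> 83.0
```

Because merit and functionality carry 80% of the final project, and the final project carries 20% of the course, merit and functionality alone account for 16% of the overall grade under these assumptions.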

 


TOPICS

1.  Speech production model (source-system model)
2.  Speech perception
    i. Classes of speech sounds (consonants, vowels, etc.)
    ii. Spectral characteristics of consonants and vowels, formants
3.  Speech analysis techniques
    i. Pitch detection
    ii. Endpoint detection
    iii. Voiced/Unvoiced detection
4.  Speech synthesis techniques
    i. Formant-based speech synthesizers (e.g., Klatt synthesizer)
    ii. Articulatory speech synthesizers
5.  Speech recognition
    i. Feature extraction algorithms (e.g., mel-frequency cepstral coefficients)
    ii. Dynamic time warping
    iii. Hidden Markov models
6.  Speech enhancement
    i. Spectral subtraction methods
    ii. Wiener filtering
7.  Speech compression
    i. ADPCM
    ii. Linear predictive coders, analysis-by-synthesis techniques (e.g., CELP)
    iii. Speech and audio coding standards (e.g., VSELP, MPEG)
