Problem

In speech analysis, synthesis, and coding, the speech signal is commonly modeled over a short time interval as the response of an LTI system excited by an excitation that switches between a train of equally spaced pulses for voiced sounds and a wideband random noise source for unvoiced sounds. To use homomorphic deconvolution to separate the components of the speech model, the speech signal s[n] = v[n] ∗ p[n] is multiplied by a window sequence w[n] to obtain x[n] = s[n]w[n]. To simplify the analysis, x[n] is approximated by

x[n] = (v[n] ∗ p[n]) · w[n] ≈ v[n] ∗ (p[n] · w[n]) = v[n] ∗ p_w[n]

where p_w[n] = p[n]w[n] as in Eq. (13.123).
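
The quality of this approximation can be checked numerically. The following is a minimal sketch, not part of the original problem: the function name `approximation_error` and all signal choices (a Hamming window, a unit impulse train, exponentially decaying v[n]) are assumptions made purely for illustration. It computes both sides of the approximation and reports their relative difference.

```python
import numpy as np

def approximation_error(v, p, w):
    """Relative L2 error between (v*p)·w and v*(p·w).

    Hypothetical helper for illustration; p must be at least as long as w.
    """
    v, p, w = (np.asarray(a, dtype=float) for a in (v, p, w))
    n = len(w)
    if len(p) < n:
        raise ValueError("p must cover the whole window")
    # Left side: window the convolved signal s[n] = v[n] * p[n].
    exact = np.convolve(v, p)[:n] * w
    # Right side: convolve v[n] with the windowed excitation p_w[n].
    approx = np.convolve(v, p[:n] * w)
    # The right side is longer than the window, so zero-pad the left side.
    padded = np.zeros(len(approx))
    padded[:n] = exact
    return np.linalg.norm(padded - approx) / np.linalg.norm(padded)

# Illustrative (assumed) signals: 400-point Hamming window, pulse spacing 80.
N, P = 400, 80
w = np.hamming(N)
p = np.zeros(N)
p[::P] = 1.0
v_short = 0.5 ** np.arange(50)    # impulse response much shorter than the window
v_long = 0.99 ** np.arange(300)   # impulse response comparable to the window

err_short = approximation_error(v_short, p, w)
err_long = approximation_error(v_long, p, w)
```

Comparing `err_short` with `err_long` gives a feel for how the duration of v[n] relative to the window affects the approximation.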

(a) Give an example of p[n], v[n], and w[n] for which the above assumption may be a poor approximation.

(b) One approach to estimating the excitation parameters (voiced/unvoiced decision and pulse spacing for voiced speech) is to compute the real cepstrum c_x[n] of the windowed segment of speech x[n], as depicted in Figure P13.29-1. For the model of Section 13.10.1, express c_x[n] in terms of the complex cepstrum x̂[n]. How would you use c_x[n] to estimate the excitation parameters?
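
As a rough illustration of the computation in part (b), the sketch below approximates the real cepstrum with a zero-padded DFT (inverse DFT of the log magnitude spectrum) and looks for the largest value in a quefrency range. It is one possible approach under stated assumptions, not the textbook's solution: the names `real_cepstrum` and `strongest_quefrency`, the FFT length, the search range, the small constant added before the logarithm, and the synthetic test signal are all hypothetical choices.

```python
import numpy as np

def real_cepstrum(x, nfft=None):
    """DFT-based approximation of the real cepstrum of a real sequence x."""
    x = np.asarray(x, dtype=float)
    if nfft is None:
        nfft = len(x)
    spectrum = np.fft.fft(x, nfft)
    # Small constant (an assumption) guards against log(0) at spectral nulls.
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    # The log magnitude of a real signal is real and even, so the inverse
    # DFT is real up to rounding error.
    return np.fft.ifft(log_magnitude).real

def strongest_quefrency(c, n_min, n_max):
    """Index and value of the largest cepstral sample in [n_min, n_max)."""
    segment = c[n_min:n_max]
    k = int(np.argmax(segment))
    return n_min + k, float(segment[k])

# Synthetic "voiced" segment (assumed values): pulse spacing 80 samples,
# short decaying impulse response, 512-point Hamming window.
N, P = 512, 80
p = np.zeros(N)
p[::P] = 1.0
v = 0.9 ** np.arange(60)
x = np.convolve(v, p)[:N] * np.hamming(N)

c = real_cepstrum(x, nfft=2048)
peak_index, peak_value = strongest_quefrency(c, 40, 200)
```

For a voiced segment like this one, `peak_index` would be expected to fall near the pulse spacing; a voiced/unvoiced decision could then compare `peak_value` against a threshold, which would have to be tuned and is not specified here.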

(c) Suppose that we replace the log operation in Figure P13.29-1 with the “squaring” operation so that the resulting system is as depicted in Figure P13.29-2. Can the new “cepstrum” q_x[n] be used to estimate the excitation parameters? Explain.
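
To experiment with part (c), the modified system can be sketched the same way. The figures are not reproduced here, so this assumes that Figure P13.29-2 takes the inverse transform of the squared magnitude spectrum; the function name `squared_spectrum_sequence` and the default FFT length are likewise assumptions. With enough zero padding, the inverse DFT of |X|² equals the deterministic autocorrelation of x[n], which gives an independent check on the code.

```python
import numpy as np

def squared_spectrum_sequence(x, nfft=None):
    """Inverse DFT of |X[k]|^2 (assumed form of the system in part (c))."""
    x = np.asarray(x, dtype=float)
    if nfft is None:
        # At least 2*len(x) - 1 points avoids circular wrap-around.
        nfft = 2 * len(x)
    spectrum = np.fft.fft(x, nfft)
    return np.fft.ifft(np.abs(spectrum) ** 2).real

# Check against the directly computed autocorrelation on random test data.
rng = np.random.default_rng(0)
x_test = rng.standard_normal(64)
q = squared_spectrum_sequence(x_test)
r = np.correlate(x_test, x_test, mode="full")[len(x_test) - 1:]  # lags 0..N-1
```

Here `q[:64]` and `r` agree to within rounding error, so q_x[n] can be studied with the usual tools for autocorrelation sequences.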
