Source Coding and Decoding

Carl Nassar, in Telecommunications Demystified, 2001

The Sampling

Ideal sampling is simple, and it's shown in Figure 4.1. Here, an analog input x(t), say Carl's speech, enters the sampler. The sampler does one thing: it multiplies Carl's speech by the signal

Figure 4.1. Ideal sampling

(4.1) $p(t) = \sum_{k=-\infty}^{\infty} \delta(t - kT_s)$

called an impulse train, shown in Figure 4.1. The output of the sampler is then

(4.2) $x_s(t) = x(t)\,p(t) = x(t) \sum_{k=-\infty}^{\infty} \delta(t - kT_s)$

The multiplication of x(t) by the impulse train p(t) leads to the output shown on the output side of Figure 4.1. Here, we see that the output is made up of impulses at times $kT_s$ of height $x(kT_s)$; that is, mathematically,

(4.3) $x_s(t) = \sum_{k=-\infty}^{\infty} x(kT_s)\,\delta(t - kT_s)$

In essence, this is everything there is to know about ideal sampling. Well, almost everything.
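In practice, ideal sampling amounts to reading off the values $x(kT_s)$. A minimal Python sketch, using an illustrative 50 Hz cosine and a 1 kHz sampling rate (both invented for the example, not values from the text):

```python
import math

# Ideal sampling: multiplying x(t) by the impulse train p(t) leaves impulses
# of height x(k*Ts) at times k*Ts, so we simply evaluate x at those instants.

def x(t):
    """Example analog signal: a 50 Hz cosine standing in for speech."""
    return math.cos(2 * math.pi * 50 * t)

Ts = 1.0 / 1000.0  # sampling period: 1 ms (1 kHz sampling rate)

# Represent the sampled signal x_s(t) by the list of (k*Ts, x(k*Ts)) pairs.
samples = [(k * Ts, x(k * Ts)) for k in range(20)]
```

Each entry is one impulse: its position $kT_s$ and its height $x(kT_s)$, exactly as in Eq. (4.3).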

URL: https://www.sciencedirect.com/science/article/pii/B978008051867150010X

Patterns of Dynamic Activity and Timing in Neural Network Processing

Judith E. Dayhoff, ... Daw-Tung Lin, in Neural Networks and Pattern Recognition, 1998

7 Timing of Action Potentials in Impulse Trains

An entirely new realm of possibilities arises when impulses are used to communicate signals between neurons. This construct occurs in biology, where neurons produce action potentials (APs) that travel along axons. These APs are fast waves of depolarization that travel at speeds exceeding other biological mechanisms for communication. Impulse trains are trains of action potentials spaced over time, with varying time intervals between them. The brain thus includes a massively parallel impulse train generator and processor. Simultaneously generated impulse trains can have patterns that are a function of the activity of ensembles of neurons. Patterns and synchronies in these impulse trains furnish important putative codes for information transmission and processing in the brain. Models can incorporate spiking neurons, temporal patterns, or coincidences in the impulse trains, and sometimes attractor states [LAM+96].

Usually, artificial neural models use activation level parameters, which are continuous real-valued numbers that are communicated from one processing unit to another. A naive assumption is that the activation level in neural network models reflects firing rates in biological neural systems. While firing rates appear to play an information encoding role in some biological subsystems, it seems likely that a more complex processing scheme is enabled by the action potentials of neurons, based on a set of computational schemes that goes beyond simple firing rate encoding.

Simultaneously recorded nerve impulse trains appear as in Figure 22. Typically, the waveform is the same on each impulse recorded from the same neuron and, as a result, is not expected to carry information. Thus, the placement of impulses in time must represent, process, and carry the information.

Figure 22. Simultaneously recorded nerve impulse trains (simulated data).

Temporal patterns have been examined in nerve impulse trains. Favored patterns are firing patterns that repeat in exact or approximate form over an extended period of time (Figure 23). Their occurrences may be placed arbitrarily in time, or they may be periodic, occurring at equal intervals. Methods have been developed for identification of recurring temporal patterns that are statistically significant [DG83a], [DG83b]. These methods overcome the problem that some number of coincidental recurrences are expected at random. The methods realistically identify neural recordings that contain recurring patterns unusually often, according to statistical tests. Favored patterns have been found in single unit recordings and in multiple unit recordings [DG83a], [DG83b], [AG88], [FFH90]. This research has shown the presence of favored temporal patterns in neural recordings that include a variety of preparations (crayfish claw, cat visual cortex, cat brainstem). These intriguing results contribute to the accumulating studies and analysis of nerve impulse timing [NZJE96], [Les96], [SZTM96], [JSB97], [RWSB96], [MZ093], [Hop95], [SZ95], [TGK94], [Day87].

Figure 23. Particular firing patterns re-occur in the nerve impulse trains above, with some variation in interspike interval on each occurrence. The third line shows the pattern at the top occurring with an extra spike. Data were simulated. © 1987 IEEE.

Reprinted with permission from [Day87].

Temporal patterns are consistent with models that include dynamic attractors, as oscillating attractors can produce repeating temporal patterns among one or more neurons. A temporal pattern could be elicited in exact or approximate form each time that a section of an oscillating attractor is revisited.

In multiple unit recordings, it is cogent to evaluate data for the presence of temporal synchronies (and other patterns) among groups of two or more units. A synchrony would occur when a group of neurons each fire an impulse at approximately the same time. The study of multiunit synchronies is highly motivated for the following reasons. Neurons are natural recognizers of synchrony arriving at presynaptic sites, as synchronous stimuli sum more effectively when postsynaptic potential peaks coincide. Synchronous groups can stimulate postsynaptic activity faster than individual neurons. Synchronies play a role in LTP learning, and synchronous groups are consistent with models of neural processing. In addition, synchronous groups can multiplex firing rate codes. Methods for identification of synchronies have been developed and synchronies have been observed in biologically recorded systems, and evidence of ensemble coding has been found [Day95], [GPD85], [LHM+92], [GKES89], [RCF96], [CDSS97], [GSM96].
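As a rough illustration of coincidence detection (a far simpler scheme than the statistical methods cited above), the following Python sketch flags times at which every simulated unit fires within a small window; the spike times and window width are invented for the example:

```python
# Crude synchrony detector: report times at which all trains have a spike
# within `window` of a spike in the first train.

def synchronies(trains, window):
    """Return spike times of trains[0] matched (within `window`) by every
    other train -- a minimal coincidence detector."""
    hits = []
    for t in trains[0]:
        if all(any(abs(t - s) <= window for s in train) for train in trains[1:]):
            hits.append(t)
    return hits

# Three simulated impulse trains (spike times in ms); the units fire together
# near t = 10 and t = 50, but only unit 1 fires near t = 30.
unit1 = [10.0, 30.0, 50.0]
unit2 = [10.4, 49.8, 71.0]
unit3 = [9.7, 50.3, 64.2]

events = synchronies([unit1, unit2, unit3], window=1.0)
```

A real analysis must additionally test whether the observed coincidence count exceeds what chance would produce, which is the problem the cited methods address.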

In a synchrony code representation, an ensemble of near-coincidental firing would represent information or its processing during cognitive tasks. The event of synchronous firing, however, would last only an instant unless repeated. Repetitions could occur at regular periods or irregularly over time. Clearly, the brain has a mechanism to sustain a representation over an arbitrary period of time because we can imagine an image or consider an idea for any chosen length of time. Thus the proposed synchrony code could allow for sustained representations by repetitions of the synchronous firing. Repetitions could in turn be caused by oscillations, or attractors, in the network dynamics. Thus, synchronies are consistent with models of dynamic attractors that oscillate to produce repeated synchronous events. Some models of networks of spiking neurons have shown synchronies, temporal patterns, or oscillations and attractors [PCD96], [MR96], [TH96], [Kel95].

URL: https://www.sciencedirect.com/science/article/pii/B9780125264204500057

Single-Channel Feedforward Control

S.J. ELLIOTT, in Signal Processing for Active Control, 2001

3.2.1 Waveform Synthesis

In the time domain, the control signal can be generated by passing a periodic reference signal through an FIR digital controller whose impulse response is as long as the fundamental period of the disturbance. Adjusting the coefficients of this filter can then generate any periodic waveform within the sampling constraints of the digital system. A particularly efficient implementation of a periodic controller is achieved if the sampling rate can be arranged to be an exact integer multiple of the period of the disturbance. For machines running at a constant speed, this could be achieved in practice by using a phase-locked loop to determine the sampling rate, which could be synchronised to a tachometer signal at the rotation rate. The FIR control filter now only has to have as many coefficients as the period of the disturbance divided by the sampling period. Furthermore, if the reference signal is assumed to be a periodic impulse train at the fundamental frequency of the disturbance, the implementation of the controller also becomes extremely efficient (Elliott and Darlington, 1985).

Referring to the block diagram of such a control system shown in Fig. 3.4, the sampled output of the control filter, u(n), is given by the reference signal x(n), filtered by an I-th order FIR controller, so that in general

(3.2.1) $u(n) = \sum_{i=0}^{I-1} w_i\, x(n-i)$.

In the particular case in which the reference signal is a periodic impulse train, then

(3.2.2) $x(n) = \sum_{k=-\infty}^{\infty} \delta(n - kN)$,

where the signal has N samples per period and δ(n) is the Kronecker impulse function, which is equal to 1 if n = 0 and is otherwise equal to zero. If the control filter also has N coefficients, its output can be written as

(3.2.3) $u(n) = \sum_{k=-\infty}^{\infty} \sum_{i=0}^{N-1} w_i\, \delta(n - i - kN) = w_p$,

where p denotes the smallest non-negative value of (n − kN) for any k, and may be interpreted as the 'phase' of u(n) relative to x(n). The output signal is thus a periodic reproduction of the impulse response of the control filter, as illustrated in Fig. 3.4, and the waveform stored in the control filter is effectively 'played out' every cycle after triggering by the impulsive reference signal. The implementation of the control filter thus requires no computation except the sequential retrieval of the N coefficients $w_i$. This form of controller was originally investigated by Chaplin (1983), who described it as using waveform synthesis. Chaplin originally suggested a 'trial and error' or power sensing approach to the adaptation of the coefficients of the control filter, in which the mean-square value of the error signal is monitored and the coefficients are individually adjusted in turn until this is minimised. Later, more sophisticated algorithms were suggested for adapting all of the filter coefficients simultaneously using the individual samples of the error sequence, which were described as waveform sensing algorithms (Smith and Chaplin, 1983). Some of these algorithms can be shown to be variants of the LMS algorithm, which takes on a particularly simple form when the reference signal is a periodic impulse train (Elliott and Darlington, 1985).
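The 'played out every cycle' behaviour of Eq. (3.2.3) can be sketched in a few lines of Python; the coefficient values and period are illustrative, not from the text:

```python
# With a periodic impulse-train reference of period N, the FIR controller
# output is u(n) = w_p with p = n mod N: the N stored coefficients are
# simply retrieved in sequence, one per sample, with no multiplications.

N = 4                        # samples per period of the disturbance
w = [0.5, -1.0, 0.25, 0.0]   # the N stored control-filter coefficients

def u(n):
    """Controller output: sequential retrieval of w, repeating every N samples."""
    return w[n % N]

output = [u(n) for n in range(8)]  # two periods of the stored waveform
```

This makes concrete why the controller needs no arithmetic at run time: adaptation changes the stored table `w`, and playback is a table lookup.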

URL: https://www.sciencedirect.com/science/article/pii/B9780122370854500053

Thermal properties of nanoporous silicon materials

N. Koshida, in Porous Silicon for Biomedical Applications (Second Edition), 2021

2.3.3.1 Digital speaker

Utilizing the characteristic behavior of the PSi emitter, the development of an audible speaker compatible with a fully digital drive has been pursued. The device parameters (p and d of the PSi layer, surface heater material) were fixed such that they met the requirements for the frequency band (audible to 100 kHz) and for a planar process flow on 4-in wafers. To confirm suitability for digital driving, a pulse density modulation (PDM) mode was employed. Analog signals were quantized at a maximum sampling frequency of 2 MHz with an analog/digital converter and then introduced into the emitter as sequential impulse trains (0.3 μs width) with a constant pulse height. The acoustic output measurements and FFT analyses were done under both open-space (far-field) and closed-space (near-field) conditions in an anechoic box.

It has been demonstrated that under PDM operation the acoustic output corresponding to the digitized input traces the original analog signal well in the frequency range of 300 Hz–100 kHz (Koshida et al., 2013). An analog input signal of 20 kHz, the PDM-digitized impulse train, and the corresponding acoustic output signal are shown in Fig. 2.6A–C, respectively. One can see that the digital output reproduces the original analog signal well, with the same frequency as the input. The digital output is proportional to the pulse density regardless of the original signal waveform, since the measured acoustic pressure is regarded as a superposition of the individual pulse responses. The frequency spectrum of the reproduced acoustic output signal verifies a sufficiently low distortion.
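As a generic illustration of pulse density modulation (not the specific drive electronics described in the text), a first-order sigma-delta style loop in Python shows how a train of constant-height pulses can track an analog waveform through its local pulse density:

```python
import math

# 1-bit PDM encoding: a running error accumulator decides, sample by sample,
# whether to emit a pulse, so the local pulse density follows the signal.

def pdm(samples):
    """Encode samples in [0, 1] as a 0/1 pulse train (first-order loop)."""
    out, err = [], 0.0
    for x in samples:
        bit = 1 if x >= err else 0
        out.append(bit)
        err += bit - x  # accumulate quantisation error
    return out

# Encode one cycle of a slow sine shifted into [0, 1]; denser pulse regions
# correspond to higher signal amplitude.
n = 64
signal = [0.5 + 0.5 * math.sin(2 * math.pi * k / n) for k in range(n)]
bits = pdm(signal)
density = sum(bits) / n  # overall density tracks the signal mean (about 0.5)
```

Low-pass filtering such a pulse train (here, acoustically, by the emitter and the air) recovers the analog waveform, which is the principle behind the digital drive described above.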

Fig. 2.6

Fig. 2.6. Wave forms of an original analog input signal (20   kHz) (A), PDM digitized pulse (300   ns in width) trains (B), and the corresponding acoustic output signal (C). The original signal is reproduced well by PDM drive.

In open space, the acoustic output amplitude in the audible band increases in proportion to frequency, while it tends to saturate in the ultrasonic region. This behavior, shared with the analog drive case, is consistent with theoretical expectations (Hu et al., 2010; Hu, Wang, & Wang, 2012; Hu, Wang, Wu, & Wang, 2012; Vesterinen et al., 2010). In contrast, in closed space (2 cm³ in volume), a characteristic frequency response was observed: the acoustic output was inversely proportional to frequency. At 300 Hz, the sound pressure level reached 100 dB. Low distortion values were maintained even at low frequencies. The observed inverse proportionality can be explained by the fact that the acoustic output generated in the proximity of the device surface is expressed as a dynamic displacement of air. Being a compact, low-distortion, broad-band emitter, the thermo-acoustic digital speaker is potentially useful for hearing aids, directional emitters, and functional tweeters.

URL: https://www.sciencedirect.com/science/article/pii/B9780128216774000124

Speech Coding Standards

ANDREAS S. SPANIAS, in Multimedia Communications, 2001

3.3.1.4 Mixed Excitation Linear Prediction

In 1996 the U.S. government standardized a new 2.4 Kbit/s algorithm called Mixed Excitation LP (MELP) [7, 37]. The development of mixed excitation models in LPC was motivated largely by voicing errors in LPC-10 and also by the inadequacy of the two-state excitation model in cases of voicing transitions (mixed voiced-unvoiced frames). This problem can be solved using a mixed excitation model in which the impulse train (buzz) excites the low-frequency region of the LP synthesis filter and the noise excites the high-frequency region of the synthesis filter (Figure 3.6). The excitation shaping is done using first-order FIR filters H1(z) and H2(z) with time-varying parameters. The mixed source model also uses (selectively) pulse position jitter for the synthesis of weakly periodic or aperiodic voiced speech. An adaptive pole-zero spectral enhancer is used to boost the formant frequencies. Finally, a dispersion filter is used after the LPC synthesis filter to improve the matching of natural and synthetic speech away from the formants. The 2.4 Kbit/s MELP is based on a 22.5 ms frame, and the algorithmic delay was estimated to be 122.5 ms. An integer pitch estimate is obtained open-loop by searching autocorrelation statistics, followed by a fractional pitch refinement process. The LP parameters are obtained using the Levinson-Durbin algorithm and vector quantized as LSPs. MELP has an estimated MOS of 3.2 and complexity estimated at 40 MIPS.

FIGURE 3.6. Mixed Excitation LPC.

URL: https://www.sciencedirect.com/science/article/pii/B9780122821608500049

Speech Synthesis Based on Linear Prediction

Bishnu S. Atal, in Encyclopedia of Physical Science and Technology (Third Edition), 2003

IV.A Speech Synthesis at Low Bit Rates

We are going to consider an important application of linear prediction—speech synthesis on computers. The synthesis procedure follows directly from Eq. (5), which can be rewritten as

(21) $s_n = \varepsilon_n + \sum_{k=1}^{p} a_k s_{n-k}$.

We find that the predictor coefficients can be quantized and represented accurately by a small number of bits, typically 50 bits for 10 coefficients. This implies an information rate of about 2500 bits/s for the predictor coefficients. To synthesize speech, we have to represent the prediction error ε(n) at a low enough data rate, too. In the prediction error waveforms of Fig. 12, the prediction error looks like an impulse train. Thus for voiced speech, the prediction error ε(n) could be replaced by an impulse train of adjustable amplitude and period. For unvoiced speech, it is usually sufficient to replace the prediction error by white random noise of adjustable amplitude. A speech synthesizer based on these ideas is shown in Fig. 19. The excitation control data consist of 1 bit for the voiced/unvoiced decision and 6 bits for the pitch period. The amplitude of the speech output is controlled by the amplifier G, specified by 5 bits. The predictor coefficients require a total of 50 bits for 10 coefficients. Thus 62 bits per frame are required to characterize the speech. If these parameters are specified once every 20 ms, the result is a data rate of about 3000 bits/s, a considerable saving over the usual 64,000 bits/s for PCM.
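The frame budget quoted above can be checked with simple arithmetic:

```python
# Frame bit budget of the low bit rate LP synthesizer described in the text.
bits_voicing = 1     # voiced/unvoiced decision
bits_pitch = 6       # pitch period
bits_gain = 5        # amplifier gain G
bits_predictor = 50  # 10 predictor coefficients

frame_bits = bits_voicing + bits_pitch + bits_gain + bits_predictor  # 62

frame_period_s = 0.020              # parameters sent once every 20 ms
rate = frame_bits / frame_period_s  # 3100 bits/s, i.e. about 3000 bits/s
```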

FIGURE 19. Low bit rate linear predictive speech synthesizer. The excitation is created by unit impulse train at pitch periods for voiced speech and white random noise for unvoiced speech. The speech energy is controlled by the gain factor G of the amplifier.

The above synthesis procedure, often used in linear predictive vocoders, can produce speech of fairly good quality. The synthetic speech is usually intelligible but does not sound as natural as speech spoken by a human talker. The reasons for the lack of naturalness are not exactly understood, but they are primarily due to the replacement of the prediction error ε(n) by a composite of pulses and white noise as shown in Fig. 19. Natural-sounding speech of very high quality can be produced by using better representations of the prediction error. Three different synthesis models using linear predictive filters for speech synthesis are illustrated in Fig. 20.

FIGURE 20. Three different speech synthesis models making use of linear predictive filters: (a) vocoder model using separate excitations for voiced speech and unvoiced speech, (b) multipulse model using a sequence of pulses, and (c) stochastic model using random noise. Both multipulse and stochastic models eliminate the need for voiced/unvoiced decision and pitch analysis.

The first synthesis model is the vocoder model discussed earlier. This model requires that speech be classified into two well-defined classes: voiced and unvoiced, and the pitch period for voiced speech be exactly known. As a practical matter, it is sometimes difficult to determine if speech is voiced or unvoiced. Also, since the speech signal is only approximately periodic it is very hard in some instances to locate the exact pitch period. These problems are completely avoided in the multipulse synthesis model shown in Fig. 20. The multipulse model makes no distinction between voiced or unvoiced speech nor does it require an exact knowledge of the pitch period. The prediction error is replaced by a sequence of pulses whose exact locations are determined by a matching procedure that minimizes the difference between the original and synthetic speech waveforms. Only a few pulses are needed to produce speech of very high quality. The small number of pulses (typically 4 pulses every 5   ms) translates into a small bit rate for representing the prediction error. However, the high speech quality of a multipulse synthesizer is achieved by a significant increase in the bit rate in comparison to a vocoder. A typical bit rate for the multipulse synthesizer is 8000 bits/s.

The stochastic synthesis model shown in Fig. 20 can reduce the bit rate for high-quality speech synthesis even further. In this model, the prediction error (input to the long-delay filter) looks very much like random stochastic noise; hence the name stochastic model. The best random approximation to the prediction error is selected from a prestored codebook of random sequences to minimize the difference between the original and synthetic speech signals, as we did in the multipulse model. Both multipulse and stochastic speech synthesis models can produce natural sounding speech of very high quality but the methods they employ for determining the best multipulse or random sequences are fairly complex and require a sophisticated computer to perform the necessary computations.

URL: https://www.sciencedirect.com/science/article/pii/B0122274105007201

Methods, Models, and Algorithms for Modern Speech Processing

John R. Deller Jr., John Hansen, in The Electrical Engineering Handbook, 2005

3.2.2 Discrete-Time Model of Speech Production

Efforts to model speech production have focused principally on the "source–filter" view of the physiologic system in which, over short stationary intervals of an utterance, speech is modeled as a linear, time-invariant, acoustic filtering operation on an appropriate excitation (the "source"). The process of producing the source input is assumed to be decoupled from the dynamics of the acoustic filter. The prominent discrete-time (DT) source-filter model of speech, known variously as the "all-pole" or "linear prediction" model, is essentially a DT version of classic analog acoustic models. The linear-prediction model underlies a majority of the modern speech products and services.

The history of the developments leading up to modern DT models for speech production is interesting and informative, and it has great value in illuminating the present models (Flanagan, 1972; Klatt, 1987; Morgan and Gold, 1999; Shafer and Markel, 1979; Dudley, 1939). Circuit models developed early in the 20th century, consisting mainly of band-pass filters (resonators) to simulate formant spectra, represent the first attempts of the modern era to model speech production. The primary goal of this work was speech synthesis, and there was no assertion that the models bore any internal similarity to the physiologic system. Such models are often called terminal–analog models because they are analogous to the real system only at the terminus. Work on acoustic tube models in the 1960s (Fant, 1960) represents an attempt to model the internal physics of the speech system. Tube models and their differential equations of wave propagation proved to be remarkably useful tools in understanding the speech production system. Although the prevailing DT "all-pole" model of speech production was not derived directly from the acoustic tube theory, important insights and interpretations of the DT model have been obtained from classical theory (Deller, 2000; Rabiner and Schafer, 1978).

A general linear, time-invariant DT model for speech production is shown in the block-diagram form in Figure 3.4. The model is a terminal analog, and it is understood to represent only short intervals of speech over which signal dynamics are assumed to be stationary. Frequent updating or adaptation of model parameters is necessary to properly model the quickly time-varying nature of speech. With this caveat, such a system produces reasonable quality speech for coding and synthesis purposes and serves as a useful analysis model for recognition and related tasks, in spite of the fact that there are no provisions for coupling or nonlinear effects among subsystems in the model.

FIGURE 3.4. A General Discrete-Time Model for Speech Production

In the general DT model, a vocal-tract system T(z) is cascaded with a radiation model R(z). [We henceforth include any nasal-tract dynamics in the vocal-tract model T(z).] This system is excited by an uncorrelated excitation signal, say {en}, or by a filtered version of {en}. During unvoiced speech activity, the excitation {en} is a flat-spectrum noise source that excites the vocal-tract model directly. During periods of voiced speech activity, the excitation is a periodic DT impulse train whose period corresponds to the desired fundamental frequency. A more realistic voicing signal is created by using the pulse train as input to a glottal pulse shaping filter, G(z). In principle, we would like G(z) to produce a low-pass pulse waveform similar to that shown in Figure 3.5. The filter G(z) is further discussed in upcoming paragraphs.

FIGURE 3.5. Idealized Low-Pass Pulse Train. This figure shows three cycles of the glottal volume-velocity waveform during voiced speech.

Assuming that the vocal-tract system has a spectrum that is well modeled by pure resonances, its model is taken to be all pole:

(3.1) $T(z) = \dfrac{T_0}{1 - \sum_{k=1}^{N} b_k z^{-k}} = \dfrac{T_0}{\prod_{k=1}^{N} \left(1 - \rho_k z^{-1}\right)}$,

where $T_0$ represents an overall gain term and $\rho_k$ the (generally complex) pole locations for the model. Based on our knowledge of phonetics, we might suspect that the model is inadequate for certain classes of phonemes. We have noted that some phonemes (e.g., the nasals) exhibit spectral nulls, strongly suggesting the need for zeros in the model. We will return to the issue of zeros in the model below. A strong impetus to justify an all-pole model is the availability of powerful analytical methods that follow from its use.

Each pair of poles in the z-plane at complex conjugate locations $(\rho_i, \rho_i^*)$ roughly corresponds to a formant in the spectrum of T(z). Since T(z) should be a stable system, all poles are inside the unit circle in the z-plane. If the poles in the z-plane are well separated, estimates for formant frequencies and bandwidths can be obtained from the pole locations.
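For a well-separated pole, the standard relations $F = (f_s / 2\pi)\arg\rho$ and $BW = -(f_s/\pi)\ln|\rho|$ give the formant estimates. A small Python sketch with an illustrative pole and sampling rate (both assumed, not values from the text):

```python
import cmath
import math

# Formant estimates from a single z-plane pole:
#   F  = (fs / 2*pi) * arg(rho)   -- centre frequency
#   BW = -(fs / pi) * ln|rho|     -- 3 dB bandwidth

fs = 8000.0  # sampling rate in Hz (assumed for the example)
rho = 0.95 * cmath.exp(1j * 2 * math.pi * 500 / fs)  # pole near 500 Hz

F = fs * cmath.phase(rho) / (2 * math.pi)  # formant frequency estimate (Hz)
BW = -fs * math.log(abs(rho)) / math.pi    # formant bandwidth estimate (Hz)
```

A pole radius closer to the unit circle (here 0.95) gives a sharper resonance, i.e., a narrower bandwidth; the conjugate pole $\rho^*$ contributes the mirror-image resonance at negative frequency.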

Now let us further characterize the glottal filter G(z). In keeping with the desire to have an all-pole speech model, it is sometimes suggested that the two-pole signal

(3.2) $g_n = \left[\alpha^n - \beta^n\right] u_n, \quad \beta < \alpha < 1, \quad \alpha \approx 1$,

in which $\{u_n\}$ is the unit step sequence, is an appropriate impulse response for the filter. An all-pole impulse-response model for $\{g_n\}$ (regardless of the number of poles used), however, is incapable of producing the realistic pulse shapes observed in many experiments, because an all-pole model is constrained to be of minimum phase (otherwise it would be unstable). More realistic pulse signals have been suggested in the literature. The Rosenberg pulse [18] is given by:

(3.3) $g_n = \begin{cases} \frac{1}{2}\left[1 - \cos(\pi n / P)\right], & 0 \le n \le P, \quad P = \text{time of pulse peak}, \\ \cos\left[\pi (n - P) / 2(K - P)\right], & P \le n \le K, \quad K = \text{final sample before complete closure}, \\ 0, & \text{otherwise}. \end{cases}$

Equation 3.3 is popular because it can flexibly represent many realistic pulse shapes by adjusting its parameters. The Rosenberg pulse, however, cannot be approximated well using a model with only poles, so we add another concern for the discussion of zeros below.
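Equation (3.3) translates directly into code. A Python sketch of the Rosenberg pulse; the values of P and K are illustrative choices:

```python
import math

# Rosenberg glottal pulse of Eq. (3.3): a raised-cosine opening phase up to
# the peak at n = P, a cosine closing phase until closure at n = K, and zero
# thereafter.

def rosenberg(n, P, K):
    """Rosenberg pulse sample g_n for peak time P and closure time K."""
    if 0 <= n <= P:
        return 0.5 * (1.0 - math.cos(math.pi * n / P))
    if P < n <= K:
        return math.cos(math.pi * (n - P) / (2.0 * (K - P)))
    return 0.0

P, K = 20, 30  # illustrative: peak at sample 20, closure at sample 30
pulse = [rosenberg(n, P, K) for n in range(40)]
```

The pulse rises smoothly from 0 to its peak of 1 at $n = P$, falls back to 0 at $n = K$, and stays at zero during the closed phase, matching the idealized low-pass waveform of Figure 3.5.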

The radiation component, R(z), is a low-impedance load that terminates the vocal tract and converts the volume velocity at the lips to a pressure wave in the far field. The radiation load has been observed to have a high-pass filtering effect that is well modeled by a simple differencer:

(3.4) $R(z) = Z_{\text{lips}}(z) = 1 - \zeta_0 z^{-1}, \quad \zeta_0 < 1, \quad \zeta_0 \approx 1$.

We must finally come to terms with the apparent contradiction between the need for zeros in the model and the desire to avoid them. Indeed, R(z) itself is composed of a single zero and no poles. Some of the early writings on this subject argued that if equation 3.2 is a good model for the glottal dynamics, then one of the poles of G(z) will approximately cancel the zero $\zeta_0$ in R(z). This does not, however, resolve the question of whether G(z) should contain zeros, or whether, in certain cases, T(z) should include zeros to appropriately model a phoneme. The answer to these questions depends largely on what aspect of speech we are trying to model. In most applications, correct spectral magnitude information is all that is required of the model. Generally speaking, this is because the human auditory system is "phase deaf" (Milner, 1970), so information gleaned from the speech is extracted from its magnitude spectrum. While a detailed discussion is beyond our current scope (Deller et al., 2000), the critical fact that justifies the all-pole model is that a magnitude (but not a phase) spectrum for any rational system function can be modeled to an arbitrary degree of accuracy with a sufficient number of stable poles. Therefore, the all-pole model can exactly preserve the magnitude spectral dynamics (the "information") in the speech but might not retain the phase characteristics. In fact, for the speech model to be stable and all pole, it must necessarily have a minimum phase spectrum, regardless of the true characteristics of the signal being encoded. Ideally, the all-pole model will have the correct magnitude spectrum but minimum phase characteristic with respect to the "true" model. If the objective is to code, store, resynthesize, and perform other such tasks on the magnitude spectral characteristics but not necessarily on the temporal dynamics, the all-pole model is perfectly adequate. One should not, however, anticipate that the all-pole model can preserve time-domain features of speech waveforms because such features depend explicitly on the phase spectrum.

Let us summarize these important results. Ignoring the technicalities of z-transform existence, we assume that the output (pressure wave) of the speech production system is the result of filtering the appropriate excitation by two (in the unvoiced case) or three (voiced) linear, separable filters. Ignoring the developments above momentarily, let us suppose that we know "exact" or "true" linear models of the various components. By this we mean that we (somehow) know models that will exactly produce the speech waveform under consideration. These models are only constrained to be linear and stable and are otherwise unrestricted. In the unvoiced case S(z) = E(z)T(z)R(z), where E(z) represents a partial realization of a white noise process. In the voiced case, S(z) = E(z)G(z)T(z)R(z), where E(z) represents a DT impulse train of period P, the pitch period of the utterance. Accordingly, the true overall system function is as follows:

(3.5) $H(z) = \dfrac{S(z)}{E(z)} = \begin{cases} T(z)R(z), & \text{unvoiced case}, \\ G(z)T(z)R(z), & \text{voiced case}. \end{cases}$

With enough painstaking experimental work, we could probably deduce reasonable "true" models for any stationary utterance of interest. In general, we would expect these models to require zeros as well as poles in their system functions. Yet, an all-pole model exists that will at least produce a model/speech waveform with the correct magnitude spectrum (Deller et al., 2000), and a waveform with correct spectral magnitude is frequently sufficient for coding, recognition, and synthesis.

We henceforth assume, therefore, that during a stationary frame of speech, the speech production system can be characterized by a z-domain system function of the form:

(3.6) $H(z) = \dfrac{H_0}{1 - \sum_{i=1}^{M} a_i z^{-i}} \quad \text{with } H_0 > 0$,

which is driven by an excitation sequence:

(3.7) $e_n = \begin{cases} \sum_{q=-\infty}^{\infty} \delta_{n-qP}, & \text{voiced case}, \\ \text{zero-mean, unit-variance, uncorrelated noise}, & \text{unvoiced case}. \end{cases}$

In equation 3.6, 0 < M < ∞. A block diagram for this system is shown in Figure 3.6.

FIGURE 3.6. Block Diagram for z-Domain System Function
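A minimal Python sketch of driving the model of Eqs. (3.6) and (3.7): a periodic impulse train (voiced) or white noise (unvoiced) is passed through the all-pole recursion $s_n = H_0 e_n + \sum_i a_i s_{n-i}$. The filter coefficients and pitch period are illustrative stand-ins, not values from the text:

```python
import random

# Excitation per Eq. (3.7): a unit impulse every `period` samples for voiced
# frames, or zero-mean unit-variance noise for unvoiced frames.
def excitation(n_samples, voiced, period=40, seed=0):
    rng = random.Random(seed)
    if voiced:
        return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# All-pole filter of Eq. (3.6) as a direct-form recursion:
#   s_n = H0 * e_n + sum_i a_i * s_{n-i}
def all_pole(e, a, H0=1.0):
    s = []
    for n, en in enumerate(e):
        sn = H0 * en
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                sn += ai * s[n - i]
        s.append(sn)
    return s

a = [1.2, -0.8]  # illustrative two-pole resonator (poles inside |z| = 1)
voiced = all_pole(excitation(160, voiced=True), a)
```

Each pitch-period impulse triggers one decaying ring of the resonator, so the output is a crude one-formant stand-in for a voiced frame; a real synthesizer would use an M-pole filter fitted to the frame.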

URL: https://www.sciencedirect.com/science/article/pii/B9780121709600500633

Polymer-optical fibres for data transmission

G. Stepniak, ... C.-A. Bunge, in Polymer Optical Fibres, 2017

8.2.9 Analog and digital systems interface

Physical signals, such as a human voice or an electrocardiography signal, are analog. An analog signal x(t) is continuous in time and can take any value from the set of real numbers $\mathbb{R}$. However, in digital communication systems these signals are represented by bits. Devices which perform conversion of signals between the analog and digital domains are called ADCs or DACs.

In ADCs, two processes take place: sampling and quantisation. Sampling transforms a continuous signal into a series of discrete values with the sampling period $T_s$. The inverse of the sampling period is called the sampling frequency, $f_s = 1/T_s$. Mathematically, sampling of a signal x(t) is equivalent to its multiplication with a delta impulse train (Fig. 8.8), i.e.,

Figure 8.8. Sampling in the time and frequency domains.

(8.38) $x_s(t) = x(t) \cdot \sum_{n=-\infty}^{\infty} \delta(t - nT_s)$.

Eq. (8.38) can be transformed to the frequency domain:

(8.39) $X_s(f) = X(f) * \dfrac{1}{T_s}\sum_{n=-\infty}^{\infty}\delta(f - nf_s) = \dfrac{1}{T_s}\sum_{n=-\infty}^{\infty} X(f - nf_s).$

From Eq. (8.39) it follows that the spectrum of a sampled signal is a periodic repetition of the spectrum X(f) of the continuous signal, with period equal to the sampling frequency f s. If the sampling frequency satisfies:

(8.40) f s > 2 W ,

where W is the bandwidth of x(t), the adjacent copies of the original spectrum do not overlap and the sampled signal contains the full information about the continuous signal. The condition in Eq. (8.40) is also known as the Nyquist sampling theorem, as it defines the minimum sampling frequency.
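The Nyquist condition of Eq. (8.40) can be checked numerically. A sketch, with illustrative frequencies and signal lengths: sample a pure tone of frequency f0 (so its bandwidth is W = f0) once above and once below 2W, and locate the dominant FFT bin of each sampled sequence.

```python
import numpy as np

def dominant_freq(x, fs):
    """Return the frequency (Hz) of the largest FFT bin of real signal x."""
    X = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), d=1 / fs)[np.argmax(X)]

f0 = 3.0                               # a 3 Hz tone, so W = 3 Hz

def sample_tone(fs, n=1000):
    t = np.arange(n) / fs
    return np.sin(2 * np.pi * f0 * t)

# fs > 2W: the spectral copies of Eq. (8.39) do not overlap,
# and the tone appears at its true frequency of 3 Hz.
assert abs(dominant_freq(sample_tone(10.0), 10.0) - 3.0) < 0.1

# fs < 2W: adjacent copies overlap (aliasing), and the tone
# appears at |f0 - fs| = 1 Hz instead.
assert abs(dominant_freq(sample_tone(4.0), 4.0) - 1.0) < 0.1
```

The aliased frequency 1 Hz is exactly where the shifted spectral copy X(f − f s) of Eq. (8.39) lands inside the band 0 to f s/2.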

To recover the analog signal from the sampled signal, it is sufficient to pass the sampled signal via a low-pass filter, which filters out all spectrum copies of Eq. (8.39) apart from the n =   0 component, which is the original spectrum of the continuous signal x(t).

Quantisation limits the amplitudes of the signal samples x s(nT s) to a finite set of values. In the case of a uniform quantiser, the amplitude range (−X m, X m) is divided into $2^N$ intervals of width Δ, where:

(8.41) $\Delta = \dfrac{2X_m}{2^N}$

and each sample is represented by the closest value from this set. This operation may be denoted by Q(x). A quantiser stores the number of the quantisation interval in an N-bit word. A dequantiser performs the opposite operation; it takes an N-bit word and converts it to a signal value from the finite set.

Unlike sampling, quantisation is a lossy process: the difference between the original sample and the quantised sample, called the quantisation error, is an additional source of noise in the system. The quantisation error is ε = x − Q(x). Obviously, the more quantisation levels for a given amplitude range, the smaller the quantisation noise; however, more bits are needed to transfer the information about the analog signal.

URL:

https://www.sciencedirect.com/science/article/pii/B9780081000397000087

Information Theory

Alon Orlitsky , in Encyclopedia of Physical Science and Technology (Third Edition), 2003

II.E Applications

Specific source-coding applications use specialized compression schemes that are adapted to the source characteristics and distortion measure relevant to the application. We mention two of the most common applications: voice and image coding.

II.E.1 Voice Compression

Voice compression aims to compress human speech while maintaining a natural sound and a high level of intelligibility. Typical applications include transmission and storage.

The first commercial voice coders used scalar quantization and a logarithmic fidelity criterion to approximate the speech waveform. Typical of these was the pulse-coded modulation (PCM) coder, which operated at 64 kilobits per second (kbps). Subsequent speech coders are based on speech-production models in which an excitation signal is passed through an all-pole filter representing the vocal tract. Typically the filter is determined using linear prediction (LP) techniques. The first such voice coders, called LPC vocoders, used simple excitation signals, representing unvoiced sounds by random noise and voiced sounds by impulse trains. With time, the excitation signals became more sophisticated. Code-excited linear predictive (CELP) coders, introduced in 1985, use a collection of excitation signals that, when passed through the filter, approximate the voice signal as measured by a perceptual fidelity criterion. Nowadays, variations of CELP coders operate at rates as low as 4 kbps while providing reasonable sound quality.

II.E.2 Image and Video Compression

Image and video compression methods can also be divided into lossless techniques where the reproduced image or video is identical to the original one, and lossy techniques, where it is not.

Lossless image compression schemes, such as the lossless mode of the Joint Photographic Experts Group (JPEG) standard, are based on pixel prediction. An image is typically scanned and encoded in raster scan order. The next pixel to be encoded is predicted based on previously encoded pixels. The difference between the actual value of the pixel and the predicted value is then encoded using an entropy coder such as Huffman or arithmetic coding.
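The prediction step can be sketched with the simplest possible predictor, the previous pixel along a scanline (lossless JPEG defines several such predictors; this is only an illustration, and the function names and pixel values are assumptions). The residuals are small for smooth image data, which is what lets the entropy coder assign them short codewords, and the decoder inverts the prediction exactly.

```python
import numpy as np

def predict_residuals(row):
    """Previous-pixel predictor along a raster row: residual
    r[i] = row[i] - row[i-1], with r[0] = row[0]."""
    return np.diff(row, prepend=0)

def reconstruct(residuals):
    """The decoder inverts the prediction exactly -- lossless."""
    return np.cumsum(residuals)

row = np.array([100, 102, 103, 103, 101, 98])   # a smooth scanline
r = predict_residuals(row)                       # [100, 2, 1, 0, -2, -3]
assert np.array_equal(reconstruct(r), row)       # perfect reconstruction
```

Only the residuals (mostly small integers near zero) would be passed to the Huffman or arithmetic coder.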

Lossy image coding generally uses a transform on the raw pixel values. The transform decorrelates the values and concentrates their energy in a few coefficients, which can then be quantized and transmitted. Historically, the transform of choice was the discrete cosine transform (DCT), which is the basis of the JPEG international standard, but recent research has focused on the wavelet transform, which is used in the new JPEG2000 standard. Typically black-and-white images are compressed to about 0.3 bits per pixel and color images to about 1 bit per pixel while keeping high image quality.
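The energy-compaction property described above can be verified with a hand-rolled orthonormal DCT-II, the 1-D version of the transform used in JPEG (the 8-sample block and pixel values below are illustrative assumptions): for smooth data, nearly all of the energy lands in the first couple of coefficients.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (1-D version of the JPEG transform)."""
    k = np.arange(n)[:, None]          # frequency index
    i = np.arange(n)[None, :]          # sample index
    C = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2 / n)
    C[0, :] = np.sqrt(1 / n)           # DC row gets its own scale factor
    return C

n = 8
C = dct_matrix(n)
x = np.linspace(50, 120, n)            # a smooth 8-pixel block
X = C @ x                              # transform coefficients

# Fraction of total energy carried by the first k+1 coefficients:
energy = np.cumsum(X**2) / np.sum(X**2)
assert energy[1] > 0.99                # 2 of 8 coefficients carry >99%
```

Because the transform is orthonormal, it preserves total energy; quantizing the many near-zero high-frequency coefficients coarsely (or dropping them) is what yields the compression.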

Video coding schemes exploit the temporal redundancy as well as the interpixel correlation. Most video coders, such as those in the Moving Picture Experts Group (MPEG) standard, use motion compensation, in which the current frame is encoded by reference to previously encoded frames. The encoder divides the frame into blocks and looks for a good match in the reference frame for each block in the current frame. If an acceptable match is found, the encoder encodes the location of that reference block and the difference between it and the current block. If no match is found, the block is encoded using methods for coding still images. MPEG-4, for example, compresses images to about 0.25 bits per pixel while preserving a fairly high image quality.

URL:

https://www.sciencedirect.com/science/article/pii/B0122274105003379

Discrete-Time Systems

M. Sami Fadali , Antonio Visioli , in Digital Control Engineering, 2009

2.9 The Sampling Theorem

Sampling is necessary for the processing of analog data using digital elements. Successful digital data processing requires that the samples reflect the nature of the analog signal and that the analog signal be recoverable, at least in theory, from its samples. Figure 2.10 shows two distinct waveforms with identical samples; clearly, faster sampling of the two waveforms would produce distinguishable sequences. Sufficiently fast sampling is thus a prerequisite for successful digital data processing. The sampling theorem gives a lower bound on the sampling rate necessary for a given band-limited signal (i.e., a signal with a known finite bandwidth).

Figure 2.10. Two different waveforms with identical samples.

Theorem 2.4:

The Sampling Theorem.

The band-limited signal with

(2.60) $f(t) \overset{\mathcal{F}}{\longleftrightarrow} F(j\omega), \qquad F(j\omega) \neq 0, \ -\omega_m \leq \omega \leq \omega_m; \qquad F(j\omega) = 0 \text{ elsewhere,}$

with F denoting the Fourier transform, can be reconstructed from the discrete-time waveform

(2.61) $f^*(t) = \sum_{k=-\infty}^{\infty} f(t)\,\delta(t - kT)$

if and only if the sampling angular frequency ωs = 2π/T satisfies the condition

(2.62) ω s > 2 ω m

The spectrum of the continuous-time waveform can be recovered using an ideal low-pass filter of bandwidth ωb in the range

(2.63) ω m < ω b < ω s / 2

Proof.

Consider the unit impulse train

(2.64) $\delta_T(t) = \sum_{k=-\infty}^{\infty} \delta(t - kT)$

and its Fourier transform

(2.65) $\delta_T(\omega) = \dfrac{2\pi}{T} \sum_{n=-\infty}^{\infty} \delta(\omega - n\omega_s)$

Impulse sampling is achieved by multiplying the waveforms f(t) and δT (t). By the frequency convolution theorem, the spectrum of the product of the two waveforms is given by the convolution of their two spectra; that is,

$\mathcal{F}\{\delta_T(t) \times f(t)\} = \dfrac{1}{2\pi}\,\delta_T(j\omega) * F(j\omega) = \left[\dfrac{1}{T}\sum_{n=-\infty}^{\infty}\delta(\omega - n\omega_s)\right] * F(j\omega) = \dfrac{1}{T}\sum_{n=-\infty}^{\infty} F(j(\omega - n\omega_s))$

Therefore, the spectrum of the sampled waveform is a periodic function of frequency with period ωs . Assuming that f(t) is a real-valued function, it is well known that the magnitude |F(jω)| is an even function of frequency, whereas the phase ∠F(jω) is an odd function. For a band-limited function with bandwidth ωm , the amplitude and phase in the frequency range 0 to ωs /2 can be recovered by an ideal low-pass filter, as shown in Figure 2.11.▪

Figure 2.11. Sampling theorem.

2.9.1 Selection of the Sampling Frequency

In practice, finite bandwidth is an idealization associated with infinite-duration signals, whereas finite duration implies infinite bandwidth. To show this, assume that a given signal is to be band limited. Band limiting is equivalent to multiplication by a pulse in the frequency domain. By the convolution theorem, multiplication in the frequency domain is equivalent to convolution of the inverse Fourier transforms. Hence, the inverse transform of the band-limited function is the convolution of the original time function with the sinc function, a function of infinite duration. We conclude that a band-limited function is of infinite duration.

A time-limited function is the product of a function of infinite duration and a pulse. The frequency convolution theorem states that multiplication in the time domain is equivalent to convolution of the Fourier transforms in the frequency domain. Thus, the spectrum of a time-limited function is the convolution of the spectrum of the function of infinite duration with a sinc function, a function of infinite bandwidth. Hence, the Fourier transform of a time-limited function has infinite bandwidth. Because all measurements are made over a finite time period, infinite bandwidths are unavoidable. Nevertheless, a given signal often has a finite "effective bandwidth" beyond which its spectral components are negligible. This allows us to treat physical signals as band limited and choose a suitable sampling rate for them based on the sampling theorem.

In practice, the sampling rate chosen is often larger than the lower bound specified in the sampling theorem. A rule of thumb is to choose ωs as

(2.66) $\omega_s = k\,\omega_m, \quad 5 \leq k \leq 10$

The choice of the constant k depends on the application. In many applications, the upper bound on the sampling frequency is well below the capabilities of state-of-the-art hardware. A closed-loop control system cannot have a sampling period below the minimum time required for the output measurement; that is, the sampling frequency is upper-bounded by the reciprocal of the sensor delay. 4 For example, oxygen sensors used in automotive air/fuel ratio control have a sensor delay of about 20 ms, which corresponds to a sampling frequency upper bound of 50 Hz. Another limitation is the computational time needed to update the control. This is becoming less restrictive with the availability of faster microprocessors but must be considered in sampling rate selection.

In digital control, the sampling frequency must be chosen so that samples provide a good representation of the analog physical variables. A more detailed discussion of the practical issues that must be considered when choosing the sampling frequency is given in Chapter 12. Here, we only discuss choosing the sampling period based on the sampling theorem.

For a linear system, the output of the system has a spectrum given by the product of the frequency response and input spectrum. Because the input is not known a priori, we must base our choice of sampling frequency on the frequency response.

The frequency response of a first-order system is

(2.67) $H(j\omega) = \dfrac{K}{j\omega/\omega_b + 1}$

where K is the DC gain and ωb is the system bandwidth. The frequency response amplitude drops below the DC level by a factor of about 10 at the frequency 7ωb . If we consider ωm = 7ωb , the sampling frequency is chosen as

(2.68) $\omega_s = k\,\omega_b, \quad 35 \leq k \leq 70$

For a second-order system with frequency response

(2.69) $H(j\omega) = \dfrac{K}{j\,2\zeta\omega/\omega_n + 1 - (\omega/\omega_n)^2}$

the bandwidth of the system is approximated by the damped natural frequency

(2.70) $\omega_d = \omega_n\sqrt{1 - \zeta^2}$

Using a frequency of 7ωd as the maximum significant frequency, we choose the sampling frequency as

(2.71) $\omega_s = k\,\omega_d, \quad 35 \leq k \leq 70$

In addition, the impulse response of a second-order system is of the form

(2.72) $y(t) = A e^{-\zeta\omega_n t}\sin(\omega_d t + \phi)$

where A is a constant amplitude, and ϕ is a phase angle. Thus, the choice of sampling frequency of (2.71) is sufficiently fast for oscillations of frequency ωd and time to first peak π/ωd .

Example 2.25

Given a first-order system of bandwidth 10 rad/s, select a suitable sampling frequency and find the corresponding sampling period.

Solution

A suitable choice of sampling frequency is ωs = 60ωb = 600 rad/s. The corresponding sampling period is approximately T = 2π/ωs ≅ 0.01 s.
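The arithmetic of Example 2.25 can be sketched as follows (k = 60 is one admissible choice inside the 35 ≤ k ≤ 70 range of Eq. (2.68); the variable names are illustrative):

```python
import math

# Example 2.25: first-order system with bandwidth wb = 10 rad/s.
wb = 10.0
k = 60                       # rule of thumb (2.68): 35 <= k <= 70
ws = k * wb                  # sampling frequency, 600 rad/s
T = 2 * math.pi / ws         # sampling period, approximately 0.01 s
```

Any k in the allowed range gives a period between roughly 2π/700 ≈ 0.009 s and 2π/350 ≈ 0.018 s.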

Example 2.26

A closed-loop control system must be designed for a steady-state error not to exceed 5 percent, a damping ratio of about 0.7, and an undamped natural frequency of 10 rad/s. Select a suitable sampling period for the system if the system has a sensor delay of

1.

0.02 s

2.

0.03 s

Solution

Let the sampling frequency be

$\omega_s \geq 35\,\omega_d = 35\,\omega_n\sqrt{1 - \zeta^2} = 350\sqrt{1 - 0.49} = 249.95 \ \text{rad/s}$

The corresponding sampling period is T = 2π/ωs ≤ 0.025 s.

1.

A suitable choice is T = 20 ms, because this is equal to the sensor delay and also satisfies the bound T ≤ 0.025 s.

2.

We are forced to choose T = 30 ms, equal to the sensor delay, even though this exceeds the 0.025 s bound obtained from the sampling theorem.
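The numbers behind Example 2.26 can be checked directly (variable names are illustrative): the sampling theorem bound gives T ≤ 0.025 s, which the 0.02 s sensor delay satisfies but the 0.03 s delay cannot.

```python
import math

# Example 2.26: damping ratio zeta = 0.7, undamped natural
# frequency wn = 10 rad/s.
zeta, wn = 0.7, 10.0
wd = wn * math.sqrt(1 - zeta**2)   # damped natural frequency, Eq. (2.70)
ws_min = 35 * wd                   # lower bound on ws from Eq. (2.71)
T_max = 2 * math.pi / ws_min       # upper bound on sampling period

# The sensor delay sets a lower bound on T:
# a 0.02 s delay fits under T_max; a 0.03 s delay does not.
assert 0.02 <= T_max
assert 0.03 > T_max
```

This confirms ωs ≥ 249.95 rad/s and T ≤ 0.025 s as computed in the solution.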

URL:

https://www.sciencedirect.com/science/article/pii/B9780123744982000023