Algorithms and Software for Predictive and Perceptual Modeling of Speech

Author: Venkatraman Atti

Publisher: Morgan & Claypool Publishers

Published: 2010-05-05

Total Pages: 124

ISBN-10: 160845388X

From the early pulse code modulation-based coders to recent multi-rate wideband speech coding standards, speech coding has made significant strides toward the goal of high-quality speech at the lowest possible bit rate. This book presents some of the recent advances in linear prediction (LP)-based speech analysis that employ perceptual models for narrow- and wide-band speech coding. The LP analysis-synthesis framework has been successful for speech coding because it fits the source-system paradigm for speech synthesis well. Limitations of conventional LP have been studied extensively, and several extensions to LP-based analysis-synthesis have been proposed, e.g., discrete all-pole modeling, perceptual LP, warped LP, LP with modified filter structures, IIR-based pure LP, all-pole modeling using the weighted sum of LSP polynomials, LP for low-frequency emphasis, and cascade-form LP. These extensions can be classified as algorithms that either attempt to improve the LP spectral envelope fit or embed perceptual models in the LP. The first half of the book reviews some of the recent developments in predictive modeling of speech with the help of MATLAB™ simulation examples. The advantages of integrating perceptual models in low-bit-rate speech coding depend on how accurately these models mimic human performance and, more importantly, on the achievable "coding gains" and the "computational overhead" associated with these physiological models. Methods that exploit the masking properties of the human ear in speech coding standards are, even today, largely based on concepts introduced by Schroeder and Atal in 1979. For example, a simple approach employed in speech coding standards is to use a perceptual weighting filter to shape the quantization noise according to the masking properties of the human ear.
The second half of the book reviews some of the recent developments in perceptual modeling of speech (e.g., masking threshold, psychoacoustic models, auditory excitation pattern, and loudness) with the help of MATLAB™ simulations. Supplementary material, including the MATLAB™ programs and simulation examples presented in the book, is also available.
Table of Contents: Introduction / Predictive Modeling of Speech / Perceptual Modeling of Speech
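
As a rough illustration of two ideas named in the description above (this is not code from the book, whose examples are in MATLAB), the Python sketch below computes LP coefficients with the autocorrelation method and the Levinson-Durbin recursion, then derives the numerator and denominator of a perceptual weighting filter of the commonly used form W(z) = A(z/g1) / A(z/g2). The function names and the values g1 = 0.9 and g2 = 0.6 are illustrative assumptions, not taken from the book.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r.

    Returns (a, err), where a[0] == 1 and
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    is the prediction-error filter, and err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def bandwidth_expand(a, gamma):
    """Coefficients of A(z / gamma): a[i] is scaled by gamma**i."""
    return [c * gamma ** i for i, c in enumerate(a)]


# Autocorrelation of an idealised first-order process (assumed test data).
r = [1.0, 0.5, 0.25]
a, err = levinson_durbin(r, order=2)

# Weighting filter W(z) = A(z/g1) / A(z/g2); g1 and g2 are assumed values.
g1, g2 = 0.9, 0.6
w_num = bandwidth_expand(a, g1)
w_den = bandwidth_expand(a, g2)
```

Filtering the coding error through W(z) de-emphasises the formant regions, so more quantization noise is tolerated where the speech spectrum itself is strong and can mask it.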

Algorithms and Software for Predictive and Perceptual Modeling of Speech

Author: Venkatraman Atti

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 113

ISBN-10: 3031015169

From the early pulse code modulation-based coders to recent multi-rate wideband speech coding standards, speech coding has made significant strides toward the goal of high-quality speech at the lowest possible bit rate. This book presents some of the recent advances in linear prediction (LP)-based speech analysis that employ perceptual models for narrow- and wide-band speech coding. The LP analysis-synthesis framework has been successful for speech coding because it fits the source-system paradigm for speech synthesis well. Limitations of conventional LP have been studied extensively, and several extensions to LP-based analysis-synthesis have been proposed, e.g., discrete all-pole modeling, perceptual LP, warped LP, LP with modified filter structures, IIR-based pure LP, all-pole modeling using the weighted sum of LSP polynomials, LP for low-frequency emphasis, and cascade-form LP. These extensions can be classified as algorithms that either attempt to improve the LP spectral envelope fit or embed perceptual models in the LP. The first half of the book reviews some of the recent developments in predictive modeling of speech with the help of MATLAB™ simulation examples. The advantages of integrating perceptual models in low-bit-rate speech coding depend on how accurately these models mimic human performance and, more importantly, on the achievable "coding gains" and the "computational overhead" associated with these physiological models. Methods that exploit the masking properties of the human ear in speech coding standards are, even today, largely based on concepts introduced by Schroeder and Atal in 1979. For example, a simple approach employed in speech coding standards is to use a perceptual weighting filter to shape the quantization noise according to the masking properties of the human ear.
The second half of the book reviews some of the recent developments in perceptual modeling of speech (e.g., masking threshold, psychoacoustic models, auditory excitation pattern, and loudness) with the help of MATLAB™ simulations. Supplementary material, including the MATLAB™ programs and simulation examples presented in the book, is also available.
Table of Contents: Introduction / Predictive Modeling of Speech / Perceptual Modeling of Speech

Bandwidth Extension of Speech Using Perceptual Criteria

Author: Visar Berisha

Publisher: Springer Nature

Published: 2022-06-01

Total Pages: 71

ISBN-10: 3031015215

Bandwidth extension of speech is used in the International Telecommunication Union G.729.1 standard in which the narrowband bitstream is combined with quantized high-band parameters. Although this system produces high-quality wideband speech, the additional bits used to represent the high band can be further reduced. In addition to the algorithm used in the G.729.1 standard, bandwidth extension methods based on spectrum prediction have also been proposed. Although these algorithms do not require additional bits, they perform poorly when the correlation between the low and the high band is weak. In this book, two wideband speech coding algorithms that rely on bandwidth extension are developed. The algorithms operate as wrappers around existing narrowband compression schemes. More specifically, in these algorithms, the low band is encoded using an existing toll-quality narrowband system, whereas the high band is generated using the proposed extension techniques. The first method relies only on transmitted high-band information to generate the wideband speech. The second algorithm uses a constrained minimum mean square error estimator that combines transmitted high-band envelope information with a predictive scheme driven by narrowband features. Both algorithms make use of novel perceptual models based on loudness that determine optimum quantization strategies for wideband recovery and synthesis. Objective and subjective evaluations reveal that the proposed system performs at a lower average bit rate while improving speech quality when compared to other similar algorithms.

Nonlinear Speech Modeling and Applications

Author: Gerard Chollet

Publisher: Springer

Published: 2005-07-12

Total Pages: 444

ISBN-10: 3540318860

This book presents the revised tutorial lectures given at the International Summer School on Nonlinear Speech Processing-Algorithms and Analysis, held in Vietri sul Mare, Salerno, Italy, in September 2004. The 14 revised tutorial lectures by leading international researchers are organized in topical sections on dealing with nonlinearities in speech signals, acoustic-to-articulatory modeling of speech phenomena, data-driven and speech processing algorithms, and algorithms and models based on speech perception mechanisms. Besides the tutorial lectures, 15 revised, reviewed papers are included, presenting original research results on task-oriented speech applications.

Engineer Your Software!

Author: Scott A. Whitmire

Publisher: Springer Nature

Published: 2022-06-01

Total Pages: 121

ISBN-10: 3031015304

Software development is hard, but creating good software is even harder, especially if your main job is something other than developing software. Engineer Your Software! opens the world of software engineering, weaving engineering techniques and measurement into software development activities. Focusing on architecture and design, Engineer Your Software! claims that no matter how you write software, design and engineering matter and can be applied at any point in the process. Engineer Your Software! provides advice, patterns, design criteria, measures, and techniques that will help you get it right the first time. Engineer Your Software! also provides solutions to many vexing issues that developers run into time and time again. Developed over 40 years of creating large software applications, these lessons are sprinkled with real-world examples from actual software projects. Along the way, the author describes common design principles and design patterns that can make life a lot easier for anyone tasked with writing anything from a simple script to the largest enterprise-scale systems.

Despeckle Filtering for Ultrasound Imaging and Video, Volume I

Author: Christos P. Loizou

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 154

ISBN-10: 3031015231

It is well known that speckle is a multiplicative noise that degrades image and video quality and hinders the visual expert's evaluation in ultrasound imaging and video. This creates a need for robust image and video despeckling techniques for both routine clinical practice and tele-consultation. The goal of this book (the first of two) is to introduce the problem of speckle in ultrasound image and video, together with the theoretical background (equations), the algorithmic steps, and the MATLAB™ code for the following groups of despeckle filters: linear filtering, nonlinear filtering, anisotropic diffusion filtering, and wavelet filtering. The book proposes a comparative evaluation framework for these despeckle filters based on texture analysis, image quality evaluation metrics, and visual evaluation by medical experts. Despeckle noise reduction through the application of these filters improves visual observation quality, and it may also be used as a pre-processing step for further automated analysis, such as image and video segmentation and texture characterization in ultrasound cardiovascular imaging, as well as for bandwidth reduction in ultrasound video transmission for telemedicine applications. These topics are covered in detail in the companion book. Furthermore, to facilitate further applications, we have developed two MATLAB™ toolboxes that integrate image (IDF) and video (VDF) despeckle filtering, texture analysis, and image and video quality evaluation metrics. The code for these toolboxes is open source and is available for download as a complement to the two books.
Table of Contents: Preface / Acknowledgments / List of Symbols / List of Abbreviations / Introduction to Speckle Noise in Ultrasound Imaging and Video / Basics of Evaluation Methodology / Linear Despeckle Filtering / Nonlinear Despeckle Filtering / Diffusion Despeckle Filtering / Wavelet Despeckle Filtering / Evaluation of Despeckle Filtering / Summary and Future Directions / References / Authors' Biographies
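
To give a sense of what a filter from the "linear filtering" group can look like, the Python sketch below implements a local-statistics filter in the spirit of the well-known Lee filter for multiplicative noise. It is not the book's MATLAB code or toolbox; the function name `lee_filter`, the parameter `noise_cv` (the assumed coefficient of variation of the speckle), and the list-of-lists image representation are all illustrative assumptions.

```python
def lee_filter(image, noise_cv, radius=1):
    """Local-statistics despeckle filter on a 2-D list of floats.

    Each output pixel is mean + gain * (pixel - mean), where the gain is
    1 - (noise_cv**2 * mean**2) / local_variance, clipped at 0.
    Flat regions are smoothed (gain near 0); strong edges are kept (gain near 1).
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Square window, truncated at the image borders.
            window = [
                image[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            if var > 0.0:
                gain = max(0.0, 1.0 - (noise_cv ** 2) * (mean ** 2) / var)
            else:
                gain = 0.0
            out[r][c] = mean + gain * (image[r][c] - mean)
    return out


# Tiny made-up example: a bright vertical edge in a 3x4 image.
sample = [
    [10.0, 10.0, 50.0, 50.0],
    [10.0, 12.0, 50.0, 48.0],
    [10.0, 10.0, 50.0, 50.0],
]
filtered = lee_filter(sample, noise_cv=0.1)
```

The window radius and the noise estimate trade smoothing against edge preservation; in practice the noise statistic would be estimated from a homogeneous image region rather than fixed by hand.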

Analysis of the MPEG-1 Layer III (MP3) Algorithm using MATLAB

Author: Andreas Spanias

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 115

ISBN-10: 3031015185

The MPEG-1 Layer III (MP3) algorithm is one of the most successful audio formats for consumer audio storage and for transfer and playback of music on digital audio players. The MP3 compression standard along with the AAC (Advanced Audio Coding) algorithm are associated with the most successful music players of the last decade. This book describes the fundamentals and the MATLAB implementation details of the MP3 algorithm. Several of the tedious processes in MP3 are supported by demonstrations using MATLAB software. The book presents the theoretical concepts and algorithms used in the MP3 standard. The implementation details and simulations with MATLAB complement the theoretical principles. The extensive list of references enables the reader to perform a more detailed study on specific aspects of the algorithm and gain exposure to advancements in perceptual coding. Table of Contents: Introduction / Analysis Subband Filter Bank / Psychoacoustic Model II / MDCT / Bit Allocation, Quantization and Coding / Decoder

Cognitive Fusion for Target Tracking

Author: Ioannis Kyriakides

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 57

ISBN-10: 3031015282

The adaptive configuration of nodes in a sensor network has the potential to improve sequential estimation performance by intelligently allocating limited sensor network resources. In addition, the use of heterogeneous sensing nodes provides a diversity of information that also enhances estimation performance. This work reviews cognitive systems and presents a cognitive fusion framework for sequential state estimation using adaptive configuration of heterogeneous sensing nodes and heterogeneous data fusion. This work also provides an application of cognitive fusion to the sequential estimation problem of target tracking using foveal and radar sensors.

Latency and Distortion of Electromagnetic Trackers for Augmented Reality Systems

Author: Henry Himberg

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 173

ISBN-10: 3031015223

Augmented reality (AR) systems are often used to superimpose virtual objects or information on a scene to improve situational awareness. Delays in the display system or inaccurate registration of objects destroy the sense of immersion a user experiences when using AR systems. AC electromagnetic trackers are ideal for these applications when combined with head orientation prediction to compensate for display system delays. Unfortunately, these trackers do not perform well in environments that contain conductive or ferrous materials due to magnetic field distortion without expensive calibration techniques. In our work we focus on both the prediction and distortion compensation aspects of this application, developing a "small footprint" predictive filter for display lag compensation and a simplified calibration system for AC magnetic trackers. In the first phase of our study we presented a novel method of tracking angular head velocity from quaternion orientation using an Extended Kalman Filter in both single model (DQEKF) and multiple model (MMDQ) implementations. In the second phase of our work we have developed a new method of mapping the magnetic field generated by the tracker without high precision measurement equipment. This method uses simple fixtures with multiple sensors in a rigid geometry to collect magnetic field data in the tracking volume. We have developed a new algorithm to process the collected data and generate a map of the magnetic field distortion that can be used to compensate distorted measurement data. Table of Contents: List of Tables / Preface / Acknowledgments / Delta Quaternion Extended Kalman Filter / Multiple Model Delta Quaternion Filter / Interpolation Volume Calibration / Conclusion / References / Authors' Biographies

Despeckle Filtering for Ultrasound Imaging and Video

Author: Christos P. Loizou

Publisher: Morgan & Claypool Publishers

Published: 2015-04-01

Total Pages: 182

ISBN-10: 1627057420

It is well known that speckle is a multiplicative noise that degrades image and video quality and the visual expert's evaluation in ultrasound imaging and video. This necessitates the need for robust despeckling image and video techniques for both routine