The chapter "Speech" introduces the basic properties of speech and language that automatic speech recognition (ASR) systems use to recover spoken sentences. It also points out properties of speech that can increase the word error rate of ASR systems.

 

1. Overview

For thousands of years, humans have used speech to communicate with one another. In the context of speech communication, one can differentiate between language and speech, even though the two words are often used synonymously. Language is a system for representing and communicating complex conceptual structures. Language does not need to be encoded into vocal signals; it can also be encoded into gestures.[1] A thorough introduction to the characteristics of language is given in the articles:

Speech refers to the auditory and vocal medium typically used by humans to convey language. Speech carries much more information than the linguistic message alone: it also includes information about the speaker, such as gender, age, or emotional state. A complete overview of speech and a description of the human physiology that encodes and decodes the speech signal are provided in the articles:

 

References

[1] W. Tecumseh Fitch, “The evolution of speech: a comparative review”, Trends in Cognitive Sciences, vol. 4, pp. 258-267, 2000.