Linguistics has a significant impact on speech recognition applications, especially on their accuracy. Understanding the basics of how speech originates is therefore crucial. The following gives an introduction to the importance of grammar in such applications, starting with a general definition and then proceeding to its influence on recognition. Finally, existing approaches will be summarized and reviewed.


1. The term grammar in linguistics

To understand grammar in a speech recognition context, it helps to first consider the general definition and structure of the term grammar in a linguistic context.

1.1 General definition

The term grammar (from Lat. grammatica) in linguistics denotes, first, the relations between words in a sentence or between sentences themselves and, second, the systematic exposition of these relations. The primary sense of grammar, however, is how words are connected to express a person’s complete statement.

The correctness of an expressed thought depends only on whether it is intelligible. Hence, grammar is closely tied to the usage and customs of a language. Consequently, an expression that is unfamiliar, and therefore cannot be understood by ordinary speakers, is ungrammatical.

The grammar of a language changes over time: for example, modern Italian grammar differs substantially from Latin grammar, and modern English grammar has changed as well. Furthermore, the grammars of different languages differ more or less depending on their geographical and cultural closeness [1].

1.2 Types of grammar

Linguistics distinguishes three types of grammar: descriptive, prescriptive and teaching grammars.

Descriptive grammars are models that describe the basic linguistic knowledge of people, that is, the speakers’ linguistic capacity. Knowledge may differ from speaker to speaker, but a linguist tries to identify the knowledge that all speakers share.

Prescriptive grammars, by contrast, are the products of language purists who actively try to impose rules of grammar. Because spoken language is dynamic and constantly changing, such prescriptivists are likely to fail in imposing rules and guidelines; the concern should shift to speakers’ thoughts instead of trying to regiment their language.

Lastly, a teaching grammar is used to learn a foreign language from the perspective of a speaker of another language. It therefore differs significantly from descriptive grammars, which attempt to describe the knowledge native speakers have of their own language. For example, sounds that do not exist in the learner’s native language are often paraphrased using familiar sounds [2].

2. The term grammar in speech recognition

After this general introduction to grammar, the focus now shifts to the importance of grammar in the speech recognition context.

2.1 Contextual definition

In the speech recognition context, the term grammar is often used in a completely different sense from the one described in the previous section. Instead of describing how words are put together and how sentences are structured, it denotes the set of rules that define the words and phrases the application is able to recognize. Multiple grammars can be defined, each basically consisting of a list of words or phrases [3].
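
The list-of-phrases view can be illustrated with a minimal Python sketch. The grammar names (`yes_no`, `cities`) and their entries are assumptions invented for illustration; a real recognizer matches audio hypotheses rather than text strings.

```python
# Hypothetical grammars: each one is simply a set of phrases the
# application is prepared to recognize (names and entries are illustrative).
GRAMMARS = {
    "yes_no": {"yes", "no"},
    "cities": {"boston", "san francisco", "madrid"},
}


def is_in_grammar(utterance: str, grammar_name: str) -> bool:
    """Return True if the normalized utterance is listed in the named grammar."""
    # Lowercase and collapse whitespace so that "San  Francisco" still matches.
    normalized = " ".join(utterance.lower().split())
    return normalized in GRAMMARS[grammar_name]
```

Anything that is not listed in the active grammar, such as "Berlin" for the `cities` grammar, is simply not recognizable by the application.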

In 2004, the World Wide Web Consortium (W3C) published the “Speech Recognition Grammar Specification”, which “[…] defines syntax for representing grammars for use in speech recognition […]”. Its purpose is to standardize how rules are defined; for example, it allows the patterns in which words may occur to be specified. Two exemplary rules are structured as follows [4]:

<rule id="city">
   <item>"San Francisco"</item>
</rule>
<rule id="command">
   <ruleref uri="#action"/>
   <ruleref uri="#object"/>
</rule>
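
How rule references combine into recognizable phrases can be sketched in Python. The rule names mirror the example above, but the contents of `action` and `object` are assumptions made for illustration, and the dictionary layout is a simplification, not the SRGS data model.

```python
from itertools import product

# Each rule is a list of alternatives; each alternative is a sequence of
# parts. A part starting with "#" refers to another rule, mirroring
# <ruleref uri="#...">. The contents of "action" and "object" are assumed.
RULES = {
    "action": [["open"], ["close"]],
    "object": [["the window"], ["the door"]],
    "command": [["#action", "#object"]],
}


def expand(rule_id: str) -> list[str]:
    """List every phrase a (non-recursive) rule can produce."""
    phrases = []
    for alternative in RULES[rule_id]:
        # Expand each part into its possible phrases, then combine them.
        options = [
            expand(part[1:]) if part.startswith("#") else [part]
            for part in alternative
        ]
        phrases.extend(" ".join(choice) for choice in product(*options))
    return phrases
```

Under these assumptions, `expand("command")` yields the four phrases formed by pairing each action with each object.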

2.2 Possible influences on speech recognition

Synthesizing the definitions above, the grammar in a speech recognition application has to cover all the ways in which users’ formulations could differ. Given a small vocabulary and a strict grammar, high accuracy can be achieved. As the vocabulary grows, the grammar also gains in complexity, which makes it difficult to keep the grammar strict. Language modeling in particular is affected by this trade-off.
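
The growth in complexity can be made concrete with a little arithmetic. Assuming, purely for illustration, a grammar built from consecutive slots that each offer a fixed number of alternatives, the number of distinct utterances the recognizer has to tell apart is the product of the slot sizes. The slot sizes below are made-up numbers.

```python
from math import prod


def utterance_count(slot_sizes: list[int]) -> int:
    """Number of distinct phrases in a grammar made of consecutive slots."""
    return prod(slot_sizes)


# Illustrative numbers only: a strict two-slot command grammar versus a
# looser grammar with more slots and more alternatives per slot.
small = utterance_count([2, 2])
large = utterance_count([20, 50, 50])
```

With these assumed sizes, the strict grammar admits 4 utterances and the looser one 50,000, which suggests why accuracy is harder to maintain as vocabulary and flexibility grow.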

But even in highly praised statistical approaches, grammar leads to critical problems, as any mismatch between the training set and a user’s input degrades the performance of a speech recognition application [5].
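
One simple way to quantify such a mismatch is the out-of-vocabulary (OOV) rate, the share of words in a user’s input that never occurred in the training data. The Python sketch below assumes whitespace tokenization and is meant only to illustrate the measure; it is not the method described in [5].

```python
def oov_rate(training_sentences: list[str], user_input: str) -> float:
    """Fraction of input words that do not occur in the training sentences."""
    vocabulary = {
        word
        for sentence in training_sentences
        for word in sentence.lower().split()
    }
    words = user_input.lower().split()
    if not words:
        return 0.0  # an empty input contains no unseen words
    unseen = [word for word in words if word not in vocabulary]
    return len(unseen) / len(words)
```

For a hypothetical training set of "open the window" and "close the door", the input "please open the door" contains one unseen word out of four.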



[1] General Books LLC (2010). Encyclopaedia Britannica, 11th Edition. General Books LLC.

[2] Fromkin, V. (2012). An Introduction to Language. Cengage Learning Australia.

[3] Kemble, K. A. (2001). An introduction to speech recognition. Voice Systems Middleware Education, IBM Corporation.

[4] Speech Recognition Grammar Specification Version 1.0. (2004). Retrieved from

[5] Gardner-Bonneau, D. (2007). Human Factors and Voice Interactive Systems. Springer.