It seems as though modern speech recognition (e.g., on Android and iOS phones) makes use of grammar or sentence structure. For example, it might have a tough time distinguishing between "grammar" and "grandma" in isolation, but it can distinguish between "I'm going to see grandma" and "I'm reading a book on English grammar". (Yes, I just tried this on my Android phone with the vLingo app.)
That is a big improvement, with speaker-independent SR (i.e., no training), over what I experienced with Dragon Dictate even using speaker-dependent SR (with 30 minutes of training).
So I'm wondering whether my guess is right: when did commercially available SR software start using grammar and sentence structure to "guess" the right words?
As far as I know, much speech recognition software is built using Hidden Markov Models (HMMs), where the observations are individual phonemes (sounds) and the states correspond to words (or to sub-word units of them).
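Concretely (this is the standard first-order HMM factorization, not anything specific to one product), an HMM scores a word sequence $w_1,\dots,w_T$ against the observed sounds $o_1,\dots,o_T$ as

$$P(w_{1:T}, o_{1:T}) = P(w_1)\,P(o_1 \mid w_1)\prod_{t=2}^{T} P(w_t \mid w_{t-1})\,P(o_t \mid w_t),$$

where the transition probabilities $P(w_t \mid w_{t-1})$ and the emission probabilities $P(o_t \mid w_t)$ are both estimated from training data.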
I don't think these systems take grammar or sentence structure into account in the sense of encoding explicit rules of English; rather, they pick up common structure naturally from training. Because inferring the state in an HMM depends on both the current observation (the sound) and the previous state, adding more training sequences makes it possible to capture typical phrases.
For example, there are likely many instances in the training data where the word "English" is followed by the word "grammar" and very few where it is followed by "grandma", so the dynamic (transition) part of the model captures these connections naturally, as the sketch below makes concrete.
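Here is a minimal sketch of that effect using Viterbi decoding over a toy word-level HMM. All of the words, "phoneme" symbols, and probabilities here are invented for illustration; real systems work at a much finer acoustic granularity.

```python
# Toy word-level HMM; every number here is invented for illustration.
states = ["english", "grammar", "grandma"]

# P(first word): assume the utterance starts with "English".
start = {"english": 1.0, "grammar": 0.0, "grandma": 0.0}

# P(next word | current word): in a real system these transitions are
# estimated from transcribed training speech, which is where
# "English" -> "grammar" comes out far more likely than
# "English" -> "grandma".
transition = {
    "english": {"english": 0.0, "grammar": 0.30, "grandma": 0.01},
    "grammar": {"english": 0.10, "grammar": 0.0, "grandma": 0.0},
    "grandma": {"english": 0.10, "grammar": 0.0, "grandma": 0.0},
}

# P(sound | word): the two confusable words emit nearly the same sound.
emission = {
    "english": {"IH-NG-G-L-IH-SH": 0.90, "G-R-AE-M-ER": 0.0},
    "grammar": {"IH-NG-G-L-IH-SH": 0.0, "G-R-AE-M-ER": 0.50},
    "grandma": {"IH-NG-G-L-IH-SH": 0.0, "G-R-AE-M-ER": 0.45},
}

def viterbi(observations):
    """Return the most likely word sequence for the observed sounds."""
    # trellis[t][s] = (best probability of ending in state s at time t,
    #                  the state at time t-1 on that best path)
    trellis = [{s: (start[s] * emission[s][observations[0]], None)
                for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max(
                (trellis[-1][p][0] * transition[p][s] * emission[s][obs], p)
                for p in states)
            column[s] = (prob, prev)
        trellis.append(column)
    # Trace the best path back from the most likely final state.
    best = max(states, key=lambda s: trellis[-1][s][0])
    path = [best]
    for column in reversed(trellis[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))

print(viterbi(["IH-NG-G-L-IH-SH", "G-R-AE-M-ER"]))
# -> ['english', 'grammar']; the transition term (0.30 vs 0.01)
#    overwhelms the nearly tied emissions (0.50 vs 0.45).
```

Swap the 0.30 and 0.01 transition values and the same ambiguous sound decodes as "grandma" instead: the disambiguation comes entirely from learned word-to-word statistics, not from any explicit rule of English.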