N-Gram Language Models
Predicting the next word in a sentence
Example: get the probability of "house" given "This is a" → four-gram

Process:
Get a corpus:
This is a house
The turtle is fat
This is a turtle
They lay on the ground
1 sentence has "house" given "This is a"
2 four-grams in total have "This is a"
So P(house | This is a) = 1/2
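A minimal sketch of this counting in Python, assuming whitespace tokenization and lowercasing (the notes above don't specify either):

```python
from collections import Counter

# Corpus from the example above
corpus = [
    "This is a house",
    "The turtle is fat",
    "This is a turtle",
    "They lay on the ground",
]

def ngrams(tokens, n):
    """All n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

fourgram_counts = Counter()
for sentence in corpus:
    fourgram_counts.update(ngrams(sentence.lower().split(), 4))

# Count four-grams that start with the context "this is a", and the one that
# continues with "house", then divide: P(house | this is a)
context = ("this", "is", "a")
context_total = sum(c for gram, c in fourgram_counts.items() if gram[:3] == context)
p_house = fourgram_counts[context + ("house",)] / context_total
print(p_house)  # 1 / 2 = 0.5
```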
Example: get the probability of "Jack" given "that" → bigram

Same process as above, with the corpus:
This is a house
This is a turtle
This is a house that Jack built
They lay on the ground
The turtle is fat
That lay in the house that Jack built
2 bigrams have "Jack" given "that"
3 bigrams in total have "that"
So P(Jack | that) = 2/3
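The same counting works for the bigram case; the sketch below assumes "That" and "that" are treated as the same word via lowercasing:

```python
from collections import Counter

# Corpus from the example above
corpus = [
    "This is a house",
    "This is a turtle",
    "This is a house that Jack built",
    "They lay on the ground",
    "The turtle is fat",
    "That lay in the house that Jack built",
]

bigram_counts = Counter()
context_counts = Counter()
for sentence in corpus:
    tokens = sentence.lower().split()
    bigram_counts.update(zip(tokens, tokens[1:]))   # pairs of adjacent words
    context_counts.update(tokens[:-1])              # words that start a bigram

p_jack = bigram_counts[("that", "jack")] / context_counts["that"]
print(p_jack)  # 2 / 3 ≈ 0.667
```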
Predicting the probability of a sequence of words
Sequence of words = (w1, w2, w3, ...)
Bigram Language Model (n=2)
Without sentence boundaries: P(Sequence of words) = P(w1) P(w2|w1) ... P(wk|wk-1)
With start and end symbols: P(Sequence of words) = P(w1|start) P(w2|w1) ... P(wk|wk-1) P(end|wk)
Example:
Corpus: "This is the malt", "That lay in the house that Jack built"
P(this is the house) = P(this) P(is|this) P(the|is) P(house|the)
Note: P(this) is not calculated as a plain unigram P(w1); it is calculated as P(w1|start), where the count of the start symbol is 2, since the corpus has two sentences and therefore two ways to start.
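A sketch of this worked example, using an explicit <s> symbol for "start" (the symbol name is an arbitrary choice, and the end symbol is left out, matching the product above):

```python
from collections import Counter

# Corpus from the example above
corpus = [
    "This is the malt",
    "That lay in the house that Jack built",
]

bigram_counts = Counter()
context_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.lower().split()     # prepend the start symbol
    bigram_counts.update(zip(tokens, tokens[1:]))
    context_counts.update(tokens[:-1])

def p(word, context):
    """MLE bigram probability P(word | context)."""
    return bigram_counts[(context, word)] / context_counts[context]

# P(this is the house) = P(this|<s>) P(is|this) P(the|is) P(house|the)
prob = p("this", "<s>") * p("is", "this") * p("the", "is") * p("house", "the")
print(prob)  # (1/2) * 1 * 1 * (1/2) = 0.25
```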
Question:
Is this normalized for all sequence lengths? (i.e. Do they all add up to probability 1.0?)
No! They are normalized separately for each sequence length.
Length 1: P(this) + P(that) = 1.0
Length 2: P(this this) + P(this is) + ... + P(built built) = 1.0
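This can be checked numerically on the same corpus; the sketch below sums the bigram-model probabilities of every length-1 and length-2 sequence over the vocabulary (no end symbol, matching the sums above):

```python
from collections import Counter

corpus = ["This is the malt", "That lay in the house that Jack built"]

bigram_counts = Counter()
context_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.lower().split()
    bigram_counts.update(zip(tokens, tokens[1:]))
    context_counts.update(tokens[:-1])

def p(word, context):
    """MLE bigram probability; 0.0 if the context never appears as a context."""
    if context_counts[context] == 0:
        return 0.0
    return bigram_counts[(context, word)] / context_counts[context]

vocab = sorted({w for s in corpus for w in s.lower().split()})

len1 = sum(p(w, "<s>") for w in vocab)                                # P(this) + P(that) + ...
len2 = sum(p(w1, "<s>") * p(w2, w1) for w1 in vocab for w2 in vocab)  # all two-word sequences
print(len1, len2)  # both equal 1.0
```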
Note: This approach has limitations with memory, as reported here. LSTMs are better for next-word prediction, since they remember longer context.