bigram
Definition
Noun: A pair of consecutive written units, such as letters, syllables, or words, within a sequence of text. In computational linguistics and natural language processing, bigrams are a basic unit of statistical analysis, often used to model the probability of character or word sequences.
Usage
A "bigram" is used to analyze the frequency and patterns of letter or word pairs. It is a technical term central to text analysis, cryptography, and language modeling.
- In letter-level analysis, it refers to any two adjacent characters (e.g., "th" in "the").
- In word-level analysis, it refers to any two adjacent words (e.g., "black cat" in the phrase "a black cat").
Examples
- Letter Bigram:
  - The word "hello" contains the letter bigrams: "he", "el", "ll", "lo".
  - In English, the bigram "th" is one of the most frequent.
- Word Bigram:
  - The sentence "I like apples" contains the word bigrams: "I like" and "like apples".
  - Analyzing word bigrams helps predict the next likely word in a sentence.
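The examples above can be reproduced with a few lines of Python. This is a minimal sketch: the function names `letter_bigrams` and `word_bigrams` are illustrative, and words are assumed to be separated by whitespace, which ignores punctuation.

```python
def letter_bigrams(word):
    """Return every pair of adjacent characters in `word`, in order."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def word_bigrams(sentence):
    """Return every pair of adjacent whitespace-separated words, in order."""
    words = sentence.split()
    return list(zip(words, words[1:]))


print(letter_bigrams("hello"))        # ['he', 'el', 'll', 'lo']
print(word_bigrams("I like apples"))  # [('I', 'like'), ('like', 'apples')]
```

A sequence of k items yields k - 1 bigrams, so a single letter or a single word yields none.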
Advanced Usage
- Statistical Language Modeling: Bigram models estimate the probability of a word given the previous word, P(w_n | w_{n-1}). This is a simple but foundational type of N-gram model.
- Cryptanalysis: Studying letter bigram frequencies (like "qu", "ed") is a classic technique for breaking substitution ciphers.
- Text Prediction & Autocomplete: Systems often use bigram probabilities to suggest the next word as you type.
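As a rough illustration of the language-modeling and text-prediction uses above, the sketch below estimates P(next word | previous word) as a relative frequency: the number of times the pair occurs divided by the number of times the previous word is followed by anything. The names `train_bigram_model` and `predict_next` and the toy corpus are assumptions made for this example; real systems typically add smoothing and sentence-boundary markers, which are omitted here.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Return {previous_word: {next_word: probability}} from raw counts."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            pair_counts[prev][nxt] += 1

    model = {}
    for prev, followers in pair_counts.items():
        total = sum(followers.values())
        model[prev] = {word: count / total for word, count in followers.items()}
    return model


def predict_next(model, word):
    """Return the most probable follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return max(followers, key=followers.get)


# Toy corpus, invented for illustration only.
corpus = ["I like apples", "I like pears", "I eat apples", "you like apples"]
model = train_bigram_model(corpus)

print(predict_next(model, "I"))     # like
print(predict_next(model, "like"))  # apples
```

In this corpus "i" is followed by "like" twice and by "eat" once, so the estimated probability of "like" after "i" is 2/3.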
Variants and Related Words
- N-gram: A sequence of n consecutive items (letters, syllables, words). A bigram is an N-gram where n=2.
- Trigram: A sequence of three consecutive items (n=3).
- Unigram: A single item sequence (n=1).
- Bigrammatic (adj): Pertaining to or consisting of bigrams.
Synonyms
- Digram (less common, sometimes used interchangeably in specific contexts like cryptography).
- 2-gram (a more technical synonym used in computational fields).
Related Phrases/Concepts
- Bigram Frequency: The count of how often a specific pair of letters or words appears in a corpus.
- Bigram Probability: The conditional probability of the second unit given the first.
- Bigram Collocation: A pair of words that frequently occur together (e.g., "high school", "traffic light").
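Bigram frequency, as described above, is a simple count over a corpus. The following sketch counts letter bigrams with `collections.Counter`; the function name `letter_bigram_frequencies` and the sample text are assumptions for illustration, and pairs that contain a space or punctuation are skipped so that only bigrams inside words are counted.

```python
from collections import Counter


def letter_bigram_frequencies(text):
    """Count letter bigrams in `text`, ignoring pairs that span non-letters."""
    text = text.lower()
    pairs = (text[i:i + 2] for i in range(len(text) - 1))
    return Counter(pair for pair in pairs if pair.isalpha())


counts = letter_bigram_frequencies("the then there")
print(counts["th"])  # 3
print(counts["he"])  # 3
```

The same counting approach applies to word pairs: pairs with unusually high counts in a large corpus are candidates for collocations such as "high school".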
Noun
- a word that is written with two letters in an alphabetic writing system