bigram

A child points to the bigram "it" in a storybook.

Definition

Noun: A pair of consecutive written units, typically two adjacent letters or two adjacent words, within a sequence of text. In computational linguistics and natural language processing, it is a basic unit of statistical analysis, used to model the probability of character or word sequences.

Usage

A "bigram" is used to analyze the frequency and patterns of letter or word pairs. It is a technical term central to text analysis, cryptography, and language modeling.
  • In letter-level analysis, it refers to any two adjacent characters (e.g., "th" in "the").
  • In word-level analysis, it refers to any two adjacent words (e.g., "black cat" in the phrase "a black cat").

Examples
  • Letter Bigram:
    • The word "hello" contains the letter bigrams: "he", "el", "ll", "lo".
    • In English, the bigram "th" is one of the most frequent.
  • Word Bigram:
    • The sentence "I like apples" contains the word bigrams: "I like" and "like apples".
    • Analyzing word bigrams helps predict the next likely word in a sentence.
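The two examples above can be reproduced with a few lines of Python. This is a minimal sketch: the function names letter_bigrams and word_bigrams are illustrative, and words are split on whitespace only, so punctuation handling is left out.

```python
def letter_bigrams(word):
    """Return every pair of adjacent characters, in order of appearance."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def word_bigrams(sentence):
    """Return every pair of adjacent words (whitespace-separated)."""
    words = sentence.split()
    return [f"{first} {second}" for first, second in zip(words, words[1:])]


print(letter_bigrams("hello"))         # ['he', 'el', 'll', 'lo']
print(word_bigrams("I like apples"))   # ['I like', 'like apples']
```

A sequence of n units always yields n - 1 bigrams, which is why the five-letter word "hello" produces four letter bigrams.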
Advanced Usage
  • Statistical Language Modeling: Bigram models estimate the probability of a word given the previous word, P(word_n | word_{n-1}). This is a simple but foundational type of N-gram model.
  • Cryptanalysis: Studying letter bigram frequencies (like "qu", "ed") is a classic technique for breaking substitution ciphers.
  • Text Prediction & Autocomplete: Systems often use bigram probabilities to suggest the next word as you type.
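As a rough illustration of the language-modeling and autocomplete uses above, the sketch below estimates P(word_n | word_{n-1}) by simple counting (count of the pair divided by count of the first word as a predecessor) and suggests the most frequent follower. The three-sentence corpus is toy data and the function names are hypothetical; real systems would typically add smoothing and far more text.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follow_counts[prev][nxt] += 1
    return follow_counts


def bigram_probability(model, prev, word):
    """Estimate P(word | prev) as count(prev, word) / count(prev, any word)."""
    followers = model.get(prev)
    if not followers:
        return 0.0  # prev was never seen as a first word
    return followers[word] / sum(followers.values())


def suggest_next(model, prev):
    """Return the word seen most often after prev, or None if prev is unseen."""
    followers = model.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for illustration only.
corpus = ["I like apples", "I like pears", "I eat apples"]
model = train_bigram_model(corpus)

print(bigram_probability(model, "i", "like"))  # 2 of the 3 words after "i" are "like"
print(suggest_next(model, "i"))                # like
```

Unseen pairs get probability 0 here, which is the main weakness of an unsmoothed bigram model.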
Variants and Related Words
  • N-gram: A sequence of n consecutive items (letters, syllables, words). A bigram is an N-gram where n=2.
  • Trigram: A sequence of three consecutive items (n=3).
  • Unigram: A single item sequence (n=1).
  • Bigrammatic (adj): Pertaining to or consisting of bigrams.
Synonyms
  • Digram (less common, sometimes used interchangeably in specific contexts like cryptography).
  • 2-gram (a more technical synonym used in computational fields).
Related Phrases/Concepts
  • Bigram Frequency: The count of how often a specific pair of letters or words appears in a corpus.
  • Bigram Probability: The conditional probability of the second unit given the first.
  • Bigram Collocation: A pair of words that frequently occur together (e.g., "high school", "traffic light").
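The sketch below shows one way bigram frequency could feed into collocation detection. It counts word bigrams and then scores each pair with pointwise mutual information (PMI), a scoring method chosen here for illustration and not named in this entry. The sample sentence and the function name bigram_stats are made up for the example.

```python
import math
from collections import Counter


def bigram_stats(words):
    """Return (bigram frequencies, PMI score per bigram) for a list of words."""
    unigram_counts = Counter(words)
    bigram_counts = Counter(zip(words, words[1:]))
    total_unigrams = len(words)
    total_bigrams = len(words) - 1

    pmi = {}
    for (first, second), count in bigram_counts.items():
        p_pair = count / total_bigrams
        p_first = unigram_counts[first] / total_unigrams
        p_second = unigram_counts[second] / total_unigrams
        # PMI compares how often the pair occurs with how often it would
        # occur if the two words were independent.
        pmi[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return bigram_counts, pmi


# Toy text, invented for illustration only.
text = ("the traffic light turned red near the high school "
        "and the traffic light turned green")
freq, pmi = bigram_stats(text.split())

print(freq[("traffic", "light")])  # 2
print(pmi[("traffic", "light")] > pmi[("the", "traffic")])  # True
```

In this toy text "traffic light" and "the traffic" both occur twice, but "traffic light" scores higher because "the" also appears in other contexts. PMI is known to overrate very rare pairs, so it is usually combined with a minimum frequency threshold.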

Noun
  1. a word that is written with two letters in an alphabetic writing system
