
Last modified on 17 Dec 2021

Comp 550

Course Website

Terms & Concepts

  • CL - computational linguistics
  • NLP - natural language processing
  • NLU - natural language understanding
  • NLG - natural language generation
  • Semantics, pragmatics, discourse, syntax, phonology, phonetics, morphology
  • Morpheme - inflection, derivation; free, bound (prefixes, suffixes, infixes, circumfixes)
  • Language - isolating, synthetic
  • Lemmatization
  • Lexicon
  • Morphotactics
  • FSA - finite state automata - (Q, Σ, q0, δ, F)
  • Porter stemmer
  • Morphological parsing - surface (foxes), intermediate (fox^s#), underlying (fox +N +Pl)
  • FST - finite state transducer - FSA with output symbols (Δ)
  • Word - tokens, types
  • Term frequency - TF(w, S)
  • Relative frequency - RF(w, S)
  • Corpus
  • Zipf’s Law, Zipf-Mandelbrot Law
  • Long tail
  • N-grams - unigram, bigram, trigram
  • k-fold cross validation
  • Entropy - H(p)
  • Cross entropy - H(p, q)
    • Estimation
  • Perplexity - Perplexity(p, q)
  • MLE - maximum likelihood estimation
  • Good-Turing smoothing
  • Naïve Bayes
  • Supervised, semi-supervised, unsupervised learning
    • Clustering, grammar induction, word relatedness
  • POS - part of speech
  • Classification, regression
  • N-grams, bag of words
  • Naïve Bayes
  • POS, modals, auxiliary verbs, conjunctions, particles
  • Open classes, neologisms, content words
  • Closed classes, function words
  • Forward algorithm, backward algorithm, forward & backward, Viterbi algorithm
  • Random restarts, biased initialization
  • Chunking (shallow parsing), named-entity recognition
  • IOB tagging
  • Generative, discriminative
  • LC-CRF - Linear-chain conditional random fields
  • Constituency
  • FSA vs CFG
  • CYK
  • PCFG
  • Constituent parsing, dependency parsing
  • Hearst
  • Hypernymy, hyponymy
  • Synonymy, antonymy
  • Polysemy
  • Homonymy, homophone, homograph
  • Metonymy, synecdoche
  • Holonymy, meronymy
  • WordNet, synset
  • Lesk, Yarowsky, bootstrapping
  • PMI - pointwise mutual information
  • SVD - singular value decomposition
  • Word embeddings
  • Co-compositionality
  • Domain of discourse, variables, predicates, functions, logical connectives, quantifiers
  • FOL - first order logic
  • Lambda calculus, beta reduction, type-raising
  • Neo-Davidsonian event semantics
  • Russell’s definite descriptions
  • Cooper storage, store
  • Discourse markers
  • Referent, referring expression
  • Anaphora, antecedents, cataphora, zero anaphora, bridging reference
  • Pleonastic pronouns, clefting
  • Hobbs’ algorithm

Lecture 1 - 2018/09/04

Lecture 2 - 2018/09/06

  • Morpheme - subpart of a word that cannot be broken down any further
    • Free morphemes can occur on their own
    • Bound morphemes must occur with other morphemes as parts of words
      • Prefix (before), suffix (after), infix (between), circumfix (around)
  • Inflectional morphology - expresses grammatical function required by language
  • Derivational morphology - derives a new word, possibly of a different part of speech
  • Isolating languages - low morpheme-to-word ratio
  • Synthetic languages - high morpheme-to-word ratio
  • Computation Tasks
    • Morphological recognition - is this a well formed word?
    • Stemming - cut affixes off to find the stem
    • Morphological analysis
      • Lemmatization - remove inflectional morphology & recover lemma
      • Full morphological analysis - recover full structure
  • FSA - finite state automata

Lecture 3 - 2018/09/11

  • Word - smallest unit that can appear in isolation
  • Convenient assumption is that words are delimited by spaces
  • Word frequency
    • Term frequency: TF(w, S) = #w in corpus S
      • eg TF(cat, the cat sat on the mat) = 1
    • Relative frequency: RF(w, S) = TF(w, S) / |S|
      • eg RF(cat, the cat sat on the mat) = ⅙
  • Corpus - collection of text; used for text to count
  • Zipf’s Law: f ∝ 1/r
    • Frequency of word type is inversely proportional to rank of word type (by frequency)
  • Zipf-Mandelbrot Law - better fit
    • f = P/(r + ρ)^B, or log f = log P - B log(r + ρ)

Lecture 4 - 2018/09/13

  • Entropy - minimum number of bits needed to communicate some message if we know what probability distribution the message is drawn from
  • Cross entropy is for when we don’t know the distribution
    • H(p, q) = -Σ_{i=1}^{k} p_i log₂ q_i
    • Estimate: H(p, q) = -(1/N) log₂ q(w_1…w_N)
    • Perplexity(p, q) = 2^H(p, q)
    • Eg: [A B C B B], M: P(A) = 0.3, P(B) = 0.4, P(C) = 0.3 (worked through in the sketch after this list)
      • q = 0.3 × 0.4³ × 0.3
      • Perp(M) = 2^(-(1/5) log₂ q)
  • Overfitting - model that is tailored to the training data, making it a poor representation of new data
  • OOV - out of vocabulary
  • Explicitly modelling OOV items
    • Replace vocabulary items less frequent than some frequency threshold with <UNK>
    • Treat <UNK> as a regular vocabulary word
    • Use <UNK> for unseen words during testing
  • Smoothing - shift probability mass to cases that we haven’t seen before/are unsure about on unseen data
  • Good-Turing smoothing
    • N = Σ_i f_i × i
    • c* = (c + 1) f_{c+1} / f_c
    • P(w with count c) = c*/N
    • P(UNK) = f_1/N
    • Problem with high c, as f_{c+1} is often 0. Solution is to fit a linear relationship in log space: log f_c = a log c + b

Lecture 5 - 2018/09/18

  • Supervised learning - model has access to some input data and corresponding output data (eg a label)
  • Unsupervised learning - model has only input data
  • Clustering - finding hidden structure in data without labels
  • Learning - creating good characterization of data
  • Semi-supervised learning - outputs for some inputs but not all
  • Grey Area
    • Specific rules
      • eg anything ending in -ed is a verb
      • Variously called semi-supervised, distantly supervised, minimally supervised, or unsupervised
    • Give seed set
      • eg learning a sentiment lexicon; give positive seeds (good, awesome, great) and negative seeds (bad, awful, dreadful)
  • Regression - y is a continuous outcome
  • Classification - y is a discrete outcome
  • Linear Regression
  • N-grams
    • Aka bag-of-words
    • Versions
      • Presence or absence of an N-gram (1 or 0)
      • Count of N-gram
      • Proportion of total document
      • Scaled versions of counts
  • POS tags
    • Crudely captures syntactic patterns
    • Needs preprocessing
  • Naïve Bayes
    • P(y|x) = P(y) ∏_i P(x_i|y) / P(x)

Lecture 6 - 2018/09/20

  • Part of speech
    • Identifies some grammatical properties of words
    • Includes
      • Modal & auxiliary verbs - Did the chicken cross the road?
      • Conjunctions - and, but
      • Particles - look up, turn down
  • Penn Treebank (PTB) Tagset
  • Open class - words tend to be added (neologism)
    • Noun, verb, adjective, etc
  • Closed class - new words tend not to be added
    • Pronoun, determiners, quantifiers, conjunctions, modals & auxiliary, preposition
  • PTB distinguishes between singular and plural nouns, but not transitive and intransitive verbs

Lecture 7 - 2018/09/25

  • P(outcome i) = #(outcome i)/#(all events)
  • π_i = P(Q_1 = i) = #(Q_1 = i)/#(sentences)
  • a_ij = P(Q_{t+1} = j | Q_t = i) = #(i, j)/#(i)
  • b_ik = P(O_t = k | Q_t = i) = #(word k, tag i)/#(i)
  • Forward algorithm
    • Uses DP to avoid unnecessary recalculations
    • Create table of all possible state sequences, annotated with probabilities
    • Trellis - table for possible state sequences
    • Runtime O(N²T)
  • Backward algorithm
    • Trellis with cells β_i(t)
    • Unlike α_i(t), it excludes the current word
  • Forward & backward
    • P(O|θ) = Σ_{i=1}^{N} α_i(t) β_i(t) for any t = 1..T
  • Log sum trick - to avoid underflow (from small numbers), work in log domain
  • Viterbi algorithm - similar to forward algorithm, but replace summation with max
    • Used to find most likely state sequence
    • Backpointers - keep track of max entry; work backwards to recover best label sequence
  • Unsupervised training
    • No state sequences; have to estimate them
    • Initialize params randomly
    • Viterbi EM (‘hard EM’)
      • Prediction and updates using Viterbi algorithm
    • Soft predictions - probabilities of all possible state sequences

Lecture 8 - 2018/09/27

  • Chunking - find syntactic chunks in sentence; not hierarchical
  • Named-entity recognition (NER) - identify elements in text corresponding to high level categories (eg person, organization, location)
  • IOB Tagging - label whether word is inside, outside, or at beginning of span
    • Eg for Org, McGill University would be labelled B I, as both compose a single organization
  • Generative models - learn joint distributions P(X, Y)
    • Can be applied to new tasks, unlike discriminative models
  • Discriminative models - learn conditional distributions P(Y|X)
  • Standard HMM - product of probabilities
    • Forward algorithm - P(X|θ)
    • Viterbi algorithm - argmax_y P(X, y|θ)
  • Linear-chain conditional random fields (CRF) - Sums over features & time-steps
    • Replaces HMM products by numbers that are not probabilities, but linear combinations of weights & feature values
    • Allows word templates
    • Forward algorithm - Z(X)
    • Viterbi algorithm - argmax_Y P(Y|X, θ)
  • To avoid overfitting, make weights close to zero
    • Done by adding an extra term (-θ_k²) for regularization

Lecture 9 - 2018/10/02

  • Linear models
    • Includes logistic regressions, support vector machines, etc
    • Cannot learn complex non-linear functions (eg starts with capital and not at beginning of sentence -> proper noun)
  • Artificial neural network
    • Automatically learns non-linear functions from input to output
  • Feedforward
    • All connections flow forward (no loops)
    • Each layer of hidden units fully connected to next
  • Activation function
    • Linear combination of inputs & weights → non-linearity
    • Popular choices
      • Sigmoid function
      • Tanh function
      • Rectifier/ramp function: g(x) = max(0, x)
    • Need non-linearity, otherwise we are just transforming a linear function into another linear function
  • Loss function
    • Measures how much error is made during predictions
    • y - correct distribution
    • ŷ - predicted distribution
    • L(y, ŷ) - loss function
  • Training
    • Typically by stochastic gradient descent
    • Backpropagation - propagate the error signal backwards from the output to the inputs to compute the weight updates
  • Recurrent neural network
  • Long-term dependencies in language
    • Dependencies between words can be arbitrarily far apart
      • Eg look up [x] can be written as look [x] up
    • Cannot easily model with HMMs or LC-CRFs, but possible with RNNs

Lecture 10 - 2018/10/04

  • Syntax
    • How words can be arranged to form grammatical sentence
    • Invalid sentences are prefixed with *
  • Constituency
    • Group of words that behave as a unit
      • Noun phrases - three people on the bus, Justin Trudeau
      • Adjective phrases - blue, very good
    • Tests for constituency
      • Check if they can appear in similar syntactic environment
        • Eg I saw [it, three people on the bus, *on the]
      • Can be placed in different positions/replaced as a unit
      • Can be used to answer a question
  • Grammatical relations
    • Subject - tend to come before verbs
    • Direct object - the boy kicked the ball
    • Indirect object - she gave him a good beating
    • Some verbs require different number of arguments
      • Eg relax (subj), steal (subj, dobj), give (subj, iobj, dobj)
  • Formal grammar - rules that generate a set of strings that make up a language
    • Format of A → B
  • Constituent tree
    • Trees & sentences generated by previous rules
    • Symbols appearing on the LHS of rules are non-terminals
    • Symbols that only appear on the RHS of rules (the words) are terminals
  • Context-free grammar (CFG)
    • 4-tuple
      • N - set of non-terminal symbols
      • Σ - set of terminal symbols
      • R - set of rules or productions of form A → (Σ ∪ N)*, where A ∈ N
      • S - designated start symbol, S ∈ N

Lecture 11 - 2018/10/09

  • Parsing algorithms
    • Top-down - start at top of tree, expanding downwards with rewrite rules
      • Eg Earley parser
    • Bottom-up - start from input words and build bigger subtrees until a tree spans the whole sentence
      • Eg CYK algorithm, shift-reduce parser
  • CYK Algorithm
    • DP algorithm - partial solutions stored & efficiently reused
    • Steps
      • Convert CFG to appropriate form
      • Set up table of possible constituents
      • Fill in table
      • Read table to recover all possible parses
  • CNF Rules
    • A → B C D becomes A → X1 D and X1 → B C
    • A → s B becomes A → X2 B and X2 → s
    • A → B - replace all B on LHS with A
  • Set up table
    • Let sentence have N words, w[0] ... w[N - 1]
    • Create table where cell i, j represents phrase spanning from w[i:j + 1]
      • Only need half table since i < j
  • Running through table
    • Base case - one word phrase - check all non-terminals
  • PCFG - CFG with probabilities for each rule
    • For each nonterminal A ∈ N
      • Σ_{(α → β) ∈ R s.t. α = A} Pr(α → β) = 1
  • Extend CYK to PCFG by calculating partial probabilities as you fill the table
    • Only best probability is included per cell
  • Train PCFG with a treebank, such as the WSJ portion of the Penn Treebank
    • Simple version: MLE estimate
      • Pr(α → β) = #(α → β) / #α
        • Not great:
          • No context
            • Example: NPs in subject and object positions are not identically distributed
          • Too sparse
            • With the word ‘ate’, the rules for ‘ate quickly’, ‘ate with a fork’, ‘ate a sandwich’, etc are all different. Should be factorized, eg by having an adverbial modifier

Lecture 12 - 2018/10/11

  • Adding context to PCFG
    • Append parent categories (eg NP^S → PRP to NP^VP → PRP)
  • Removing context from long chain rules by splitting them into separate rules
    • Eg Pr(VP → START AdvP VBD NP PP END) to
      • Pr(VP → START AdvP)
      • Pr(VP → AdvP VBD)
      • Pr(VP → VBD NP)
      • Pr(VP → NP PP)
      • Pr(VP → PP END)
  • Markovization
    • Vertical markovization - adding ancestors as context
      • Zeroth order - vanilla PCFGs
      • First order - one parent only (see above)
      • nth order - add n parents
    • Horizontal markovization - breaking RHS into parts
      • Infinite order - vanilla PCFGs
      • First order - pairs (see above)
      • Can choose any order with interpolations
  • Semantics - study of meaning in language
    • Meaning of words - lexical semantics
  • Referent - person/thing to which an expression refers
  • Extensional definition - all things for which a definition applies, as opposed to intension or comprehension
  • Intensional definition - necessary and sufficient conditions
  • Multiple words can refer to the same referent. Used in different contexts → different senses.
    • Example of Venus, the morning star, and the evening star, which all refer to Venus
  • Hyponym/hypernym - ISA (is a) relationship
    • Monkey & mammal, Montreal & city, wine & beverage, etc. ([hyponym] is a [hypernym])
  • Synonym/Antonym - roughly same/different meaning
  • Homonymy
    • Homophone - same sound (son vs sun)
    • Homograph - same written form (lead (noun) vs lead (verb))
  • Polysemy - multiple related meanings
    • Eg newspaper
      • Daily weekly publication
      • Business that publishes newspaper (meaning above)
      • Physical object that is the product of a newspaper publisher
      • Cheap paper made from wood pulp
  • Homonymy is typically with unrelated meaning, whereas polysemy is with related meaning
  • Metonymy - substitution of one entity for another related one
    • Eg
      • Ordering a dish (food, not a physical dish)
      • I work for the local paper (place, not object)
      • Quebec City is cutting our budget again (government, not location)
      • The loonie is at an 11-year low (value, not physical loonie)
    • Synecdoche - specific kind involving whole-part relations
      • All hands on deck
  • Holonymy/meronymy - whole part relationships
    • Holonym → whole, meronym → part
  • Computational approach
    • Heuristic - eg Lesk’s
    • Supervised ML - lots of work
    • Unsupervised - eg Yarowsky’s
  • Lesk’s
    • Given a sentence, construct a bag of words from the definitions of all senses of the context words; compute the overlap between each sense of the target word and that bag; select the sense with the highest overlap
  • Yarowsky’s
    • Gather data set with target word to be disambiguated
    • Automatically label small seed set of examples
    • Repeat for a while
      • Train from seed set
      • Apply model to entire data set
      • Keep highly confident classification outputs to be new seed set
    • Use last model as final model

Lecture 13 - 2018/10/16

  • Hearst patterns - pairs of terms in hyponym-hypernym relationships tend to occur in certain lexico-syntactic patterns
    • Can find relations through key words (cause, induce, stem (from), etc)
    • Can find relations through co-occurring words (such as, and, or)
    • Can find relations through words that tend to co-occur with target words
  • Term-context matrix
    • Each row is a vector representation of word through context
      • Context can refer to nearest ‘x’ neighbouring words
  • Cosine similarity
    • sim(A, B) = (A · B)/(||A|| ||B||)
    • -1 if vectors point in opposite direction, 0 if orthogonal, 1 if same direction
    • Positive (eg count) vectors are scored between 0 and 1
  • Cosine similarity is not enough to determine synonymy
    • High cosine similarity can also indicate antonyms, word senses
  • Similarity - synonymy, hypernymy, hyponymy
  • Relatedness - anything that might be associated (eg good and bad)
  • Cosine similarity is really a measure of relatedness
  • Rescaling vectors
    • Pointwise mutual information (PMI)
      • pmi(w1, w2) = log (P(w1, w2)/(P(w1)P(w2)))
      • Numerator - probability of both words occurring
      • Denominator - probability of each word occurring in general
    • If ratio < 1, PMI is negative
      • Often disregarded → positive PMI (PPMI)
  • Compressing term-context matrix
    • Often very sparse, lots of zeros
    • Singular value decomposition (SVD)
      • X = W Σ Cᵀ
        • m is the rank of matrix X
        • Rows of W are new word vectors
        • Rows of C (columns of Cᵀ) are new context vectors
        • Σ is the diagonal matrix of singular values of X (square roots of the eigenvalues of XᵀX, from highest to lowest)
    • Truncated SVD
      • Throw out some singular values (lowest ones) in Σ
    • Word embeddings
      • Neural network models
    • Skip-gram
  • Validation
    • Word similarity
      • a : b :: c : d - what is d?
        • Eg man : king :: woman : ? (man is to king as woman is to queen)
      • Solved by assuming the vector differences are equal: v_a - v_b = v_c - v_d, ie v_d ≈ v_c + v_b - v_a (take the nearest word vector)
  • Can download pretrained word2vec embeddings from web
    • Advantages - trained, large quantity, cheap, easy to try
    • Disadvantages - doesn’t always work

Lecture 14 - 2018/10/18

  • Compositionality - meaning of phrase depends on meanings of its parts
    • Idioms often violate compositionality
  • Co-compositionality - meanings of words depend also on other words that they are composed with
    • Eg ‘red’ combined with rose, wine, or cheeks does not combine purely compositionally, but co-compositionally (its exact sense depends on the noun it modifies)
  • Semantic inference - making explicit something that is implicit
    • I want to visit the capital of Italy; capital of Italy is Rome; ∴ I want to visit Rome
  • Montagovian semantics - using logical formalism to represent natural language inferences
  • First-order predicate calculus
    • Domain of discourse - set of entities we care about
    • Variables - potential elements of domain
      • Typically lower case
    • Predicates - maps elements of domain to truth values
      • Can be different valences (eg takes multiple elements)
    • Functions - maps elements to other elements
      • Can be different valences
        • Valence 0 function is a constant
    • Logical connectives - ¬, ∧, ∨, →, ↔
    • Quantifiers - ∃, ∀
  • Interpretation of first order logic (FOL)
    • Domain of discourse (D)
    • Mapping for functions to elements of D
    • Mapping for predicates to True or False
    • All students who study and do homework will get an A
      • ∀x.(study(x) ∧ hw(x)) → grade(x) = A
  • Lambda calculus - describes computation using mathematical functions
    • Definitions
      • Variables (eg x)
      • λx.t, t is a lambda term
      • ts, t and s are lambda terms
    • Function application (or beta reduction)
      • Given (λx.t)s, replace all instances of x in t with expression s
      • Left associative - abcd = ((ab)c)d
      • (λx.x + y)2 → 2 + y
      • (λx.xx)(λx.x) = (λx.x)(λx.x) = λx.x
    • Allows for partial computations
  • Syntax-driven semantic composition
    • Augment CFG with lambda expressions (syntactic composition = function application)
    • Semantic attachments
      • Syntactic composition: A → a1 … an
      • Semantic attachment: {f(aj.sem, …, ak.sem)}
    • Proper noun: PN → COMP550 to {COMP550}
    • Type-raised proper noun: PN → COMP550 to {λx.x(COMP550)}
      • We create a function taking in the PN as argument
    • NP rule: NP → PN to {PN.sem}
    • Common noun: N → student to {λx.Student(x)}
      • Type ⟨e, t⟩; takes an entity and returns a truth value
    • Intransitive verbs: V → rules to {λx.∃e.Rules(e) ∧ Ruler(e, x)}
      • Created an event variable e
    • Composition: S → NP VP to {NP.sem(VP.sem)}
  • Neo-Davidsonian event semantics
    • Method 1: multi-place predicate: Rules(x)
    • Method 2: Neo-Davidsonian version with event variable: ∃e.Rules(e) ∧ Ruler(e, x)
      • Reifying event variable makes things more flexible
        • Optional elements such as location and time
        • Can add information to event variable

Lecture 15 - 2018/10/23

  • Neo-Davidsonian semantics cont.
    • Transitive verbs: V → enjoys to {λw.λz.w(λx.∃e.Enjoys(e) ∧ Enjoyer(e, z) ∧ Enjoyee(e, x))}
  • Note that for quantifiers, universal quantifiers use →, while existential ones use &wedge;
  • Russell (1905)’s Definite descriptions
    • How to represent ‘the’ in FOL?
      • In “the student took Comp550”, we must enforce
        • An entity who is a student
        • At most one thing referred as a student
        • Student participates in some predicate (“took Comp550”)
      • ∃x.Student(x) ∧ ∀y.(Student(y) → y = x) ∧ took(x, COMP550)
    • Det → a to {λP.λQ.∃x.P(x) ∧ Q(x)}
    • Det → the to {λP.λQ.∃x.P(x) ∧ ∀y(P(y) → y = x) ∧ Q(x)}
  • Scopal ambiguity
    • Eg. Every student took a course
      • every > a: for all students, there exists a course that was taken by that student
      • a > every: there exists a course for which all students took
  • Underspecified representation - meaning representation that can embody all possible readings without explicitly enumerating all of them
  • Cooper Storage
    • Associate a store with each FOL so that each reading can be recovered
    • Every student took a course
      • ∃e.took(e) ∧ taker(e, s1) ∧ takee(e, s2)
        • (λQ.∀x.Student(x) → Q(x), 1)
        • (λQ.∃y.Course(y) ∧ Q(y), 2)
      • We do not specify the order of each sub expression
    • Compositional rules now modify the inside part of a store, and introduce new index variables (see the sketch after this list)

Lecture 16 - 2018/10/25

  • Coherent - property of discourse that makes sense
    • Contains logical structure/meaning
  • Incoherent - eg two completely unrelated sentences
  • Cohesion - use of linguistic devices to tie together text units
    • Lexical cohesion - related words in passage
    • Discourse markers - cue words for discourse relations
      • Eg also, and, therefore, however
  • Referent - actual entity in the world
  • Referring expressions - phrases that refer to the referent
  • Referring expressions towards the same referent are said to corefer
  • Antecedent - thing that exists before or logically precedes another
  • Anaphor - points to an antecedent that precedes it
  • Cataphor - points to an antecedent that follows it
  • Zero anaphora - omitting pronouns in certain contexts
    • Languages with this are referred to as pro-drop
    • Eg Spanish: No habl-o español (‘(I) don’t speak Spanish’ - the subject pronoun is dropped)
    • Others can be dependent on context
  • Bridging reference - reference to entities that can be inferred from previously mentioned entity
    • I like my office. The windows are large.
  • Non-referential pronouns
    • Pleonastic pronouns
      • It is raining - not sure what ‘it’ is referring to
    • Clefting
      • It is he that is Bob - ‘it’ used to focus on some point
  • Pronominal anaphora resolution
    • Relevant info to identify pronouns include gender & number (eg 3Sg), syntactic information, and recency
  • C-command - reflexives that must be bound by a subject in certain syntactic relationships
    • Eg ‘the students taught themselves’, themselves refers to the students, if it was ‘them’ it would refer to some other group
  • Hobbs’ algorithm
    1. Begin at NP node immediately dominating the pronoun
    2. Go to first NP or S above it. Call this node X and the path to it p
    3. Do left to right breadth first traversal of all branches below X to left of p. Propose as antecedent any NP node encountered that has an NP or S between it and X
    4. If X is highest S in sentence, consider parse trees of previous sentences in recency order, and traverse each in left to right breadth first order. When NP encountered, propose as antecedent. If X not highest S, continue to step 5
    5. From X, go to first NP or S above it. Call this node X and path to it p
    6. If X is NP and p doesn’t pass through Nominal that X immediately dominates, propose X as antecedent
    7. Do left to right breadth first traversal of all branches below X to left of p. Propose any NP encountered as antecedent
    8. If X is S, do left to right breadth first traversal of all branches below X to right of p, but don’t go below any NP or S encountered. Propose any NP encountered as antecedent
    9. Go to step 4

Lecture 17 - 2018/10/30

  • This-anaphora
    • May refer to complex chunks of information
  • Midterm review
    • CYK