- CL - computational linguistics
- NLP - natural language processing
- NLU - natural language understanding
- NLG - natural language generation
- Semantics, pragmatics, discourse, syntax, phonology, phonetics, morphology
- Morpheme - inflection, derivation; free, bound (prefixes, suffixes, infixes, circumfixes)
- Language - isolating, synthetic
- Lemmatization
- Lexicon
- Morphotactics
- FSA - finite state automata - (Q, Σ, q₀, δ, F)
- Porter stemmer
- Morphological parsing - surface (foxes), intermediate (fox^s#), underlying (fox +N +Pl)
- FST - finite state transducer - FSA with output symbols (Δ)
- Word - tokens, types
- Term frequency - TF(w, S)
- Relative frequency - RF(w, S)
- Corpus
- Zipf’s Law, Zipf-Mandelbrot Law
- Long tail
- N-grams - unigram, bigram, trigram
- k-fold cross validation
- Entropy - H(p)
- Cross entropy - H(p, q)
- Estimation

- Perplexity - Perplexity(p, q)
- MLE - maximum likelihood estimation
- Good-Turing smoothing
- Naïve Bayes
- Supervised, semi-supervised, unsupervised learning
- Clustering, grammar induction, word relatedness

- POS - part of speech
- Classification, regression
- N-grams, bag of words
- Naïve Bayes
- POS, modals, auxiliary verbs, conjunctions, particles
- Open classes, neologisms, content words
- Closed classes, function words
- Forward algorithm, backward algorithm, forward & backward, Viterbi algorithm
- Random restarts, biased initialization
- Chunking (shallow parsing), named-entity recognition
- IOB tagging
- Generative, discriminative
- LC-CRF - Linear-chain conditional random fields
- Constituency
- FSA vs CFG
- CYK
- PCFG
- Constituent parsing, dependency parsing
- Hearst patterns
- Hypernymy, hyponymy
- Synonymy, antonymy
- Polysemy
- Homonymy, homophone, homograph
- Metonymy, synecdoche
- Holonymy, meronymy
- WordNet, synset
- Lesk, Yarowsky, bootstrapping
- PMI - pointwise mutual information
- SVD - singular value decomposition
- Word embeddings
- Co-compositionality
- Domain of discourse, variables, predicates, functions, logical connectives, quantifiers
- FOL - first order logic
- Lambda calculus, beta reduction, type-raising
- Neo-davidsonian event semantics
- Russell’s definite descriptions
- Cooper storage, store
- Discourse markers
- Referent, referring expression
- Anaphora, antecedents, cataphora, zero anaphora, bridging reference
- Pleonastic pronouns, clefting
- Hobbs’ algorithm

- Morpheme - subpart of a word that cannot be broken down any further
- Free morphemes can occur on their own
- Bound morphemes must occur with other morphemes as parts of words
- Prefix (before), suffix (after), infix (between), circumfix (around)

- Inflectional morphology - expresses grammatical function required by language
- Derivational morphology - derives a new word, possibly of a different part of speech
- Isolating languages - low morpheme-to-word ratio
- Synthetic languages - high morpheme-to-word ratio
- Computational tasks
- Morphological recognition - is this a well-formed word?
- Stemming - cut affixes off to find the stem
- Morphological analysis
- Lemmatization - remove inflectional morphology & recover lemma
- Full morphological analysis - recover full structure

- FSA - finite state automata

- Word - smallest unit that can appear in isolation
- Convenient assumption is that words are delimited by spaces
- Word frequency
- Term frequency: TF(w, S) = #w in corpus S
- eg TF(*cat*, the cat sat on the mat) = 1
- Relative frequency: RF(w, S) = TF(w, S) / |S|
- eg RF(*cat*, the cat sat on the mat) = ⅙
- Corpus - collection of text; used for text to count
- Zipf’s Law: f ∝ 1/r
- Frequency of word type is inversely proportional to rank of word type (by frequency)

- Zipf-Mandelbrot Law - better fit
- f = P/(r + ρ)^B, or log f = log P - B log(r + ρ)
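As a quick sanity check of Zipf’s Law, f × r should stay roughly constant when words are ranked by frequency. A minimal sketch; the toy string is only a placeholder for a real corpus:

```python
from collections import Counter

# If Zipf's Law holds (f ∝ 1/r), then f * r is roughly constant across ranks.
# Replace `text` with a real corpus to see the effect properly.
text = "the cat sat on the mat and the dog sat on the log".split()
ranked = sorted(Counter(text).items(), key=lambda kv: -kv[1])
for rank, (word, f) in enumerate(ranked, start=1):
    print(f"{rank:>3} {word:<6} f={f} f*r={f * rank}")
```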

- Entropy - minimum number of bits needed to communicate some message if we know what probability distribution the message is drawn from
- Cross entropy is for when we don’t know the distribution
- H(p, q) = -Σ_{i=1..k} p_i log2 q_i
- Estimate: H(p, q) = -(1/N) log2 q(w_1…w_N)
- Perplexity(p, q) = 2^H(p, q)
- Eg: [A B C B B], M: P(A) = 0.3, P(B) = 0.4, P(C) = 0.3
- q = 0.3 × 0.4^3 × 0.3
- Perp(M) = 2^(-(1/5) log2 q)
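The [A B C B B] example can be checked directly; a small sketch (requires Python 3.8+ for `math.prod`):

```python
import math

# Cross-entropy estimate and perplexity for the [A B C B B] example above.
model = {"A": 0.3, "B": 0.4, "C": 0.3}
seq = ["A", "B", "C", "B", "B"]

q = math.prod(model[w] for w in seq)   # 0.3 * 0.4**3 * 0.3 (B occurs 3 times)
H = -(1 / len(seq)) * math.log2(q)     # estimated cross entropy H(p, q)
print(f"q={q:.5f}  H={H:.3f}  perplexity={2 ** H:.3f}")
```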
- Overfitting - model that is tailored to the training data, making it a poor representation of new data
- OOV - out of vocabulary
- Explicitly modelling OOV items
- Assign vocabulary items less frequent than some frequency threshold to `<UNK>`
- Treat `<UNK>` as some vocabulary word
- Use `<UNK>` for unseen words during testing
- Smoothing - shift probability mass to cases that we haven’t seen before or are unsure about
- Good-Turing smoothing
- N = Σ_i f_i × i
- c* = (c + 1) f_{c+1}/f_c
- P(w_c) = c*/N
- P(UNK) = f_1/N
- Problem with high c, as f_{c+1} is often 0. Solution is to estimate log f_c = a log c + b (a linear relationship)
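A minimal sketch of the adjusted counts on toy data (made-up tokens; no log-linear fix for the high-c gaps that the last bullet describes):

```python
from collections import Counter

# Good-Turing: c* = (c + 1) f_{c+1} / f_c, unseen mass P(UNK) = f_1 / N.
tokens = "a a a b b c c d e f".split()          # toy data, made up
f = Counter(Counter(tokens).values())           # f[c] = # of types with count c
N = sum(c * k for c, k in f.items())            # total tokens

print("P(UNK) =", f[1] / N)
for c in sorted(f):
    if f[c + 1]:                                # f_{c+1} = 0 breaks the formula;
        c_star = (c + 1) * f[c + 1] / f[c]      # hence the log-linear fit above
        print(f"c={c}  f_c={f[c]}  c*={c_star:.3f}  P={c_star / N:.3f}")
```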

- Supervised learning - model has access to some input data and corresponding output data (eg a label)
- Unsupervised learning - model has only input data
- Clustering - finding hidden structure in data without labels
- Learning - creating a good characterization of the data
- Semi-supervised learning - outputs for some inputs but not all
- Grey Area
- Specific rules
- eg anything ending in -ed is a verb
- Variously called semi-supervised, distantly supervised, minimally supervised, or unsupervised

- Give seed set
- eg learning sentiment lexicon; gives positive seeds (good, awesome, great) and negative seeds (bad, awful, dreadful)

- Regression - y is a continuous outcome
- Classification - y is a discrete outcome
- Linear Regression
- N-grams
- Aka bag-of-words
- Versions
- Presence or absence of an N-gram (1 or 0)
- Count of N-gram
- Proportion of total document
- Scaled versions of counts

- POS tags
- Crudely captures syntactic patterns
- Needs preprocessing

- Naïve Bayes
- P(y|x) = P(y) ∏_i P(x_i|y) / P(x)
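A minimal bag-of-words Naïve Bayes sketch with add-one smoothing; the documents and labels are made up for illustration:

```python
import math
from collections import Counter, defaultdict

# Toy bag-of-words Naïve Bayes with add-one smoothing (made-up data).
train = [("pos", "good great fun"), ("pos", "great acting"),
         ("neg", "bad awful plot"), ("neg", "bad boring")]

class_counts = Counter(y for y, _ in train)
word_counts = defaultdict(Counter)
for y, doc in train:
    word_counts[y].update(doc.split())
vocab = {w for _, doc in train for w in doc.split()}

def predict(doc):
    scores = {}
    for y in class_counts:
        # log P(y) + Σ_i log P(x_i | y); P(x) is constant across classes
        score = math.log(class_counts[y] / len(train))
        total = sum(word_counts[y].values())
        for w in doc.split():
            score += math.log((word_counts[y][w] + 1) / (total + len(vocab)))
        scores[y] = score
    return max(scores, key=scores.get)

print(predict("great fun"))    # -> pos
print(predict("boring plot"))  # -> neg
```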

- Part of speech
- Identifies some grammatical properties of words
- Includes
- Modals & auxiliary verbs - __Did__ the chicken cross the road?
- Conjunctions - __and__, __but__
- Particles - look __up__, turn __down__

- Penn Treebank (PTB) Tagset
- Open class - words tend to be added (neologism)
- Noun, verb, adjective, etc

- Closed class - new words tend not to be added
- Pronoun, determiners, quantifiers, conjunctions, modals & auxiliary, preposition

- PTB distinguishes between singular and plural nouns, but not transitive and intransitive verbs

- P(outcome i) = #(outcome i)/#(all events)
- π_i = P(Q_1 = i) = #(Q_1 = i)/#(sentences)
- a_ij = P(Q_{t+1} = j | Q_t = i) = #(i, j)/#(i)
- b_ik = P(O_t = k | Q_t = i) = #(word k, tag i)/#(i)
- Forward algorithm
- Uses DP to avoid unnecessary recalculations
- Create table of all possible state sequences, annotated with probabilities
- Trellis - table for possible state sequences
- Runtime: O(N^2 T)

- Backward algorithm
- Trellis with cells β_i(t)
- Unlike α_i(t), it excludes the current word
- Forward & backward
- P(O|θ) = Σ_{i=1}^{N} α_i(t) β_i(t), for any t = 1 .. T
- Log sum trick - to avoid underflow (from small numbers), work in log domain
- Viterbi algorithm - similar to forward algorithm, but replace summation with max
- Used to find most likely state sequence
- Backpointers - keep track of max entry; work backwards to recover best label sequence
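A compact sketch of Viterbi with backpointers on a toy 2-state HMM (all parameters made up), working in the log domain to avoid underflow:

```python
import math

# Viterbi with backpointers on a toy 2-state HMM (all numbers made up).
states = ["N", "V"]
pi = {"N": 0.6, "V": 0.4}                                   # initial P(Q_1)
a = {("N", "N"): 0.3, ("N", "V"): 0.7,                      # transitions
     ("V", "N"): 0.8, ("V", "V"): 0.2}
b = {("N", "fish"): 0.5, ("N", "swim"): 0.2, ("N", "fast"): 0.3,  # emissions
     ("V", "fish"): 0.3, ("V", "swim"): 0.5, ("V", "fast"): 0.2}

def viterbi(obs):
    trellis = [{s: math.log(pi[s]) + math.log(b[s, obs[0]]) for s in states}]
    backptrs = []
    for o in obs[1:]:
        prev, col, ptr = trellis[-1], {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(a[r, s]))
            col[s] = prev[best] + math.log(a[best, s]) + math.log(b[s, o])
            ptr[s] = best                      # remember where the max came from
        trellis.append(col)
        backptrs.append(ptr)
    state = max(states, key=lambda s: trellis[-1][s])   # best final state
    path = [state]
    for ptr in reversed(backptrs):                      # follow backpointers
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["fish", "swim", "fast"]))
```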

- Unsupervised training
- No state sequences; have to estimate them
- Initialize params randomly
- Viterbi EM (‘Hard EM’)
- Prediction and updates using Viterbi algorithm

- Soft predictions - probabilities of all possible state sequences

- Chunking - find syntactic chunks in sentence; not hierarchical
- Named-entity recognition (NER) - identify elements in text corresponding to high level categories (eg person, organization, location)
- IOB Tagging - label whether word is inside, outside, or at beginning of span
- Eg for Org, McGill University would be labelled B I, as both compose a single organization

- Generative models - learn joint distributions P(X, Y)
- Can be applied to new tasks (eg generation), unlike discriminative models

- Discriminative models - learn conditional distributions P(Y|X)
- Standard HMM - product of probabilities
- Forward algorithm - P(X|θ)
- Viterbi algorithm - argmax_y P(X, y|θ)

- Linear-chain conditional random fields (CRF) - Sums over features & time-steps
- Replaces HMM products by numbers that are not probabilities, but linear combinations of weights & feature values
- Allows word templates
- Forward algorithm - Z(X)
- Viterbi algorithm - argmax_Y P(Y|X, θ)

- To avoid overfitting, make weights close to zero
- Done by adding an extra regularization term (-θ_k/σ^2)

- Linear models
- Includes logistic regressions, support vector machines, etc
- Cannot learn complex non-linear functions (eg starts with capital and not at beginning of sentence -> proper noun)

- Artificial neural network
- Automatically learns non-linear functions from input to output

- Feedforward
- All connections flow forward (no loops)
- Each layer of hidden units fully connected to next

- Activation function
- Linear combination of inputs & weights → non-linearity
- Popular choices
- Sigmoid function
- Tanh function
- Rectifier/ramp function: g(x) = max(0, x)

- Need non-linearity, otherwise we are just transforming a linear function into another linear function

- Loss function
- Measures how much error is made during predictions
- y - correct distribution
- ŷ - predicted distribution
- L(y, ŷ) - loss function

- Training
- Typically by stochastic gradient descent
- Backpropagation - propagate the error signal backwards from the output to the inputs to compute weight updates
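A tiny numpy sketch of a feedforward network trained by gradient descent with backpropagation on XOR, a non-linear function no linear model can fit (full batch for brevity; SGD would update on one sampled example at a time):

```python
import numpy as np

# 2-4-1 feedforward net trained with backprop on XOR (toy example).
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(10000):
    h = np.tanh(X @ W1 + b1)                # forward pass: hidden layer
    out = sigmoid(h @ W2 + b2)              # forward pass: output layer
    d_out = (out - y) * out * (1 - out)     # error signal at the output
    d_h = (d_out @ W2.T) * (1 - h ** 2)     # propagated back through tanh
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())                 # approaches [0, 1, 1, 0]
```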

- Recurrent neural network
- Long-term dependencies in language
- Dependencies between words can be arbitrarily far apart
- Eg look up [x] can be written as look [x] up

- Cannot easily model with HMMs or LC-CRFs, but possible with RNNs


- Syntax
- How words can be arranged to form a grammatical sentence
- Invalid sentences are prefixed with `*`

- Constituency
- Group of words that behave as a unit
- Noun phrases - three people on the bus, Justin Trudeau
- Adjective phrases - blue, very good

- Tests for constituency
- Check if they can appear in similar syntactic environment
- Eg I saw [it, three people on the bus, *on the]

- Can be placed in different positions/replaced as a unit
- Can be used to answer a question

- Grammatical relations
- Subject - tend to come before verbs
- Direct object - the boy kicked __the ball__
- Indirect object - she gave __him__ a good beating
- Some verbs require different numbers of arguments
- Eg relax (subj), steal (subj, dobj), give (subj, iobj, dobj)

- Formal grammar - rules that generate a set of strings that make up a language
- Format of A → B

- Constituent tree
- Trees & sentences generated by previous rules
- Symbols on the LHS of arrows are non-terminals
- Leaves of the tree are terminals; the RHS of a rule may contain both terminals and non-terminals

- Context-free grammar (CFG)
- 4-tuple
- N - set of non-terminal symbols
- Σ - set of terminal symbols
- R - set of rules or productions of form A → (Σ ∪ N)*, and A ∈ N
- S - designated start symbol, S ∈ N


- Parsing algorithms
- Top-down - start at top of tree, expanding downwards with rewrite rules
- Eg Earley parser

- Bottom-up - start from input words and build bigger subtrees until a tree spans the whole sentence
- Eg CYK algorithm, shift-reduce parser

- CYK Algorithm
- DP algorithm - partial solutions stored & efficiently reused
- Steps
- Convert CFG to appropriate form
- Set up table of possible constituents
- Fill in table
- Read table to recover all possible parses

- CNF Rules
- A → B C D becomes A → X1 D and X1 → B C
- A → s B becomes A → X2 B and X2 → s
- A → B - replace all B on LHS with A

- Set up table
- Let sentence have N words, `w[0] ... w[N - 1]`
- Create table where cell (i, j) represents the phrase spanning `w[i:j + 1]`
- Only need half the table, since i ≤ j
- Running through table
- Base case - one word phrase - check all non-terminals
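A minimal CYK recognizer sketch over a toy CNF grammar; the grammar, lexicon, and sentence are made up:

```python
# Minimal CYK recognizer for a toy grammar already in CNF.
# table[i][j] holds all non-terminals spanning words[i : j + 1].
grammar = {  # binary rules: A -> B C
    ("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"},
}
lexicon = {  # lexical rules: A -> word
    "the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"},
}

def cyk(words):
    n = len(words)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):                 # base case: single words
        table[i][i] = set(lexicon.get(w, set()))
    for span in range(2, n + 1):                  # longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                 # try every split point
                for B in table[i][k]:
                    for C in table[k + 1][j]:
                        table[i][j] |= grammar.get((B, C), set())
    return "S" in table[0][n - 1]

print(cyk("the dog saw the cat".split()))  # -> True
```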

- PCFG - CFG with probabilities for each rule
- For each nonterminal A ∈ N
- Σ_{α → β ∈ R s.t. α = A} Pr(α → β) = 1
- Extend CYK to PCFG by calculating partial probabilities as you fill the table
- Only best probability is included per cell

- Train PCFG with a treebank, such as WSJ
- Simple version: MLE estimate
- Pr(α → β) = #(α → β) / #α
- Not great:
- No context
- Example: NPs in subject and object positions are not identically distributed

- Too sparse
- With the word ‘ate’: ‘ate quickly’, ‘ate with a fork’, ‘ate a sandwich’, etc are all labelled differently. Should be factorized, eg having an adverbial modifier


- Adding context to PCFG
- Append parent categories (eg split NP → PRP into `NP^S` → PRP and `NP^VP` → PRP)
- Removing context from long chain rules by splitting them into separate rules
- Eg Pr(VP → START AdvP VBD NP PP END) to
- Pr(VP → START AdvP)
- Pr(VP → AdvP VBD)
- Pr(VP → VBD NP)
- Pr(VP → NP PP)
- Pr(VP → PP END)

- Markovization
- Vertical markovization - adding ancestors as context
- Zeroth order - vanilla PCFGs
- First order - one parent only (see above)
- nth order - add n parents

- Horizontal markovization - breaking RHS into parts
- Infinite order - vanilla PCFGs
- First order - pairs (see above)
- Can choose any order with interpolations

- Semantics - study of meaning in language
- Meaning of words - lexical semantics

- Referent - person/thing to which an expression refers
- Extensional definition - all things for which a definition applies, as opposed to intension or comprehension
- Intensional definition - necessary and sufficient conditions
- Multiple words can refer to the same referent. Used in different contexts → different senses.
- Eg the morning star and the evening star both refer to Venus

- Hyponym/hypernym - ISA (is a) relationship
- Monkey & mammal, Montreal & city, wine & beverage, etc. ([hyponym] is a [hypernym])

- Synonym/Antonym - roughly same/different meaning
- Homonymy
- Homophone - same sound (son vs sun)
- Homograph - same written form (lead (noun) vs lead (verb))

- Polysemy - multiple related meanings
- Eg newspaper
- Daily/weekly publication
- Business that publishes newspaper (meaning above)
- Physical object that is the product of a newspaper publisher
- Cheap paper made from wood pulp

- Homonymy is typically with unrelated meaning, whereas polysemy is with related meaning
- Metonymy - substitution of one entity for another related one
- Eg
- Ordering a __dish__ (food, not a physical dish)
- I work for the local __paper__ (organization, not object)
- __Quebec City__ is cutting our budget again (government, not location)
- The __loonie__ is at an 11-year low (value, not physical loonie)
- Synecdoche - specific kind involving whole-part relations
- All __hands__ on deck
- Holonymy/meronymy - whole part relationships
- Holonym → whole, meronym → part

- Computational approach
- Heuristic - eg Lesk’s
- Supervised ML - lots of work
- Unsupervised - eg Yarowsky’s

- Lesk’s
- Given a sentence, construct a bag of words from the definitions of all senses of the context words; compute the overlap between each sense of the target word and that bag; select the sense with the highest overlap
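A simplified Lesk sketch: the sense glosses below are made-up stand-ins for real dictionary definitions (eg WordNet glosses), and this variant compares the context directly against each candidate gloss rather than pooling all context-word senses:

```python
# Simplified Lesk: pick the sense whose gloss overlaps most with the context.
senses = {  # made-up glosses standing in for a real sense inventory
    "bank/finance": "financial institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river",
}

def lesk(context_sentence):
    context = set(context_sentence.lower().split())
    return max(senses, key=lambda s: len(context & set(senses[s].split())))

print(lesk("I deposited money at the bank"))         # -> bank/finance
print(lesk("the river bank was muddy and sloping"))  # -> bank/river
```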

- Yarowsky’s
- Gather data set with target word to be disambiguated
- Automatically label small seed set of examples
- Repeat for a while
- Train from seed set
- Apply model to entire data set
- Keep highly confident classification outputs to be new seed set

- Use last model as final model

- Hearst patterns - pairs of terms in hyponym-hypernym relationships tend to occur in certain lexico-syntactic patterns
- Can find relations through key words (cause, induce, stem (from), etc)
- Can find relations through co-occurring words (such as, and, or)
- Can find relations through words that tend to co-occur with target words

- Term-context matrix
- Each row is a vector representation of word through context
- Context can refer to nearest ‘x’ neighbouring words

- Cosine similarity
- sim(A, B) = (A · B)/(||A|| ||B||)
- -1 if vectors point in opposite direction, 0 if orthogonal, 1 if same direction
- Positive (eg count) vectors are scored between 0 and 1

- Cosine similarity is not enough to determine synonymy
- High cosine similarity can also indicate antonyms, word senses

- Similarity - synonymy, hypernymy, hyponymy
- Relatedness - anything that might be associated (eg good and bad)
- Cosine similarity is really a measure of relatedness
- Rescaling vectors
- Pointwise mutual information (PMI)
- pmi(w_1, w_2) = log(P(w_1, w_2)/(P(w_1)P(w_2)))
- Numerator - probability of both words occurring
- Denominator - probability of each word occurring in general
- If ratio < 1, PMI is negative
- Often disregarded → positive PMI (PPMI)
- Compressing term-context matrix
- Often very sparse, lots of zeros
- Singular value decomposition (SVD)
- X = W × Σ × C^T
- m is rank of matrix X
- Rows of W are new word vectors
- Rows of C (cols of C^T) are new context vectors
- Σ is diagonal matrix of singular values of X (sqrt of eigenvalues of X^T X, from highest to lowest)
- Truncated SVD
- Throw out some singular values (lowest ones) in Σ
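A small end-to-end sketch tying these pieces together: toy term-context counts → PPMI rescaling → truncated SVD → cosine similarity. All counts and context labels are made up:

```python
import numpy as np

# Toy term-context counts (rows = words; cols = contexts: pet, fur, engine).
words = ["cat", "dog", "car"]
X = np.array([[8.0, 6.0, 0.0],
              [7.0, 5.0, 1.0],
              [0.0, 1.0, 9.0]])

# PPMI rescaling: max(0, log P(w, c) / (P(w) P(c)))
P = X / X.sum()
pw, pc = P.sum(axis=1, keepdims=True), P.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore"):
    ppmi = np.maximum(0, np.log(P / (pw * pc)))

# Truncated SVD: keep only the top k singular values of X = W Σ C^T
W, S, Ct = np.linalg.svd(ppmi, full_matrices=False)
k = 2
vecs = W[:, :k] * S[:k]          # new k-dimensional word vectors

cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print("cat~dog:", cos(vecs[0], vecs[1]))   # high: similar contexts
print("cat~car:", cos(vecs[0], vecs[2]))   # low: different contexts
```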

- Word embeddings
- Neural network models

- Skip-gram

- Validation
- Word similarity
- a : b :: c : d - what is d?
- Eg man : king :: woman : ? (man is to king as woman is to __queen__)
- Solved by assuming vector differences match: v_a - v_b = v_c - v_d (see the sketch below)
- Can download pretrained word2vec embeddings from web
- Advantages - trained, large quantity, cheap, easy to try
- Disadvantages - doesn’t always work
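The analogy trick as code, with tiny made-up vectors standing in for real pretrained embeddings (word2vec vectors are typically 100-300 dimensional):

```python
import numpy as np

# a : b :: c : ?  solved as argmax cosine(v_b - v_a + v_c), excluding a, b, c.
vecs = {  # made-up 3-d vectors for illustration only
    "man":   np.array([1.0, 0.0, 0.1]),
    "woman": np.array([1.0, 1.0, 0.1]),
    "king":  np.array([1.0, 0.0, 0.9]),
    "queen": np.array([1.0, 1.0, 0.9]),
}

def analogy(a, b, c):
    target = vecs[b] - vecs[a] + vecs[c]
    cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in vecs if w not in {a, b, c}),
               key=lambda w: cos(vecs[w], target))

print(analogy("man", "king", "woman"))  # -> queen
```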

- Compositionality - meaning of phrase depends on meanings of its parts
- Idioms often violate compositionality

- Co-compositionality - meanings of words depend also on other words that they are composed with
- Eg __red__ combined with rose, wine, or cheeks does not compose purely compositionally, but co-compositionally: the shade of red depends on the noun

- Semantic inference - making explicit something that is implicit
- I want to visit the capital of Italy; capital of Italy is Rome; ∴ I want to visit Rome

- Montagovian semantics - using logical formalism to represent natural language inferences
- First-order predicate calculus
- Domain of discourse - set of entities we care about
- Variables - potential elements of domain
- Typically lower case

- Predicates - maps elements of domain to truth values
- Can be different valences (eg takes multiple elements)

- Functions - maps elements to other elements
- Can be different valences
- Valence 0 function is a constant

- Logical connectives - ¬, ∧, ∨, →, ↔
- Quantifiers - ∃, ∀

- Interpretation of first order logic (FOL)
- Domain of discourse (D)
- Mapping for functions to elements of D
- Mapping for predicates to True or False
- All students who study and do homework will get an A
- ∀x.(study(x) ∧ hw(x)) → grade(x) = A

- Lambda calculus - describes computation using mathematical functions
- Definitions
- Variables (eg x)
- λx.t, t is a lambda term
- ts, t and s are lambda terms

- Function application (or beta reduction)
- Given (λx.t)s, replace all instances of x in t with expression s
- Left associative - abcd = ((ab)c)d
- (λx.x + y)2 → 2 + y
- (λx.xx)(λx.x) = (λx.x)(λx.x) = λx.x

- Allows for partial computations

- Syntax-driven semantic composition
- Augment CFG with lambda expressions (syntactic composition = function application)
- Semantic attachments
- Syntactic composition: A → a_1 … a_n
- Semantic attachment: {f(a_j.sem, …, a_k.sem)}
- Proper noun: PN → COMP550 to {COMP550}
- Type-raised proper noun: PN → COMP550 to {λx.x(COMP550)}
- We create a function taking in the PN as argument

- NP rule: NP → PN to {PN.sem}
- Common noun: N → student to {λx.Student(x)}
- Type ⟨e, t⟩; takes an entity and returns a truth value

- Intransitive verbs: V → rules to {λx.∃e.Rules(e) ∧ Ruler(e, x)}
- Created an event variable e

- Composition: S → NP VP to {NP.sem(VP.sem)}
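These attachments can be mimicked directly with Python lambdas, composing “COMP550 rules” per the rules above (FOL formulas built as plain strings purely for illustration):

```python
# Semantic attachments as Python lambdas, composing "COMP550 rules".
pn_sem = lambda f: f("COMP550")                     # type-raised PN: λx.x(COMP550)
vp_sem = lambda x: f"∃e.Rules(e) ∧ Ruler(e, {x})"   # intransitive verb sem
s_sem = pn_sem(vp_sem)                              # S → NP VP: {NP.sem(VP.sem)}
print(s_sem)                                        # ∃e.Rules(e) ∧ Ruler(e, COMP550)
```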

- Neo-Davidsonian event semantics
- Method 1: multi-place predicate: Rules(x)
- Method 2: Neo-Davidsonian version with event variable: ∃e.Rules(e) ∧ Ruler(e, x)
- Reifying event variable makes things more flexible
- Optional elements such as location and time
- Can add information to event variable


- Neo-Davidsonian semantics cont.
- Transitive verbs: V → enjoys to {λw.λz.w(λx.∃e.Enjoys(e) ∧ Enjoyer(e, z) ∧ Enjoyee(e, x))}

- Note that for quantifiers, universal quantifiers use →, while existential ones use ∧
- Russell (1905)’s Definite descriptions
- How to represent ‘the’ in FOL?
- In “the student took Comp550”, we must enforce
- An entity who is a student
- At most one thing referred as a student
- Student participates in some predicate (“took Comp550”)

- ∃x.Student(x) ∧ ∀y.(Student(y) → y = x) ∧ took(x, COMP550)

- Det → a to {λP.λQ.∃x. P(x) ∧ Q(x)}
- Det → the to {λP.λQ.∃x.P(x) ∧ ∀y(P(y) → y = x) ∧ Q(x)}

- Scopal ambiguity
- Eg. Every student took a course
- every > a: for all students, there exists a course that was taken by that student
- a > every: there exists a single course that all students took

- Underspecified representation - meaning representation that can embody all possible readings without explicitly enumerating all of them
- Cooper Storage
- Associate a store with each FOL so that each reading can be recovered
- Every student took a course
- ∃e.took(e) ∧ taker(e, s_1) ∧ takee(e, s_2)
- (λQ.∀x.Student(x) → Q(x), 1)
- (λQ.∃y.Course(y) ∧ Q(y), 2)
- We do not specify the order of each sub-expression
- Compositional rules now modify the inside part of a store, and introduce new index variables

- Coherent - property of discourse that makes sense
- Contains logical structure/meaning

- Incoherent - eg two completely unrelated sentences
- Cohesion - use of linguistic devices to tie together text units
- Lexical cohesion - related words in passage
- Discourse markers - cue words for discourse relations
- Eg also, and, therefore, however

- Referent - actual entity in the world
- Referring expressions - phrases that refer to the referent
- Referring expressions towards the same referent are said to corefer
- Antecedent - thing that exists before or logically precedes another
- Anaphor - points to an antecedent that precedes it
- Cataphor - points to an antecedent that follows it
- Zero anaphora - omitting pronouns in certain contexts
- Languages with this are referred to as pro-drop
- Eg (Spanish) No habl-o español, ‘(I) don’t speak Spanish’ - the subject pronoun is dropped
- Others can be dependent on context

- Bridging reference - reference to entities that can be inferred from previously mentioned entity
- I like my office. __The windows__ are large.
- Non-referential pronouns
- Pleonastic pronouns
- It is raining - not sure what ‘it’ is referring to

- Clefting
- Eg ‘It is he that is Bob’ - ‘it’ is used to put focus on some element

- Pronominal anaphora resolution
- Relevant info to identify pronouns include gender & number (eg 3Sg), syntactic information, and recency

- C-command - reflexives that must be bound by a subject in certain syntactic relationships
- Eg in ‘the students taught themselves’, ‘themselves’ refers to the students; if it were ‘them’, it would refer to some other group

- Hobbs’ algorithm
1. Begin at the NP node immediately dominating the pronoun
2. Go to the first NP or S above it. Call this node X and the path to it p
3. Do a left-to-right breadth-first traversal of all branches below X to the left of p. Propose as antecedent any NP node encountered that has an NP or S between it and X
4. If X is the highest S in the sentence, consider the parse trees of previous sentences in recency order, and traverse each in left-to-right breadth-first order. When an NP is encountered, propose it as antecedent. If X is not the highest S, continue to step 5
5. From X, go to the first NP or S above it. Call this node X and the path to it p
6. If X is an NP and p doesn’t pass through a Nominal that X immediately dominates, propose X as antecedent
7. Do a left-to-right breadth-first traversal of all branches below X to the left of p. Propose any NP encountered as antecedent
8. If X is an S, do a left-to-right breadth-first traversal of all branches below X to the right of p, but don’t go below any NP or S encountered. Propose any NP encountered as antecedent
9. Go to step 4

- This-anaphora
- May refer to complex chunks of information

- Midterm review
- CYK