
CS224N - Lecture 4. Dependency Parsing

Two views of linguistic structure: Phrase structure

  • Constituency = phrase structure grammar = context-free grammars (CFGs)
    Phrase structure organizes words into nested constituents

  • Starting unit: words (noun, preposition, adjective, determiner, …)
    the, cat, cuddly, by, door

  • Words combine into phrases
    the cuddly cat (noun phrase),
    by the door (prepositional phrase; preposition (by) + noun phrase)

  • Phrases can combine into bigger phrases
    the cuddly cat by the door (noun phrase)

  • Lexicon:
    $\text{N} \rightarrow \text{cat}$
    $\text{N} \rightarrow \text{door}$
    $\text{Det} \rightarrow \text{the}$
    $\text{P} \rightarrow \text{by}$
    $\text{Adj} \rightarrow \text{cuddly}$
    $\vdots$
  • Grammar:
    $\text{NP} \rightarrow \text{Det } \ \text{ (Adj)}^{\ast} \ \text{ N } \ \text{ (PP)}$
    $\text{PP} \rightarrow \text{P } \ \text{ NP}$
    $\vdots$
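
A minimal sketch of this toy lexicon and grammar using NLTK (assumed installed; NLTK's CFG notation has no $(\text{Adj})^{\ast}$ or optional $(\text{PP})$, so the NP rule is expanded into explicit alternatives):

```python
# Sketch of the toy grammar above with NLTK (assumed: pip install nltk).
# NLTK's CFG format lacks (Adj)* / optional (PP), so NP is expanded by hand.
import nltk

grammar = nltk.CFG.fromstring("""
NP  -> Det N | Det Adj N | Det N PP | Det Adj N PP
PP  -> P NP
Det -> 'the'
Adj -> 'cuddly'
N   -> 'cat' | 'door'
P   -> 'by'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cuddly cat by the door".split()):
    print(tree)
# (NP (Det the) (Adj cuddly) (N cat) (PP (P by) (NP (Det the) (N door))))
```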

Two views of linguistic structure: Dependency structure

  • Dependency structure shows which words depend on (modify, attach to, or are arguments of) which other words.

Why do we need sentence structure?

  • Humans communicate complex ideas by composing words together into bigger units to convey complex meanings
  • Listeners need to work out what modifies (attaches to) what
  • A model needs to understand sentence structure in order to be able to interpret language correctly

Ambiguities

  • Prepositional phrase ambiguity
    • A key parsing decision is how we ‘attach’ various constituents
      • PPs, adverbial or participial phrases, infinitives, coordinations
        e.g.
        \(\begin{align*} \text{The board approved [its acquisition]} & \text{[by Royal Trustco Ltd.]} \\ & \text{[of Toronto]} \\ & \text{[for \$27 a share]} \\ & \text{[at its monthly meeting].} \end{align*}\)
    • For a sentence that ends in k prepositional phrases, the number of possible attachment parses is given by the Catalan numbers, $C_n = (2n)!/[(n+1)!n!]$, an exponentially growing series that arises in many tree-like contexts (see the sketch after this list).
  • Coordination scope ambiguity
    e.g. Shuttle veteran and longtime NASA executive__ Fred Gregory appointed to board
    $\rightarrow$ one person or two?

  • Adjectival/Adverbial Modifier ambiguity
    e.g. Students get first hand job experience

  • Verb Phrase (VP) attachment ambiguity
    e.g. Mutilated body washes up on Rio beach to be used for Olympics beach volleyball
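
The Catalan growth behind the PP-attachment explosion above can be checked directly (plain Python, nothing assumed beyond the formula):

```python
from math import factorial

def catalan(n: int) -> int:
    # C_n = (2n)! / ((n+1)! * n!)
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))

# An exponentially growing series: each extra PP multiplies the parse count.
print([catalan(n) for n in range(1, 8)])  # [1, 2, 5, 14, 42, 132, 429]
```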

Dependency paths help extract semantic interpretation

  • Simple practical example: extracting protein-protein interactions


Dependency Grammar and Dependency Structure

  • Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations (“arrows”) called dependencies


  • The arrows are commonly typed with the name of grammatical relations (subject, prepositional object, apposition, etc.)

  • An arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate)
  • Usually, dependencies form a tree (a connected, acyclic, single-root graph)

  • Check: some people draw the arrows one way; some the other way

  • Usually add a fake ROOT so every word is a dependent of precisely 1 other node

The rise of annotated data & Universal Dependencies treebanks

  • Advantages of a treebank
    • Reusability of the labor
      • Many parsers, part-of-speech taggers, etc. can be built on it
      • Valuable resource for linguistics
    • Broad coverage, not just a few intuitions
    • Frequencies and distributional information (statistics)
    • A way to evaluate NLP systems

Dependency Conditioning Preferences


  • The sources of information for dependency parsing
    1. Bilexical affinities: The dependency $\text{discussion}\rightarrow\text{issues}$ is plausible
    2. Dependency distance: Most dependencies are between nearby words
    3. Intervening material: Dependencies rarely span intervening verbs or punctuation
    4. Valency of heads: How many dependents on which side are usual for a head?
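
A hypothetical toy scorer showing how cues 1 and 2 could enter an arc score (the function, weights, and affinity table are invented for illustration, not the lecture's model):

```python
# Toy arc scorer illustrating bilexical affinity and distance; weights invented.
def arc_score(head: int, dep: int, words: list[str], affinity: dict) -> float:
    score = affinity.get((words[head], words[dep]), 0.0)  # 1. bilexical affinity
    score -= 0.5 * abs(head - dep)                        # 2. prefer nearby words
    return score

words = ["ROOT", "the", "discussion", "of", "the", "issues"]
affinity = {("discussion", "issues"): 2.0}  # plausible head-dependent pair
print(arc_score(2, 5, words, affinity))     # 2.0 - 0.5*3 = 0.5
```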

Dependency Parsing

  • A sentence is parsed by choosing for each word what other word (including ROOT) it is a dependent of
  • Usually some constraints:
    • Only one word is a dependent of ROOT
    • Don’t want cycles $A\rightarrow B$, $B\rightarrow A$
  • This makes the dependencies a tree
  • Final issue is whether arrows can cross (be non-projective) or not
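
These constraints can be stated as a small validity check; a minimal sketch, assuming words are 1-indexed and heads[i-1] gives the head of word i (0 = ROOT):

```python
# Sketch: verify the tree constraints above for a head assignment.
# Words are 1-indexed; heads[i-1] is the head of word i, 0 denotes ROOT.
def is_valid_tree(heads: list[int]) -> bool:
    if sum(1 for h in heads if h == 0) != 1:   # exactly one dependent of ROOT
        return False
    for i in range(1, len(heads) + 1):         # every word must reach ROOT (no cycles)
        seen, h = set(), i
        while h != 0:
            if h in seen:
                return False
            seen.add(h)
            h = heads[h - 1]
    return True

print(is_valid_tree([2, 0, 2]))  # True: "I ate fish" (I<-ate, ROOT<-ate, fish<-ate)
print(is_valid_tree([2, 1, 0]))  # False: A->B, B->A cycle between words 1 and 2
```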

Projectivity

  • Definition of a projective parse: There are no crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words
  • Dependencies corresponding to a CFG tree must be projective
    • i.e., by forming dependencies by taking 1 child of each category as head
  • Most syntactic structure is projective like this, but dependency theory normally does allow non-projective structures to account for displaced constituents
    • You can’t easily get the semantics of certain constructions right without these non-projective dependencies
  • e.g.
    From whom did Bill buy the coffee yesterday? (projective)
    Who did Bill buy the coffee from yesterday? (non-projective: the arc from “from” back to “Who” crosses other arcs)
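
Under the same heads encoding as above, projectivity reduces to an arc-crossing test; the head indices in the example are an illustrative guess at the parse, not taken from the lecture:

```python
# Sketch: a parse is projective iff no two arcs cross above the sentence.
def is_projective(heads: list[int]) -> bool:
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1) if h != 0]
    return not any(l1 < l2 < r1 < r2 for l1, r1 in arcs for l2, r2 in arcs)

# "Who did Bill buy the coffee from yesterday" (illustrative head indices):
# the arc from(7)->Who(1) crosses the arc buy(4)->yesterday(8).
print(is_projective([7, 4, 4, 0, 6, 4, 4, 4]))  # False: non-projective
```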

Methods of Dependency Parsing

  1. Dynamic programming, Eisner (1996): $O(n^3)$ complexity
  2. Graph algorithms, McDonald et al. (2005): creating a Minimum Spanning Tree
  3. Constraint Satisfaction, Karlsson (1990)
  4. “Transition-based parsing” or “deterministic dependency parsing”
    Greedy choice of attachments guided by good machine learning classifiers
    E.g., MaltParser, Nivre et al. (2008)

Greedy transition-based parsing, Nivre 2003

  • A simple form of greedy discriminative dependency parser
  • The parser does a sequence of bottom-up actions
    • Roughly like “shift” or “reduce” in a shift-reduce parser, but the “reduce” actions are specialized to create dependencies with head on left or right
  • The parser has:
    • a stack $\sigma$, written with top to the right
      • which starts with the ROOT symbol
    • a buffer $\beta$, written with top to the left
      • which starts with the input sentence
    • a set of dependency arcs A
      • which starts off empty
    • a set of actions

Basic transition-based dependency parser

  • Transitions (stack $\sigma$, buffer $\beta$, arc set $A$):
    • Start: $\sigma = [\text{ROOT}]$, $\beta = w_1, \dots, w_n$, $A = \emptyset$
    • Shift: $\sigma, \ w_i|\beta, \ A \Rightarrow \sigma|w_i, \ \beta, \ A$
    • Left-Arc$_r$: $\sigma|w_i|w_j, \ \beta, \ A \Rightarrow \sigma|w_j, \ \beta, \ A \cup \{r(w_j, w_i)\}$
    • Right-Arc$_r$: $\sigma|w_i|w_j, \ \beta, \ A \Rightarrow \sigma|w_i, \ \beta, \ A \cup \{r(w_i, w_j)\}$
    • Finish: $\beta = \emptyset$

  • Arc-standard transition-based parser
    E.g., Analysis of “I ate fish”

    • σ = [ROOT], β = [I, ate, fish] → Shift
    • σ = [ROOT, I], β = [ate, fish] → Shift
    • σ = [ROOT, I, ate], β = [fish] → Left-Arc, add nsubj(ate, I)
    • σ = [ROOT, ate], β = [fish] → Shift
    • σ = [ROOT, ate, fish], β = [] → Right-Arc, add obj(ate, fish)
    • σ = [ROOT, ate], β = [] → Right-Arc, add root(ROOT, ate)
    • σ = [ROOT], β = [] → finish
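
A runnable sketch of the arc-standard algorithm; here the next action is read off the gold heads of the example (a hard-coded oracle) rather than predicted by the classifier introduced in the next section:

```python
# Arc-standard parser sketch. The "oracle" consults the gold heads of the
# example instead of a learned classifier; arcs are (head, dependent) pairs.
def arc_standard_parse(words: list[str], gold_head: list[int]):
    head = {i + 1: h for i, h in enumerate(gold_head)}   # word index -> gold head (0 = ROOT)
    stack, buf, arcs = [0], list(range(1, len(words) + 1)), []

    def all_deps_attached(j):                            # safe to reduce j?
        return all((j, d) in arcs for d, h in head.items() if h == j)

    while buf or len(stack) > 1:
        if len(stack) >= 2:
            i, j = stack[-2], stack[-1]
            if i != 0 and head[i] == j:                  # Left-Arc (head on right)
                arcs.append((j, i)); stack.pop(-2); continue
            if head[j] == i and all_deps_attached(j):    # Right-Arc (head on left)
                arcs.append((i, j)); stack.pop(); continue
        stack.append(buf.pop(0))                         # Shift
    return arcs

print(arc_standard_parse(["I", "ate", "fish"], [2, 0, 2]))
# [(2, 1), (2, 3), (0, 2)]: ate->I, ate->fish, ROOT->ate
```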

MaltParser, Nivre and Hall 2005

  • How do we choose the next action?
    Answer: Machine Learning!
  • Each action is predicted by a discriminative classifier (e.g., softmax classifier) over the legal moves
    • Max of 3 untyped choices; max of $\lvert R \rvert \times 2 + 1 $ when typed
    • Features: top of stack word, POS; first in buffer word, POS; etc.
  • There is NO search (in the simplest form)
    • But you can profitably do a beam search if you wish (slower but better): you keep k good parse prefixes at each time step
  • The model’s accuracy is fractionally below the state of the art in dependency parsing, but it provides very fast, linear-time parsing with high accuracy, which is great for parsing the web

Conventional Feature Representation

  • Features are sparse binary indicators of parser configurations, e.g. $s_1.w = \text{good} \wedge s_1.t = \text{JJ}$ (top-of-stack word and POS tag); the feature vector is binary and sparse, with dimensionality around $10^6$–$10^7$
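
A hedged sketch of such indicator features; the templates below are simplified stand-ins for illustration, not MaltParser's actual feature set:

```python
# Sketch: sparse binary indicator features for one parser configuration.
# Templates are simplified illustrations, not an actual parser's feature set.
def extract_features(stack: list[str], buf: list[str], pos: dict) -> list[str]:
    s1 = stack[-1] if stack else "NULL"        # top-of-stack word
    b1 = buf[0] if buf else "NULL"             # first-in-buffer word
    return [
        f"s1.w={s1}",
        f"s1.t={pos.get(s1, 'NULL')}",
        f"b1.w={b1}",
        f"b1.t={pos.get(b1, 'NULL')}",
        f"s1.t={pos.get(s1, 'NULL')}&b1.t={pos.get(b1, 'NULL')}",  # conjunction template
    ]

pos = {"I": "PRP", "ate": "VBD", "fish": "NN"}
print(extract_features(["ROOT", "I"], ["ate", "fish"], pos))
# ['s1.w=I', 's1.t=PRP', 'b1.w=ate', 'b1.t=VBD', 's1.t=PRP&b1.t=VBD']
# Each distinct string is one dimension of a huge sparse binary vector.
```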

Evaluation of Dependency Parsing: (labeled) dependency accuracy

  • $\text{Accuracy} = \dfrac{\#\,\text{correct dependencies}}{\#\,\text{dependencies}}$
  • UAS (unlabeled attachment score): a dependency counts as correct if the head is right
  • LAS (labeled attachment score): both the head and the relation label must be right
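
A small sketch of these metrics, assuming gold and predicted parses are given as one (head, label) pair per word:

```python
# Sketch: UAS counts correct heads; LAS also requires the correct label.
def attachment_scores(gold: list[tuple], pred: list[tuple]) -> tuple:
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las

# "I ate fish": the prediction gets every head right but mislabels one arc.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
print(attachment_scores(gold, pred))  # (1.0, 0.666...)
```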