Tutorial :: Collecting and Analyzing Data with TGrep2
Finding Complement Clauses in Parsed Corpora

Exercise

Try to find all complement clauses that either have a "that" complementizer or no complementizer at all. An example would be "Neal didn't think (that) I would mess around with this". Examples like "He didn't believe what he saw." are not what we are looking for.

Solution

When you search for 'TOP << told', you find that complement clauses are labelled SBAR, so we search specifically for that tag. We add the condition that the SBAR be immediately dominated by a VP (> means 'is immediately dominated by') so that we can see the verb and its complement. The VP is marked with a backquote (`) so that it is printed instead of the head node of the pattern (which is the default).
This gives you outputs like the following:

(VP (VBP think) (SBAR (IN that) (S (EDITED (RM (-DFL- \[)) (NP-SBJ (PRP they)) (, ,) (IP (-DFL- \+))) (NP-SBJ_MARKABLE_human (PRP they) (RS (-DFL- \]))) (VP (VBD had) (NP_MARKABLE_oanim (NP_MARKABLE_oanim (DT a) (JJ great) (NN deal)) (PP-UNF (IN of) (, ,) (INTJ (UH um))))))))

This is an example of what we want, but the search also returns:

(SBAR (WHNP_MARKABLE_human (N 400945) (WP who)) (, ,) (INTJ (UH uh)) (, ,) (S (NP-SBJ_MARKABLE (-NONE- (N 400945))) (VP (VBD ran) (NP_MARKABLE_org (NP_MARKABLE_org (DT the) (NN nursing) (NN home)) (PP-LOC (IN in) (NP_MARKABLE_place (PRP$_MARKABLE_human our) (JJ little) (NN hometown)))))))

So this search gets not only complement clauses but also free relatives, which are headed by a wh-element. We can modify the search to rule those out:
/^WH/ is a regular expression that means "beginning with WH". The caret (^) anchors the match at the beginning of the node label. The exclamation point (!) represents negation. This still gives us some clauses that don't start with "that":

(SBAR (IN Because) (S (EDITED (RM (-DFL- \[)) (S (NP-SBJ (PRP it)) (VP-UNF (VBD was))) (, ,) (IP (-DFL- \+))) (PRN (S (NP-SBJ_MARKABLE_human (PRP you)) (VP (VBP know))) (, ,))

The complementizer "that" is tagged IN, but so are other complementizers, such as "because". This search removes other complementizers:
What we have added means "make sure that the SBAR does not dominate a complementizer (IN) that does not itself dominate the word 'that'". In other words, any IN under the SBAR must be "that". But we still have to rule out reduced subject-extracted relative clauses:

(SBAR -NONE- (S (NP-SBJ_MARKABLE (-NONE- (N 400AB0))) (VP (VBD was) (ADJP-PRD (JJ interesting)))))

Therefore we need to require S to dominate an overt (non-null) subject:
This search also gets extraposition structures, though:

(VP (BES 's) (ADJP-PRD (JJ proven)) (SBAR (N 40500B) (IN that) (S (NP-SBJ_MARKABLE_nonconc (PRP it)) (VP (VBZ is) (RB n't) (ADJP-PRD (JJ true))))))

So we need to require SBAR to be the sister of a verb:
The backquote (`) tells TGrep2 to print the node it precedes. The greater-than sign (>) means "is immediately dominated by". This search should give us all of the (properly annotated) complement clauses with "that" or zero complementizers.
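The conditions built up step by step above can also be restated outside TGrep2, which may help when checking why a particular tree is kept or rejected. The following is only a rough Python sketch, not a TGrep2 pattern and not the tutorial's own search. The function names (`parse`, `is_that_or_zero_complement`, `find_complement_vps`) are hypothetical, the example trees are simplified, made-up stand-ins for the corpus outputs shown above, and the sketch assumes that verb tags begin with VB and that subject labels begin with NP-SBJ.

```python
import re

def parse(text):
    """Parse one bracketed tree into nested lists [label, child, ...]; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        kids = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                kids.append(node())
            else:
                kids.append(tokens[pos])  # a bare word or a bare -NONE- leaf
                pos += 1
        pos += 1                          # skip ")"
        return [label] + kids

    return node()

def phrases(n):
    """Child nodes of n that are themselves labelled nodes (not leaf strings)."""
    return [k for k in n[1:] if isinstance(k, list)]

def is_that_or_zero_complement(sbar, parent):
    # 1. The SBAR must be a child of VP.
    if parent[0] != "VP":
        return False
    # 2. No child whose label begins with WH (the /^WH/ condition).
    if any(re.match(r"^WH", k[0]) for k in phrases(sbar)):
        return False
    # 3. Any IN child must be "that" (compared case-insensitively here, an assumption).
    for k in phrases(sbar):
        if k[0] == "IN" and "that" not in [w.lower() for w in k[1:] if isinstance(w, str)]:
            return False
    # 4. Some S child must have an overt subject: an NP-SBJ child without a -NONE- child.
    def overt_subject(s):
        return any(c[0].startswith("NP-SBJ") and
                   not any(g[0] == "-NONE-" for g in phrases(c))
                   for c in phrases(s))
    if not any(k[0] == "S" and overt_subject(k) for k in phrases(sbar)):
        return False
    # 5. The SBAR must have a sister that is a verb (assumed: tag starts with VB).
    return any(k[0].startswith("VB") for k in phrases(parent))

def find_complement_vps(tree):
    """Return every VP that directly contains a qualifying SBAR (like printing the `VP)."""
    hits = []
    def walk(n):
        for k in phrases(n):
            if k[0] == "SBAR" and is_that_or_zero_complement(k, n):
                hits.append(n)
            walk(k)
    walk(tree)
    return hits

# Simplified, hypothetical trees modelled on the outputs discussed above.
EXAMPLES = [
    "(VP (VBP think) (SBAR (IN that) (S (NP-SBJ (PRP they)) (VP (VBD left)))))",
    "(VP (VBP think) (SBAR (S (NP-SBJ (PRP they)) (VP (VBD left)))))",
    "(VP (VBP believe) (SBAR (WHNP (WP what)) (S (NP-SBJ (PRP he)) (VP (VBD saw)))))",
    "(VP (VBD left) (SBAR (IN because) (S (NP-SBJ (PRP it)) (VP (VBD rained)))))",
    "(VP (VBD was) (SBAR -NONE- (S (NP-SBJ (-NONE- T)) (VP (VBD was) (ADJP-PRD (JJ interesting))))))",
    "(VP (BES 's) (ADJP-PRD (JJ proven)) (SBAR (IN that) (S (NP-SBJ (PRP it)) (VP (VBZ is)))))",
]

RESULTS = [bool(find_complement_vps(parse(t))) for t in EXAMPLES]

if __name__ == "__main__":
    for text, kept in zip(EXAMPLES, RESULTS):
        print("KEEP  " if kept else "REJECT", text)
```

Each numbered check corresponds to one refinement of the search: the VP parent, the wh-exclusion, the restriction on IN, the overt subject, and the verb sister. Only the first two example trees (overt "that" and zero complementizer) pass all five.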