EECS595: Natural Language Processing
Homework 4, Fall 2023
Due 10/30/2023
Student Name: xxx — uniqname: xxx
Submission Guidelines
1. Please insert your student information in line 63 of this LaTeX file;
2. Please insert your answers between each pair of \begin{solution} and \end{solution};
3. Zip the files and submit to Canvas. Checklist: hw4.pdf.
Problem 1: Probabilistic Context Free Grammar
Your friend decides to build a Treebank. He finally produces a corpus which contains the following three parse trees:
[Figure: three parse trees. Only the node labels S, NP (John), V1 (said), and VP are recoverable from the figure.]
You then purchase the Treebank and decide to build a PCFG, and a parser, using your friend’s data. Now answer the following three questions:
1. (Written) Show the PCFG that you would derive from this Treebank.
Solution:
2. (Written) Show two parse trees for the string “Jeff pronounced that Fred snored loudly”, and calculate their probabilities under the PCFG.
Solution:
3. (Written) You are surprised that “Jeff pronounced that Fred snored loudly” has two possible parses, and that one of them (that Jeff is doing the pronouncing loudly) has relatively high probability. This type of high attachment is never seen in the corpus, so the PCFG is clearly missing something. You decide to fix the Treebank, by altering some non-terminal labels in the corpus. Show one such transformation which results in a PCFG that gives zero probability to parse trees with high attachments. (Your solution should systematically refine some non-terminals in the Treebank, in a way that slightly increases the number of non-terminals in the grammar, but allows the grammar to capture the distinction between high and low attachment to VPs.)
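As a reminder of the mechanics behind questions 1 and 2, the following is a minimal Python sketch of relative-frequency estimation, P(A -> beta) = count(A -> beta) / count(A), and of scoring a tree as the product of its rule probabilities. The two toy trees, the tuple encoding of trees, and the function names are assumptions made for illustration; they are not the Treebank from this problem.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# A tree is a tuple (label, child, child, ...); a leaf child is a plain string (a word).
# These two toy trees are hypothetical and are NOT the homework's Treebank.
TOY_TREEBANK = [
    ("S", ("NP", "dogs"), ("VP", ("V", "bark"))),
    ("S", ("NP", "cats"), ("VP", ("V", "sleep"), ("ADVP", "soundly"))),
]


def rule_of(tree):
    """Return the rule (lhs, rhs) used at the root of a tree."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return (label, rhs)


def count_rules(tree, counts):
    """Recursively count every rule used in one tree."""
    counts[rule_of(tree)] += 1
    for child in tree[1:]:
        if not isinstance(child, str):
            count_rules(child, counts)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter()
    for tree in treebank:
        count_rules(tree, rule_counts)
    lhs_totals = defaultdict(int)
    for (lhs, _), n in rule_counts.items():
        lhs_totals[lhs] += n
    return {rule: Fraction(n, lhs_totals[rule[0]]) for rule, n in rule_counts.items()}


def tree_probability(tree, pcfg):
    """Probability of a tree: the product of the probabilities of its rules."""
    p = pcfg.get(rule_of(tree), Fraction(0))  # unseen rules get probability 0
    for child in tree[1:]:
        if not isinstance(child, str):
            p *= tree_probability(child, pcfg)
    return p


pcfg = estimate_pcfg(TOY_TREEBANK)
for (lhs, rhs), prob in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob}]")
print("P(first toy tree) =", tree_probability(TOY_TREEBANK[0], pcfg))
```

Exact fractions are used instead of floats so that the estimated probabilities can be compared directly with a hand calculation.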
Problem 2: Dependency Parsing
This exercise is meant to familiarize you with dependency parsing and the Stanford CoreNLP [1] toolkit. You may also need to consult the inventory of universal dependency relations. You have two options for completing this exercise.
• Install the toolkit. Please check Stanza and follow the instructions to install the toolkit. You may need to use the toolkit for your final project.
• Run the demo system. You can also use the demo system without installing the toolkit.
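For the first option, the following is a minimal Python sketch of how dependency relations might be read out of Stanza, assuming the package is installed and the English models can be downloaded. The helper names `format_dependencies` and `parse_with_stanza` are hypothetical, and the two-word example rows are hand-written for illustration, not parser output.

```python
def format_dependencies(rows):
    """Format a list of (id, text, head_id, deprel) rows as relation(head, dependent).

    A head_id of 0 marks the root of the sentence.
    """
    words = {wid: text for wid, text, _, _ in rows}
    lines = []
    for wid, text, head, deprel in rows:
        head_text = "ROOT" if head == 0 else words[head]
        lines.append(f"{deprel}({head_text}-{head}, {text}-{wid})")
    return lines


def parse_with_stanza(text, lang="en"):
    """Run Stanza's default pipeline and return one list of rows per sentence."""
    import stanza  # imported lazily so the helper above works without Stanza installed

    stanza.download(lang)        # fetches the models on first use (needs network access)
    nlp = stanza.Pipeline(lang)  # default processors, which include the dependency parser
    doc = nlp(text)
    return [
        [(w.id, w.text, w.head, w.deprel) for w in sent.words]
        for sent in doc.sentences
    ]


# Hand-written rows for illustration only (not produced by the parser).
example_rows = [(1, "She", 2, "nsubj"), (2, "sleeps", 0, "root")]
for line in format_dependencies(example_rows):
    print(line)
```

To parse real text, pass it to `parse_with_stanza` and feed each sentence's rows to `format_dependencies`; the relation labels can then be looked up in the universal dependency inventory.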
You should experiment with different sentences and paragraphs to get a feel for how the parser works. In particular, you need to run the parser on the following paragraph and answer some questions.
The unveiling event for the innovative ChatGPT was shared online yesterday. This event, powered by the potent GPT-4, was projected for next month but was expedited after AI enthusiasts showed an enormous interest. All individuals now have the chance to explore its advanced capabilities. The AI community, though already familiar with preceding models, is buzzing with discussions and analyses. OpenAI confirmed that
References
[1] Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., & McClosky, D. (2014, June). The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 55-60).