EECS595: Natural Language Processing
Homework 4, Fall 2023
Due 10/30/2023
Student Name: xxx — uniqname: xxx
Submission Guidelines
1. Please insert your student information in line 63 of this LaTeX file;
2. Please insert your answers between each pair of \begin{solution} and \end{solution};
3. Zip the files and submit to Canvas. Checklist: hw4.pdf.
Problem 1: Probabilistic Context Free Grammar
Your friend decides to build a Treebank. He finally produces a corpus which contains the following three parse trees:
Tree 1: (S (NP John)
           (VP (V1 said)
               (SBAR (COMP that)
                     (S (NP Sally)
                        (VP (VP (V2 snored)) (ADVP loudly))))))

Tree 2: (S (NP Sally)
           (VP (V1 declared)
               (SBAR (COMP that)
                     (S (NP Bill)
                        (VP (VP (V2 ran)) (ADVP quickly))))))

Tree 3: (S (NP Fred)
           (VP (V1 pronounced)
               (SBAR (COMP that)
                     (S (NP Jeff)
                        (VP (VP (V2 swam)) (ADVP elegantly))))))
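For reference, a PCFG is read off a treebank by relative-frequency estimation: each rule A -> beta receives probability count(A -> beta) / count(A), and a parse tree's probability is the product of the probabilities of the rules it uses. The sketch below illustrates this; it assumes NLTK is installed, the bracketed tree strings are simply the three trees above, and the helper names (TREEBANK, tree_probability) are illustrative rather than part of the required written answers.

\begin{verbatim}
# Minimal sketch: estimate PCFG rule probabilities from bracketed trees
# by relative-frequency counting (assumes NLTK is installed).
from collections import Counter

from nltk import Tree
from nltk.grammar import Nonterminal, induce_pcfg

TREEBANK = [
    "(S (NP John) (VP (V1 said) (SBAR (COMP that) "
    "(S (NP Sally) (VP (VP (V2 snored)) (ADVP loudly))))))",
    "(S (NP Sally) (VP (V1 declared) (SBAR (COMP that) "
    "(S (NP Bill) (VP (VP (V2 ran)) (ADVP quickly))))))",
    "(S (NP Fred) (VP (V1 pronounced) (SBAR (COMP that) "
    "(S (NP Jeff) (VP (VP (V2 swam)) (ADVP elegantly))))))",
]

trees = [Tree.fromstring(t) for t in TREEBANK]

# Every rule occurrence, including lexical rules such as NP -> 'John'.
productions = [p for t in trees for p in t.productions()]

# Maximum-likelihood estimates: count(A -> beta) / count(A).
rule_counts = Counter(productions)
lhs_counts = Counter(p.lhs() for p in productions)
rule_prob = {p: c / lhs_counts[p.lhs()] for p, c in rule_counts.items()}

# NLTK can also induce the same grammar directly.
grammar = induce_pcfg(Nonterminal("S"), productions)
print(grammar)


def tree_probability(tree):
    """A tree's probability is the product of its rule probabilities."""
    prob = 1.0
    for p in tree.productions():
        prob *= rule_prob[p]
    return prob


print(tree_probability(trees[0]))
\end{verbatim}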
You then purchase the Treebank and decide to build a PCFG, and a parser, using your friend’s data. Now answer the following three questions:
1. (Written) Show the PCFG that you would derive from this Treebank.
Solution:
2. (Written) Show two parse trees for the string “Jeff pronounced that Fred snored loudly”, and calculate their probabilities under the PCFG.
Solution:
3. (Written) You are surprised that “Jeff pronounced that Fred snored loudly” has two possible parses, and that one of them, the parse in which Jeff is doing the pronouncing loudly, has relatively high probability. This type of high attachment is never seen in the corpus, so the PCFG is clearly missing something. You decide to fix the Treebank by altering some non-terminal labels in the corpus. Show one such transformation that results in a PCFG giving zero probability to parse trees with high attachments. (Your solution should systematically refine some non-terminals in the Treebank, in a way that slightly increases the number of non-terminals in the grammar but allows the grammar to capture the distinction between high and low attachment to VPs.)
Solution:
Problem 2: Dependency Parsing
This exercise is to get you familiar with dependency parsing and the Stanford CoreNLP [1] toolkit. You may also need to consult the inventory of universal dependency relations. You have two options to complete this exercise.
• Install the toolkit. Please check Stanza and follow the instructions to install the toolkit. You may need to use the toolkit for your final project.
• Run the demo system. You can also use the demo system without installing the toolkit.
You should experiment with different sentences and paragraphs to get a feel for how the parser works. In particular, you need to run the parser on the following paragraph and answer some questions (a minimal Stanza invocation is sketched after the paragraph).
The unveiling event for the innovative ChatGPT was shared online yesterday. This event, powered by the potent GPT-4, was projected for next month but was expedited after AI enthusiasts showed an enormous interest. All individuals now have the chance to explore its advanced capabilities. The AI community, though already familiar with preceding models, is buzzing with discussions and analyses. OpenAI confirmed that GPT-3.5/GPT-4 was the driving force behind ChatGPT, leading to its accelerated launch and widespread acclaim.
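If you choose the Stanza option, a minimal sketch along these lines (assuming the toolkit and its English models are installed; the variable text is a stand-in for the paragraph above, and nothing here is required for the submission) prints each word with its dependency relation and head, which is enough to spot questionable edges.

\begin{verbatim}
# Minimal Stanza dependency-parsing sketch (assumes `pip install stanza`).
import stanza

# stanza.download("en")  # uncomment on first use to fetch the English models
nlp = stanza.Pipeline(lang="en")  # default English pipeline, includes depparse

text = (
    "The unveiling event for the innovative ChatGPT was shared online "
    "yesterday."  # replace with the full paragraph given above
)

doc = nlp(text)
for sentence in doc.sentences:
    for word in sentence.words:
        # word.head is a 1-based index into the sentence; 0 marks the root.
        head = sentence.words[word.head - 1].text if word.head > 0 else "ROOT"
        print(f"{word.text}\t--{word.deprel}-->\t{head}")
\end{verbatim}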
Please answer the following questions:
1. (Written) Give three examples where the parsed results are incorrect.
Solution:
2. (Written) What would be the correct relation for each of the examples you identified above? Consult the Universal Dependencies documentation of relations to answer this question.
Solution:
3. (Written) What is your general impression of the parsed results? Does the length of the sentence affect the parser's performance?
Solution:
References
[1] Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., & McClosky, D. (2014, June). The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 55-60).