CISC7021 - Applied Natural Language Processing - Assignment 1: 𝑛-gram language models


CISC7021 - Applied Natural Language Processing, Assignment 1, 2023/2024 (Due date: 26 September 2023)

Introduction

In this assignment, we will build 𝑛-gram language models and evaluate their perplexity on a test set. We will learn how to create a language model with the SRILM toolkit¹ (Stolcke, 2002), which can be downloaded at: http://www.speech.sri.com/projects/srilm/download.html. Basic instructions for using SRILM are also available on that website.
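Perplexity is 10 raised to the average negative base-10 log-probability per token: PPL = 10^(-(1/N) Σ log10 P(w_i | history)). The sketch below is an illustration that is not part of the handout. It computes this quantity from per-token log-probabilities. The helper names are hypothetical. `srilm_style_ppl` follows our reading of how SRILM's `ngram -ppl` output defines its two figures. `ppl` adds one end-of-sentence token per sentence to the denominator and skips OOV and zero-probability words. `ppl1` leaves out the end-of-sentence tokens. Please check this against the SRILM documentation before relying on it.

```python
import math
from typing import Sequence, Tuple


def perplexity(log10_probs: Sequence[float]) -> float:
    """Perplexity from per-token base-10 log-probabilities.

    PPL = 10 ** (-(1/N) * sum(log10 P(w_i | history)))
    """
    if not log10_probs:
        raise ValueError("need at least one scored token")
    avg = sum(log10_probs) / len(log10_probs)
    return 10.0 ** (-avg)


def srilm_style_ppl(total_logprob: float, sentences: int, words: int,
                    oovs: int = 0, zeroprobs: int = 0) -> Tuple[float, float]:
    """Hypothetical helper mirroring our reading of SRILM's ppl / ppl1.

    total_logprob is the summed log10 probability SRILM reports as "logprob".
    ppl counts one end-of-sentence token per sentence in the denominator;
    ppl1 excludes them. OOVs and zero-probability words are not scored.
    """
    scored = words - oovs - zeroprobs
    ppl = 10.0 ** (-total_logprob / (scored + sentences))
    ppl1 = 10.0 ** (-total_logprob / scored)
    return ppl, ppl1


if __name__ == "__main__":
    # A uniform model over 4 symbols gives every token probability 1/4,
    # so its perplexity is exactly the vocabulary size, 4.
    uniform = [math.log10(0.25)] * 8
    print(round(perplexity(uniform), 6))  # 4.0
```

The uniform-model check is a useful sanity test. Perplexity can be read as the effective number of equally likely choices the model faces at each token.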

Train and Test Data

The training and test data for this assignment come from the News Commentary corpus, which is intended for training English language models. The training data consist of 300 thousand lines of text, while the test set consists of around 90 thousand lines. The corpora are taken from the official website of the Shared Task: Machine Translation of News.² Both the training and test data can be downloaded from UMMoodle.
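The report asks for corpus statistics. The out-of-vocabulary (OOV) rate also helps explain test perplexity. It is the share of test tokens whose word never occurs in the training data. Below is a minimal sketch with hypothetical function names. It assumes the files are already tokenized, so that splitting on whitespace yields words. SRILM likewise treats whitespace-separated strings as words.

```python
from collections import Counter
from typing import Iterable


def corpus_stats(lines: Iterable[str]) -> dict:
    """Line, token and type counts for whitespace-tokenized text."""
    counts: Counter = Counter()
    n_lines = 0
    for line in lines:
        n_lines += 1
        counts.update(line.split())
    return {"lines": n_lines, "tokens": sum(counts.values()),
            "types": len(counts), "counts": counts}


def oov_rate(train_counts: Counter, test_lines: Iterable[str]) -> float:
    """Fraction of test tokens whose type never occurs in the training data."""
    total = oov = 0
    for line in test_lines:
        for tok in line.split():
            total += 1
            if tok not in train_counts:
                oov += 1
    return oov / total if total else 0.0


if __name__ == "__main__":
    train = ["the cat sat", "the dog sat"]
    test = ["the cat ran"]
    stats = corpus_stats(train)
    print(stats["lines"], stats["tokens"], stats["types"])  # 2 6 4
    print(round(oov_rate(stats["counts"], test), 3))  # 0.333
```

To run it on the real data, pass open file handles instead of lists, for example `corpus_stats(open("train.txt", encoding="utf-8"))`. The file name here is only a placeholder.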

Tasks

  1. Build word-based 1-gram, 2-gram, and 3-gram language models for English from the training data, and measure their perplexity on the training and test sets.

  2. Build character-based 1-gram to 6-gram language models from the training data, and measure their perplexity on the training and test sets.

  3. Collect additional monolingual data from the First Conference on Machine Translation (WMT16) and add it to the training data. Then rebuild the language models and measure their perplexity.
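The small self-contained n-gram model below makes tasks 1 and 2 more concrete. It is a sketch, not a substitute for SRILM. It uses add-one (Laplace) smoothing. SRILM's `ngram-count`, as we understand its documentation, applies Good-Turing discounting with Katz backoff by default, so the perplexities here will not match SRILM's. The `<s>`/`</s>` padding mirrors the sentence-boundary tags SRILM uses. Because SRILM splits its input on whitespace, a character-level corpus must be rewritten as space-separated characters before training. Here `char_tokens` does that, mapping spaces to a hypothetical placeholder `_`. All names in the block are illustrative.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

BOS, EOS = "<s>", "</s>"


def word_tokens(line: str) -> List[str]:
    """Word tokens: whitespace-separated strings."""
    return line.split()


def char_tokens(line: str, space: str = "_") -> List[str]:
    """Character tokens; the hypothetical placeholder `space` marks word boundaries."""
    return [space if ch == " " else ch for ch in line.strip()]


class AddOneNgramLM:
    """Illustrative n-gram LM with add-one (Laplace) smoothing, not SRILM's method."""

    def __init__(self, n: int, sentences: Iterable[List[str]]) -> None:
        self.n = n
        self.ngrams: Counter = Counter()    # counts of (context..., token)
        self.contexts: Counter = Counter()  # counts of each (n-1)-token context
        vocab = {EOS}
        for toks in sentences:
            vocab.update(toks)
            for ctx, tok in self._events(toks):
                self.ngrams[ctx + (tok,)] += 1
                self.contexts[ctx] += 1
        # One extra slot so that all unseen tokens share a single <unk> class.
        self.vocab_size = len(vocab) + 1

    def _events(self, toks: List[str]) -> Iterable[Tuple[Tuple[str, ...], str]]:
        """Yield (context, token) pairs, padding with n-1 <s> and one </s>."""
        padded = [BOS] * (self.n - 1) + toks + [EOS]
        for i in range(self.n - 1, len(padded)):
            yield tuple(padded[i - self.n + 1:i]), padded[i]

    def prob(self, ctx: Tuple[str, ...], tok: str) -> float:
        """Add-one estimate: (c(ctx, tok) + 1) / (c(ctx) + V)."""
        return (self.ngrams[ctx + (tok,)] + 1) / (self.contexts[ctx] + self.vocab_size)

    def perplexity(self, sentences: Iterable[List[str]]) -> float:
        """10 ** (average negative log10 probability per predicted token, incl. </s>)."""
        log_sum, count = 0.0, 0
        for toks in sentences:
            for ctx, tok in self._events(toks):
                log_sum += math.log10(self.prob(ctx, tok))
                count += 1
        return 10.0 ** (-log_sum / count)


if __name__ == "__main__":
    train = ["the cat sat on the mat", "the dog sat on the rug"]
    test = ["the cat sat on the rug"]
    for n in (1, 2, 3):
        lm = AddOneNgramLM(n, [word_tokens(s) for s in train])
        print("word", n, round(lm.perplexity([word_tokens(s) for s in test]), 2))
    for n in (1, 6):
        lm = AddOneNgramLM(n, [char_tokens(s) for s in train])
        print("char", n, round(lm.perplexity([char_tokens(s) for s in test]), 2))
```

Add-one smoothing was chosen only because it fits in a few lines. It is known to reserve too much probability mass for unseen events, which is one reason toolkits such as SRILM use discounting and backoff instead.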

Environment Setup

All of the (development) tools required for the course assignments and projects are Linux/Unix programs, so you need a Linux platform for running experiments and implementing systems. We strongly recommend using a virtual machine (e.g. VirtualBox - https://www.virtualbox.org/) to host a Linux system (e.g. Ubuntu - http://www.ubuntu.com/). You will also use various toolkits for different (pre)processing tasks in the coursework. For example, you need a g++ compiler to compile the SRILM toolkit in this assignment.

1 http://www.speech.sri.com/projects/srilm/download.html
2 http://www.statmt.org/wmt16/translation-task.html


In any case, documentation for using the toolkit is available. If you are new to processing text on Linux, Church (1994)³ gives a very good introduction to using Unix commands for basic text processing.
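Church's introduction builds word-frequency lists by chaining Unix tools such as `tr`, `sort` and `uniq -c`. Readers more at home in Python may find the rough equivalent below helpful. Like that style of pipeline, it treats maximal runs of ASCII letters as words. This is a simplifying assumption: it splits contractions such as "dog's" and drops numbers. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

WORD = re.compile(r"[A-Za-z]+")  # a "word" is a maximal run of ASCII letters


def word_frequencies(lines: Iterable[str], top: int = 10,
                     lowercase: bool = False) -> List[Tuple[int, str]]:
    """Return (count, word) pairs, most frequent first, like `uniq -c | sort -nr`.

    Ties keep first-seen order, as Counter.most_common guarantees.
    """
    counts: Counter = Counter()
    for line in lines:
        if lowercase:
            line = line.lower()
        counts.update(WORD.findall(line))
    return [(c, w) for w, c in counts.most_common(top)]


if __name__ == "__main__":
    text = ["The cat sat on the mat.", "The dog's mat."]
    for count, word in word_frequencies(text, top=3):
        print(count, word)
    # 2 The
    # 2 mat
    # 1 cat
```

Without `lowercase=True`, "The" and "the" are counted separately. The same thing happens in a Unix pipeline that does not fold case first.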

Report

You need to submit a report of your work (2 to 3 pages). It should clearly present what you did in your experiments, how you carried them out, and how you solved the problems you encountered. Include tables (or graphs) of the data (e.g. corpus statistics) and of the evaluated perplexities of your models. I am particularly interested in the conclusions you draw about the models you built and the data you collected, as well as your analysis of the results. The report should follow the two-column format of the ACL proceedings.⁴,⁵
