Assignment 1 (Individual Work)
You are to build an information retrieval system (IRS) that supports both (1) a controlled vocabulary approach and (2) a free text search approach for simple querying and retrieval of relevant documents from the given corpus (a zipped file named dataset.zip containing a set of documents). The two approaches, (1) and (2), may be implemented independently of each other or may depend on each other.
Students are to build the system in Python, using the programming techniques learned during the course.
1) Use suitable text analysis methods to study and understand the content of the documents in the corpus.
2) Create a controlled vocabulary and a free text search engine to query and retrieve the relevant documents on the following topics: “Covid-19”, “Covid-19 and Property”, “Covid-19 and VTL”, “Covid-19 and Omicron”, and “Covid-19 not Omicron”.
3) Describe the Covid-19 topic and situation using the documents you have retrieved.
4) Compare the advantages and disadvantages of controlled vocabulary search and free text search.
5) Describe possible ways of improving the information retrieval systems that you have created.
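For illustration only, the following is a minimal sketch of how the two approaches in point 2 might be structured. It is not a required or model solution. The three sample documents, the variant terms in CONTROLLED_VOCAB, and all function names are hypothetical assumptions; the real documents come from dataset.zip, and the vocabulary should be derived from your own study of the corpus. The sketch treats "and" as set intersection and "not" as set difference over an inverted index.

```python
import re
from collections import defaultdict

# Hypothetical stand-in documents; the real corpus comes from dataset.zip.
DOCS = {
    "d1": "Covid-19 cases rise as the Omicron variant spreads.",
    "d2": "Real estate prices climb despite the coronavirus pandemic.",
    "d3": "Covid-19 travel rules ease for VTL passengers.",
}

# Assumed controlled vocabulary: variant term -> preferred term.
CONTROLLED_VOCAB = {
    "covid-19": "covid-19",
    "coronavirus": "covid-19",
    "omicron": "omicron",
    "property": "property",
    "real estate": "property",
    "vtl": "vtl",
}

# Keeps hyphenated tokens such as "covid-19" together.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_free_text_index(docs):
    """Inverted index: every token -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def build_vocab_index(docs, vocab):
    """Inverted index keyed by preferred terms of the controlled vocabulary."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Pad with spaces so variants match whole tokens or whole phrases.
        padded = " " + " ".join(tokenize(text)) + " "
        for variant, preferred in vocab.items():
            if f" {variant} " in padded:
                index[preferred].add(doc_id)
    return index


def search(index, must=(), must_not=()):
    """Return ids of documents containing all `must` and no `must_not` terms."""
    if not must:
        return set()
    result = set.intersection(*(index.get(term, set()) for term in must))
    for term in must_not:
        result = result - index.get(term, set())
    return result


free_index = build_free_text_index(DOCS)
vocab_index = build_vocab_index(DOCS, CONTROLLED_VOCAB)

# "Covid-19 and Omicron" and "Covid-19 not Omicron" as free text queries.
print(sorted(search(free_index, must=["covid-19", "omicron"])))
print(sorted(search(free_index, must=["covid-19"], must_not=["omicron"])))

# "Covid-19 and Property" against both indexes.
print(sorted(search(free_index, must=["covid-19", "property"])))
print(sorted(search(vocab_index, must=["covid-19", "property"])))
```

On these sample documents, the free text query for "Covid-19 and Property" finds nothing because no document contains the literal word "property", whereas the controlled vocabulary index retrieves d2 through the assumed variants "coronavirus" and "real estate". Differences of this kind are relevant to the comparison in point 4.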
Students are required to submit, via Turnitin under Assignment 1 submission, a report (minimum 2000 words) covering the five points above, together with the Python code used for the IRS. The code does not count towards the minimum word count.
Assignment 1 constitutes 15% of the total marks.