
811312A Data Structures and Algorithms, Spring 2023, Assignment: Most frequently occurring words


811312A Data Structures and Algorithms 2023 Assignment, Instructions

This document describes both general and specific requirements for the assignment of the course Data Structures and Algorithms.

1. General Instructions

Each student shall submit one individually implemented program written in the C programming language and one written report. Return the solution and the report in one zip file in the course's Moodle workspace before the deadline of 23:59 on March 24, 2023. All late submissions will receive a reduced grade. More detailed requirements are given later in this document.

The assignment submission shall contain the source code of your program, any header files, and your written report. See the detailed instructions and requirements further down in this document, and see the checklist.

The assignment shall be returned by 23:59 on March 24, 2023 at the latest.

2. Description of the Assignment

The task

In this assignment, you shall find the 100 most frequently occurring words in a large text file. The program shall be implemented in the C language. A word is a continuous string of the characters a to z and A to Z, possibly containing apostrophes (’). Uppercase and lowercase letters are considered equal, so words that differ only in case are the same word. For example, in the text

Herman Melville’s book Moby Dick starts, as we all know, with the sentence “Call me Ishmael”.

the words are: “herman”, “melville’s”, “book”, “moby”, “dick”, “starts”, “as”, “we”, “all”, “know”, “with”, “the”, “sentence”, “call”, “me”, and “ishmael”.

A word does not contain separators. For instance, “book”, “book,” (“book” with a comma after it), and “book.” (“book” with a full stop after it) should all exclude the separator and be counted as “book”.

A number in numeral form is not a word. For instance, “1” is not a word, while “one” is.

The name of the input text file shall be given by the user. That is, the input file name shall not be hardcoded in the source code of the program; instead, the program shall ask the user for the input file.

The program shall print the total number of words in the input file, the number of different words in the input file, and the 100 most frequently occurring words and their frequencies. The words are printed in descending order according to their frequencies. The program shall also measure and print the processing time.

Because the input file can be very large, you need a suitable data structure to store the words and their frequencies, such as a hash table or a binary search tree. You can then find the most frequent words by sorting with some fast algorithm. The challenge of the task is that the input file your program will be tested with is very large (close to 3 million words). Therefore, a plain array structure is out of the question, and a linked list is too slow. In addition, you will need to allocate memory for the data structure and its elements, and free that memory at the end.

There are example input files and corresponding outputs in Moodle. All the files contain English text. You should use them to test your program. The biggest file, Bulk.txt, does not have an output file, but you can use it to test whether your solution is fast. Bulk.txt may also contain words from other languages, but they use the same English alphabet, so this should not be a problem. For large input files, the output of your program is not necessarily exactly the same as the given output. If it differs, think about why and discuss it in your report.

The report

The report must be written in English, and it shall contain

1.     Student’s name

2.     Student’s University of Oulu ID number

3.     Description of the solution and

4.     Analysis of the solution program.

In the description of your solution, you shall explain what data structures and algorithms you have chosen and why. You shall also explain the most important functions of your program.

You shall analyze the efficiency of your program. The size of the input is measured as the number of words in the text file. Measure the running times of your program with the given test input files and estimate how many words a file can contain for your program to process it in 15 seconds. Based on this, estimate the maximum size (in megabytes) of a file that your program can process in that time, given that the average length of an English word is 5.1 letters.
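
The estimate can be written as a short calculation. This is a sketch under two stated assumptions: the running time grows roughly linearly with the number of words (approximately true for hash-table counting, where each insertion takes expected constant time), and each word occupies its letters plus one separator byte in a single-byte encoding. Here N is the number of words in a measured test file and t its measured running time in seconds.

```latex
N_{15} \approx 15\,\mathrm{s} \cdot \frac{N}{t}
\qquad\text{and}\qquad
S_{15} \approx \frac{N_{15} \cdot (5.1 + 1)\,\text{bytes}}{10^{6}\,\text{bytes/MB}}
```

For example, with purely hypothetical measurements of N = 3 000 000 words processed in t = 2 s, this gives N_15 ≈ 22 500 000 words and S_15 ≈ 22 500 000 · 6.1 / 10^6 ≈ 137 MB.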

The analysis thus contains the measured running times of the program and the estimates of the number of words and the file size that can be processed in 15 seconds. The report shall naturally contain the student’s name and University of Oulu ID number. If you wish, you can also give feedback on the course and/or the assignment in your report.

Remember to zip the files before returning them.

3. Detailed requirements

The program

The program shall read words from a text file (.txt file extension). The program shall ask the user for the input file when it runs.

The program shall store the words it reads from the file in a suitable data structure.
The program shall count how many words there are in the input file.
The program shall count how many different words there are in the input file.
The program shall count the frequency of each word.

The program shall measure the time spent processing the input file.

The program shall print the total number of words in the input file.

The program shall print the number of different words found in the input file.

The program shall print the 100 most frequent words in descending order based on frequency and include the frequency.

The program shall print the time it took to process the file and produce the result (printing the output does not have to be included in the time measurement).

The program shall execute the given task in reasonable time even with the largest test material (Bulk.txt). (NOTE! No compiler optimization is allowed. Times will be compared on the teacher’s computer against baseline measurements of example programs.) A running time over 10 seconds is unacceptable; the task should take only a couple of seconds.

The code

You shall write the program in the C language, so the file extension should be .c. You shall write clean and readable code.

·       -  Decide on an indentation style and keep it consistent.

·       -  Comment the code to explain functions and variables; comments shall be in English.

·       -  Use a consistent and meaningful naming scheme for variables and functions.

o No magic numbers.
o No meaningless variable names such as a, b, c, x, y.

You shall implement all data structures and algorithms on your own. Do not use library functions for them, and do not copy code.

If you use code found in the study material, you shall give a reference to it and explain how you have used it.

The code shall not depend on any IDE; it should compile and run from the command line.
The program shall free all the memory it has allocated before it terminates.
The program shall reserve enough space for the input filename.

The report

You shall use the given template for the report.
The report shall include your name.
The report shall include your University of Oulu student ID number.
The report shall be written in English.

The report shall briefly present the chosen data structures and algorithms. - Explain how you solved the problem.

The report shall briefly explain the most important functions of your program.

·       -  Why these functions, and what are they used for?

·       -  Functions that read input from the user or read the input file do not need to be explained.

The report shall include measurements of running times for different example input files (at least three different files).

The report shall include a comparison of the output with the given output files and, if they differ, your explanation of why.

The report shall include an estimate of how large a file the program can process in 15 seconds (the analysis) and a discussion of whether this estimate is realistic.

Your report may include feedback regarding the course (teaching, material, methods, anything) if you wish to include it. This will not affect your grade.

Submission

All files shall be included in one zip file.
You shall submit program source code whose file extension is .c and possible header files with extension .h

·       -  No .cpp files.

·       -  No executable files.

·       -  No input files.

·       -  No additional files.

You shall submit the report as a word file, an open office file, or a pdf file.
You shall use English in your file names, or at least the letters of the English alphabet.