[2022] UniSA - INFS 2044 System Design and Realisation Assignment 2 Case Study

INFS2044 Assignment 2 Case Study

In this assignment, you will develop a system for extracting information from text files. The system will process text files, compute statistics about each file, and produce output in several different formats. The statistics computed by the system include the length of each file in words and the most frequent words in each file, together with their frequencies.

Use Cases

The system supports a single use case:

UC1 Compute Summary Statistics:

Using a command-line application, the user specifies the files to be processed, the number of most frequent words to identify, and one or more output file names. The application reads each file, computes the summary statistics, and writes the results to the given output file(s) in the appropriate output format(s).

Summary statistics to be generated in this use case:

· Number of words in each file

· The N most frequent words in each file, where N is the number specified by the user

The output format of each output file is determined by its file extension. Extensions and output formats to be supported in this use case:

· .txt: Plain-text format.

· .csv: Comma-separated values (CSV) format.

The specification of these file formats is given in the section “File Formats” later in this document.

Future variations:

· Additional output formats could be introduced.

· Additional summary statistics could be introduced.

· Filters for text processing could be introduced (such as different ways of identifying words in a file, ignoring selected words, etc.).

· The functions may eventually be offered via a REST API in addition to the console application.

These variations are out of scope for your implementation in this assignment, but your design must be able to accommodate them.
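One way a design could leave room for additional summary statistics is to treat each statistic as a small pluggable object that the Summarising Engine applies in turn. This is a minimal sketch assuming that approach. `Statistic`, `WordCount`, `DistinctWordCount`, and the `summarise` signature are hypothetical. `DistinctWordCount` only illustrates a possible future extension and is not required by the assignment.

```python
from typing import Protocol


class Statistic(Protocol):
    """Hypothetical extension point: one named statistic computed from a file's words.

    typing.Protocol means classes satisfy it by shape, without inheriting from it.
    """

    name: str

    def compute(self, words: list[str]) -> object: ...


class WordCount:
    name = "word_count"

    def compute(self, words: list[str]) -> int:
        return len(words)


class DistinctWordCount:
    """Hypothetical future statistic, shown only to illustrate extension."""

    name = "distinct_words"

    def compute(self, words: list[str]) -> int:
        return len(set(words))


class SummarisingEngine:
    """Applies whichever statistics it was constructed with."""

    def __init__(self, statistics: list[Statistic]) -> None:
        self._statistics = list(statistics)

    def summarise(self, words: list[str]) -> dict[str, object]:
        return {stat.name: stat.compute(words) for stat in self._statistics}
```

Adding a statistic then means writing a new class and passing it to the constructor. The Formatting Engine would still have to learn how to render it, so this sketch isolates only the computation side of the change.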

Example Command

The following command finds the 10 most frequent words in each of the files a.txt, b.txt, and c.txt, and writes the results in two formats: CSV format to out1.csv and plain-text format to out2.txt (type the command all on one line):

$ python word_statistics_app.py --number=10 --output=out1.csv --output=out2.txt a.txt b.txt c.txt

If the number of frequent words specified on the command line exceeds the number of unique words in a file, output all of that file's unique words and their frequencies instead.

File Formats

Text format (extension .txt):

Each line (one per input file) is a short sentence giving the name of the file and the total length of the file in words, followed by the most frequent words in the file and their frequencies. The words are listed in order of descending frequency; words with identical frequency are listed in ascending alphabetical order.
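Both the ordering rule and the rule for N exceeding the number of unique words can be handled with one sort and one slice. The sort is needed because `Counter.most_common` orders equal counts by first appearance, not alphabetically. This is a minimal sketch. The function name `most_frequent` is hypothetical, and "alphabetical" is assumed to mean Python's default string comparison, which is code-point order, so "Zebra" sorts before "apple". The brief does not say whether "The" and "the" count as the same word.

```python
from collections import Counter


def most_frequent(words: list[str], n: int) -> list[tuple[str, int]]:
    """Hypothetical helper: up to n (word, frequency) pairs in the specified order."""
    counts = Counter(words)
    # Highest frequency first; ties broken by ascending (code-point) string order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Slicing never fails: if n exceeds the number of unique words,
    # every unique word is returned.
    return ranked[:n]
```

For example, `most_frequent("b a b".split(), 5)` returns `[("b", 2), ("a", 1)]`. The file's total word count is simply `len(words)`.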

Example:

Suppose that file a.txt has 47 words in total (some of which may be occurrences of the same word), and that the most frequent words in that file are “the” (frequency 10), “cat” (frequency 8), “a” (frequency 8), and “apple” (frequency 4).

The corresponding line in the output file would be (all on one line):

File a.txt contains 47 words. Frequent words are: the (10), a (8), cat (8), apple (4).
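The text line can be built with a single f-string once the frequent words are already ranked. This is a sketch assuming the ranked `(word, frequency)` pairs are passed in. The function name `format_text_line` is hypothetical. The brief does not say what an empty file should produce; this version would end the line with "are: .".

```python
def format_text_line(file_name: str, word_count: int,
                     frequent: list[tuple[str, int]]) -> str:
    """Hypothetical Formatting Engine helper for the .txt format.

    `frequent` is assumed to be already ranked and truncated to N entries.
    """
    pairs = ", ".join(f"{word} ({count})" for word, count in frequent)
    return f"File {file_name} contains {word_count} words. Frequent words are: {pairs}."
```

Called as `format_text_line("a.txt", 47, [("the", 10), ("a", 8), ("cat", 8), ("apple", 4)])`, it reproduces the example line above.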

CSV format (extension .csv):

The information shown is the same as for the text format, except that the file name and statistics are separated by commas.

For the above example, the row in the file would be:

a.txt,47,the,10,a,8,cat,8,apple,4.

You can assume that the file name does not contain commas or quotation marks.
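The brief guarantees clean file names, but words are delimited only by whitespace. A token such as `cat,` or `"hi"` can therefore contain a comma or quotation mark. The standard `csv` module quotes such fields automatically. This sketch assumes that quoting is acceptable, which the brief does not state. It also treats the full stop at the end of the example row as sentence punctuation rather than part of the row. `format_csv_row` is a hypothetical name.

```python
import csv
import io


def format_csv_row(file_name: str, word_count: int,
                   frequent: list[tuple[str, int]]) -> str:
    """Hypothetical Formatting Engine helper for the .csv format (one row per file)."""
    fields: list[object] = [file_name, word_count]
    for word, count in frequent:
        fields.extend([word, count])
    buffer = io.StringIO()
    # The default QUOTE_MINIMAL quotes only fields containing commas, quotes or newlines.
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue().removesuffix("\n")  # the caller decides how rows are joined
```

For the example above, this returns `a.txt,47,the,10,a,8,cat,8,apple,4`. A word like `cat,` would be written as `"cat,"` instead of silently adding a column.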

Input File Format

All input files are plain text files. Each file may contain one or more lines of text. Words are delimited by one or more whitespace characters (that is, spaces, tabs, or newlines).
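In Python, this word definition matches `str.split()` called with no separator. It splits on runs of whitespace and discards leading and trailing whitespace, so no empty "words" appear. It also treats a few other whitespace characters, such as carriage returns and form feeds, as separators, which is slightly broader than the list above. Punctuation stays attached, so "cat." and "cat" are different words under this literal reading. The class and method names below are hypothetical.

```python
class TokenisingEngine:
    """Hypothetical sketch of the Tokenising Engine: whitespace-delimited words."""

    def tokenise(self, text: str) -> list[str]:
        # str.split() with no argument splits on runs of whitespace
        # (spaces, tabs, newlines, ...) and never yields empty strings.
        return text.split()
```

Keeping tokenisation in its own class means a future filter, such as ignoring selected words, can be added without touching the summarising or formatting code.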

Decomposition

You must use the following component decomposition as the basis for your implementation design. The responsibilities of its elements are as follows:

Element	Responsibilities
Console App	Interacts with the user (acquires user options)
Word Stats Manager	Orchestrates the use case process (reading, tokenising, summarising, formatting, outputting)
Summarising Engine	Computes the summary statistics
Tokenising Engine	Splits the input text into tokens (words)
Formatting Engine	Generates output from summaries
File Access	Interacts with the file system to read and write files
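The Word Stats Manager's orchestration role can be sketched with its collaborators injected through the constructor. This lets a console front end, a future REST API, and unit tests all reuse the same manager. This is a minimal sketch, not a prescribed design. All method names (`read`, `write`, `tokenise`, `summarise`, `format`, `compute_summary_statistics`) are hypothetical. The engines are simplified stand-ins, and `InMemoryFileAccess` is a test double in place of real file I/O.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Summary:
    file_name: str
    word_count: int
    frequent: list[tuple[str, int]]


class InMemoryFileAccess:
    """Hypothetical stand-in for File Access; a real one would use the file system."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = dict(files)

    def read(self, path: str) -> str:
        return self.files[path]

    def write(self, path: str, content: str) -> None:
        self.files[path] = content


class TokenisingEngine:
    def tokenise(self, text: str) -> list[str]:
        return text.split()


class SummarisingEngine:
    def summarise(self, file_name: str, words: list[str], n: int) -> Summary:
        ranked = sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))
        return Summary(file_name, len(words), ranked[:n])


class FormattingEngine:
    """Simplified: picks the format from the extension, one line per summary."""

    def format(self, output_path: str, summaries: list[Summary]) -> str:
        lines = []
        for s in summaries:
            if output_path.lower().endswith(".csv"):
                fields = [s.file_name, str(s.word_count)]
                for word, count in s.frequent:
                    fields += [word, str(count)]
                lines.append(",".join(fields))
            else:
                pairs = ", ".join(f"{w} ({c})" for w, c in s.frequent)
                lines.append(f"File {s.file_name} contains {s.word_count} words. "
                             f"Frequent words are: {pairs}.")
        return "\n".join(lines) + "\n"


class WordStatsManager:
    """Orchestrates UC1; knows nothing about argparse, HTTP, or real files."""

    def __init__(self, file_access, tokeniser, summariser, formatter) -> None:
        self._files = file_access
        self._tokeniser = tokeniser
        self._summariser = summariser
        self._formatter = formatter

    def compute_summary_statistics(self, inputs: list[str], n: int,
                                   outputs: list[str]) -> None:
        summaries = []
        for path in inputs:  # read, tokenise, summarise each input file
            words = self._tokeniser.tokenise(self._files.read(path))
            summaries.append(self._summariser.summarise(path, words, n))
        for out in outputs:  # format and write each requested output
            self._files.write(out, self._formatter.format(out, summaries))
```

The Console App (not shown) would parse the options and call `compute_summary_statistics`. A REST handler could call the same method with values taken from a request.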

Scope

Your implementation must respect the boundaries defined by the decomposition and include classes for each of the elements in this decomposition.

The implementation must:

• run on Python 3.10,

• correctly implement the functions described in this document,

• function correctly with any given plain text file (you can assume that the entire content of the file fits into main memory), and

• include a comprehensive unit test suite using pytest.
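As an illustration of pytest conventions, the sketch below tests a File Access class. pytest collects functions whose names start with `test_` and supplies its built-in `tmp_path` fixture, a fresh temporary directory given as a `pathlib.Path`, to any test that declares that parameter. The `FileAccess` class, its `read`/`write` methods, and the choice of UTF-8 encoding are hypothetical assumptions. A missing-file test could additionally use `pytest.raises(FileNotFoundError)`.

```python
from pathlib import Path


class FileAccess:
    """Hypothetical File Access element: the only class that touches the file system."""

    def read(self, path: str) -> str:
        # The brief allows assuming each input file fits into main memory.
        return Path(path).read_text(encoding="utf-8")  # UTF-8 is an assumption

    def write(self, path: str, content: str) -> None:
        Path(path).write_text(content, encoding="utf-8")


# Tests, e.g. in test_file_access.py; pytest injects tmp_path per test.

def test_write_then_read_round_trips(tmp_path: Path) -> None:
    target = tmp_path / "out2.txt"
    FileAccess().write(str(target), "line one\nline two\n")
    assert FileAccess().read(str(target)) == "line one\nline two\n"


def test_write_replaces_previous_content(tmp_path: Path) -> None:
    target = tmp_path / "out1.csv"
    access = FileAccess()
    access.write(str(target), "old")
    access.write(str(target), "new")
    assert access.read(str(target)) == "new"
```

Because the other elements receive File Access as a dependency, their tests can use an in-memory fake instead, so only this class needs real temporary files.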

Focus your attention on the quality of your code.
