
Assignment 1: Kaggle and the Catalogue of US Cybersecurity Breaches


Assignment 1 2023

Submission deadline: 11:59pm, 20 April 2023.
To be done individually.

This assignment will involve creating a Shell script, which will use Unix tools and/or call other Shell scripts. The top-level script has been given a name. Please make sure you use the specified name, as that is the name which the testing software will use to test your script.

You need to package the script or scripts into a single submission consisting of a zip file, and submit the zip file via cssubmit. No other method of submission will be accepted.

Kaggle and the Catalogue of US Cybersecurity Breaches

Kaggle is a remarkable web-based data science resource which contains a huge number of different data sets and tutorials on tools. (Highly recommended.) One particular data set is the Catalogue of Cyber Security Breaches (US data, so the states listed are the two-letter US state codes).

I have downloaded the file for you and done some data cleaning, e.g. filling in missing values in the year column from the date of breach column. You can find the tidied file as a tab-separated, i.e. .tsv, file Cyber_Security_Breaches.tsv. This is similar to the more familiar .csv file, except that a <TAB> character is used to separate items in a line, rather than a comma, which means it is much easier to deal with data that has embedded commas.

Before you go any further, I suggest you download Cyber_Security_Breaches.tsv and have a look at the data found in each column. (Actually, this is good general advice: each time you start work with a new data set, have a look at the data, in part or whole, first, to get an idea of what is expected in each column. What usually happens next is a data-cleaning step, but that has been done for you here.)
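One quick way to have that first look is to list the column names with their positions. The following is only a sketch: the function name list_columns is made up for illustration, and the demo file uses invented column names and rows rather than the real contents of Cyber_Security_Breaches.tsv.

```shell
#!/bin/sh
# Print each column name from the header row of a tab-separated file,
# prefixed by its 1-based position.
list_columns() {
    awk -F '\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$1"
}

# Tiny demo file with hypothetical columns (not the real data set).
demo_file=$(mktemp) || exit 1
printf 'Name\tState\tYear\n' > "$demo_file"
printf 'Some Clinic\tTX\t2010\n' >> "$demo_file"

list_columns "$demo_file"
rm -f "$demo_file"
```

Running the same function on the real file would show which column numbers hold the state and the year, which later commands such as cut or awk need.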

The top-level program must be called cyber_breaches, and should expect two arguments each time it is called: the name of the .tsv data file being used for this analysis, followed by a command.
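A check on the number of arguments could look something like the sketch below. The function name check_arg_count and the wording of the usage message are assumptions for illustration, not part of the specification.

```shell
#!/bin/sh
# Succeed only when exactly two arguments were supplied;
# otherwise print a usage message to stderr and fail.
check_arg_count() {
    if [ "$#" -ne 2 ]; then
        echo "Usage: cyber_breaches <data-file> <command>" >&2
        return 1
    fi
    return 0
}

# In a real script this would typically be: check_arg_count "$@" || exit 1
check_arg_count Cyber_Security_Breaches.tsv maxstate && echo "argument count ok"
```

Sending the usage message to stderr keeps it separate from the normal results on stdout.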

cyber_breaches commands

There are four sorts of commands, one of which must appear as the second argument for each call to cyber_breaches.

  • maxstate
    The program should report the code for the state that has the largest number of incidents across all years, and the corresponding count. If there is more than one such state, just report one of them.
  • maxyear
    Report the year with the greatest number of incidents across all the states, and the corresponding count. If there is more than one such year, just report one of them.
  • A two-letter state code
    For the named state, report the year with the maximum number of incidents, and the count. (If more than one, any one of them.)
  • A four-digit year
    For the named year, report the state with the maximum number of incidents for that year, and the count. (If more than one, any one of them.)
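One way to tell the four sorts of command apart is a case statement with shell patterns. This is a sketch only: the function name classify_command and the labels it prints are hypothetical, and it assumes state codes are given in upper case.

```shell
#!/bin/sh
# Print a label describing which sort of command the argument is.
classify_command() {
    case "$1" in
        maxstate|maxyear)      echo "$1" ;;       # the two max commands
        max*)                  echo "badmax" ;;   # looks like a max command, but is not one
        [[:upper:]][[:upper:]]) echo "state" ;;   # two upper-case letters
        [[:digit:]][[:digit:]][[:digit:]][[:digit:]]) echo "year" ;;  # four digits
        *)                     echo "unknown" ;;
    esac
}

for cmd in maxstate TX 2010 maxnear hello; do
    echo "$cmd -> $(classify_command "$cmd")"
done
```

The order of the patterns matters: the exact max commands are matched before the broader max* pattern, so only misspellings fall through to it.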

Some sample queries

Here is a sample session:

% cyber_breaches Cyber_Security_Breaches.tsv maxstate
State with greatest number of incidents is: CA with count 113

% cyber_breaches Cyber_Security_Breaches.tsv maxyear
Year with greatest number of incidents is: 2013 with count 254

% cyber_breaches Cyber_Security_Breaches.tsv 2010
State with greatest number of incidents for 2010 is in TX with count 18

% cyber_breaches Cyber_Security_Breaches.tsv TX
Year with greatest number of incidents for TX is in 2010 with count 18

% cyber_breaches Cyber_Security_Breaches.tsv maxnear
The max commands are either maxstate or maxyear
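Each of the valid queries in the session amounts to counting how often each value occurs in one column, optionally restricted to rows where another column has a given value, and then reporting the most frequent value. The sketch below shows one possible way to do that with awk. The function name count_max is made up, the demo file is invented, and the column numbers used (1 for state, 2 for year) are assumptions that will not match the real file.

```shell
#!/bin/sh
# count_max FILE COUNT_COL [FILTER_COL FILTER_VALUE]
# Prints "value count" for the most frequent value in COUNT_COL,
# considering only rows whose FILTER_COL equals FILTER_VALUE when given.
# On a tie, the value that reached the top count first is reported.
count_max() {
    awk -F '\t' -v col="$2" -v fcol="${3:-0}" -v fval="${4:-}" '
        NR == 1 { next }                      # skip the header row
        fcol > 0 && $fcol != fval { next }    # apply the optional filter
        {
            count[$col]++
            if (count[$col] > best) { best = count[$col]; key = $col }
        }
        END { if (best > 0) print key, best } # print nothing if no row matched
    ' "$1"
}

# Invented demo data: column 1 is the state, column 2 is the year.
demo_file=$(mktemp) || exit 1
printf 'State\tYear\n'  > "$demo_file"
printf 'TX\t2010\nTX\t2010\nCA\t2010\n' >> "$demo_file"
printf 'CA\t2011\nCA\t2011\nCA\t2012\n' >> "$demo_file"

count_max "$demo_file" 1           # prints: CA 4
count_max "$demo_file" 2           # prints: 2010 3
count_max "$demo_file" 2 1 TX      # prints: 2010 2
count_max "$demo_file" 1 2 2010    # prints: TX 2
rm -f "$demo_file"
```

A pipeline of cut, sort, uniq -c and sort -n would be an equally reasonable approach; awk is used here only because it handles the filter and the count in one pass.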

Your submission will be tested automatically against a range of seen and unseen examples. However, a human marker will be assessing your program's outputs for the range of tests, which means that the output format your program uses does not much matter for the auto-testing. Even so, do be aware of the readability of your code and the program outputs. (See below for discussion of Style.)

Marking criteria

The program will be marked out of 20. Marking of programs will primarily be on the basis of how the programs deal with different types of input, both input that conforms to expectations (similar to the examples) and error-state input that anti-bugging should catch. For example, typos, in this case of a file name, are common. Your program should deal gracefully with all error states. First of all, by catching error inputs, you ensure that ridiculous output does not result from erroneous input. In short, stop silly things from happening. Beyond that, error messages need to be as informative as possible, so users know what went wrong. You therefore need to consider the ways users' inputs may not conform to what your system is expecting and add testing to catch those issues.
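As an illustration of this kind of anti-bugging, the sketch below checks the data file before any analysis is attempted and reports a different message for each problem. The function name check_data_file and the message wording are hypothetical, and the list of checks is not claimed to be complete.

```shell
#!/bin/sh
# Succeed only if the argument names an existing, regular, readable,
# non-empty file; otherwise explain the problem on stderr and fail.
check_data_file() {
    if [ ! -e "$1" ]; then
        echo "Error: data file '$1' does not exist" >&2
        return 1
    fi
    if [ ! -f "$1" ]; then
        echo "Error: '$1' is not a regular file" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "Error: data file '$1' is not readable" >&2
        return 1
    fi
    if [ ! -s "$1" ]; then
        echo "Error: data file '$1' is empty" >&2
        return 1
    fi
    return 0
}

# A mistyped file name is caught before any analysis is attempted.
check_data_file /no/such/file.tsv || echo "stopped before doing any analysis"
```

Separate messages cost a few extra lines, but they tell the user exactly which assumption failed.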

The remaining 20% will be for style/maintainability. Programs are written as much for humans as for computers. As such, it is important that your code be readable and maintainable. Similarly, outputs should aim to be informative (but never verbose).

Style Rubric

Much of this has been discussed in classes, but includes comments, meaningful variable names for significant variables (i.e. not throw-away variables such as loop variables), and sensible anti-bugging. It also includes making sure your program removes any temporary files that were created along the way.
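For the temporary-file point, one common pattern is to create the file with mktemp and register its removal with trap, so the cleanup happens however the code finishes. The sketch below is an assumption about how that might look; the function name count_data_rows and the variable work_file are made up for illustration.

```shell
#!/bin/sh
# Count the data rows (everything after the header) of a file, using a
# temporary file along the way. The function body is in parentheses, so it
# runs in a subshell and the EXIT trap fires as soon as the function ends.
count_data_rows() (
    work_file=$(mktemp) || exit 1
    trap 'rm -f "$work_file"' EXIT       # remove the temporary file on exit

    tail -n +2 "$1" > "$work_file"       # drop the header row
    awk 'END { print NR }' "$work_file"  # number of remaining rows
)

# Demo with an invented three-line file: one header and two data rows.
demo_file=$(mktemp) || exit 1
printf 'State\tYear\nTX\t2010\nCA\t2011\n' > "$demo_file"
count_data_rows "$demo_file"             # prints: 2
rm -f "$demo_file"
```

In a top-level script the same two lines (mktemp followed by trap ... EXIT) can sit near the start, so the file is removed when the script exits, including when it exits early because of an error.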

For the style/maintainability mark, the rubric is:

0 ≤ x < 1    Gibberish, impossible to understand
1 ≤ x < 2    Style is really poor, but can see where the train of thought may be heading
2 ≤ x < 3    Style is acceptable with some lapses
3 ≤ x < 4    Style is good or very good, with small lapses
4            Excellent style, really easy to read and follow

Note: Automated testing is being used so that all submitted programs are tested the same way. Sometimes it happens that there is one mistake in a program that means that no tests are passed. If the marker is able to spot the cause and fix it readily, then they are allowed to do that and your (now fixed) program will score whatever it scores from the tests, minus 2 marks, because other students will not have had the benefit of marker intervention. That's way better than getting zero for the run-time tests, right? (On the other hand, if the bug is too hard to fix, the marker needs to move on to other submissions.)
