DTS303TC Big Data Security and Analytics
School of AI and Advanced Computing
Assessment 2 – Project
Wednesday, November 1st, 2023, 23:59 (China Time, GMT+8)
Coursework 2 – Project
Submission deadline: 23:59, November 1st, 2023
Percentage in final mark: 60%
Learning outcomes assessed: C, D
Individual/Group: Individual
__________________________________________________________________
PART I: Data Cryptography and Access Control (20%)
Cryptography includes a set of techniques for scrambling or disguising data so that it is available only to someone who can restore the data to its original form. In current computer systems, cryptography provides a strong, economical basis for keeping data secret and for verifying data integrity. Please answer the following questions:
Question 1: (5 marks)
Perform some research and discuss the cryptosystems and encryption schemes used to secure the following applications.
(i) Privacy Enhanced Mail (PEM)
(ii) Secure Electronic Transactions (SET)
(iii) Secure Sockets Layer (SSL)
Note: Each answer only requires one or two sentences.
Question 2: (5 marks)
Perform some research and discuss the following criteria used to evaluate biometric data in access control systems.
(i) False reject rate
(ii) False accept rate
(iii) Crossover error rate
Note: Each answer only requires one or two sentences.
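The three criteria can be made concrete with a small calculation. The false reject rate (FRR) is the fraction of genuine attempts that are wrongly rejected, the false accept rate (FAR) is the fraction of impostor attempts that are wrongly accepted, and the crossover error rate (CER, also called the equal error rate) is the error level at the threshold where FAR and FRR are equal. The sketch below is plain Python for illustration; it assumes higher scores mean "more similar" and that an attempt is accepted when its score is at or above the threshold. The function names and the score lists are hypothetical, not taken from any dataset.

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR at one threshold; an attempt is accepted when score >= threshold."""
    frr = sum(score < threshold for score in genuine) / len(genuine)     # genuine users wrongly rejected
    far = sum(score >= threshold for score in impostor) / len(impostor)  # impostors wrongly accepted
    return far, frr


def crossover_error_rate(genuine, impostor):
    """Try every observed score as a threshold; return (threshold, CER) where FAR and FRR are closest."""
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, threshold, (far + frr) / 2)  # report the mean of FAR and FRR as the CER
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical similarity scores, for illustration only.
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.1, 0.2, 0.3, 0.75]
    print(far_frr(genuine, impostor, 0.5))           # (0.25, 0.25)
    print(crossover_error_rate(genuine, impostor))   # (0.7, 0.25)
```

Raising the threshold lowers FAR but raises FRR, so the CER gives a single threshold-independent figure for comparing systems: the lower the CER, the more accurate the system.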
Question 3: (5 marks)
Decipher the following ciphertext which was encrypted with the Caesar cipher.
TEBKFKQEBZLROPBLCERJXKBSBKQP
What is the most likely plaintext? Show your reasoning on how you arrive at the answer.
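Because a Caesar cipher has only 26 possible shifts, one way to show the reasoning is an exhaustive (brute-force) search: decrypt with every shift and pick the candidate that reads as English. The following is a minimal sketch of that approach, assuming the ciphertext uses only the letters A to Z; the helper names `caesar_shift` and `all_candidates` are hypothetical.

```python
from string import ascii_uppercase


def caesar_shift(text, shift):
    """Move every A-Z letter back by `shift` positions (mod 26); other characters are kept."""
    return "".join(
        chr((ord(ch) - ord("A") - shift) % 26 + ord("A")) if ch in ascii_uppercase else ch
        for ch in text.upper()
    )


def all_candidates(ciphertext):
    """Brute force: return (shift, candidate plaintext) for all 26 possible keys."""
    return [(shift, caesar_shift(ciphertext, shift)) for shift in range(26)]


if __name__ == "__main__":
    ciphertext = "TEBKFKQEBZLROPBLCERJXKBSBKQP"
    for shift, candidate in all_candidates(ciphertext):
        print(f"{shift:2d}  {candidate}")
```

Only one of the 26 printed lines should form recognizable English words; that line gives the plaintext, and its shift gives the key. Letter-frequency analysis (for example, assuming the most frequent ciphertext letter stands for E) is a quicker alternative, but on a text this short it is less reliable than trying every shift.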
Question 4: (5 marks)
Decipher the following ciphertext which was encrypted with the Vigenere cipher.
TSMVM MPPCW CZUGX HPECP RFAUE IOBQW PPIMS FXIPC TSQPK SZNUL
OPACR DDPKT SLVFW ELTKR GHIZS FNIDF ARMUE NOSKR GDIPH WSGVL
EDMCM SMWKP IYOJS TLVFA HPBJI RAQIW HLDGA IYOUX
What is the key and the most likely plaintext? Show your reasoning on how you arrive at the answer.
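A common line of reasoning for a Vigenère ciphertext has two steps. First, estimate the key length with the index of coincidence (IoC): English text has an IoC of about 0.066, while uniformly random letters give about 0.038. Splitting the ciphertext into k columns (every k-th letter) pushes the average column IoC towards the English value when k matches the key length. Second, each column is then a Caesar cipher, which can be solved by choosing the shift whose decryption best matches English letter frequencies (lowest chi-squared). The sketch below assumes approximate published English frequencies and hypothetical helper names. With only about 140 letters, some columns may be too short for the frequency step to pick the right shift, so check the output by reading it and adjust individual key letters if needed. Kasiski examination (distances between repeated trigrams) is an alternative way to estimate the key length.

```python
from collections import Counter
from string import ascii_uppercase

# Approximate relative frequencies of A..Z in English text.
ENGLISH_FREQ = [
    0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
    0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
    0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
    0.00978, 0.02360, 0.00150, 0.01974, 0.00074,
]


def clean(text):
    """Keep only the letters A-Z, dropping the five-letter group spacing."""
    return "".join(ch for ch in text.upper() if ch in ascii_uppercase)


def index_of_coincidence(text):
    """Probability that two letters drawn at random (without replacement) from text match."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def ioc_table(text, max_key_len=12):
    """Average IoC of the columns text[i::k] for each candidate key length k."""
    table = []
    for k in range(1, max_key_len + 1):
        columns = [text[i::k] for i in range(k)]
        table.append((k, sum(index_of_coincidence(c) for c in columns) / k))
    return table


def best_shift(column):
    """Caesar shift whose decryption of the column has the lowest chi-squared against English."""
    n = len(column)
    if n == 0:
        return 0
    best, best_score = 0, float("inf")
    for shift in range(26):
        counts = Counter((ord(ch) - ord("A") - shift) % 26 for ch in column)
        score = sum((counts[i] - n * f) ** 2 / (n * f) for i, f in enumerate(ENGLISH_FREQ))
        if score < best_score:
            best, best_score = shift, score
    return best


def recover_key(text, key_len):
    """Treat every key_len-th letter as one Caesar cipher and solve each column separately."""
    return "".join(chr(best_shift(text[i::key_len]) + ord("A")) for i in range(key_len))


def vigenere_decrypt(text, key):
    """Subtract the repeating key letter from each ciphertext letter, mod 26."""
    return "".join(
        chr((ord(ch) - ord(key[i % len(key)])) % 26 + ord("A"))
        for i, ch in enumerate(text)
    )


if __name__ == "__main__":
    ciphertext = clean("""
        TSMVM MPPCW CZUGX HPECP RFAUE IOBQW PPIMS FXIPC TSQPK SZNUL
        OPACR DDPKT SLVFW ELTKR GHIZS FNIDF ARMUE NOSKR GDIPH WSGVL
        EDMCM SMWKP IYOJS TLVFA HPBJI RAQIW HLDGA IYOUX
    """)
    # Look for the smallest k whose average IoC jumps towards 0.066;
    # multiples of the true key length also score high.
    for k, ioc in ioc_table(ciphertext):
        key = recover_key(ciphertext, k)
        print(f"k={k:2d}  IoC={ioc:.4f}  key={key:<12}  {vigenere_decrypt(ciphertext, key)[:40]}")
```

Printing the IoC, the recovered key and the start of the decryption for each candidate length lets the table itself serve as the evidence for the chosen key.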
PART II: Big Data Analytics for Information Security (80%)
Task Summary
Big data analytics for security is a rising trend that is helping security analysts and tool vendors do much more with data. Machine learning techniques can help security systems identify patterns and threats without prior definitions, rules or attack signatures, potentially with much higher accuracy. However, to be effective, machine learning needs very large volumes of data. The challenge is storing far more data than ever before, analyzing it in a timely manner, and extracting new insights. An organization that uses security and analytics tools can detect potential threats before they affect the company's assets and infrastructure. Access control, which grants access only to legitimate users, is an important tool for organizations to manage information security. In this section, we will focus on using biometrics for access control and information security.
Conduct a Big data science study in the security domain, for example, biometrics using fingerprint, face, iris or other modalities. Other examples in the security domain include fraud analytics, intrusion detection, etc. Write an individual report on your Big data security and analytics project. The report should be written in a clear and concise manner (and be no more than 2000 words in length). You should start by exploring a biometric modality that interests you. You need to identify a compact dataset (structured or unstructured) in your chosen modality or modalities, with a reasonably large size and number of attributes/variables, that can be used for the assessment. Your report should include the background of the chosen modality or modalities, the data analytics problem you attempt to solve, the aims and objectives and the significance of your study, and it should describe your analytics approach, including the statistical method(s) and/or machine learning technique(s) you used to address the problem. You are also required to submit an individual recorded video presentation to Mediasite or another platform, which will be announced before the submission date.
Context
In recent years, information security has taken center stage in the personal and professional lives of the majority of the global population. Data breaches are a daily occurrence, and intelligent adversaries target consumers, corporations, and governments with practically no fear of being detected or facing consequences for their actions. This is all occurring while the systems, networks, and applications that form the backbone of commerce and critical infrastructure are growing ever more complex, interconnected, and unwieldy. Defenses built solely on the elements of faith-based security (unaided intuition and “best” practices) are no longer sufficient. The rising trend is for organizations to adopt the proven tools and techniques used in other disciplines and take an evolutionary step into Data-Driven Security.
By completing this assessment item, you will gain knowledge of information security and data analytics, together with the Python programming skills needed to analyse data from a security domain. You will also develop the presentation skills needed to present your analysis and results to your audience in your report and recorded video. This assessment will prepare you to address a Big data security and analytics/science problem in the real world.
Task Instructions
(1) Write a short individual project proposal describing your Big data security and analytics project. Your project proposal should be written in a clear and concise manner (no more than 500 words or one A4 page). Start by exploring an area or domain in biometrics that interests you. The project topic can be chosen from your target modality, e.g., fingerprint, iris, face, palm print, etc. Show and discuss your proposal with the Teaching Assistant (TA) during the laboratory sessions. Please note that no marks will be given for this short proposal. However, it should serve as your first document for planning your Big data security and analytics project.
(2) Write a report on your Big data security and analytics project. The report should be written in a clear and concise manner (and be no more than 2000 words in length). Your final report should be detailed and address the following areas:
- Clearly define the problem in your Big data security and analytics project.
- Describe the significance of your Big data security and analytics project in the chosen domain or area.
- Identify a compact dataset (structured or unstructured) with a reasonably large size and number of attributes/variables. Some examples are shown in the table below.
Note 1: On the one hand, students aiming for “Excellent” or “Very Good” grades should pay attention to the complexity of the selected security dataset and use advanced approaches/steps to perform the analytics. For example, students could demonstrate individual modality performance for palm print and knuckle print, and then show that a combined multimodal (palm print and knuckle print) approach could give higher performance. On the other hand, standard and/or conventional approaches/steps for a single-modality solution would likely be awarded an “Adequate”, “Competent” or “Comprehensive” grade.
Security Domain | Dataset
Fraud | https://www.kaggle.com/datasets/kartik2112/fraud-detection
Palm print and knuckle print | https://www.kaggle.com/datasets/michaelgoh/contactless-knuckle-palm-print-and-vein-dataset
Fingerprint | https://www.kaggle.com/datasets/ruizgara/socofing
Hand tremor | https://www.kaggle.com/datasets/hakmesyo/hand-tremor-dataset-for-biometric-recognition
Iris | https://www.kaggle.com/datasets/naureenmohammad/mmu-iris-dataset
- Highlight the project aim and objectives.
- Discuss the background of your chosen topic in the domain or area.
- Include evidence, such as tables, graphs and plots produced by your program code, to support your results.
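Note 1 mentions showing that a combined palm print and knuckle print approach could outperform each modality alone. One common way to combine modalities, offered here only as an example and not prescribed by this brief, is score-level fusion: normalize each modality's match scores to a common range, then take a weighted sum. The sketch below is plain Python for illustration; the function names, the equal default weight and the score values are hypothetical, and in the project itself the analysis would be carried out with the packages studied in the lab (e.g., pyspark).

```python
def min_max_normalise(scores):
    """Rescale a list of match scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def sum_rule_fusion(palm_scores, knuckle_scores, weight=0.5):
    """Weighted-sum fusion of two modalities' scores for the same list of comparisons."""
    palm = min_max_normalise(palm_scores)
    knuckle = min_max_normalise(knuckle_scores)
    return [weight * p + (1 - weight) * k for p, k in zip(palm, knuckle)]


if __name__ == "__main__":
    # Hypothetical scores for three comparisons, each scored by both modalities.
    print(sum_rule_fusion([0, 5, 10], [2, 4, 6]))  # [0.0, 0.5, 1.0]
```

The fused scores can then be evaluated with the same measures (FAR, FRR and crossover error rate) as each single modality, so the comparison between unimodal and multimodal performance is like-for-like. In practice, the normalization range and the weight would be chosen on training data rather than on the test scores.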
(3) Prepare and record a short individual presentation (5 minutes) to introduce and explain your Big data security and analytics project and its significance. Your presentation should state the data science question or problem and describe your analytics approach and the statistical and/or machine learning method(s) you used to address it. Present and discuss the results of your analysis, and provide evidence (screenshots) from your program code to support the results. Your presentation should be clear, use no more than eight PowerPoint slides, and take no more than 5 minutes to go through. Your video presentation file must not exceed 50 MB.
Note: Students MUST use the tools and software packages in the lab sessions to support their data analytics involving practical scenarios.
Additionally, your final report should:
- be clearly structured (with well-organised content); and
- use the APA referencing style and include a reference list at the end.
For this assessment item, you are required to write programs in Python, using the software packages from your lab sessions, to analyse your data. You are also required to submit the program source code with the final report. Your source code should:
- be written in Python;
- use the packages studied in the lab (e.g., pyspark) for analysis, not external packages such as pandas, numpy, seaborn and sklearn;
- use visualization tools (e.g., Excel, Matplotlib) purely for display, not for analysis;
- be well commented, both in the main program and in each individual module, such as a function module; and
- be free of errors, such as syntax errors and runtime errors.
Report Format
- Cover Page: This should include the Assessment Number, Assessment Title, Student Name, Student ID and Student Email.
- Body of the report: This should include all the relevant section headings to address each aspect as indicated/highlighted in the question and the marking rubric.
- References: Both your in-text citations and the entries in the ‘References’ section at the end of the report should adhere to the APA style.
- Glossary (Optional): This should include any terms frequently used in the report.
The following points are a general guide for the presentation of assessment items:
- Type your assessment items;
- Use single spacing;
- Use a wide left margin (as markers need space to include their comments);
- Use a standard 12-point font, such as Times New Roman, Calibri or Arial;
- Left-justify body text;
- Number your pages (except the cover page);
- Insert a header or footer that details your name and student number on each page;
- Always keep a copy (both hard and electronic) of your assessments; and
- Most importantly, always run a spelling and grammar check; however, remember that such checks may not pick up all errors. You should still edit your work manually and carefully.