INFS4203/7203 Data Mining - Project Phase II: Implementation

INFS4203/7203 Project Phase II (20 marks) Semester 2, 2024

If an assignment is not submitted correctly before the due time, a penalty will be applied according to the ECP. It is your responsibility to ensure that your submission succeeds before the due time. Email submissions will not be accepted.

Overview

In Phase II, you will implement the proposal you submitted in Phase I, with any necessary adjustments based on empirical performance and the feedback on your proposal. This is an individual assignment. Your work should be based on your own design and the feedback on your proposal.

Track 1: Data-oriented project

In Phase II, you will be provided with the test data, named test_data.csv. The first row gives the feature names. Apart from the first row, each row in the data file corresponds to one data point. There are 817 test data points in this file, and each column represents the same feature as in the training data DM_project_24.csv. Note that the test data has only 105 columns and no labels, i.e., it lacks the final column “Target (Column 106)” of the training data DM_project_24.csv. Labels for the test data will not be released and will be used by the teaching team for marking only.

In this phase, you will need to implement the ideas in your proposal and classify the test data. In the marking phase, the F1 score on the test data will be used for marking. When calculating F1, “1” is counted as the positive label and “0” as the negative label. You need to submit:

  • A result report on
    o Test result: the prediction on test data (in integer type) and
    o Evaluation result: the evaluated accuracy and F1 on the training data using cross-validation (in float type).

  • Code and readme file, which include
    o Readme file, which should include
      • Final Choices: The final pre-processing methods, classification model, and hyperparameters you used to achieve your reported test results.
      • Environment Description: A clear and thorough description of your coding environment (operating system, programming language and version, additional installed packages, etc.).
      • Reproduction Instructions: Detailed instructions on how to run the code so that your reported results for pre-processing, model selection, hyperparameter tuning, testing, and evaluation can be reproduced.
      • Additional Justifications: Any additional justifications or references for the methods you implemented.
      • The readme file can be in text format, such as .md, .docx, .pdf, or .txt.

    o Training, Evaluation, and Testing Code:
      • Training Procedures: All code related to pre-processing, training on the training data, prediction on the test data, and generation of the result report.
      • The code must include a main function in a main file (for example, main.py) to execute the overall process.
      • Please fix the random seeds to ensure your results are reproducible.
    o Pre-processing Selection, Model Selection, and Hyperparameter Tuning Procedures (how you made the final choice of the selected pre-processing, model, and hyperparameters):
      • Include the code for the detailed procedures related to pre-processing selection, model selection, and hyperparameter tuning.
      • If you need to explain the overall selection and tuning procedure further, you may do so in the readme file.
      • Please fix the random seeds to ensure your results are reproducible.
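
The evaluation result requested above (accuracy and F1 from cross-validation, with “1” as the positive label and a fixed random seed) could be computed along the following lines. This is only a minimal sketch in plain Python, not a prescribed implementation: the function names, the default of 5 folds, the seed value 42, and the `fit_predict` callable (a stand-in for whatever pre-processing and classifier you actually use) are all assumptions made for illustration. The split is a plain shuffled k-fold and is not stratified.

```python
import random
from typing import Callable, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 with label 1 as the positive class and label 0 as the negative class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    # Assumed convention: F1 is 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def k_fold_indices(n: int, k: int, seed: int) -> List[List[int]]:
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # fixed seed -> same folds every run
    return [indices[i::k] for i in range(k)]


def cross_validate(
    X: Sequence,
    y: Sequence[int],
    fit_predict: Callable[[list, list, list], List[int]],
    k: int = 5,
    seed: int = 42,
) -> Tuple[float, float]:
    """Return (mean accuracy, mean F1) over k held-out folds.

    fit_predict(train_X, train_y, test_X) is a hypothetical hook that trains
    on the training part and returns 0/1 predictions for test_X.
    """
    accs, f1s = [], []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in held_out]
        )
        truth = [y[i] for i in held_out]
        accs.append(accuracy(truth, preds))
        f1s.append(f1_positive(truth, preds))
    return sum(accs) / k, sum(f1s) / k
```

Because the folds depend only on `n`, `k`, and `seed`, rerunning the procedure gives the same split, which is one way to meet the reproducibility requirement for the evaluation step.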

Additional requirements

  • Please include the provided training and test files in your submitted .zip file so that your results can be reproduced in the marking phase. The generated result report file (the same as the one submitted) should be in the root directory.

  • Any programming language can be used. However, if you use Python, please submit .py files rather than files in any other format. If you use Jupyter Notebook or Colab, please submit a .py file instead of an .ipynb file.

  • Please submit your best prediction according to your cross-validation results. If multiple test results are submitted, they will not be marked.

Format for the Result Report

The result report should be named sxxxxxxx.infs4203 (sxxxxxxx, an s followed by seven digits, is your student username) and submitted with the same Submission Title through the “Report Submission” Turnitin link provided. For example, if your student username is s1234567, the result report should be named s1234567.infs4203 and submitted with the same Submission Title.

The result report should consist of 818 rows. For the first 817 rows, the i-th row gives the prediction for the i-th test instance, either 1 or 0 (in integer type). The last row (row 818) gives the accuracy (first column, rounded to three decimal places) and F1 (second column, rounded to three decimal places) that you evaluated yourself through cross-validation on the training data, both in float type.

  • Please separate the values in each row and column with commas, and ensure each row ends with a comma.

  • You can refer to result_report_example.infs4203, which provides an example of the result report (Note: this is NOT the ground truth).
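
The row layout described above could be produced with a small helper such as the one below. This is a sketch under stated assumptions rather than an official writer: the function name `format_result_report`, the `expected_rows` parameter, and the trailing newline at the end of the file are my own choices, and the exact layout should be checked against result_report_example.infs4203, which is not reproduced here.

```python
from typing import Sequence


def format_result_report(
    predictions: Sequence[int],
    accuracy: float,
    f1: float,
    expected_rows: int = 817,
) -> str:
    """Build the report text: one prediction per row, then accuracy and F1.

    Every row ends with a comma, as the specification requires.
    """
    if len(predictions) != expected_rows:
        raise ValueError(
            f"expected {expected_rows} predictions, got {len(predictions)}"
        )
    if any(p not in (0, 1) for p in predictions):
        raise ValueError("predictions must be the integers 0 or 1")

    rows = [f"{int(p)}," for p in predictions]
    # Final row: cross-validated accuracy and F1, three decimal places each.
    rows.append(f"{accuracy:.3f},{f1:.3f},")
    # Assumption: the file ends with a newline after the last row.
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    demo = format_result_report([1, 0, 1], 0.9123, 0.7456, expected_rows=3)
    print(demo, end="")
```

In a real run, the returned string would be written to a file named after your student username, for example with `open("s1234567.infs4203", "w")`, using the example username from the text.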

Note that a result report submitted in any other form or under any other name will not be accepted or marked.

Format for the Code and Readme File

  • Together with the result report, you need to submit a readme file and all your code.

  • The readme file and your code should be compressed into one zip file named sxxxxxxx.zip (sxxxxxxx is your student username) and submitted with the same Submission Title through the “Readme and code submission” Turnitin link provided.

Note that code and readme files submitted in any other form or under any other name will not be accepted or marked.

We recommend that you follow the Google Style Guides (https://google.github.io/styleguide/) for programming style. Following this style is not mandatory for this assignment, but using it may benefit your future career as a data scientist!

Submission

Only your last submitted version will be marked. All required files must be submitted before the due time. Otherwise, a penalty will be applied according to the ECP, i.e.,

A penalty of 10% of the maximum possible mark will be deducted per 24 hours from the time the submission is due, for up to 7 days. After 7 days, you will receive a mark of 0.
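
As a worked illustration of the late penalty rule, the sketch below computes the deduction for a 20-mark assessment. It rests on an assumption that the text does not confirm: every started 24-hour period after the deadline counts as a full period. The function names are hypothetical, and the ECP remains the authoritative source.

```python
import math


def late_penalty(hours_late: float, max_mark: float = 20.0) -> float:
    """Marks deducted for a submission that is hours_late past the deadline.

    Assumption: each started 24-hour period counts as one full period.
    """
    if hours_late <= 0:
        return 0.0
    if hours_late > 7 * 24:
        # More than 7 days late: the whole mark is lost.
        return max_mark
    periods = math.ceil(hours_late / 24)
    return min(0.10 * max_mark * periods, max_mark)


def final_mark(raw_mark: float, hours_late: float, max_mark: float = 20.0) -> float:
    """Raw mark minus the late penalty, never below zero."""
    return max(raw_mark - late_penalty(hours_late, max_mark), 0.0)


if __name__ == "__main__":
    for hours in (0, 1, 25, 200):
        print(hours, final_mark(15.0, hours))
```

Under this reading, a submission 25 hours late falls into the second 24-hour period, so 20% of the maximum mark (4 of 20 marks) is deducted.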

  • The result report should be submitted through the “Report submission” Turnitin link provided on Blackboard -> Assessment -> Project Phase II -> Report submission before the deadline, with the Submission Title sxxxxxxx.infs4203.

  • The compressed file of readme and code should be submitted through the “Readme and code submission” Turnitin link provided on Blackboard -> Assessment -> Project Phase II -> Readme and code submission before the deadline, with the Submission Title sxxxxxxx.zip.

Marking standard

Submissions satisfying the following five conditions will be accepted and marked:

  1. The selected best pre-processing, model and hyperparameter can be reproduced from the submitted readme file and codes.

  2. The classifiers used to do classification can be reproduced by the submitted readme file and codes.

  3. The classifiers are generated by using only techniques delivered in INFS4203/7203 lectures.

  4. The test and evaluation results can be reproduced by the submitted readme file and codes.

  5. The test and evaluation results are generated by applying the learned classifiers to the data.

When the above five conditions are satisfied, the result report will be marked according to the F1 result on the test data as follows (rounded to the nearest 1st decimal place):

F1      Mark
0.30    1
0.33    2
0.36    3
0.39    4
0.42    5
0.45    6
0.48    7
0.51    8
0.54    9
0.57    10
0.60    11
0.63    12
0.66    13
0.69    14
0.72    15
0.75    16
0.76    17
0.77    18
0.78    19
0.79    20
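
For a rough self-check, the table can be encoded as a lookup. The sketch below assumes a step function, i.e., the mark is that of the highest listed F1 threshold your score reaches. The specification does not state how values between thresholds are treated or how the rounding to one decimal place is applied, so this is one possible reading only; the names `F1_MARK_TABLE` and `mark_for_f1` are hypothetical.

```python
from typing import List, Tuple

# (F1 threshold, mark) pairs copied from the marking table.
F1_MARK_TABLE: List[Tuple[float, int]] = [
    (0.30, 1), (0.33, 2), (0.36, 3), (0.39, 4), (0.42, 5),
    (0.45, 6), (0.48, 7), (0.51, 8), (0.54, 9), (0.57, 10),
    (0.60, 11), (0.63, 12), (0.66, 13), (0.69, 14), (0.72, 15),
    (0.75, 16), (0.76, 17), (0.77, 18), (0.78, 19), (0.79, 20),
]


def mark_for_f1(f1: float) -> int:
    """Mark for a test F1 score under the assumed step-function reading.

    Returns 0 when the score is below the first threshold.
    """
    mark = 0
    for threshold, table_mark in F1_MARK_TABLE:
        # Small tolerance so that, e.g., 0.30 computed in floating point
        # still reaches the 0.30 threshold.
        if f1 >= threshold - 1e-9:
            mark = table_mark
    return mark


if __name__ == "__main__":
    for score in (0.29, 0.30, 0.74, 0.75, 0.79):
        print(score, mark_for_f1(score))
```

Note that the thresholds rise in steps of 0.03 up to 0.75 and in steps of 0.01 afterwards, so the last five marks depend on small differences in test F1.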

Training time and prediction time will not be taken into account in marking.

(End of Track 1. The Track 2 specifications follow.)

Track 2: Competition-oriented project

In this phase, you need to submit:

  • A result report of the public leaderboard results, including a screenshot and a URL of the public leaderboard.

  • A readme file with a clear and thorough description of your coding environment (operating system, hardware requirements, programming language and its version, additional packages installed, etc.) and instructions on how to run the code so that your final submission to Kaggle can be reproduced.

  • Your implemented code, including training and test code, with a main function that generates the final submission to Kaggle.

Marking Standard

The following marking standard will apply, unless otherwise discussed with the teaching team for exceptionally challenging competitions.

You need to submit evidence of your achievements on the public leaderboard by the project deadline to earn your marks. Your username on the public leaderboard must be your student username (sxxxxxxx, where each x represents a digit).

If your targeted competition ends before the project deadline, you may show by cross-validation that you have achieved performance comparable to that of a particular competitor on the public leaderboard before the project deadline. Your project may then be assessed by that competitor’s rank percentage on the public leaderboard.

Your marks are determined by your public leaderboard top ranking index (your rank divided by the total number of competitors) as of the project deadline:

Earned marks = max(20 - max(public_LB_top_ranking_index - 0.4, 0) * 30, 0)

That is, you earn the full 20 marks if your public leaderboard ranking is within the top 40% of all competitors.
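
The formula above translates directly into code. The sketch below is only a worked illustration; the function names `top_ranking_index` and `earned_marks` are hypothetical, while the constants 20, 0.4, and 30 come from the formula itself.

```python
def top_ranking_index(rank: int, total_competitors: int) -> float:
    """Your rank divided by the total number of competitors."""
    if not 1 <= rank <= total_competitors:
        raise ValueError("rank must be between 1 and total_competitors")
    return rank / total_competitors


def earned_marks(public_lb_top_ranking_index: float) -> float:
    """Earned marks = max(20 - max(index - 0.4, 0) * 30, 0)."""
    return max(20 - max(public_lb_top_ranking_index - 0.4, 0) * 30, 0)


if __name__ == "__main__":
    # Rank 100 of 1000 competitors is inside the top 40%: full marks.
    print(earned_marks(top_ranking_index(100, 1000)))
    # Rank 500 of 1000 is 0.1 past the 40% cut-off: about 3 marks are lost.
    print(earned_marks(top_ranking_index(500, 1000)))
```

Each 0.1 of ranking index beyond 0.4 costs 3 marks, so an index of 0.5 yields about 17 marks (up to floating-point rounding).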

Format

The result report should be named sxxxxxxx.pdf or sxxxxxxx.doc/docx (sxxxxxxx is your student username). For example, if your student username is s1234567, the result report should be named s1234567.pdf/doc/docx.

Note that a result report submitted in any other form or under any other name will not be accepted or marked.

Together with the report, you need to submit all your code and a readme file. The readme file and your code should be compressed into one zip file named sxxxxxxx.zip (sxxxxxxx is your student username).

Note that code and readme files submitted in any other form or under any other name will not be accepted or marked.
