DTS301TC Data Mining Individual Project: COVID-19 tweet analysis

DTS301TC Data Mining

School of AI and Advanced Computing
Individual Project
Sunday, October 29th 23:59 (Beijing Time), 2023
Category A

DTS301TC Data Mining Individual Project

Deadline: Sunday, October 29th 23:59 (Beijing Time), 2023
Percentage in final mark: 60%

Learning outcomes assessed:

D. Develop skills of using recent data mining software for solving practical problems.

E. Gain experience of doing independent study and research.

Late policy: 5% of the total marks available for the assessment shall be deducted from the assessment mark for each working day after the submission date, up to a maximum of five working days.

Risks:

  • Please read the coursework instructions and requirements carefully. Not following these instructions and requirements may result in loss of marks.

  • The assignment must be submitted via Learning Mall to the correct drop box. Only electronic submission is accepted; hard copies are not accepted.

  • All students must download their file and check that it is viewable after submission. Documents may become corrupted during the uploading process (e.g. due to slow internet connections). However, students themselves are responsible for submitting a functional and correct file for assessments.

  • The Academic Integrity Policy is strictly followed.

Overview

The objective of this project is to apply data mining techniques to a real-world dataset in order to gain a better understanding of real-world data mining applications. In this project, you need to identify one appropriate data mining problem from a COVID-19 related Twitter dataset and apply data mining algorithms to extract useful information from the dataset using R or Python. In line with learning outcome E, you are expected to do some independent study and research in this individual project.

Dataset

The project uses a sample of the GeoCoV19 Twitter dataset (https://crisisnlp.qcri.org/covid19). The dataset contains a large number of geo-tagged COVID-19 tweets posted from Feb 1st to March 31st, 2020, from various locations in the United States.

The dataset is stored in a CSV file and needs to be processed with your R or Python program. Each record (row) contains information about a tweet. The columns are explained as follows.

  • tweet_id: the ID of a tweet

  • created_at: the time when a tweet is published

  • user_id: the ID of a user

  • country_code: in which country the tweet is published

  • state: in which state the tweet is published

  • text: the actual tweet message
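
To illustrate how these columns might be read and aggregated, the sketch below uses only the Python standard library on a small made-up sample. The sample rows, the `load_tweets` helper, and the `%Y-%m-%d %H:%M:%S` timestamp format are assumptions for illustration only; the real CSV may use a different timestamp format and should be inspected first.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; the real file's timestamp format is an assumption.
SAMPLE_CSV = """tweet_id,created_at,user_id,country_code,state,text
1,2020-02-01 08:15:00,u1,US,New York,Stay safe everyone #COVID19
2,2020-02-01 09:30:00,u2,US,California,Masks sold out again
3,2020-02-02 10:00:00,u1,US,New York,Working from home today
4,2020-02-02 11:45:00,u3,US,Texas,"Schools closed, kids at home"
"""


def load_tweets(handle):
    """Read tweet rows from an open CSV file into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle):
        # Parse the timestamp so tweets can later be grouped by day.
        row["created_at"] = datetime.strptime(
            row["created_at"], "%Y-%m-%d %H:%M:%S"
        )
        rows.append(row)
    return rows


tweets = load_tweets(io.StringIO(SAMPLE_CSV))

# Distinct tweets and users.
n_tweets = len({t["tweet_id"] for t in tweets})
n_users = len({t["user_id"] for t in tweets})

# Most active users, tweets per calendar day, and tweets per state.
top_users = Counter(t["user_id"] for t in tweets).most_common(10)
tweets_per_day = Counter(t["created_at"].date().isoformat() for t in tweets)
tweets_per_state = Counter(t["state"] for t in tweets)

print(n_tweets, n_users, top_users[0])
```

With the real file, an open file handle (opened with `newline=""` and a suitable encoding) would replace the in-memory sample. The resulting counters can be passed to any plotting library to draw per-day or per-state figures.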

Requirements and Tasks

You are allowed to use existing R or Python libraries to solve the following tasks. The mark breakdown for each task can be found in the DTS301TC Project Marking Criteria at the end of this document.

T1 Statistic Analysis and Data Visualization:

T1-1: Find how many different tweets and users are included in this dataset.

T1-2: Find the top 10 users who tweeted the most.

T1-3: Draw a figure to show the number of tweets posted on each day (from Feb 1st to March 31st, 2020).

T1-4: Draw a figure to show the number of tweets posted from each state.

T2 Text Data Cleaning, Pre-processing and Visualization:

T2-1: Raw tweets are highly unstructured and often contain redundant and problematic information. For instance, the links, emojis and symbols (e.g., #, @) in a tweet may not be necessary for the text mining tasks. Use R or Python to clean and pre-process raw tweets.

T2-2: Apply necessary text mining preprocessing techniques, e.g., tokenization, stemming, stop word removal, etc.

T2-3: Generate a word cloud to show the frequently used words in the COVID-19 tweet dataset. You can further pre-process the dataset based on the topic you choose in T3.
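
The cleaning and pre-processing steps in T2 could look roughly like the sketch below. It is a minimal illustration rather than a prescribed solution: the regular expressions, the tiny `STOP_WORDS` set, and the helper names `clean_tweet` and `tokenize` are all assumptions, and stemming is left out to keep the example dependency-free.

```python
import re
from collections import Counter

# Small illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {
    "a", "an", "and", "are", "at", "for", "in", "is",
    "it", "of", "on", "the", "this", "to", "we",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip links, mentions, and symbols."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = NON_LETTER_RE.sub(" ", text)  # drops emojis, digits, punctuation
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text on whitespace; drop stop words and 1-char tokens."""
    return [
        tok for tok in clean_tweet(text).split()
        if tok not in STOP_WORDS and len(tok) > 1
    ]


raw = "Stay home, stay safe! 😷 #COVID19 @CDCgov https://t.co/abc123"
cleaned = clean_tweet(raw)   # "stay home stay safe covid"
tokens = tokenize(raw)
word_freq = Counter(tokens)  # word -> count, usable as word cloud input

print("before:", raw)
print("after: ", cleaned)
```

A stemmer from a library such as NLTK could be applied to each token before counting, and a frequency table like `word_freq` is the kind of input a word cloud library typically accepts. Note that this sketch also removes digits, so `covid19` becomes `covid`; whether that is desirable depends on the topic chosen in T3.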

T3 Data Processing and Analysis:

Identify one data mining problem and use data mining algorithm(s) to extract useful information from the given dataset. You can choose your own topic. Please make sure your topic is appropriate and has some research value. Some potential topics are listed for your reference.

  • Identifying trending topics of COVID-19 on Twitter

  • Extracting tweets related to a specific topic, e.g., China, vaccine, policy, mask, etc.

  • Spatial and temporal analysis and sentiment analysis of tweets

  • Topic modeling of COVID-19 tweets

  • etc.
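
As one possible starting point for the "extracting tweets related to a specific topic" and temporal analysis ideas above, the sketch below tags tweets by keyword and counts matches per day. The keyword lists, the `topics_in` helper, and the sample data are hypothetical; a real study would need to justify its own keywords or use a learned model instead.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword lists; a real study would need to justify its own.
TOPIC_KEYWORDS = {
    "mask": {"mask", "masks", "n95"},
    "vaccine": {"vaccine", "vaccines", "vaccination"},
}


def topics_in(text):
    """Return the set of topics whose keywords occur as whole words in text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws}


# Hypothetical (date, text) pairs standing in for rows of the dataset.
sample = [
    ("2020-03-01", "Where can I buy an N95 mask?"),
    ("2020-03-01", "Hoping a vaccine arrives soon"),
    ("2020-03-02", "Masks are sold out everywhere"),
    ("2020-03-02", "Working from home again"),
]

# day -> Counter of topic mentions on that day
daily_topic_counts = defaultdict(Counter)
for day, text in sample:
    for topic in topics_in(text):
        daily_topic_counts[day][topic] += 1

for day in sorted(daily_topic_counts):
    print(day, dict(daily_topic_counts[day]))
```

Keyword matching is simple and transparent but misses paraphrases and misspellings, which is one limitation a report could discuss when comparing it with topic modeling or a trained classifier.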

Report

You need to write a report that presents all the work for this project. The report must be in English and should include the following contents:


1. Source code and results for T1 Statistic Analysis and Data Visualization. You can add one or two paragraphs to explain anything that is not obvious.

2. Source code and results for T2 Text Data Cleaning, Pre-processing and Visualization. You need to give some examples to show the tweet content before and after data pre-processing. You can also add one or two paragraphs to explain anything that is not obvious.

3. For T3 Data Processing and Analysis, you should include the following contents:

  a. Introduction: State clearly what the topic is, why you chose the topic, show the originality and significance of the topic, and discuss if there are some existing studies related to the topic.

  b. Methodology: State what data mining algorithm(s) you use to solve the problem, explain how to use it and identify the novelty of your method (if any).

  c. Experiments: Include your code and some brief explanation.

  d. Evaluation: Show all the results (e.g., tables, figures, etc.) you get from your method and give the corresponding explanation. You can also discuss the pros and cons of different models if you implemented multiple models for your topic.

  e. Conclusion: Summary of the results, list some current limitations and future directions.

  f. Reference

If you refer to any work from other sources, the original work must be cited.

Maximum 2500 words for the report excluding source code. (Clarity and brevity are valued over length.)

Submission

Electronic submission on Learning Mall is mandatory. You need to submit a zip file named IDnumber_Name_DTS301TC_Project.zip (e.g., 1900000_ZhangSan_DTS301TC_Project.zip) containing all your source code in R or Python and your report in PDF format.
