Machine Learning in Practice Assignment: ML Solutions for Misinformation Detection in Social Media

Machine Learning in Practice – 2024 S1 – Assignment

ML Solutions for Misinformation Detection in Social Media

ML Project: Jupyter Lifecycle Expedition

Introduction

Social media, particularly X (formerly known as Twitter), has revolutionized the way information spreads, but it is also an incubator for fake news and misinformation. Misinformation on X takes diverse forms and may stem from various sources, whether intentional or not, exploiting the platform's viral nature to widen its reach. As major events such as elections approach, the urgency of addressing this challenge becomes increasingly apparent. Because misinformation has no single form, there is a growing need for innovative approaches to addressing it.

Machine learning and natural language processing (NLP) offer promising solutions to identify trends and detect misinformation. However, free-text data is challenging to incorporate into classification models due to its lack of structure. To overcome this challenge, latent variable models (such as topic models) or feature generation can be used to infer intermediate representations that serve as structured input for classification tasks.
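As a minimal sketch of how a topic model can turn free text into structured features, the example below fits scikit-learn's LatentDirichletAllocation on a handful of made-up posts and produces one topic-proportion vector per post. The toy corpus, the number of topics, and the variable names are assumptions for illustration only and are not part of the assignment.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy posts; the real input would be a text column such as related_tweet.
posts = [
    "vaccine trial results announced by health officials",
    "election officials confirm the vote count",
    "new vaccine side effects rumour spreads online",
    "candidate disputes election vote results",
]

# Bag-of-words counts are the usual input for LDA.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(posts)

# n_components=2 is an arbitrary choice for this toy corpus.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_features = lda.fit_transform(counts)

# Each row is a per-post topic distribution that can be used as numeric model input.
print(topic_features.shape)
```

The resulting array has one row per post and one column per topic, so it can be joined to the numeric metadata columns before training a classifier.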

In this project, you will showcase the significance of integrating data sourced from X alongside newly engineered features to classify the authenticity of news-related tweets. A dataset has been web-scraped from X, and the sections of this assignment set out one exploratory strategy for addressing a classification challenge.

Dataset

The assignment dataset contains an assortment of news headlines, along with X posts relating to each headline. It has 134,198 rows and 15 columns. There are three types of feature variables and one target variable:

Feature Variables

➢ Textual Data:

news_author (str) - author of a news headline.
news_headline (str) - headline of a news article.
related_tweet (str) - X post relating to the news headline posted by a user.

➢ Post Metadata:

post_replies (int) - number of replies on the post.
post_retweets (int) - number of retweets on the post.
post_favourites (int) - number of favourites on the post.
post_quotes (int) - number of times the post has been quote tweeted.

➢ User Metadata:

user_followers (int) - number of followers.
user_following (int) - number of users the user is following.
user_friends (int) - number of friends (mutual following).
user_tweet_count (int) - total number of tweets the user has made.
user_favourites_count (int) - total number of favourites the user has across all tweets.
user_mentions (int) - total number of users mentioned (@) in related_tweet.
user_tweet_count_lists (int) - total number of tweets the user has in their lists.

Target Variable

➢ Misinformation (bool) - a True/False value indicating whether a tweet is false.
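A quick sanity check of the loaded data against the column list above could look like the sketch below. The column names are copied from the description; the helper name check_columns and the assumption that the file uses exactly these spellings (including the capitalised Misinformation) are hypothetical.

```python
# Column groups exactly as listed in the dataset description.
TEXT_COLUMNS = ["news_author", "news_headline", "related_tweet"]
POST_COLUMNS = ["post_replies", "post_retweets", "post_favourites", "post_quotes"]
USER_COLUMNS = [
    "user_followers", "user_following", "user_friends", "user_tweet_count",
    "user_favourites_count", "user_mentions", "user_tweet_count_lists",
]
TARGET_COLUMN = "Misinformation"

EXPECTED_COLUMNS = TEXT_COLUMNS + POST_COLUMNS + USER_COLUMNS + [TARGET_COLUMN]


def check_columns(actual_columns):
    """Return (missing, unexpected) column names relative to the description."""
    actual = list(actual_columns)
    missing = [c for c in EXPECTED_COLUMNS if c not in actual]
    unexpected = [c for c in actual if c not in EXPECTED_COLUMNS]
    return missing, unexpected


# Hypothetical usage with a pandas DataFrame named df: check_columns(df.columns)
missing, unexpected = check_columns(EXPECTED_COLUMNS[:-1] + ["extra_col"])
print(missing, unexpected)
```

Keeping the three feature groups as separate lists also makes it easy to select only the text, post, or user columns later in the notebook.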

Specification Summary

Type: Project report, individual assignment
Deliverable: Report as a Jupyter notebook only (.ipynb)

The aim of this assignment is to give you experience in the steps involved in text preparation, feature generation, and creating, evaluating, and improving classification models. You will need to research NLP and Python functionality if you aim to achieve excellent marks and discover innovative techniques and methods.

Exploration, Preparation & Feature Generation

This section requires you to explore various aspects of your dataset and prepare the data for later sections. Take time to explore your data carefully and make preparation and feature-generation decisions that make sense.

Preprocessing steps are essential for cleaning and standardizing data before feature generation, and they improve the quality of the extracted features. Classification models that use generated features may understand and analyze the data better, or learn patterns and relationships better, than models trained without them.
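One possible cleaning step for tweet text is sketched below using only the standard library. The function name clean_tweet, the placeholder tokens, and the specific choices (lowercasing, masking URLs and mentions, keeping hashtag words) are assumptions for illustration; other preprocessing decisions may suit the data better.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_tweet(text):
    """Normalise a tweet: lowercase, mask URLs and mentions, tidy whitespace."""
    text = text.lower()
    text = URL_PATTERN.sub("<url>", text)       # keep the fact that a link was present
    text = MENTION_PATTERN.sub("<user>", text)  # keep the fact that a user was mentioned
    text = text.replace("#", "")                # keep hashtag words, drop the symbol
    return WHITESPACE_PATTERN.sub(" ", text).strip()


print(clean_tweet("BREAKING: @NewsDesk says   #Election results are FAKE https://t.co/abc123"))
# breaking: <user> says election results are fake <url>
```

Masking rather than deleting URLs and mentions preserves a signal that later feature generation can count, at the cost of adding two artificial tokens to the vocabulary.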

Further, X (formerly Twitter) recently open-sourced its algorithms, and many articles provide insights into which features of a tweet are important. Knowing this may help you better understand how to classify a tweet as misinformation.

Your task is to:

➢ Explore and prepare your data.

o In this task, you could perform the necessary cleaning and pre-processing tasks, and explore or try to understand and profile your data through various techniques (e.g. clustering, topic modelling).

➢ Generate new features from your data.

o You should have a good understanding of your data from above and can now experiment with feature generation. In this task you should consider what can be generated to improve your classification model.
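To illustrate what feature generation from the metadata and text columns might look like, the sketch below derives a few numeric features with pandas. The two rows of data are made up, and the particular features (log-scaled engagement, follower ratio, text length, exclamation count) are hypothetical examples rather than a recommended set.

```python
import numpy as np
import pandas as pd

# Two made-up rows using column names from the dataset description.
df = pd.DataFrame({
    "related_tweet": ["Shocking!!! You won't believe this", "Council confirms new bus timetable"],
    "post_replies": [10, 1],
    "post_retweets": [200, 3],
    "post_favourites": [50, 6],
    "post_quotes": [40, 0],
    "user_followers": [99, 999],
    "user_following": [900, 100],
})

features = pd.DataFrame(index=df.index)

# Total engagement, log-scaled because count data is often heavy-tailed.
engagement = df[["post_replies", "post_retweets", "post_favourites", "post_quotes"]].sum(axis=1)
features["log_engagement"] = np.log1p(engagement)

# Followers per followed account; the +1 avoids division by zero.
features["follower_ratio"] = df["user_followers"] / (df["user_following"] + 1)

# Simple surface features of the text.
features["tweet_length"] = df["related_tweet"].str.len()
features["exclamation_count"] = df["related_tweet"].str.count("!")

print(features)
```

Whether any of these features actually helps is an empirical question, so each candidate is best judged by its effect on validation performance.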

Classification (Model Building and Evaluation)

It is important to try multiple variations of features and parameters during model building to achieve the best performance. Additionally, you should elaborate on the performance metrics you used to evaluate your model and explain why they suit the available data.
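The choice of metric depends on the data. As a small worked example, assuming (hypothetically) that misinformation is the minority class, the sketch below shows how a model that never predicts misinformation can still score high accuracy while its recall and F1 are zero. The labels are made up and the helper binary_metrics is illustrative only.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the positive (True) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 2 of 10 tweets are misinformation, and the model predicts False for all.
y_true = [True, True] + [False] * 8
y_pred = [False] * 10
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

In practice, scikit-learn's sklearn.metrics module offers equivalent functions such as precision_score, recall_score, and f1_score; the hand-written version here only makes the arithmetic explicit.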

Your task is to:

➢ Experiment with developing and evaluating classification models to find the one with the best overall performance.

o Once you find the best-performing model, you should only show how you built and evaluated that specific one.

➢ Elaborate on the major tasks you have undertaken to improve the best-performing model and explain why the performance metrics suit the available data.
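One way to experiment with parameter variations systematically is a cross-validated grid search. The sketch below uses scikit-learn's Pipeline and GridSearchCV on synthetic data standing in for the engineered features; the classifier, the parameter grid, the F1 scoring choice, and the class proportions are all assumptions for illustration, not a prescribed approach.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered feature matrix and the Misinformation target.
X, y = make_classification(n_samples=200, n_features=8, weights=[0.8, 0.2], random_state=0)

# Scaling lives inside the pipeline so it is refitted on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: regularisation strength and class weighting.
param_grid = {
    "clf__C": [0.1, 1.0, 10.0],
    "clf__class_weight": [None, "balanced"],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Stratified folds keep the class proportions similar across splits, which matters when one class is rare.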

Submission

Your report should be delivered as an .ipynb file. A notebook template is provided to show how to structure your work. You need to use the template (Assignment_Template.ipynb) and strictly follow its format, which is based on the provided assignment rubric.

It can be useful to add brief in-line comments (using #) next to your code to explain it.

You will get a better mark if your approach is innovative. This means no other student has applied it, or a few others have applied a similar approach with some differences. Therefore, it is highly advised that you do not share your creative work with anyone else. You can still discuss preliminary ideas and help each other; just remember that your submission must be your own work.

You only need to submit one .ipynb file and should use the provided template file. Before submission:

➢ Ensure that your code runs without errors. If your code raises an error at any point, your assignment will only be marked up to the error, and the remainder of your code won't earn any marks. Example errors include syntax errors and NameErrors.
➢ Make sure that all the important outputs are shown in your notebook. However, avoid showing trivial outputs. For example, you should remove code that displays the whole DataFrame for no particular reason.
➢ Your marker will first look at your generated output as a reference without running your notebook (unless deemed necessary). Therefore, your significant outputs need to be generated, and the elaboration should be provided in the notebook, as shown in the template.
