MGSC 416 Data-driven models for Operations Analytics Problem Set 2: Enron email network

MGSC 416, Winter 2023 Data-driven models for Operations Analytics

Problem Set 2: Individual Assignment

Please submit your R code (R script or RMarkdown file) with comments. You should also submit a Word or PDF document in which you summarize your findings and answer each question. Paste the necessary supporting graphs and tables from R into your document (or knit your RMarkdown file to PDF).

Problem: Enron email network (40pts)

In this assignment, we explore the Enron email network during the investigation into the company. The objective is to use and discuss different network metrics to test a hypothesis. There are two files:
  • nodes email.csv: this file contains the list of nodes of the graph, where each node represents an email address.
  • edges email.csv: this file contains the list of edges of the graph, where an edge from A to B is present if an email was sent from A to B. The weight on the edge represents how many emails A sent to B.

The goal of the problem is to narrow down a short list of people that we should investigate first. A good investigator forms many hypotheses and works through them. If some people show up under several hypotheses, that is even more reason to flag them as suspicious.

Note: please use edge.arrow.size=0.1 and vertex.label=NA for all your network plots.
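A minimal sketch of loading the data and producing a plot with these settings, assuming the igraph package (whose plot method accepts edge.arrow.size and vertex.label) and hypothetical column names: name in the nodes file, and from, to, weight in the edges file. The toy data frames stand in for the two CSV files; on the real data you would build them with read.csv().

```r
# Minimal sketch, assuming igraph and hypothetical column names
library(igraph)

# Toy stand-ins for the nodes and edges files (hypothetical data);
# on the real files, use read.csv() to build these two data frames
nodes <- data.frame(name = c("a@enron.com", "b@enron.com", "c@enron.com", "d@gmail.com"),
                    stringsAsFactors = FALSE)
edges <- data.frame(
  from   = c("a@enron.com", "b@enron.com", "a@enron.com", "d@gmail.com"),
  to     = c("b@enron.com", "a@enron.com", "c@enron.com", "a@enron.com"),
  weight = c(3, 1, 2, 5),   # number of emails sent from 'from' to 'to'
  stringsAsFactors = FALSE
)

# Directed graph: the first two edge columns define A -> B,
# extra columns (here weight) become edge attributes
net <- graph_from_data_frame(d = edges, directed = TRUE, vertices = nodes)

# Network plot with the settings required for all plots in this assignment
plot(net, edge.arrow.size = 0.1, vertex.label = NA)
```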

0.1 Step 1: the number of received emails (16pts)

We start with statistics on the number of emails each person receives and sends. The suspicion is that employees who receive many emails have a higher chance of being implicated in the fraud.
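One way to get these counts is to sum the edge weights per sender and per recipient. The sketch below uses base R only, on a hypothetical four-person edge list with assumed columns from, to and weight; tapply() with default = 0 gives employees who never sent (or never received) an email a count of 0 instead of NA.

```r
# Base R sketch on hypothetical data; assumed columns: from, to, weight
nodes <- data.frame(name = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
edges <- data.frame(from   = c("a", "b", "a", "d"),
                    to     = c("b", "a", "c", "a"),
                    weight = c(3, 1, 2, 5),      # emails sent from 'from' to 'to'
                    stringsAsFactors = FALSE)

# Sum edge weights per sender and per recipient; employees with no edges get 0
lev      <- nodes$name
sent     <- tapply(edges$weight, factor(edges$from, levels = lev), sum, default = 0)
received <- tapply(edges$weight, factor(edges$to,   levels = lev), sum, default = 0)

counts <- data.frame(name = lev, sent = as.numeric(sent),
                     received = as.numeric(received), stringsAsFactors = FALSE)

# Employees ranked by received emails; head(..., 8) keeps the top 8
ranked <- counts[order(counts$received, decreasing = TRUE), ]
head(ranked, 8)
```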

  1. Who are the top 8 employees with the most received emails? (4pts).
  2. For each employee, calculate the ratio of the number of emails received over the number of emails sent. Make a scatter plot with the following information:
     • x-axis: number of emails sent.
     • y-axis: ratio of emails received over sent.
     • point size: total number of emails exchanged (sent and received).
     Make sure your graph and axes are properly titled. (5pts)
  3. We will focus on employees who have sent at least 10 emails and whose ratio of received over sent emails is at least 1.5.
     (a) Who are these suspected employees? (2pts)
     (b) We want to highlight these suspects in the network by changing their color and size. Plot the network where the suspected employees are represented by red nodes of size 7 and the rest by black nodes of size 3. (5pts)
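A sketch of the ratio, scatter plot and highlighting, assuming igraph and the same hypothetical from/to/weight columns. Here strength() (the weighted in- or out-degree) gives the received and sent counts directly from the graph. Employees who sent nothing get an infinite or undefined ratio, so they are left out of the scatter plot; the sent >= 10 filter excludes them from the suspects anyway. The cex scaling used for point size is an arbitrary illustrative choice.

```r
library(igraph)

# Hypothetical toy data (not from the assignment files)
nodes <- data.frame(name = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
edges <- data.frame(from   = c("a", "a", "b", "c", "d", "b"),
                    to     = c("b", "c", "a", "a", "b", "d"),
                    weight = c(8, 2, 4, 12, 6, 3),
                    stringsAsFactors = FALSE)
net <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)

# Weighted out/in degree = number of emails sent/received (uses E(net)$weight)
stats <- data.frame(name     = V(net)$name,
                    sent     = as.numeric(strength(net, mode = "out")),
                    received = as.numeric(strength(net, mode = "in")),
                    stringsAsFactors = FALSE)
stats$ratio <- stats$received / stats$sent   # Inf or NaN when nothing was sent
stats$total <- stats$sent + stats$received

# Scatter plot: x = emails sent, y = received/sent, point size grows with total
ok <- stats$sent > 0
plot(stats$sent[ok], stats$ratio[ok],
     cex  = 1 + 3 * stats$total[ok] / max(stats$total[ok]),
     xlab = "Number of emails sent",
     ylab = "Ratio of emails received over sent",
     main = "Received/sent ratio vs. emails sent (size = total emails)")

# Suspects: at least 10 emails sent and a received/sent ratio of at least 1.5
suspect  <- stats$sent >= 10 & stats$ratio >= 1.5
suspects <- stats$name[suspect]

# Highlight suspects in red (size 7); everyone else in black (size 3)
plot(net,
     vertex.color = ifelse(suspect, "red", "black"),
     vertex.size  = ifelse(suspect, 7, 3),
     edge.arrow.size = 0.1, vertex.label = NA)
```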

0.2 Step 2: filter out obvious non-suspects (7pts)

Some email addresses in the dataset are outside the Enron domain, meaning they do not end in enron.com. We will filter these addresses out of our dataset and network.
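A small base R illustration of the domain filter on hypothetical addresses. In grepl() the pattern is a regular expression, where . matches any character; escaping the dot and anchoring the pattern with $ matches only addresses that actually end in enron.com. Whether the real data contains such edge cases is not known; the last address is contrived to show the difference.

```r
# Hypothetical addresses standing in for the name column of the nodes file
emails <- c("jane.doe@enron.com", "john.roe@enron.com",
            "someone@gmail.com", "x@enronxcom.org")

# Plain pattern: '.' is a regex wildcard, so "enronxcom" also matches
loose  <- grepl("enron.com", emails)

# Stricter: escaped dot, anchored at the end ("ends in enron.com")
strict <- grepl("enron\\.com$", emails)

outside <- emails[!strict]   # addresses outside the Enron domain
```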

  1. Plot the network, highlighting the email addresses outside the Enron domain. Use the color blue and size 7 for these nodes and the color black and size 3 for the rest. The function grepl('enron.com', email data$name) returns a TRUE/FALSE vector that can be used to subset the data. (5pts)
  2. Plot a new network where these nodes are deleted. (2pts)
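A sketch of both plots, assuming igraph and hypothetical toy data. delete_vertices() also drops every edge attached to a removed node; the last two lines filter the node and edge tables the same way, so the email data stays consistent with the filtered network.

```r
library(igraph)

# Hypothetical toy data: c@gmail.com is outside the Enron domain
nodes <- data.frame(name = c("a@enron.com", "b@enron.com", "c@gmail.com", "d@enron.com"),
                    stringsAsFactors = FALSE)
edges <- data.frame(from   = c("a@enron.com", "c@gmail.com", "b@enron.com", "d@enron.com"),
                    to     = c("b@enron.com", "a@enron.com", "d@enron.com", "c@gmail.com"),
                    weight = c(2, 1, 4, 3), stringsAsFactors = FALSE)
net <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)

# TRUE for addresses that do not end in enron.com
outside <- !grepl("enron\\.com$", V(net)$name)

# Q2.1: non-Enron nodes in blue (size 7), the rest in black (size 3)
plot(net,
     vertex.color = ifelse(outside, "blue", "black"),
     vertex.size  = ifelse(outside, 7, 3),
     edge.arrow.size = 0.1, vertex.label = NA)

# Q2.2: delete those nodes (their edges are removed with them)
net_enron <- delete_vertices(net, which(outside))
plot(net_enron, edge.arrow.size = 0.1, vertex.label = NA)

# Keep the email tables consistent with the filtered network
nodes_enron <- nodes[grepl("enron\\.com$", nodes$name), , drop = FALSE]
edges_enron <- edges[edges$from %in% nodes_enron$name & edges$to %in% nodes_enron$name, ]
```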

0.3 Step 3: Compare metrics (17pts)

From now on, we work only with the filtered dataset of employees who have an Enron email address (make sure to consider only this subset of your emails) and with the resulting network from Q2.2. Next, we keep only the email addresses in the largest connected component of the network. We first inspect the membership value of each employee in the network (for example, a network object named net), i.e. which connected component they belong to. Then we keep only the employees that belong to the largest component and recover the induced subgraph.
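A minimal sketch of the largest-component step, assuming igraph. components() computes weakly connected components (edge direction ignored) by default; the handout does not say which notion of connectivity to use, so treat this as an assumption. The hypothetical toy graph has two components, of sizes 3 and 2.

```r
library(igraph)

# Hypothetical toy network with two components: {a, b, c} and {e, f}
net <- graph_from_data_frame(
  data.frame(from = c("a", "b", "c", "e"), to = c("b", "c", "a", "f"),
             stringsAsFactors = FALSE),
  directed = TRUE)

comp <- components(net)            # weak components by default
comp$membership                    # component id of each employee
largest <- which.max(comp$csize)   # id of the largest component

# Keep only employees in the largest component and take the induced subgraph
net_lcc <- induced_subgraph(net, which(comp$membership == largest))
```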

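This step is necessary for calculating centrality metrics, since closeness centrality relies on shortest-path distances, which are undefined between nodes in different components. All metrics will be calculated for the resulting subgraph and the emails associated with it. Note: to calculate quantiles, you can use the function quantile(x, p), which returns the quantile of a vector x at probability p (for example, p = 0.96 for the 96th percentile).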
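A base R sketch of the percentile cut-off on hypothetical scores. quantile() interpolates between observations by default (type 7), and "above the 96th percentile" is read here as strictly greater than the cut-off; with tied scores, > and >= can give different lists.

```r
# Hypothetical centrality scores for 25 employees: one clear outlier
scores <- setNames(c(1:24, 100) / 100, paste0("emp", 1:25))

cutoff <- quantile(scores, 0.96)          # 96th percentile (default type 7)
top    <- names(scores)[scores > cutoff]  # strictly above the cut-off
```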

  1. Calculate the closeness centrality for these employees. Which employees are above the 96th percentile in terms of closeness centrality? (4pts)
  2. Calculate the betweenness centrality for these employees. Which employees are above the 96th percentile in terms of betweenness centrality? (4pts)
  3. We want to highlight all of these employees in the plot of the network. Use blue for employees in the top 4% for closeness only, red for those in the top 4% for betweenness only, green for those in the top 4% for both metrics, and black for all other nodes. All non-black nodes have size 7 and black nodes have size 3. Who are your top suspects, i.e. the employees who rank in the top 4% for both closeness and betweenness? (9pts)
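A sketch of Step 3, assuming igraph and a hypothetical star network (employee 1 emails 25 colleagues in both directions) in place of the real largest-component subgraph. Two choices here are assumptions not fixed by the handout: closeness ignores edge direction (mode = "all"), and betweenness counts directed shortest paths. On the real network these choices can change the ranking.

```r
library(igraph)

# Hypothetical stand-in for the largest-component subgraph
net <- make_star(26, mode = "mutual")
V(net)$name <- paste0("emp", seq_len(vcount(net)))

# Assumption: closeness ignores direction; betweenness uses directed paths
clo <- closeness(net, mode = "all")
btw <- betweenness(net, directed = TRUE)

# Top 4% for each metric = strictly above the 96th percentile
top_clo <- clo > quantile(clo, 0.96)
top_btw <- btw > quantile(btw, 0.96)

# Colours from the handout: green = both, blue = closeness only,
# red = betweenness only, black = neither
col <- ifelse(top_clo & top_btw, "green",
       ifelse(top_clo, "blue",
       ifelse(top_btw, "red", "black")))

plot(net, vertex.color = col, vertex.size = ifelse(col == "black", 3, 7),
     edge.arrow.size = 0.1, vertex.label = NA)

# Employees in the top 4% for both metrics
suspects <- V(net)$name[top_clo & top_btw]
```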
