
CISC 5950 Big Data Programming - Project 1: Hadoop MapReduce-based program - NY Parking Violations


Big Data Programming

CISC 5950 — Project 1

In CISC 5950, we have learned the following topics:

  1. Set up a 3-node cluster with Hadoop Distributed File System and run examples.
  2. On top of HDFS, set up the cluster with MapReduce programming framework.
  3. Run examples of MapReduce programs.
  4. Scheduling on the cloud.

In this project, we are going to design our own Hadoop MapReduce-based programs to analyze data. The project consists of two parts.

NY Parking Violations

The NYC Department of Finance collects data on every parking ticket issued in NYC (10M per year!). This data is made publicly available to aid in ticket resolution and to guide policymakers. You can find the data at the NYC Parking Data link.

The above figure shows several records, where each row represents a parking ticket and the columns are the details of the ticket. To start the project, you have to:

  1. Start the 3-node cluster
  2. Set up the HDFS
  3. Store the data in HDFS
  4. Set up the MapReduce framework along with the scheduler for resource management.
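As a rough illustration of step 3, the sketch below only builds the `hdfs dfs` command lines needed to create a directory in HDFS and upload the dataset. The file name `parking_violations.csv` and the HDFS path `/project1/parking` are hypothetical placeholders, and the helper `hdfs_upload_commands` is not part of Hadoop. It assumes the cluster is already running (typically started with the `start-dfs.sh` and `start-yarn.sh` scripts shipped with Hadoop).

```python
from typing import List


def hdfs_upload_commands(local_csv: str, hdfs_dir: str) -> List[List[str]]:
    """Return argument lists for creating an HDFS directory and uploading a file.

    The commands are only built here, not executed, so the sketch can be
    inspected on a machine without Hadoop installed.
    """
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],     # create the target directory and parents
        ["hdfs", "dfs", "-put", local_csv, hdfs_dir],  # copy the local file into HDFS
        ["hdfs", "dfs", "-ls", hdfs_dir],              # list the directory to verify the upload
    ]


if __name__ == "__main__":
    # Hypothetical file name and HDFS path, used only for illustration.
    for cmd in hdfs_upload_commands("parking_violations.csv", "/project1/parking"):
        print(" ".join(cmd))
```

On a node where Hadoop is installed, each argument list could be passed to `subprocess.run(cmd, check=True)` to actually execute it.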

By analyzing the data, we need to answer the following:

  • When are tickets most likely to be issued?
  • What are the most common years and types of cars to be ticketed?
  • Where are tickets most commonly issued?
  • Which color of vehicle is most likely to get a ticket?
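Each of the questions above reduces to counting tickets per key, which is the classic MapReduce word-count pattern. The following is a minimal sketch for the vehicle-color question, written as plain Python functions so that it runs without a cluster. The column name `Vehicle Color` and the three sample rows are assumptions for illustration; check them against the header of the actual CSV.

```python
import csv
from collections import Counter
from typing import Iterable, Iterator, Tuple

COLOR_COLUMN = "Vehicle Color"  # assumed header name; verify against the real file


def map_colors(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (color, 1) for every ticket row that has a color."""
    for row in csv.DictReader(lines):
        color = (row.get(COLOR_COLUMN) or "").strip().upper()
        if color:
            yield color, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce phase: sum the counts for each key."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


# Made-up sample rows, not real ticket data.
sample = [
    "Summons Number,Vehicle Color,Violation Time",
    "1,BLACK,0930A",
    "2,WHITE,1015A",
    "3,black,1120P",
]
counts = reduce_counts(map_colors(sample))
print(counts.most_common(1))  # [('BLACK', 2)]
```

In a Hadoop Streaming job, the mapper would read lines from standard input and print tab-separated key/value pairs, and the reducer would sum values over the sorted keys. The same pattern applies to the other questions by changing the key, for example to the hour of the violation time or to a street code. Color values in the real data may be abbreviated inconsistently, so some normalization beyond upper-casing may be needed.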

NBA Shot Logs

This is the data (https://www.kaggle.com/dansbecker/nba-shot-logs) on shots taken during the 2014-2015 season: who took the shot, where on the floor the shot was taken from, who the nearest defender was, how far away the nearest defender was, the time on the shot clock, and much more. The column titles are generally self-explanatory.

The above figure shows several records, where each row represents a shot and the columns are the details of the shot, e.g. the game ID, the closest defender, and the distance between the shooter and the defender.

By analyzing the data, we need to answer the following:

  • For each pair of players (A, B), we define the fear score of A when facing B as A's hit rate when B is the closest defender while A is shooting. Based on the fear score, for each player, please find his “most unwanted defender”.
  • For each player, we define the comfortable zone of shooting as a matrix of {SHOT DIST, CLOSE DEF DIST, SHOT CLOCK}.
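The fear score in the first question can be sketched as a map step that emits one (made, attempts) pair per shot, keyed by (shooter, defender), and a reduce step that divides the sums. The column names `player_name`, `CLOSEST_DEFENDER`, and `SHOT_RESULT` (with the value `made`) are assumed from the dataset's header and should be verified. Reading "most unwanted defender" as the defender with the lowest hit rate is one interpretation of the task, and the sample rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Pair = Tuple[str, str]  # (shooter, closest defender)


def map_shots(rows: Iterable[dict]) -> Iterator[Tuple[Pair, Tuple[int, int]]]:
    """Map phase: emit ((shooter, defender), (made, attempts)) for each shot."""
    for row in rows:
        made = 1 if row["SHOT_RESULT"] == "made" else 0
        yield (row["player_name"], row["CLOSEST_DEFENDER"]), (made, 1)


def reduce_fear_scores(pairs: Iterable[Tuple[Pair, Tuple[int, int]]]) -> Dict[Pair, float]:
    """Reduce phase: hit rate = made shots / attempts for each (shooter, defender)."""
    made: Dict[Pair, int] = defaultdict(int)
    attempts: Dict[Pair, int] = defaultdict(int)
    for key, (m, a) in pairs:
        made[key] += m
        attempts[key] += a
    return {key: made[key] / attempts[key] for key in attempts}


def most_unwanted_defender(scores: Dict[Pair, float]) -> Dict[str, Tuple[str, float]]:
    """For each shooter, keep the defender against whom the hit rate is lowest."""
    worst: Dict[str, Tuple[str, float]] = {}
    for (shooter, defender), rate in scores.items():
        if shooter not in worst or rate < worst[shooter][1]:
            worst[shooter] = (defender, rate)
    return worst


# Made-up sample shots, not real records.
shots = [
    {"player_name": "a", "CLOSEST_DEFENDER": "B", "SHOT_RESULT": "made"},
    {"player_name": "a", "CLOSEST_DEFENDER": "B", "SHOT_RESULT": "missed"},
    {"player_name": "a", "CLOSEST_DEFENDER": "C", "SHOT_RESULT": "missed"},
]
scores = reduce_fear_scores(map_shots(shots))
print(most_unwanted_defender(scores))  # {'a': ('C', 0.0)}
```

Pairs with very few attempts give noisy hit rates, so a minimum-attempts threshold before picking the lowest score may be worth considering.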

Please develop a MapReduce-based algorithm to classify each player’s records into 4 comfortable zones. Considering the hit rate, which zone is the best for James Harden, Chris Paul, Stephen Curry, and LeBron James?
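One common way to form such zones is K-Means with k = 4 over the three features, where each iteration maps every record to its nearest centroid and reduces each cluster to its mean. The sketch below shows a single iteration in plain Python, assuming Euclidean distance on unscaled features. The function names are hypothetical and the sample points are made up.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]  # (SHOT_DIST, CLOSE_DEF_DIST, SHOT_CLOCK)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_iteration(points: Sequence[Point], centroids: Sequence[Point]) -> List[Point]:
    """One K-Means step: assign points (map), then average per cluster (reduce)."""
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts: Dict[int, int] = defaultdict(int)
    for p in points:  # map: emit (cluster index, point) and accumulate
        i = nearest(p, centroids)
        counts[i] += 1
        for d in range(3):
            sums[i][d] += p[d]
    new_centroids: List[Point] = []
    for i, old in enumerate(centroids):  # reduce: mean of each cluster
        if counts[i]:
            new_centroids.append(tuple(s / counts[i] for s in sums[i]))
        else:
            new_centroids.append(tuple(old))  # keep a centroid that received no points
    return new_centroids


# Made-up points with two obvious groups, for illustration only.
pts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 10.0, 10.0), (12.0, 10.0, 10.0)]
print(kmeans_iteration(pts, [(1.0, 1.0, 1.0), (9.0, 9.0, 9.0)]))
# [(1.0, 0.0, 0.0), (11.0, 10.0, 10.0)]
```

A driver would repeat this step, one MapReduce job per iteration, until the centroids stop moving, then compute the hit rate per zone for each of the four players. Because the three features have different units, scaling them before clustering may change the resulting zones.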

Bonus Question

The biggest challenge when using K-Means is deciding on the number of clusters. Having more clusters creates some small classes with very few records, while having fewer clusters leads to classes that are too general. Based on the K-Means algorithm above, try to answer the following questions:

  • Given a black vehicle parked illegally at street codes 34510, 10030, 34050, what is the probability that it will get a ticket? (A very rough prediction.)
  • At 10 am, I want to go to Lincoln Center and I am willing to walk at most 0.5 mile. Where should I park? (Divided into zones.)
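The trade-off in the number of clusters described above is often explored with the within-cluster sum of squares (WCSS): it always shrinks as k grows, and a common heuristic is to pick the k after which the improvement levels off (the "elbow"). The helper below is a small sketch of that metric, not something the assignment prescribes, and the sample points are made up.

```python
import math
from typing import Sequence


def wcss(points: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]]) -> float:
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)


# Made-up 2-D points for illustration.
pts = [(0.0, 0.0), (2.0, 0.0)]
print(wcss(pts, [(1.0, 0.0)]))              # 2.0 with one cluster
print(wcss(pts, [(0.0, 0.0), (2.0, 0.0)]))  # 0.0 with two clusters
```

Computing this value for several candidate values of k, after K-Means has converged for each, gives a curve from which a reasonable k can be read.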

Grading Rubric

You should complete the lab in groups of up to 3 students.

  • (70%) P1: NY Parking Violations (17.5% × 4).
  • (20%) P2: NBA Shot Logs (10% × 2).
  • (10%) Two reports on your design and experiments. Please be as detailed as possible and include your screenshots. In addition, you also need to write two README files, one for P1 and one for P2.
  • (5%) Bonus Question.

Submission

You are expected to upload a zip (or tar) file to Blackboard before the deadline. The zip file should include two (or three) folders:

  • Part1: your code, report, and README
  • Part2: your code, report, and README
  • Bonus: your code, report, and README

Useful Links

  1. Analysis of NYC Parking Tickets.
  2. Preliminary Data Visualization.
  3. Exploring 42.3M NYC Parking Tickets.
  4. NY Parking Violations Issued.
  5. Insights From Raw NBA Shot Log Data.
  6. Investigating the hot hand phenomenon in the NBA (CODE).
  7. Parallel K-Means Clustering Based on MapReduce.
  8. NBA 16-17 regular season shot log.
  9. The Fear Factor.
  10. The Best And Worst Defenders.
  11. NBA Classification.
  12. Stephen Curry’s Decision Tree.
  13. Points per Match (ATL vs WAS only).
  14. MapReduce-kmeans.
