CS544 Intro to Big Data Systems - P4: HDFS Partitioning and Replication
CS544, Intro to Big Data Systems, HDFS Partitioning and Replication, Python
In this project, you'll deploy a small HDFS cluster and upload a large file to it, with different replication settings. You'll write Python code to read the file. When data is partially lost (due to a node failing), your code will recover as much data as possible from the damaged file.
CS439 Introduction to Data Science - Homework 1: MapReduce, Association Rules, Locality-Sensitive Hashing
CS439, Introduction to Data Science, MapReduce, Association Rules, Locality-Sensitive Hashing, Java
Write a MapReduce program in Hadoop that implements a simple “People You Might Know” social network friendship recommendation algorithm. The key idea is that if two people have a lot of mutual friends, then the system should recommend that they connect with each other.
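The mutual-friend idea above can be sketched in plain Python before moving it to Hadoop. This is a minimal in-memory illustration, not the assignment's reference solution: the function name `recommend`, the `top_n` cutoff, the input shape (a dict of friend sets), and the tie-breaking by user id are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def recommend(friends, top_n=10):
    """Rank non-friends by number of mutual friends.

    friends: dict mapping user -> set of direct friends (assumed symmetric).
    Returns a dict mapping user -> list of recommended users.
    """
    mutual = defaultdict(Counter)

    # "Map" step: any two people in the same friend list share that
    # list's owner as a mutual friend.
    for owner, flist in friends.items():
        for a, b in permutations(sorted(flist), 2):
            # Skip pairs that are already direct friends.
            if b not in friends.get(a, set()):
                mutual[a][b] += 1

    # "Reduce" step: per user, sort candidates by mutual-friend count
    # (descending), breaking ties by user id (an assumed convention).
    return {
        user: [cand for cand, _ in
               sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for user, counts in mutual.items()
    }


if __name__ == "__main__":
    graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
    print(recommend(graph))
```

In a Hadoop job the loop over friend lists would correspond to the mapper and the per-user ranking to the reducer; here both run in one process for clarity.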
CS544 Intro to Big Data Systems - P3: Large, Thread-Safe Tables
CS544, Intro to Big Data Systems, gRPC, Python
In this project, you'll build a server that handles the uploading of CSV files, storing their contents, and performing operations on the data. You should think of each CSV upload as containing a portion of a larger table that grows with each upload.
Machine Learning Fundamentals Group Assessment: Model comparison
Machine Learning, RMSE, Feature Engineering, KNN, Regression
Background information: Kevin is a professional real-estate manager. In the past, he relied on a few important features for home valuation. His boss recently asked him to take the initiative to learn to use big data and machine learning algorithms to value homes in order to better communicate with customers.
CSE3BDC Big Data Management On The Cloud Assignment: Analysing Bank Data and Twitter Time Series Data
CSE3BDC, Big Data Management On The Cloud, Spark, SparkRDD, Spark SQL, Analysing Twitter Time Series Data
A script that automatically puts all of the data files into HDFS is provided for you. Whenever you restart the Docker container, you will need to run the following script to upload the data to HDFS again, since HDFS state is not maintained across Docker runs.
Final Exam: Inverted index and information retrieval with Spark
Spark, Inverted index, Information retrieval
Build an inverted index and retrieve relevant documents for the queries. Information retrieval is the science of searching for information in a document or collection of documents. In this assignment you are given a collection of documents and a set of queries. The main tasks for this assignment are:
COMP9313 Big Data Management Project 2: Top-k most frequent co-occurring term pairs
COMP9313, Big Data Management, Python, Top-k most frequent co-occurring term pairs
In this problem, we are still going to use the dataset of Australian news from ABC. Your task is to find out the top-k most frequent co-occurring term pairs in each year. The co-occurrence of (w, u) is defined as: u and w appear in the same article headline (i.e., (w, u) and (u, w) are treated equally).
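The co-occurrence definition above can be illustrated with a small, single-machine sketch. It is not the project's expected Spark solution. The input shape (a list of `(year, headline)` tuples), whitespace tokenization, counting each pair at most once per headline, and the alphabetical tie-break are assumptions made for this example; the name `top_k_pairs` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def top_k_pairs(records, k):
    """Return the k most frequent co-occurring term pairs for each year.

    records: iterable of (year, headline) tuples (assumed input shape).
    """
    counts = defaultdict(Counter)
    for year, headline in records:
        # Sorting the distinct terms gives every pair a canonical order,
        # so (w, u) and (u, w) are counted as the same pair.
        terms = sorted(set(headline.split()))
        for w, u in combinations(terms, 2):
            counts[year][(w, u)] += 1

    # Rank by count (descending), then alphabetically by pair.
    return {
        year: sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        for year, c in counts.items()
    }


if __name__ == "__main__":
    sample = [
        ("2003", "council chief executive"),
        ("2003", "council welcomes chief"),
        ("2004", "rain helps farmers"),
    ]
    print(top_k_pairs(sample, 1))
```

Canonicalizing each pair by sorting is the step that makes the two orderings equivalent; the same trick carries over to a key emitted from a Spark `flatMap`.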
COMP9313 Big Data Management Project 3: Finding Similar News Article Headlines Using Pyspark
COMP9313, Big Data Management, Python, Similar News Article Headlines, Spark
In this problem, we are still going to use the dataset of Australian news from ABC. Similar news may appear in different years. Your task is to find all similar news article headline pairs across different years.
CS350 Fundamentals of Computing Systems - Project 3: MapReduce
CS350, Fundamentals of Computing Systems, MapReduce
In this lab you'll build a MapReduce system. You'll implement a worker process that calls application Map and Reduce functions and handles reading and writing files, and a coordinator process that hands out tasks to workers and copes with failed workers.
CS7280 Special Topics in Database Management - Project 3: Big Data Analytics
CS7280, Special Topics in Database Management, Database, PySpark, Hadoop, Big Query
The main purpose of this project is to become familiar with Big Data platforms, including the Hadoop system, MapReduce programming, and cloud-based big data solutions (e.g., Google BigQuery).