1. Homepage
  2. Programming
  3. INFS 2011 Database for the Enterprise Assignment 2 – Data Warehousing and Integration

INFS 2011 Database for the Enterprise Assignment 2 – Data Warehousing and Integration

Engage in a Conversation
UniSAINFS2011Database for the EnterpriseData Warehousing and IntegrationXMLJson

  CourseNana.COM

Assignment 2 – Data Warehousing and Integration

Assessment Due Date CourseNana.COM

Weighting:     25% CourseNana.COM

Due Date:       23:59 Wed 5th of Jun CourseNana.COM

Group or individual assignment. CourseNana.COM

  CourseNana.COM

Submission CourseNana.COM

Submit a zip file to contain the following files CourseNana.COM

people.txt CourseNana.COM

mydblp.xml CourseNana.COM

report.docx CourseNana.COM

extract.xql CourseNana.COM

transform1.xql CourseNana.COM

  CourseNana.COM

  CourseNana.COM

Assignment Overview CourseNana.COM

This assignment aims for you to apply your knowledge of XML, Json, data warehousing and integration. CourseNana.COM

You complete the assignment in a group of 2 students whom you choose yourself.  If you decide to do it individually, you must complete all parts. Please complete the people.txt file by adding the usernames, names, and the responsible parts of group members before you start the work. You can backup your work by sending the report periodically to an email run by a third-party company like outlook or gmail. CourseNana.COM

If you do the assignment in a group, only one of you submits. CourseNana.COM

The assignment is reasonably open. As a result, justification in decision points is important. Make sure that your writing is logical and cohesive so that your answers are not misunderstood. CourseNana.COM

  CourseNana.COM

Assignment Tasks CourseNana.COM

Task 1: Retrieve data.  (1 mark) CourseNana.COM

Retrieve all publications of a major type and a minor type as defined below from dblp.xml: CourseNana.COM

Major types:    CourseNana.COM

A:     article CourseNana.COM

B:     inproceedings CourseNana.COM

Minor types: CourseNana.COM

1:     mastersthesis CourseNana.COM

2:     phdthesis CourseNana.COM

3:     www CourseNana.COM

4:     proceedings CourseNana.COM

5:     book CourseNana.COM

6:     incollection CourseNana.COM

From the major type choose A if the last digit of your student ID (like 110123123) is an even number (including 0) and B if the last digit is an odd number. CourseNana.COM

From the minor type, choose n = (d % 6)+1 where d is the last digit of your student id and n is the number against the type you should choose. For example, if the last digit is 8, n=(8%6)+1=2+1=3, the student should use www as his/her minor type.   CourseNana.COM

Your code to do this should be in extract.xql. CourseNana.COM

The retrieved data should be in mydblp.xml. CourseNana.COM

  CourseNana.COM

Task 2: Design a star schema for publication analyses (8 marks) CourseNana.COM

a)    Publication data can be analyzed for different purposes and in different ways to quantitively measure the amount of research outputs of organizations and/or individuals, journal popularity and impact, hot research directions etc. CourseNana.COM

Design a star schema for this purpose. The analysis subject and its metric are given. The subject is publication performance, and the metric for the subject is the number of publications. You design dimension tables for the subject based on your mydblp.xml. CourseNana.COM

Describe the fact table and dimension tables in the following format. CourseNana.COM

facttable(SK1,Sk2,…,Skn [, kpi]) CourseNana.COM

                 constraints like keys and foreign keys CourseNana.COM

dim1(SK1, … ) CourseNana.COM

                 constraints CourseNana.COM

CourseNana.COM

dimn (SKn,…) CourseNana.COM

                 constraints CourseNana.COM

In the above schema, ‘SK’ means surrogate key, ‘…’ means more, kpi means a metric of the number of publications. [,kpi] means kip is optional. When kpi is not appearing, you must use an aggregate function to derive a metric, and this definition must be listed between the fact table schema and its constraints. Please replace the place holders for table names and attributes with your own. CourseNana.COM

b)   For each dimension table, describe how the dimension is helpful in an analysis. CourseNana.COM

Here are some notes about dimension tables: If a dimension table does not support any analysis, it is not needed. Time must be represented in one of the dimensions. A dimension should model an entity type and an entity type should be modelled in one dimension table only. For example, you should not put the author information and journal information in one dimension. CourseNana.COM

c)    Create your fact table and dimension tables in an oracle database with all keys and foreign keys defined. Please note that your table and attribute names mut be consistent in the design and in the implementation. Add table creation statements to the report. CourseNana.COM

Display the schema of each table by ‘select * from tablename;’, take a screenshot of only the schema line as below and add it to the report after the table name as shown in the following. CourseNana.COM

tablename1: CourseNana.COM

tablename2: CourseNana.COM

Display the keys and foreign keys of all the tables. CourseNana.COM

By the state ‘select owner, table_name, constraint_type, constraint_name, status from user_constraints; CourseNana.COM

Take a screenshot of the output including the schema line and data records and add the screenshot to the report. CourseNana.COM

  CourseNana.COM

  CourseNana.COM

Task 3: Transform XML data and load to the tables. (8 marks) CourseNana.COM

a)    You transform the data in mydblp.xml and load it into the warehouse tables in the oracle database server. This part is up to your freedom. You can do it by generating csv and then uploading csv, or by a program transforming and directly inserting data to oracle.  CourseNana.COM

Your code for this part should be named as ‘transform?.*’ where ‘?’ stands for a step number and ‘*’ is your code extension. An example of your code file can be tranform1.xql. The code files must be submitted as part of the zip. CourseNana.COM

In the report, add summaries for these transformation files and, if applicable, for all manual processes. For each file, the summary should include the aim, the input file, main processes, and the output. For a manual process, you describe the main steps. CourseNana.COM

Take a screenshot of top 5 records of each table from oracle and include them in the report. CourseNana.COM

b)     Use two example publication entries, one per type, to justify that your processes, CourseNana.COM

         preserve all values, and   CourseNana.COM

         preserve all one-to-many relationships. CourseNana.COM

         Data from tables should be presented in screenshots, and the screenshots must be minimal. CourseNana.COM

c)    Data for an analysis, often called a dataset, is a table from some joins of relevant dimensional tables and the fact table. This implies that the data in the warehouse is often modelled differently from the data for an analysis (e.g., multiple dimensional visualization). CourseNana.COM

         Write an SQL query to generate the dataset for one of the analyses you describe in Task2. CourseNana.COM

         Include the query and a screenshot of the schema and the first five lines of output in the report. CourseNana.COM

  CourseNana.COM

Task 4: Integrate more data to the warehouse.  (7 marks) CourseNana.COM

For this task, you present a possible data collection and integration idea for the warehouse. The idea must be presented with logical details for practical scrutiny which will be defined in the section. The answer for this part is very open.    CourseNana.COM

Data warehouse data often needs to be integrated from multiple sources. The data from dblp.xml does not support analyses against author institutions and journal impact data.  We aim to find more data from other sources. The newly identified data can be additional columns to existing tables or a new dimension table. CourseNana.COM

This task requires you to find a data source that can be integrated into the warehouse to support more analyses. You can use any source such as the internet (such as http://scholar.google.com for paper citation and author search, https://www.sciencedirect.com/search for journal quality search), or organizational webpages. The data is often unstructured. Show the following items in the report. CourseNana.COM

a)     An example of data from some source(s) for two data records, which can be presented as small screenshots. Data sources must be presented together with the example.   CourseNana.COM

b)     The data ready for integration from the source data after transformation, which would be a small table with two records. These records will be added to a table in the current star schema. CourseNana.COM

c)      Assumptions and processes used for identifying and extracting the source data and for transforming the source data in Item (a) to data ready for integration in Item (b). This part is very important as it validates whether an idea is practical. You do not need to write code, by the description must be logical.   CourseNana.COM

d)     The purpose of new data. Does it extend an existing dimension or add a new dimension? Present the updated schema for the relevant table. CourseNana.COM

e)     The way in which the new data records will be linked/matched to existing records. CourseNana.COM

f)      The challenges for Item 3) to be automated. CourseNana.COM

  CourseNana.COM

The core of this part is two data records and processes for making them ready for integration. You do not need to write code, but your descriptions must be cohesive and tight in logical. If they are not logical, the reader will not be able to follow them to repeat your steps. If they are not cohesive, the reader will not get what you want to deliver. CourseNana.COM

  CourseNana.COM

  CourseNana.COM

Marking Criteria CourseNana.COM

All code submitted should run. CourseNana.COM

Design decisions must be justified. CourseNana.COM

Writing must be logical and cohesive. CourseNana.COM

Plagiarism CourseNana.COM

If your solution, or part of it, is not written by you (your idea and your own typing), you commit plagiarism. The investigation will involve an oral or written test. If you commit plagiarism, you will be penalised and a record will be kept against you.  CourseNana.COM

Extensions CourseNana.COM

Extensions for assignments are available under the following conditions CourseNana.COM

·       permanent or temporary disability, or CourseNana.COM

·       compassionate grounds CourseNana.COM

In all cases, documentary evidence (e.g. valid medical certificate, road accident report, obituary) must be presented to the Course Coordinator.  A medical certificate produced on or after the due date will not be accepted unless you are hospitalized. CourseNana.COM

If you apply for extension within 24 hours before the deadline, you must see the course coordinator in person unless you are in an emergency situation like being admitted in a hospital. CourseNana.COM

  CourseNana.COM

Late Penalties CourseNana.COM

Unless you have an extension, late submission will incur a penalty of 30% deduction per day (or part of it) of lateness. CourseNana.COM

  CourseNana.COM

  CourseNana.COM

Get in Touch with Our Experts

WeChat WeChat
Whatsapp WhatsApp
UniSA代写,INFS2011代写,Database for the Enterprise代写,Data Warehousing and Integration代写,XML代写,Json代写,UniSA代编,INFS2011代编,Database for the Enterprise代编,Data Warehousing and Integration代编,XML代编,Json代编,UniSA代考,INFS2011代考,Database for the Enterprise代考,Data Warehousing and Integration代考,XML代考,Json代考,UniSAhelp,INFS2011help,Database for the Enterprisehelp,Data Warehousing and Integrationhelp,XMLhelp,Jsonhelp,UniSA作业代写,INFS2011作业代写,Database for the Enterprise作业代写,Data Warehousing and Integration作业代写,XML作业代写,Json作业代写,UniSA编程代写,INFS2011编程代写,Database for the Enterprise编程代写,Data Warehousing and Integration编程代写,XML编程代写,Json编程代写,UniSAprogramming help,INFS2011programming help,Database for the Enterpriseprogramming help,Data Warehousing and Integrationprogramming help,XMLprogramming help,Jsonprogramming help,UniSAassignment help,INFS2011assignment help,Database for the Enterpriseassignment help,Data Warehousing and Integrationassignment help,XMLassignment help,Jsonassignment help,UniSAsolution,INFS2011solution,Database for the Enterprisesolution,Data Warehousing and Integrationsolution,XMLsolution,Jsonsolution,