Database Development and Design (DTS207TC)
Assessment 001: Individual Coursework
Overview
This assignment aims to gain experience in understanding the internal functionality of different database management systems, including RDBMS, XML, Object-relational databases, and gain experience in designing data warehouses to perform OLAP operations. The course work will be assessed for the following learning outcomes:
A. Identify and apply the principles underpinning transaction management within DBMS.
B. Demonstrate an understanding of advanced SQL topics.
C. Illustrate the issues related to Web technologies and DBMS and XML as a semi-structured data representation formalism.
D. Identify the principles underlying object-relational models.
E. State the main concepts in data warehousing and data mining.
Assessment Tasks
1. Advanced SQL, Triggers, Indexing, and Query Optimization
2. Transaction Management
3. Querying XML Data
4. Object-Relational Database
5. Data Warehousing and OLAP
1: Triggers, and Indexing
(20 Marks)
Q1: Consider the COMPANY database provided along with this assignment. Using the syntax of Oracle triggers, write rules to do the following:
Q1(a): Whenever an employee’s project assignments are changed, check if the total hours per week spent on the employee’s projects are less than 30 or greater than 40; if so, notify the employee’s direct supervisor. (5 marks)
Q1(b): Whenever an employee is deleted, delete the PROJECT tuples and DEPENDENT tuples related to that employee, and if the employee manages a department or supervises employees, set the Mgr_ssn for that department to NULL and set the Super_ssn for those employees to NULL. (5 marks)
You are required to provide the trigger code, SQL statements to test the triggers and screenshots of the database state of affected rows before and after executing trigger test statements and an explanation of your logic.
Q1(c) In the next task, you will be working on the same database to work on indexes by considering the following scenario: (10 marks)
Employee.SSN, Department.dnumber
Employee.SSN, Department.dname
Employee.SSN, Employee.salary
Execute the following query and estimate the operation cost for each combination of indexes. For each pair, discuss why or why not that index pair is the best choice based on the cost estimation.
FROM employee, department
Where employee.dno = department.dnumber AND employee.salary > 15000 AND department.dname < 'Research';
2: Transaction Management (20 Marks)
Q2(a): Consider a relation R(A) containing {(4),(5)} and two transactions:
T1: Update R set A = A+2;
T2: Update R set A = 4*A.
Suppose both transactions are submitted under the isolation and atomicity properties. Provide all possible final states of R (values of R) by considering all possible serializable and non-serializable schedules. Provide an explanation for each value it returns. (5 marks)
Q2(b) Consider a table R(A) containing {(3),(4)} and following two transactions run concurrently:
T1: Update R set A = 3*A; commit;
T2: Select avg(A) from R; commit;
If transaction T2 executes using READ UNCOMMITTED, what are the possible average values of T2 returns? Provide an explanation for each value it returns. (5 marks)
Q2(c) Consider table R(A) containing {(6),(7)}. Suppose we have the following two transactions running concurrently:
T1: Update T set A=3*A;
insert into R values (6);
Commit;
T2: Select avg(A) from R;
Select avg(A) from R ;
Commit;
If transaction T2 executes using REPEATABLE READ, what are the possible values returned by T2 in its SECOND select statement? Provide an explanation for each value it returns. (5 marks)
S1 | ||
T1 | T2 | T3 |
w1(A); w1(B); abort; | r2(A); w2(A*4) r2(B); commit; | r3(B); r3(B+6); commit; |
S2 | ||
T1 | T2 | T3 |
r1(A); r1(B); w1(B); abort; | r2(A); w2(A*4) | r3(B); w3(B+6); |
3: Querying XML Data (20 Marks)
Q3 In this task, you are required to use the below DTD to create an XML document and retrieve data from the XML document using Xpath/XQuery expressions.
<!DOCTYPE Sections [
<!ELEMENT Sections(Section*)>
<!ELEMENT Section(Number,Year,Qtr,Course_number,Course_name,Student*)>
<!ELEMENT Student(Ssn,Name,Class,Grade)>
<!ELEMENT Number(#PCDATA)>
<!ELEMENT Year(#PCDATA)>
<!ELEMENT Qtr(#PCDATA)>
<!ELEMENT Course_number(#PCDATA)>
<!ELEMENT Course_name(#PCDATA)>
<!ELEMENT Ssn(#PCDATA)>
<!ELEMENT Name(#PCDATA)>
<!ELEMENT Class(#PCDATA)>
<!ELEMENT Grade(#PCDATA)>
]>
Q3 (a): Create a well-formed XML document corresponding to the above DTD. Provide at least three instances for multiple elements. (5 marks)
Q3(b): Write queries in XQuery on the XML document you created in the previous step to do the following. For each query, provide query expression and code explanation. Execute each query and provide its results. (5*3=15 Marks)
4: Object-Relational Database (20 Marks)
Answer questions 4(a), 4(b), and 4(c) by considering the following scenario:
Suppose we want to develop a university teaching system to keep track of instructors and their teaching load. Each instructor has an office number and a unique name. Each instructor can only be either a Professor or a Teaching Assistant. A professor is identified by rank (lecturer, associate, or full professor), and a Teaching Assistant is identified by the number of years they have been assisting a course. A course is offered by a department and identified by a unique course name. A course can have multiple projects, whereas each project must belong to a specific course. An assistant can have a teaching load of at least three and at most six courses, whereas each course may have at most two assistants. A professor may teach only two courses, whereas a course must have at least one and at most three professors.
Q4(a) Create class models to directly map the entities and relationships shown in the above UML diagram to objects. Provide complete code for each model class in a clear readable format and explain any assumptions made. Provide a screenshot of the database structure (tables) created in SQLite as a result of running the model classes. (7 marks)
Q4(b) After defining all the model classes in the previous question, create ‘write.py’ file for defining methods to create objects/instances for populating all the tables in the database. For each table store at least five rows. Provide complete code for each method and a screenshot of each table populated with rows as a result of running the ‘write.py’ file. (7 marks)
Q4(c) At this point, you have defined models and created objects. Now you will apply your skills of querying objects. In this task, you are required to create ‘Read.py’ file under your project directory. Create code to perform the following queries: (2*3= 6 marks)
5: Data Warehousing and OLAP (20 Marks)
Q5 (a): A data warehouse can be modelled by either a star schema or a snowflake schema. Briefly describe the similarities and the differences of the two models, and then analyze their advantages and disadvantages with regard to one another. Give your opinion of which might be more empirically useful and state the reasons behind your answer. (5 marks)
Q5(b): In this task, you will perform “Online Analytical Processing” (OLAP) style queries over a simple “star schema” given below. The corresponding schema, along with the database, is provided with this assignment. You are required to load the schema in Oracle Database and use SQL developer to perform the following queries: (5*3=15 marks )
i. Starting with the base cuboid [Item, Store, Customer, TimeDim], what specific OLAP operations should one perform in order to list the total sales for Item Category “Computer Accessories” for US stores in the Year 2015? Describe the operations and illustrate the results of each function on the above multidimensional data warehouse.
ii. Summarize (SUM, MIN, and COUNT) store sales for Canada and USA in 2014 by store zip code and year without using subtotal operators. Provide query code and results of executing the query. Illustrate your results to discuss the total number of group-by’s and why or why not the query is effective for information processing assuming huge data in the warehouse.
iii. Use the same problem from part one and generate all possible subtotals by zip code and year using the data CUBE operator. Provide query code and compare your results with the previous question by comparing and contrasting results based on the number of rows and subtotals groups. Explain why or why not each type of subtotal is useful for supporting business-specific queries.