1. Homepage
  2. Programming
  3. INFS3200 Advanced Database Systems - Prac 1: Distributed Databases

INFS3200 Advanced Database Systems - Prac 1: Distributed Databases

Engage in a Conversation
AustralianUQUniversity of QueenslandINFS3200Advanced Database SystemsData IntegrationSQLReplicationPartial ReplicationDistributed Query Processing

INFS3200 Advanced Database Systems CourseNana.COM

Prac 1: Distributed Databases (5%)

Introduction

Learning Objectives:

  • Learn how to use Oracle DBMS through SQL Plus and SQL Developer. Oracle will be used in both Pracs 1 & 2.
  • Get familiar with the basic SQL queries and keywords. Write your SQL queries for data retrieval.
  • Simulate horizontal and vertical fragmentation using centralized Oracle. Understand how to update records on a distributed database with data replications.
  • Apply semi-join algorithm to simulate data transmission cost reduction over computer networks. Understand why semi-join could be faster sometimes.

Marking Scheme:

  • 2 marks: Receive one mark for completing two of Task 1 questions, full marks for completing all three questions;
  • 1 mark: Answer the question of Task 2 correctly, receive 0.5 mark when unnecessary updates or inappropriate explanation appears;
  • 1 mark: Finish Task 3;
  • 1 mark: Finish Task 4;

Screenshot your results and provide necessary scripts and explanations. Please make sure your screenshots contain your student ID (your student ID will be included in the name of the users you created) as the proof of originality. Put all your content in a word/pdf document or leave scripts in separate files and pack all files into a zip/rar package. Please format your document and code nicely to help tutor’s marking process. A poorly formatted document may receive a reduced mark. Submit your work to the Blackboard site by 4:00pm, Friday March 24th. CourseNana.COM

Late Penalties (from ECP):

“Where an assessment item is submitted after the deadline, without an approved extension, a late penalty will apply. The late penalty shall be 10% of the maximum possible mark for the assessment item will be deducted per calendar day (or part thereof), up to a maximum of seven (7) days. After seven days, no marks will be awarded for the item. A day is considered to be a 24 hour block from the assessment item due time. Negative marks will not be awarded.” CourseNana.COM

Part 1: Oracle 12c Enterprise Basics

1. Find the Oracle software

From the Windows 10 start menu, you can type “SQL Plus” and “SQL Developer” to search the tools, as shown below. We will use “SQL Plus” to create database users, and use “SQL Developer” to connect as these users and interact with the database. CourseNana.COM

2. Login and create users

In SQL Plus Command Line window, first log in with username “SYS AS SYSDBA” and password “Password1”, a successful login should be similar as follows (see troubleshooting below if you cannot log in). CourseNana.COM

Execute the commands below to create users. You can copy them, excluding comments (words in green), and right click your mouse in “SQL Plus” to paste. CourseNana.COM

NOTE: All “S1234567” mentioned in the document should be replaced by your student ID to distinguish your work from others, case insensitive. CourseNana.COM

/*Enable user creation*/
ALTER SESSION SET "_ORACLE_SCRIPT"=TRUE;
/* Create a user named “USER_S1234567” with password “w” */
CREATE USER USER_S1234567 IDENTIFIED BY w ACCOUNT UNLOCK DEFAULT TABLESPACE
"USERS" TEMPORARY TABLESPACE "TEMP" PROFILE "DEFAULT";
/* Grant DBA privilege to “USER_S1234567” */
GRANT DBA TO USER_S1234567;
/* Check if “USER_S1234567” has been created */
SELECT USERNAME FROM DBA_USERS;

Proceed to step 3 if you complete the above processes successfully. Otherwise, we provide several solutions to the problems you may encounter. CourseNana.COM

Troubleshooting: (1) TNS: Protocol adapter error: Oracle services are closed, check the service state: From the Windows 10 “Start” menu, search for “services”. In Services, check if “OracleOraDB12Home2MTSRecoveryService”, “OracleOraDB12Home2TNSListener” and “OracleServiceORCL” are running. If not, right-click and start them. CourseNana.COM

(2) Logon denied: Incorrect password, retry the password or reset it as follows: Open C:\app\ntadmin\virtual\product\12.2.0\dbhome_2\network\admin\sqlnet.ora, and change the 8th line to: SQLNET.AUTHENTICATION_SERVICES= (NONE) Then open a command prompt and type the following (you can simply copy and paste): orapwd file=C:\app\ntadmin\virtual\product\12.2.0\dbhome_2\database\PWDorcl.ora password=Password1! force=y CourseNana.COM

(3) “USER_S1234567” conflicts with another user: When creating user, drop the existing user by running: / WARNING: this will drop everything under that user / DROP USER USER_S1234567 CASCADE; CourseNana.COM

Then redo the CREATE and GRANT commands. CourseNana.COM

3. Use Oracle SQL Developer

Open SQL Developer in the start menu. We will use it to connect as the user we just created. Click the green “+” button as shown below. CourseNana.COM

Fill in the connection information in the prompted dialog window as shown below. The connection name is specified by the user. Username should be a user already existing in the database. In this example, it is the “USER_S1234567” we just created. Password is the password for that user, namely “w” in this case. You should also change SID to “orcl”. Then press the “Connect” button to connect to the user. CourseNana.COM

4. Import data to database

Select “Open” command in the “File” menu. Open the SQL scripts we provide for Part 1 as shown below (folder: “…\P1\Part 1\”). CourseNana.COM

It will show the script in the window. The script contains a table creation command (CREATE TABLE) and a list of record insertions (INSERT INTO). Click the second button (run all the scripts in the current tab) on its menu bar. A dialog will pop up asking for connection details. You should choose which connection you want to run the script. In this step, we choose the USER_S1234567. CourseNana.COM

After running a script, press the sixth button on the menu bar, as shown below, to commit the insertion. We perform the same process for all four scripts in the folder to complete the insertion. CourseNana.COM

Troubleshooting: (1) Maximum cursors exceed: Disconnect the current connection by right-clicking on it in the connections panel and reconnect. (2) Unique constraint: Check the top-right dropdown list and make sure you are in the correct connection. Also check if the data is already imported by following step 5. CourseNana.COM

5. View the imported data

In the left panel, expand your connection and find your tables under the Table directory. The basic information will be shown in the right window. Click the second tab “Data” in the right window to find the data you just imported, as shown below. CourseNana.COM

6. Interact with database using SQL

You can write any SQL query in the corresponding connection window and run it by clicking either the first button (run a single query under current cursor) or the second one (run every script in the current tab). For example, the query below tries to find out how many records we have imported to table “Athlete1”. We will ask you a few similar questions in task 1 which requires you understand the meaning of the table attributes and help you review the basic SQL keywords (SELECT, WHERE, GROUP BY, JOIN, etc.) which are necessary for the rest of the course. CourseNana.COM

Task 1: Write SQL queries to answer the following questions. Your queries and the result screenshots should be included in your submission. (1) Count the number of players from Australia (country code=AUS) in Athlete2 table. (2) For all Russian (RUS) players in table Athlete3, count the number of players participating in each sport. The result should be a list containing records like: CourseNana.COM

(3) Create a new table named “ATHLETE_FULL” which combines all records from tables Athlete1, 2 and 3. Use this table along with the country information from the Country table to count the total number of players from Europe. CourseNana.COM

Part 2: Distributed DB Design

In part 2, we aim to simulate a distributed database using a standalone computer. To do so, we are acting as a global site, which possesses the global conceptual schema and data replication info, and deal with all the incoming queries, then we create multiple user accounts in Oracle 12c (refer to “Part1: Oracle 12c Enterprise Basics” for how to create a user account), each of which represents a distributed local site. The data transferred among relations belonging to different user accounts corresponds to the data transferred over computer network among different sites. Given a global conceptual schema, perform the following tasks: CourseNana.COM

We provide you an Athlete table: CourseNana.COM

Athlete[AthleteID,FName,LName,DOB,CountryCode,SportID] CourseNana.COM

There are 24,591 records with AthleteID ranging from 1 to 24,591. CourseNana.COM

1. Horizontal fragmentation

There are three types of replication strategies: Full Replication, Partial Replication and No Replication. In this task, you are asked to simulate these three strategies on a distributed database which contains three local sites (USER1, 2 and 3). Each strategy requires 3 users for simulation, that is to say, you need to create 9 users in total for Task 2. After creating the users, you are asked to run the scripts in the respective folders to perform different data replications. CourseNana.COM

Job 1 - Full Replication

The data is split into three fragments: CourseNana.COM

  • Athlete1: 1<= AthleteID < 7656
  • Athlete2: 7657<= AthleteID < 17318
  • Athlete3: 17319 <= AthleteID <= 24591

Each fragment will be a relation located on every site in the computer network (i.e. each site has a full copy of each fragment). You should create three sites to simulate the full replication in SQL Plus command line: CourseNana.COM

  • USER1_HF_FULL_S1234567
  • USER2_HF_FULL_S1234567
  • USER3_HF_FULL_S1234567

To load fragments into site USER1_HF_FULL_S1234567, connect to user “USER1_HF_FULL_S1234567” in SQL Developer and run all script files in folder “…\P1\Part 2\HF\HF-Full\USER1_HF_FULL\”. Repeat the same process for other sites. CourseNana.COM

Job 2 – Partial Replication

The data is split into three fragments in the same way as in Job 1. Each fragment will be a relation located on some of the sites in the computer network (i.e., more than one site may have a copy of this fragment, but not all of them. You should read through the scripts to understand how fragments are replicated and allocated). You should create three sites: CourseNana.COM

USER1_HF_PA_S1234567 USER2_HF_PA_S1234567 USER3_HF_PA_S1234567 CourseNana.COM

In order to load the above fragments into site USER1_HF_PA_S1234567, connect to user “USER1_HF_PA_S1234567” in SQL Developer and run all script files in folder “…\P1\Part 2\HF\HFPartial\USER1_HF_PA\”. Repeat the same process for other sites. CourseNana.COM

Job 3 – No Replication

The data is split into three fragments in the same way as in Job 1. Each fragment will be a relation located on only one site in the computer network. You should create three sites: CourseNana.COM

  • USER1_HF_NO_S1234567
  • USER2_HF_NO_S1234567
  • USER3_HF_NO_S1234567

In order to load the above fragments into site USER1_HF_NO_S1234567, connect to user “USER1_HF_NO_S1234567” in SQL Developer and run all script files in folder “…\P1\Part 2\HF\HF-No\USER1_HF_NO\”. Repeat the same process for other sites. CourseNana.COM

Task 2: Given the update query below, write a set of SQL queries (or one transaction preferably) which applies this update to the system under each replication strategy, respectively. (Hint: Three sets of SQL queries (or three transactions) in total for three different strategies. Each of your update transaction should guarantee consistency between copies and should not perform update to sites which are not possible to have the record). Query: Change the country code of the player whose ID is 305 to “AUS”. Put your SQL queries/transactions, your result screenshots and the explanation of the differences of update operations between three replication strategies in your submission. CourseNana.COM

2. Vertical Fragmentation

Vertical Fragmentation: CourseNana.COM

AthleteV1[AthleteID, FName, LName] AthleteV2[AthleteID, DOB, CountryCode, SportID] CourseNana.COM

Create two users to simulate two sites, and load the data from folder “…\P1\Part 2\VF\”. CourseNana.COM

USER1_VF_S1234567 USER2_VF_S1234567 CourseNana.COM

Task 3: Write a SQL query to retrieve the full name and DOB of all the athletes satisfying 305<= AlthleteID<=310. Include your query and the screenshot of the result in your submission. CourseNana.COM

Part 3: Distributed Query Processing

In part 3, we still simulate a distributed database using this centralised Oracle database. However, the distributed sites are now organised in a peer-to-peer architecture, which means the queries can be issued at any of the local sites and receive answer from it. In this part, you are asked to perform inner join queries in such distributed environment efficiently, which means you should find the optimal execution plan for the join queries, as mentioned in Lecture 3, the optimal execution plan refers to the one with minimum data transmission cost. Therefore, to help you trace the data transmission cost, we provide the following statistical tool: CourseNana.COM

Open SQL Plus and login as “SYS AS SYSDBA” (applicable to other users you created). Type in the following commands to turn on query statistics. CourseNana.COM

SQL> SET TIMING ON;
/* enable timing */
SQL> SET AUTOTRACE ON STATISTICS; /* enable statistics */

We use a simple query as an example. Here is the statistical result of the following query:
select a.AthleteID, a.CCODE, b.CONTINENT
from "USER2_VF_S1234567"."ATHLETE_V2" a,
"USER_S1234567"."COUNTRY" b
where a.CCODE=b.CCODE;

Step Two – Perform the semi join on USER2

select a.AthleteID, a.CCODE from "USER2_VF_S1234567"."ATHLETE_V2" a where a.CCODE in (select distinct(CCODE) from "USER_S1234567"."COUNTRY"); CourseNana.COM

Step Three – Perform the final join on USER select a.AthleteID, a.CCode, b.CONTINENT from "USER_S1234567"."COUNTRY" b, (select a.AthleteID, a.CCODE from "USER2_VF_S1234567"."ATHLETE_V2" a where a.CCODE in (select distinct(CCODE) from "USER_S1234567"."COUNTRY")) a where a.CCODE= b.CCODE; CourseNana.COM

Therefore, the transmission cost of the semi-join plan is 3017(step 1) + 589666(step 2) = 592683. The example shows that by decomposing the query into a step-by-step plan, we can track the data transmission cost by aggregating the sizes of their intermediate results. Additionally, we provide you with several tips for this task: Tip 1: For the same query, the size of the final results should always be identical regardless of what execution plan you take. Like in the above example, the final result size is always 604782. CourseNana.COM

Task 4: Suppose that we want to retrieve all the information for Australian athletes from the vertical fragments created in Task 2, which can be achieved by the following join query: select b.AthleteID, b.FName, b.SName, c.BDate, c.CCode, c.SportID from "USER1_VF_S1234567"."ATHLETE_V1" b, "USER2_VF_S1234567"."ATHLETE_V2" c where b.AthleteID= c.AthleteID and c.CCODE='AUS'; CourseNana.COM

If the join query is issued at site “USER1”, write a semi-join and an inner-join execution plans in such vertically fragmented distributed database, respectively. Follow the above example to calculate their respective data transmission cost and decide which plan is better. Include step-by-step queries, cost calculations and your choice in your submission, supported by the screenshots of the query statistics. CourseNana.COM

Get in Touch with Our Experts

WeChat (微信) WeChat (微信)
Whatsapp WhatsApp
Australian代写,UQ代写,University of Queensland代写,INFS3200代写,Advanced Database Systems代写,Data Integration代写,SQL代写,Replication代写,Partial Replication代写,Distributed Query Processing代写,Australian代编,UQ代编,University of Queensland代编,INFS3200代编,Advanced Database Systems代编,Data Integration代编,SQL代编,Replication代编,Partial Replication代编,Distributed Query Processing代编,Australian代考,UQ代考,University of Queensland代考,INFS3200代考,Advanced Database Systems代考,Data Integration代考,SQL代考,Replication代考,Partial Replication代考,Distributed Query Processing代考,Australianhelp,UQhelp,University of Queenslandhelp,INFS3200help,Advanced Database Systemshelp,Data Integrationhelp,SQLhelp,Replicationhelp,Partial Replicationhelp,Distributed Query Processinghelp,Australian作业代写,UQ作业代写,University of Queensland作业代写,INFS3200作业代写,Advanced Database Systems作业代写,Data Integration作业代写,SQL作业代写,Replication作业代写,Partial Replication作业代写,Distributed Query Processing作业代写,Australian编程代写,UQ编程代写,University of Queensland编程代写,INFS3200编程代写,Advanced Database Systems编程代写,Data Integration编程代写,SQL编程代写,Replication编程代写,Partial Replication编程代写,Distributed Query Processing编程代写,Australianprogramming help,UQprogramming help,University of Queenslandprogramming help,INFS3200programming help,Advanced Database Systemsprogramming help,Data Integrationprogramming help,SQLprogramming help,Replicationprogramming help,Partial Replicationprogramming help,Distributed Query Processingprogramming help,Australianassignment help,UQassignment help,University of Queenslandassignment help,INFS3200assignment help,Advanced Database Systemsassignment help,Data Integrationassignment help,SQLassignment help,Replicationassignment help,Partial Replicationassignment help,Distributed Query Processingassignment help,Australiansolution,UQsolution,University of Queenslandsolution,INFS3200solution,Advanced Database Systemssolution,Data Integrationsolution,SQLsolution,Replicationsolution,Partial Replicationsolution,Distributed Query Processingsolution,