1. Homepage
  2. Programming
  3. COMP9315 DBMS Implementation Assignment 2 - Multi-attribute Linear Hashed Files

COMP9315 DBMS Implementation Assignment 2 - Multi-attribute Linear Hashed Files

Contact Us On WeChat
CDBMS ImplementationCOMP9315DatabasePostgreSQLSQLAustraliaUNSWMulti-attribute Linear Hashed Files

Assignment 2 Multi-attribute Linear Hashed Files

Most recent changes are shown in red ... older changes are shown in brown. CourseNana.COM


This assignment aims to give you an understanding of CourseNana.COM

  • how database files are structured and accessed
  • how multi-attribute hashing is implemented
  • how linear hashing is implemented The goal is to build a simple implementation of a linear-hashed file structure that uses multi-attribute hashing.


Linear hashed files and multi-attribute hashing are two techniques that can be used together to produce hashed files that grow as needed and which allow all attributes to contribute to the hash value of each tuple. See the course notes and lecture slides for further details on linear hashed files and multi-attribute hashing. CourseNana.COM

In our context, multi-attribute linear-hashed (MALH) files are file structures that represent one relational table, and can be manipulated by three commands: CourseNana.COM

A create command

Creates MALH files by accepting four command line arguments: CourseNana.COM

  • the name of the relation
  • the number of attributes
  • the initial number of data pages (rounded up to nearest 2n)
  • the multi-attribute hashing choice vector

This gives you storage for one relation/table, and is analogous to making an SQL data definition like: CourseNana.COM

create table R ( a1 text, a2 text, ... an text );

Note that, internally, attributes are indexed 0..n-1 rather than 1..n. CourseNana.COM

The following example of using create makes a table called abc with 4 attributes and 8 initial data pages: CourseNana.COM

$ ./create abc 4 6 "0,0:0,1:1,0:1,1:2,0:3,0"

Note that 6 will be rounded up to the nearest 2n (i.e. to 8). If we'd written 8, we would have gotten the same result. CourseNana.COM

The choice vector (fourth argument above) indicates that CourseNana.COM

  • bit 0 from attribute 0 produces bit 0 of the MA hash value
  • bit 1 from attribute 0 produces bit 1 of the MA hash value
  • bit 0 from attribute 1 produces bit 2 of the MA hash value
  • bit 1 from attribute 1 produces bit 3 of the MA hash value
  • bit 0 from attribute 2 produces bit 4 of the MA hash value
  • bit 0 from attribute 3 produces bit 5 of the MA hash value

The following diagram illustrates this scenario: CourseNana.COM

The above choice vector only specifies 6 bits of the combined hash, but combined hashes contain 32 bits. The remaining 26 entries in the choice vector are automatically generated by cycling through the attributes and taking bits from the high-order hash bits from each of those attributes. CourseNana.COM

An insert command

Reads tuples, one per line, from standard input and inserts them into the relation specified on the command line. Tuples all take the form val1,val2,...,valn. The values can be any sequence of characters except ',' and '?'. The bucket where the tuple is placed is determined by the appropriate number of bits of the combined hash value. If the relation has 2d data pages, then d bits are used. If the specified data page is full, then the tuple is inserted into an overflow page of that data page. CourseNana.COM

A select command

Takes a "query tuple" on the command line, and finds all tuples in either the data pages or overflow pages that match the query. Queries take the form val1,val2,...,valn, where some of the vali can be '?' (without the quotes). Such "attributes" represent wild-cards and can match any value in the corresponding attribute position. Some example query tuples, and their interpretation are given below. You can find more examples in the lecture slides and course notes. CourseNana.COM

?,?,? # matches any tuple in the relation
10,?,? # matches any tuple with 10 as the value of attribute 0
?,abc,?  # matches any tuple with abc as the value of attribute 1
10,abc,? # matches any tuple with 10 and abc as the values of attributes 0 and 1

A MALH relation R is represented by three physical files: CourseNana.COM

  • R.info containing global information such as CourseNana.COM

  • a count of the number of attributes CourseNana.COM

  • the depth of main data file (d for linear hashing) CourseNana.COM

  • the page index of the split pointer (sp for linear hashing) CourseNana.COM

  • a count of the number of main data pages CourseNana.COM

  • the total number of tuples (in both data and overflow pages) CourseNana.COM

  • the choice vector (cv for multi-attribute hashing) CourseNana.COM

  • R.data containing data pages, where each data page contains CourseNana.COM

  • offset of start of free space CourseNana.COM

  • overflow page index (or NO_PAGE if none) CourseNana.COM

  • a count of the number of tuples in that page CourseNana.COM

  • the tuples (as comma-separated C strings) CourseNana.COM

  • R.ovflow containing overflow pages, which have the same structure as data pages When a MALH relation is first created, it is set to contain a 2n pages, with depth d=n and split pointer sp=0. The overflow file is initially empty. The following diagram shows an MALH file R with initial state with n=2. CourseNana.COM

After 294 tuples have been inserted, the file might have the following state (depending on field value distributions, tuple sizes, etc): CourseNana.COM

Pages in MALH files have the following structure: a header with three unsigned integers, strings for all of the tuple data, free space containing no tuple data. The following diagram gives an exmple of this: CourseNana.COM

We have developed some infrastructure for you to use in implementing multi-attribute linear-hashed (MALH) files. You may use this infrastructure or replace parts of it (or all of it) with your own, but your MALH files implementation must conform to the conventions used in our code. In particular, you should preserve the interfaces to the supplied modules (e.g. Reln, Page, Query, Tuple) and ensure that your submitted ADTs work with the supplied code in the create, insert and select commands. CourseNana.COM

Setting Up

You should make a working directory for this assignment and put the supplied code there. Read the supplied code to make sure that you understand all of the data types and operations used in the system. CourseNana.COM

$ mkdir Your/ass2/Directory
$ cd Your/ass2/Directory
$ unzip /web/cs9315/22T1/assignments/ass2/ass2.zip

You should see the following files in the directory: CourseNana.COM

  • create.c ... a main program that creates a new MALH relation
  • dump.c ... a main program that lists all tuples in an MALH relation
  • insert.c ... a main program that reads tuples and insert them
  • select.c ... a main program that finds tuples matching a PMR query
  • stats.c ... a main program that prints info about an MAH relation
  • gendata.c ... a main program to generate random tuples
  • bits.h, bits.c ... an ADT for bit-strings
  • chvec.h, chvec.c ... an ADT for choice vectors
  • hash.h, hash.c ... the PostgreSQL hash function
  • page.h, page.c ... an ADT for data/overflow pages
  • query.h, query.c ... an ADT for query scanners (incomplete)
  • reln.h, reln.c ... an ADT for relations (partly complete)
  • tuple.h, tuple.c ... an ADT for tuples (partly complete)
  • util.h, util.c ... utility functions

This gives you a partial implementation of MALH files; you need to complete the code so that it provides the unctionality described below. The supplied code actually produces executables that work somewhat, but are missing a working query scanner implementation (from query.c), a proper MA hash function (from tuple.c), and splitting and data file increase (from reln.c). CourseNana.COM

Effectively, they give a static hash file structure with overflows. CourseNana.COM

To build the executables from the supplied code, do the following: CourseNana.COM

gcc -Wall -Werror -g -std=c99 -c -o create.o create.c
gcc -Wall -Werror -g -std=c99 -c -o query.o query.c
gcc -Wall -Werror -g -std=c99 -c -o page.o page.c
gcc -Wall -Werror -g -std=c99 -c -o reln.o reln.c
gcc -Wall -Werror -g -std=c99 -c -o tuple.o tuple.c
gcc -Wall -Werror -g -std=c99 -c -o util.o util.c
gcc -Wall -Werror -g -std=c99 -c -o chvec.o chvec.c
gcc -Wall -Werror -g -std=c99 -c -o hash.o hash.c
gcc -Wall -Werror -g -std=c99 -c -o bits.o bits.c
gcc create.o query.o page.o reln.o tuple.o util.o chvec.o hash.o bits.o -o gcc -Wall -Werror -g -std=c99 -c -o dump.o dump.c
gcc dump.o query.o page.o reln.o tuple.o util.o chvec.o hash.o bits.o -o dugcc -Wall -Werror -g -std=c99 -c -o insert.o insert.c
gcc insert.o query.o page.o reln.o tuple.o util.o chvec.o hash.o bits.o -o gcc -Wall -Werror -g -std=c99 -c -o select.o select.c
gcc select.o query.o page.o reln.o tuple.o util.o chvec.o hash.o bits.o -o gcc -Wall -Werror -g -std=c99 -c -o stats.o stats.c
gcc stats.o query.o page.o reln.o tuple.o util.o chvec.o hash.o bits.o -o sgcc -Wall -Werror -g -std=c99 -c -o gendata.o gendata.c
gcc gendata.o query.o page.o reln.o tuple.o util.o chvec.o hash.o bits.o -o

This should not produce any errors on the CSE servers; let me know ASAP if this is not the case. CourseNana.COM

Once you have the executables, you could build a sample database as follows: CourseNana.COM

./create R 3 4 "0,0:0,1:0,2:1,0:1,1:2,0"
cv[0] is (0,0)
cv[1] is (0,1)
cv[2] is (0,2)
cv[3] is (1,0)
cv[4] is (1,1)
cv[5] is (2,0)
cv[6] is (0,31)
cv[7] is (1,31)
cv[8] is (2,31)
cv[30] is (0,23)
cv[31] is (1,23)![![](https://)](https://)

This command creates a new table called R with 3 attributes. It will be stored in files called R.info, R.data and R.ovflow. The data file initially has 4 pages (so depth d=2). The overflow file is initially empty. he lower-order 6 bits of the choice vector are given on the command line; the remaining bits!(https://) are auto-generated. Given the file size (4 pages), only two of the hash bits are actually needed. You could check the status of the files for table R via the stats command: CourseNana.COM

$ ./stats R
Global Info:
#attrs:3 #pages:4 #tuples:0 d:2 sp:0
Choice vector
Bucket Info:
# Info on pages in bucket
0 (d0,0,1012,-1)
1 (d1,0,1012,-1)
2 (d2,0,1012,-1)
3 (d3,0,1012,-1)

The insert command prints the hash value for the tuple (based on just the first attribute), and then inserts it into the file. CourseNana.COM

This shows that each data page has one overflow page, and that each data page has roughly the same number of tuples. The bucket starting at data page 0 has a few more tuples than th other buckets, because it has more tuples (15) in the overflow page. Note that page IDs in the overflow pages are distinguished by starting with "ov". Note also that e.g. the data page at position 3 in the data file has an overflow page at position 1 in the overflow file; this is because page 3 filled up before pages 1 and 2. CourseNana.COM

One other thing to notice here is that the file has not expanded. It still has the 4 original data pages. Even if you added thousands of tuples, it would still have only 4 data pages. This is because linear hashing is not yet implemented. Implementing it is one of your tasks. CourseNana.COM

You could then use the select command to search for tuples using a command like: CourseNana.COM

$ ./select R 101,?,?

This aims to find any tuple with 101 as the ID value; there will be one such tuple, since ID values are unique. This returns no solutions because query scanning is not yet implemented. mplementing it is another of your tasks. CourseNana.COM

Task 1: Multi-attribute Hashing

The current hash function does not use the choice vector to produce a combined hash value. It simply uses the hash value of the first attribute (the ID value) to generate a hash for the tuple. Your first task is to modify the tupleHash() function to use the relevant bits from each attribute hash value to form a composite hash. The choice vector determines the "relevant" bits. You can find more details on how a multi-attribute hash value is produced in the lecture slides and notes. CourseNana.COM

Task 2: Selection (Querying)

The query scan data type is found in query.c and query.h and is used only in select.c. At present, the data type is incomplete. You need to design a suitable query scanning data structure and implement the operations on it. The functions contain rough approximations to the algorithms you will need to build; you can find more details in the lecture slides and course notes. Most (all?) of the helper functions you'll need are in other data types, but you can add any others that you find necessary. CourseNana.COM

Task 3: Linear Hashing

As noted above, the current implementation is essentially a static version of single-attribute hashing. You need to add functionality to ensure that the file expands after every c insertions, where c is the page capcity c = floor(B/R) ≈ 1024/(10*n) where n is the number of attributes. Add one page at the end of the file and distribute the tuples in the "buddy" page (at index 2d less) between the old and new pages. Determine where each tuple goes by considering d+1 bits of the hash value. This will involve modifying the addToRelation() function, and will most likely require you to add new functions into the reln.c file (and maybe other files). CourseNana.COM

You can simplify the standard version of linear hashing by not removing overflow pages from the overflow chain of the data page hey are attached to. This may result in some data pages having multiple empty overflow pages; this is ok if they are eventually used to hold more tuples. The following diagram shows an example of what might occur during a page split: CourseNana.COM

How we Test your Submission

We will compile your submission for testing as follows: CourseNana.COM

$ unzip
$ tar xf
# extracts our copies of ...
# create.c dump.c insert.c select.c stats.c
$ make
# should produce executables ...
# create dump insert select stats

Get Expert Help On This Assignment

Scan above qrcode with Wechat

C代写,DBMS Implementation代写,COMP9315代写,Database代写,PostgreSQL代写,SQL代写,Australia代写,UNSW代写,Multi-attribute Linear Hashed Files代写,C代编,DBMS Implementation代编,COMP9315代编,Database代编,PostgreSQL代编,SQL代编,Australia代编,UNSW代编,Multi-attribute Linear Hashed Files代编,C代考,DBMS Implementation代考,COMP9315代考,Database代考,PostgreSQL代考,SQL代考,Australia代考,UNSW代考,Multi-attribute Linear Hashed Files代考,Chelp,DBMS Implementationhelp,COMP9315help,Databasehelp,PostgreSQLhelp,SQLhelp,Australiahelp,UNSWhelp,Multi-attribute Linear Hashed Fileshelp,C作业代写,DBMS Implementation作业代写,COMP9315作业代写,Database作业代写,PostgreSQL作业代写,SQL作业代写,Australia作业代写,UNSW作业代写,Multi-attribute Linear Hashed Files作业代写,C编程代写,DBMS Implementation编程代写,COMP9315编程代写,Database编程代写,PostgreSQL编程代写,SQL编程代写,Australia编程代写,UNSW编程代写,Multi-attribute Linear Hashed Files编程代写,Cprogramming help,DBMS Implementationprogramming help,COMP9315programming help,Databaseprogramming help,PostgreSQLprogramming help,SQLprogramming help,Australiaprogramming help,UNSWprogramming help,Multi-attribute Linear Hashed Filesprogramming help,Cassignment help,DBMS Implementationassignment help,COMP9315assignment help,Databaseassignment help,PostgreSQLassignment help,SQLassignment help,Australiaassignment help,UNSWassignment help,Multi-attribute Linear Hashed Filesassignment help,Csolution,DBMS Implementationsolution,COMP9315solution,Databasesolution,PostgreSQLsolution,SQLsolution,Australiasolution,UNSWsolution,Multi-attribute Linear Hashed Filessolution,