The first practical assignment consists of four main tasks. The first assesses your knowledge of RDF(S) as a modelling language. The second tests your understanding of deductive reasoning in RDF(S). Finally, the third and fourth allow you to explore the practical use of the SPARQL query language and how to process queries using a programming language. The solution to the four tasks should be submitted electronically through Canvas as a single Zip file. When preparing and naming the solution file, please follow the submission instructions given below carefully. The assignment deadline is March 13th, 2023 at 17.00.
Task 1 (10 marks) Consider the RDF graph shown in the picture, which models the recipe for Lemon Tiramisu. In the graph, ellipses correspond to classes, rectangles to literals, and edges are annotated with the name of the relation. Translate the graph into Turtle syntax. You should ensure your Turtle file uses the relevant namespaces. For your convenience, you should use QNames for your node and edge labels, and these should be specified in your graph. Full marks will be awarded for providing an accurate representation of the graph (download the PDF of the image here).
Task 2 (25 marks) Consider the following RDF graph G, expressed in Turtle syntax, where the triples have been numbered to improve readability. Assume that rdf and rdfs are the usual namespaces, and that ex: <http://www.example.org/stock/> is also used:
prefix ex: <http://www.example.org/stock/>
ex:hasComponent rdfs:domain ex:Computer .
ex:hasComponent rdfs:range ex:Component .
ex:hasComponent rdfs:subPropertyOf ex:contains .
ex:Component rdfs:subClassOf ex:StockItem .
ex:PC ex:hasComponent ex:MotherBoard .
ex:Keyboard rdfs:subClassOf ex:StockItem .
A. Determine whether the following graph is entailed by G. If it is, provide a proof; if not, explain why. (5 marks)
ex:PC ex:contains _:x .
_:x ex:hasComponent ex:Keyboard .
B. Find four triples that are entailed by G, and show how each one is derived. Avoid trivial RDFS-valid triples, e.g. ex:Computer rdf:type rdfs:Class . (20 marks, 5 marks each)
For your convenience the RDFS entailment rules are:
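As an illustration of how rules of this kind are applied mechanically, the following is a minimal Python sketch, not the official rule table. It assumes three of the standard RDFS rules as named in the W3C RDF 1.1 Semantics (rdfs2, rdfs9 and rdfs11), represents triples as plain tuples of strings, and runs on a hypothetical toy graph (ex:teaches, ex:Lecturer and the other terms are invented for the example and are unrelated to G). Each derived triple is recorded together with the rule and premises that produced it, which is the shape a written derivation would normally take.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
SUBCLASS = "rdfs:subClassOf"


def apply_rules_once(graph):
    """Return (rule, premises, conclusion) for every one-step inference."""
    steps = []
    for first in graph:
        a, p, x = first
        for second in graph:
            y, q, z = second
            # rdfs2: (a rdfs:domain x), (y a z)  =>  (y rdf:type x)
            if p == DOMAIN and q == a:
                steps.append(("rdfs2", (first, second), (y, TYPE, x)))
            # rdfs9: (a rdfs:subClassOf x), (y rdf:type a)  =>  (y rdf:type x)
            if p == SUBCLASS and q == TYPE and z == a:
                steps.append(("rdfs9", (first, second), (y, TYPE, x)))
            # rdfs11: (a rdfs:subClassOf x), (x rdfs:subClassOf z)
            #         =>  (a rdfs:subClassOf z)
            if p == SUBCLASS and q == SUBCLASS and x == y:
                steps.append(("rdfs11", (first, second), (a, SUBCLASS, z)))
    return steps


def rdfs_closure(graph):
    """Apply the three rules until no new triple appears.

    Returns the enlarged graph and, for each new triple, the rule and
    premises that first produced it.
    """
    closed = set(graph)
    proof = {}
    changed = True
    while changed:
        changed = False
        for rule, premises, conclusion in apply_rules_once(closed):
            if conclusion not in closed:
                closed.add(conclusion)
                proof[conclusion] = (rule, premises)
                changed = True
    return closed, proof


# Hypothetical toy graph, deliberately unrelated to the assignment data.
toy = {
    ("ex:teaches", DOMAIN, "ex:Lecturer"),
    ("ex:Lecturer", SUBCLASS, "ex:Staff"),
    ("ex:Staff", SUBCLASS, "ex:Person"),
    ("ex:alice", "ex:teaches", "ex:comp101"),
}

closed, proof = rdfs_closure(toy)
for triple, (rule, premises) in proof.items():
    print(triple, "by", rule, "from", premises)
```

The sketch covers only three rules and ignores literals and blank nodes, so it is not a complete RDFS reasoner; a written proof should still cite the rule name and the premise triples for every step.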
Task 3 (65 marks in total: 10 marks for Task 3a, 50 marks for Task 3b, 5 marks for quality of solution) The NobelLaureates dataset is an RDF knowledge graph (in Turtle) containing information about the Nobel prize winners and their prizes from 2013 until 2022. The dataset is modelled according to the ontology (schema) specified in the Nobel Prize Linked Data Vocabulary. The vocabulary uses several common namespaces such as the DBpedia Ontology or FOAF, and models three primary classes: NobelPrizes, Laureates, and LaureateAward. A Nobel Prize can be shared between up to three persons, while the same person can receive multiple Nobel Prizes. Therefore, every NobelPrize contains between one and three LaureateAwards that, among other things, contain a motivation. The Laureate class is linked to both the NobelPrize and LaureateAward classes, and is a subclass of foaf:Person (or foaf:Organization if the Laureate is an organization) containing generic biographic information. Other classes used to model the graph are taken from other vocabularies, e.g. dbo:University or dbo:City. The full description of the schema can be found in the Nobel Prize Linked Data Vocabulary.
As part of this task you should write two programs that perform the two tasks below. You can write your programs in either Java or Python, but you must follow the instructions provided and must not use any additional library or build automation tool. The aim of the tasks is to assess your understanding of RDF as a data and schema model and your ability to process and query RDF graphs programmatically, rather than your programming ability. Therefore, these tasks will be assessed on the logic of the program rather than on programming skill.
Your queries should be documented in a short PDF report where, for each query, you give the exact set of prefixes used in the query, the terms you use from the chosen vocabularies, the query expression, and a brief explanation (half a page max) of how your query works. Full marks will be awarded (for both Tasks 3a and 3b) for a query that provides the correct results, uses the appropriate namespaces, and is accompanied by a valid explanation. Consideration will be given to the efficiency of the query (e.g. it is almost always better to include selective patterns in the query than to filter on the client side, and it is better to avoid matching unnecessary patterns). Solutions that are particularly well written (e.g. compact queries, no repeated results, elegant code) will be awarded an additional 5 marks for Task 3.
Task 3a (10 marks) As part of this task you will write code that uses SPARQL to generate a new graph from the NobelLaureates dataset. This new graph will contain the name, date of birth, city of birth, Nobel prize category and gender of all Nobel prize winners born in the UK, and will be structured using the model in the original graph. It should therefore model instances of nobel:Laureate where, for example, the name is the value of a foaf:name predicate and the date of birth is the value of the predicate http://dbpedia.org/property/dateOfBirth or foaf:birthday. Save this new graph in Turtle format with the name laureate-details.ttl.
You will find it useful to ensure that you are familiar with the Query app API in Jena if you are writing your program in Java, or with the rdflib SPARQL implementation in Python.
Task 3b (50 marks) In this task you will write and execute the SPARQL queries that correspond to the following questions. Each of the queries below is worth 10 marks:
A. Find all the female Nobel laureates with their date of birth and, if they are no longer alive, their date of death.
B. Find all Nobel laureates born in the US who were born between 1958 and 1968, together with the category for which they won the award and their share of the prize.
C. Find all the Nobel prizes that were not awarded to a person, and their categories.
D. Count all the Nobel laureates who did not work in their country of birth at the time of the award.
E. List the affiliations in descending order of the number of Nobel laureates who were affiliated to them at the time of the award.
For your convenience, here is the complete list of the namespaces used in the Nobel dataset. You can change the abbreviations to something that is more convenient for you:
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX nobel: <http://data.nobelprize.org/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX yago: <http://yago-knowledge.org/resource/>
PREFIX viaf: <http://viaf.org/viaf/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX d2r: <http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/config.rdf#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX map: <http://data.nobelprize.org/resource/#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX freebase: <http://rdf.freebase.com/ns/>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
You might find it useful to try your queries first on the Nobel prize SPARQL endpoint, which offers some debugging functionality. However, remember that the endpoint returns results over the complete dataset, whereas this assignment uses a smaller local dataset.
Submission details
Your complete submission should be contained in one zipped file named COMP318-yourSurnameFirstName.zip (e.g. COMP318-SmithJohn.zip). Only use the archive tools provided on your University machines to compress your file. If we cannot open the file, the assignment will be marked with 0. The file should contain:
- the graph representing the dataset in Task 1
- a PDF report with the derivations and the new triples inferred in Task 2
- the PDF report explaining the queries, and the Python (.py files) or Java (.java files) code executing them.
Regarding your code:
When writing the code, make sure that you don't use any additional library or build automation tools (e.g. Maven). If your chosen programming language is Python, print the query results using Python print rather than Pandas. Make sure your code runs on the University computers, and does not need any additional configuration. If we cannot execute your program, you will be awarded 0 marks for the task.
Submit the zipped file through Canvas. Unlimited attempts will be allowed, but only the most recent one will be marked.