1. Homepage
  2. Homework
  3. Fuqua - Data Infrastructure Homework #4: Dognition Database
This question has been solved

Fuqua - Data Infrastructure Homework #4: Dognition Database

Engage in a Conversation
DukeFuquaData InfrastructureDognitionDatabaseData CleaningSQL

Homework #4 CourseNana.COM

Assignment Type:  Team (You must work with your MQM-assigned team and make a single group submission. You may not collaborate with anyone from any other team.) CourseNana.COM

Due Date:  8:00 AM EST on Monday, October 9, 2023 CourseNana.COM

Instructions: CourseNana.COM

--Your group will submit one file: a Jupyter notebook (i.e. an .ipynb file) that includes your queries, output, and commentary AND includes an embedded picture (i.e. a .jpg file) of your relational schema that you will create in ERDPlus CourseNana.COM

--I recognize that formatting your Jupyter cells can be a little bit of a pain. However, I do ask that you try to get your queries into a similar format to what we’ve been utilizing thus far in the course (and you can always refer to the example below)… CourseNana.COM

                        CourseNana.COM

Example Question: CourseNana.COM

Select “bmi” and “age” for 5 records.  Order the output from highest to lowest “bmi.”  Include your output. CourseNana.COM

            CourseNana.COM

Answer: CourseNana.COM

  CourseNana.COM

            Query CourseNana.COM

  CourseNana.COM

SELECT    bmi, CourseNana.COM

               age CourseNana.COM

     FROM      health CourseNana.COM

     ORDER BY  bmi DESC CourseNana.COM

     LIMIT 5; CourseNana.COM

     CourseNana.COM

     Output    CourseNana.COM

  CourseNana.COM

     bmi       age CourseNana.COM

     6388.49   48 CourseNana.COM

5858.59   54 CourseNana.COM

4324.40   27 CourseNana.COM

4320.96   48 CourseNana.COM

3745.48   66 CourseNana.COM

  CourseNana.COM

            --Notes about the formatting:  CourseNana.COM

                        --Put each SQL clause on a new line CourseNana.COM

                        --Use all caps for all SQL clauses and keywords CourseNana.COM

                        --Write each field in the SELECT statement on a new line CourseNana.COM

                        --Clearly separate SQL clauses from field names, table names, etc. using spaces                               or tabs CourseNana.COM


CourseNana.COM

  CourseNana.COM

Background CourseNana.COM

You have been hired by Dognition (https://www.dognition.com) to examine their data, identify potential issues, and provide some general insights regarding their product and users. The Dognition team is aware that their database is somewhat “messy” but they are unclear about the extent of the problems. There are six tables in the database and the team has provided you with some information about the fields in each table. None of the six tables have a properly-defined primary key. Given all of this information, you have been assigned three main tasks… CourseNana.COM

  CourseNana.COM

Task #1 (Broad Data Cleaning and Relational Schema) CourseNana.COM

·      Drop duplicate records in each table. Be careful if you try to use COUNT(DISTINCT Col1, Col2, Col3, …). If you COUNT(DISTINCT …) with multiple columns at once, a row will be excluded from the COUNT(DISTINCT …) if it has a NULL value in any of those columns. This is simply how COUNT(DISTINCT …) works when you pass it multiple columns at once. Also note that I’m not asking you to actually remove duplicates from each table—you don’t have the permissions to alter the data in the database. However, in order to perform the next step, you definitely need to get rid of duplicates first (or else no set of columns will uniquely identify each row in a table). Also note that SELECT DISTINCT *… is a perfectly legitimate query (and something you might want to explore). CourseNana.COM

·      After eliminating duplicates, determine which field(s) in a table could comprise a primary key for that table IN ITS CURRENT STATE. Put differently, find the column or set of columns that can be used to uniquely identify each row in a table. You do not need to worry about any additions to the database—just work with what you have! ALSO, PLEASE NOTE THAT THE SET OF COLUMNS THAT UNIQUELY IDENTIFY EACH ROW IN A TABLE MAY NOT BE A “VALID” (COMPOSITE) PRIMARY KEY BECAUSE THEY MAY CONTAIN SOME NULL VALUES. WE WILL IGNORE THIS POTENTIAL ISSUE WHEN BUILDING THE RELATIONAL SCHEMA. Some notes… CourseNana.COM

o   The “exam_answers” table is especially messy. After dropping all duplicate records in the table, you should also exclude all records that have a NULL value for dog_guid or end_time. After doing that, I will tell you that the combination of script_detail_id, start_time, end_time, loop_number, and dog_guid can uniquely identify each row in the table. You still need to verify this. CourseNana.COM

o   The “site_activities” table is also somewhat messy—I believe that the combination of the description, created_at, and user_guid fields can uniquely identify each row in the table. However, I would like you to verify this too. CourseNana.COM

o   For the “users” table, I would like for you to set things up such that the user_guid field can uniquely identify each row in the table (basically you’ll have to deal with 13 records that have multiple values for one field—keep the ones that have an “informative” value for that field). Do NOT individually hard-code these 13 cases—figure out a way to eliminate them (I used two CTEs and a LEFT JOIN, but I’m sure there are other methods). CourseNana.COM

·      In your Jupyter Notebook, be sure to include all of your work that got you to your final relational schema! CourseNana.COM

·      If you can embed your relational schema in your Jupyter notebook that would be best! Simply upload a .jpg to your Jupyter file space and then you can embed it into any markdown cell with Edit > Insert Image. CourseNana.COM

·      In everything that follows, ALWAYS USE YOUR “CLEANED” DATA TABLES THAT YOU GENERATED ABOVE. This is easily accomplished by copy/pasting the same CTEs in all subsequent Jupyter cells. CourseNana.COM

  CourseNana.COM

Task #2 (Searching for Potential Issues) CourseNana.COM

·      Examine each table for potential issues, inconsistencies, etc. CourseNana.COM

·      Summarize the “problems” for each table CourseNana.COM

·      Since we do not have access to anyone at Dognition, make assumptions and restrictions that address these issues. For example, if it appears that a certain set of records are blatantly erroneous, state that you are excluding them from all further analyses. CourseNana.COM

·      While this task is tedious it definitely is not trivial. Please be sure to spend ample time investigating these potential issues. However, I recognize that you will by no means be able to discover and vet everything. CourseNana.COM

·      Just to give you an idea of my expectations here, I want you to work as a group for roughly 1 hour on each the dogs and users tables. For all of the other 4 tables an individual team member can spend roughly 30 minutes investigating each one. Be sure to include your work and any final conclusions you make for each table. CourseNana.COM

  CourseNana.COM

Task #3 (Summarize the Data) CourseNana.COM

a)     Use the data to analyze “user sign-ups.” In particular, investigate the “user_guid,” “created_at,” and “membership_type” fields in the “users” table. Discuss potential issues associated with these fields. After identifying and managing these issues (by making reasonable assumptions—there is by no means a “single correct answer” here), generate a result set that contains 8 columns: CourseNana.COM

o   Year CourseNana.COM

o   Month CourseNana.COM

o   Count of user sign-ups in that year/month combination CourseNana.COM

o   Percentage of the total count of user sign-ups in that year/month combination that have membership_type = 1 CourseNana.COM

o   Percentage of the total count of user sign-ups in that year/month combination that have membership_type = 2 CourseNana.COM

o   Percentage of the total count of user sign-ups in that year/month combination that have membership_type = 3 CourseNana.COM

Be sure to examine your results and attempt to explain any irregularities (perhaps using a quick Google search)! CourseNana.COM

b)    Use your results from the “user sign-ups” analysis to investigate the correlation between time of joining (created_at in the users table), type of subscription, and number of tests completed. This analysis is more open-ended and again, does not have a single answer. I believe you will need to use data from the “complete_tests,” “dogs,” and “users” tables (at a minimum). Be sure to include plenty of comments/markdown to describe your approach and any final conclusions. CourseNana.COM

c)     The Dognition team believes that they are having a hard time retaining customers. Some team members believe that the Dognition assessment is too complicated (so that many users get to a certain point, become frustrated, and quit). Other team members believe that there may be issues with the Dognition website, where certain webpages are prone to issues, resulting in user confusion. There is also a hypothesis that the assessment itself is simply better suited to certain “types” of owners and/or dogs. Help Dognition investigate these ideas. (Note that there is definitely no “correct” answer here. You will be judged on your creativity/potential impact and the scope of your analysis.) CourseNana.COM

  CourseNana.COM

Grading CourseNana.COM

·      IN THE EVENT THAT A TEAM MEMBER (OR MULTIPLE TEAM MEMBERS) DO NOT CONTRIBUTE SUFFICIENTLY TO THE PROJECT, I RESERVE THE RIGHT TO ASSIGN DIFFERENT GRADES WITHIN A TEAM. ANY INSTANCE OF THIS TYPE OF ISSUE CAN BE REPORTED DIRECTLY TO ME VIA A DM IN SLACK. CourseNana.COM

·      The assignment will be graded on a 10-point scale. Below is a general rubric for how each task will be evaluated (where we reserve the right to assign point values between these cut-points)… CourseNana.COM

o   Task #1:  2 points CourseNana.COM

--2/2—Showed significant effort and the relational schema is mostly correct. Plenty of work in your Jupyter notebook supporting your relational schema. This is equivalent to a "check plus" grade. CourseNana.COM

--1/2—Showed good effort. Relational schema may have significant issues.  Potentially less supporting work in your Jupyter notebook. This is equivalent to a "check" grade. CourseNana.COM

--0/2—Did not show enough effort. Relational schema is either not submitted or largely incorrect. Very little work in your Jupyter notebook. This is equivalent to a "check minus" grade. CourseNana.COM

o   Task #2:  2 points CourseNana.COM

--2/2—Showed significant effort. Thoroughly investigated potential data issues and summarized these issues for each table. This is equivalent to a "check plus" grade. CourseNana.COM

--1/2—Showed good effort. Potentially a less-detailed investigation or missing altogether for certain tables. This is equivalent to a "check" grade. CourseNana.COM

--0/2—Did not show enough effort. Little to no querying/explanation. This is equivalent to a "check minus" grade. CourseNana.COM

o   Task #3:  4 points CourseNana.COM

--4/4—Showed significant effort. All parts a.), b.), and c.) are completed.  Queries/output are mostly correct, applicable, and interesting. This is equivalent to a "check plus" grade. CourseNana.COM

--3/4—Showed good effort. Analysis is not quite as thorough. More errors in queries/output. Results might be less informative. This is equivalent to a "check" grade. CourseNana.COM

--1/4—Did not show enough effort. Queries/results might be blatantly wrong or unsubstantiated. This is equivalent to a "check minus" grade. CourseNana.COM

o   Formatting:  2 points CourseNana.COM

--2/2—In general, the relational schema and Jupyter Notebook are clear and easy to follow. Queries are formatted properly. There are sufficient notes/annotation for someone familiar with SQL (but not necessarily the dataset) to easily follow along. CourseNana.COM

--1/2—Formatting just not quite to the level of a 2/2, making it difficult to follow along at points. Missing some notes/annotations. CourseNana.COM

--0/2—Serious formatting issues making it incredibly difficult to follow along CourseNana.COM


CourseNana.COM

  CourseNana.COM

Dognition Database Information CourseNana.COM

  CourseNana.COM

Dognition (https://www.dognition.com) is a company that teaches you how to build a deeper CourseNana.COM

connection with your dog by giving you an unprecedented perspective on your dog’s personality CourseNana.COM

and capabilities. CourseNana.COM

  CourseNana.COM

As a customer, you purchase the opportunity to follow detailed instructions and how-to videos CourseNana.COM

about how to play 20 fun and interactive games with your dog that were created by scientists, CourseNana.COM

trainers, and behavioral specialists. These 20 games comprise the Dognition Assessment. When CourseNana.COM

you finish the assessment, you receive a 10-15 page report about your dog’s unique personality CourseNana.COM

dimension. CourseNana.COM

  CourseNana.COM

The Profile Report gives you individualized insight into the cognitive strategies your dog uses to CourseNana.COM

interact with the world, and in-depth breakdowns of how your dog performed in each game CourseNana.COM

compared to other dogs. You are also told your dog's Personality Profile, which is one of nine CourseNana.COM

profiles or “dimensions” with titles like “Socialite,” “Einstein,” or “Maverick.” CourseNana.COM

  CourseNana.COM

Some important notes… CourseNana.COM

·      The 20-game Dognition Assessment assesses 5 core dimensions of cognition:  empathy, communication, cunning, memory, and reasoning. CourseNana.COM

·      With a few exceptions, all games are presented to customers in the same order. CourseNana.COM

·      Customers are not able to advance through tasks or trials out of order. This rule is implemented to make it as easy as possible for participants to follow all the steps of the games correctly. CourseNana.COM

·      After completing the 20-game Dognition Assessment, customers can sign up to receive additional games and activities at the rate of one game and one activity per month. CourseNana.COM

·      The Dognition team ran an experiment to determine whether a promotion that gave customers free access to the first 4 games of the Dognition Assessment would entice them to pay to complete the rest of the Dognition Assessment. CourseNana.COM

·      All information that could be used to identify specific users has been removed. CourseNana.COM

·      More details about the Dognition data collection method can be found at: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135176 (Stewart, Laughlin, et al. “Citizen science as a new tool in dog cognition research.” PloS One 10.9 (2015): e0135176.) CourseNana.COM

  CourseNana.COM

NOTE:  the “data dictionary” provided to us by Dognition (below) may not be fully accurate/complete.  But, we need to make the best with what we were given! CourseNana.COM

  CourseNana.COM

  CourseNana.COM

  CourseNana.COM

  CourseNana.COM

  CourseNana.COM

  CourseNana.COM

Field Name CourseNana.COM

Description CourseNana.COM

activity_type CourseNana.COM

Type of physical interaction with the website (ie: “cancel_monthly”, “video_player”, “pdf_report_render”, etc.) CourseNana.COM

birthday CourseNana.COM

Birth year of the dog CourseNana.COM

breed CourseNana.COM

Name of breed CourseNana.COM

breed_group CourseNana.COM

Group to which a dog’s breed belongs (Herding, Hound, Non- Sporting, Sporting, Terrier, Toy, Working) CourseNana.COM

breed_type CourseNana.COM

Type of dog breed (pure breed, mixed breed, cross breed, popular hybrid) CourseNana.COM

category_id CourseNana.COM

All data is NULL in this field of the database CourseNana.COM

city CourseNana.COM

City of the user CourseNana.COM

country CourseNana.COM

Country of the user CourseNana.COM

created_at CourseNana.COM

Time stamp when the record in that row was created.  PLEASE NOTE THAT THIS FIELD DOES NOT NEED TO BE CONSISTENT FOR A SPECIFIC USER_GUID OR DOG_GUID ACROSS TABLES.  IT IS FOR THE RECORD OF INTEREST IN THE TABLE OF INTEREST. CourseNana.COM

description CourseNana.COM

Description related to script_detail_id CourseNana.COM

dimension CourseNana.COM

One of Dognition’s 9 personality profiles (Ace, Charmer, Einstein, Expert, Maverick, Protodog, Renaissance-dog, Socialite, Stargazer) CourseNana.COM

dna_tested CourseNana.COM

Flag for whether a dog’s DNA has been tested (1=Yes/0=No) CourseNana.COM

dog_fixed CourseNana.COM

Flag for whether a dog is neutered (also known as “fixed”) (1=Yes/0=No) CourseNana.COM

dog_guid CourseNana.COM

Unique ID for a dog CourseNana.COM

end_time CourseNana.COM

Timestamp when the user submitted their answer to a question in a Dognition test. CourseNana.COM

exclude CourseNana.COM

Flag provided by Dognition indicating whether an entry should be excluded (reasons not documented in the current data set; (1=Exclude, 0=Don’t necessarily exclude) CourseNana.COM

free_start_user CourseNana.COM

Flag indicating whether a user received a “free start” by getting the first 4 games for free (1=free start, 0=not free start) CourseNana.COM

gender CourseNana.COM

Gender of the dog CourseNana.COM

last_active_at CourseNana.COM

Time-stamp of user’s last activity in his/her Dognition account CourseNana.COM

loop_number CourseNana.COM

Exam questions often have multiple “loops.” This signifies which loop the customer was on. Negative numbers represent “re-dos.” CourseNana.COM

max_dogs CourseNana.COM

Number of dogs associated with a user CourseNana.COM

mean iti (days) CourseNana.COM

Mean inter-test-interval (ITI) between each test a dog completed (in days) CourseNana.COM

mean iti (minutes) CourseNana.COM

Mean inter-test-interval (ITI) between each test a dog completed (in minutes) CourseNana.COM

median iti (days) CourseNana.COM

Median inter-test-interval (ITI) between each test a dog completed (in days) CourseNana.COM

median iti (minutes) CourseNana.COM

Median inter-test-interval (ITI) between each test a dog completed (in minutes) CourseNana.COM

membership_id CourseNana.COM

Unique ID CourseNana.COM

membership_type CourseNana.COM

Type of subscription. 1=Dognition Assessment of initial 20 games, 2=Annual, 3=Monthly, 4=Free, 5=Subscription type of membership chosen by the user.Annual subscriptions provide the Dognition Assessment plus 12 months of subscription service (one new test and activity are “unlocked” per month, beginning of the month after purchase). Monthly subscriptions provide the Dognition Assessment plus the subscription service billed monthly. Free subscriptions were either offered through a “free start” promotion or through the Dognition MOOC. These users have access to the first 4 games for free and can then upgrade to unlock the rest. The “Subscription” option (value of 5) is a recent test that offers the entire 20-game Dognition Assessment for free, but then allows user to upgrade to a monthly subscription CourseNana.COM

rank_by_dogid CourseNana.COM

The chronological rank of each test a dog completed (e.g. 1=first test dog completed, 3=third test dog completed, etc.) CourseNana.COM

rank_by_userid CourseNana.COM

The chronological rank of each test a user completed (e.g. 1=first test user completed, 3=third test user completed, etc.) CourseNana.COM

rating CourseNana.COM

The answer to the question “How surprising were [your dog’s name]’s choices?” Users could choose any number between 1 (not surprising) to 9 (very surprising) CourseNana.COM

script_detail_id CourseNana.COM

Number corresponding with the exact webpage a customer was on CourseNana.COM

script_id CourseNana.COM

An ID representing the type of Dognition activity users are engaged in. Numbers represent personality questionnaire (first set of questions after registering a dog), Dognition Assessment versions, other surveys, and monthly games available after the initial Dognition Assessment is completed CourseNana.COM

sign_in_count CourseNana.COM

Number of times a user signs in to his/her Dognition account CourseNana.COM

start_time CourseNana.COM

Timestamp when user received a question in a Dognition test CourseNana.COM

state CourseNana.COM

State of the user CourseNana.COM

step_type CourseNana.COM

Key for whether the test item was a question or a stopwatch CourseNana.COM

subcategory_name CourseNana.COM

Name of the cognitive sub-category a test belongs to (Communication, Cunning, Empathy, Expression Game, Impossible Task, Laterality, Memory, Numerosity, Perspective Game, Reasoning, Self Control Game, Shaker Game, Shell Game, Smell Game, Social Bias, Spatial Navigation). CourseNana.COM

subscribed CourseNana.COM

Flag for whether a user has a paid subscription (1=Yes/0=No) CourseNana.COM

test_name CourseNana.COM

Name of Dognition test CourseNana.COM

time diff between first and last game (days) CourseNana.COM

Time difference between the first and last game completed by a dog (in days) CourseNana.COM

time diff between first and last game (minutes) CourseNana.COM

Time difference between the first and last game completed by a dog (in minutes) CourseNana.COM

total tests completed CourseNana.COM

Number of tests completed by the dog CourseNana.COM

updated_at CourseNana.COM

Time stamp when the record in that row was created (will be the same as created_at if the record was never updated).  PLEASE NOTE THAT THIS FIELD DOES NOT NEED TO BE CONSISTENT FOR A SPECIFIC USER_GUID OR DOG_GUID ACROSS TABLES.  IT IS FOR THE RECORD OF INTEREST IN THE TABLE OF INTEREST. CourseNana.COM

user_guid CourseNana.COM

Unique ID for a human user CourseNana.COM

weight CourseNana.COM

Weight of the dog (lbs) CourseNana.COM

zip CourseNana.COM

Zip code of the user CourseNana.COM

  CourseNana.COM

Get in Touch with Our Experts

WeChat WeChat
Whatsapp WhatsApp
Duke代写,Fuqua代写,Data Infrastructure代写,Dognition代写,Database代写,Data Cleaning代写,SQL代写,Duke代编,Fuqua代编,Data Infrastructure代编,Dognition代编,Database代编,Data Cleaning代编,SQL代编,Duke代考,Fuqua代考,Data Infrastructure代考,Dognition代考,Database代考,Data Cleaning代考,SQL代考,Dukehelp,Fuquahelp,Data Infrastructurehelp,Dognitionhelp,Databasehelp,Data Cleaninghelp,SQLhelp,Duke作业代写,Fuqua作业代写,Data Infrastructure作业代写,Dognition作业代写,Database作业代写,Data Cleaning作业代写,SQL作业代写,Duke编程代写,Fuqua编程代写,Data Infrastructure编程代写,Dognition编程代写,Database编程代写,Data Cleaning编程代写,SQL编程代写,Dukeprogramming help,Fuquaprogramming help,Data Infrastructureprogramming help,Dognitionprogramming help,Databaseprogramming help,Data Cleaningprogramming help,SQLprogramming help,Dukeassignment help,Fuquaassignment help,Data Infrastructureassignment help,Dognitionassignment help,Databaseassignment help,Data Cleaningassignment help,SQLassignment help,Dukesolution,Fuquasolution,Data Infrastructuresolution,Dognitionsolution,Databasesolution,Data Cleaningsolution,SQLsolution,