
MATH1309/2142 Assessment Task 3: Assessing Druggability in Drug Discovery: A Bioinformatics Study


MATH1309/2142 Assessment Task 3
5 Questions, Total Marks = 315, Worth = 40% of final course grade

Assessing Druggability in Drug Discovery: A Bioinformatics Study

(Dataset: refer to the “Drugbank dataset” Excel file, N=400)

Project Description:

  • Drug-likeness is not a precisely defined concept in drug discovery. Predicting druggability is of high practical relevance in pharmaceutical research. In vitro absorption, distribution, metabolism, and elimination (ADME) assays are now being conducted throughout the drug discovery process, but there is still a need to develop faster and better analytic methods to enhance the 'developability' of drug leads, and to formalise strategies for ADME assessment of good molecular candidates in the drug discovery and pre-clinical stages.

  • This study uses data on 400 small molecules retrieved from the DrugBank 3.0 database, a unique cheminformatics resource, analysed by Hudson et al. (2014, 2017, 2019, 2020).

  • The data set contains 9 physico-chemical variables (MW, PSA, log P, log D, ...), and the molecule’s mode of delivery (oral versus non-oral). See Table 1 below.

    Table 1:

    In addition, the data set contains new druggability rules (score functions counting the number of violations for each molecule on each of the 9 variables) developed by Hudson et al. These account for the molecule’s size, permeability, etc., but use new cutpoints for each of the 9 molecular parameters (Table 2), different from those conventionally used by the Food and Drug Administration (FDA) (Lipinski’s rule of five, Ro5, Table 2).

    Based on the 9 molecular variables (ADME variables), Hudson et al. found distinct clusters of molecules, identified as “poor” versus “good” druggables. The data set contains the 9 ADME variables (Table 1) and a scoring function (score9_LogD), along with the molecule’s mode of delivery (oral versus non-oral).

    The score is denoted score9_LogD. Note that score9_LogD is a continuous variable with range 0 to 9. It comprises the 4 traditional parameters of the rule of five (Ro5) (Lipinski, 2016) (Table 1) plus 4 extra parameters (PSA, number of rotatable bonds, number of rings, number of N and O atoms), with 2 candidate lipophilicity measures, log P and log D. The latter, the distribution coefficient, has recently been suggested as a preferable predictor of permeation to Lipinski’s traditional partition coefficient, log P, which is an often-used predictor of permeation.

We also dichotomise score9_LogD into 2 groups based on a cutpoint of 4 violations:
score9_LogD <= 4: non-violator molecule
score9_LogD > 4: violator (non-druggable) molecule

This is equivalent to: Score9_LogD_group <= 4 (non-violators) versus Score9_LogD_group > 4 (violators).

Table 2: Values above the cutpoints score a 1.0
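
The scoring and dichotomisation rules described above can be sketched in a few lines of Python. This is only an illustration, not the Hudson et al. scoring function: the Table 2 cutpoints are not reproduced in this text, so the sketch uses the conventional rule-of-five limits (MW 500, log P 5, 5 H-bond donors, 10 H-bond acceptors) for the 4 traditional parameters only. The function names `count_violations` and `violator_group` are hypothetical; the example molecule's values are taken from Question 3 g).

```python
# Conventional Lipinski rule-of-five limits; a value strictly above a cutpoint
# scores 1 violation. ASSUMPTION: these are NOT the Hudson et al. cutpoints
# from Table 2, which are not available in this text.
RO5_CUTPOINTS = {"MW": 500.0, "LogP": 5.0, "Hdonors": 5, "Hacceptors": 10}


def count_violations(molecule, cutpoints):
    """Count how many variables exceed their cutpoint (1 point per violation)."""
    return sum(1 for name, cut in cutpoints.items() if molecule[name] > cut)


def violator_group(score, cutpoint=4):
    """Dichotomise a violation score: <= cutpoint is a non-violator, > cutpoint a violator."""
    return "violator" if score > cutpoint else "non-violator"


# Values for the 4 traditional parameters of the molecule in Question 3 g).
molecule = {"MW": 445.429, "LogP": -2.7, "Hdonors": 8, "Hacceptors": 12}
ro5_violations = count_violations(molecule, RO5_CUTPOINTS)
print("Ro5 violations:", ro5_violations)

# The 4-violation cutpoint applies to score9_LogD, not to the Ro5 count above.
print(violator_group(4), violator_group(5))
```

Swapping `RO5_CUTPOINTS` for a dictionary holding all 9 Table 2 cutpoints would give a count on the same 0 to 9 scale as score9_LogD.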

Description of the Drugbank dataset of N = 400 molecules:

A random sample of 12 molecules’ data is shown below as an example (this is not the full N=400 dataset):


Question 1: PCA analysis [85 marks]
Perform the following in SAS (be sure to include your code, outputs and interpretations):

  a)  Perform a principal component analysis in SAS on the correlation matrix of the 9 ADME variables. Show your full SAS code and output. (10 marks)

  b)  Obtain the following 5 types of plots related to PROC PCA. (All plots should be placed in clearly labelled Appendices.) (10 marks)

    • Scree plot
    • Profile plot
    • Component pattern plots
    • Score plots
    • Loading plots

  c)  Report the eigenvalues and the eigenvectors. (5 marks)

  d)  What percentage of the total sample variation is accounted for by each of the PCs, and what is the cumulative percentage? (5 marks)

  e)  Write out the formulation for the PCs. (10 marks)

  f)  Interpret the PCs via the eigenvalues, your component pattern profiles AND your loading plots from SAS. (10 marks)

  g)  Label your score plot for PC2 versus PC1 by violator and non-violator status and summarise any trends and findings. (5 marks)

  h)  Label your score plot for PC2 versus PC1 by oral status and summarise any trends and findings. (5 marks)

  i)  Label your score plot for PC3 versus PC2 by violator and non-violator status and summarise any trends and findings. (5 marks)

  j)  Label your score plot for PC3 versus PC2 by oral status and interpret any trends and findings. (5 marks)

  k)  Using BOTH a formal hypothesis test and relevant plots, can the data be effectively summarised in fewer than 9 dimensions (k < p)? Report k, justify your answer, and establish k via the relevant hypothesis test. Show your SAS code and formula. (15 marks)
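
As a rough illustration of part d), the proportion and cumulative proportion of variance can be computed directly from the eigenvalues. The sketch below is Python rather than SAS, and the eigenvalues are invented for illustration (they sum to 9, as the eigenvalues of a 9 × 9 correlation matrix must); they are not results from the Drugbank data. The function name `variance_explained` is hypothetical.

```python
def variance_explained(eigenvalues):
    """Return (PC number, eigenvalue, proportion, cumulative proportion) rows."""
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    # Sort in descending order so PC1 carries the largest variance.
    for i, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        proportion = lam / total
        cumulative += proportion
        rows.append((i, lam, proportion, cumulative))
    return rows


# HYPOTHETICAL eigenvalues of a 9 x 9 correlation matrix (sum = 9).
eigenvalues = [4.0, 2.0, 1.2, 0.7, 0.5, 0.3, 0.15, 0.1, 0.05]
rows = variance_explained(eigenvalues)

for pc, lam, prop, cum in rows:
    print(f"PC{pc}: eigenvalue={lam:.2f}  proportion={prop:.3f}  cumulative={cum:.3f}")

# A common informal heuristic (not the formal test asked for in part k):
# retain PCs whose eigenvalue exceeds 1 when working on the correlation matrix.
n_above_one = sum(1 for lam in eigenvalues if lam > 1)
```

With these made-up values the first three PCs have eigenvalues above 1 and together account for 80% of the total variance. A table like this supports the scree plot but does not replace the hypothesis test required in part k).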

Question 2: PCA with reduced k < p for plots [40 marks]

Using the reduced dimensionality k determined in Question 1 k), rerun the PCA on the 9 ADME variables for the violator and non-violator groups separately (where violatory status is defined by score9_LogD).

  a)  Recreate the 5 plots related to PROC PCA for your given k.
    (All plots should be placed in clearly labelled Appendices.) (10 marks)

  b)  Interpret the PCs via the eigenvalues, your component pattern profiles AND your loading plots from SAS, based on your reduced dimensionality k and the k PCs. (15 marks)

  c)  Label your score plot for PC2 versus PC1 by oral status and summarise any trends and findings. (5 marks)

  d)  Label your score plot for PC2 versus PC1 by violatory status and summarise any trends and findings. (5 marks)

  e)  Which of the k PCs are skewed? Use matrix plots of the PC scores to answer this. (5 marks)
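
Part e) asks for a visual judgement from matrix plots. A numeric skewness coefficient can back up what the plots show. The sketch below is Python rather than SAS and uses the moment-based sample skewness g1 = m3 / m2^1.5; the function name `skewness` and the two toy score vectors are assumptions for illustration, not PC scores from the data.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Toy PC score vectors (HYPOTHETICAL values).
symmetric_scores = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed_scores = [0.0, 0.0, 0.0, 0.0, 10.0]

print(skewness(symmetric_scores))     # symmetric about the mean
print(skewness(right_skewed_scores))  # long right tail, positive skewness
```

A positive value indicates a longer right tail and a negative value a longer left tail; values near zero are consistent with a roughly symmetric histogram on the diagonal of the matrix plot.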

Question 3: DISCRIMINANT ANALYSIS ON 9 ADME VARIABLES BY 2 GROUPS OF MOLECULES [55 marks]

Aim: to run PROC DISCRIM to investigate how the 9 ADME variables discriminate the violators from the non-violators.

  a)  Generate the means, standard deviations, and variance-covariance matrix of the data for the violators. (5 marks)

  b)  Generate the means, standard deviations, and variance-covariance matrix of the data for the non-violators. (5 marks)

  c)  Produce the correlation matrix with associated p-values, and a matrix scatterplot of the input data, for the violators. (5 marks)

  d)  Produce the correlation matrix with associated p-values, and a matrix scatterplot of the input data, for the non-violators. (5 marks)

  e)  Run SAS DISCRIM and use the resulting output to answer the following questions.
    HINT: use priors "violators"=0.30, "non-violators"=0.70. Ensure your output is clearly labelled in an Appendix. (10 marks)

  f)  Is Σ1 = Σ2? Justify your answer based on the appropriate test statistic and output from SAS. (5 marks)

  g)  How is a molecule with x₀ᵀ = (MW, LogP, LogD, Hdonors, Hacceptors, PSA, ROT, NATOM, NRING) = (445.429, -2.7, -3.28938, 8, 12, 207.27, 9, 55, 3) allocated? (10 marks)

  h)  Report the linear discriminant functions (LDFs) obtained from the output and describe what they mean. (5 marks)

  i)  Show the resulting confusion matrix and interpret it. (5 marks)
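
To make the allocation rule in parts g) and h) concrete, the sketch below evaluates a linear discriminant score for each group and allocates an observation to the group with the larger score. It is Python rather than SAS and is deliberately reduced to 2 standardised variables so the matrix inverse can be written by hand. The group means and pooled covariance matrix are invented for illustration; only the priors (0.30 and 0.70) come from the hint in part e). It assumes a common covariance matrix, which is the assumption tested in part f). The function names are hypothetical.

```python
import math


def inverse_2x2(m):
    """Invert a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def linear_score(x, mean, pooled_inv, prior):
    """Linear discriminant score: mu' S^-1 x - 0.5 mu' S^-1 mu + ln(prior)."""
    coef = [sum(pooled_inv[i][j] * mean[j] for j in range(2)) for i in range(2)]
    constant = -0.5 * sum(mean[i] * coef[i] for i in range(2)) + math.log(prior)
    return constant + sum(coef[i] * x[i] for i in range(2))


def allocate(x, groups, pooled_cov):
    """Allocate x to the group with the largest linear discriminant score."""
    pooled_inv = inverse_2x2(pooled_cov)
    scores = {
        name: linear_score(x, g["mean"], pooled_inv, g["prior"])
        for name, g in groups.items()
    }
    return max(scores, key=scores.get), scores


# HYPOTHETICAL group means and pooled covariance; priors from the hint in e).
groups = {
    "violators": {"mean": [2.0, 2.0], "prior": 0.30},
    "non-violators": {"mean": [0.0, 0.0], "prior": 0.70},
}
pooled_cov = [[1.0, 0.5], [0.5, 1.0]]

label, scores = allocate([1.0, 1.0], groups, pooled_cov)
print(label, scores)
```

The point (1, 1) lies exactly halfway between the two invented means, so with equal priors the scores would tie; the larger prior for non-violators tips the allocation to that group. This is the effect the priors in part e) have on the allocation in part g).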

Question 4: STEPWISE DISCRIM ON 4 GROUPS OF MOLECULES [90 marks]

Now perform a stepwise DISCRIM using the oral by violatory status groups defined below.

  a)  Create the following variable, i.e., an interaction term between oral status and score9_LogD violation status at 4 levels, as defined below: (5 marks)

  b)  Obtain a cross-table, in SAS or otherwise, of oral by violatory status for the whole data set. How many molecules, and what percentage of the total, fall in each of these 4 levels? Along with the table, create an appropriate histogram. Interpret your results. (10 marks)

  c)  Generate the means, standard deviations, variance-covariance matrix and correlation matrix of the ADME data for each of the 4 levels defined by the Oral status by violatory status variable. Interpret your descriptive profiles in terms of how the variables differ across the 4 levels. (20 marks)

  d)  Generate matrix plots of the 9 ADME variables for the 4 levels defined by the Oral status by violatory status variable. Interpret how the variables differ in distribution and correlation across the 4 levels. (15 marks)

  e)  Run a STEPWISE DISCRIM analysis using the 9 ADME variables (Table 1) as the input and the above 4-level grouping variable, Oral status by violatory status. (25 marks)

  f)  Which variables best discriminate the 4 Oral status by violatory status classes? (5 marks)

  g)  Give the means, variances and correlations of these best discriminating variables across the 4-level Oral status by violatory status variable and interpret trends. (10 marks)
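
The 4-level grouping variable and the cross-table in parts a) and b) can be sketched as follows. This is Python rather than SAS. The level definitions for part a) are not reproduced in this text, so the level labels below (for example "oral / violator") are hypothetical placeholders, as are the function names and the five toy records.

```python
from collections import Counter


def four_level_group(is_oral, is_violator):
    """Combine oral status and violator status into one 4-level label.

    ASSUMPTION: the label wording is illustrative; the assignment's own level
    definitions are not available here.
    """
    oral = "oral" if is_oral else "non-oral"
    status = "violator" if is_violator else "non-violator"
    return f"{oral} / {status}"


def crosstab(records):
    """Return {level: (count, percent of total)} for (is_oral, is_violator) records."""
    counts = Counter(four_level_group(o, v) for o, v in records)
    total = sum(counts.values())
    return {level: (n, 100.0 * n / total) for level, n in counts.items()}


# Toy records (HYPOTHETICAL): (is_oral, is_violator) for 5 molecules.
records = [(True, False), (True, False), (True, True), (False, True), (False, False)]
table = crosstab(records)

for level, (n, pct) in sorted(table.items()):
    print(f"{level:25s} n={n}  {pct:.1f}%")
```

Here the violator flag would come from applying the cutpoint of 4 to score9_LogD, and the counts per level are what a bar chart for part b) would display.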


Question 5: [45 marks]

  a)  Run a STEPWISE DISCRIM analysis using your subset of k PCs from Question 2 as the input variables and the above 4-level grouping variable, Oral status by violatory status. (20 marks)

  b)  Which PC variables best discriminate between the 4 oral by violatory groups/classes? (5 marks)

  c)  Give the mean vector, variance-covariance matrix and correlations of these chosen PC variables for each of the 4 oral by violatory groups/classes and interpret trends. (10 marks)

  d)  For the PC variables selected by the stepwise discriminant analysis, determine the correlation between them and the original data (i.e., the 9 ADME variables in Table 1). (10 marks)
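
For part d), when the PCA is run on the correlation matrix, the correlation between the i-th PC and the k-th standardised variable is e_ik √λ_i, where λ_i is the i-th eigenvalue and e_ik the k-th element of its eigenvector. The sketch below is Python rather than SAS and checks this on a 2-variable toy case whose eigen-decomposition is known exactly: a correlation matrix with off-diagonal r has eigenvalues 1 + r and 1 − r. The value r = 0.6 and the function name `pc_variable_correlations` are assumptions for illustration.

```python
import math


def pc_variable_correlations(eigenvalue, eigenvector):
    """Correlation of one PC with each standardised variable: e_ik * sqrt(lambda_i)."""
    return [e * math.sqrt(eigenvalue) for e in eigenvector]


# Toy 2 x 2 correlation matrix [[1, r], [r, 1]] with a HYPOTHETICAL r.
r = 0.6
s = 1 / math.sqrt(2)
pcs = [
    (1 + r, [s, s]),   # PC1: eigenvalue 1 + r, eigenvector (1, 1) / sqrt(2)
    (1 - r, [s, -s]),  # PC2: eigenvalue 1 - r, eigenvector (1, -1) / sqrt(2)
]

loadings = [pc_variable_correlations(lam, vec) for lam, vec in pcs]
for i, row in enumerate(loadings, start=1):
    print(f"PC{i}:", [round(v, 4) for v in row])

# Each variable's squared correlations across all PCs sum to 1.
check = sum(row[0] ** 2 for row in loadings)
```

If only a subset of PCs is retained, the sum of a variable's squared correlations with those PCs gives the share of that variable's variance they capture.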

--------------------- THE END
