ConformationalSignatureAnalysis.Rmd
This vignette explains how to perform conformational signature analysis (CSA). The essential idea is when you have an experimental design in which you have numerous states (such as ligand binding, mutants, antibodies) and in each state you performed HDX-MS and potentially measured a downstream outcome. You can use conformational signature analysis to identify the regions of the protein that are most likely to be associated for the observed outcome. This can then inform design choices. There are three options currently implemented:
Unsupervised Conformational Signature Analysis - This is the most basic form of the analysis. It will identify the regions of the protein that are most variable across the states. This is useful for identifying regions that are associated with the variation in the data. By performing the analysis in this way you will identify the states that are most similar and different from each other.
Supervised Conformational Signature Analysis with discrete outcomes - This is a more advanced form of the analysis using a discrete outcome (such as “monovalent” or “antagonist”). It will identify the regions of the protein that are most variable across the states and are also associated with the outcome. This is useful for identifying regions that are associated with the outcome. This is done by dimensionality reduction that accounts for the outcome.
Supervised Conformational Signature Analysis with continuous outcomes - This is a more advanced form of the analysis using a continuous outcome such as binding affinity. It will identify the regions of the protein that are most variable across the states and are also associated with the outcome. This is useful for identifying regions that are associated with the outcome. This is done by dimensionality reduction that accounts for the continuous outcome.
The analysis currently supports missing values in the outcomes but not the HDX-MS data.
The trickest part of the analysis is setting up the data correctly. You can do this in a excel file and load as csv but we will demonstrate using R manually below. Essentially you want to create a dataset where the rownames are the different states and columns indicate the labels associated with the states.
First, we will load the library and the data. The data is a list of dataframes where each dataframe is a state. Each dataframe should have the following columns
This is as in the single protein analysis. We suggest performing that analysis carefully, first, before proceeding to the conformational signature analysis. We recommend storing the reference or apo state in a separate dataframe. The analysis expects consistency between the experiments (e.g. replicates and number of timepoints, region of the protein) but you do not need to measure the exact same peptides in each experiment. You will also need to name the states you are interested in and store them in a vector.
library(RexMS)
## Loading required package: S4Vectors
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which.max, which.min
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
# for conformation signature analysis organise data into a list
data("LXRalpha_compounds") # data for different compounds as a list
data("LXRalpha_processed") # data for the reference state
# usual important terms
numTimepoints <- length(unique(LXRalpha_compounds[[1]]$Exposure))
Timepoints <- unique(LXRalpha_compounds[[1]]$Exposure)
numPeptides <- length(unique(LXRalpha_compounds[[1]]$Sequence))
R <- max(LXRalpha_compounds[[1]]$End)
# names of compounds
states <- names(LXRalpha_compounds)
# let look at states
states
## [1] "T0.091317" "WAY.254011" "F1" "AZ876" "AZ1"
## [6] "AZ2" "AZ3" "AZ4" "GW3965" "BMS.852927"
## [11] "AZ6" "AZ7" "LXR.623" "AZ8" "AZ9"
## [16] "AZ5" "X24.25EC"
Our analysis has 18 compounds and a reference state. We will use the
reference state as the reference for the conformational signature
analysis. Whichever route you follow (e.g. supervised or unsupervised)
the initial steps are identical. We will start with the list in an
lapply
function. To each element in this list (which is an
entire HDX dataset) we apply the rex
function. This
function will perform the usual rex model. There are only 3 timepoints
in this analysis so we will set the tCoef
to
c(0, 1, 1)
. We will also set the seed to 1 for
reproducibility. We will use the SerialParam
for the
BPPARAM
argument. This is because the analysis is not too
computationally intensive but if you run for many iterations and chains
we suggest using parrallel processing. We will store the results in a
list. For brevity we will run the analysis for 100 iterations but we
recommend much more (typically 5000) for final analysis. The following
function takes a while to run so we will not run them here but pre-load
the results from the package the code is placed for completeness.
set.seed(1)
rex_lxr <- lapply(LXRalpha_compounds, function(xx) {
rex(HdxData = DataFrame(xx),
numIter = 100,
R = R,
numtimepoints = numTimepoints,
timepoints = Timepoints,
density = "laplace",
seed = 1L,
tCoef = c(0, rep(1, 2)),
BPPARAM = SerialParam())})
The next step is to apply RexProcess
to the output of
the rex
function. Remember there are 18 compounds so we
will use lapply
again. We will set the range to 50:100 -
removing the first 50 as burnin, though we would typically recommend you
run about 5000 iterations. This is mcmc iteration we are interested in.
We will store the results in a list. Again, we do not actually run the
code here but pre-load the results from the package.
rex_lxr <- lapply(seq_along(rex_lxr),
function(i)
RexProcess(HdxData = DataFrame(LXRalpha_compounds[[i]]),
params = rex_lxr[[i]],
range = 50:100))
We will now load the results from the package. The results are stored
in the rex_lxr
object.
data("rex_lxr")
In the next step we would perform the same differential analysis as
in the differential analysis tutorial but 18 times because there are 18
compounds. We will use the same reference state for each compound. We
will store the results in a list. Again, we do not actually run the code
here but pre-load the results from the package. Remember
params
are the various states and the HdxData
is the reference state HDX data. The params
is the output
of the rex
function, which has been processed. The
whichChain
is the chain of the analysis you are interested
in using to generate the results. We will use chain 1, but recommend
assessing the quality of multiple chains.
out_lxr_compound_proccessed <- lapply(rex_lxr, function(xx)
processDifferential(HdxData = DataFrame(LXRalpha_processed),
params = xx,
whichChain = 1))
We will now load the results from the package. The results are stored
in the out_lxr_compound_proccessed
object.
data("out_lxr_compound_proccessed")
To perform the unsupervised conformational signature analysis, we
will use the unsupervisedCSA
function. This function will
take the output of the processDifferential
function. Whilst
for unsupervised analysis you do not need to specify the outcome, we
will generate the labels results anyway to highlight the details of
setting up that analysis.
For our protein there are 18 compounds and a reference state. Each of the compounds induce one outcome in two scenarios or the outcome is unknown (which is allowed in our analysis). We will generate the labels for the outcomes. We will store the results in a dataframe with the states as the rownames. For proteins where the outcome is unknown, we will set the outcome to “Unknown”. For the two scenarios, the first is ABCA1 induction and the second is lipogencity. We will set the outcome to “low” or “high” for ABCA1 induction. We will set the outcome to “Non-Lipogenic” or “Lipogenic” for lipogenicity. Of course, you may adapt these outcomes to your own analysis. We will then convert the outcomes to factors with the levels in the order we are interested in. You can of course load this dataframe from something that was made in excel and covert to a dataframe.
# First set up the dataframe with unknowns and correct rownames
labels <- data.frame(ABCA1 = rep("Unknown", length(states)),
lipogenic = rep("Unknown", length(states)))
rownames(labels) <- states
# Now set the outcomes with ABCA1 induction
labels$ABCA1[rownames(labels) %in% c("LXR.623", "AZ9", "AZ8", "AZ5")] <- "low"
labels$ABCA1[rownames(labels) %in% c("Az1", "AZ2", "AZ3", "AZ4", "AZ6",
"AZ7", "AZ876", "T0.901317", "WAY.254011",
"F1", "GW3965", "BMS.852927")] <- "high"
# Now set the outcomes with lipogenicity
labels$lipogenic[rownames(labels) %in% c("AZ6", "AZ7", "AZ9",
"AZ8", "GW3965", "BMS.852927",
"LXR.623")] <- "Non-Lipogenic"
labels$lipogenic[rownames(labels) %in% c("AZ876", "AZ1",
"T0.901317", "F1", "WAY.254011")] <- "Lipogenic"
# Now convert to factors with the correct levels
labels$ABCA1 <- factor(labels$ABCA1,
levels = c("low", "high", "Unknown"))
labels$lipogenic <- factor(labels$lipogenic,
levels = c("Non-Lipogenic", "Lipogenic", "Unknown"))
We will now run the unsupervised conformational signature analysis.
We will set the quantity
to “TRE” as this is the quantity
we use to generate the signature. We will set the states
to
the states we are interested in. We will set the
whichTimepoint
to 600. This is the timepoint used to
generate the conformational signature. We will store the results in the
ucsa
object. Since this is the unsupervised analysis, we do
not need to specify the labels
argument.
ucsa <- UnsupervisedCSA(RexDifferentialList = out_lxr_compound_proccessed,
quantity = "TRE",
states = states,
whichTimepoint = 600)
We will now look at the results. The results are stored in a list
with the first entry pca_states
being the Principal
Component Analysis of the RexMS
results. The rotation
matrix can be used as the conformational signature - providing the modes
of conformational variation in the data. This contains all the results
from a PCA analysis. The second entry states_wide
is a
dataframe with the states as the columns and the residue as the rows.
The values are the TRE values that were used to generate the
conformational signature.
You can perform any downstream analysis you would like with the
results. For example, you could plot the PCA results. We will plot the
first two components of the PCA analysis. The labels are not used unless
you pass them to the pca_params
argument even if you supply
a labels
argument.
Note that the results look strange because we have not run the algorithm for enough iterations.
## Loading required package: ggplot2
# doesn't use the labels even though they are passed as an argument
plotUCSA(pca_states = ucsa$pca_states,
states_wide = ucsa$states_wide,
labels = labels)
Until now, we have not used the labels as part of the analysis - only to interpret the results of the unsupervised analysis. Supervised conformational signature analysis is more advanced and uses the labels to generate the conformational signature. The idea is to identify the regions of the protein that are most variable across the states and are also associated with the outcome.
To perform the supervised conformational signature analysis with
discrete outcomes, we will use the supervisedCSA
function.
This function will take the output of the
processDifferential
function. We will generate the labels
for the outcomes as before. We will store the results in a dataframe
with the states as the rownames. For proteins where the outcome is
unknown, we will set the outcome to “Unknown”. For the two scenarios,
the first is ABCA1 induction and the second is lipogencity. We will set
the outcome to “low” or “high” for ABCA1 induction. We will set the
outcome to “Non-Lipogenic” or “Lipogenic” for lipogenicity. Of course,
you may adapt these outcomes to your own analysis. We will then convert
the outcomes to factors with the levels in the order we are interested
in. You can of course load this dataframe from something that was made
in excel and covert to a dataframe.
We will now run the supervised conformational signature analysis with
discrete outcomes. We will set the quantity
to “TRE” as
this is the quantity we use to generate the signature. We will set the
states
to the states we are interested in. We will set the
whichTimepoint
to 600. This is the timepoint used to
generate the conformational signature. We will store the results in the
scsa
object. We will set the whichlabel
to
“ABCA1” as this is the outcome we are interested in. We will also set
whichlabel
to “lipogenic” to generate the conformational
signature for lipogenicity. Since this is the supervised analysis we
need to specify the labels
argument from the dataframe we
generated earlier. We will set the orthoI
to 1. This is the
number of orthogonal components to use in the analysis. We will set this
to 1 for simplicity. We will store the results in the scsa
object.
The analysis used the labels to generate the conformational signature
by using orthogonal partial least squares discrimant analysis (OPLS-DA).
The results return a formal opls
object. This object
contains the results of the analysis.
Note that the results look strange because we have not run the algorith for enough iterations earlier. The results are allowed to contain “Unknown” outcomes.
Note that the algorithm will output a number of warnings - the first is that the number of predictive components is set to 1. This is the desire behaviour. The second is that the variance in some residue elements is below machine precision. This is typical for residues that were not measure in the analysis or where the uptake was 0 (such as Prolines). You do not need to worry about this as the sequence labelling is handled behind the scenes.
scsa <- supervisedCSA(RexDifferentialList = out_lxr_compound_proccessed,
quantity = "TRE",
states = states,
labels = labels,
whichlabel = "ABCA1",
whichTimepoint = 600,
orthoI = 1)
## Warning: OPLS: number of predictive components ('predI' argument) set to 1
## Warning: The variance of the 263 following variables is less than 2.2e-16 in the full or partial (cross-validation) dataset: these variables will be removed:
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 204, 208, 214, 215, 234, 235, 237, 239, 242, 244, 258, 259, 260, 261, 262, 268, 272, 276, 279, 280, 291, 300, 308, 317, 318, 319, 320, 332, 333, 334, 335, 336, 338, 342, 350, 357, 358, 359, 360, 361, 366, 370, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 428, 436, 437
## OPLS-DA
## 14 samples x 184 variables and 1 response
## standard scaling of predictors and response(s)
## 263 excluded variables (near zero variance)
## R2X(cum) R2Y(cum) Q2(cum) RMSEE pre ort pR2Y pQ2
## Total 1 0.398 0.358 0.395 1 1 0.15 0.1
scsa2 <- supervisedCSA(RexDifferentialList = out_lxr_compound_proccessed,
quantity = "TRE",
states = states,
labels = labels,
whichlabel = "lipogenic",
whichTimepoint = 600,
orthoI = 1,
type = "catagorical")
## Warning: OPLS: number of predictive components ('predI' argument) set to 1
## Warning: The variance of the 263 following variables is less than 2.2e-16 in the full or partial (cross-validation) dataset: these variables will be removed:
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 204, 208, 214, 215, 234, 235, 237, 239, 242, 244, 258, 259, 260, 261, 262, 268, 272, 276, 279, 280, 291, 300, 308, 317, 318, 319, 320, 332, 333, 334, 335, 336, 338, 342, 350, 357, 358, 359, 360, 361, 366, 370, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 428, 436, 437
## OPLS-DA
## 11 samples x 184 variables and 1 response
## standard scaling of predictors and response(s)
## 263 excluded variables (near zero variance)
## R2X(cum) R2Y(cum) Q2(cum) RMSEE pre ort pR2Y pQ2
## Total 1 0.12 -0.0506 0.529 1 1 0.95 0.4
We can now look at the results of the analysis. The results are
stored in the opls
object. For each analysis, we can plot
the loading of the conformational signature. This is the regions of the
protein that are most variable across the states and are also associated
with the outcome. We will plot the loadings of the outcomes. The
predictive component is the component that is associated with the
outcome. The orthogonal component is the component that is not
associated with the outcome. We will plot the loadings for ABCA1 and
lipogenicity for both predictive and orthogonal components.
Note again the results look strange because we have not run the algorithm for enough iterations earlier.
plotLoadingSCSA(states.plsda = scsa,
labels = labels,
whichLoading = "predictive",
whichlabel = "ABCA1")
plotLoadingSCSA(states.plsda = scsa,
labels = labels,
whichLoading = "orthogonal",
whichlabel = "ABCA1")
plotLoadingSCSA(states.plsda = scsa2,
labels = labels,
whichLoading = "predictive",
whichlabel = "lipogenic")
plotLoadingSCSA(states.plsda = scsa2,
labels = labels,
whichLoading = "orthogonal",
whichlabel = "lipogenic")
You can also plot the scores of the conformational signature. This is the scores of the samples from the predictive and orthogonal components. We will plot the scores for ABCA1 and lipogenicity for both predictive and orthogonal components.
plotSCSA(states.plsda = scsa2,
labels = labels,
whichlabel = "lipogenic")
We can also perform the supervised conformational signature analysis
with continuous outcomes. This is similar to the discrete outcomes but
the outcome is continuous. To specify the labels we will generate the
labels for the outcomes as before. We will store the results in a
dataframe with the states as the rownames. For proteins where the
outcome is unknown, we will set the outcome to NA
. In this
case, we will use the ED50
as the outcome. We will log the
ED50
values. You can of course load this dataframe from
something that was made in excel and covert to a dataframe. We keep
these values as numeric rather than factors.
labels$ED50 <- NA
labels[, "ED50"] <- log(c(4.11, NA, NA, 0.956, NA, 9.64, 1.49,
5.65, 0.969, 2.10, 11.3, NA, 31.5, 341, 32.2, 17.2, NA))
We will now run the supervised conformational signature analysis with
continuous outcomes. We will set the quantity
to “TRE” as
this is the quantity we use to generate the signature. We will set the
states
to the states we are interested in. We will set the
whichTimepoint
to 600. This is the timepoint used to
generate the conformational signature.
To perform the supervised conformational signature analysis with
continuous outcomes, we will use the supervisedCSA
function. In this case, we will set the type
to
“continuous” so that it does not perform a discrete analysis. We will
set the whichlabel
to “ED50” as this is the outcome we are
interested in.
scsa3 <- supervisedCSA(RexDifferentialList = out_lxr_compound_proccessed,
quantity = "TRE",
states = states,
labels = labels,
whichlabel = "ED50",
whichTimepoint = 600,
orthoI = 1,
type = "continuous")
## Warning: OPLS: number of predictive components ('predI' argument) set to 1
## Warning: The variance of the 263 following variables is less than 2.2e-16 in the full or partial (cross-validation) dataset: these variables will be removed:
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 204, 208, 214, 215, 234, 235, 237, 239, 242, 244, 258, 259, 260, 261, 262, 268, 272, 276, 279, 280, 291, 300, 308, 317, 318, 319, 320, 332, 333, 334, 335, 336, 338, 342, 350, 357, 358, 359, 360, 361, 366, 370, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 428, 436, 437
## OPLS
## 12 samples x 184 variables and 1 response
## standard scaling of predictors and response(s)
## 263 excluded variables (near zero variance)
## R2X(cum) R2Y(cum) Q2(cum) RMSEE pre ort pR2Y pQ2
## Total 0.997 0.363 0.0176 1.52 1 1 0.15 0.3
We can now look at the results of the analysis. The results are
stored in the opls
object. We can plot the loading of the
conformational signature. This is the regions of the protein that are
most variable across the states and are also associated with the
outcome. We will also plot the scores of the conformational signature.
This is the scores of the samples on the predictive and orthogonal
components. We will plot the loadings and scores for the
ED50
outcome.
plotSCSA(states.plsda = scsa3,
labels = labels,
whichlabel = "ED50",
type = "continuous") + labs(col = "log ED50")
plotLoadingSCSA(states.plsda = scsa3,
labels = labels,
whichLoading = "predictive",
whichlabel = "ED50")
plotLoadingSCSA(states.plsda = scsa3,
labels = labels,
whichLoading = "orthogonal",
whichlabel = "ED50")
Until now we havent used the uncertainty in the analysis. We can use
the uncertainty in the analysis to generate the conformational
signature. We sample from the distribution of the parameters to generate
the conformational signature. We will use the
sampleTREuncertainty
function. This function will take the
outputs of the analysis and sample from the distribution of the
parameters to quantify the uncertainty in the conformational signature.
The important argument below is the num_montecarlo
which is
the number of samples to take from the distribution. We will set this to
5000. The whichSamples
argument is the samples to use from
the orginal analysis. We will use the first 50 samples - note that since
the results have been processed the samples are actually the first 50
samples of the processed results.
We will not run the code here but pre-load the results from the package.
TRE_dist <- sampleTREuncertainty(HdxData = LXRalpha_processed,
RexParamsList = rex_lxr,
quantity = "TRE",
states = states,
whichChain = 1,
whichSamples = seq.int(1, 50),
whichTimepoint = 600,
num_montecarlo = 5000)
We will now load the results from the package. The results are stored
in the TRE_dist
object.
data("TRE_dist")
To plot the results we will use the
plotTREuncertaintyLoadings
function. This function will
take the output of the sampleTREuncertainty
function and
plot the results. We also need to pass the pca_states
object from the unsupervised analysis. We will plot the results. This
analysis reports the distribution of the loadings.
gg <- plotTREuncertaintyLoadings(df_all = TRE_dist,
pca_states = ucsa$pca_states,
states = states)
gg
To visualise the results we can use the plotTREuncertainty
function. This function will take the output of the
sampleTREuncertainty
function and plot the results. We also
need to pass the pca_states
object from the unsupervised
analysis. This plots the unsupervised analysis with the uncertainty as
contours in the pca coordinates.
Again the results suffer because of the lack of iterations in the analysis.
plotTREuncertainty(df_all = TRE_dist,
pca_states = ucsa$pca_states,
states = states)
## Warning: Duplicated aesthetics after name standardisation:
## contour
## Warning: `stat_contour()`: Zero contours were generated
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning: `stat_contour()`: Zero contours were generated
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning: `stat_contour()`: Zero contours were generated
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning: `stat_contour()`: Zero contours were generated
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning: `stat_contour()`: Zero contours were generated
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning: `stat_contour()`: Zero contours were generated
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning: `stat_contour()`: Zero contours were generated
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning: `stat_contour()`: Zero contours were generated
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning: `stat_contour()`: Zero contours were generated
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
You may be interested in plotting the results of the analys on the structure. This is the same set-up as before you simply need to make sure that you align the residue numbers correctly
# Using the predictive component of the supervised analysis as an example
library(NGLVieweR)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:S4Vectors':
##
## first, intersect, rename, setdiff, setequal, union
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
v <- matrix(t(scsa@loadingMN), nrow = 1)
# set colnames carefully to residue number by first remove the x from the rownames
# and then coverting to numeric
colnames(v) <- as.numeric(gsub("x", "", rownames(scsa@loadingMN)))
pdb_filepath <- system.file("extdata", "2acl.pdb", mustWork = TRUE,
package = "RexMS")
mycolor_parameters <- hdx_to_pdb_colours(dataset = v,
pdb = pdb_filepath,
cmap_name = "CSA", scale_limits = c(-0.2,0))
## 2024-06-02 13:30:28.850248 [INFO] Your HDX input dataset has 184 entries
## 2024-06-02 13:30:28.850552 [INFO] And excluding NA data you only have 184 entries
## 2024-06-02 13:30:28.850642 [INFO] However, your input PDB has only 1827 residues in total
## 2024-06-02 13:30:28.851295 [INFO] Your scale limits are -0.2
## 2024-06-02 13:30:28.851295 [INFO] Your scale limits are 0
## 2024-06-02 13:30:28.851391 [INFO] Your values will be coloured using custom format
graphics <- NGLVieweR(pdb_filepath) %>%
stageParameters(backgroundColor = "white", zoomSpeed = 1) %>%
addRepresentation("ribbon") %>% setQuality("high") %>%
addRepresentation("ribbon", param = list(color=mycolor_parameters,
backgroundColor="white"))
graphics
To extract the information from the uncertainty quantification analysis:
# Using the lower quantile as an example
loadingquants <- gg$data
v <- matrix(as.numeric(loadingquants[loadingquants$level == 0.025,
"quantile", drop = FALSE][[1]]), nrow = 1)
colnames(v) <- seq.int(ncol(v))
pdb_filepath <- system.file("extdata", "2acl.pdb", mustWork = TRUE,
package = "RexMS")
mycolor_parameters <- hdx_to_pdb_colours(dataset = v,
pdb = pdb_filepath,
cmap_name = "CSA")
## 2024-06-02 13:30:29.216062 [INFO] Your HDX input dataset has 447 entries
## 2024-06-02 13:30:29.216193 [INFO] And excluding NA data you only have 447 entries
## 2024-06-02 13:30:29.216285 [INFO] However, your input PDB has only 1827 residues in total
## 2024-06-02 13:30:29.216867 [INFO] Your scale limits are -0.16
## 2024-06-02 13:30:29.216867 [INFO] Your scale limits are 0
## 2024-06-02 13:30:29.21696 [INFO] Your values will be coloured using custom format
graphics <- NGLVieweR(pdb_filepath) %>%
stageParameters(backgroundColor = "white", zoomSpeed = 1) %>%
addRepresentation("ribbon") %>% setQuality("high") %>%
addRepresentation("ribbon", param = list(color=mycolor_parameters,
backgroundColor="white"))
graphics
R
session information.
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 4.3.3 (2024-02-29)
## os Ubuntu 22.04.4 LTS
## system x86_64, linux-gnu
## ui X11
## language en
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz UTC
## date 2024-06-02
## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## abind 1.4-5 2016-07-21 [1] RSPM (R 4.3.0)
## bio3d 2.4-4 2022-10-26 [1] RSPM (R 4.3.0)
## Biobase 2.62.0 2023-10-24 [1] Bioconductor
## BiocGenerics * 0.48.1 2023-11-01 [1] Bioconductor
## BiocManager 1.30.22 2023-08-08 [1] RSPM (R 4.3.0)
## BiocParallel 1.36.0 2023-10-24 [1] Bioconductor
## BiocStyle * 2.30.0 2023-10-24 [1] Bioconductor
## bitops 1.0-7 2021-04-24 [1] RSPM (R 4.3.0)
## bookdown 0.39 2024-04-15 [1] RSPM (R 4.3.0)
## bslib 0.7.0 2024-03-29 [2] RSPM (R 4.3.0)
## cachem 1.0.8 2023-05-01 [2] RSPM (R 4.3.0)
## calibrate 1.7.7 2020-06-19 [1] RSPM (R 4.3.0)
## cli 3.6.2 2023-12-11 [2] RSPM (R 4.3.0)
## cluster 2.1.6 2023-12-01 [3] CRAN (R 4.3.3)
## coda 0.19-4.1 2024-01-31 [1] RSPM (R 4.3.0)
## codetools 0.2-20 2024-03-31 [3] RSPM (R 4.3.0)
## colorspace 2.1-0 2023-01-23 [1] RSPM (R 4.3.0)
## comprehenr 0.6.10 2021-01-31 [1] RSPM (R 4.3.0)
## crayon 1.5.2 2022-09-29 [2] RSPM (R 4.3.0)
## DBI 1.2.2 2024-02-16 [1] RSPM (R 4.3.0)
## dbplyr 2.5.0 2024-03-19 [1] RSPM (R 4.3.0)
## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor
## desc 1.4.3 2023-12-10 [2] RSPM (R 4.3.0)
## digest 0.6.35 2024-03-11 [2] RSPM (R 4.3.0)
## dplyr * 1.1.4 2023-11-17 [1] RSPM (R 4.3.0)
## evaluate 0.23 2023-11-01 [2] RSPM (R 4.3.0)
## fansi 1.0.6 2023-12-08 [2] RSPM (R 4.3.0)
## farver 2.1.1 2022-07-06 [1] RSPM (R 4.3.0)
## fastmap 1.1.1 2023-02-24 [2] RSPM (R 4.3.0)
## fs 1.6.3 2023-07-20 [2] RSPM (R 4.3.0)
## generics 0.1.3 2022-07-05 [1] RSPM (R 4.3.0)
## genlasso 1.6.1 2022-08-22 [1] RSPM (R 4.3.0)
## GenomeInfoDb 1.38.8 2024-03-15 [1] Bioconductor 3.18 (R 4.3.2)
## GenomeInfoDbData 1.2.11 2024-04-06 [1] Bioconductor
## GenomicRanges 1.54.1 2023-10-29 [1] Bioconductor
## ggfortify * 0.4.17 2024-04-17 [1] RSPM (R 4.3.0)
## ggplot2 * 3.5.0 2024-02-23 [1] RSPM (R 4.3.0)
## glue 1.7.0 2024-01-09 [2] RSPM (R 4.3.0)
## gridExtra 2.3 2017-09-09 [1] RSPM (R 4.3.0)
## gtable 0.3.5 2024-04-22 [1] RSPM (R 4.3.0)
## highr 0.10 2022-12-22 [2] RSPM (R 4.3.0)
## htmltools 0.5.8.1 2024-04-04 [2] RSPM (R 4.3.0)
## htmlwidgets 1.6.4 2023-12-06 [2] RSPM (R 4.3.0)
## httpuv 1.6.15 2024-03-26 [2] RSPM (R 4.3.0)
## igraph 2.0.3 2024-03-13 [1] RSPM (R 4.3.0)
## IRanges 2.36.0 2023-10-24 [1] Bioconductor
## isoband 0.2.7 2022-12-20 [1] RSPM (R 4.3.0)
## jquerylib 0.1.4 2021-04-26 [2] RSPM (R 4.3.0)
## jsonlite 1.8.8 2023-12-04 [2] RSPM (R 4.3.0)
## knitr 1.46 2024-04-06 [2] RSPM (R 4.3.0)
## labeling 0.4.3 2023-08-29 [1] RSPM (R 4.3.0)
## LaplacesDemon 16.1.6 2021-07-09 [1] RSPM (R 4.3.0)
## later 1.3.2 2023-12-06 [2] RSPM (R 4.3.0)
## lattice 0.22-6 2024-03-20 [3] RSPM (R 4.3.0)
## lifecycle 1.0.4 2023-11-07 [2] RSPM (R 4.3.0)
## limma 3.58.1 2023-10-31 [1] Bioconductor
## magrittr 2.0.3 2022-03-30 [2] RSPM (R 4.3.0)
## MASS 7.3-60.0.1 2024-01-13 [3] CRAN (R 4.3.3)
## Matrix 1.6-5 2024-01-11 [3] CRAN (R 4.3.3)
## MatrixGenerics 1.14.0 2023-10-24 [1] Bioconductor
## matrixStats 1.3.0 2024-04-11 [1] RSPM (R 4.3.0)
## memoise 2.0.1 2021-11-26 [2] RSPM (R 4.3.0)
## mgcv 1.9-1 2023-12-21 [3] CRAN (R 4.3.3)
## mime 0.12 2021-09-28 [2] RSPM (R 4.3.0)
## minpack.lm 1.2-4 2023-09-11 [1] RSPM (R 4.3.0)
## MultiAssayExperiment 1.28.0 2023-10-24 [1] Bioconductor
## MultiDataSet 1.30.0 2023-10-24 [1] Bioconductor
## munsell 0.5.1 2024-04-01 [1] RSPM (R 4.3.0)
## NGLVieweR * 1.3.1 2021-06-01 [1] RSPM (R 4.3.0)
## nlme 3.1-164 2023-11-27 [3] CRAN (R 4.3.3)
## permute 0.9-7 2022-01-27 [1] RSPM (R 4.3.0)
## pheatmap 1.0.12 2019-01-04 [1] RSPM (R 4.3.0)
## pillar 1.9.0 2023-03-22 [2] RSPM (R 4.3.0)
## pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.3.0)
## pkgdown 2.0.9 2024-04-18 [2] RSPM (R 4.3.0)
## plyr 1.8.9 2023-10-02 [1] RSPM (R 4.3.0)
## promises 1.3.0 2024-04-05 [2] RSPM (R 4.3.0)
## purrr 1.0.2 2023-08-10 [2] RSPM (R 4.3.0)
## qqman 0.1.9 2023-08-23 [1] RSPM (R 4.3.0)
## R6 2.5.1 2021-08-19 [2] RSPM (R 4.3.0)
## ragg 1.3.0 2024-03-13 [2] RSPM (R 4.3.0)
## RColorBrewer 1.1-3 2022-04-03 [1] RSPM (R 4.3.0)
## Rcpp 1.0.12 2024-01-09 [2] RSPM (R 4.3.0)
## RCurl 1.98-1.14 2024-01-09 [1] RSPM (R 4.3.0)
## RexMS * 0.99.7 2024-06-02 [1] Bioconductor
## rlang 1.1.3 2024-01-10 [2] RSPM (R 4.3.0)
## rlog 0.1.0 2021-02-24 [1] RSPM (R 4.3.0)
## rmarkdown 2.26 2024-03-05 [2] RSPM (R 4.3.0)
## ropls 1.34.0 2023-10-24 [1] Bioconductor
## S4Arrays 1.2.1 2024-03-04 [1] Bioconductor 3.18 (R 4.3.2)
## S4Vectors * 0.40.2 2023-11-23 [1] Bioconductor 3.18 (R 4.3.2)
## sass 0.4.9 2024-03-15 [2] RSPM (R 4.3.0)
## scales 1.3.0 2023-11-28 [1] RSPM (R 4.3.0)
## sessioninfo * 1.2.2 2021-12-06 [2] RSPM (R 4.3.0)
## shiny 1.8.1.1 2024-04-02 [2] RSPM (R 4.3.0)
## SparseArray 1.2.4 2024-02-11 [1] Bioconductor 3.18 (R 4.3.2)
## statmod 1.5.0 2023-01-06 [1] RSPM (R 4.3.0)
## stringi 1.8.3 2023-12-11 [2] RSPM (R 4.3.0)
## stringr 1.5.1 2023-11-14 [2] RSPM (R 4.3.0)
## SummarizedExperiment 1.32.0 2023-10-24 [1] Bioconductor
## systemfonts 1.0.6 2024-03-07 [2] RSPM (R 4.3.0)
## textshaping 0.3.7 2023-10-09 [2] RSPM (R 4.3.0)
## tibble 3.2.1 2023-03-20 [2] RSPM (R 4.3.0)
## tidyr 1.3.1 2024-01-24 [1] RSPM (R 4.3.0)
## tidyselect 1.2.1 2024-03-11 [1] RSPM (R 4.3.0)
## utf8 1.2.4 2023-10-22 [2] RSPM (R 4.3.0)
## vctrs 0.6.5 2023-12-01 [2] RSPM (R 4.3.0)
## vegan 2.6-4 2022-10-11 [1] RSPM (R 4.3.0)
## viridisLite 0.4.2 2023-05-02 [1] RSPM (R 4.3.0)
## withr 3.0.0 2024-01-16 [2] RSPM (R 4.3.0)
## xfun 0.43 2024-03-25 [2] RSPM (R 4.3.0)
## xtable 1.8-4 2019-04-21 [2] RSPM (R 4.3.0)
## XVector 0.42.0 2023-10-24 [1] Bioconductor
## yaml 2.3.8 2023-12-11 [2] RSPM (R 4.3.0)
## zlibbioc 1.48.2 2024-03-13 [1] Bioconductor 3.18 (R 4.3.2)
##
## [1] /__w/_temp/Library
## [2] /usr/local/lib/R/site-library
## [3] /usr/local/lib/R/library
##
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────