Infers the underlying (sources by features by observations) 3D tensor from the observed (features by observations) 2D mixture, under the Unico model assumption that each observation is a mixture of unique source-specific values in each feature of the data. In the context of bulk genomics data containing a mixture of cell types (e.g. CpG sites by individuals for DNA methylation, or genes by individuals for RNA expression), tensor estimates the cell-type-specific levels for each individual at each CpG site/gene (i.e. a tensor of CpG sites/genes by individuals by cell types).
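The mixture assumption above can be illustrated with a small base R sketch. It is a simplified forward model only: the dimensions and random values are made up, covariates and noise are omitted, and nothing here calls the Unico package.

```r
# Toy dimensions (hypothetical): k sources, m features, n observations.
set.seed(1)
k <- 3; m <- 2; n <- 5

# Hidden source-specific values, stored as sources x features x observations.
Z <- array(rnorm(k * m * n), dim = c(k, m, n))

# Mixing weights: one row per observation, each row summing to 1.
W <- matrix(runif(n * k), nrow = n, ncol = k)
W <- W / rowSums(W)

# Observed 2D mixture: X[j, i] = sum over sources h of W[i, h] * Z[h, j, i].
X <- matrix(0, nrow = m, ncol = n)
for (i in seq_len(n)) {
  X[, i] <- t(Z[, , i]) %*% W[i, ]
}
```

Going from X back to Z is the inverse problem that tensor addresses, using the parameters estimated by Unico.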
tensor(
X,
W,
C1,
C2,
Unico.mdl,
parallel = TRUE,
num_cores = NULL,
log_file = "Unico.log",
verbose = FALSE,
debug = FALSE
)
An m by n matrix of measurements of m features for n observations. Each column in X is assumed to be a mixture of k sources. Note that X must include row names and column names and that NA values are currently not supported. X should not include features that are constant across all observations. Note that X could potentially be different from the X used to learn Unico.mdl (i.e. the original observed 2D mixture used to fit the model).
An n by k matrix of weights - the weights of k sources for each of the n mixtures (observations). All the weights must be positive and each row - corresponding to the weights of a single observation - must sum up to 1. Note that W must include row names and column names and that NA values are currently not supported.
An n by p1 design matrix of covariates that may affect the hidden source-specific values (possibly a different effect size in each source). Note that C1 must include row names and column names and should not include an intercept term. NA values are currently not supported. Note that all covariates in C1 must be present and match the order of the set of covariates in C1 stored in Unico.mdl (i.e. the original set of source-specific covariates available when initially fitting the model).
An n by p2 design matrix of covariates that may affect the mixture (i.e. rather than directly the sources of the mixture; for example, variables that capture biases in the collection of the measurements). Note that C2 must include row names and column names and should not include an intercept term. NA values are currently not supported. Note that all covariates in C2 must be present and match the order of the set of covariates in C2 stored in Unico.mdl (i.e. the original set of non-source-specific covariates available when initially fitting the model).
The entire set of model parameters estimated by Unico on the 2D mixture matrix (i.e. the list returned by applying function Unico to X).
A logical value indicating whether to use parallel computing (possible when using a multi-core machine).
A numeric value indicating the number of cores to use (activated only if parallel == TRUE). If num_cores == NULL then all available cores except for one will be used.
A path to an output log file. Note that if the file log_file already exists then logs will be appended to the end of the file. Set log_file to NULL to prevent output from being saved into a file; note that if verbose == FALSE then no output file will be generated regardless of the value of log_file.
A logical value indicating whether to print logs.
A logical value indicating whether to set the logger to a more detailed debug level; set debug to TRUE before reporting issues.
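The requirements listed for X and W can be checked before calling tensor. The helper below is a hypothetical base R sketch, not part of the Unico package; its name, its tolerance argument, and the toy matrices are assumptions made for illustration, and it covers only the constraints stated above for X and W.

```r
# Hypothetical pre-flight check; check_tensor_inputs is not a Unico function.
check_tensor_inputs <- function(X, W, tol = 1e-8) {
  stopifnot(is.matrix(X), is.matrix(W))
  # Row and column names are required on both matrices.
  stopifnot(!is.null(rownames(X)), !is.null(colnames(X)),
            !is.null(rownames(W)), !is.null(colnames(W)))
  # NA values are not supported.
  stopifnot(!anyNA(X), !anyNA(W))
  # Observations are the columns of X and the rows of W.
  stopifnot(ncol(X) == nrow(W))
  # Weights must be positive and each row must sum to 1.
  stopifnot(all(W > 0), all(abs(rowSums(W) - 1) < tol))
  # No feature may be constant across all observations.
  stopifnot(all(apply(X, 1, function(x) length(unique(x)) > 1)))
  invisible(TRUE)
}

# Toy inputs with made-up names: 2 features, 4 observations, 3 sources.
set.seed(1)
X <- matrix(rnorm(8), nrow = 2,
            dimnames = list(c("f1", "f2"), paste0("s", 1:4)))
W <- matrix(runif(12), nrow = 4,
            dimnames = list(paste0("s", 1:4), paste0("src", 1:3)))
W <- W / rowSums(W)
ok <- check_tensor_inputs(X, W)
```

Failing fast with stopifnot keeps the sketch short; a production check would likely report which constraint was violated.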
A k by m by n array with the estimated source-specific values. The first axis/dimension in the array corresponds to the different sources.
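Because the sources sit on the first axis, slicing the result takes a little care. The sketch below uses a stand-in array with made-up values and dimension names rather than real tensor output; whether the returned array carries dimnames is an assumption here.

```r
# Stand-in for the k x m x n array returned by tensor(); values are made up.
k <- 3; m <- 2; n <- 4
Z <- array(seq_len(k * m * n), dim = c(k, m, n),
           dimnames = list(paste0("src", 1:k),
                           paste0("f", 1:m),
                           paste0("s", 1:n)))

# All features and observations for the first source: an m x n matrix.
Z_src1 <- Z[1, , ]

# All k source-specific values for feature 2 in observation 3.
z_f2_s3 <- Z[, 2, 3]

# Reorder the axes to features x observations x sources if that layout is preferred.
Z_perm <- aperm(Z, c(2, 3, 1))
```

Note that R drops dimensions of length one when subsetting, so pass drop = FALSE if m or n may equal 1.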
After obtaining all the estimated parameters in the Unico model (by calling Unico), tensor uses the conditional distribution \(Z_{jh}^i|X_{ij}=x_{ij}\) for estimating the \(k\) source-specific levels of each sample \(i\) at each feature \(j\).
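To give some intuition for conditioning on the observed mixture, here is a minimal base R sketch for one feature and one observation. It assumes the k source-specific values are jointly Gaussian and that the mixture is their weighted sum plus independent Gaussian noise; covariates are omitted, all parameter values are made up, and this is not the package's internal code.

```r
# Hypothetical parameters for a single feature and a single observation.
mu    <- c(0.2, 0.5, 0.8)           # source-specific means (length k)
Sigma <- diag(c(0.01, 0.02, 0.03))  # source-level covariance (k x k)
tau2  <- 0.001                      # variance of the mixture-level noise
w     <- c(0.5, 0.3, 0.2)           # weights of the k sources; sum to 1
x     <- 0.6                        # observed mixture value

# Joint Gaussian pieces: Cov(Z, X) = Sigma w and Var(X) = w' Sigma w + tau2.
cov_zx <- as.vector(Sigma %*% w)
var_x  <- as.numeric(t(w) %*% Sigma %*% w) + tau2

# Conditional mean E[Z | X = x] = mu + Cov(Z, X) * (x - w' mu) / Var(X).
z_hat <- mu + cov_zx * (x - sum(w * mu)) / var_x
```

Under these assumptions, sources with a larger weight or a larger variance absorb more of the gap between the observed value and the expected mixture.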
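library(Unico)
# Simulate a small dataset: 100 observations, 2 features, 3 sources
data = simulate_data(n=100, m=2, k=3, p1=1, p2=1, taus_std=0, log_file=NULL)
res = list()
# Fit the Unico model on the observed 2D mixture
res$params.hat = Unico(data$X, data$W, data$C1, data$C2, parallel=FALSE, log_file=NULL)
# Estimate the source-specific 3D tensor from the fitted parameters
res$Z = tensor(data$X, data$W, data$C1, data$C2, res$params.hat, parallel=FALSE, log_file=NULL)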