longForm,SummarizedExperiment #85

@lgatto

Coming back to the longForm discussed here, MultiAssayExperiment (MAE) now defines the following methods:

> suppressPackageStartupMessages(library(MultiAssayExperiment))
Warning message:
replacing previous import ‘S4Arrays::read_block’ by ‘DelayedArray::read_block’ when loading ‘SummarizedExperiment’
> showMethods("longForm")
Function: longForm (package BiocGenerics)
object="ANY"
object="ExperimentList"
object="MultiAssayExperiment"

ANY implicitly defines additional methods ...

> getMethod("longForm", "ANY")
Method Definition:

function (object, ...)
{
    .local <- function (object, colDataCols, i = 1L, ...)
    {
        rowNAMES <- rownames(object)
        if (is.null(rowNAMES))
            rowNames <- as.character(seq_len(nrow(object)))
        if (is(object, "ExpressionSet"))
            object <- Biobase::exprs(object)
        if (is(object, "SummarizedExperiment") || is(object,
            "RaggedExperiment"))
            object <- assay(object, i = i)
        BiocBaseUtils::checkInstalled("reshape2")
        res <- reshape2::melt(object, varnames = c("rowname",
            "colname"), value.name = "value")
        if (!is.character(res[["rowname"]]))
            res[["rowname"]] <- as.character(res[["rowname"]])
        res
    }
    .local(object, ...)
}
<bytecode: 0x59175b2aefc8>
<environment: namespace:MultiAssayExperiment>

Signatures:
        object
target  "ANY"
defined "ANY"

... including for a SummarizedExperiment

> nrows <- 5; ncols <- 2
> counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
> colData <- DataFrame(Treatment=c("ChIP", "Input"), row.names=LETTERS[1:2])
> se0 <- SummarizedExperiment(assays=SimpleList(counts=counts), colData=colData)
> longForm(se0)
   rowname colname    value
1        1       A 1888.095
2        2       A 3194.072
3        3       A 7372.889
4        4       A 2488.492
5        5       A 7293.829
6        1       B 5895.799
7        2       B 9025.518
8        3       B 1884.100
9        4       B 3057.519
10       5       B 5762.292
> showMethods("longForm")
Function: longForm (package BiocGenerics)
object="ANY"
object="ExperimentList"
object="MultiAssayExperiment"
object="SummarizedExperiment"
    (inherited from: object="ANY")

Suggestion 1

I suggest implementing longForm,SummarizedExperiment in the SummarizedExperiment package.

Suggestion 2

I also suggest allowing longForm to return all assays in a single long table, ideally by default.

Current behaviour:

> assay(se0, "counts2") <- assay(se0) * 10
> longForm(se0, i = 1)
   rowname colname    value
1        1       A 1888.095
2        2       A 3194.072
3        3       A 7372.889
4        4       A 2488.492
5        5       A 7293.829
6        1       B 5895.799
7        2       B 9025.518
8        3       B 1884.100
9        4       B 3057.519
10       5       B 5762.292
> longForm(se0, i = 2)
   rowname colname    value
1        1       A 18880.95
2        2       A 31940.72
3        3       A 73728.89
4        4       A 24884.92
5        5       A 72938.29
6        1       B 58957.99
7        2       B 90255.18
8        3       B 18841.00
9        4       B 30575.19
10       5       B 57622.92

I would find it useful to have

> longForm(se0)
DataFrame with 20 rows and 4 columns
      rowname  colname     value   assayName
    <integer> <factor> <numeric> <character>
1           1        A   1888.10      counts
2           2        A   3194.07      counts
3           3        A   7372.89      counts
4           4        A   2488.49      counts
5           5        A   7293.83      counts
...       ...      ...       ...         ...
16          1        B   58958.0     counts2
17          2        B   90255.2     counts2
18          3        B   18841.0     counts2
19          4        B   30575.2     counts2
20          5        B   57622.9     counts2
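To make the idea concrete, here is a minimal sketch of how all assays could be stacked into one long table. The name longFormAll is hypothetical (it is not part of SummarizedExperiment or MultiAssayExperiment), and the sketch assumes every assay can be coerced with as.matrix(); it is meant as a starting point for discussion, not a proposed final implementation.

```r
suppressPackageStartupMessages(library(SummarizedExperiment))

## Hypothetical helper (not part of any package): convert every assay of a
## SummarizedExperiment to long format and stack the results, recording
## the originating assay in an 'assayName' column.
longFormAll <- function(object) {
    nms <- assayNames(object)
    if (is.null(nms))  # unnamed assays: fall back to their positions
        nms <- as.character(seq_along(assays(object)))
    rn <- rownames(object)
    if (is.null(rn))
        rn <- as.character(seq_len(nrow(object)))
    cn <- colnames(object)
    if (is.null(cn))
        cn <- as.character(seq_len(ncol(object)))
    res <- lapply(seq_along(nms), function(k) {
        ## Assumption: the assay can be coerced to an ordinary matrix.
        m <- as.matrix(assay(object, k))
        ## as.vector() unrolls column by column, so row names vary fastest.
        DataFrame(rowname = rep(rn, times = ncol(m)),
                  colname = rep(cn, each = nrow(m)),
                  value = as.vector(m),
                  assayName = rep(nms[k], length(m)))
    })
    do.call(rbind, res)
}

## Usage on a toy object similar to se0 above (random values).
set.seed(1)
counts <- matrix(runif(10, 1, 1e4), nrow = 5)
se <- SummarizedExperiment(
    assays = SimpleList(counts = counts, counts2 = counts * 10),
    colData = DataFrame(Treatment = c("ChIP", "Input"),
                        row.names = LETTERS[1:2]))
lf <- longFormAll(se)
dim(lf)  # 5 rows x 2 columns x 2 assays = 20 rows, 4 columns
```

Unlike the current ANY method, this sketch does not depend on reshape2, and it keeps rowname and colname as character vectors.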

Suggestion 3

I also think these long tables should incorporate colData and rowData columns.

Here's an example for a colData variable:

> longFormSE(se0, colvars = "Treatment")
DataFrame with 20 rows and 5 columns
      rowname  colname     value   assayName   Treatment
    <integer> <factor> <numeric> <character> <character>
1           1        A   1888.10      counts        ChIP
2           2        A   3194.07      counts        ChIP
3           3        A   7372.89      counts        ChIP
4           4        A   2488.49      counts        ChIP
5           5        A   7293.83      counts        ChIP
...       ...      ...       ...         ...         ...
16          1        B   58958.0     counts2       Input
17          2        B   90255.2     counts2       Input
18          3        B   18841.0     counts2       Input
19          4        B   30575.2     counts2       Input
20          5        B   57622.9     counts2       Input

A rowData variable:

> rowData(se0)$X <- letters[1:5]
> longFormSE(se0, rowvars = "X")
DataFrame with 20 rows and 5 columns
      rowname  colname     value   assayName           X
    <integer> <factor> <numeric> <character> <character>
1           1        A 9418.8870      counts           a
2           2        A 6657.9743      counts           b
3           3        A 1240.3003      counts           c
4           4        A 1278.6833      counts           d
5           5        A   27.7678      counts           e
...       ...      ...       ...         ...         ...
16          1        B   10652.9     counts2           a
17          2        B   34444.3     counts2           b
18          3        B   48373.1     counts2           c
19          4        B   21214.1     counts2           d
20          5        B   85826.3     counts2           e

And both, of course:

> longFormSE(se0, colvars = "Treatment", rowvars = "X")
DataFrame with 20 rows and 6 columns
      rowname  colname     value   assayName   Treatment           X
    <integer> <factor> <numeric> <character> <character> <character>
1           1        A 9418.8870      counts        ChIP           a
2           2        A 6657.9743      counts        ChIP           b
3           3        A 1240.3003      counts        ChIP           c
4           4        A 1278.6833      counts        ChIP           d
5           5        A   27.7678      counts        ChIP           e
...       ...      ...       ...         ...         ...         ...
16          1        B   10652.9     counts2       Input           a
17          2        B   34444.3     counts2       Input           b
18          3        B   48373.1     counts2       Input           c
19          4        B   21214.1     counts2       Input           d
20          5        B   85826.3     counts2       Input           e
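For illustration, here is one way the colData and rowData columns could be attached, shown for a single assay to keep it short. This is a sketch only: longFormWithVars is a hypothetical name and is not the longFormSE prototype used in the output above; the colvars and rowvars arguments simply mirror the ones shown there.

```r
suppressPackageStartupMessages(library(SummarizedExperiment))

## Hypothetical helper (not part of any package): long format for one assay,
## with selected colData and rowData variables appended as extra columns.
longFormWithVars <- function(object, i = 1L, colvars = NULL, rowvars = NULL) {
    stopifnot(all(colvars %in% names(colData(object))),
              all(rowvars %in% names(rowData(object))))
    ## Assumption: the assay can be coerced to an ordinary matrix.
    m <- as.matrix(assay(object, i))
    ## Row and column position of each value once the matrix is unrolled
    ## column by column.
    ridx <- rep(seq_len(nrow(m)), times = ncol(m))
    cidx <- rep(seq_len(ncol(m)), each = nrow(m))
    rn <- rownames(object)
    if (is.null(rn))
        rn <- as.character(seq_len(nrow(object)))
    cn <- colnames(object)
    if (is.null(cn))
        cn <- as.character(seq_len(ncol(object)))
    res <- DataFrame(rowname = rn[ridx],
                     colname = cn[cidx],
                     value = as.vector(m))
    ## Look annotations up by position rather than by name, so missing or
    ## duplicated dimnames cannot cause mismatches.
    for (v in colvars)
        res[[v]] <- colData(object)[[v]][cidx]
    for (v in rowvars)
        res[[v]] <- rowData(object)[[v]][ridx]
    res
}

## Usage on a toy object similar to se0 above (random values).
set.seed(1)
se <- SummarizedExperiment(
    assays = SimpleList(counts = matrix(runif(10, 1, 1e4), nrow = 5)),
    colData = DataFrame(Treatment = c("ChIP", "Input"),
                        row.names = LETTERS[1:2]),
    rowData = DataFrame(X = letters[1:5]))
longFormWithVars(se, colvars = "Treatment", rowvars = "X")
```

Indexing by position avoids a merge step. If a variable name appears in both colvars and rowvars, the rowData column overwrites the colData one in this sketch, so a real implementation would need a rule for such clashes.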

I would be happy to provide an initial implementation and unit test.

What do you think, @hpages @LiNk-NY?
