@@ -28,99 +28,14 @@ def diffuse(
 ) -> Matrix:
     """Run diffusion on a network given an input and a diffusion method.
 
-    The diffusion methods provided in this package differ in:
-    (a) how they distinguish positive, negative and unlabelled examples, and
-    (b) their statistical normalisation.
-
-    Input scores can be specified in three formats:
-    (1) a named numeric vector, for a single set of scores to smooth;
-    (2) a column-wise matrix, if several such vectors sharing the node names
-    need to be smoothed; or
-    (3) a named list of such score matrices, if the unlabelled entities differ
-    from one case to another. The input format is kept in the output.
-
-    If the input labels are not quantitative, i.e. positive (1), negative (0)
-    and possibly unlabelled, all the scores raw, gm, ml, z, mc, ber_s and
-    ber_p can be used.
-
-    Methods (chosen via the ``method`` argument):
-
-    - Methods without statistical normalisation:
-
-      {raw}: positive nodes introduce unitary flow {y_raw[i] = 1} into the
-      network, whereas negative and unlabelled nodes introduce null diffusion
-      {y_raw[j] = 0} [Vandin, 2011]. The scores are computed as:
-
-          f_{raw} = k · y_{raw}
-
-      where k is a graph kernel, see kernels.py.
-      These scores treat negative and unlabelled nodes equivalently.
-
-      {ml}: same as raw, but negative nodes introduce a negative unit of flow
-      and are therefore not equivalent to unlabelled nodes [Zoidi, 2015].
-
-      {gm}: same as ml, but the unlabelled nodes are assigned a (generally
-      non-null) bias term based on the total number of positives, negatives
-      and unlabelled nodes [Mostafavi, 2008].
-      {ber_s}: a quantification of the relative change in the node score
-      before and after the network smoothing. The score for a particular
-      node i can be written as:
-
-          f_{ber_s}[i] = f_{raw}[i] / (y_{raw}[i] + eps)
-
-      where eps is a parameter controlling the importance of the relative
-      change.
-    - Methods with statistical normalisation:
-
-      {z}: a parametric alternative in which the raw score of each node has
-      its mean value subtracted and is divided by its standard deviation.
-      A differential trait of this package: the statistical moments have a
-      closed analytical form, see the main vignette, and are inspired by
-      [Harchaoui, 2013].
-
-      {mc}: the score of node {i} is based on its empirical p-value,
-      computed by permuting the input {n.perm} times. It is roughly the
-      proportion of input permutations that led to a diffusion score as
-      high as or higher than the original diffusion score.
-
-      {ber_p}: as used in [Bersanelli, 2016], this score combines raw and
-      mc in order to take into account both the magnitude of the {raw}
-      scores and the effect of the network topology: it quantifies the
-      relative change in the node score before and after the network
-      smoothing.
-    Methods summary table:
-
-    | Scores | y+ | y- | yn | Normalized | Stochastic | Quantitative | Reference         |
-    |--------|----|----|----|------------|------------|--------------|-------------------|
-    | raw    |  1 |  0 | 0  | No         | No         | Yes          | Vandin (2010)     |
-    | ml     |  1 | -1 | 0  | No         | No         | No           | Tsuda (2010)      |
-    | gm     |  1 | -1 | k  | No         | No         | No           | Mostafavi (2008)  |
-    | ber_s  |  1 |  0 | 0  | No         | No         | Yes          | Bersanelli (2016) |
-    | ber_p  |  1 |  0 | 0* | Yes        | Yes        | Yes          | Bersanelli (2016) |
-    | mc     |  1 |  0 | 0* | Yes        | Yes        | Yes          | Bersanelli (2016) |
-    | z      |  1 |  0 | 0* | Yes        | No         | Yes          | Harchaoui (2013)  |
-
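A brute-force sketch of the permutation-based mc and z scores described above. The package computes the moments for z in closed form; here both scores are estimated from an explicit permutation null, with made-up kernel values:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up kernel and binary input, as in the earlier sketch
K = np.array([
    [0.6, 0.3, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.2, 0.7],
])
y = np.array([1.0, 0.0, 0.0])
f = K @ y  # observed raw diffusion scores

# Permutation null: diffuse n_perm shuffled versions of the input
n_perm = 1000
null = np.stack([K @ rng.permutation(y) for _ in range(n_perm)])

# mc: per node, the proportion of permutations with a diffusion score
# as high as or higher than the observed one (an empirical p-value)
p_emp = (null >= f).mean(axis=0)

# z: the observed score standardised against the permutation null
# (brute-force stand-in for the closed analytical moments)
z = (f - null.mean(axis=0)) / null.std(axis=0)
```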
-    :param input_scores: score collection, supplied as an n-dimensional
-        array; either 1-dimensional (List) or n-dimensional (Matrix).
-    :param method: chosen method, among those described in the table above.
-        Possible values: ["raw", "ml", "gm", "ber_s", "ber_p", "mc", "z"]
+    :param input_scores: score collection, supplied as an n-dimensional array; either 1-dimensional (List) or n-dimensional (Matrix).
+    :param method: chosen method ["raw", "ml", "gm", "ber_s", "ber_p", "mc", "z"]
     :param graph: a network as a graph; optional if a kernel is provided
     :param kwargs: optional arguments:
         - k: a kernel [matrix] stemming from a graph, thus sparing the graph transformation process
         - other arguments, which differ depending on the chosen method
     :return: the diffused scores within the matrix transformation of the network, with the diffusion operation
         [k x input_vector] performed
-
     """
 
     # Sanity checks