Commit d677230

sarahbeenie and sarah.mubeen authored
Datasets (#18)
* update example datasets
* update readme
* update docs
* update example scripts
* update constants
* refactor
* try diffuse
* cleaning
* remove unused nodetype
* cleaning
* cleaning
* cleaning
* flake8 fixes
* more flake8 fixes
* try docs fix

Co-authored-by: sarah.mubeen <sarah.mubeen@scai.fraunhofer.de>
1 parent bda68de commit d677230

21 files changed

Lines changed: 157 additions & 96 deletions

docs/source/conf.py

Lines changed: 2 additions & 2 deletions
@@ -56,8 +56,8 @@
 # Example configuration for intersphinx: refer to the Python standard library.
 intersphinx_mapping = {
     'python': ('https://docs.python.org/3', None),
-    'networkx': ('https://networkx.github.io/', None),
-    'sqlalchemy': ('https://docs.sqlalchemy.org/en/latest', None),
+    'networkx': ('https://networkx.github.io/documentation/stable', None),
+    'sqlalchemy': ('https://docs.sqlalchemy.org/en/13/', None),
     'pybel': ('https://pybel.readthedocs.io/en/latest/', None),
 }

docs/source/diffusion.rst

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ Methods without statistical normalisation
 a graph kernel, see :doc:`kernels <kernels>`. These scores treat negative and unlabelled nodes equivalently.

 - **ml**: Same as raw, but negative nodes introduce a negative unit of flow. Therefore not equivalent to unlabelled
-nodes. [2]_
+nodes [2]_.

 - **gl**: Same as ml, but the unlabelled nodes are assigned a (generally non-null) bias term based on the total number
 of positives, negatives and unlabelled nodes [3]_.
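
The difference between **raw** and **ml** described in this hunk can be sketched as a label encoding. This is only an illustration, not DiffuPy's implementation: the helper name `encode_labels`, the string labels, and the assumption that a positive node contributes one unit of flow are all hypothetical. The **gl** bias term is left out because the hunk does not give its formula.

```python
def encode_labels(labels, method="raw"):
    """Map node labels to input scores (hypothetical helper, not DiffuPy's API).

    labels: dict of node ID -> "positive", "negative" or None (unlabelled).
    """
    if method not in ("raw", "ml"):
        raise ValueError(f"unsupported method: {method!r}")

    encoded = {}
    for node, label in labels.items():
        if label == "positive":
            # Assumption: a positive node introduces one unit of flow.
            encoded[node] = 1
        elif label == "negative" and method == "ml":
            # ml: negative nodes introduce a negative unit of flow.
            encoded[node] = -1
        else:
            # raw treats negative and unlabelled nodes equivalently;
            # unlabelled nodes introduce no flow under either method.
            encoded[node] = 0
    return encoded


example = {"A": "positive", "B": "negative", "C": None}
raw_scores = encode_labels(example, method="raw")
ml_scores = encode_labels(example, method="ml")
```

Under these assumptions, node `B` is the only one whose input differs between the two methods.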

docs/source/intro.rst

Lines changed: 16 additions & 17 deletions
@@ -2,6 +2,7 @@ First Steps
 ===========
 The first step before running diffusion algorithms on your network using DiffuPy is to learn about the graph and data
 formats are supported. Next, you can find samples of input datasets and networks to run diffusion methods over.
+
 Input Data
 ----------

@@ -10,9 +11,8 @@ You can submit your dataset in any of the following formats:
 - CSV (.csv)
 - TSV (.tsv)

-Please ensure that the dataset has a column 'Node' containing node IDs. If you only provide the node IDs, you can
-also include a column in your dataset 'NodeType' indicating the entity type for each node. You can also optionally add
-the following columns to your dataset:
+Please ensure that the dataset minimally has a column 'Node' containing node IDs. You can also optionally add the
+following columns to your dataset:

 - LogFC [*]_
 - p-value
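
After this change the only required column is 'Node'. A minimal sketch of how a caller might check that before submitting a dataset is shown below. It uses only the standard library; `read_dataset` is a hypothetical helper and not part of DiffuPy, and the inline data is illustrative.

```python
import csv
import io

REQUIRED_COLUMN = "Node"  # the only column the updated docs require


def read_dataset(handle, delimiter=","):
    """Read a CSV/TSV dataset and fail early if 'Node' is missing (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    fieldnames = reader.fieldnames or []
    if REQUIRED_COLUMN not in fieldnames:
        raise ValueError(
            f"dataset must contain a {REQUIRED_COLUMN!r} column, found {fieldnames}"
        )
    return list(reader)


# Illustrative in-memory data; pass delimiter="\t" for a .tsv file.
rows = read_dataset(io.StringIO("Node,LogFC\nA,0.7\nB,1.2\n"))
```

Optional columns such as LogFC and p-value pass through unchanged as strings, so any numeric conversion is left to the caller.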
@@ -28,20 +28,19 @@ DiffuPath accepts several input formats which can be codified in different ways.
 `diffusion scores <https://github.com/multipaths/DiffuPy/blob/master/docs/source/diffusion.rst>`_ summary for more
 details.

-1. You can provide a dataset with a column 'Node' containing node IDs along with a column 'NodeType' indicating the
-entity type.
-
-+------------+--------------+
-| Node       | NodeType     |
-+============+==============+
-| A          | Gene         |
-+------------+--------------+
-| B          | Gene         |
-+------------+--------------+
-| C          | Metabolite   |
-+------------+--------------+
-| D          | Gene         |
-+------------+--------------+
+1. You can provide a dataset with a column 'Node' containing node IDs.
+
++------------+
+| Node       |
++============+
+| A          |
++------------+
+| B          |
++------------+
+| C          |
++------------+
+| D          |
++------------+

 2. You can also choose to provide a dataset with a column 'Node' containing node IDs as well as a column 'logFC' with
 their abs(LogFC).

examples/README.rst

Lines changed: 15 additions & 17 deletions
@@ -8,9 +8,8 @@ You can submit your dataset in any of the following formats:
 - CSV (.csv)
 - TSV (.tsv)

-Please ensure that the dataset has a column 'Node' containing node IDs. If you only provide the node IDs, you can
-also include a column in your dataset 'NodeType' indicating the entity type for each node. You can also optionally add
-the following columns to your dataset:
+Please ensure that the dataset minimally has a column 'Node' containing node IDs. You can also optionally add the
+following columns to your dataset:

 - LogFC [*]_
 - p-value
@@ -26,20 +25,19 @@ DiffuPath accepts several input formats which can be codified in different ways.
 `diffusion scores <https://github.com/multipaths/DiffuPy/blob/master/docs/source/diffusion.rst>`_ summary for more
 details.

-1. You can provide a dataset with a column 'Node' containing node IDs along with a column 'NodeType' indicating the
-entity type.
-
-+------------+--------------+
-| Node       | NodeType     |
-+============+==============+
-| A          | Gene         |
-+------------+--------------+
-| B          | Gene         |
-+------------+--------------+
-| C          | Metabolite   |
-+------------+--------------+
-| D          | Gene         |
-+------------+--------------+
+1. You can provide a dataset with a column 'Node' containing node IDs.
+
++------------+
+| Node       |
++============+
+| A          |
++------------+
+| B          |
++------------+
+| C          |
++------------+
+| D          |
++------------+

 2. You can also choose to provide a dataset with a column 'Node' containing node IDs as well as a column 'logFC' with
 their | logFC |.

examples/datasets/node.csv

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Node
+A
+B
+C
+D
+E

examples/datasets/node_logfc.csv

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Node,LogFC
+A,0.7
+B,1.2
+C,-0.2
+D,-0.4
+E,-2.2
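
The README and intro hunks say the 'logFC' column carries the absolute LogFC, while this example file contains signed values. Whether DiffuPy takes the absolute value itself is not stated in this commit, so the sketch below simply shows how a caller could derive it from the file contents. `absolute_logfc` is a hypothetical helper, and the data is copied from the hunk above.

```python
import csv
import io

# Contents of examples/datasets/node_logfc.csv as added in this commit.
NODE_LOGFC_CSV = """Node,LogFC
A,0.7
B,1.2
C,-0.2
D,-0.4
E,-2.2
"""


def absolute_logfc(handle):
    """Return a dict of node ID -> |LogFC| (hypothetical helper, not DiffuPy's API)."""
    reader = csv.DictReader(handle)
    return {row["Node"]: abs(float(row["LogFC"])) for row in reader}


scores = absolute_logfc(io.StringIO(NODE_LOGFC_CSV))
```

To read the file from disk instead, open it with `newline=""` as the `csv` module documentation recommends and pass the file handle to the same function.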
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Node,LogFC,p-value
+A,0.7,0.2
+B,1.2,0.01
+C,-0.2,0.01
+D,-0.4,0.3
+E,-2.2,0.005
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+Node
+A
+B
+C
+D
+E
+F
+G

examples/datasets/sample_dataset_with_ids.csv

Lines changed: 0 additions & 8 deletions
This file was deleted.
File renamed without changes.
