
Commit d78e99e

Merge branch 'master' into diff_input_processing_recoding
2 parents 8fec72d + 3d42242 commit d78e99e

11 files changed

Lines changed: 127 additions & 59 deletions


docs/source/conf.py

Lines changed: 0 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -57,8 +57,6 @@
5757
intersphinx_mapping = {
5858
'python': ('https://docs.python.org/3', None),
5959
'networkx': ('https://networkx.github.io/documentation/stable', None),
60-
'sqlalchemy': ('https://docs.sqlalchemy.org/en/13/', None),
61-
'pybel': ('https://pybel.readthedocs.io/en/latest/', None),
6260
}
6361

6462
autodoc_member_order = 'bysource'

docs/source/intro.rst

Lines changed: 34 additions & 19 deletions
@@ -14,6 +14,7 @@ You can submit your dataset in any of the following formats:
1414
Please ensure that the dataset minimally has a column 'Node' containing node IDs. You can also optionally add the
1515
following columns to your dataset:
1616

17+
- NodeType
1718
- LogFC [*]_
1819
- p-value
1920

@@ -42,39 +43,53 @@ details.
4243
| D |
4344
+------------+
4445

45-
2. You can also choose to provide a dataset with a column 'Node' containing node IDs as well as a column 'logFC' with
46-
their abs(LogFC).
46+
2. You can also provide a dataset with a column 'Node' containing node IDs and a column 'NodeType' giving the entity
47+
type of each node, so that diffusion can be run by entity type.
48+
49+
+------------+--------------+
50+
| Node | NodeType |
51+
+============+==============+
52+
| A | Gene |
53+
+------------+--------------+
54+
| B | Gene |
55+
+------------+--------------+
56+
| C | Metabolite |
57+
+------------+--------------+
58+
| D | Gene |
59+
+------------+--------------+
60+
61+
3. You can also choose to provide a dataset with a column 'Node' containing node IDs as well as a column 'logFC' with
62+
their logFC. You may also add a 'NodeType' column to run diffusion by entity type.
4763

4864
+--------------+------------+
4965
| Node | LogFC |
5066
+==============+============+
51-
| Gene A | 4 |
67+
| A | 4 |
5268
+--------------+------------+
53-
| Gene B | -1 |
69+
| B | -1 |
5470
+--------------+------------+
55-
| Metabolite C | 1.5 |
71+
| C | 1.5 |
5672
+--------------+------------+
57-
| Gene D | 3 |
73+
| D | 3 |
5874
+--------------+------------+
5975

60-
3. Finally, you can provide a dataset with a column 'Node' containing node IDs, a column 'logFC' with their abs(LogFC)
61-
and a column 'p-value' with adjusted p-values.
76+
4. Finally, you can provide a dataset with a column 'Node' containing node IDs, a column 'logFC' with their logFC and a
77+
column 'p-value' with adjusted p-values. You may also add a 'NodeType' column to run diffusion by entity type.
6278

6379
+--------------+------------+---------+
6480
| Node | LogFC | p-value |
6581
+==============+============+=========+
66-
| Gene A | 4 | 0.03 |
82+
| A | 4 | 0.03 |
6783
+--------------+------------+---------+
68-
| Gene B | -1 | 0.05 |
84+
| B | -1 | 0.05 |
6985
+--------------+------------+---------+
70-
| Metabolite C | 1.5 | 0.001 |
86+
| C | 1.5 | 0.001 |
7187
+--------------+------------+---------+
72-
| Gene D | 3 | 0.07 |
88+
| D | 3 | 0.07 |
7389
+--------------+------------+---------+
7490

75-
You can also take a look at our `sample datasets <https://github.com/multipaths/DiffuPy/tree/master/examples/datasets>`_
76-
folder for some examples files.
77-
91+
See the `sample datasets <https://github.com/multipaths/DiffuPy/tree/master/examples/datasets>`_ directory for example
92+
files.
7893

7994
Networks
8095
--------
@@ -119,13 +134,13 @@ Custom-network example
119134
~~~~~~~~~~~~~~~~~~~~~~
120135

121136
+-----------+--------------+-------------+
122-
| Source | Target | Relation |
137+
| Source | Target | Relation |
123138
+===========+==============+=============+
124-
| Gene A | Gene B | Increase |
139+
| A | B | Increase |
125140
+-----------+--------------+-------------+
126-
| Gene B | Metabolite C | Association |
141+
| B | C | Association |
127142
+-----------+--------------+-------------+
128-
| Gene A | Pathology D | Association |
143+
| A | D | Association |
129144
+-----------+--------------+-------------+
130145

131146
You can also take a look at our `sample networks <https://github.com/multipaths/DiffuPy/tree/master/examples/networks>`_

examples/README.rst

Lines changed: 31 additions & 17 deletions
@@ -11,6 +11,7 @@ You can submit your dataset in any of the following formats:
1111
Please ensure that the dataset minimally has a column 'Node' containing node IDs. You can also optionally add the
1212
following columns to your dataset:
1313

14+
- NodeType
1415
- LogFC [*]_
1516
- p-value
1617

@@ -39,36 +40,49 @@ details.
3940
| D |
4041
+------------+
4142

42-
2. You can also choose to provide a dataset with a column 'Node' containing node IDs as well as a column 'logFC' with
43-
their | logFC |.
43+
2. You can also provide a dataset with a column 'Node' containing node IDs and a column 'NodeType' giving the entity
44+
type of each node, so that diffusion can be run by entity type.
45+
46+
+------------+--------------+
47+
| Node | NodeType |
48+
+============+==============+
49+
| A | Gene |
50+
+------------+--------------+
51+
| B | Gene |
52+
+------------+--------------+
53+
| C | Metabolite |
54+
+------------+--------------+
55+
| D | Gene |
56+
+------------+--------------+
57+
58+
3. You can also choose to provide a dataset with a column 'Node' containing node IDs as well as a column 'logFC' with
59+
their logFC. You may also add a 'NodeType' column to run diffusion by entity type.
4460

4561
+--------------+------------+
4662
| Node | LogFC |
4763
+==============+============+
48-
| Gene A | 4 |
64+
| A | 4 |
4965
+--------------+------------+
50-
| Gene B | -1 |
66+
| B | -1 |
5167
+--------------+------------+
52-
| Metabolite C | 1.5 |
68+
| C | 1.5 |
5369
+--------------+------------+
54-
| Gene D | 3 |
70+
| D | 3 |
5571
+--------------+------------+
5672

57-
.. | logFC | replace:: Log\ :sub:`2`\ FC
58-
59-
3. Finally, you can provide a dataset with a column 'Node' containing node IDs, a column 'logFC' with their | logFC | and
60-
a column 'p-value' with adjusted p-values.
73+
4. Finally, you can provide a dataset with a column 'Node' containing node IDs, a column 'logFC' with their logFC and a
74+
column 'p-value' with adjusted p-values. You may also add a 'NodeType' column to run diffusion by entity type.
6175

6276
+--------------+------------+---------+
6377
| Node | LogFC | p-value |
6478
+==============+============+=========+
65-
| Gene A | 4 | 0.03 |
79+
| A | 4 | 0.03 |
6680
+--------------+------------+---------+
67-
| Gene B | -1 | 0.05 |
81+
| B | -1 | 0.05 |
6882
+--------------+------------+---------+
69-
| Metabolite C | 1.5 | 0.001 |
83+
| C | 1.5 | 0.001 |
7084
+--------------+------------+---------+
71-
| Gene D | 3 | 0.07 |
85+
| D | 3 | 0.07 |
7286
+--------------+------------+---------+
7387

7488
See the `sample datasets <https://github.com/multipaths/DiffuPy/tree/master/examples/datasets>`_ directory for example
@@ -118,11 +132,11 @@ Custom-network example
118132
+-----------+--------------+-------------+
119133
| Source | Target | Relation |
120134
+===========+==============+=============+
121-
| Gene A | Gene B | Increase |
135+
| A | B | Increase |
122136
+-----------+--------------+-------------+
123-
| Gene B | Metabolite C | Association |
137+
| B | C | Association |
124138
+-----------+--------------+-------------+
125-
| Gene A | Pathology D | Association |
139+
| A | D | Association |
126140
+-----------+--------------+-------------+
127141

128142
See the `sample networks <https://github.com/multipaths/DiffuPy/tree/master/examples/networks>`_ directory for some

examples/datasets/node_type.csv

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
Node,NodeType
2+
A,Gene
3+
B,Gene
4+
C,Metabolite
5+
D,Gene
6+
E,Metabolite
7+
F,Gene
8+
G,Pathology
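
As a rough illustration of how a file such as node_type.csv could be checked before use, the sketch below verifies that the required 'Node' column is present and tallies the nodes per 'NodeType'. It relies only on Python's standard csv module rather than on DiffuPy's own loader, and the helper name summarize_dataset is hypothetical.

```python
import csv
import io
from collections import Counter

# Contents of examples/datasets/node_type.csv as added in this commit.
NODE_TYPE_CSV = """\
Node,NodeType
A,Gene
B,Gene
C,Metabolite
D,Gene
E,Metabolite
F,Gene
G,Pathology
"""


def summarize_dataset(handle):
    """Return (node IDs, number of nodes per NodeType) for a CSV dataset.

    Hypothetical helper for illustration only; it is not part of DiffuPy.
    """
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    if 'Node' not in columns:
        raise ValueError("The dataset must contain a 'Node' column.")

    nodes = []
    type_counts = Counter()
    for row in reader:
        nodes.append(row['Node'])
        # 'NodeType' is optional, so only tally it when the column exists.
        if 'NodeType' in columns:
            type_counts[row['NodeType']] += 1
    return nodes, dict(type_counts)


nodes, type_counts = summarize_dataset(io.StringIO(NODE_TYPE_CSV))
print(nodes)
print(type_counts)
```

A dataset without a 'NodeType' column passes the same check and simply yields an empty tally.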
Lines changed: 4 additions & 3 deletions
@@ -1,4 +1,5 @@
11
Source,Target,Relation
2-
Gene A,Gene B,Increase
3-
Gene B,Metabolite C,Association
4-
Gene A,Pathology D,Association
2+
A, B, Increase
3+
B, C, Association
4+
A, D, Association
5+
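
The rewritten network file places a space after each comma. Whether DiffuPy's loader strips that whitespace is not shown in this diff; the sketch below only illustrates, with Python's standard csv module, how such rows parse with and without skipinitialspace=True.

```python
import csv
import io

# Rows as they appear in the updated network file in this commit.
NETWORK_CSV = """\
Source,Target,Relation
A, B, Increase
B, C, Association
A, D, Association
"""

# Default parsing keeps the space that follows each comma.
raw_rows = list(csv.reader(io.StringIO(NETWORK_CSV)))

# skipinitialspace=True drops whitespace immediately after the delimiter.
clean_rows = list(csv.reader(io.StringIO(NETWORK_CSV), skipinitialspace=True))

print(raw_rows[1])
print(clean_rows[1])
```

Without the flag, a target node would be read as ' B' rather than 'B', which would not match a node named 'B' in a dataset.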

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ version = 0.0.5-dev
77
description = Compute diffusion scores over networks
88
long_description = file: README.rst
99

10-
# URLs associated with PyBEL
10+
# URLs associated with DiffuPy
1111
url = https://github.com/multipaths/DiffuPy
1212
download_url = https://github.com/multipaths/DiffuPy
1313
project_urls =

src/diffupy/cli.py

Lines changed: 4 additions & 4 deletions
@@ -9,8 +9,8 @@
99
import time
1010

1111
import click
12-
from diffupath.constants import OUTPUT_DIR
13-
from diffupy.process_network import get_kernel_from_network_path
12+
13+
from .process_network import get_kernel_from_network_path
1414

1515
from .constants import OUTPUT, METHODS, EMOJI, RAW, CSV, JSON
1616
from .diffuse import diffuse as run_diffusion
@@ -93,7 +93,7 @@ def kernel(
9393
'-o', '--output',
9494
type=click.File('w'),
9595
help="Output file",
96-
default=OUTPUT_DIR,
96+
default=OUTPUT,
9797
)
9898
@click.option(
9999
'-m', '--method',
@@ -141,7 +141,7 @@ def kernel(
141141
def diffuse(
142142
input: str,
143143
network: str,
144-
output: str = OUTPUT_DIR,
144+
output: str = OUTPUT,
145145
method: str = RAW,
146146
binarize: bool = False,
147147
threshold: float = None,

src/diffupy/process_input.py

Lines changed: 28 additions & 1 deletion
@@ -266,6 +266,32 @@ def _codify_input_data(
266266
threshold
267267
)
268268

269+
# Standardize the name of the node-labeling column to 'Label' for later processing.
270+
if LABEL not in df.columns:
271+
for l in list(df.columns):
272+
if l in NODE_LABELING:
273+
df = df.rename(columns={l: LABEL})
274+
break
275+
276+
# If a node-type column is provided, codify the input separately per node type and collect the results in a dictionary.
277+
if NODE_TYPE in df.columns:
278+
279+
node_types = list(set(df[NODE_TYPE]))  # Unique node types present in the input.
280+
codified_by_type_dict = {}
281+
282+
for node_type in node_types:
283+
# Select the rows belonging to the current node type.
284+
df_by_type = df.loc[df[NODE_TYPE] == node_type]
285+
286+
# Codify the nodes of the current node type.
287+
codified_by_type_dict[node_type] = _codify_method_check(df_by_type,
288+
method,
289+
binning,
290+
absolute_value,
291+
p_value,
292+
threshold
293+
)
294+
return codified_by_type_dict
269295

270296
def _codify_method_check(
271297
df: pd.DataFrame,
@@ -587,6 +613,7 @@ def mapping_statistics(
587613

588614
total_mapping.update(mapping)
589615

616+
590617
if subtotals:
591618
statistics_dict['total_mapping'] = total_mapping
592619
statistics_dict['total_input'] = total_input
@@ -755,7 +782,7 @@ def _map_label_dict(
755782
label_bck = _check_label_to_background_labels(label, background_labels, check_substrings)
756783
if label_bck is not None:
757784
mapped_dict[label_bck] = v
758-
785+
759786
elif isinstance(label, set) or isinstance(label, tuple) or isinstance(label, list):
760787
for sublabel in set(label):
761788
label_bck = _check_label_to_background_labels(sublabel, background_labels, check_substrings)
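
The per-type codification added to _codify_input_data above can be pictured as a split-then-apply step. The following is a minimal stand-alone sketch that uses plain dictionaries instead of a pandas DataFrame; codify_scores is a hypothetical stand-in for _codify_method_check, whose body is not shown in this diff, and the rows reuse the example values from the documentation tables.

```python
from collections import defaultdict

# Example rows combining the 'Node', 'NodeType' and 'LogFC' documentation tables.
ROWS = [
    {'Node': 'A', 'NodeType': 'Gene', 'LogFC': 4.0},
    {'Node': 'B', 'NodeType': 'Gene', 'LogFC': -1.0},
    {'Node': 'C', 'NodeType': 'Metabolite', 'LogFC': 1.5},
    {'Node': 'D', 'NodeType': 'Gene', 'LogFC': 3.0},
]


def codify_scores(rows):
    """Hypothetical stand-in for _codify_method_check: map node ID to raw LogFC."""
    return {row['Node']: row['LogFC'] for row in rows}


def codify_by_type(rows, type_column='NodeType'):
    """Split the rows by node type, then codify each group on its own."""
    rows_by_type = defaultdict(list)
    for row in rows:
        rows_by_type[row[type_column]].append(row)
    return {
        node_type: codify_scores(group)
        for node_type, group in rows_by_type.items()
    }


codified = codify_by_type(ROWS)
print(codified)
```

The result is one score mapping per node type, mirroring the shape of codified_by_type_dict in the diff.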

src/diffupy/utils.py

Lines changed: 8 additions & 3 deletions
@@ -24,6 +24,7 @@
2424
"""Matrix/graph handling utils."""
2525

2626

27+
2728
def get_laplacian(graph: Graph, normalized: bool = False) -> np.ndarray:
2829
"""Return Laplacian matrix."""
2930
if nx.is_directed(graph):
@@ -137,18 +138,16 @@ def print_dict_dimensions(entities_db, title='Title', message=''):
137138
"""Print dimension of the dictionary."""
138139
total = 0
139140
m = f'{title}\n'
140-
141141
for k1, v1 in entities_db.items():
142142
m += f'\n{message}{k1}:\n'
143143
if isinstance(v1, dict):
144144
for k2, v2 in v1.items():
145145
m += f'{k2} ({v2})\n'
146146
else:
147147
m += f'{v1}'
148-
148+
149149
print(f'{m}\n\n')
150150

151-
152151
def log_dict(dict_to_print: dict, message: str = ''):
153152
"""Print dictionary as list with a message."""
154153
for k1, v1 in dict_to_print.items():
@@ -301,6 +300,12 @@ def munge_cell(cell):
301300
else:
302301
raise TypeError(f'The cell "{cell}" could not be processed.')
303302

303+
def parse_xls_sheet_to_df(sheet: opxl.workbook,
304+
min_row: Optional[int] = 1,
305+
relevant_cols: Optional[list] = None,
306+
irrelevant_cols: Optional[list] = None) -> pd.DataFrame:
307+
"""Process/format excel sheets to DataFrame."""
308+
parsed_sheet_dict = {}
304309

305310
def parse_xls_sheet_to_df(sheet: opxl.workbook,
306311
min_row: Optional[int] = 1,
Lines changed: 8 additions & 6 deletions
@@ -1,6 +1,8 @@
1-
NodeType,Node
2-
Gene,A
3-
Gene,B
4-
Metabolite,C
5-
Gene,D
6-
Gene,E
1+
Node,NodeType
2+
A,Gene
3+
B,Gene
4+
C,Metabolite
5+
D,Gene
6+
E,Metabolite
7+
F,Gene
8+
G,Pathology
