Commit e0c3b07

Refactored Code for SEMB
0 parents  commit e0c3b07

801 files changed

Lines changed: 1345452 additions & 0 deletions

.vscode/settings.json

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
{
    "python.pythonPath": "/Users/jackiezhang/miniconda2/envs/gemslab/bin/python"
}

README.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# The Structural EMBedding library (SEMB)

**Authors: GEMS Lab Team @ University of Michigan**

The SEMB library offers fast onboarding for exploring structural embeddings of graph data with heterogeneous methods. It provides a unified API and a modular codebase that make it easy to integrate 3rd party methods and datasets.

The library already includes a set of popular methods and datasets that are ready for immediate use.

The library requires *Python 3.7+*.

## Getting started

Make sure you are using *Python 3.7+* for all of the steps below!

### Installation
`python setup.py install` (TODO: Pip support will be added soon)

### Import and load a dataset
```py
from semb.data import load, get_dataset_ids
# explore all datasets (both built-in and 3rd party extensions)
ids = get_dataset_ids()
# load a dataset
graph = load(ids[0])
```

### Import and load a method
```py
# NOTE: this `load` shadows `semb.data.load` if both are imported into the same module
from semb.methods import load, get_method_ids
# explore all methods (both built-in and 3rd party extensions)
ids = get_method_ids()
# load a method; this returns the class (constructor) of the selected method
Method = load(ids[0])
# create and run a method.
# NOTE: except for the first "graph" arg, every other argument MUST be passed in keyword form!
method = Method(graph, a=1, b=2, c=3, ...)
method.train()
embeddings = method.get_embeddings()
```
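To illustrate the keyword-only rule noted above, here is a minimal sketch of how a method class could enforce it. `SketchMethod`, its `ALLOWED_KEYWORDS` table, and the hyperparameter names `dim` and `walk_length` are hypothetical and are not part of SEMB; the exception is a local stand-in for the `MethodKeywordUnAllowedException` defined in the library's `exceptions.py`, and the "graph" is simply any iterable of nodes.

```python
class MethodKeywordUnAllowedException(Exception):
    """Local stand-in for the exception of the same name in exceptions.py."""


class SketchMethod:
    """Hypothetical method class; NOT the library's BaseMethod."""

    # Hypothetical hyperparameters and their default values.
    ALLOWED_KEYWORDS = {"dim": 128, "walk_length": 80}

    def __init__(self, graph, **kwargs):
        # Only "graph" can be positional: Python raises TypeError by itself
        # if a second positional argument is supplied.
        unknown = set(kwargs) - set(self.ALLOWED_KEYWORDS)
        if unknown:
            raise MethodKeywordUnAllowedException(
                "Unsupported keyword(s): " + ", ".join(sorted(unknown)))
        self.graph = graph
        self.params = {**self.ALLOWED_KEYWORDS, **kwargs}
        self.embeddings = None

    def train(self):
        # Placeholder "training": assign one zero vector per node.
        self.embeddings = {node: [0.0] * self.params["dim"] for node in self.graph}

    def get_embeddings(self):
        return self.embeddings


# Usage with a toy "graph" given as a list of node labels.
method = SketchMethod(["a", "b"], dim=4)
method.train()
embeddings = method.get_embeddings()
```

Collecting hyperparameters through `**kwargs` keeps the call signature identical across methods, which is one plausible reason for the keyword-form requirement.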

## Extending SEMB

First make sure the `semb` library is installed.

### Developing a 3rd party dataset extension

- Create a Python 3.7+ [package](https://packaging.python.org/tutorials/packaging-projects/) with a name in the form `semb-dataset[$YOUR_CHOSEN_DATASET_ID]`
- Within the package root directory, make sure `__init__.py` is present
- Create a `dataset.py` and define a `Dataset` class that inherits from `BaseDataset` (`from semb.data import BaseDataset`) and implements the required methods. See `src/datasets/airports/dataset.py` for more details.
- Install the package via `setup.py` or pip.
- Now the dataset is loadable by the main client program that uses `semb`!
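
As a rough sketch of the steps above, a `dataset.py` might look like the following. The `BaseDataset` class here is a local stand-in so that the example runs without `semb` installed; in a real extension it would come from `from semb.data import BaseDataset`. The `load()` hook and the `toy-triangle` id are assumptions made for illustration, since the README only names `get_id()` explicitly; consult `src/datasets/airports/dataset.py` for the actual required methods.

```python
class BaseDataset:
    """Local stand-in for semb.data.BaseDataset (assumed interface)."""

    def get_id(self):
        raise NotImplementedError("override get_id()")

    def load(self):
        # Hypothetical hook; the real required methods may differ.
        raise NotImplementedError("override load()")


class Dataset(BaseDataset):
    """Would live in dataset.py of a package named semb-dataset[toy-triangle]."""

    def get_id(self):
        # Must match the id chosen in the package name.
        return "toy-triangle"

    def load(self):
        # A three-node cycle expressed as an edge list.
        return [(0, 1), (1, 2), (2, 0)]


dataset = Dataset()
edges = dataset.load()
```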

### Developing a 3rd party method extension

- Create a Python 3.7+ [package](https://packaging.python.org/tutorials/packaging-projects/) with a name in the form `semb-method[$YOUR_CHOSEN_METHOD_ID]`
- Within the package root directory, make sure `__init__.py` is present
- Create a `method.py` and define a `Method` class that inherits from `BaseMethod` (`from semb.methods import BaseMethod`) and implements the required methods. See `src/methods/node2vec/method.py` for more details.
- Install the package via `setup.py` or pip.
- Now the method is loadable by the main client program that uses `semb`!

### Note
For both `dataset` and `method` extensions, make sure `get_id()` is overridden and returns the same id as the one chosen in your package name.
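
The id rule in this note can be checked mechanically. The sketch below assumes the square brackets in `semb-dataset[...]` and `semb-method[...]` are literal parts of the package name, which the README does not state explicitly; `parse_extension_name` and `id_matches_package` are hypothetical helpers, not SEMB functions.

```python
import re

# Assumed naming scheme: semb-dataset[<id>] or semb-method[<id>].
_NAME_PATTERN = re.compile(r"^semb-(?P<kind>dataset|method)\[(?P<id>[^\[\]]+)\]$")


def parse_extension_name(package_name):
    """Return (kind, id) parsed from an extension package name."""
    match = _NAME_PATTERN.match(package_name)
    if match is None:
        raise ValueError(f"not a SEMB extension package name: {package_name!r}")
    return match.group("kind"), match.group("id")


def id_matches_package(extension, package_name):
    """True if extension.get_id() equals the id embedded in the package name."""
    _, expected_id = parse_extension_name(package_name)
    return extension.get_id() == expected_id
```

A check like this could run in an extension's own test suite to catch a mismatch before the package is installed.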

build/lib/src/__init__.py

Whitespace-only changes.

build/lib/src/data/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
from .data import Dataset, BaseDataProvider

from .loader import load, get_dataset_ids

build/lib/src/data/data.py

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
from collections import namedtuple
from typing import List
from os import path
from ..exceptions import UnimplementedException

# TODO: add more builtin datasets below
# FIXME: replace the current relative path with a remote file url
SAMPLE_DATA_PATH = path.join(path.dirname(__file__), '../../sample-data')

# the Data type for all supported data sets
Dataset = namedtuple('Dataset', ['name', 'description', 'format', 'src_url'])

# the base class for all new data providers
class BaseDataProvider(object):

    def provideId(self) -> str:
        raise UnimplementedException(
            "Please implement the provideId() method to register the unique id for your datasets")

    def provideDatasets(self) -> List[Dataset]:
        raise UnimplementedException(
            "Please implement the provideDatasets() method for registering dataset details")

build/lib/src/data/loader.py

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
import networkx as nx
import requests
from requests_file import FileAdapter
from os import path

from ..exceptions import DatasetNotExistException

# repository for all data sets
DATASETS = {}

# dynamically export all included datasets
def _find_builtin_datasets(datasets):
    pass

# TODO: add support to register 3rd party extension datasets automatically by searching the installed packages
def _find_external_datasets(datasets):
    pass

_find_builtin_datasets(DATASETS)
_find_external_datasets(DATASETS)

def get_dataset_ids():
    return list(DATASETS.keys())

def load(dataset_id):
    """
    Load a dataset from its source and convert it to a graph

    Args:
        dataset_id (str): unique id of the dataset to load
    """
    # TODO
    pass
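
Since `load()` is still a TODO in this commit, the following is only a sketch of one way the registry could behave once it is populated. `register_dataset` is a hypothetical helper, the exception is a local stand-in for `DatasetNotExistException` from `exceptions.py`, and the registry maps each id to a zero-argument callable, which is an assumption rather than the library's design.

```python
class DatasetNotExistException(Exception):
    """Local stand-in for the exception defined in exceptions.py."""


# Maps a dataset id to a zero-argument callable that builds the graph.
DATASETS = {}


def register_dataset(dataset_id, loader):
    """Hypothetical helper: add one dataset loader to the registry."""
    DATASETS[dataset_id] = loader


def get_dataset_ids():
    return list(DATASETS.keys())


def load(dataset_id):
    """Look up the loader for dataset_id and return the graph it builds."""
    try:
        loader = DATASETS[dataset_id]
    except KeyError:
        raise DatasetNotExistException(
            f"Unknown dataset id: {dataset_id!r}") from None
    return loader()


# Register a toy dataset: a three-node path given as an edge list.
register_dataset("toy-path", lambda: [(0, 1), (1, 2)])
```

Deferring the actual construction to a callable means a dataset is only read when a client asks for it, not when the package is imported.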
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
#TODO: what to put here?

build/lib/src/exceptions.py

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
class UnimplementedException(Exception):
    pass

class DatasetNotExistException(Exception):
    pass

class DatasetFormatNotAllowedException(Exception):
    pass

class MethodKeywordUnAllowedException(Exception):
    pass

build/lib/src/methods/__init__.py

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
# export BaseMethod to be used for 3rd party extension
from .method import BaseMethod

# export method loader
from .loader import load, get_method_ids

build/lib/src/methods/drne/__init__.py

Whitespace-only changes.
