
Notebook readme #2

Open

RevathyVenukuttan wants to merge 26 commits into bmajoros:main from Duke-IGVF:notebook_readme

@RevathyVenukuttan

Added the Jupyter notebook for generating the `counts.txt.gz` file and updated the README with a link to the notebook for data pre-processing.

hlapp and others added 26 commits May 15, 2024 18:08
An unreleased dependency (from which only a single module, ConfigFile,
is used here) is included using git+ syntax.
This initial version is exported from the environment that we know works.
Many of the version specs can probably be relaxed to at least allow minor
version upgrades.
Also factors the non-TF requirements into their own requirements file, so that
both the Dockerfile and a standard install can use it.
The image is pushed to the ghcr.io registry. Closes #6.
Also includes a few other minor changes and fixes.

Co-authored-by: revathy95 <66286555+RevathyVenukuttan@users.noreply.github.com>
This is to mirror the architecture of DeepSTARR, which uses a Flatten
step with subsequent Dense layers, but does not put a dropout layer
between these.
This is mainly useful for testing a given configuration file for its
contents and syntax.
Makes dropout after Flatten optional
Adds library size normalization for custom loss

If the input count file has library sizes for DNA and RNA replicates,
activating the custom loss function in the configuration will now use
them for correcting the RNA/DNA count ratios accordingly.

Note that library sizes are used only if they are present as additional
columns and custom loss (`UseCustomLoss`) is activated.
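The two conditions above can be sketched as a small gate. This is a minimal sketch, not the ported BlueSTARR code: the helper name `library_sizes_for_loss` is hypothetical, and the configuration is assumed to be a plain dict rather than the `ConfigFile` object the project actually uses. Only the key `UseCustomLoss` comes from the commit message.

```python
from typing import Optional, Sequence, Tuple, List


def library_sizes_for_loss(
    config: dict,
    dna_lib_sizes: Optional[Sequence[float]],
    rna_lib_sizes: Optional[Sequence[float]],
) -> Optional[Tuple[List[float], List[float]]]:
    """Hypothetical helper: return (dna, rna) library sizes if they should be
    used by the custom loss, else None.

    Assumption: config is a plain dict standing in for the real ConfigFile.
    """
    # Condition 1: custom loss must be switched on in the configuration.
    use_custom_loss = bool(config.get("UseCustomLoss", False))
    # Condition 2: the count file must actually have supplied library sizes
    # (None or an empty sequence both count as "not present").
    have_lib_sizes = bool(dna_lib_sizes) and bool(rna_lib_sizes)
    if use_custom_loss and have_lib_sizes:
        return list(dna_lib_sizes), list(rna_lib_sizes)
    return None
```

If either condition fails, the sketch returns `None`, which corresponds to training without library size correction.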

Code ported over from https://github.com/bmajoros/BlueSTARR/blob/main/BlueSTARR-NLL.py

---------

Co-authored-by: Bill Majoros <bmajoros@duke.edu>
Corrects the library size ratio to use the average of the DNA library sizes
rather than their sum. (This corrects the shift of predicted values to the
right.) Also simplifies the conditional for whether to use library size
correction for theta.
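The effect of averaging instead of summing can be illustrated numerically. This is a sketch under assumptions: the helper names are hypothetical, and the model "expected RNA count = theta × DNA count × library size ratio" is an assumed simplification, not taken from `BlueSTARR-NLL.py`.

```python
import numpy as np


def library_size_ratio(rna_lib_sizes, dna_lib_sizes):
    """Hypothetical helper: scaling factor for each RNA replicate relative to
    the *average* DNA library size (the earlier behavior divided by the sum)."""
    rna = np.asarray(rna_lib_sizes, dtype=float)
    return rna / np.mean(np.asarray(dna_lib_sizes, dtype=float))


def expected_rna_count(theta, dna_count, rna_lib_size, dna_lib_sizes):
    """Assumed model: expected RNA count = theta * DNA count * library size ratio."""
    return theta * dna_count * library_size_ratio([rna_lib_size], dna_lib_sizes)[0]


if __name__ == "__main__":
    dna_libs = [1e6, 1e6, 1e6]  # three DNA replicates
    rna_lib = 2e6
    print(library_size_ratio([rna_lib], dna_libs))  # prints [2.]
    # Dividing by the sum (3e6) instead would give a ratio 3 times smaller,
    # so matching the same expected RNA count would need a 3x larger theta.
```

Under this assumed model, summing k DNA library sizes makes the ratio k times too small, so a fitted theta would come out k times too large. On a log scale that appears as a constant shift to the right, which is one plausible reading of the shift the commit describes.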
Changes from SinePositionEncoding to RotaryEncoding, and changes the
transformer layer to use TransformerEncoder from keras_nlp, rather than
MultiHeadAttention.

Also upgrades the recommended Python version to 3.11 (from 3.10).
hlapp deleted the notebook_readme branch on March 26, 2026 14:49
@hlapp

hlapp commented Mar 26, 2026

See Duke-IGVF/BlueSTARR#25 where this was "moved" and then merged.

