Notebook readme by RevathyVenukuttan · Pull Request #2 · bmajoros/BlueSTARR

RevathyVenukuttan · 2026-03-25T21:56:01Z

Added the Jupyter notebook for generating the counts.txt.gz file and updated the README with a link to the notebook for data pre-processing.

Adds library size normalization for custom loss

If the input count file has library sizes for DNA and RNA replicates, activating the custom loss function in the configuration will now use them to correct the RNA/DNA count ratios accordingly. Note that library sizes are used only if they are present as additional columns and custom loss (`UseCustomLoss`) is activated. Code ported over from https://github.com/bmajoros/BlueSTARR/blob/main/BlueSTARR-NLL.py

Co-authored-by: Bill Majoros <bmajoros@duke.edu>
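The two conditions above (library sizes present, and custom loss activated) can be sketched as a small gate. This is a minimal illustration and not the code from BlueSTARR-NLL.py: the function name `resolve_library_sizes` and its arguments are hypothetical, and only the `UseCustomLoss` setting comes from the commit message.

```python
from typing import Optional, Sequence, Tuple


def resolve_library_sizes(
    use_custom_loss: bool,
    dna_lib_sizes: Optional[Sequence[float]],
    rna_lib_sizes: Optional[Sequence[float]],
) -> Optional[Tuple[Sequence[float], Sequence[float]]]:
    """Return (DNA, RNA) library sizes if they should be used, else None.

    Hypothetical helper: `use_custom_loss` stands in for the UseCustomLoss
    setting, and the two sequences stand in for the optional library-size
    columns of the count file.
    """
    # Library sizes are ignored unless the custom loss is switched on.
    if not use_custom_loss:
        return None
    # They are also ignored if the count file carried no library sizes.
    if not dna_lib_sizes or not rna_lib_sizes:
        return None
    return dna_lib_sizes, rna_lib_sizes
```

A caller would apply the correction only when the function returns a pair, and fall back to uncorrected ratios when it returns `None`.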

hlapp · 2026-03-26T16:09:04Z

See Duke-IGVF/BlueSTARR#25 where this was "moved" and then merged.

hlapp and others added 26 commits May 15, 2024 18:08

Adds python dependencies in standard requirements format

defeb5f

An unreleased dependency (from which only a single module, ConfigFile, is used here) is included using git+ syntax.

Merges changes from upstream (bmajoros/BlueSTARR)

3349eed

Merge changes from upstream (bmajoros/BlueSTARR)

8694839

Adds conda environment definition

80fbe33

This initial version is exported from the environment that we know works. Many of the version specs can probably be relaxed to at least allow minor version upgrades.

Add dockerfile for building BlueSTARR images (#4)

dbf4340

Also factors the non-TF requirements into their own requirements file so that both the Dockerfile and a standard install can use a requirements file.

Adds GitHub Action workflow to automatically build Docker image

84bf59d

The image is pushed to ghcr.io as the registry. Closes #6.

Adds extra index URL directly into requirements

3d9d75c

Adds environment setup instructions to README

4e91b76

Adds scripts to fix TensorRT library failing to be found

4a45bbd

Adds checking whether link target exists

0767303

Adds documentation for fixing TensorRT not found

e8541af

BlueSTARR version used for v0.1.1 model training

9e3194d

Clarifies instructions for building conda environment

b803208

Adds sequence index to true-vs-predicted table

fa225fe

Also includes a few other minor changes and fixes. Co-authored-by: revathy95 <66286555+RevathyVenukuttan@users.noreply.github.com>

Allows loading pretrained weights prior to training

6082e3e

Adds making dropout after flatten optional

0efced4

This is to mirror the architecture of DeepSTARR, which uses a Flatten step with subsequent Dense layers, but does not put a dropout layer between these.
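The effect of the option on layer ordering can be sketched without Keras by listing the layers as plain tuples. This is an illustration only: the function name `dense_head_plan` and its parameters are hypothetical and are not BlueSTARR's configuration keys, and the real model builds Keras layers rather than tuples.

```python
def dense_head_plan(dense_units, dropout_rate, dropout_after_flatten):
    """Describe, in order, the layers from Flatten through the Dense stack.

    Hypothetical helper for illustration; each entry is (layer_name, *args).
    """
    plan = [("Flatten",)]
    # The optional step: dropout directly after Flatten, before the first Dense.
    if dropout_after_flatten:
        plan.append(("Dropout", dropout_rate))
    # Dense layers follow in the order given.
    for units in dense_units:
        plan.append(("Dense", units))
    return plan


# With the option off, Flatten feeds straight into Dense, as described for DeepSTARR.
deepstarr_like = dense_head_plan([256, 256], dropout_rate=0.5, dropout_after_flatten=False)
# With the option on, a Dropout entry sits between Flatten and the first Dense.
with_dropout = dense_head_plan([256, 256], dropout_rate=0.5, dropout_after_flatten=True)
```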

Adds clean logic for calling as a program

afbb5d8

This is mainly useful for testing a given configuration file for its contents and syntax.
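A config check of this kind might look like the sketch below. It is not the ConfigFile module used by BlueSTARR, whose API is not shown here: the `key = value` format with `#` comments, and the names `parse_config` and `main`, are assumptions made for illustration.

```python
import sys


def parse_config(text):
    """Parse 'key = value' lines into a dict (assumed format, '#' starts a comment)."""
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank and comment-only lines
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected 'key = value', got {raw!r}")
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = value
    return settings


def main(argv):
    """Print the parsed settings of the file named in argv[1]; return an exit code."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} <config-file>", file=sys.stderr)
        return 2
    with open(argv[1]) as handle:
        settings = parse_config(handle.read())
    for key, value in settings.items():
        print(f"{key} = {value}")
    return 0
```

To make such a module callable as a program while keeping it importable, the usual pattern is to end the file with an `if __name__ == "__main__":` guard that calls `sys.exit(main(sys.argv))`.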

Merge pull request #14 from Duke-IGVF/preDense-dropout

12f3211

Adds making dropout after flatten optional

Fixed library size ratio calculation

dcf4ad2

Corrects the library size ratio to use the average of the DNA library sizes rather than their sum. (This corrects the shift in predicted values to the right.) Also simplifies the conditional that decides whether to use library size correction for theta.
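A small worked example shows why summing the DNA library sizes shifts values upward. It is a sketch under stated assumptions, not the repository's code: the commit does not say how RNA library sizes are aggregated, so the mean is assumed, and the function names and all numbers are made up for illustration.

```python
from statistics import mean


def library_size_ratio(dna_lib_sizes, rna_lib_sizes, dna_reduce=mean):
    """RNA-to-DNA sequencing depth ratio; dna_reduce selects mean or sum (assumption)."""
    return mean(rna_lib_sizes) / dna_reduce(dna_lib_sizes)


def corrected_ratio(dna_counts, rna_counts, lib_ratio):
    """Raw RNA/DNA count ratio divided by the library size ratio."""
    return (mean(rna_counts) / mean(dna_counts)) / lib_ratio


# Illustrative numbers: RNA was sequenced twice as deeply as DNA,
# and RNA counts are twice the DNA counts, so the true ratio is 1.
dna_counts, rna_counts = [100, 120, 80], [200, 220, 180]
dna_libs, rna_libs = [1_000_000] * 3, [2_000_000] * 3

with_mean = corrected_ratio(
    dna_counts, rna_counts, library_size_ratio(dna_libs, rna_libs)
)
with_sum = corrected_ratio(
    dna_counts, rna_counts, library_size_ratio(dna_libs, rna_libs, dna_reduce=sum)
)
```

With three DNA replicates, summing makes the denominator of the library size ratio three times too large, so `with_sum` comes out about three times larger than `with_mean`. That inflation is consistent with the rightward shift the commit message mentions.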

Updates transformer-based model architecture

e57f7a7

Changes from SinePositionEncoding to RotaryEncoding, and changes the transformer layer to use TransformerEncoder from keras_nlp, rather than MultiHeadAttention. Also upgrades the recommended Python version to 3.11 (from 3.10).
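For readers unfamiliar with rotary position encoding, the idea can be sketched in plain Python. This is not the keras_nlp layer and makes no claim about its internals: the function name `rotary_encode`, the pairing of adjacent features, and the base of 10000 are assumptions chosen for illustration.

```python
import math


def rotary_encode(vec, position, base=10000.0):
    """Rotate each adjacent feature pair by an angle proportional to position."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(d // 2):
        # Each pair gets its own frequency, decreasing with the pair index.
        angle = position * base ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Both pairs of positions are two steps apart, so the scores should agree.
score_near_start = dot(rotary_encode(q, 3), rotary_encode(k, 1))
score_shifted = dot(rotary_encode(q, 7), rotary_encode(k, 5))
```

The two scores agree up to floating-point error because the rotations cancel except for the position difference. This means the attention score between a query and a key depends on their relative offset rather than on their absolute positions.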

Expands instructions for running BlueSTARR code

9a8239a

Adds initial section on data processing from STARR-seq

83df3ce

Adds documentation and links for downsampling

3647860

Notebook for data preprocessing

d56610e

Update README.md

edce05c

hlapp deleted the notebook_readme branch March 26, 2026 14:49

RevathyVenukuttan wants to merge 26 commits into bmajoros:main from Duke-IGVF:notebook_readme

4 participants
