
Add big file diann #151

Open · Rudhik1904 wants to merge 19 commits into devel from Add-big-file-DIANN

Conversation

Rudhik1904 (Contributor) commented Feb 3, 2026

Motivation and Context

This PR adds comprehensive support for "big file" DIANN data-processing workflows in MSstatsShiny, enabling users to handle large DIA-NN (data-independent acquisition) output files through optional lazy loading via the MSstatsBig package. Previously, only Spectronaut "big file" workflows were supported. The implementation adopts a modular, per-workflow-per-filetype architecture for big-file path handling and introduces a single source of truth for large-file configuration. MSstatsBig has been recategorized as an optional dependency (moved from Imports to Suggests), so the application can function without it for standard-file workflows while still supporting big-file operations when the package is available.

Detailed Changes

Core Module Logic (module-loadpage-server.R)

  • Added is_web_server parameter to loadpageServer() function signature to enable environment-aware branching
  • Implemented separate shinyFiles browse and parse reactives for each workflow/filetype (spec and diann) instead of shared reactives
  • Created workflow-specific big-file path reactives (local_big_file_path_spec(), local_big_file_path_diann()) that resolve to NULL in web-server mode
  • Added new renderUI outputs for DIANN large-file UI components:
    • diann_header_ui: DIANN section header
    • diann_file_selection_ui: Large-file browser and selection
    • diann_options_ui: Large-file processing options
  • Updated "proceed" button enablement logic to check per-filetype big-file path reactives
  • Modified getData() call to pass explicit big_file_path_spec and big_file_path_diann arguments derived from local reactives
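
The path-resolution logic described above can be sketched outside of Shiny's reactive system as a plain function; the name and arguments below are illustrative, not the PR's actual identifiers:

```r
# Hypothetical analogue of local_big_file_path_diann(): NULL in web-server
# mode or when big-file mode is off, otherwise the first parsed local path.
resolve_big_file_path <- function(is_web_server, big_file_mode, parsed_paths) {
  if (is_web_server) return(NULL)            # web-server mode: no local paths
  if (!isTRUE(big_file_mode)) return(NULL)   # standard-file workflow selected
  if (length(parsed_paths) > 0) parsed_paths[[1]] else NULL
}
```

Inside the module, each workflow/filetype pair would wrap logic like this in its own reactive() so the proceed-button logic can check them independently.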

UI Layer (module-loadpage-ui.R)

  • Refactored create_diann_uploads() from static conditionalPanel to dynamic uiOutput() rendering via tagList()
  • Added new UI constructor helper functions:
    • create_diann_header(ns): DIANN section header
    • create_spectronaut_mode_selector(ns, selected = FALSE): Mode selection for Spectronaut files
    • create_diann_mode_selector(ns, selected = FALSE): Mode selection for DIANN files
    • create_spectronaut_standard_ui(ns): Standard-file input for Spectronaut
    • create_diann_standard_ui(ns): Standard-file input for DIANN
    • create_diann_large_file_ui(ns): Large-file browser for DIANN
    • create_diann_large_filter_options(ns, mbr_def = TRUE): Filter options with MBR toggle for large DIANN
    • create_diann_large_bottom_ui(ns, max_feature_def = 100, unique_peps_def = FALSE, agg_psms_def = FALSE, few_obs_def = FALSE): Processing options for large DIANN files
  • Separated Spectronaut large-file input/output IDs to be spec-specific
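
As a rough illustration of the constructor pattern, a helper like create_diann_large_filter_options() might look as follows; the widget labels and the q-value input are assumptions, and only the function name, the ns argument, and the mbr_def default come from the list above:

```r
library(shiny)

# Hedged sketch of a modular, namespaced UI constructor for large DIANN files.
create_diann_large_filter_options <- function(ns, mbr_def = TRUE) {
  tagList(
    checkboxInput(ns("diann_MBR"), "Match between runs (MBR)", value = mbr_def),
    numericInput(ns("qvalue_cutoff"), "Q-value cutoff", value = 0.01,
                 min = 0, max = 1, step = 0.01)
  )
}
```

Because the constructor takes ns, the same builder works in any module instance, e.g. create_diann_large_filter_options(NS("loadpage")).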

Data Processing Logic (utils.R)

  • Extended getData() function signature with optional big_file_path_spec and big_file_path_diann parameters
  • Added resolveDiannQuantCol(input) helper function to compute quantification column for DIANN large-file workflows
  • Implemented Spectronaut big-file branch with:
    • MSstatsBig availability check
    • Validation of qvalue_cutoff (0–1) and max_feature_count (positive)
    • Path existence verification
    • Call to MSstatsBig::bigSpectronauttoMSstatsFormat() with filtering parameters
    • Memory-aware result collection via dplyr::collect() with error handling for memory allocation failures
  • Implemented DIANN big-file branch mirroring Spectronaut flow:
    • Validation of max_feature_count
    • Path existence check
    • resolveDiannQuantCol() resolution for quantificationColumn
    • Call to MSstatsBig::bigDIANNtoMSstatsFormat() with MBR and filtering parameters
    • Memory-error handling on collection
  • Updated getDataCode() to:
    • Report MSstatsBig version availability in generated code
    • Generate code for large Spectronaut/DIANN paths using MSstatsBig converters
    • Use resolveDiannQuantCol() when emitting DIANN large-file quantificationColumn argument
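
The validate-convert-collect flow shared by both big-file branches can be sketched as below; this is a simplified outline (no Shiny notifications or spinner handling), with convert standing in for the relevant MSstatsBig converter:

```r
# Simplified sketch of a big-file branch in getData(): validate inputs,
# verify the path, run the lazy converter, then materialize via collect().
collect_big_file <- function(path, max_feature_count, convert) {
  if (!is.numeric(max_feature_count) || is.na(max_feature_count) ||
      max_feature_count <= 0) {
    stop("max_feature_count must be a positive number")
  }
  if (is.null(path) || !file.exists(path)) {
    stop("The selected file does not exist or is not readable")
  }
  lazy_result <- convert(path)  # e.g. MSstatsBig::bigDIANNtoMSstatsFormat(...)
  tryCatch(
    dplyr::collect(lazy_result),  # may fail on results too large for memory
    error = function(e) {
      message("Could not load converted data into memory: ",
              conditionMessage(e))
      NULL                        # caller shows a notification on NULL
    }
  )
}
```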

Dependencies (DESCRIPTION)

  • Moved MSstatsBig from mandatory Imports to optional Suggests to support optional big-file functionality
  • Reformatted Imports and Suggests fields across multiple lines for maintainability
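
Since MSstatsBig is now optional, the big-file code paths presumably guard on availability at runtime; the canonical pattern for a Suggests dependency looks like this (a sketch, not the PR's exact wording):

```r
# Standard availability check for an optional (Suggests) dependency.
ensure_msstatsbig <- function() {
  if (!requireNamespace("MSstatsBig", quietly = TRUE)) {
    stop("The MSstatsBig package is required for big-file processing; ",
         "please install it to enable this workflow.", call. = FALSE)
  }
  invisible(TRUE)
}
```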

Unit Tests

Test Coverage for Big-File Workflows (tests/testthat/test-utils.R)

  • Big Spectronaut tests:

    • Successful conversion and collection with big_file_path_spec parameter
    • Invalid qvalue_cutoff validation (rejects out-of-range values)
    • Invalid max_feature_count validation (rejects non-positive values)
    • Missing file handling (validates path existence)
    • Memory allocation error handling from dplyr::collect() with notification and spinner removal
  • Big DIANN tests:

    • Successful conversion and collection with big_file_path_diann parameter
    • Invalid max_feature_count validation
    • Missing file error handling
    • Memory allocation error behavior with proper error notification

New Parallel Execution Test (tests/testthat/test_parallel_simulation.R)

  • Added simulation tests for memory behavior in parallel execution contexts
  • Includes mocked internal MSstats functions (.fitSurvival(), .isSummarizable(), .runTukey())
  • Tests both "leaky" and "fixed" memory cleanup scenarios using mclapply()
  • Logs memory usage to validate proper resource management in multi-core settings
  • Exercises left-truncated survival modeling paths with cleanup verification
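
A minimal version of the leaky-versus-fixed comparison might look like the sketch below; the temp-file log and the toy allocation are assumptions (the actual test mocks MSstats internals such as .fitSurvival()):

```r
library(parallel)

log_file <- tempfile(fileext = ".txt")  # avoid a hard-coded parallel_log.txt

run_task <- function(i, cleanup) {
  big <- numeric(1e6)                   # stand-in for a large fitted object
  result <- sum(big) + i
  if (cleanup) { rm(big); gc() }        # "fixed" variant frees memory eagerly
  cat(sprintf("task %d done\n", i), file = log_file, append = TRUE)
  result
}

# mc.cores = 1 keeps the sketch portable; mclapply() forks only on Unix-alikes
results <- mclapply(1:4, run_task, cleanup = TRUE, mc.cores = 1)
```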

Coding Guidelines

The changes show no apparent violations of standard R coding guidelines. The implementation follows consistent patterns:

  • Proper use of reactive programming patterns in Shiny modules
  • Modular UI constructors with namespace isolation
  • Comprehensive error handling with try/catch blocks for memory errors
  • Proper parameter validation before external function calls
  • Clear separation of concerns between UI layer, server logic, and data processing utilities

@Rudhik1904 Rudhik1904 requested a review from tonywu1999 February 3, 2026 06:55
coderabbitai (bot) commented Feb 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

The PR extends big-file processing (via MSstatsBig) from Spectronaut to DIANN by introducing per-filetype browse/parse reactives, per-filetype derived big-file path reactives in local mode, DIANN-specific UI components, and updated getData logic with validation and collection for both filetypes. MSstatsBig becomes an optional dependency.

Changes

Large-File Mode for Spectronaut and DIANN

Layer / File(s): Summary

  • Package Metadata (DESCRIPTION): MSstatsBig moved from mandatory Imports to optional Suggests.
  • Infrastructure & Reactives (R/module-loadpage-server.R): loadpageServer gains an is_web_server parameter; per-filetype big-file browse/parse reactives are created for spec and diann; local mode exposes per-filetype datapath reactives; web-server mode forces them to NULL.
  • UI Composition (R/module-loadpage-ui.R): New modular UI constructors for the DIANN header, mode selectors (create_spectronaut_mode_selector, create_diann_mode_selector), standard inputs, and large-file selection/option components; create_diann_uploads() converted to dynamic uiOutput() placeholders.
  • Server UI Rendering (R/module-loadpage-server.R): Conditional DIANN UI outputs (diann_header_ui, diann_file_selection_ui, diann_options_ui) branch on is_web_server and big-file mode to show standard vs large components.
  • Proceed Logic (R/module-loadpage-server.R): Updated proceed-enablement to compute spec_big_file_ok and diann_big_file_ok using per-filetype derived big-file path reactives.
  • Data Processing Layer (R/utils.R): getData() signature extended with optional big_file_path_spec and big_file_path_diann; the Spectronaut big-file branch validates qvalue_cutoff/max_feature_count, verifies the file path, calls MSstatsBig::bigSpectronauttoMSstatsFormat(), and collects with memory-error handling; the DIANN big-file branch follows a parallel pattern with resolveDiannQuantCol() for quantification-column resolution and MSstatsBig::bigDIANNtoMSstatsFormat().
  • Helper Functions (R/utils.R): New resolveDiannQuantCol() computes the DIANN quantification column name from input.
  • Code Generation (R/utils.R): getDataCode() updated to emit the MSstatsBig version, include MSstatsBig converter calls for big-file paths, and use dplyr::collect() for result materialization.
  • Event Handler (R/module-loadpage-server.R): The get_data event now calls getData(input, big_file_path_spec = local_big_file_path_spec(), big_file_path_diann = local_big_file_path_diann()) to pass resolved paths.
  • Tests (tests/testthat/test-utils.R): Comprehensive Big Spectronaut and Big DIANN test suites covering valid input, validation errors, missing files, and memory-error paths; assertions verify correct conversion, collection, notifications, and spinner removal.
  • Memory Simulation (tests/testthat/test_parallel_simulation.R): New test script that mocks MSstats internals and simulates parallel workloads to assess memory behavior (leaky vs fixed variants).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Vitek-Lab/MSstatsShiny#114: Both PRs modify loadpage UI and DIANN utils—adding DIANN-specific UI helpers and quantification handling.
  • Vitek-Lab/MSstatsShiny#193: Both PRs extend DIANN handling in loadpage and utils, one with version detection and the other with large-file and getData plumbing.
  • Vitek-Lab/MSstatsShiny#144: Both PRs add MSstatsBig-backed Spectronaut "big" processing (shinyFiles wiring and conversion), with this PR extending the pattern to DIANN and per-filetype reactives.

Suggested labels

enhancement, feature

Suggested reviewers

  • devonjkohler

Poem

🐰 Big files hop through gates with per-filetype care,
DIANN and Spectronaut each get their own lair,
Shiny reactives branch where web-servers dare,
MSstatsBig collects with memory to spare,
A careful design, tested everywhere! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive. The title 'Add big file diann' is vague and generic; it uses non-descriptive phrasing that does not clearly convey the scope or nature of the changes beyond mentioning DIANN. Resolution: consider a more specific title that describes the primary change, such as 'Support big-file DIANN workflow with MSstatsBig integration' or 'Add big-file DIANN processing with conditional large-file UI'.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.





coderabbitai (bot) left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
R/module-loadpage-ui.R (1)

226-231: ⚠️ Potential issue | 🔴 Critical

Add && !input['loadpage-big_file_diann'] to the annotation file condition for consistency.

DIANN big file mode (using MSstatsBig::bigDIANNtoMSstatsFormat) does not accept annotation parameters, unlike regular DIANN mode which supports optional annotations. The annotation file input should be excluded for DIANN big file mode, matching the pattern already established for Spectronaut (!input['loadpage-big_file_spec']).

Suggested fix

Update line 226 condition from:

condition = "(input['loadpage-filetype'] == 'sky' || input['loadpage-filetype'] == 'prog' || input['loadpage-filetype'] == 'PD' || (input['loadpage-filetype'] == 'spec' && !input['loadpage-big_file_spec']) || input['loadpage-filetype'] == 'open'|| input['loadpage-filetype'] =='spmin' || input['loadpage-filetype'] == 'phil' || input['loadpage-filetype'] == 'diann') && input['loadpage-BIO'] != 'PTM'",

To:

condition = "(input['loadpage-filetype'] == 'sky' || input['loadpage-filetype'] == 'prog' || input['loadpage-filetype'] == 'PD' || (input['loadpage-filetype'] == 'spec' && !input['loadpage-big_file_spec']) || input['loadpage-filetype'] == 'open'|| input['loadpage-filetype'] =='spmin' || input['loadpage-filetype'] == 'phil' || (input['loadpage-filetype'] == 'diann' && !input['loadpage-big_file_diann'])) && input['loadpage-BIO'] != 'PTM'",
🤖 Fix all issues with AI agents
In `@R/module-loadpage-server.R`:
- Around line 75-98: The DIANN big-file UI is missing the separator buttons, so
update output$diann_file_selection_ui to append
create_separator_buttons(session$ns, "sep_dianndata") when building the UI for
big-file mode; locate the branch that adds
create_diann_large_file_ui(session$ns) (function create_diann_large_file_ui) and
add the call to create_separator_buttons(session$ns, "sep_dianndata") alongside
it so both standard (create_diann_standard_ui) and large-file modes include
separators.

In `@R/utils.R`:
- Around line 533-545: The call to MSstatsBig::bigSpectronauttoMSstatsFormat
uses a hardcoded output_file_name ("output_file.csv") which can collide for
concurrent runs; change it to generate a unique temporary filename (e.g., via
tempfile with a .csv extension) and pass that variable as output_file_name in
the converted_data assignment (keep references to local_big_file_path and
converted_data unchanged), and optionally ensure the temp file is cleaned up
after use.
- Around line 617-627: The call to MSstatsBig::bigDIANNtoMSstatsFormat uses a
hardcoded output_file_name ("output_file.csv"); replace that literal with a
unique temporary filename (e.g. generate via tempfile() with an appropriate file
extension) and pass that variable as output_file_name in the converted_data
assignment so the function uses a temp file for local_big_file_path conversion;
ensure the tempfile is created before the call and removed/managed after use to
avoid collisions.

In `@tests/testthat/test-loadpage-server.R`:
- Around line 1-21: Implement the missing helper .get_data_source_type to accept
(file_type, is_big_flag) and return "big_spectronaut" only when file_type ==
"spec" and is_big_flag is TRUE; in every other case (including NULL or any
non-"spec" file_type) return "standard"; add the function to the source (e.g.,
alongside other loadpage server helpers) and ensure it handles NULL/NA boolean
inputs safely and is exported/internal as needed so tests can call
.get_data_source_type.
🧹 Nitpick comments (2)
R/module-loadpage-server.R (1)

100-143: Consider extracting common default-value initialization pattern.

Both diann_options_ui (lines 104-110) and spectronaut_options_ui (lines 125-133) use the same pattern for initializing default values from potentially-null inputs. This repetition could be reduced with a helper function.

♻️ Optional: Extract helper for default value initialization
# Helper function to get input value or default
get_input_or_default <- function(input_value, default) {
  if (is.null(input_value)) default else input_value
}

# Usage in diann_options_ui:
max_feature_def <- get_input_or_default(input$max_feature_count, 100)
unique_peps_def <- get_input_or_default(input$filter_unique_peptides, FALSE)
R/module-loadpage-ui.R (1)

377-396: Consider consolidating duplicate bottom UI functions.

create_spectronaut_large_bottom_ui and create_diann_large_bottom_ui are nearly identical, differing only in the default value for max_feature_def (20 vs 100). These could be consolidated into a single function.

♻️ Optional: Consolidate into single function
-#' Create Spectronaut large file options (Bottom part)
-#' `@noRd`
-create_spectronaut_large_bottom_ui <- function(ns, max_feature_def = 20, unique_peps_def = FALSE, agg_psms_def = FALSE, few_obs_def = FALSE) {
-  tagList(
-    numericInput(ns("max_feature_count"), "Max feature count", value = max_feature_def, min = 1),
-    checkboxInput(ns("filter_unique_peptides"), "Use unique peptides", value = unique_peps_def),
-    checkboxInput(ns("aggregate_psms"), "Aggregate PSMs to peptides", value = agg_psms_def),
-    checkboxInput(ns("filter_few_obs"), "Filter features with few observations", value = few_obs_def)
-  )
-}
-
-#' Create DIANN large file options (Bottom part)
-#' `@noRd`
-create_diann_large_bottom_ui <- function(ns, max_feature_def = 100, unique_peps_def = FALSE, agg_psms_def = FALSE, few_obs_def = FALSE) {
-  tagList(
-    numericInput(ns("max_feature_count"), "Max feature count", value = max_feature_def, min = 1),
-    checkboxInput(ns("filter_unique_peptides"), "Use unique peptides", value = unique_peps_def),
-    checkboxInput(ns("aggregate_psms"), "Aggregate PSMs to peptides", value = agg_psms_def),
-    checkboxInput(ns("filter_few_obs"), "Filter features with few observations", value = few_obs_def)
-  )
-}
+#' Create large file bottom options (shared by Spectronaut and DIANN)
+#' `@noRd`
+create_large_file_bottom_ui <- function(ns, max_feature_def, unique_peps_def = FALSE, agg_psms_def = FALSE, few_obs_def = FALSE) {
+  tagList(
+    numericInput(ns("max_feature_count"), "Max feature count", value = max_feature_def, min = 1),
+    checkboxInput(ns("filter_unique_peptides"), "Use unique peptides", value = unique_peps_def),
+    checkboxInput(ns("aggregate_psms"), "Aggregate PSMs to peptides", value = agg_psms_def),
+    checkboxInput(ns("filter_few_obs"), "Filter features with few observations", value = few_obs_def)
+  )
+}
+
+# Then use with tool-specific defaults:
+# create_large_file_bottom_ui(ns, max_feature_def = 20, ...)  # Spectronaut
+# create_large_file_bottom_ui(ns, max_feature_def = 100, ...) # DIANN

Comment on lines +617 to +627
converted_data <- MSstatsBig::bigDIANNtoMSstatsFormat(
  input_file = local_big_file_path,
  output_file_name = "output_file.csv",
  backend = "arrow",
  MBR = isTRUE(input$diann_MBR),
  quantificationColumn = input$diann_quantificationColumn,
  max_feature_count = input$max_feature_count,
  filter_unique_peptides = input$filter_unique_peptides,
  aggregate_psms = input$aggregate_psms,
  filter_few_obs = input$filter_few_obs
)

⚠️ Potential issue | 🟡 Minor

Same hardcoded output file name issue as Spectronaut path.

The output_file_name = "output_file.csv" is also hardcoded here. Apply the same fix using tempfile() for consistency and to avoid file collisions.

🛠️ Suggested fix
         converted_data <- MSstatsBig::bigDIANNtoMSstatsFormat(
           input_file = local_big_file_path,
-          output_file_name = "output_file.csv",
+          output_file_name = tempfile(pattern = "diann_", fileext = ".csv"),
           backend = "arrow",

Suggested change (only output_file_name differs from the quoted code):

converted_data <- MSstatsBig::bigDIANNtoMSstatsFormat(
  input_file = local_big_file_path,
  output_file_name = tempfile(pattern = "diann_", fileext = ".csv"),
  backend = "arrow",
  MBR = isTRUE(input$diann_MBR),
  quantificationColumn = input$diann_quantificationColumn,
  max_feature_count = input$max_feature_count,
  filter_unique_peptides = input$filter_unique_peptides,
  aggregate_psms = input$aggregate_psms,
  filter_few_obs = input$filter_few_obs
)

coderabbitai (bot) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@R/module-loadpage-ui.R`:
- Around line 274-277: Add three renderUI handlers in R/module-loadpage-server.R
mirroring the Spectronaut pattern: implement output$diann_header_ui,
output$diann_file_selection_ui, and output$diann_options_ui. Each renderUI
should call the corresponding DIANN UI builder function (the same functions used
to construct the DIANN widgets in the UI module) and include the same
conditional logic used for Spectronaut (e.g., only show file selection/options
when the selected file type is DIANN); wire these outputs to the names used in
create_diann_uploads() (diann_header_ui, diann_file_selection_ui,
diann_options_ui) so the UI renders correctly.

coderabbitai (bot) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
R/module-loadpage-ui.R (2)

589-619: ⚠️ Potential issue | 🟠 Major

Standard label-free options are not hidden for DIANN big-file mode.

Lines 592 and 599 exclude Spectronaut big-file mode (input['loadpage-filetype'] != 'spec' || !input['loadpage-big_file_spec']), but do not exclude DIANN big-file mode. When a user selects DIANN with large-file mode enabled, they'll see both the standard label-free options (e.g., "Use unique peptides" at line 594) and the big-file-specific options rendered by create_diann_large_bottom_ui (which also has "Use unique peptides"). This creates a confusing duplicate UI.

Proposed fix: add DIANN big-file exclusion
-      condition = "input['loadpage-filetype'] && input['loadpage-DDA_DIA'] == 'LType' && input['loadpage-filetype'] != 'sample' && input['loadpage-filetype'] != 'MRF' && (input['loadpage-filetype'] != 'spec' || !input['loadpage-big_file_spec'])",
+      condition = "input['loadpage-filetype'] && input['loadpage-DDA_DIA'] == 'LType' && input['loadpage-filetype'] != 'sample' && input['loadpage-filetype'] != 'MRF' && (input['loadpage-filetype'] != 'spec' || !input['loadpage-big_file_spec']) && (input['loadpage-filetype'] != 'diann' || !input['loadpage-big_file_diann'])",

Apply the same pattern to the condition on line 599.


224-232: ⚠️ Potential issue | 🟠 Major

Exclude annotation upload for DIANN big-file mode.

The annotation file is only required for regular DIANN (passed to DIANNtoMSstatsFormat on line 674), but is not used in the big-file workflow (bigDIANNtoMSstatsFormat on lines 617–627 has no annotation parameter). Update the condition on line 226 to exclude annotation upload for big-file DIANN, consistent with the Spectronaut pattern: change input['loadpage-filetype'] == 'diann' to (input['loadpage-filetype'] == 'diann' && !input['loadpage-big_file_diann']).

🤖 Fix all issues with AI agents
In `@R/module-loadpage-server.R`:
- Around line 233-248: The separator requirement is inconsistent between
Spectronaut and DIANN big-file modes; update the Spectronaut checks to match
DIANN by moving the sep check into spec_regular_file_ok (i.e. set
spec_regular_file_ok <- !isTRUE(input$big_file_spec) && !is.null(input$specdata)
&& !is.null(input$sep_specdata) and keep spec_big_file_ok <-
isTRUE(input$big_file_spec) && length(local_big_file_path()) > 0) and then
change the if to if(spec_regular_file_ok || spec_big_file_ok) {
enable("proceed1") } so big-file Spectronaut no longer requires sep selection
(mirrors diann behavior).
🧹 Nitpick comments (1)
R/module-loadpage-server.R (1)

50-98: Separator buttons are rendered even in large-file mode, where they may be irrelevant.

When big_file_spec or big_file_diann is TRUE, the user browses for a local file (likely Parquet). The CSV/TSV column-separator radio buttons are still shown (lines 72, 97) but serve no purpose in that path. Consider conditionally rendering separators only when the standard (non-big-file) UI is active to avoid user confusion.

Example for Spectronaut (similar change for DIANN)
-      tagList(ui_elements, create_separator_buttons(session$ns, "sep_specdata"))
+      if (is_web_server || !isTRUE(input$big_file_spec)) {
+        tagList(ui_elements, create_separator_buttons(session$ns, "sep_specdata"))
+      } else {
+        ui_elements
+      }

coderabbitai (bot) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@R/module-loadpage-server.R`:
- Around line 100-143: The shared input IDs (max_feature_count,
filter_unique_peptides, aggregate_psms, filter_few_obs) used by
create_diann_large_bottom_ui() and create_spectronaut_large_bottom_ui() cause
state leakage between DIANN and Spectronaut; fix by giving each UI its own
namespaced IDs (e.g., diann_max_feature_count, diann_filter_unique_peptides,
diann_aggregate_psms, diann_filter_few_obs for DIANN and spec_max_feature_count,
spec_filter_unique_peptides, spec_aggregate_psms, spec_filter_few_obs for
Spectronaut) inside those functions and update all references in
module-loadpage-server.R (the input checks and local default variables used in
output$diann_options_ui and output$spectronaut_options_ui) to read the new
per-tool input names so defaults are applied correctly when switching file
types.


# Call the big file conversion function from MSstatsBig
converted_data <- MSstatsBig::bigDIANNtoMSstatsFormat(
  input_file = local_big_file_path,
Contributor commented:
Two things:

  1. The annotation file will need to be incorporated into this function
  2. Import bigDIANNtoMSstatsFormat and bigSpectronauttoMSstatsFormat as dependencies in this file; then you won't need the MSstatsBig:: prefix when calling the functions.

Contributor (author) replied:
Added the annotation file in the DIANN Converter.

Since the web version does not support big files, I was thinking we don't want to add MSstatsBig to the dependencies (Imports), but we should add it to the Suggests section.

Updating MSstatsShiny.R with dependencies
coderabbitai (bot) left a comment
Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@R/utils.R`:
- Around line 938-951: The generated big DIANN code omits the annotation
argument for MSstatsBig::bigDIANNtoMSstatsFormat; update the code building block
that constructs the bigDIANN call (where codes is appended with "data =
MSstatsBig::bigDIANNtoMSstatsFormat(") to include annotation = getAnnot(input)
as a parameter (matching how the runtime uses annotation = getAnnot(input)),
ensuring proper comma placement and quoting/concatenation consistent with the
surrounding parameters (e.g., alongside input_file, backend, MBR,
quantificationColumn, etc.) so the emitted call contains annotation =
getAnnot(input).
- Around line 595-645: The generated code in getDataCode omits the annotation
argument when emitting the bigDIANNtoMSstatsFormat() call; update getDataCode to
include a line that appends the annotation parameter (e.g., codes = paste(codes,
"  annotation = \"insert your annotation file path\",\\n", sep = "")) so the
generated call matches the runtime call (bigDIANNtoMSstatsFormat with annotation
= getAnnot(input)); ensure the string is properly escaped and includes the
trailing comma/newline and references input$big_file_browse/shared UI as before.

In `@tests/testthat/test_parallel_simulation.R`:
- Around line 1-159: This file is not a real test and triggers heavy
work/side-effects during R CMD check: top-level mclapply sleeps and allocates
large objects, writes parallel_log.txt, and has no expectations; fix by either
moving the script out of tests/testthat (e.g., inst/scripts/) or converting it
into a proper test_that block that (1) wraps execution in testthat::skip_on_ci()
and testthat::skip_on_os("windows") or similar guards, (2) uses
withr::local_tempdir()/tempfile() for any log files instead of writing to
working dir, (3) reduces or parameterizes the heavy allocation/sleep (or mock
.fitSurvival to avoid large numeric vectors) and (4) includes explicit expect_*
assertions (or expect_silent/expect_true) to produce pass/fail signals; update
functions MSstatsSummarizeSingleTMP_Leaky_Sim and
MSstatsSummarizeSingleTMP_Fixed_Sim usage in the test accordingly.
- Line 146: The test currently creates and writes to a hard-coded file
"parallel_log.txt" (via file.create and subsequent cat(..., file =
"parallel_log.txt", append = TRUE)), causing unwanted I/O side effects; update
the test to generate a temp path (e.g. tempfile() or withr::local_tempfile())
and use that variable for all file.create and cat calls (and any reads) so no
artifact is left in the working directory—ensure every reference to
"parallel_log.txt" in the test (file.create, cat, etc.) is replaced with the
temp file variable.

Comment thread R/utils.R
Comment on lines +595 to +645
if (isTRUE(input$big_file_diann)) {
  # Logic for big DIANN files
  # Parse the file path from shinyFiles input
  volumes <- shinyFiles::getVolumes()()
  path_info <- shinyFiles::parseFilePaths(volumes, input$big_file_browse)
  local_big_file_path <- if (nrow(path_info) > 0) path_info$datapath else NULL

  if (!is.numeric(input$max_feature_count) || is.na(input$max_feature_count) || input$max_feature_count <= 0) {
    showNotification("Error: max_feature_count must be a positive number.", type = "error")
    shinybusy::remove_modal_spinner()
    return(NULL)
  }

  if (is.null(local_big_file_path) || !file.exists(local_big_file_path)) {
    showNotification("Error: The selected file does not exist or is not readable.", type = "error")
    shinybusy::remove_modal_spinner()
    return(NULL)
  }

  shinybusy::update_modal_spinner(text = "Processing large DIANN file...")

  # Call the big file conversion function from MSstatsConvert
  converted_data <- bigDIANNtoMSstatsFormat(
    input_file = local_big_file_path,
    annotation = getAnnot(input),
    output_file_name = "output_file.csv",
    backend = "arrow",
    MBR = isTRUE(input$diann_MBR),
    quantificationColumn = input$diann_quantificationColumn,
    max_feature_count = input$max_feature_count,
    filter_unique_peptides = input$filter_unique_peptides,
    aggregate_psms = input$aggregate_psms,
    filter_few_obs = input$filter_few_obs
  )

  # Attempt to load the data into memory.
  mydata <- tryCatch({
    dplyr::collect(converted_data)
  }, error = function(e) {
    showNotification(
      paste("Memory Error: The dataset is too large to process in-memory.", e$message),
      type = "error",
      duration = NULL
    )
    return(NULL)
  })

  if (is.null(mydata)) {
    shinybusy::remove_modal_spinner()
    return(NULL)
  }
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n 'big_file_browse' --type=r -C2

Repository: Vitek-Lab/MSstatsShiny

Length of output: 2953


🏁 Script executed:

rg -n 'bigDIANNtoMSstatsFormat' --type=r -C5

Repository: Vitek-Lab/MSstatsShiny

Length of output: 4398


🏁 Script executed:

sed -n '938,955p' R/utils.R

Repository: Vitek-Lab/MSstatsShiny

Length of output: 1306


🏁 Script executed:

# Check if both Spectronaut and DIANN big file UIs are conditionally rendered or always active
rg -n 'create_spectronaut_large_file_ui|create_diann_large_file_ui' -C3

Repository: Vitek-Lab/MSstatsShiny

Length of output: 2795


Generated code for DIANN big-file conversion is missing the annotation parameter.

The runtime code at line 619 correctly includes annotation = getAnnot(input) in the bigDIANNtoMSstatsFormat() call. However, the generated code (lines 938–955 in the getDataCode function) omits this parameter. The generated code should include:

codes = paste(codes, "  annotation = \"insert your annotation file path\",\n", sep = "")

The shared input$big_file_browse input ID is appropriate—both Spectronaut and DIANN large-file UIs are conditionally rendered (only one active at a time based on input$big_file_spec or input$big_file_diann), so the single shinyFileChoose binding works correctly.

🤖 Prompt for AI Agents
In `@R/utils.R` around lines 595 - 645, The generated code in getDataCode omits
the annotation argument when emitting the bigDIANNtoMSstatsFormat() call; update
getDataCode to include a line that appends the annotation parameter (e.g., codes
= paste(codes, "  annotation = \"insert your annotation file path\",\\n", sep =
"")) so the generated call matches the runtime call (bigDIANNtoMSstatsFormat
with annotation = getAnnot(input)); ensure the string is properly escaped and
includes the trailing comma/newline and references input$big_file_browse/shared
UI as before.

Comment thread tests/testthat/test_parallel_simulation.R
Comment on lines +1 to +159
library(data.table)
library(survival)
library(parallel)

# -------------------------------------------------------------------------
# 1. Mock Internal MSstats Functions
# -------------------------------------------------------------------------

.fitSurvival <- function(data, iterations) {
  tryCatch({
    # Create a real survreg object to simulate memory usage
    # Use a subset of data to ensure stability/speed, as we only care about object size
    fit <- survreg(Surv(newABUNDANCE, !cen, type = "left") ~ 1,
                   data = head(data, 500), dist = "gaussian")
    # Artificially bloat the object to simulate a complex model (approx 100MB)
    # Real MSstats models can be very large due to model frames and environments
    fit$bloat <- numeric(12.5 * 1024 * 1024)
    return(fit)
  }, error = function(e) return(NULL))
}

.isSummarizable <- function(data, remove50missing) return(data)

.runTukey <- function(data, is_labeled, censored_symbol, remove50missing) {
  return(data.table(Protein = "TestProtein", LogIntensities = mean(data$newABUNDANCE, na.rm = TRUE)))
}

# -------------------------------------------------------------------------
# 2. Define Functions with "Work Simulation" (Sleep)
# -------------------------------------------------------------------------

# LEAKY VERSION
MSstatsSummarizeSingleTMP_Leaky_Sim <- function(single_protein, impute, censored_symbol, remove50missing, aft_iterations = 90) {
  # ... Setup ...
  newABUNDANCE = n_obs = n_obs_run = RUN = FEATURE = LABEL = NULL
  predicted = censored = NULL
  cols = intersect(colnames(single_protein), c("newABUNDANCE", "cen", "RUN", "FEATURE", "ref"))
  single_protein = single_protein[(n_obs > 1 & !is.na(n_obs)) & (n_obs_run > 0 & !is.na(n_obs_run))]
  if (nrow(single_protein) == 0) return(list(NULL, NULL))
  single_protein[, `:=`(RUN, factor(RUN))]
  single_protein[, `:=`(FEATURE, factor(FEATURE))]

  if (impute & any(single_protein[["censored"]])) {
    converged = TRUE
    survival_fit = withCallingHandlers({
      .fitSurvival(single_protein[LABEL == "L", cols, with = FALSE], aft_iterations)
    }, warning = function(w) { if (grepl("converge", conditionMessage(w), ignore.case = TRUE)) converged <<- FALSE })

    if (converged && !is.null(survival_fit)) {
      single_protein[, `:=`(predicted, predict(survival_fit, newdata = .SD))]
    } else {
      single_protein[, `:=`(predicted, NA_real_)]
    }

    # --- LEAK SIMULATION ---
    # The object 'survival_fit' is still in memory here.
    # We simulate "doing other work" (predictions, formatting) by sleeping.
    Sys.sleep(1)

    # Report Memory Usage of this Worker
    mem_used <- sum(gc()[, 2])
    msg <- sprintf("[Worker %d] LEAKY State - Holding Memory: %.2f MB\n", Sys.getpid(), mem_used)
    cat(msg)
    cat(msg, file = "parallel_log.txt", append = TRUE)

    single_protein[, `:=`(predicted, ifelse(censored & (LABEL == "L"), predicted, NA))]
    single_protein[, `:=`(newABUNDANCE, ifelse(censored & LABEL == "L", predicted, newABUNDANCE))]
    survival = single_protein[, c(cols, "predicted"), with = FALSE]
  } else {
    survival = single_protein[, cols, with = FALSE]
    survival[, `:=`(predicted, NA)]
  }
  # ... Finalize ...
  single_protein = .isSummarizable(single_protein, remove50missing)
  if (is.null(single_protein)) return(list(NULL, NULL))
  result = .runTukey(single_protein, TRUE, censored_symbol, remove50missing)
  list(result, survival)
}

# FIXED VERSION
MSstatsSummarizeSingleTMP_Fixed_Sim <- function(single_protein, impute, censored_symbol, remove50missing, aft_iterations = 90) {
  # ... Setup ...
  newABUNDANCE = n_obs = n_obs_run = RUN = FEATURE = LABEL = NULL
  predicted = censored = NULL
  cols = intersect(colnames(single_protein), c("newABUNDANCE", "cen", "RUN", "FEATURE", "ref"))
  single_protein = single_protein[(n_obs > 1 & !is.na(n_obs)) & (n_obs_run > 0 & !is.na(n_obs_run))]
  if (nrow(single_protein) == 0) return(list(NULL, NULL))
  single_protein[, `:=`(RUN, factor(RUN))]
  single_protein[, `:=`(FEATURE, factor(FEATURE))]

  if (impute & any(single_protein[["censored"]])) {
    converged = TRUE
    survival_fit = withCallingHandlers({
      .fitSurvival(single_protein[LABEL == "L", cols, with = FALSE], aft_iterations)
    }, warning = function(w) { if (grepl("converge", conditionMessage(w), ignore.case = TRUE)) converged <<- FALSE })

    if (converged && !is.null(survival_fit)) {
      single_protein[, `:=`(predicted, predict(survival_fit, newdata = .SD))]
    } else {
      single_protein[, `:=`(predicted, NA_real_)]
    }

    # --- FIX APPLIED ---
    rm(survival_fit)

    # --- FIXED SIMULATION ---
    # We simulate "doing other work" by sleeping.
    Sys.sleep(1)

    # Report Memory Usage of this Worker
    mem_used <- sum(gc()[, 2])
    msg <- sprintf("[Worker %d] FIXED State - Holding Memory: %.2f MB\n", Sys.getpid(), mem_used)
    cat(msg)
    cat(msg, file = "parallel_log.txt", append = TRUE)

    single_protein[, `:=`(predicted, ifelse(censored & (LABEL == "L"), predicted, NA))]
    single_protein[, `:=`(newABUNDANCE, ifelse(censored & LABEL == "L", predicted, newABUNDANCE))]
    survival = single_protein[, c(cols, "predicted"), with = FALSE]
  } else {
    survival = single_protein[, cols, with = FALSE]
    survival[, `:=`(predicted, NA)]
  }
  # ... Finalize ...
  single_protein = .isSummarizable(single_protein, remove50missing)
  if (is.null(single_protein)) return(list(NULL, NULL))
  result = .runTukey(single_protein, TRUE, censored_symbol, remove50missing)
  list(result, survival)
}

# -------------------------------------------------------------------------
# 3. Run Simulation
# -------------------------------------------------------------------------

set.seed(123)
n_rows <- 20000
dt <- data.table(
  newABUNDANCE = rnorm(n_rows, 20, 5),
  censored = sample(c(TRUE, FALSE), n_rows, replace = TRUE, prob = c(0.3, 0.7)),
  LABEL = "L", RUN = sample(1:20, n_rows, replace = TRUE), FEATURE = sample(1:500, n_rows, replace = TRUE),
  n_obs = 5, n_obs_run = 5, cen = FALSE, ref = "ref"
)
dt$cen <- dt$censored
dt$newABUNDANCE[dt$censored] <- dt$newABUNDANCE[dt$censored] - 5

# Clear log file
file.create("parallel_log.txt")

cat("\n--- Simulating LEAKY Parallel Execution ---\n")
# We run 2 cores. Both will hit the 'sleep' at the same time.
# Both will report HIGH memory because they haven't cleaned up yet.
invisible(mclapply(1:2, function(i) MSstatsSummarizeSingleTMP_Leaky_Sim(copy(dt), TRUE, "NA", FALSE), mc.cores = 2))

cat("\n--- Simulating FIXED Parallel Execution ---\n")
# We run 2 cores. Both will hit the 'sleep' at the same time.
# Both will report LOW memory because they cleaned up BEFORE sleeping.
invisible(mclapply(1:2, function(i) MSstatsSummarizeSingleTMP_Fixed_Sim(copy(dt), TRUE, "NA", FALSE), mc.cores = 2))

cat("\n--- Log File Content ---\n")
cat(readLines("parallel_log.txt"), sep = "\n")
@coderabbitai coderabbitai Bot Feb 13, 2026

⚠️ Potential issue | 🟠 Major

This script has no test_that() blocks or assertions — it isn't an actual test.

Since it lives under tests/testthat/, testthat will source() it during R CMD check, causing it to:

  1. Execute all top-level code unconditionally — including two mclapply calls with Sys.sleep(1) each, adding ~2+ seconds of wall-clock time per run.
  2. Allocate ~200 MB of bloated objects (numeric(12.5 * 1024 * 1024) ≈ 100 MB each, two workers) which could cause CI OOM failures.
  3. Write parallel_log.txt into the working directory — a side effect that violates testthat conventions (use withr::local_tempdir() or tempfile()).
  4. Silently degrade on Windows: mclapply with mc.cores > 1 falls back to serial on Windows, so the parallel memory behavior isn't actually exercised.
  5. Produce no pass/fail signal — without expect_*() calls, this provides no regression protection.

Additionally, this file appears unrelated to the stated PR objective (adding big-file DIANN support). If this is a diagnostic tool for a separate memory-leak investigation, consider moving it out of the test suite (e.g., into inst/scripts/ or a standalone vignette) or converting it into a proper test_that() with skip guards and assertions.

🤖 Prompt for AI Agents
In `@tests/testthat/test_parallel_simulation.R` around lines 1 - 159, This file is
not a real test and triggers heavy work/side-effects during R CMD check:
top-level mclapply sleeps and allocates large objects, writes parallel_log.txt,
and has no expectations; fix by either moving the script out of tests/testthat
(e.g., inst/scripts/) or converting it into a proper test_that block that (1)
wraps execution in testthat::skip_on_ci() and testthat::skip_on_os("windows") or
similar guards, (2) uses withr::local_tempdir()/tempfile() for any log files
instead of writing to working dir, (3) reduces or parameterizes the heavy
allocation/sleep (or mock .fitSurvival to avoid large numeric vectors) and (4)
includes explicit expect_* assertions (or expect_silent/expect_true) to produce
pass/fail signals; update functions MSstatsSummarizeSingleTMP_Leaky_Sim and
MSstatsSummarizeSingleTMP_Fixed_Sim usage in the test accordingly.

Contributor Author

This file should not be in this PR

@Rudhik1904, understood! You can remove it from this PR by running the following command on your branch:

git rm tests/testthat/test_parallel_simulation.R
git commit -m "Remove test_parallel_simulation.R (not part of this PR)"
git push

If this script was intended for a separate memory-leak investigation, consider placing it in inst/scripts/ or a dedicated branch so it doesn't interfere with the DIANN big-file changes here.

Would you like me to open a follow-up GitHub issue to track the memory-leak investigation (and properly convert this into a guarded diagnostic tool) so it doesn't get lost?

dt$newABUNDANCE[dt$censored] <- dt$newABUNDANCE[dt$censored] - 5

# Clear log file
file.create("parallel_log.txt")
@coderabbitai coderabbitai Bot Feb 13, 2026

⚠️ Potential issue | 🟡 Minor

File I/O side effect: parallel_log.txt written to working directory.

file.create("parallel_log.txt") and the subsequent cat(..., file = "parallel_log.txt", append = TRUE) calls (lines 64, 114) write to the current working directory. During R CMD check, this is typically a temp directory managed by the test harness, but it still leaves artifacts. Use tempfile() and pass the path explicitly, or use withr::local_tempfile() if this is converted to a proper test.

🤖 Prompt for AI Agents
In `@tests/testthat/test_parallel_simulation.R` at line 146, The test currently
creates and writes to a hard-coded file "parallel_log.txt" (via file.create and
subsequent cat(..., file = "parallel_log.txt", append = TRUE)), causing unwanted
I/O side effects; update the test to generate a temp path (e.g. tempfile() or
withr::local_tempfile()) and use that variable for all file.create and cat calls
(and any reads) so no artifact is left in the working directory—ensure every
reference to "parallel_log.txt" in the test (file.create, cat, etc.) is replaced
with the temp file variable.

Contributor Author

Same


@Rudhik1904, here's the fix — define a log_file variable using tempfile() at the top of the simulation section and replace every "parallel_log.txt" reference with it:

# -------------------------------------------------------------------------
# 3. Run Simulation
# -------------------------------------------------------------------------

# Use a temp file to avoid leaving artifacts in the working directory
log_file <- tempfile(pattern = "parallel_log", fileext = ".txt")
file.create(log_file)

Then in both MSstatsSummarizeSingleTMP_Leaky_Sim and MSstatsSummarizeSingleTMP_Fixed_Sim, pass log_file as a parameter instead of hard-coding the path:

MSstatsSummarizeSingleTMP_Leaky_Sim <- function(
    single_protein, impute, censored_symbol, remove50missing,
    aft_iterations = 90, log_file = NULL   # <-- add parameter
) {
  ...
  cat(msg, file = log_file, append = TRUE)   # replace "parallel_log.txt"
  ...
}

MSstatsSummarizeSingleTMP_Fixed_Sim <- function(
    single_protein, impute, censored_symbol, remove50missing,
    aft_iterations = 90, log_file = NULL   # <-- add parameter
) {
  ...
  cat(msg, file = log_file, append = TRUE)   # replace "parallel_log.txt"
  ...
}

And update the mclapply calls to forward log_file:

invisible(mclapply(1:2, function(i)
  MSstatsSummarizeSingleTMP_Leaky_Sim(copy(dt), TRUE, "NA", FALSE, log_file = log_file),
  mc.cores = 2))

invisible(mclapply(1:2, function(i)
  MSstatsSummarizeSingleTMP_Fixed_Sim(copy(dt), TRUE, "NA", FALSE, log_file = log_file),  # <-- pass temp path
  mc.cores = 2))

Finally, update the log-read at the end:

cat(readLines(log_file), sep = "\n")

This ensures no parallel_log.txt artifact is left behind in the working directory after the test runs.

@Rudhik1904
Contributor Author

@coderabbitai review

@coderabbitai
coderabbitai Bot commented May 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
R/module-loadpage-ui.R (1)

309-393: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Hide the standard LF controls in DIANN large-file mode.

The new DIANN large-file UI adds its own MBR/filter settings, but the generic label-free panels still stay visible because only big_file_spec is excluded elsewhere. In DIANN large-file mode that leaves duplicate/conflicting controls on screen, and some of them are ignored by getData().

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@R/module-loadpage-ui.R` around lines 309 - 393, When DIANN large-file mode
(input big_file_diann) is selected the standard label-free controls must be
hidden to avoid duplicate/conflicting UI; wrap the standard DIANN LF panels (the
UI elements produced by create_diann_standard_ui and the generic label-free
panels that show filter/aggregation inputs) in a conditionalPanel that checks
!input[ns("big_file_diann")] (or use paste0("!input['", ns("big_file_diann"),
"']") to respect namespacing), so that when create_diann_large_filter_options /
create_diann_large_bottom_ui are shown the standard LF controls are hidden;
alternatively implement an observer that toggles visibility of those standard
panel outputs based on input$big_file_diann.
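A minimal sketch of the conditionalPanel approach described above. This assumes the module's ns() namespace function and a create_diann_standard_ui() helper as named in the finding; neither signature is verified against the current code.

```r
# Sketch only: hides the standard DIANN label-free controls whenever the
# (assumed) big_file_diann checkbox is ticked. The JS condition must use
# the namespaced input id, hence ns("big_file_diann").
diann_standard_panel <- function(ns) {
  shiny::conditionalPanel(
    condition = sprintf("!input['%s']", ns("big_file_diann")),
    create_diann_standard_ui(ns)  # hypothetical helper from the review
  )
}
```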
R/utils.R (1)

551-564: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap the converter call in tryCatch to prevent spinner from being left dangling on converter failure.

The tryCatch() currently only protects dplyr::collect(). If MSstatsBig::bigSpectronauttoMSstatsFormat() or MSstatsBig::bigDIANNtoMSstatsFormat() throws an error, execution exits before the if (is.null(mydata)) block that removes the spinner, leaving the UI stuck in a busy state. Move both the converter call and collect() inside the tryCatch block.

Also applies to: 640-651

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@R/utils.R` around lines 551 - 564, Wrap the calls to
MSstatsBig::bigSpectronauttoMSstatsFormat and
MSstatsBig::bigDIANNtoMSstatsFormat together with the subsequent
dplyr::collect() inside a single tryCatch so any error from the converter or
collect is caught; on error return NULL (or a sentinel) and ensure the existing
if (is.null(mydata)) block still runs to remove the spinner and handle failure.
Specifically, replace the direct calls to bigSpectronauttoMSstatsFormat /
bigDIANNtoMSstatsFormat followed by dplyr::collect() with a tryCatch that
executes both the converter call and collect in the try branch and returns NULL
in the error branch, and apply the same change to the similar DIANN conversion
block.
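Under the finding's assumptions (a resolved local_big_file_path, the getAnnot() helper, and the shinybusy spinner calls already in getData()), the combined tryCatch could look like the sketch below; the converter's argument list is abbreviated.

```r
# Sketch: both the converter call and collect() live inside one tryCatch,
# so a failure in either yields NULL and falls through to the existing
# is.null(mydata) branch that removes the spinner.
mydata <- tryCatch({
  converted_data <- MSstatsBig::bigDIANNtoMSstatsFormat(
    input_file = local_big_file_path,
    annotation = getAnnot(input),
    output_file_name = "output_file.csv",
    backend = "arrow"
    # ... remaining MBR / filtering arguments as in the runtime call ...
  )
  dplyr::collect(converted_data)
}, error = function(e) {
  showNotification(
    paste("Error processing large DIANN file:", e$message),
    type = "error", duration = NULL
  )
  NULL
})

if (is.null(mydata)) {
  shinybusy::remove_modal_spinner()
  return(NULL)
}
```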
tests/testthat/test-utils.R (1)

1441-1515: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard the big-file tests when MSstatsBig is unavailable.

These tests will fail in CI runs without suggested packages installed because getData() checks requireNamespace("MSstatsBig") before reaching any of your function stubs. When the package is unavailable, the function returns NULL early, and your stubs never execute. Add skip_if_not_installed("MSstatsBig") at the start of each describe block, or stub requireNamespace() to return TRUE.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/testthat/test-utils.R` around lines 1441 - 1515, The tests for big
Spectronaut fail when MSstatsBig isn't installed because getData() checks
requireNamespace("MSstatsBig") before your stubs run; to fix, guard the describe
block by calling skip_if_not_installed("MSstatsBig") at its start (or
alternatively stub getData's call to requireNamespace to return TRUE) so the
tests run only when MSstatsBig is available and your stubs (e.g., stubbing
MSstatsBig::bigSpectronauttoMSstatsFormat, dplyr::collect, showNotification,
shinybusy::update_modal_spinner, shinybusy::remove_modal_spinner) will be
executed.
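One way to apply the suggested guard, sketched with testthat's BDD interface; the describe/it labels and stub contents are placeholders, not the actual test bodies.

```r
# Sketch: skip the big-file tests when the suggested MSstatsBig package
# is absent, so getData()'s requireNamespace("MSstatsBig") check cannot
# return early before the stubs run.
describe("getData big-file Spectronaut path", {
  testthat::skip_if_not_installed("MSstatsBig")
  it("collects the converted data", {
    # ... existing stubs and expect_*() assertions ...
  })
})
```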
♻️ Duplicate comments (1)
R/utils.R (1)

962-975: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the DIANN annotation argument to the generated script.

The runtime large-file DIANN path passes annotation = getAnnot(input), but the exported code omits it. That makes the generated script diverge from the analysis the app actually ran.

Proposed fix
         codes = paste(codes, "library(MSstatsBig)\n", sep = "")
         codes = paste(codes, "data = MSstatsBig::bigDIANNtoMSstatsFormat(\n", sep = "")
         codes = paste(codes, "  input_file = \"insert your large DIANN file path\",\n", sep = "")
+        codes = paste(codes, "  annotation = read.csv(\"insert your annotation filepath\"),\n", sep = "")
         codes = paste(codes, "  output_file_name = \"output_file.csv\",\n", sep = "")
         codes = paste(codes, "  backend = \"arrow\",\n", sep = "")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@R/utils.R` around lines 962 - 975, The generated script for big DIANN
conversion omits the annotation argument used at runtime; update the code that
builds the call to MSstatsBig::bigDIANNtoMSstatsFormat (inside the
input$big_file_diann branch) to include annotation = getAnnot(input) as an
argument (alongside existing args like input_file, output_file_name, backend,
MBR, quantificationColumn resolved via resolveDiannQuantCol(input),
max_feature_count, filter_unique_peptides, aggregate_psms, filter_few_obs) so
the exported script matches the actual runtime call.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@R/module-loadpage-ui.R`:
- Around line 309-393: When DIANN large-file mode (input big_file_diann) is
selected the standard label-free controls must be hidden to avoid
duplicate/conflicting UI; wrap the standard DIANN LF panels (the UI elements
produced by create_diann_standard_ui and the generic label-free panels that show
filter/aggregation inputs) in a conditionalPanel that checks
!input[ns("big_file_diann")] (or use paste0("!input['", ns("big_file_diann"),
"']") to respect namespacing), so that when create_diann_large_filter_options /
create_diann_large_bottom_ui are shown the standard LF controls are hidden;
alternatively implement an observer that toggles visibility of those standard
panel outputs based on input$big_file_diann.

In `@R/utils.R`:
- Around line 551-564: Wrap the calls to
MSstatsBig::bigSpectronauttoMSstatsFormat and
MSstatsBig::bigDIANNtoMSstatsFormat together with the subsequent
dplyr::collect() inside a single tryCatch so any error from the converter or
collect is caught; on error return NULL (or a sentinel) and ensure the existing
if (is.null(mydata)) block still runs to remove the spinner and handle failure.
Specifically, replace the direct calls to bigSpectronauttoMSstatsFormat /
bigDIANNtoMSstatsFormat followed by dplyr::collect() with a tryCatch that
executes both the converter call and collect in the try branch and returns NULL
in the error branch, and apply the same change to the similar DIANN conversion
block.

In `@tests/testthat/test-utils.R`:
- Around line 1441-1515: The tests for big Spectronaut fail when MSstatsBig
isn't installed because getData() checks requireNamespace("MSstatsBig") before
your stubs run; to fix, guard the describe block by calling
skip_if_not_installed("MSstatsBig") at its start (or alternatively stub
getData's call to requireNamespace to return TRUE) so the tests run only when
MSstatsBig is available and your stubs (e.g., stubbing
MSstatsBig::bigSpectronauttoMSstatsFormat, dplyr::collect, showNotification,
shinybusy::update_modal_spinner, shinybusy::remove_modal_spinner) will be
executed.

---

Duplicate comments:
In `@R/utils.R`:
- Around line 962-975: The generated script for big DIANN conversion omits the
annotation argument used at runtime; update the code that builds the call to
MSstatsBig::bigDIANNtoMSstatsFormat (inside the input$big_file_diann branch) to
include annotation = getAnnot(input) as an argument (alongside existing args
like input_file, output_file_name, backend, MBR, quantificationColumn resolved
via resolveDiannQuantCol(input), max_feature_count, filter_unique_peptides,
aggregate_psms, filter_few_obs) so the exported script matches the actual
runtime call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d8fbfe58-7836-4587-9187-732f45c4446a

📥 Commits

Reviewing files that changed from the base of the PR and between 6d06c88 and f90620a.

📒 Files selected for processing (5)
  • DESCRIPTION
  • R/module-loadpage-server.R
  • R/module-loadpage-ui.R
  • R/utils.R
  • tests/testthat/test-utils.R
🚧 Files skipped from review as they are similar to previous changes (1)
  • R/module-loadpage-server.R

