Description
The bio-research plugin's Python scripts have two defense-in-depth concerns, plus one informational hygiene note, in how they fetch and download FASTQ data from external APIs.
Severity: Low-Medium (not immediately exploitable, but worth hardening)
Issue 1: HTTP Protocol Downgrade on FASTQ Downloads (Medium)
File: bio-research/skills/nextflow-development/scripts/utils/ncbi_utils.py (line 343)
The ENA API is queried over HTTPS (line 314), but the actual FASTQ file downloads are forced to unencrypted HTTP:
# Line 343 — FTP paths from ENA converted to HTTP (not HTTPS)
urls = [f"http://{url}" for url in ftp_urls.split(';') if url]
A real ENA response returns values like ftp.sra.ebi.ac.uk/vol1/fastq/SRR635/000/SRR6357070/SRR6357070_1.fastq.gz, which becomes http://ftp.sra.ebi.ac.uk/....
Impact: FASTQ downloads (often multi-GB) happen over unencrypted HTTP. A network-level attacker could modify file contents in transit. While genomic data isn't secret, integrity matters for research reproducibility.
Fix: Change http:// to https:// on line 343. ENA supports HTTPS downloads.
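A minimal sketch of the one-line change, pulled out into a helper so it can be exercised on its own. The function name `build_fastq_urls` is hypothetical and not part of `ncbi_utils.py`; the only behavioral difference from line 343 is the scheme, plus stripping stray whitespace around each entry.

```python
from typing import List


def build_fastq_urls(ftp_urls: str) -> List[str]:
    """Turn ENA's semicolon-separated fastq_ftp value into HTTPS URLs.

    Hypothetical helper: mirrors the list comprehension on line 343,
    but emits https:// instead of http://.
    """
    return [
        f"https://{url.strip()}"
        for url in ftp_urls.split(";")
        if url.strip()  # skip empty entries, e.g. from a trailing ';'
    ]


# Example input shaped like a real ENA fastq_ftp value for a paired-end run.
example = (
    "ftp.sra.ebi.ac.uk/vol1/fastq/SRR635/000/SRR6357070/SRR6357070_1.fastq.gz;"
    "ftp.sra.ebi.ac.uk/vol1/fastq/SRR635/000/SRR6357070/SRR6357070_2.fastq.gz"
)
urls = build_fastq_urls(example)
```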
Issue 2: No Domain Validation on Download URLs (Low)
File: bio-research/skills/nextflow-development/scripts/utils/ncbi_utils.py (lines 338-344)
The fastq_ftp field from the ENA API response is used to construct download URLs without validating that they point to known ENA/NCBI domains:
# Lines 338-344
ftp_urls = fields[ftp_idx]
if ftp_urls:
urls = [f"http://{url}" for url in ftp_urls.split(';') if url]
fastq_urls[srr] = urls
These URLs are then passed to download_file() which streams the response body to disk via requests.get(url, stream=True).
Impact: If the ENA API were ever compromised or its response tampered with, the code would fetch from arbitrary URLs and write content to disk. This is a defense-in-depth concern — the ENA query itself is over HTTPS (line 314), so MITM is not trivial.
Fix: Validate that download URLs match expected ENA domains (e.g., *.ebi.ac.uk, ftp.sra.ebi.ac.uk) before fetching.
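One possible shape for the allowlist check is sketched below. The names `ALLOWED_HOST_SUFFIXES` and `is_allowed_download_url` are hypothetical, and the exact suffix list is an assumption that maintainers would need to confirm. The scheme check assumes the HTTPS change from Issue 1 has been applied; without it, every URL built on line 343 would be rejected.

```python
from urllib.parse import urlsplit

# Assumption: all legitimate FASTQ downloads are served from EBI hosts.
ALLOWED_HOST_SUFFIXES = (".ebi.ac.uk",)


def is_allowed_download_url(url: str) -> bool:
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (e.g. unbalanced IPv6 brackets) are rejected.
        return False
    if parts.scheme != "https":
        return False
    # Compare the parsed hostname, not the raw string, so that
    # "https://ftp.sra.ebi.ac.uk@evil.example/" does not slip through.
    host = (parts.hostname or "").lower()
    return any(host.endswith(suffix) for suffix in ALLOWED_HOST_SUFFIXES)
```

A caller could filter the list before handing anything to download_file(), for example `urls = [u for u in urls if is_allowed_download_url(u)]`, and log or raise on any rejected entry.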
Issue 3: Missing URL Encoding on API Parameters (Informational)
File: bio-research/skills/nextflow-development/scripts/utils/ncbi_utils.py (lines 99, 156, 212, 314)
User-supplied geo_id is interpolated into API URLs without urllib.parse.quote():
search_url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gds&term={geo_id}[Accession]&retmode=json"
Since this is a CLI tool where the user provides their own arguments, this is not exploitable in practice — but URL encoding is good hygiene.
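As an illustration, the search URL above could be built with the term percent-encoded. This is a sketch only: `build_geo_search_url` and `EUTILS_ESEARCH` are hypothetical names, and the actual code at lines 99, 156, 212, and 314 may be structured differently.

```python
from urllib.parse import quote

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(geo_id: str) -> str:
    """Build the esearch URL with the user-supplied accession encoded."""
    # safe="" also encodes '&', '=', '/' and the brackets, so the value
    # cannot add or alter query parameters.
    term = quote(f"{geo_id}[Accession]", safe="")
    return f"{EUTILS_ESEARCH}?db=gds&term={term}&retmode=json"


search_url = build_geo_search_url("GSE12345")
```

If the script already calls requests.get(), passing the values through its `params=` argument would achieve the same encoding without manual quoting.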
What's NOT a vulnerability (correcting our original report)
- Output path (--output): Our original report claimed this was "arbitrary file write." It's not: this is a CLI tool where the user supplies their own arguments. Normal CLI behavior, not a security issue.
- Compound attack scenario: Our original report chained HTTPS MITM + CLI argument control. This was unrealistic — each link requires conditions that make the chain implausible.
Suggested Fixes
- Line 343: Change f"http://{url}" to f"https://{url}" (simplest, highest impact)
- Lines 338-344: Add domain allowlist check before downloading
- Lines 99, 156, 212, 314: Use urllib.parse.quote() for geo_id/accession in URLs
Secure Patterns Already in Use (Credit)
- ✅ ENA API query is over HTTPS (line 314)
- ✅ yaml.safe_load() used correctly
- ✅ subprocess.run() uses list format, not shell=True
- ✅ No hardcoded secrets
- ✅ NCBI rate limiting properly enforced
🤖 Generated with Claude Code