scripts/: contains bash and SLURM scripts for scRNA analysis, database download, and other utilities
- download_refgenome.sh: script for downloading reference genomes, transcriptomes, and GTFs
- download_sra.sbatch: SLURM script to download SRA samples from NCBI
- fastqc.sbatch: SLURM script to run FastQC
- star_index.sh: bash script to index a reference genome with STAR
- star_align.sbatch: SLURM script to map reads to the indexed reference genome and run STARsolo
Other files in the repository:
- setup.sh: bash script to set up the container and default paths
- Dockerfile: Dockerfile used to build the container image
- README.md: project README
- Notes.md: running documentation
- download_refgenome.sh
    ./download_refgenome.sh <output_directory_path>
- download_sra.sbatch
    sbatch download_sra.sbatch SRP SRP266243 $RAW_DATA/sra
- fastqc.sbatch
    sbatch fastqc.sbatch $RAW_DATA/sra/SRP266243/SRR11940660 \
        $PROCESSED_DATA/SRP266243/SRR11940660/fastqc_outs
- star_index.sh
    ./star_index.sh <input_fasta> <input_gtf> <output_folder>
    # Example:
    ./star_index.sh $ECOLI_REF_GENOME $ECOLI_GTF \
        $PROCESSED_DATA/ref_index/STAR_ecoli_genome_index
- star_align.sbatch
    # Usage:
    sbatch star_align.sbatch \
        $PROCESSED_DATA/ref_index/STAR_bsub_genome_index \
        $RAW_DATA/sra/SRP266243/SRR11940660/SRR11940660_1.fastq \
        $RAW_DATA/sra/SRP266243/SRR11940660/SRR11940660_2.fastq \
        $RAW_DATA/barcodes/SRP266243/barcodes_r2_r3_solo.txt \
        $RAW_DATA/barcodes/SRP266243/barcodes_r1_solo.txt \
        $PROCESSED_DATA/SRP266243/SRR11940660/STAR_Bsub_ref_outs
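The contents of star_index.sh are not reproduced here, so the following is only a sketch of what an indexing helper with the same three positional arguments might look like. It assumes STAR's `--runMode genomeGenerate` mode and applies the STAR manual's recommendation for small genomes, `--genomeSAindexNbases = min(14, log2(GenomeLength)/2 - 1)`, which matters for bacterial references. The function names `genome_length`, `sa_index_nbases`, and `star_index_cmd` are hypothetical, and the command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; NOT the repository's star_index.sh.

# Total sequence length of a FASTA file (header lines are skipped).
genome_length() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# STAR manual recommendation for small genomes:
#   --genomeSAindexNbases = min(14, log2(GenomeLength)/2 - 1)
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        v = int(log(len) / log(2) / 2 - 1)
        if (v > 14) v = 14
        if (v < 1) v = 1
        print v
    }'
}

# Print (do not run) a STAR genomeGenerate command for the arguments
# <input_fasta> <input_gtf> <output_folder>.
star_index_cmd() {
    local fasta=$1 gtf=$2 outdir=$3
    local nbases
    nbases=$(sa_index_nbases "$(genome_length "$fasta")")
    printf 'STAR --runMode genomeGenerate --genomeDir %s --genomeFastaFiles %s --sjdbGTFfile %s --genomeSAindexNbases %s\n' \
        "$outdir" "$fasta" "$gtf" "$nbases"
}
```

For a genome of 4,641,652 bp, `sa_index_nbases 4641652` prints 10, well below STAR's default of 14. Printing the command first makes it easy to inspect before running it inside the container.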
scRNAseq/
- envs/: directory containing environment files
  - scrna_latest.sif: Singularity image file
- processed_data/ ($PROCESSED_DATA): contains analysis outputs
  - ref_index/: indexed reference genomes
    - STAR_alt_bsub_genome_index: star_index.sh output for reference genome of B. subtilis PY79
    - STAR_bsub_genome_index: star_index.sh output for reference genome of B. subtilis strain 168
    - STAR_ecoli_genome_index: star_index.sh output for reference genome of E. coli K-12 substr. MG1655
  - SRP266243/: outputs for SRP266243
    - SRR11940660/: outputs for run SRR11940660
      - fastqc_outs: fastqc.sbatch output
      - STAR_alt_bsub_ref_outs: STAR outputs (aligning reads to STAR_alt_bsub_genome_index)
      - STAR_Bsub_ref_outs: STAR outputs (aligning reads to STAR_bsub_genome_index)
- raw_data/ ($RAW_DATA): contains raw data files
  - barcodes/: barcodes from Supplementary information from this paper
    - SRP266243/
  - references/
    - bsubtilis: reference genomes/transcriptomes/gtf files of B. subtilis strain 168 and PY79
    - ecoli: reference genomes/transcriptomes/gtf files of E. coli K-12 substr. MG1655
  - sra/
    - SRP266243/: this project has 3 runs
      - SRR11940660: B. subtilis rep 1
      - SRR11940661: B. subtilis rep 2
      - SRR11940662: B. subtilis + E. coli
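The layout above can be recreated on a fresh machine with a few `mkdir -p` calls. This is a minimal sketch under the assumption that `$RAW_DATA` and `$PROCESSED_DATA` point at `raw_data/` and `processed_data/` under a single project root; the function name `make_layout` is hypothetical, and the repository's setup.sh may set these paths differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the real setup.sh may define paths differently.

# Create the directory skeleton under <root> and export the two
# path variables used throughout this README.
make_layout() {
    local root=$1
    mkdir -p \
        "$root/envs" \
        "$root/processed_data/ref_index" \
        "$root/processed_data/SRP266243/SRR11940660" \
        "$root/raw_data/barcodes/SRP266243" \
        "$root/raw_data/references/bsubtilis" \
        "$root/raw_data/references/ecoli" \
        "$root/raw_data/sra/SRP266243" || return 1
    export RAW_DATA="$root/raw_data"
    export PROCESSED_DATA="$root/processed_data"
}
```

Calling `make_layout /path/to/scRNAseq` is safe to repeat, since `mkdir -p` leaves existing directories untouched.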
- Run the setup.sh script to set up environment variables and run the Singularity container
- Prepare necessary files in $RAW_DATA
  Download the reference files:
    ./download_refgenome.sh $RAW_DATA
  Download SRA reads:
    sbatch download_sra.sbatch SRP SRP266243 $RAW_DATA/sra
  Copy barcodes from Supplementary information to text files in $RAW_DATA/barcodes
- Convert file formats
    gffread $BSUB_PY79_GFF -g $BSUB_PY79_REF_GENOME -T -v -o $BSUB_PY79_GTF
- Run fastqc.sbatch on SRR11940660 (B. subtilis experiment):
    sbatch fastqc.sbatch $RAW_DATA/sra/SRP266243/SRR11940660 \
        $PROCESSED_DATA/SRP266243/SRR11940660/fastqc_outs
- Index reference genomes
  E. coli:
    ./star_index.sh $ECOLI_REF_GENOME $ECOLI_GTF \
        $PROCESSED_DATA/ref_index/STAR_ecoli_genome_index
  B. subtilis strain 168:
    ./star_index.sh $BSUB_REF_GENOME $BSUB_GTF \
        $PROCESSED_DATA/ref_index/STAR_bsub_genome_index
  B. subtilis strain PY79:
    ./star_index.sh $BSUB_PY79_REF_GENOME $BSUB_PY79_GTF \
        $PROCESSED_DATA/ref_index/STAR_alt_bsub_genome_index
- STARsolo align
  B. subtilis strain 168:
    sbatch star_align.sbatch \
        $PROCESSED_DATA/ref_index/STAR_bsub_genome_index \
        $RAW_DATA/sra/SRP266243/SRR11940660/SRR11940660_1.fastq \
        $RAW_DATA/sra/SRP266243/SRR11940660/SRR11940660_2.fastq \
        $RAW_DATA/barcodes/SRP266243/barcodes_r2_r3_solo.txt \
        $RAW_DATA/barcodes/SRP266243/barcodes_r1_solo.txt \
        $PROCESSED_DATA/SRP266243/SRR11940660/STAR_Bsub_ref_outs
  B. subtilis strain PY79:
    sbatch star_align.sbatch \
        $PROCESSED_DATA/ref_index/STAR_alt_bsub_genome_index \
        $RAW_DATA/sra/SRP266243/SRR11940660/SRR11940660_1.fastq \
        $RAW_DATA/sra/SRP266243/SRR11940660/SRR11940660_2.fastq \
        $RAW_DATA/barcodes/SRP266243/barcodes_r2_r3_solo.txt \
        $RAW_DATA/barcodes/SRP266243/barcodes_r1_solo.txt \
        $PROCESSED_DATA/SRP266243/SRR11940660/STAR_alt_bsub_ref_outs
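The two alignment submissions above differ only in the index directory and the output directory, so they could be generated from one loop. The sketch below is an illustration, not part of the repository: the function name `print_align_cmds` is hypothetical, it assumes `$RAW_DATA` and `$PROCESSED_DATA` have already been set (for example by setup.sh), and it prints the `sbatch` commands instead of submitting them.

```shell
#!/usr/bin/env bash
# Hypothetical helper; prints sbatch commands rather than submitting jobs.

# Usage: print_align_cmds <SRP accession> <SRR run>
print_align_cmds() {
    local srp=$1 run=$2
    # Refuse to build paths from unset variables.
    if [ -z "${RAW_DATA:-}" ] || [ -z "${PROCESSED_DATA:-}" ]; then
        echo "RAW_DATA and PROCESSED_DATA must be set" >&2
        return 1
    fi
    local reads=$RAW_DATA/sra/$srp/$run
    local barcodes=$RAW_DATA/barcodes/$srp
    local pair index outdir
    # Each entry is <index directory>:<output directory>.
    for pair in \
        STAR_bsub_genome_index:STAR_Bsub_ref_outs \
        STAR_alt_bsub_genome_index:STAR_alt_bsub_ref_outs
    do
        index=${pair%%:*}
        outdir=${pair##*:}
        printf 'sbatch star_align.sbatch %s %s %s %s %s %s\n' \
            "$PROCESSED_DATA/ref_index/$index" \
            "${reads}/${run}_1.fastq" \
            "${reads}/${run}_2.fastq" \
            "$barcodes/barcodes_r2_r3_solo.txt" \
            "$barcodes/barcodes_r1_solo.txt" \
            "$PROCESSED_DATA/$srp/$run/$outdir"
    done
}
```

Running `print_align_cmds SRP266243 SRR11940660` prints one command per reference; the output can be reviewed and then piped to `bash` to submit the jobs.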