Genomics
Quality control
version 0.11.4
A quality control tool for high throughput sequence data.
fastqc -h
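As an illustration of the per-base quality values that a quality control tool summarises, the following is a minimal sketch (not FastQC's own implementation) that decodes a FASTQ quality string. It assumes the Phred+33 encoding; the function names are hypothetical.

```python
def phred33_scores(quality_line):
    """Convert a FASTQ quality string to integer Phred scores.

    Assumption: the file uses the Phred+33 (Sanger / Illumina 1.8+) encoding.
    """
    return [ord(ch) - 33 for ch in quality_line]


def mean_quality(quality_line):
    """Average Phred score of one read; 0.0 for an empty quality string."""
    scores = phred33_scores(quality_line)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(phred33_scores("II#"))  # [40, 40, 2]
```

A read whose mean score is low is a typical candidate for trimming or removal before assembly or mapping.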
Trimmomatic: A flexible read trimming tool for Illumina NGS
trimmomatic
A modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
under construction
Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs (BUSCO)
BUSCO
Remember to select the appropriate database.
Genome and transcriptome assembly
version 1.7044
KmerGenie estimates the best k-mer length for genome de novo assembly.
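Choosing k relies on how often each k-mer occurs in the reads. The sketch below is not KmerGenie's algorithm; it only shows, under simplifying assumptions (forward strand only, no error handling), how a k-mer abundance histogram can be built for a given k. All names are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (forward strand only) across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map abundance -> number of distinct k-mers seen that many times."""
    return dict(Counter(counts.values()))


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]
    # Four distinct 3-mers, each seen twice.
    print(abundance_histogram(kmer_counts(reads, 3)))  # {2: 4}
```

Comparing such histograms across several values of k is the general idea behind picking a k-mer length for de novo assembly.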
version 3.10.1
SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines.
version 2.4.0
Trinity assembles transcript sequences from Illumina RNA-Seq data.
trinity
version 2.6.2
NOVOPlasty is a de novo assembler for short circular genomes.
novoplasty
DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies
DBG2OLC
AssemblyStatistics
SelectLongestReads
Sparc
SparseAssembler
dependencies
BLASR: The PacBio® long read aligner
A sequence consensus algorithm implementation based on using directed acyclic graphs to encode multiple sequence alignment
Canu 1.6
Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION).
canu
minimap version 0.2-r124-dirty
miniasm version 0.2-r168-dirty
Miniasm is a very fast OLC-based de novo assembler for noisy long reads.
minimap
miniasm
version 3.0
SSPACE is a script able to extend and scaffold pre-assembled contigs using one or more mate pairs or paired-end libraries, or even a combination.
perl /opt/SSPACE-STANDARD-3.0_linux-x86_64/SSPACE_Standard_v3.0.pl
version 1.22
Pilon is a software tool which can be used to (1) automatically improve draft assemblies and (2) find variation among strains, including large event detection.
java -jar /opt/pilon-1.22.jar
version 4.5
QUAST evaluates genome assemblies.
quast
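One of the contiguity metrics commonly reported when evaluating an assembly is N50. The following is a minimal sketch of that statistic computed from a list of contig lengths; it is an illustration only, not QUAST's code, and the function name is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    Returns 0 for an empty input.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    # Total length is 54; 10 + 9 + 8 = 27 reaches half of it.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```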
CGAL is a tool for computing genome assembly likelihoods. It computes the likelihood of reads with respect to the assembly and a statistical model which can be used as a metric for evaluating assemblies.
under construction
version 2.04
Next generation sequencing reads de novo assembler.
version 4.0.0 (the newest version 4.0.2 is not working properly)
MIRA is a multi-pass DNA sequence data assembler/mapper for whole genome and EST/RNASeq projects.
cicuta only
version 3
De novo short-read assembler.
cicuta only
version 1.2.10
Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454.
cicuta only
GRAbB (Genome Region Assembly by Baiting) is program designed to assemble selected regions of the genome or transcriptome using reference sequences and NGS data.
grabb
cicuta only
version 1.0.18
REAPR is a tool that evaluates the accuracy of a genome assembly using mapped paired end reads, without the use of a reference genome for comparison. It can be used in any stage of an assembly pipeline to automatically break incorrect scaffolds and flag other errors in an assembly for manual inspection. It reports mis-assemblies and other warnings, and produces a new broken assembly based on the error calls.
reapr
cicuta only
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
/opt/TransDecoder-TransDecoder-v5.0.2/TransDecoder.LongOrfs
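To make "candidate coding regions" concrete, here is a deliberately simplified sketch that scans the three forward reading frames of a transcript for the longest ATG-to-stop open reading frame. It is not TransDecoder's method (which, among other things, also considers the reverse strand and scores candidates); the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return the longest ATG...stop ORF on the forward strand, or ''.

    The returned string includes the start and the stop codon.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTTAGGG"))  # ATGAAATTTTAG
```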
Mapping
version 2.2.6
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
version 2.1.0
TopHat is a fast splice junction mapper for RNA-Seq reads. Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 (32).
version 0.7.12-r1039
BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome.
version 2015-12-31
GMAP: a genomic mapping and alignment program for mRNA and EST sequences
version 2.5.0a
STAR: ultrafast universal RNA-seq aligner
STAR
version 2.1.0
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of genomes (as well as to a single reference genome).
hisat2
hisat2-align-s
hisat2-align-l
hisat2-build
hisat2-build-s
hisat2-build-l
hisat2-inspect
hisat2-inspect-s
hisat2-inspect-l
File processing
version 0.0.14
The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
version 1.0-r31
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format.
seqtk
version 2.4.0
Bamtools is a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.
version 1.6
SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
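When working with SAM records, the FLAG column (the second field) packs several yes/no properties of an alignment into one integer. The sketch below decodes it using the bit values defined in the SAM format specification; the helper name and the short labels are hypothetical.

```python
# Bit values as defined in the SAM format specification.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


if __name__ == "__main__":
    # 99 = 1 + 2 + 32 + 64
    print(decode_flag(99))
```

For example, a flag of 4 marks an unmapped read, which is a common criterion for filtering alignments.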
version 2.25.0
PRINSEQ-lite 0.20.4
PRINSEQ will help you to preprocess your genomic or metagenomic sequence data in FASTA or FASTQ format.
BBMap short read aligner, and other bioinformatic tools.
/opt/BBMap/
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF.
Annotation
cicuta only
version open-1.0.11
RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs ( RECON and RepeatScout ) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data.
RepeatModeler
version open-4.0.7
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns).
RepeatMasker
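The masking step described above (annotated repeats replaced by Ns) can be sketched as follows. This is only an illustration of hard-masking given a list of repeat coordinates, not RepeatMasker's implementation; the function name and the 0-based, half-open interval convention are assumptions made for the example.

```python
def hard_mask(seq, intervals):
    """Replace every base inside the given intervals with 'N'.

    Assumption: intervals are (start, end) pairs, 0-based and half-open.
    Coordinates outside the sequence are clipped.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "N"
    return "".join(chars)


if __name__ == "__main__":
    print(hard_mask("ACGTACGTAC", [(2, 5)]))  # ACNNNCGTAC
```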
version 2.0.6
CIRI: an efficient and unbiased algorithm for de novo circular RNA identification
perl /opt/CIRI_v2.0.6/CIRI2.pl
This software is designed to take the chimeric output from the STAR alignment tool and discover high confidence fusions and circular RNA in the data.