Anna Karnkowska

Department of Molecular Phylogenetics and Evolution repository

Genomics

quality control

  1. fastqc

version 0.11.4

A quality control tool for high throughput sequence data.

fastqc -h
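
A typical run could look like the sketch below. The file names and the output directory are hypothetical placeholders, and the block only calls `fastqc` when the tool and both input files are actually present.

```shell
# Hypothetical input files and report directory; replace with real paths.
READS_1="sample_R1.fastq.gz"
READS_2="sample_R2.fastq.gz"
OUTDIR="fastqc_reports"

if command -v fastqc >/dev/null 2>&1 && [ -f "$READS_1" ] && [ -f "$READS_2" ]; then
    mkdir -p "$OUTDIR"   # fastqc does not create the output directory itself
    fastqc --outdir "$OUTDIR" --threads 2 "$READS_1" "$READS_2"
    FASTQC_STATUS="ran"
else
    echo "Skipping: fastqc or the input files were not found."
    FASTQC_STATUS="skipped"
fi
```

Each input file yields an HTML report and a zip archive in the output directory.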

  1. trimmomatic

Trimmomatic: A flexible read trimming tool for Illumina NGS

trimmomatic
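
As a rough illustration, a paired-end trimming run might be written as follows. The read files and the adapter file location are hypothetical, and the sketch assumes the `trimmomatic` command listed above is a wrapper that passes its arguments straight to the Trimmomatic jar. The trimming steps are the example settings from the Trimmomatic manual, not values tuned for any dataset.

```shell
# Hypothetical file names; adjust to your data and installation.
R1="sample_R1.fastq.gz"
R2="sample_R2.fastq.gz"
ADAPTERS="TruSeq3-PE.fa"   # adapter FASTA shipped with Trimmomatic; its location here is an assumption

if command -v trimmomatic >/dev/null 2>&1 && [ -f "$R1" ] && [ -f "$R2" ] && [ -f "$ADAPTERS" ]; then
    # Paired-end mode writes paired and unpaired survivors for each mate.
    trimmomatic PE -phred33 "$R1" "$R2" \
        trimmed_R1_paired.fastq.gz trimmed_R1_unpaired.fastq.gz \
        trimmed_R2_paired.fastq.gz trimmed_R2_unpaired.fastq.gz \
        "ILLUMINACLIP:${ADAPTERS}:2:30:10" LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
    TRIM_STATUS="ran"
else
    echo "Skipping: trimmomatic, the reads, or the adapter file were not found."
    TRIM_STATUS="skipped"
fi
```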

  1. blobtools

A modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets

under construction

  1. BUSCO

Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs (BUSCO)

BUSCO

Remember to select the lineage database appropriate for your organism.

genome and transcriptome assembly

  1. kmergenie

version 1.7044

KmerGenie estimates the best k-mer length for genome de novo assembly.

  1. SPAdes

version 3.10.1

SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines.

  1. trinity

version 2.4.0

Trinity assembles transcript sequences from Illumina RNA-Seq data.

trinity

  1. NOVOPlasty

version 2.6.2

NOVOPlasty is a de novo assembler for short circular genomes.

novoplasty

  1. DBG2OLC

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

DBG2OLC AssemblyStatistics SelectLongestReads Sparc SparseAssembler

dependencies

blasr

BLASR: The PacBio® long read aligner

pbdagcon

A sequence consensus algorithm implementation based on using directed acyclic graphs to encode multiple sequence alignment

  1. CANU

version 1.6

Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION).

canu

  1. minimap/miniasm

minimap version 0.2-r124-dirty

miniasm version 0.2-r168-dirty

Miniasm is a very fast OLC-based de novo assembler for noisy long reads.

minimap miniasm
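
The two tools are normally chained: minimap finds overlaps between reads and miniasm lays them out into unitigs. The sketch below follows the usage shown in the miniasm README for these early versions. The read file name is hypothetical, and the block does nothing unless both tools and the input are present.

```shell
READS="reads.fq"   # hypothetical long-read FASTQ file

if command -v minimap >/dev/null 2>&1 && command -v miniasm >/dev/null 2>&1 && [ -f "$READS" ]; then
    # 1. All-vs-all overlap of the raw reads (PAF output, compressed).
    minimap -Sw5 -L100 -m0 -t8 "$READS" "$READS" | gzip -1 > reads.paf.gz
    # 2. Layout: build the assembly graph (GFA) from the overlaps.
    miniasm -f "$READS" reads.paf.gz > reads.gfa
    # 3. Write the sequences on the GFA segment (S) lines out as FASTA.
    awk '/^S/ {print ">" $2 "\n" $3}' reads.gfa > unitigs.fasta
    MINIASM_STATUS="ran"
else
    echo "Skipping: minimap, miniasm, or the read file was not found."
    MINIASM_STATUS="skipped"
fi
```

Miniasm has no consensus step, so the unitigs keep roughly the error rate of the raw reads and usually need polishing afterwards.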

  1. sspace

version 3.0

SSPACE is a script able to extend and scaffold pre-assembled contigs using one or more mate pairs or paired-end libraries, or even a combination.

perl /opt/SSPACE-STANDARD-3.0_linux-x86_64/SSPACE_Standard_v3.0.pl

  1. Pilon

version 1.22

Pilon is a software tool which can be used to (1) automatically improve draft assemblies and (2) find variation among strains, including large event detection.

java -jar /opt/pilon-1.22.jar
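
A polishing run might look like the sketch below. The jar path is the one listed above. The draft assembly, the BAM file (paired-end reads mapped to the draft, sorted and indexed), the output names, and the Java heap size are all hypothetical choices.

```shell
PILON_JAR="/opt/pilon-1.22.jar"     # path from the listing above
DRAFT="draft_assembly.fasta"        # hypothetical draft assembly
BAM="reads_vs_draft.sorted.bam"     # hypothetical; must be sorted and indexed

if command -v java >/dev/null 2>&1 && [ -f "$PILON_JAR" ] && [ -f "$DRAFT" ] && [ -f "$BAM" ]; then
    # -Xmx16G is an arbitrary heap size; large genomes may need more.
    java -Xmx16G -jar "$PILON_JAR" \
        --genome "$DRAFT" --frags "$BAM" \
        --output polished --outdir pilon_out
    PILON_STATUS="ran"
else
    echo "Skipping: java, the Pilon jar, or the input files were not found."
    PILON_STATUS="skipped"
fi
```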

  1. QUAST

version 4.5

QUAST evaluates genome assemblies by computing summary metrics such as the number of contigs and N50.

quast
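
N50 is one of the metrics QUAST reports. To show what that number means, the sketch below computes it by hand for a tiny made-up assembly using only `awk` and `sort`. It is an illustration of the definition, not QUAST's own code.

```shell
# Tiny made-up assembly so the example runs on its own.
toy=$(mktemp)
cat > "$toy" <<'EOF'
>contig1
ACGTACGTAC
>contig2
ACGTAC
>contig3
ACGT
EOF

# One length per contig, longest first.
lengths=$(awk '/^>/ {if (len) print len; len = 0; next}
               {len += length($0)}
               END {if (len) print len}' "$toy" | sort -rn)

# N50: length of the contig at which the running total of the sorted
# lengths first reaches half of the total assembly size.
n50=$(printf '%s\n' "$lengths" | awk '{l[NR] = $1; total += $1}
    END {for (i = 1; i <= NR; i++) {run += l[i]; if (run >= total / 2) {print l[i]; exit}}}')

echo "N50 = $n50"
rm -f "$toy"
```

Here the contig lengths are 10, 6 and 4 (20 bases in total), so the longest contig alone already covers half of the assembly and the N50 is 10.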

  1. CGAL

CGAL is a tool for computing genome assembly likelihoods. It computes the likelihood of reads with respect to the assembly and a statistical model which can be used as a metric for evaluating assemblies.

under construction

  1. SOAPdenovo

version 2.04

Next generation sequencing reads de novo assembler.

  1. MIRA

version 4.0.0 (the newest version 4.0.2 is not working properly)

MIRA is a multi-pass DNA sequence data assembler/mapper for whole genome and EST/RNASeq projects.

cicuta only

  1. edena

version 3

De novo short-read assembler.

cicuta only

  1. velvet

version 1.2.10

Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454.

cicuta only

  1. GRAbB

GRAbB (Genome Region Assembly by Baiting) is a program designed to assemble selected regions of the genome or transcriptome using reference sequences and NGS data.

grabb

cicuta only

  1. REAPR

version 1.0.18

REAPR is a tool that evaluates the accuracy of a genome assembly using mapped paired end reads, without the use of a reference genome for comparison. It can be used in any stage of an assembly pipeline to automatically break incorrect scaffolds and flag other errors in an assembly for manual inspection. It reports mis-assemblies and other warnings, and produces a new broken assembly based on the error calls.

reapr

cicuta only

  1. TransDecoder

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

/opt/TransDecoder-TransDecoder-v5.0.2/TransDecoder.LongOrfs
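
TransDecoder is usually run in two steps: extract long open reading frames, then predict the likely coding regions. The sketch below assumes that `TransDecoder.Predict` sits in the same directory as `TransDecoder.LongOrfs`. The transcript file name is hypothetical.

```shell
TD_DIR="/opt/TransDecoder-TransDecoder-v5.0.2"   # directory from the path listed above
TRANSCRIPTS="Trinity.fasta"                      # hypothetical transcript assembly

if [ -x "$TD_DIR/TransDecoder.LongOrfs" ] && [ -x "$TD_DIR/TransDecoder.Predict" ] && [ -f "$TRANSCRIPTS" ]; then
    # Step 1: extract the long open reading frames.
    "$TD_DIR/TransDecoder.LongOrfs" -t "$TRANSCRIPTS"
    # Step 2: predict the likely coding regions among them.
    "$TD_DIR/TransDecoder.Predict" -t "$TRANSCRIPTS"
    TD_STATUS="ran"
else
    echo "Skipping: TransDecoder or the transcript file was not found."
    TD_STATUS="skipped"
fi
```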

mapping

  1. bowtie2

version 2.2.6

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

  1. tophat

version 2.1.0

TopHat is a fast splice junction mapper for RNA-Seq reads. Note that TopHat has entered a low-maintenance, low-support stage, as it is now largely superseded by HISAT2.

  1. bwa

version 0.7.12-r1039

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome.

  1. gmap

version 2015-12-31

GMAP: a genomic mapping and alignment program for mRNA and EST sequences

aligner tutorial

  1. STAR

version 2.5.0a

STAR: ultrafast universal RNA-seq aligner

STAR

  1. hisat2

version 2.1.0

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of genomes (as well as to a single reference genome).

hisat2 hisat2-align-s hisat2-align-l hisat2-build hisat2-build-s hisat2-build-l hisat2-inspect hisat2-inspect-s hisat2-inspect-l
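
Of the commands listed, `hisat2-build` and `hisat2` are the two most users call directly: the first indexes the reference, the second aligns reads against that index. A minimal sketch follows, with hypothetical file names and an arbitrary thread count.

```shell
REF="reference.fasta"        # hypothetical reference genome
R1="sample_R1.fastq.gz"      # hypothetical paired-end reads
R2="sample_R2.fastq.gz"

if command -v hisat2 >/dev/null 2>&1 && command -v hisat2-build >/dev/null 2>&1 \
        && [ -f "$REF" ] && [ -f "$R1" ] && [ -f "$R2" ]; then
    # Build the index once per reference; "ref_index" is the index basename.
    hisat2-build "$REF" ref_index
    # Align the read pairs and write SAM output.
    hisat2 -p 4 -x ref_index -1 "$R1" -2 "$R2" -S aligned.sam
    HISAT2_STATUS="ran"
else
    echo "Skipping: hisat2 or the input files were not found."
    HISAT2_STATUS="skipped"
fi
```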

File processing

  1. fastx-toolkit

version 0.0.14

The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.

  1. seqtk

version 1.0-r31

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format.

seqtk
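
Two common uses are subsampling reads and converting FASTQ to FASTA. The sketch below uses a hypothetical input file, and the seed and read count are arbitrary.

```shell
READS="sample_R1.fastq.gz"   # hypothetical input file

if command -v seqtk >/dev/null 2>&1 && [ -f "$READS" ]; then
    # Randomly keep 10000 reads; reuse the same seed (-s) on the mate file
    # to keep paired-end reads in sync.
    seqtk sample -s100 "$READS" 10000 > subsample_R1.fastq
    # Convert FASTQ to FASTA by dropping the quality lines.
    seqtk seq -a "$READS" > sample_R1.fasta
    SEQTK_STATUS="ran"
else
    echo "Skipping: seqtk or the input file was not found."
    SEQTK_STATUS="skipped"
fi
```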

  1. bamtools

version 2.4.0

Bamtools is a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.

  1. samtools

version 1.6

SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.

samtools cheatsheet
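
The sorting and indexing mentioned above usually come right after mapping. A minimal sketch, assuming a hypothetical SAM file from one of the aligners in the mapping section:

```shell
SAM="aligned.sam"   # hypothetical aligner output

if command -v samtools >/dev/null 2>&1 && [ -f "$SAM" ]; then
    # Sort by coordinate and write BAM (4 extra threads is an arbitrary choice).
    samtools sort -@ 4 -o aligned.sorted.bam "$SAM"
    # Index the sorted BAM so that regions can be accessed quickly.
    samtools index aligned.sorted.bam
    # Summarise the mapping (total, mapped, properly paired reads, ...).
    samtools flagstat aligned.sorted.bam > aligned.flagstat.txt
    SAMTOOLS_STATUS="ran"
else
    echo "Skipping: samtools or the SAM file was not found."
    SAMTOOLS_STATUS="skipped"
fi
```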

  1. bedtools

version 2.25.0

  1. prinseq

version 0.20.4 (PRINSEQ-lite)

PRINSEQ will help you to preprocess your genomic or metagenomic sequence data in FASTA or FASTQ format

  1. BBMap

BBMap: a short-read aligner, plus other bioinformatics tools.

/opt/BBMap/

  1. bcftools

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF.

Annotation

cicuta only

  1. RepeatModeler

version open-1.0.11

RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs (RECON and RepeatScout) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data.

RepeatModeler

  1. RepeatMasker

version open-4.0.7

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns).

RepeatMasker

  1. CIRI

version 2.0.6

CIRI: an efficient and unbiased algorithm for de novo circular RNA identification

perl /opt/CIRI_v2.0.6/CIRI2.pl

  1. STARChip

This software is designed to take the chimeric output from the STAR alignment tool and discover high confidence fusions and circular RNA in the data.