SANITY
SANITY (SAmpling Noise corrected Inference of
Transcription activitY) is a unique Bayesian procedure
for normalizing single-cell RNA-seq data. SANITY estimates log
expression values and associated error bars directly from
raw UMI counts without any tunable parameters.
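As a rough illustration of how a log expression value and its error bar can be derived from a raw UMI count, the sketch below uses a deliberately simplified model. It is not SANITY's actual model: it assumes a single gene in a single cell, Poisson sampling noise, and a flat prior on the transcript fraction. The function names and prior parameters are hypothetical.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (upward recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def trigamma(x):
    """Trigamma function for x > 0 (upward recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + 1.0 / x + 0.5 * inv2 + (inv2 / x) * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))

def log_expression_estimate(umi_count, total_umis, prior_shape=1.0, prior_rate=0.0):
    """Posterior mean and standard deviation of the log transcript fraction.

    Assumed toy model (not SANITY's): umi_count ~ Poisson(total_umis * fraction)
    with a Gamma(prior_shape, prior_rate) prior on the fraction, so the posterior
    is Gamma(umi_count + prior_shape, total_umis + prior_rate).
    """
    shape = umi_count + prior_shape
    rate = total_umis + prior_rate
    mean_log = digamma(shape) - math.log(rate)   # E[log fraction]
    sd_log = math.sqrt(trigamma(shape))          # sqrt(Var[log fraction])
    return mean_log, sd_log

if __name__ == "__main__":
    for count in (0, 5, 100):
        mean_log, sd_log = log_expression_estimate(count, total_umis=1000)
        print(f"count={count:3d}  log expression={mean_log:.3f} +/- {sd_log:.3f}")
```

Even in this toy version, the error bar shrinks as the count grows, and a count of zero still yields a finite log expression estimate with a wide error bar rather than an undefined logarithm.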
Download
SANITY source code and installation
instructions are available on GitHub.
MotEvo
We present MotEvo, an integrated suite of Bayesian probabilistic methods for
the prediction of TFBSs and inference of regulatory motifs from multiple
alignments of phylogenetically related DNA sequences.
In addition, MotEvo incorporates a novel model for
detecting unknown functional elements that are under evolutionary
constraint, and a new robust model for treating gain and loss of TFBSs
along a phylogeny. Rigorous benchmarking tests on ChIP-seq datasets show
that MotEvo's novel features significantly improve the accuracy of TFBS
prediction, motif inference, and enhancer prediction.
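To make the idea of probabilistic TFBS prediction concrete, here is a minimal sketch that computes, for each window of a single sequence, the posterior probability that the window is a site for a given weight matrix rather than background. It is only an illustration under strong assumptions: each window is scored independently, there is no alignment or phylogeny, and the background is uniform. MotEvo's own model is considerably richer. The names `consensus_matrix`, `site_posteriors`, and `prior_site` are hypothetical.

```python
import math

BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def consensus_matrix(consensus, match=0.85):
    """Build a toy weight matrix that favors `consensus` at each position."""
    other = (1.0 - match) / 3.0
    return [{b: (match if b == c else other) for b in "ACGT"} for c in consensus]

def site_posteriors(sequence, weight_matrix, prior_site=0.01, background=BACKGROUND):
    """Posterior probability of a site at each window (windows scored independently)."""
    width = len(weight_matrix)
    posteriors = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Likelihood of the window under the motif and under the background model.
        p_motif = math.prod(col[base] for col, base in zip(weight_matrix, window))
        p_background = math.prod(background[base] for base in window)
        numerator = prior_site * p_motif
        posteriors.append(numerator / (numerator + (1.0 - prior_site) * p_background))
    return posteriors

if __name__ == "__main__":
    wm = consensus_matrix("TATA")
    for start, post in enumerate(site_posteriors("GGTATAGG", wm)):
        print(start, round(post, 4))
```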
Download: Source, Linux binary, Mac binary
DWT-toolbox
The DWT-toolbox is a collection of software tools for performing
motif finding and transcription factor binding site (TFBS)
predictions with Dinucleotide Weight Tensors (DWTs). Besides a
motif finder, and a program for predicting TFBSs with a given
DWT in a given set of sequences, the toolbox also includes a
program for constructing dilogos that visualize DWT motifs.
Download DWT-toolbox, DWT-toolbox online tool
PhyloGibbs
PhyloGibbs is an algorithm for discovering regulatory sites in a collection
of DNA sequences, including multiple alignments of orthologous sequences
from related organisms. Many existing approaches search either for
sequence-motifs that are overrepresented in the input data, or for
sequence-segments that are more evolutionarily conserved than
expected. PhyloGibbs combines these two approaches and identifies
significant sequence-motifs by taking both over-representation and
conservation signals into account.
Download PhyloGibbs
PROCSE
Using the assumption that regulatory sites can be represented as samples
from weight matrices (WMs), we derive a unique probability distribution for
assignments of sites into clusters. Our algorithm, PROCSE (probabilistic
clustering of sequences), uses Monte Carlo sampling of this distribution to
partition and align thousands of short DNA sequences into clusters. The
algorithm internally determines the number of clusters from the data and
assigns significance to the resulting clusters.
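The kind of probability that such a clustering assigns to a partition can be sketched as follows. The sketch assumes a symmetric Dirichlet prior (pseudocount 1 by default) on each weight-matrix column and integrates the unknown WM out, which gives a closed-form score per cluster. The exact prior and normalization used by PROCSE may differ, and the function names are hypothetical. A Monte Carlo sampler would then propose moving sequences between clusters and accept moves according to ratios of these scores; the sampler itself is not shown.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def log_marginal(cluster, pseudocount=1.0):
    """Log probability that equal-length sequences in `cluster` are samples of one
    unknown WM, with the WM integrated out under a symmetric Dirichlet prior."""
    n = len(cluster)
    total = 0.0
    for column in zip(*cluster):          # iterate over alignment columns
        counts = Counter(column)
        total += math.lgamma(4 * pseudocount) - math.lgamma(n + 4 * pseudocount)
        for base in ALPHABET:
            total += math.lgamma(counts[base] + pseudocount) - math.lgamma(pseudocount)
    return total

def log_partition_score(partition, pseudocount=1.0):
    """Unnormalized log probability of a partition (a list of clusters)."""
    return sum(log_marginal(cluster, pseudocount) for cluster in partition)

if __name__ == "__main__":
    a, b, c = "ACGT", "ACGT", "TTTT"
    candidates = {
        "all separate": [[a], [b], [c]],
        "a+b together": [[a, b], [c]],
        "all together": [[a, b, c]],
    }
    for name, partition in candidates.items():
        print(f"{name:13s} {log_partition_score(partition):.3f}")
```

In this small example, the partition that groups the two identical sequences and leaves the dissimilar one alone scores highest, which is how the number of clusters can emerge from the data rather than being fixed in advance.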
Download PROCSE
STUBB
We develop a computational method that uses Hidden Markov Models and an
Expectation Maximization algorithm to detect cis-regulatory modules, given
the weight matrices of a set of transcription factors known to work
together. Two novel features of our probabilistic model are: (i)
correlations between binding sites, known to be required for module
activity, are exploited, and (ii) phylogenetic comparisons among sequences
from multiple species are made to highlight a regulatory module. The novel
features are shown to improve detection of modules, in experiments on
synthetic as well as biological data.
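The core of an HMM of this kind can be illustrated with a forward recursion in which, at each step, the model emits either one background nucleotide or a complete binding site drawn from one of the weight matrices. The sketch below is a simplified, single-sequence version under stated assumptions: transition probabilities are fixed rather than fitted by Expectation Maximization, and neither binding-site correlations nor phylogenetic comparisons are modeled. The function name and argument layout are hypothetical, and probabilities are not rescaled, so it is only suitable for short sequences.

```python
def sequence_likelihood(sequence, motifs, p_background, background=None):
    """Forward algorithm summing over all ways to parse `sequence` into
    background nucleotides and motif sites.

    motifs: list of (p_motif, weight_matrix) pairs, where weight_matrix is a
    list of dicts mapping each base to its probability at that position.
    p_background plus all p_motif values should sum to 1.
    """
    background = background or {base: 0.25 for base in "ACGT"}
    forward = [0.0] * (len(sequence) + 1)
    forward[0] = 1.0  # empty prefix
    for end in range(1, len(sequence) + 1):
        # Option 1: the last nucleotide of the prefix is background.
        total = forward[end - 1] * p_background * background[sequence[end - 1]]
        # Option 2: the prefix ends with a full site of one of the motifs.
        for p_motif, weight_matrix in motifs:
            width = len(weight_matrix)
            if end >= width:
                emit = 1.0
                for column, base in zip(weight_matrix, sequence[end - width:end]):
                    emit *= column[base]
                total += forward[end - width] * p_motif * emit
        forward[end] = total
    return forward[len(sequence)]
```

A window with a high likelihood under this model relative to a background-only model (`motifs=[]`, `p_background=1.0`) would be a candidate module; an EM step would then adjust the transition probabilities to maximize this likelihood.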
Download STUBB
SPA
SPA is a computer program for aligning cDNA sequences to a genome. It uses
a probabilistic Bayesian model to find the optimal alignment. To keep
running times feasible, we use the BLAT gfServer to identify candidate genomic
loci and return the best mapping among these loci.
Download SPA