This is Ugene's (http://ugene.net/) fork of the CLARK tool
(http://clark.cs.ucr.edu/Tool/), which supports building the database
directly from gzip- and 7z-packed RefSeq files.

CLARK: CLAssifier based on Reduced K-mers

The problem of DNA sequence classification is central to several
application domains in molecular biology, genomics, metagenomics and
genetics. The problem is computationally challenging due to the size of
datasets generated by modern sequencing instruments and the growing size
of reference sequence databases.

CLARK is a novel method for supervised sequence classification based on
discriminative k-mers. Unusually among metagenomic and genomic
classification methods, CLARK provides a confidence score for its
assignments, which can be used in downstream analysis. The utility of
CLARK is demonstrated on two distinct classification problems:
1) the assignment of metagenomic reads to known bacterial genomes
2) the assignment of BAC clones and transcripts to chromosome arms (in
   the absence of a finished assembly for the reference genome).
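
The idea of classifying with discriminative k-mers and attaching a
confidence score can be illustrated with a minimal Python sketch. This
is not CLARK's implementation: the function names are hypothetical, the
sketch ignores reverse complements, and the confidence formula (hits of
the best target divided by hits of the best two targets) is an
assumption made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_discriminative_index(targets, k):
    """Map each k-mer that occurs in exactly one target to that target.

    targets is a dict {label: reference sequence}. K-mers shared by two
    or more targets carry no discriminative signal and are dropped.
    """
    owners = defaultdict(set)
    for label, seq in targets.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(label)
    return {
        kmer: next(iter(labels))
        for kmer, labels in owners.items()
        if len(labels) == 1
    }


def classify(read, index, k):
    """Return (label, confidence) for a read using exact k-mer lookups.

    Confidence here is h1 / (h1 + h2), where h1 and h2 are the hit
    counts of the best and second-best targets (an assumed definition).
    """
    hits = defaultdict(int)
    for kmer in kmers(read, k):
        label = index.get(kmer)
        if label is not None:
            hits[label] += 1
    if not hits:
        return None, 0.0
    ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
    best_label, h1 = ranked[0]
    h2 = ranked[1][1] if len(ranked) > 1 else 0
    return best_label, h1 / (h1 + h2)


if __name__ == "__main__":
    # Toy references; real databases hold whole genomes.
    refs = {"A": "ACGTACGTTTGA", "B": "GGCATGCCAAGT"}
    idx = build_discriminative_index(refs, 4)
    print(classify("ACGTTTGA", idx, 4))
```

A read with no discriminative k-mer in the index is left unassigned
(label None, confidence 0.0) in this sketch.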

Three classifiers (variants) are provided in the CLARK framework:

CLARK (default): designed for powerful workstations; it may require a
significant amount of RAM to run with a large database (e.g., all
bacterial genomes from NCBI/RefSeq). This classifier queries k-mers
with exact matching.

CLARK-l (light): designed for workstations with limited memory; it
provides precise classification on small metagenomes. For metagenomics
analysis, CLARK-l works with a sparse or "light" database (up to 4 GB
of RAM) that is built using distant and non-overlapping k-mers. This
classifier queries k-mers with exact matching.

CLARK-S (spaced): designed for powerful workstations and based on
spaced k-mers; it requires more RAM than CLARK or CLARK-l, but offers
higher sensitivity. CLARK-S completes the CLARK series of classifiers.
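
The difference between the three k-mer schemes mentioned above can be
sketched in a few lines of Python. This is an illustration only: the
function names, the "gap" parameter and the mask strings are
hypothetical and are not taken from CLARK, whose actual sampling
distances and spaced seeds are not described here.

```python
def non_overlapping_kmers(seq, k, gap=0):
    """Sample k-mers that do not overlap, optionally `gap` bases apart.

    A sparse ("light") index stores far fewer k-mers than one built
    from every overlapping k-mer, at the cost of fewer possible hits.
    """
    step = k + gap
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def spaced_kmer(window, mask):
    """Keep the bases of window where mask is '1'; ignore '0' positions."""
    if len(window) != len(mask):
        raise ValueError("window and mask must have the same length")
    return "".join(base for base, m in zip(window, mask) if m == "1")


def spaced_kmers(seq, mask):
    """Apply the mask to every window of len(mask) bases in seq."""
    span = len(mask)
    return [
        spaced_kmer(seq[i:i + span], mask)
        for i in range(len(seq) - span + 1)
    ]


if __name__ == "__main__":
    print(non_overlapping_kmers("ACGTACGTAC", 4))
    # Two windows that differ only at an ignored position give the
    # same spaced k-mer, so a mismatch there does not prevent a hit.
    print(spaced_kmer("ACGTAC", "110011"), spaced_kmer("ACTTAC", "110011"))
```

Because positions marked '0' are ignored, a spaced k-mer can still
match when the read and the reference differ at those positions, which
is one way to understand the higher sensitivity attributed to CLARK-S.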