This is UGENE's (http://ugene.net/) fork of the CLARK tool
(http://clark.cs.ucr.edu/Tool/), which supports building a database
directly from gzip- and 7z-packed RefSeq files.

CLARK: CLAssifier based on Reduced K-mers

The problem of DNA sequence classification is central to several
application domains in molecular biology, genomics, metagenomics and
genetics. The problem is computationally challenging due to the size of
datasets generated by modern sequencing instruments and the growing size
of reference sequence databases.

CLARK is a novel method for supervised sequence classification based on
discriminative k-mers. Unlike most other metagenomic and genomic
classification methods, CLARK provides a confidence score for its
assignments, which can be used in downstream analysis. The utility of
CLARK is demonstrated on two distinct classification problems:

1) the assignment of metagenomic reads to known bacterial genomes
2) the assignment of BAC clones and transcripts to chromosome arms (in
the absence of a finished assembly for the reference genome).

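The idea of discriminative k-mers and a per-assignment confidence score
can be illustrated with a minimal Python sketch. This is not CLARK's
implementation: the function names, the toy sequences, and the
confidence formula (top hit count divided by the sum of the top two hit
counts) are assumptions made for illustration, and the sketch ignores
reverse complements.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_discriminative_index(targets, k):
    """Map each k-mer found in exactly one target to that target.

    k-mers shared by two or more targets carry no discriminative
    signal and are dropped.
    """
    owners = defaultdict(set)
    for name, seq in targets.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    return {
        kmer: next(iter(names))
        for kmer, names in owners.items()
        if len(names) == 1
    }


def classify(read, index, k):
    """Return (best_target, confidence); (None, 0.0) if nothing matches.

    Confidence here is h1 / (h1 + h2), where h1 and h2 are the hit
    counts of the best and second-best targets (an assumed formula).
    """
    hits = Counter(index[kmer] for kmer in kmers(read, k) if kmer in index)
    if not hits:
        return None, 0.0
    ranked = hits.most_common(2)
    best, h1 = ranked[0]
    h2 = ranked[1][1] if len(ranked) > 1 else 0
    return best, h1 / (h1 + h2)


if __name__ == "__main__":
    # Hypothetical toy targets, far shorter than real genomes.
    targets = {"A": "ACGTACGTGG", "B": "TTGCATTGCA"}
    index = build_discriminative_index(targets, k=4)
    print(classify("ACGTACG", index, k=4))
```

Each lookup in the index is an exact match on the full k-mer, so a read
whose k-mers all fall in shared or unseen sequence stays unassigned.
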
Three classifiers (variants) are provided in the CLARK framework:
CLARK (default): created for powerful workstations, it may require a
significant amount of RAM to run with a large database (e.g., all
bacterial genomes from NCBI/RefSeq). This classifier queries k-mers
with exact matching.

CLARK-l (light): created for workstations with limited memory, this
variant provides precise classification on small metagenomes. For
metagenomic analysis, CLARK-l works with a sparse, or "light", database
(up to 4 GB of RAM) that is built from distant, non-overlapping k-mers.
This classifier queries k-mers with exact matching.

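As a rough illustration of why a database built from distant,
non-overlapping k-mers is smaller, the Python sketch below samples
k-mers at a fixed stride instead of at every position. The function
names and the `gap` parameter are hypothetical, and the sampling scheme
CLARK-l actually uses may differ.

```python
def dense_kmers(seq, k):
    """Yield every overlapping k-mer (one per position)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def sparse_kmers(seq, k, gap):
    """Yield non-overlapping k-mers separated by `gap` skipped bases.

    Consecutive start positions are k + gap apart, so no base is
    covered by more than one sampled k-mer.
    """
    if k <= 0 or gap < 0:
        raise ValueError("k must be positive and gap non-negative")
    step = k + gap
    for start in range(0, len(seq) - k + 1, step):
        yield seq[start:start + k]


if __name__ == "__main__":
    seq = "ACGTACGTACGT"  # hypothetical toy sequence
    print(len(list(dense_kmers(seq, 4))), "dense k-mers")
    print(len(list(sparse_kmers(seq, 4, gap=2))), "sparse k-mers")
```

Fewer stored k-mers means less memory, at the cost of fewer
opportunities for a read to hit the database.
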
CLARK-S (spaced): created for powerful workstations and exploiting
spaced k-mers, this classifier requires more RAM than CLARK or CLARK-l,
but it offers higher sensitivity. CLARK-S completes the CLARK series of
classifiers.
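
To show why spaced k-mers can raise sensitivity, here is a minimal
Python sketch that applies a binary mask to each window and keeps only
the positions marked "1". The mask strings and function names are
hypothetical examples, not the seeds CLARK-S uses.

```python
def apply_mask(window, mask):
    """Keep the bases of `window` at positions where `mask` is '1'."""
    if len(window) != len(mask):
        raise ValueError("window and mask must have the same length")
    return "".join(base for base, keep in zip(window, mask) if keep == "1")


def spaced_kmers(seq, mask):
    """Yield the masked form of every window of len(mask) bases."""
    span = len(mask)
    for i in range(len(seq) - span + 1):
        yield apply_mask(seq[i:i + span], mask)


if __name__ == "__main__":
    mask = "110111"       # hypothetical mask; position 2 is ignored
    reference = "ACGTAC"
    read = "ACTTAC"       # differs from the reference at position 2
    # The contiguous 6-mers differ, but the spaced k-mers agree.
    print(reference == read)
    print(apply_mask(reference, mask) == apply_mask(read, mask))
```

Because a mismatch at an ignored position does not change the masked
k-mer, a read carrying that mismatch can still match the database.
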