References:

======================
File formats


The introduction of the SAM/BAM format and the samtools command line tool:

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup, The Sequence alignment/map (SAM) format and SAMtools, Bioinformatics (2009) 25(16) 2078-9 [19505943]


Extension of the SAM/BAM format to support de novo assemblies:

Cock PJA, Bonfield JK, Chevreux B, Li H, SAM/BAM format v1.5 extensions for de novo assemblies, bioRxiv (2015) 020024 [doi:10.1101/020024]


The introduction of the CRAM format:

Hsi-Yang Fritz M, Leinonen R, Cochrane G, and Birney E, Efficient storage of high throughput DNA sequencing data using reference-based compression, Genome Research (2011) 21(5) 734-740


The introduction of the VCF format:

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group, The variant call format and VCFtools, Bioinformatics (2011) 27(15) 2156-8

======================
Calling and analysis


The original mpileup calling algorithm plus mathematical notes (mpileup/bcftools call -c):

Li H, A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data, Bioinformatics (2011) 27(21) 2987-93
Li H, Mathematical Notes on SAMtools Algorithms (2010)


Mathematical notes for the updated multiallelic calling model (mpileup/bcftools call -m):

Danecek P, Schiffels S, and Durbin R, Multiallelic calling model in bcftools (-m) (2014)


Hidden Markov model for detecting runs of homozygosity (bcftools roh):

Narasimhan V, Danecek P, Scally A, Xue Y, Tyler-Smith C, and Durbin R, BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data, Bioinformatics (2016) 32(11) 1749-51


Copy number variation/aneuploidy calling from microarray data (bcftools cnv/bcftools polysomy):

Danecek P, McCarthy SA, HipSci Consortium, and Durbin R, A Method for Checking Genomic Integrity in Cultured Cell Lines from SNP Genotyping Data, PLoS One (2016) 11(5) e0155014


Haplotype-aware calling of variant consequences (bcftools csq):

Danecek P, McCarthy SA, BCFtools/csq: Haplotype-aware variant consequences, Bioinformatics (2017) 33(13) 2037-39

======================
Other


Base alignment quality (BAQ) method to improve SNP calling around INDELs:

Li H, Improving SNP discovery by base alignment quality, Bioinformatics (2011) 27(8) 1157-8


Segregation based QC metric originally implemented in SGA:

Durbin R, Segregation based metric for variant call QC (2014)