I have a set of illumina paired end reads (rone.fq and rtwo.fq), and I have mapped those reads to a reference (ref.fa).

bwa aln ref.fa rone.fq > rtwo.sai
bwa aln ref.fa rtwo.fq > rtwo.sai
bwa sampe ref.fa rone.sai rtwo.sai rone.fq rtwo.fq > aln.sam
#convert sam to bam
samtools view -bS aln.sam > aln.bam
#sort bam file
samtools view aln.bam aln_sorted

Now I am trying to get SNP and INDEL information using the following script

samtools mpileup -ugf ref.fa aln_sorted.bam >aln.bcf
#convert to vcf for viewing
bcftools view aln.bcf > aln.vcf

I get the following VCF (this is just one line of the VCF I actually get)

##fileformat=VCFv4.1
##samtoolsVersion=0.1.17 (r973:277)
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  aln_sorted.bam
gi|110227054|gb|AE004091.2| 32  .   A   C,X 0   .   DP=14;I16=9,1,1,0,352,12854,23,529,600,36000,29,841,159,3395,21,441 PL  0,10,189,30,192,202

I am assuming that samtools thinks this is a SNP, because the REF=A and ALT=C,X. My question is: How can I know the coverage depth of this SNP given this VCF? If this VCF does not have information enough to show the SNP coverage depth then how can I creat a VCF that shows that information? Thanks

asked 21 May '12, 17:12

Customer's gravatar image

Customer
295
accept rate: 0%


right from the VCF spec:

DP combined depth across samples, e.g. DP=154.

Samtools depth reports raw coverage. In the VCF file DP reports the coverage of all reads at that loci (that pass the quality filter). So 'Samtools depth' gives a different statistics than the DP field.

There is also information about the depth on each strand supporting the ref / alt.

IT is worth your time to carefully read the documentation for all the Samtools steps

VCF specs

link

answered 21 May '12, 17:12

SupportRep's gravatar image

SupportRep
2314
accept rate: 32%

edited 21 May '12, 17:12

I dont quite understand the DP value. For example if DP=14, does that mean that 14 reads support that polymorphism? If so 14 out of how many? I am confused because in position where samtools does not find SNPs the DP value would just mean the read coverage of that position. Thanks

(21 May '12, 17:13) Customer

Nope DP=14 means you have 14 reads that cover gi|110227054. The way you used samtools it will only report information on those loci that have a suggested mutation. use 'Samtools depth' if you want to know the depth for each loci

(21 May '12, 17:13) SupportRep

SO I ran "samtools depth aln_sorted.bam>aln_depth" and the info I get about position 32 is "gi|110227054|gb|AE004091.2| 32 33". So from what you are saying I get that the coverage of that base is 32X and the reads that support the SNP is 14? Thanks so much again

(21 May '12, 17:13) Customer

see my edits above.

(21 May '12, 17:13) SupportRep

Ok I get it. Why do you say it is an insertion? I thought that was a substitution (because of the comma)

(21 May '12, 17:13) Customer
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "Title")
  • image?![alt text](/path/img.jpg "Title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Tags:

×4
×4
×2
×1
×1

Asked: 21 May '12, 17:12

Seen: 594 times

Last updated: 21 May '12, 17:13

powered by OSQA