We will be getting data from human whole exome sequencing done on Illumina GAIIx. I am planning to put together a pipeline to align (BWA?)(optional as we may get the sequence already aligned)|variant detection (Samtools?)|variant classification-deleteriousness prediction(something like SIFT/PolyPhen)|. I know that there are various options out there but I would like to know based on your experiences what the best collection of these tools is? If you had the chance to start from scratch what kind of a pipeline would you put together? I also know that big sequencing centers have their in house tool sets for these for squeezing the last drop but what is available to the general public is what I am looking for.
asked 21 May '12, 17:03
Here is our basic pipeline. We are novices, but this seems to work for us. Please investigate all of the tools carefully, as (I'll repeat) I am not an expert in this analysis. Note that I am using mostly default values for the analysis; advice I gotten from experienced hands is it's routine to use the intersection of multiple SNP calling methods, and that indel calling is still an art. Note that this is set up for single reads, not paired reads, but the same basic pipeline should apply.
FOR EACH SAMPLE:
answered 21 May '12, 17:04
My this answer better serves as a comment to David Quigley's, which is great, but to emphasize the recent improvements, I still give it as separate answer.
Between step 7 and 8, it is recommended to add an additional step:
This will substantially improve SNP specificity.
A few other comments to David Quigley's pipeline are:
answered 21 May '12, 17:06