Hi all, I am preparing a course on NGS: there will be seven students for 4 hours, and I want them to play with some NGS data. No programming skills are required.

Here is what I plan to do:

  • short intro to NGS
  • structure of a (small) FASTQ file
  • map it with BWA on public Galaxy http://main.g2.bx.psu.edu/
  • index the genome and map the fastqs with MAQ
  • index the genome and map the fastqs with BWA
  • structure of a SAM file
  • GATK recalibration (?)
  • call the SNPs with samtools pileup and generate a VCF
  • explore a BAM file with samtools tview
  • find the rs## at UCSC (table browser or MySQL)
  • predict the consequences of a set of SNPs with PolyPhen-2 (btw, is there a way to generate a random FASTQ file with a set of 'forced' mutations?)
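
On the 'forced' mutations question, here is a minimal Python sketch of one way it could be done, assuming a toy random reference, error-free reads, and a constant base quality. The function name `simulate_fastq` and the `{position: alt_base}` input format are made up for illustration; this is not an existing tool.

```python
import random

def simulate_fastq(reference, forced_snps, n_reads=20, read_len=30, seed=0):
    """Return FASTQ text for reads sampled from `reference` after forcing SNPs.

    forced_snps maps a 0-based position to the alternate base
    (a hypothetical input format chosen for this sketch).
    """
    rng = random.Random(seed)
    mutated = list(reference)
    for pos, alt_base in forced_snps.items():
        mutated[pos] = alt_base
    mutated = "".join(mutated)

    records = []
    for i in range(n_reads):
        start = rng.randint(0, len(mutated) - read_len)
        seq = mutated[start:start + read_len]
        # 'I' is Phred quality 40 in the Phred+33 encoding; no errors are simulated.
        qual = "I" * read_len
        # A FASTQ record is four lines: @name, sequence, '+', qualities.
        records.append(f"@read{i}_pos{start}\n{seq}\n+\n{qual}\n")
    return "".join(records)

# Toy 200 bp random reference (an assumption; a real genome would be read from FASTA).
ref_rng = random.Random(1)
reference = "".join(ref_rng.choice("ACGT") for _ in range(200))

# Force a SNP at 0-based position 100, picking a base that differs from the reference.
alt = "A" if reference[100] != "A" else "C"
fastq = simulate_fastq(reference, {100: alt})
print(fastq[:130])
```

Every read overlapping position 100 then carries the forced base, so students can check whether the SNP caller recovers it. The output also doubles as a small example of the four-line FASTQ record structure.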

My other ideas:

  • running something in the cloud: do you know if there is a way to run something for free on Amazon? What kind of analysis could I run?
  • storing something (the VCF?) in a database (MySQL? SQLite3?) and using Rails to display the data
  • building a tool that uses a web service (SOAP/REST...): what service could I use for this course?
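
For the database idea, a minimal sketch of loading a VCF into SQLite with Python's standard `sqlite3` module could look like the following. The two variant records and the table layout are made up for illustration, and only the first six VCF columns are kept.

```python
import sqlite3

# Made-up VCF content for illustration only; columns are tab-separated.
VCF_TEXT = (
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\t.\tA\tG\t50\tPASS\tDP=20\n"
    "chr1\t2000\t.\tC\tT\t30\tPASS\tDP=12\n"
)

def load_vcf(conn, vcf_text):
    """Insert the first six columns of each VCF record into a `variant` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variant "
        "(chrom TEXT, pos INTEGER, id TEXT, ref TEXT, alt TEXT, qual REAL)"
    )
    rows = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines and the header
        chrom, pos, vid, ref, alt, qual = line.split("\t")[:6]
        # Assumes QUAL is numeric; a real VCF may use '.' for a missing value.
        rows.append((chrom, int(pos), vid, ref, alt, float(qual)))
    conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
n_loaded = load_vcf(conn, VCF_TEXT)
high_quality = conn.execute(
    "SELECT chrom, pos, ref, alt FROM variant WHERE qual >= 40 ORDER BY pos"
).fetchall()
print(n_loaded, high_quality)
```

An in-memory database keeps the exercise self-contained; pointing `sqlite3.connect` at a file path would persist the table for a web front end to read.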

Any other suggestions? What would you like to see in this course?

I'll accept the answer with the highest number of votes next week.

Thanks,

Pierre

EDIT: the course should give them the opportunity to see what the work of someone working with NGS looks like and to get some experience with real data. I don't know their skill level but, AFAIK, they are supposed to have some programming courses later.

My only experience is with the analysis of "exome capture" data, i.e. SNP calling.

Update: I posted my slides on slideshare: http://www.slideshare.net/lindenb/20101210-ngscourse

asked 21 May '12, 16:40

Customer


Titus Brown at Michigan State University has run a course on Analyzing Next Generation Sequencing Data and, as the link shows, he has built an amazing resource around it.

His tutorials might give you a good sense of which topics to include and what level of detail may be appropriate.

answered 21 May '12, 16:40

SupportRep

And of course, drop me a note if you discover bugs, problems, etc. Feel free to re-use the material as much as you want.

(21 May '12, 16:40) SupportRep
  1. Drop MAQ, as it is no longer as widely used as it once was.
  2. Choose between Galaxy and the command line, depending on which suits them best. It should not be hard to find a consensus among 7 students. The same goes for cloud computing.
  3. I do not work with raw data now. I would guess Illumina base quality is better than it used to be. In that case, I am not sure recalibration is absolutely necessary.
  4. Introduce IGV instead of samtools' tview. Although tview is useful in a few scenarios, IGV is in general more powerful and user-friendly. IGV works with VCF, so there is no need to set up a database or services.
  5. I know SQL well, but except for setting up a serious web server, I never use it. SQL is overkill here. Graphical viewers and UCSC custom tracks are much more convenient.
  6. Mention Picard and GATK, which are both great packages.
  7. I do not know how seriously duplicates affect results, but you should mention that this is a potential concern.
  8. As others suggested, it would be good to introduce ChIP-seq/RNA-seq and the discovery of structural variations, even if these are not the main focus.
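
To make the duplicates point concrete for students, here is a rough Python sketch of the basic idea: mapped reads that share the same reference, position and strand are candidate duplicates. The SAM lines are made up, and the rule is deliberately simplified; dedicated tools such as Picard take more information into account (for example clipping and mate positions).

```python
from collections import defaultdict

# Made-up, tab-separated SAM alignment lines for illustration.
# Mandatory fields: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
SAM_LINES = [
    "r1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",   # same place and strand as r1
    "r3\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",  # same place, reverse strand
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",          # unmapped read
]

def find_duplicates(sam_lines):
    """Group mapped reads by (reference, position, strand).

    Returns the read names beyond the first in each group, i.e. candidate
    duplicates under this simplified rule.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:
            continue  # FLAG bit 0x4: read is unmapped
        strand = "-" if flag & 0x10 else "+"  # FLAG bit 0x10: reverse strand
        groups[(rname, pos, strand)].append(qname)
    return {key: names[1:] for key, names in groups.items() if len(names) > 1}

duplicates = find_duplicates(SAM_LINES)
print(duplicates)
```

The same loop is also a compact way to show what the first few columns of a SAM line mean.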

It is important to let students play with real or simulated data.

EDIT: a further comment:

I used to give a two-hour course on variant discovery. I gave each attendee a tar-ball which includes a bacterial genome (S. suis), a variant/read simulator (wgsim), a mapper (bwa), a SNP caller (samtools) and a few scripts. It is only a couple of MB in size, suitable for email. With these, one can do simulation, mapping, SNP calling, visualization and evaluation, nearly the entire pipeline. I have lost the tar-ball, though.
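
The evaluation step of such a simulation exercise can be sketched in a few lines of Python: compare the called SNPs with the simulated (true) ones and count the overlap. The variant tuples below are made up, and representing a SNP as `(chrom, pos, ref, alt)` is an assumption of this sketch, not the format used by the scripts in the lost tar-ball.

```python
def evaluate_calls(truth, called):
    """Compare called variants with simulated truth; both are iterables of tuples."""
    truth, called = set(truth), set(called)
    tp = len(truth & called)   # simulated variants that were called
    fp = len(called - truth)   # calls that were never simulated
    fn = len(truth - called)   # simulated variants that were missed
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}

# Made-up variants for illustration: (chrom, 1-based position, ref, alt).
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr1", 400, "G", "A")}
called = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr1", 777, "T", "C")}

result = evaluate_calls(truth, called)
print(result)
```

With simulated data the truth set is known exactly, which is what makes this kind of check possible in a short course.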

answered 21 May '12, 16:42

SupportRep

UCSC custom track is another nice idea. Thanks

(21 May '12, 16:42) Customer

There are also a number of existing tracks--especially in the ENCODE data--that show NGS data types. A survey of a few of those might help people to grasp some of the aspects and challenges. But I think we've already gone way over 4 hours...

(21 May '12, 16:42) SupportRep
Asked: 21 May '12, 16:40

Seen: 665 times

Last updated: 21 May '12, 16:42