HUSAR Bioinformatics Lab
Deutsches Krebsforschungszentrum Genomics Proteomics Core Facility
Online Exercises for the Advanced Course on HUSAR:

Content:

  1. Principles of Pairwise Sequence Comparison
  2. Methods of Sequence Alignment
  3. Multiple Alignments
  4. Gene Prediction I
  5. Gene Prediction II
  6. Phylogeny I
  7. Phylogeny II
  8. Phylogeny III
  9. Database Searching I
  10. Database Searching II
  11. Protein Analysis I
  12. Protein Analysis II - Sorry, but this part is still under construction!
  13. Fragment Assembly - Sorry, but this part is still under construction!


1. Principles of Pairwise Sequence Comparison

  1. Comparing two homologous sequences:
    Run the HUSAR application GAP to compare the human sequence SW:P00846 with the fungal sequence SW:Q36918. Both sequences are chains of the H+-ATPase (proton pump), which converts energy into ATP in the mitochondria and chloroplasts of aerobic organisms. Run GAP a second time on the same two sequences, this time using the substitution matrix PAM250. Why do you get different results?
  2. Comparing two non-homologous sequences:
    Run the HUSAR application GAP to compare the human ATPase sequence SW:P00846 with SW:O21326 (NADH oxidoreductase).
  3. Compare the sequences se57 and se57.chrom using the programs SIM4 and BL2SEQ. What differences do you see between the outputs?
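GAP is based on the Needleman-Wunsch global alignment algorithm. The minimal sketch below shows why exercises 1.1 and 1.2 behave as they do: the optimal alignment and its score depend on the scoring scheme, and a raw score means little until it is compared with scores of shuffled sequences. Assumptions: a linear gap penalty (GAP itself uses separate gap-creation and gap-extension penalties), end gaps penalized like internal gaps, and two toy scoring functions; `CLASSES` is a hypothetical stand-in for a real matrix such as PAM250, not its actual values. All function names are illustrative.

```python
import random
import statistics
from typing import Callable, List, Tuple

# A scoring function maps a residue pair to a substitution score.
Scorer = Callable[[str, str], int]


def needleman_wunsch(a: str, b: str, score: Scorer, gap: int) -> Tuple[int, str, str]:
    """Global alignment with a linear gap penalty (simplified, toy scoring)."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                           dp[i - 1][j] - gap,
                           dp[i][j - 1] - gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] - gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def identity_score(x: str, y: str) -> int:
    # Toy scheme: +2 for identical residues, -1 otherwise.
    return 2 if x == y else -1


# Hypothetical residue classes: a crude stand-in for a real substitution matrix.
CLASSES = ["ILVM", "FWY", "KRH", "DE", "ST", "NQ"]


def class_score(x: str, y: str) -> int:
    # Identical: +2, same hypothetical class: +1, otherwise -1.
    if x == y:
        return 2
    if any(x in c and y in c for c in CLASSES):
        return 1
    return -1


def shuffle_z_score(a: str, b: str, score: Scorer, gap: int,
                    trials: int = 100, seed: int = 0) -> float:
    """How many standard deviations the real score lies above shuffled-b scores."""
    rng = random.Random(seed)
    real = needleman_wunsch(a, b, score, gap)[0]
    shuffled = []
    for _ in range(trials):
        letters = list(b)
        rng.shuffle(letters)  # same composition, random order
        shuffled.append(needleman_wunsch(a, "".join(letters), score, gap)[0])
    sd = statistics.pstdev(shuffled)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; use more trials")
    return (real - statistics.mean(shuffled)) / sd


if __name__ == "__main__":
    for name, scorer in (("identity", identity_score), ("classes", class_score)):
        s, top, bottom = needleman_wunsch("MKVLITAG", "MRILVTAG", scorer, gap=2)
        print(name, s)
        print(top)
        print(bottom)
```

Changing only the scoring function can change both the score and the alignment itself, which is the effect you see when switching GAP to PAM250. A high z-score suggests that the similarity is unlikely to arise from composition alone, which is one way to think about the non-homologous pair in exercise 1.2.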

 


2. Methods of Sequence Alignment

  1. Copy the sequence of rhodopsin_frag.sw to your home directory.
  2. Use the HUSAR application GAP to compare rhodopsin_frag.sw with the remote sequence SW:Q17292.
  3. Use the HUSAR application BESTFIT to compare rhodopsin_frag.sw with the remote sequence SW:Q17292.
  4. Use the HUSAR application SIMILARITY to compare rhodopsin_frag.sw with the remote sequence SW:Q17292.
  5. Compare the results of the three applications.
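GAP computes a global alignment (Needleman-Wunsch), while BESTFIT computes a local one (Smith-Waterman). When one sequence is only a fragment, as rhodopsin_frag.sw is, this difference dominates the results. The sketch below computes both scores with the same toy parameters (match +2, mismatch -1, linear gap penalty 2; all assumptions) so that the effect of the overhanging ends is visible. It penalizes end gaps like internal gaps, which may differ from GAP's own end-gap handling; the sequences are hypothetical.

```python
def align_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap: int = 2, local: bool = False) -> int:
    """Global (Needleman-Wunsch) or local (Smith-Waterman) alignment score
    with a linear gap penalty, computed row by row to save memory."""
    # Row 0: a global alignment pays for leading gaps, a local one does not.
    prev = [0] * (len(b) + 1) if local else [-gap * j for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else -gap * i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            v = max(prev[j - 1] + s, prev[j] - gap, cur[j - 1] - gap)
            if local:
                v = max(v, 0)  # a local alignment may start afresh anywhere
                best = max(best, v)
            cur[j] = v
        prev = cur
    # Global: score of the bottom-right cell; local: best cell anywhere.
    return best if local else prev[-1]


fragment = "WGHEE"            # hypothetical fragment
full = "PAWHEAEWGHEEKLMNP"    # hypothetical full-length sequence
print("global:", align_score(fragment, full))
print("local: ", align_score(fragment, full, local=True))
```

The fragment matches the full sequence perfectly, yet the global score is strongly negative because every overhanging residue must be aligned to a gap; the local score reflects only the matching segment.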

 


3. Multiple Alignments

  1. Use Clustal to create a multiple alignment of all swissprot entries whose names start with hba1_. These are all sequences of heat shock proteins.
  2. Fetch the following sequences from the EMBL database: X75263, X74604, X77199. Because these sequences differ considerably in length, use MALIGN to perform a multiple global alignment.
  3. Align the same three sequences with MUSCLE (parameter max iterations = 4) to obtain a second multiple global alignment. Use fetch to read the annotation of the Hsc70 coding sequence (CDS), and compare how the coding part of Hsc70 is aligned by MALIGN and by MUSCLE.
  4. Whereas the Smith-Waterman algorithm finds only the single best local alignment, dialign can include several regions of similarity in one alignment. This matters, for example, when coding regions of DNA sequences are separated by introns. Fetch the following sequences from the swissprot database: hba_human, hbb_human, hbg_human, hbrh_chick, glb_aplku, glb_nasmu, and run dialign on them.
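To compare two alignments of the same sequences, such as the MALIGN and MUSCLE results of exercise 3.3, one simple objective measure is the sum-of-pairs score: every pair of residues in every column is scored and the totals are added. The sketch below uses assumed toy values (match +1, mismatch -1, residue-gap -2, gap-gap 0); real programs use substitution matrices and affine gap penalties, so its numbers are only for relative comparison.

```python
from itertools import combinations
from typing import Sequence


def sum_of_pairs(rows: Sequence[str], match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> int:
    """Sum-of-pairs score of a multiple alignment given as equal-length rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*rows):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # gap-gap pairs conventionally score 0
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


# Hypothetical toy alignments of three short sequences.
print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
print(sum_of_pairs(["ACGT", "ACGT", "ACGT"]))
```

A higher score under the same parameters indicates the better alignment by this criterion, but biologically meaningful comparisons (for example of the Hsc70 coding region) still require inspecting the columns themselves.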

 


4. Gene Prediction I

  1. Copy our testgene sequence to your home directory. Use it as the test sequence in the following exercises where indicated.

 


5. Gene Prediction II

  1. Which gene prediction tools within HUSAR can you use to analyse our testgene?
  2. Which gene prediction tools outside HUSAR can you use to analyse our testgene?
  3. Use the program FGENES to analyse our testgene sequence.
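Gene finders such as FGENES use statistical models of exons, introns and splice signals. A useful point of comparison is a naive six-frame open reading frame (ORF) scan, sketched below. This is a much cruder, swapped-in technique, not what FGENES does. Assumptions: only ATG is accepted as a start codon, TAA/TAG/TGA are stops (standard genetic code), introns are ignored, and the `min_codons` threshold is arbitrary. The input must be a plain nucleotide string without a FASTA header.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for ATG..stop ORFs of at least
    min_codons codons (stop codon excluded). Coordinates are 0-based,
    end-exclusive; minus-strand coordinates refer to the reverse complement."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        hits.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG in this frame
    return hits


# Hypothetical toy sequence; real genes need a far larger threshold.
print(find_orfs("CCATGAAATTTTAGCC", min_codons=3))
```

In a eukaryotic gene with introns, such a scan typically reports fragments of the coding sequence at best, which is one reason dedicated gene prediction programs are needed.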

 


6. Phylogeny I

  1. Given the four aligned sequences in alignedseq, which sites are phylogenetically informative? Using only the informative sites, assign a score to each possible tree topology.
  2. Get the datasets hominid.parstrees and hominid.mltrees. They contain the 15 possible unrooted trees for the 5 sequences in the dataset. Compute scores for all tree topologies with ednapars and ednaml; you will need to set the user tree option in both programs. Do the same tree topologies come out best under parsimony and under maximum likelihood?
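The parsimony part of exercise 1 can be sketched as follows. A site is parsimony-informative if at least two character states each occur in at least two sequences. Each topology is then scored with Fitch's algorithm, a standard way of counting the minimum number of changes on a tree. The four taxa A to D and their toy alignment are hypothetical (not the contents of alignedseq). Uninformative sites add the same number of changes to every topology, which is why they can be ignored. The helper `n_unrooted_trees` checks the count of 15 unrooted trees for 5 sequences mentioned in exercise 2, using the formula (2n-5)!!.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def informative_sites(seqs: Dict[str, str]) -> List[int]:
    """Sites where at least two states each occur in at least two taxa."""
    length = len(next(iter(seqs.values())))
    sites = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs.values())
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(i)
    return sites


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm: candidate state set and minimum number of changes."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, seqs: Dict[str, str], sites: List[int]) -> int:
    """Total number of changes over the given sites."""
    return sum(fitch(tree, {t: s[i] for t, s in seqs.items()})[1] for i in sites)


def n_unrooted_trees(n: int) -> int:
    """(2n-5)!! distinct unrooted binary trees for n >= 3 taxa."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


# The three unrooted topologies for four taxa, each written as a rooted tree;
# the Fitch length does not depend on where an unrooted tree is rooted.
TOPOLOGIES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}

seqs = {"A": "ACGTTA", "B": "ACGTTG", "C": "ATGCAG", "D": "ATGCAA"}  # hypothetical
sites = informative_sites(seqs)
scores = {name: tree_length(t, seqs, sites) for name, t in TOPOLOGIES.items()}
print(sites, scores, n_unrooted_trees(5))
```

The topology with the fewest changes is the most parsimonious one; ties between topologies are possible and are worth looking for in your own data.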

 


7. Phylogeny II

  1. Given the distance matrices in examplematrix, which are additive and which are ultrametric?
  2. Calculate uncorrected distances and Jukes-Cantor corrected distances for the aligned sequences in reformat.msf. What differences do you see between the two matrices?
  3. Run Clustree with bootstrapping on reformat.msf (see exercise 2). What do the bootstrap values mean?
  4. Run the program splits on the uncorrected distance matrix from exercise 2. Compare the resulting picture with your bootstrap result from exercise 3.
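Exercises 1 and 2 can be checked with a few lines of code. A matrix is ultrametric if, in every triple of taxa, the two largest distances are equal (three-point condition); it is additive if, for every four taxa, the two largest of the three pairwise sums d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) are equal (four-point condition). Every ultrametric matrix is also additive. The Jukes-Cantor correction is d = -3/4 ln(1 - 4p/3), where p is the uncorrected (p-) distance. Assumptions in this sketch: matrices are symmetric metrics, gap positions are skipped when computing p, the tolerance `tol` is arbitrary, and the example matrices are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: fraction of differing sites, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) sites")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance d = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # correction undefined: the sequences look saturated
    return -0.75 * math.log(1 - 4 * p / 3)


def is_ultrametric(d: Matrix, tol: float = 1e-9) -> bool:
    """Three-point condition: in every triple the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        x = sorted((d[i][j], d[i][k], d[j][k]))
        if x[2] - x[1] > tol:
            return False
    return True


def is_additive(d: Matrix, tol: float = 1e-9) -> bool:
    """Four-point condition: the two largest of the three pair sums are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        s = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        if s[2] - s[1] > tol:
            return False
    return True


# Hypothetical matrix from a clock-like tree: ultrametric and additive.
clock = [[0, 2, 4, 6], [2, 0, 4, 6], [4, 4, 0, 6], [6, 6, 6, 0]]
print(is_ultrametric(clock), is_additive(clock))
p = p_distance("ACGTACGTAC", "ACGTACGTTT")
print(p, jukes_cantor(p))
```

Because the correction always satisfies d > p for 0 < p < 0.75 and grows rapidly as p approaches 0.75, the corrected matrix differs most from the uncorrected one for the most divergent pairs.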

 


8. Phylogeny III

  1. Run the task path on reformat.msf. Is this data set suitable for phylogenetic analysis?
  2. Run the task path on hs11trimmed.msf. Is this data set suitable for phylogenetic analysis? What problems do you see?
  3. Question 3
  4. Question 4

 


9. Database Searching I

  1. Search the database human with embl:u50532, once using blastn2 and once using fasta, and compare the results. What are the differences, and why?
  2. Running Fasta with a group of sequences: use the following sequences as input:

    sw:CYAA_ERWCH
    sw:CYAA_HAEIN
    sw:CYAA_PASMU
    sw:CYAA_YERIN
    sw:CYAA_YERPE

  3. Getting an overview of sequence relationships: look up the W2HID number of the previous fasta run (shown on the Result page in W2H Advanced mode) and start the program ALITAB with this number.

 


10. Database Searching II

  1. Find the sequences in the nrpep database that contain the pattern QWERT.
  2. Use clustal to create a multiple alignment of the sequences swissprot:hs11_*. Use the resulting MSF file to create a profile, and search the swissprot database with this profile.
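What a pattern search does in exercise 1 can be sketched in a few lines: scan every sequence for the motif and report where it occurs. The sketch assumes the database is available as FASTA text and uses Python's `re` module, so the pattern is a regular expression (a literal string such as QWERT works unchanged; PROSITE-style patterns would first have to be translated). Record names, the example data, and the helper names are hypothetical, and this is not how HUSAR's own search programs are implemented.

```python
import re
from typing import Dict, List, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record name: sequence}; the name is the first
    word after '>'."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {k: "".join(v) for k, v in records.items()}


def find_pattern(seqs: Dict[str, str], pattern: str) -> List[Tuple[str, int, str]]:
    """Return (record, 1-based start, matched text) for every non-overlapping hit."""
    regex = re.compile(pattern)
    hits = []
    for name, seq in seqs.items():
        for m in regex.finditer(seq):
            hits.append((name, m.start() + 1, m.group()))
    return hits


example = ">p1 test\nMKQWERTAA\n>p2\nMKLLL\n>p3\nQWERTQWERT\n"  # hypothetical data
print(find_pattern(read_fasta(example), "QWERT"))
```

Positions are reported 1-based, as is usual for sequence coordinates. For a large database such as nrpep you would stream the file record by record instead of loading it into memory at once.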

 


11. Protein Analysis I

  1. Copy the sequence of unknown_prot.pep to your home directory.

 


12. Protein Analysis II - Sorry, but this part is still under construction!


     


13. Fragment Assembly - Sorry, but this part is still under construction!
