BLyS Sequence Analysis

07 Mar

I’ve been playing with some sequence analysis and phylogenetic tree construction programs recently because I would like to introduce these sorts of data analysis into my biology classes. As a sample protein, I decided to use BLyS / BAFF, a protein important in regulating B cell numbers. I’ve wondered about the origin of this kind of molecule ever since I worked on it in grad school, and this seemed like a decent way to get some ideas about where it might come from.

The first thing I did was go to the NIH’s National Library of Medicine website:

It’s easy to search for any protein / gene / whole genome you are interested in examining. Knowing that BLyS is vital in humans and mice, I chose to start with the human sequence. I retrieved it as the following:

>gi|20196464|dbj|BAB90856.1| BLyS [Homo sapiens]

The easiest tool for finding similar proteins in other animals is the Basic Local Alignment Search Tool for proteins, or BLASTp. Using just the default settings, I entered the query in the search field and hit go. (Note: I actually used the accession number rather than pasting in the whole sequence.)
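For anyone who wants to pull the accession number out of a header line like the one above programmatically, here is a minimal sketch. It assumes the older pipe-delimited NCBI header style shown in this post; the function name `parse_defline` is my own invention and not part of any NCBI tool.

```python
def parse_defline(defline):
    """Split an old-style, pipe-delimited FASTA header into labelled parts.

    Assumes the layout '>db|id|db|id| description [organism]', as in the
    BLyS header above. Other header styles are not handled.
    """
    text = defline.lstrip(">").strip()
    # Everything before the last '|' is identifiers; the rest is free text.
    ids, _, description = text.rpartition("|")
    fields = ids.split("|") if ids else []
    # Identifiers come in pairs: a database tag, then the identifier itself.
    pairs = dict(zip(fields[0::2], fields[1::2]))
    description = description.strip()
    organism = None
    if description.endswith("]") and "[" in description:
        description, _, organism = description[:-1].rpartition("[")
        description = description.strip()
    return {"ids": pairs, "description": description, "organism": organism}


header = ">gi|20196464|dbj|BAB90856.1| BLyS [Homo sapiens]"
record = parse_defline(header)
accession = record["ids"]["dbj"]  # the value that can be typed into the BLASTp query box
print(accession, "|", record["description"], "|", record["organism"])
```

The accession is the piece that matters for the search; the description and organism are handy later for labelling sequences.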


This retrieved tons of proteins with similar sequences from the vast database of sequence information, from which I chose several model species. One thing I wanted to do was include several primates as a sort of internal calibration (assuming that they would all have very similar sequences compared to more distantly related species). I also wanted sequences from a few animals that are quite distantly related to humans (frog and ground tit fit that bill).

Once I had a list, I put all the sequences into a single text file and used that file as the input to a second program. For this step, I decided that the best ‘multiple alignment tool’ would be CLUSTALX. It’s been around for a while and can output data in a number of different formats. Besides, it’s free, and versions are available for both Mac and PC.
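Building that single text file by hand works fine for a handful of sequences, but it can also be scripted. The sketch below assumes the sequences are already in hand as (name, sequence) pairs and writes them in FASTA format, one of the formats Clustal reads. The function name `to_fasta` is hypothetical, and the sequences are made-up placeholders, not real BLyS data.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as one multi-sequence FASTA string."""
    chunks = []
    for name, sequence in records:
        # Keep labels to a single word so they survive as tree leaf names.
        label = name.strip().replace(" ", "_")
        sequence = sequence.upper()
        # Wrap the sequence to a fixed line width for readability.
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        chunks.append(">" + label + "\n" + "\n".join(lines) + "\n")
    return "".join(chunks)


# Placeholder sequences for illustration only.
toy_records = [
    ("Homo sapiens", "MKTAYIAK"),
    ("Canis familiaris", "MKTAYLAK"),
]
fasta_text = to_fasta(toy_records, width=5)
print(fasta_text)
```

Writing `fasta_text` to a file gives something that can be opened directly in the alignment program.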

Again, for starters, I just accepted the default parameters and did a quick alignment:


Obviously, there’s something odd about the Canis familiaris (dog) sequence, but before I did anything about that, I just wanted to see what a phylogenetic tree looked like. This is another thing that Clustal does well: it will export your sequence alignment as tree data in a number of formats, which I could then plug into one final program. That last one is a web-based program that I access through a French site (but you can probably find it in a number of places). The program is called DRAWGRAM. It accepts the tree data and outputs a graphical tree representation of the relationships among the aligned sequences.

This is an important logical step… What I’m doing is asking for a family tree of sorts to be displayed that represents the relationship of the sequences I provided. We might want to assume that this also tells us how related the organisms that have these proteins are – and that’s not wrong, but it’s also not thorough as we’re only using ONE protein to make that assumption.
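To make the idea of "relationship of the sequences" concrete, here is a rough sketch of one simple measure a tree can be built from: the fraction of aligned positions at which two sequences differ (often called the p-distance). This is a simplified illustration, not the exact calculation Clustal performs, and the aligned sequences below are invented placeholders.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues, skipping columns where either has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gapped columns carry no substitution information here
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else 0.0


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of {name: aligned_sequence}."""
    names = list(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Invented toy alignment, for illustration only.
toy_alignment = {
    "human": "MKTAYIAK",
    "chimp": "MKTAYIAK",
    "frog": "MRTSYLAK",
}
matrix = distance_matrix(toy_alignment)
for pair, d in matrix.items():
    print(pair, d)
```

In this toy case the two primates come out identical and the frog sits further away, which is the pattern the primate "internal calibration" is meant to check for. It is still a statement about one protein, not about the whole organisms.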

Here’s my first tree:


Note how isolated Canis is on this representation.

Finally, I went back and truncated the Canis sequence to the place where I suspect the protein actually starts. The sequence I got from the NCBI has a string of amino acids at the front of the protein that I think are probably not really there, and were instead added by some computer algorithm without proper human oversight.
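The trimming itself is easy to do in a text editor, but a small sketch shows the logic: list the methionines that could be the true start, pick one, and drop everything before it. Choosing which methionine is right is a judgment call made by looking at the alignment; the code does not decide that. The function names and the sequence are hypothetical placeholders, not the real dog sequence.

```python
def candidate_starts(sequence):
    """Return the 0-based positions of every methionine (M) in the sequence."""
    return [i for i, residue in enumerate(sequence) if residue == "M"]


def trim_to_start(sequence, start_index):
    """Drop all residues before start_index, which must be a methionine."""
    if sequence[start_index] != "M":
        raise ValueError("chosen start position is not a methionine")
    return sequence[start_index:]


# Made-up sequence with five extra residues in front of the suspected start.
toy_dog = "AAPLRMKTAYMAK"
starts = candidate_starts(toy_dog)
trimmed = trim_to_start(toy_dog, starts[0])
print(starts, trimmed)
```

Giving the trimmed sequence a new label before re-running the alignment makes it easy to tell which version ended up in the tree.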

Once I did that, Canis (by the way, I renamed the sequence ‘DOG’ so I could be sure it was the new one) fell in line, with a sequence more similar to that seen in cats (Felis):

That’s it for now, although I expect that I will dig a little deeper with more animals to see if I can come closer to an ‘original BLyS’.


  1. Dereeper A., Audic S., Claverie J.M., Blanc G. BLAST-EXPLORER helps you building datasets for phylogenetic analysis. BMC Evol Biol. 2010 Jan 12;10:8. (PubMed)
  2. Dereeper A.*, Guignon V.*, Blanc G., Audic S., Buffet S., Chevenet F., Dufayard J.F., Guindon S., Lefort V., Lescot M., Claverie J.M., Gascuel O. Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W465-9. Epub 2008 Apr 19. (PubMed) *: joint first authors
  3. Felsenstein J. PHYLIP – Phylogeny Inference Package (Version 3.2). 1989, Cladistics 5: 164-166
  4. Larkin,M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., Higgins, D.G. (2007) Clustal W and Clustal X version 2.0. Bioinformatics, 23:2947-2948.
  5. Thompson,J.D., Gibson,T.J., Plewniak,F., Jeanmougin,F. and Higgins,D.G. (1997) The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research, 25:4876-4882.

Posted by on March 7, 2014 in Uncategorized

