BLyS Sequence Analysis

07 Mar

I’ve been playing with some sequence analysis and phylogenetic tree construction programs recently because I would like to introduce this sort of data analysis into my biology classes. As a sample protein, I decided to use BLyS / BAFF, a protein important in regulating B cell numbers. I’ve wondered about the origin of this kind of molecule ever since working on it in grad school, and this seemed like a decent way to get some ideas about where it might come from.

The first thing I did was go to the NIH’s National Library of Medicine website:

It’s easy to search for any protein / gene / whole genome you are interested in examining. Knowing that BLyS is vital in humans and mice, I chose to start with the human sequence. I retrieved it as the following:

>gi|20196464|dbj|BAB90856.1| BLyS [Homo sapiens]
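That header line packs several pieces of information between the `|` characters. As a rough sketch (the function name `parse_ncbi_header` is my own invention, and it assumes the header always follows the `>gi|number|database|accession| name [species]` pattern seen above), here is one way to pull the parts out in Python:

```python
def parse_ncbi_header(header):
    """Split an old-style NCBI FASTA header into labelled parts.

    Assumes the pattern: >gi|number|database|accession| name [species]
    """
    line = header.strip().lstrip(">")
    # Everything before the last "|" is identifiers; the rest is free text.
    ids, _, description = line.rpartition("|")
    fields = ids.split("|")
    if len(fields) < 4:
        raise ValueError("expected a header like >gi|number|db|accession| text")
    # The free text ends with the species name in square brackets.
    name, _, organism = description.strip().partition("[")
    return {
        "gi": fields[1],
        "database": fields[2],
        "accession": fields[3],
        "name": name.strip(),
        "organism": organism.rstrip("]").strip(),
    }


info = parse_ncbi_header(">gi|20196464|dbj|BAB90856.1| BLyS [Homo sapiens]")
print(info["accession"], "from", info["organism"])
```

The accession it prints (BAB90856.1) is the piece that can be pasted into a search field in place of the whole sequence.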

The easiest tool to find similar proteins in other animals is the Basic Local Alignment Search Tool for proteins, or BLASTp. Just using default settings, I pasted the sequence in the search field and hit go. (Note: I actually just used the accession number, not the whole sequence.)


This retrieved tons of proteins with similar sequences from the vast database of sequence information, from which I chose several model species. One thing I wanted to do was to include several primates as a sort of internal calibration (assuming that they would all have very similar sequences compared to more distantly related species). I also wanted sequences from a few animals that are quite distantly related to humans (frog and ground tit fit that bill).

Once I had a list, I put them all into a single text file and then used that in a second program. This time, I decided that the best ‘multiple alignment tool’ would be CLUSTALX. It’s been around for a while and can create data in a number of different forms. Besides, it’s free and versions are available for both Mac and PC.
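The single text file is just a series of FASTA records, one after another. I built mine by hand, but a minimal Python sketch of the same idea might look like the following. The helper name `to_fasta`, the short labels, and the output file name are all hypothetical, and the residue strings are made-up placeholders, NOT real BLyS sequences:

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as one multi-sequence FASTA string."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        # Wrap long sequences so each line is at most `width` residues.
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Placeholder residues for illustration only (not real BLyS data).
records = [
    ("HUMAN", "MKVLAAGTRE"),
    ("FROG", "MKILSAGTKE"),
]

# Hypothetical file name; this is the file the alignment program would open.
with open("blys_sequences.fasta", "w") as handle:
    handle.write(to_fasta(records))
```

Short, unique labels on the header lines make the later alignment and tree much easier to read than the full database headers.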

Again, for starters, I just accepted the default parameters and did a quick alignment:


Obviously, there’s something odd about the Canis familiaris (dog) sequence, but before I did anything about that, I just wanted to see what a phylogenetic tree looked like. This is another thing that Clustal does well: it will export your sequence alignment as tree data in a number of formats, and I could then plug that data into one final program. This last is a web-based program that I access through a French site (but you can probably find it in a number of places). The program is called DRAWGRAM. It accepts the tree data and outputs a graphical tree representation of the alignment.

This is an important logical step… What I’m doing is asking for a family tree of sorts to be displayed that represents the relationship of the sequences I provided. We might want to assume that this also tells us how related the organisms that have these proteins are – and that’s not wrong, but it’s also not thorough as we’re only using ONE protein to make that assumption.
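To make "relationship of the sequences" a bit more concrete: one simple way to score how related two aligned sequences are is the fraction of aligned positions where they differ. The sketch below is only an illustration of that idea, not a description of what Clustal does internally. The function names are my own, and the aligned strings are made-up placeholders rather than real BLyS data.

```python
def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


def distance_matrix(aligned):
    """Pairwise distances for a dict of {name: aligned sequence}."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m in names for n in names}


# Placeholder alignment for illustration only (not real BLyS data).
aligned = {
    "HUMAN": "MKVLAAGT",
    "CHIMP": "MKVLAAGT",
    "FROG": "MKIL-SGT",
}
matrix = distance_matrix(aligned)
print(matrix[("HUMAN", "CHIMP")], matrix[("HUMAN", "FROG")])
```

In this toy alignment the two primates come out identical and the frog comes out more distant, which is the pattern the primate "calibration" sequences were meant to check for.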

Here’s my first tree:


Note how isolated Canis is on this representation.

Finally, I went back and truncated the Canis sequence to a place where I suspect the protein actually starts. My sequence from the NCBI gave me a string of amino acids at the front of the protein that I think are probably not there, but just got added by some computer algorithm without proper human oversight.
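I did the trimming by eye in a text editor, but the same step can be sketched in Python. This assumes the real start is one of the methionines (M) in the sequence, which is my guess rather than anything the database confirms. The helper names are hypothetical and the residues below are placeholders, not the actual dog sequence.

```python
def candidate_starts(seq):
    """Positions (0-based) of every methionine, each a possible true start."""
    return [i for i, residue in enumerate(seq) if residue == "M"]


def truncate_record(seq, start, new_name):
    """Drop everything before `start` and relabel the record."""
    if not 0 <= start <= len(seq):
        raise ValueError("start must fall inside the sequence")
    return new_name, seq[start:]


# Placeholder residues for illustration only (not the real dog sequence).
suspect = "GSPRLLAMKVLAAGTMRE"
print(candidate_starts(suspect))

# Pick the start by comparing against the other species in the alignment,
# then give the trimmed record a new label so the two versions are not confused.
name, trimmed = truncate_record(suspect, 7, "DOG")
print(name, trimmed)
```

Choosing among the candidate positions is still a judgment call: the alignment with the other species is what suggests where the extra residues end.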

Once I did that, Canis (by the way, I renamed the sequence ‘DOG’ so I was sure it was the new one) fell in line with a sequence more similar to that seen in cats (Felis):

That’s it for now, although I expect that I will dig a little deeper with more animals to see if I can come closer to an ‘original BLyS’.


