This is a short explanation of dot plots, an easy and powerful means of sequence analysis, useful for finding regions of similarity between two sequences and repeats within a single sequence.
This is the principle:
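As a minimal sketch of this principle, assuming the simplest case of a character-by-character comparison (this is not Gepard's own algorithm, and the function names `dot_matrix` and `render` are made up for illustration), one sequence is laid along the horizontal axis, the other along the vertical axis, and a dot is placed wherever the two characters are identical:

```python
def dot_matrix(seq_x, seq_y):
    """Build the dot matrix: cell [j][i] is True when seq_y[j] equals seq_x[i]."""
    return [[a == b for a in seq_x] for b in seq_y]


def render(matrix):
    """Draw the matrix as text, '*' for a match and '.' for a mismatch."""
    return "\n".join(
        "".join("*" if hit else "." for hit in row) for row in matrix
    )


if __name__ == "__main__":
    # Toy sequences chosen only for illustration.
    print(render(dot_matrix("GATTACA", "GATTACA")))
```

Comparing a sequence with itself, as in the example call, fills the main diagonal; regions shared by two different sequences appear as shorter diagonal runs.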
Here are some examples of how to read a dot plot:
Gepard ("GEnome PAir - Rapid Dotter"), an open-source tool from Helmholtz Zentrum München, has two adjustable parameters:
· Word length - the minimum length of identical subsequences (words) that create a hit in the dot plot.
· Window size - if word length = 0, the "normal" dot plot mode is activated, in which all characters of both sequences are compared against each other.
In that mode, this parameter specifies the window size over which an average dot value is calculated.
· Word length is the number of bases in a sliding window that is moved along each sequence and compared to generate a single data point on the plot. Word length must be a number.
· Mismatch limit (applied over the window) determines how similar the two sequences within a word must be to "match". For example, if the word length is 9 and the mismatch limit is 2, then up to 2 mismatches in a 9-base word will still be classified as a match.
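The word length and mismatch limit described above can be sketched as a brute-force comparison of every pair of words. This is an illustrative sketch only, not how Gepard or any other tool implements it; the name `window_hits` and its arguments are hypothetical, and real programs need much faster methods for whole genomes.

```python
def window_hits(seq_x, seq_y, word_length, mismatch_limit):
    """Return (i, j) start positions whose words differ in at most
    mismatch_limit positions. Hypothetical helper for illustration."""
    hits = []
    for i in range(len(seq_x) - word_length + 1):
        word_x = seq_x[i:i + word_length]
        for j in range(len(seq_y) - word_length + 1):
            word_y = seq_y[j:j + word_length]
            # Count the positions at which the two words disagree.
            mismatches = sum(a != b for a, b in zip(word_x, word_y))
            if mismatches <= mismatch_limit:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Two toy 9-base sequences that differ at a single position.
    print(window_hits("ACGTACGTA", "ACGTTCGTA", word_length=9, mismatch_limit=2))
```

With a word length of 9 and a mismatch limit of 2, two 9-base words that differ in a single base still count as a hit; setting the limit to 0 requires the words to be identical.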
If we compare the genomes of Mycobacterium smegmatis and Mycobacterium ulcerans with Gepard (two FASTA-format sequence files), changing the parameters to increase the sensitivity, we get:
This is the best result: here it is possible to observe common regions, some repeats, and palindromic sequences. Using the coordinate axes, it is possible to select a single gene for further analysis.
The next image shows no appreciable difference.
Control: My. smegmatis versus My. smegmatis
Control: My. ulcerans versus My. ulcerans
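The self-comparison controls can be imitated on a toy scale. The sketch below assumes exact word matches and a simple dictionary index of words (the helper names `self_hits` and `off_diagonal` are hypothetical, and this is not Gepard's implementation). Every position matches itself, which gives the main diagonal, while any hit off the diagonal marks a word that occurs more than once, that is, a repeat.

```python
from collections import defaultdict


def self_hits(seq, word_length):
    """Return sorted (i, j) pairs where the word at i equals the word at j."""
    index = defaultdict(list)
    for i in range(len(seq) - word_length + 1):
        index[seq[i:i + word_length]].append(i)
    hits = []
    for positions in index.values():
        # Every pair of positions sharing a word is a hit, including i == j.
        for i in positions:
            for j in positions:
                hits.append((i, j))
    return sorted(hits)


def off_diagonal(hits):
    """Keep only hits away from the main diagonal, i.e. repeated words."""
    return [(i, j) for i, j in hits if i != j]


if __name__ == "__main__":
    # Toy sequence in which the word "ACG" occurs twice.
    print(off_diagonal(self_hits("ACGTTTACGA", 3)))
```

A self-comparison that does not show a complete main diagonal would indicate a problem with the input or the parameters, which is what makes it a useful control.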