QUESTION: Look at the entire length of the sequences. What do you notice about the electropherogram peaks and quality scores at nucleotide positions labeled “N”? Describe the quality scores at these “N” positions. Where do you find more Ns (at the 5’ and 3’ ends of the sequence or in the middle of the sequence)? Why is it important to remove excess Ns from the sequences?
At "N" positions, peaks representing different nucleotides have similar amplitudes (heights) and overlap, or no single peak rises above the background of lower-amplitude peaks. Because no one nucleotide can be called with confidence, quality scores are very low at "N" positions.
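To make "very low" concrete, here is a minimal sketch that assumes the quality scores are Phred-scaled, which is the usual convention for Sanger traces (the question itself does not say which scale is used). Under that assumption, a score Q corresponds to an estimated base-calling error probability of 10^(-Q/10). The function names are illustrative only.

```python
import math

def phred_to_error_probability(q):
    """Convert a Phred-scaled quality score to an estimated error probability.

    Assumes the standard Phred definition: Q = -10 * log10(P).
    """
    return 10 ** (-q / 10)

def error_probability_to_phred(p):
    """Convert an estimated error probability back to a Phred-scaled score."""
    return -10 * math.log10(p)

# A score of 20 means roughly a 1 in 100 chance the base call is wrong,
# while a score of 5 means roughly a 1 in 3 chance.
for q in (5, 10, 20, 30):
    print(q, round(phred_to_error_probability(q), 4))
```

On this scale, a position with a score near zero is close to a coin toss between nucleotides, which matches the overlapping peaks seen at "N" positions.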
Ns are most often found at the 5’ and 3’ ends, because sequence quality is poorest at the ends of a read.
Each "N" is scored as a mismatch in the alignment, so experimental sequences appear less related to reference sequences than they actually are. This can significantly affect tree building, potentially placing closely related sequences in different clades.
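The effect of Ns on apparent relatedness can be illustrated with a small sketch. It is not the trimming method of any particular tool: `trim_n_ends`, its `window` and `max_n` parameters, and `percent_identity` are hypothetical helpers, and the comparison assumes two already-aligned sequences of equal length with no gaps.

```python
def trim_n_ends(seq, window=5, max_n=0):
    """Trim each end until a window of `window` bases holds at most `max_n` Ns.

    Hypothetical helper: the window size and threshold are illustrative choices.
    """
    seq = seq.upper()
    start = 0
    # Advance from the 5' end past N-rich windows.
    while start < len(seq) and seq[start:start + window].count("N") > max_n:
        start += 1
    end = len(seq)
    # Retreat from the 3' end past N-rich windows.
    while end > start and seq[max(start, end - window):end].count("N") > max_n:
        end -= 1
    # Remove any N left directly at the new ends.
    return seq[start:end].strip("N")

def percent_identity(a, b):
    """Percentage of positions that match in two aligned, equal-length sequences.

    An N never equals A, C, G, or T here, so every N counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches / len(a)

reference = "ACGTACGTAC"
read = "NNGTACGTNN"  # same sequence, but with uncalled bases at both ends

print(percent_identity(reference, read))                   # Ns counted as mismatches
print(percent_identity(reference[2:8], read.strip("N")))   # ends trimmed first
print(trim_n_ends("NNANNTACGTACGTACGTACGTNNCNN"))
```

In this toy case the untrimmed read scores 60% identity against the reference, while the trimmed region scores 100%, even though the underlying sequence is the same.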