Determining class-membership probabilities with nearest-neighbor search
Overview
Key to our core approach for secondary structure prediction is estimating class-membership probabilities for the residues of the protein: at each position i in the protein, for each structure class c, the probability that class c is the true secondary structure state of the residue at position i. We estimate these probabilities by a form of nearest-neighbor classification, using words of a fixed length l extracted from the amino acid sequence P of the protein.
Overlapping words
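The word-based estimate can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not Nnessy's actual implementation: the Hamming distance, the three-state alphabet `HEC`, the toy template list `TOY_TEMPLATES`, the neighbor count `k`, and the scheme of pooling votes from all overlapping words that cover a position are assumptions introduced here, and every function name is hypothetical.

```python
from collections import Counter

# Assumed 3-state alphabet: H = helix, E = strand, C = coil.
CLASSES = "HEC"

# Hypothetical toy template database: (word, known structure states) pairs.
TOY_TEMPLATES = [
    ("AEELL", "HHHHH"),
    ("VKVTV", "EEEEE"),
    ("GSPGN", "CCCCC"),
    ("AEELK", "HHHHC"),
]


def overlapping_words(seq, l):
    """Return (start, word) for every overlapping length-l word of seq."""
    return [(j, seq[j:j + l]) for j in range(len(seq) - l + 1)]


def hamming(a, b):
    """Number of mismatching positions (a stand-in distance, assumed here)."""
    return sum(x != y for x, y in zip(a, b))


def class_probabilities(seq, templates, l, k):
    """Estimate, for each position i, a probability for each class c.

    Each word's k nearest template words vote with their known states
    at every position the word covers; votes are then normalized.
    """
    votes = [Counter() for _ in seq]
    for j, word in overlapping_words(seq, l):
        nearest = sorted(templates, key=lambda t: hamming(word, t[0]))[:k]
        for _, states in nearest:
            for offset, state in enumerate(states):
                votes[j + offset][state] += 1

    probs = []
    for v in votes:
        total = sum(v.values())
        if total == 0:
            # No word covers this position (sequence shorter than l).
            probs.append({c: 1.0 / len(CLASSES) for c in CLASSES})
        else:
            probs.append({c: v[c] / total for c in CLASSES})
    return probs
```

Calling `class_probabilities("AEELLK", TOY_TEMPLATES, l=5, k=1)` returns one dictionary per residue, each mapping the classes `H`, `E`, and `C` to probabilities that sum to 1. A linear scan over the templates is used only for brevity; a real template database would need a proper nearest-neighbor search structure.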
Using our template database of amino acid words, we find nearest-neighbor words for each fixed-length word from the input sequence. These nearest neighbors have known secondary structure, which we use to estimate the structure state probabilities at each position of the input sequence. An illustration of this is shown at the top of this page.
Videos
The following video was presented at ISMB 2020 and gives more detailed information on Nnessy:
A shorter version was presented at SCS 2020: