Moscow Bioinformatics Seminars, 15 and 16 December 2011

Inna S. Povolotskaya*, Fyodor A. Kondrashov, Peter K. Vlasov
(Centre for Genomic Regulation, Barcelona)

Stop codons in bacteria are not selectively equivalent

Thursday, 15 December 2011, 17.00 (NB!)
V.A. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences. 32 Vavilova St., 3rd floor, conference hall.



Kasia Bozek
(Max Planck Institute for Computer Sciences, Saarbrucken; now at PICB, Shanghai)

Physicochemical and structural properties determining HIV-1 coreceptor usage

Friday, 16 December 2011, 18.00

NB! Moscow State University, Laboratory Building B (Faculty of Bioengineering and Bioinformatics), room 117.



Inna S. Povolotskaya*, Fyodor A. Kondrashov, Peter K. Vlasov
Centre for Genomic Regulation, Barcelona

Stop codons in bacteria are not selectively equivalent

Many global patterns in molecular evolution are defined by the genetic code, including rates of nonsynonymous and synonymous evolution, synonymous codon usage and the optimality of the genetic code. The evolution and usage of stop codons, however, have not been rigorously studied, except for their role in encoding non-canonical amino acids. Here, we study the rate of evolution and the genomic frequency of the three canonical stop codons, TAA, TGA and TAG, in bacterial genomes. We find that stop codons evolve more slowly than synonymous sites, suggesting the action of weak negative selection. However, the frequency of stop codon usage relative to genomic nucleotide content indicates that this selection regime is not straightforward. The usage of TAA and TGA stop codons depends on GC content, with TAA decreasing and TGA increasing as GC content rises, while TAG frequency is independent of nucleotide content. We therefore modeled stop codon usage and nucleotide content using as parameters the mutation rates and two selection coefficients: one for selection on nucleotide content and one for selection on TAG frequency. We found that the relationship between stop codon frequencies and nucleotide content cannot be explained by mutational biases or by selection on nucleotide content. However, with weak, nucleotide content-dependent selection on TAG (-0.5 < Ne·s < 1.5, where Ne is the effective population size and s the selection coefficient), the model fits all of the data and recapitulates the lack of a relationship between TAG frequency and nucleotide content. For biologically plausible mutation rates we show that, in bacteria, the TAG stop codon is universally associated with lower fitness, with TAA being the optimal stop codon for G-content < 16%, while for G-content > 16% TGA has a higher fitness than TAG.
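The per-genome quantities the abstract relates to each other, stop codon usage and nucleotide content, can be tabulated directly from annotated genes. The Python sketch below is an illustration only, not the authors' pipeline: it assumes the input is a list of coding sequences (CDS) as DNA strings, each ending in its stop codon, and it uses the GC content of those CDS as a stand-in for genomic nucleotide content. The function name `stop_codon_usage` and the toy genes are hypothetical.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TGA", "TAG")


def stop_codon_usage(cds_list):
    """Tabulate stop codon usage and GC content for one genome (illustrative sketch).

    Assumes each element of `cds_list` is a coding sequence (DNA string) that
    ends with its stop codon. GC content is computed over the CDS themselves,
    as a stand-in for genomic nucleotide content.
    """
    stops = Counter()
    gc = 0
    total = 0
    for cds in cds_list:
        seq = cds.upper()
        codon = seq[-3:]
        # Only recognized stop codons are counted; every sequence still adds to GC content
        if codon in STOP_CODONS:
            stops[codon] += 1
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    n_stops = sum(stops.values())
    # Fraction of genes terminated by each stop codon
    usage = {c: (stops[c] / n_stops if n_stops else 0.0) for c in STOP_CODONS}
    gc_content = gc / total if total else 0.0
    return usage, gc_content


genes = ["ATGAAATAA", "ATGGCCTGA", "ATGCCCTAG", "ATGTTTTAA"]  # toy genes
usage, gc = stop_codon_usage(genes)
print(usage, round(gc, 3))  # {'TAA': 0.5, 'TGA': 0.25, 'TAG': 0.25} 0.333
```

Repeating this for many bacterial genomes and plotting each codon's usage against GC (or G) content gives the kind of data the model described above is fitted to.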





Kasia Bozek
Max Planck Institute for Computer Sciences, Saarbrucken (now at PICB, Shanghai)

Physicochemical and structural properties determining HIV-1 coreceptor usage

The entry of the human immunodeficiency virus (HIV) into human cells is a multistep process that involves binding to one of the cell-surface coreceptors, CCR5 or CXCR4. The coreceptor binding site is partially located in the third variable region (V3) of the viral protein gp120. Whether a virus can bind only CCR5 (R5 virus), either CCR5 or CXCR4 (dual virus), or only CXCR4 (X4 virus) is determined predominantly by the sequence and structure of this region. The phenotype related to viral coreceptor usage is termed viral tropism. While mainly R5 viruses are observed in the early, asymptomatic stages of infection, progression to AIDS is often correlated with the emergence of X4 viruses. The relationship of HIV tropism with disease progression and the recent development of CCR5-blocking drugs underscore the importance of monitoring viral coreceptor usage. As an alternative to costly phenotypic assays, computational methods aim to predict viral tropism from the sequence of the V3 loop of the gp120 protein and from its structure. The major drawback of the binary sequence representation is that it offers no insight into the physicochemical properties of the amino acids and their spatial arrangement in the binding site, which together determine coreceptor binding.
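The binary sequence representation is typically a one-hot encoding of an aligned V3 sequence. The sketch below is a minimal illustration under stated assumptions: sequences are already aligned to a common length, the alphabet is the 20 standard amino acids plus a gap symbol `-`, and the name `binary_encode` and the example sequence are hypothetical. Every residue becomes an orthogonal indicator block, so the encoding carries no information about how physicochemically similar two residues are, which is the drawback noted above.

```python
# 20 standard amino acids plus a gap symbol; the alphabet and its order are assumptions.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def binary_encode(v3_aligned):
    """One-hot encode an aligned V3 sequence: one 21-bit block per alignment position."""
    vector = []
    for pos, residue in enumerate(v3_aligned.upper(), start=1):
        if residue not in ALPHABET:
            # Ambiguity codes such as 'X' are not handled in this sketch
            raise ValueError(f"unexpected symbol {residue!r} at position {pos}")
        block = [0] * len(ALPHABET)
        block[ALPHABET.index(residue)] = 1
        vector.extend(block)
    return vector


# Illustrative 35-residue V3 sequence
v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
features = binary_encode(v3)
print(len(features), sum(features))  # 735 35
```

The vector length grows as 21 times the alignment length, and a machine-learning model sees, for example, arginine and lysine as exactly as different as arginine and isoleucine.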
Here we present a structural descriptor of the V3 loop that encodes the physicochemical properties of the loop together with their locations on the protein structure. We map 54 amino acid indices, representing physicochemical properties of amino acids, onto the V3 loop structure and use machine learning methods to extract the features that are most informative for coreceptor usage. The extracted features represent a small fraction of the initial feature set, and models based on this subset attain higher prediction accuracy at a lower computational cost.
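The core idea of describing each position by amino acid index values rather than by residue identity can be sketched as follows. This is a simplification, not the descriptor described above: it uses only two indices (the Kyte-Doolittle hydropathy scale and a simplified side-chain charge) instead of 54, and it uses alignment positions as a stand-in for locations on the V3 loop structure, which the sketch does not model. The names `INDICES` and `index_encode` are hypothetical.

```python
# Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge near neutral pH (assumption: histidine treated as neutral)
CHARGE = {"R": 1.0, "K": 1.0, "D": -1.0, "E": -1.0}

INDICES = {"hydropathy": KYTE_DOOLITTLE, "charge": CHARGE}


def index_encode(v3_aligned, indices=INDICES):
    """Map each alignment position to one value per amino acid index.

    Returns a dict keyed by (index name, 1-based position). Gaps and residues
    absent from an index table get 0.0.
    """
    features = {}
    for name, table in indices.items():
        for pos, residue in enumerate(v3_aligned.upper(), start=1):
            features[(name, pos)] = table.get(residue, 0.0)
    return features


v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"  # illustrative sequence
feats = index_encode(v3)
print(len(feats), feats[("charge", 3)], feats[("hydropathy", 1)])  # 70 1.0 2.5
```

Each (index, position) pair is a candidate feature; a feature-selection step of the kind described above would keep only the pairs most informative for coreceptor usage.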
Used as input to a support vector machine that predicts tropism, our descriptor shows a statistically significant improvement over the binary representation of the V3 sequence. At the specificity of the 11/25 rule, it achieved a sensitivity of 69%, comparing favorably with the 62% sensitivity of sequence-based prediction. In addition to data from lab-cloned viruses (clonal data), we assessed the predictive power of our method on clinically derived 'bulk' sequence data from patient samples and obtained a statistically significant 3% improvement over the sequence representation, as evaluated by receiver operating characteristic (ROC) curve analysis. We also demonstrated the capacity of our method to predict the outcome of coreceptor blocker-based therapy by applying it to 53 samples from patients undergoing Maraviroc therapy.
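Comparing classifiers "at the specificity of the 11/25 rule" means choosing the classifier's decision threshold so that its specificity is at least that of the rule, then reading off its sensitivity. The Python sketch below illustrates this under assumptions: it uses a common formulation of the 11/25 rule (call X4 if V3 position 11 or 25 carries a positively charged residue, R or K), assumes ungapped V3 sequences numbered from the first cysteine, treats X4 as the positive class, and takes classifier scores (for example, SVM decision values) as given. All function names and the toy data are hypothetical.

```python
POSITIVE_RESIDUES = {"R", "K"}


def rule_11_25(v3):
    """Call 'X4' if position 11 or 25 (1-based) carries R or K, else 'R5'."""
    return "X4" if v3[10] in POSITIVE_RESIDUES or v3[24] in POSITIVE_RESIDUES else "R5"


def sensitivity_specificity(predicted, observed, positive="X4"):
    """Sensitivity and specificity of categorical calls (assumes both classes occur)."""
    pairs = list(zip(predicted, observed))
    tp = sum(p == positive and o == positive for p, o in pairs)
    fn = sum(p != positive and o == positive for p, o in pairs)
    tn = sum(p != positive and o != positive for p, o in pairs)
    fp = sum(p == positive and o != positive for p, o in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def sensitivity_at_specificity(scores, labels, target_spec, positive="X4"):
    """Best sensitivity of 'score > t => positive' over thresholds with specificity >= target_spec."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    best = 0.0
    # Candidate thresholds: every observed score, plus -inf (call everything positive)
    for t in sorted(set(scores)) + [float("-inf")]:
        spec = sum(s <= t for s in neg) / len(neg)
        if spec >= target_spec:
            best = max(best, sum(s > t for s in pos) / len(pos))
    return best


seqs = ["CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC",   # S11, E25
        "CTRPNNNTRKRIHIGPGRAFYTTGEIIGDIRQAHC"]   # R11
print([rule_11_25(s) for s in seqs])  # ['R5', 'X4']

# Toy scores and labels, not data from the talk
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1]
labels = ["X4", "X4", "X4", "R5", "R5", "R5"]
print(sensitivity_at_specificity(scores, labels, target_spec=1.0))  # 2 of 3 X4 detected
```

In practice, `target_spec` would be the specificity of `rule_11_25` on the same labeled data, obtained with `sensitivity_specificity`, so that both methods are compared at an equal false-positive rate.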
Our structural descriptor affords direct interpretation of the V3 loop features relevant to viral tropism by pointing to specific physicochemical properties of amino acids, in specific parts of the loop, that are predictive of coreceptor usage. The analysis of the features important for classification pointed to two loop regions whose physicochemical properties play a determining role in coreceptor usage. These regions are located on opposite strands of the loop stem, and their informative properties are predominantly related to structure, hydrophobicity and charge. In the bound conformation of the loop the two regions are in close proximity, potentially forming a determinant site for coreceptor usage. The resulting method outperforms the sequence-based method at comparable efficiency and offers a direct interpretation of the structural and physicochemical determinants of tropism.