Swathi Ramachandra Upadhya
Proteins are the workhorses of our cells, physically interacting with each other in dense networks to carry out specific tasks. The complement of proteins present in a given cell or sample is typically called its ‘proteome’, and this proteome can be quantified using techniques such as mass spectrometry and reverse phase protein arrays (RPPA). Despite the importance of proteins for cellular function, the ‘omics’ revolution in biology over the last two decades has largely focused on quantifying genomes and transcriptomes. Only in the last few years has technology progressed to the stage where proteomes can be quantified for large numbers of samples (n = 100-200). These studies have revealed only a moderate correlation (r ≈ 0.4) between mRNA (transcriptomic) and protein levels. Moreover, certain classes of proteins display higher or lower than average concordance between mRNA and protein levels. In some cases, it appears that differences between mRNA and protein levels can be predicted from knowledge of protein-protein interaction networks: proteins that physically interact tend to have correlated protein levels, and a DNA mutation that alters the abundance of one protein can also influence the abundance of its interaction partners. This suggests that it may be possible to better predict protein abundance by integrating DNA and mRNA profiles with knowledge of protein-protein interactions. We therefore intend to develop an integrative machine learning approach to predict protein abundance for ~10,000 tumour samples. Currently we have protein abundance measurements for ~1,000 tumour samples, but mRNA and genome information for over 20,000 samples. Even with new technical developments, it is likely that for the foreseeable future we will have orders of magnitude more DNA/mRNA profiles than proteomes. Consequently, methods for predicting proteomes from more readily available data would have enormous utility.
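The idea of integrating DNA and mRNA profiles with interaction knowledge can be illustrated with a minimal sketch. It is not the project's method: it assumes a plain linear model, fitted by ordinary least squares, in which a protein's abundance is predicted from its own mRNA level, the mean mRNA level of its interaction partners, and a mutation indicator. All names (`build_features`, `fit_least_squares`, the proteins "A", "B", "C") and all values are hypothetical toy data, not measurements.

```python
from statistics import mean

def build_features(mrna, mutated, ppi, target, n_samples):
    """One row per sample: [intercept, own mRNA, mean partner mRNA, mutation flag]."""
    rows = []
    for s in range(n_samples):
        partner_level = mean(mrna[p][s] for p in ppi[target])
        flag = 1.0 if s in mutated[target] else 0.0
        rows.append([1.0, mrna[target][s], partner_level, flag])
    return rows

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Toy data (assumed, not real): mRNA levels for 6 samples of 3 proteins.
mrna = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "C": [0.0, 3.0, 1.0, 5.0, 2.0, 4.0],
}
ppi = {"A": ["B", "C"]}   # hypothetical interaction partners of A
mutated = {"A": {1, 4}}   # sample indices carrying a mutation affecting A

features = build_features(mrna, mutated, ppi, "A", 6)

# Simulate protein abundance from known weights, then check the fit recovers them.
true_weights = [2.0, 0.5, 0.3, -1.0]
protein = [sum(w * x for w, x in zip(true_weights, row)) for row in features]
weights = fit_least_squares(features, protein)
```

Because the simulated abundances are noise-free, the fitted `weights` match `true_weights` up to floating-point error. Real proteomic data would be noisy and high-dimensional, so a regularised or non-linear model evaluated on held-out samples would be the more realistic choice.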