To construct CKSAAP_UbSite, 203 ubiquitinated substrates previously compiled by Radivojac et al. (2010) were downloaded. These 203 proteins contained 272 experimentally validated ubiquitination sites, which were regarded as positive samples. Following a strategy similar to that of Radivojac et al. (2010), we extracted 4642 negative samples from the 124 mitochondrial matrix proteins. After filtering at 40% sequence identity, we obtained a ubiquitination site dataset containing 263 positive and 4345 negative samples (i.e. Radivojac_dataset), which was used to train and test CKSAAP_UbSite. Since the number of available non-ubiquitination sites in Radivojac_dataset is much larger than the number of ubiquitination sites, we randomly selected ten sets of negative samples to allow a reliable performance estimation of CKSAAP_UbSite.
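The random selection of ten negative sets can be illustrated with a minimal Python sketch. The text does not state the size of each negative set; the sketch assumes each set matches the 263 positive samples, and the function name, seed, and placeholder identifiers are hypothetical.

```python
import random


def sample_negative_sets(negatives, n_sets=10, set_size=263, seed=0):
    """Draw n_sets random subsets of the negative samples.

    Each subset is drawn without replacement, but different subsets
    may share samples. set_size=263 is an assumption (one negative per
    positive); the source does not give the size of each set.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [rng.sample(negatives, set_size) for _ in range(n_sets)]


# Placeholder identifiers standing in for the 4345 filtered negative samples.
negatives = [f"neg_{i}" for i in range(4345)]
negative_sets = sample_negative_sets(negatives)
```

Each of the ten sets can then be paired with the full positive set, and performance averaged over the ten resulting datasets.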

All the positive and negative samples (after filtering at 40% sequence identity)  
The dataset of hCKSAAP_UbSite was obtained from two high-throughput proteomic assays (Danielsen et al., 2011; Wagner et al., 2011) and our own literature search. Redundant sequences were removed using the BLASTClust program with a 30% sequence identity cutoff. The final dataset contains 6118 ubiquitination sites in 2500 proteins for training and 3419 ubiquitination sites in 1352 proteins for independent testing.
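As a rough illustration of the redundancy-removal step, the Python sketch below keeps one representative per BLASTClust cluster. It assumes the usual BLASTClust output of one cluster per line with whitespace-separated sequence identifiers. Choosing the first identifier of each cluster as the representative is an assumption, not a rule stated in the text, and the function name and example identifiers are hypothetical.

```python
def pick_representatives(cluster_lines):
    """Return one sequence identifier per cluster.

    cluster_lines: iterable of strings, one cluster per line, with
    identifiers separated by whitespace (assumed BLASTClust output,
    e.g. from a run such as `blastclust -i proteins.fasta -o clusters.txt -p T -S 30`;
    the exact invocation used by the authors is not given).
    """
    representatives = []
    for line in cluster_lines:
        ids = line.split()
        if ids:  # skip blank lines
            representatives.append(ids[0])  # assumption: keep the first member
    return representatives


# Hypothetical cluster file content: three clusters and one blank line.
example_clusters = ["P1 P2 P3", "P4", "", "P5 P6"]
representatives = pick_representatives(example_clusters)
```

Only the ubiquitination sites located in the retained representative proteins would then be carried forward into the training and testing datasets.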

