• OpenAccess
  • The Research on Identification of Gene Splice Sites by Support Vector Machine  [iCBBE 2016]
  • DOI: 10.4236/jbise.2016.910B007   PP.53 - 57
  • Author(s)
  • Hongbin Li, Guangzhong He
  • The recognition of splice sites is an important step in eukaryotic DNA sequence analysis, and many researchers are working to improve identification accuracy. Our team studied this problem using the support vector machine, a well-known data mining algorithm. The training and testing data come from the HS3D dataset. Nucleic acid sequences were represented with orthogonal coding and classified with an RBF kernel function, which achieved a high accuracy rate. The cross-validation experiment suggests that base pattern information is mainly located within 20 nucleotides upstream and downstream of splice sites.
  • Splicing Sites, Recognition, Support Vector Machine
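The orthogonal coding and RBF kernel named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact bit assignment, the window length, or the kernel parameters, so the mapping in `ORTHOGONAL_CODE`, the default `gamma`, the example windows, and the function names `encode_sequence` and `rbf_kernel` are all assumptions made for illustration, not the authors' implementation.

```python
import math

# Assumed 4-bit orthogonal (one-hot) code; the abstract does not state
# which bit pattern belongs to which nucleotide.
ORTHOGONAL_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def encode_sequence(seq):
    """Concatenate the 4-bit code of each nucleotide into one flat vector."""
    vector = []
    for base in seq.upper():
        # Assumption: unknown symbols such as N are encoded as all zeros.
        vector.extend(ORTHOGONAL_CODE.get(base, (0, 0, 0, 0)))
    return vector


def rbf_kernel(x, y, gamma=0.1):
    """RBF kernel value exp(-gamma * ||x - y||^2) for equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two hypothetical 9-nucleotide windows that differ at one position.
window_a = encode_sequence("AAGGTAAGT")
window_b = encode_sequence("AAGGTGAGT")
similarity = rbf_kernel(window_a, window_b)
```

Under this coding, every pair of distinct nucleotides is equally far apart, so the kernel value depends only on how many positions differ between two windows. Vectors built this way could then be passed to any SVM library with an RBF kernel and evaluated by cross validation over different window lengths.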
