Research on Named Entity Recognition in Biomedical Text

Zhang Xiangzhe (张向喆) 1, Wang Minghui (王明辉) 1, Zhao Hongbo (赵洪波) 1, Wang Qishan (王起山) 1, Pan Yuchun (潘玉春) 1

1 School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240


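Abstract: Biological named entity recognition is a key technology for processing biomedical text. An accurate biological named entity recognition tool is a prerequisite for downstream tasks such as information extraction and text classification. After years of research, biological named entity recognition in the life sciences has made some progress. This paper summarizes the characteristics of biological named entities, analyzes recognition systems built on different methods, reviews the wide range of applications of these methods, such as the extraction of protein interactions, and looks ahead to future development trends.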


DOI: http://dx.doi.org/10.3969/j.issn.1671-9964.2010.02.008

Language: Chinese

Funding: National High Technology Research and Development Program of China (863 Program) (2006AA10ZIE3)

Keywords: bioinformatics; biological named entity recognition; biomedical literature

