Dartmouth College Computer Science
Technical Report series
TR search TR listserv
|By author:||A B C D E F G H I J K L M N O P Q R S T U V W X Y Z|
|By number:||2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986|
Objective: Currently, a major limitation for natural language processing (NLP) analyses in clinical applications is that a concept can be referenced in various forms across different texts. This paper introduces Multi-Ontology Refined Embeddings (MORE), a novel hybrid framework for incorporating domain knowledge from various ontologies into a distributional semantic model, learned from a corpus of clinical text. This approach generates word embeddings that are more accurate and extensible for computing the semantic similarity of biomedical concepts than previous methods.
Materials and Methods: We use the RadCore and MIMIC-III free-text datasets for the corpus-based component of MORE. For the ontology-based component, we use the Medical Subject Headings (MeSH) ontology and two state-of-the-art ontology-based similarity measures. In our approach, we propose a new learning objective, modified from the Sigmoid cross-entropy objective function, to incorporate domain knowledge into the process for generating the word embeddings.
Results and Discussion: We evaluate the quality of the generated word embeddings using an established dataset of semantic similarities among biomedical concept pairs. We show that the similarity scores produced by MORE have the highest average correlation (60.2%), with the similarity scores being established by multiple physicians and domain experts, which is 4.3% higher than that of the word2vec baseline model and 6.8% higher than that of the best ontology-based similarity measure.
Conclusion: MORE incorporates knowledge from biomedical ontologies into an existing distributional semantics model (i.e. word2vec), improving both the flexibility and accuracy of the learned word embeddings. We demonstrate that MORE outperforms the baseline word2vec model, as well as the individual UMLS-Similarity ontology similarity measures.
Senior Honors Thesis. Advisor: Saeed Hassanpour.
Bibliographic citation for this report: [plain text] [BIB] [BibTeX] [Refer]
Or copy and paste:
Steven Jiang, "Multi-Ontology Refined Embeddings (MORE): A Hybrid Multi-Ontology and Corpus-based Semantic Representation for Biomedical Concepts." Dartmouth Computer Science Technical Report TR2019-873, June 2019.
Notify me about new tech reports.
Search the technical reports.
To receive paper copy of a report, by mail, send your address and the TR number to reports AT cs.dartmouth.edu
Copyright notice: The documents contained in this server are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Technical reports collection maintained by David Kotz.