Title: Statistical Approaches to Analysis of Traditional Chinese Medicine Patient Records
Speaker: Professor ChengXiang Zhai
Professor of Computer Science and Willett Faculty Scholar, University of Illinois at Urbana-Champaign
ChengXiang Zhai is a Professor of Computer Science and a Willett Faculty Scholar at the University of Illinois at Urbana-Champaign, where he is also affiliated with the Institute for Genomic Biology, the Department of Statistics, and the School of Information Sciences. He received a Ph.D. in Computer Science from Nanjing University in 1990 and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and Senior Research Scientist from 1997 to 2000. His research interests broadly include intelligent information systems, data mining, natural language processing, machine learning, and their applications in many domains, particularly biomedical and health informatics and intelligent education systems. He has published over 200 widely cited papers in these areas. He is an Editor-in-Chief of the Springer Information Retrieval book series, and previously served as an Associate Editor of ACM Transactions on Information Systems and of Information Processing and Management (Elsevier).
He is an ACM Distinguished Scientist and has received a number of awards, including the ACM SIGIR Test of Time Paper Award (three times), the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE), an Alfred P. Sloan Research Fellowship, and many industry lab awards, such as the IBM Faculty Award and the HP Innovation Research Award.
Traditional Chinese medicine (TCM) can provide important complementary medical care to modern medicine, and is widely practiced in China and many other countries. Recently, TCM patient records have been digitized, leading to a large number of online patient records. These records contain potentially valuable knowledge about the diagnosis and treatment of various diseases using TCM methodology, creating an interesting opportunity to apply data mining techniques to extract such knowledge.
In this talk, I will present some of our recent work on using statistical approaches to analyze TCM patient records for disease profiling and subcategorization. In disease profiling, we propose a new probabilistic model for the joint analysis of symptoms, diagnoses, and herbs in patient records to discover the typical symptoms and typical herbs associated with different diseases, which we call disease profiles. In disease subcategorization, we study how to cluster patient records to discover subcategories of diseases, and show that machine learning can leverage the knowledge in a TCM dictionary of herb functions to improve the accuracy of subcategorization. Experiments on real TCM patient records show promising results.
Finally, I will briefly discuss a vision for integrative mining of TCM patient records together with other biomedical data sets to gain a deeper understanding of the effectiveness of herbs and their chemical components.