For many business intelligence applications, decision making depends critically on the information contained in all forms of “informal” text documents, such as emails, meeting summaries, attachments and web documents. For example, suppose the topic of developing a new product is first raised in a meeting. In subsequent follow-up emails, additional comments and discussions are added, including links to web documents that describe similar products on the market and user reviews of those products. A concise summary of this “conversation” is clearly valuable. However, existing technologies are inadequate in at least two fundamental ways. First, extracting “conversations” embedded in multi-genre documents is very challenging. Second, existing multi-document summarization techniques, which were designed mainly for formal documents, have proved to be highly ineffective when applied to informal documents like emails. In this presentation, we give an overview of email summarization and meeting summarization methods. We conclude by presenting several open problems that need to be solved for multi-modal extraction and summarization of conversations to become a reality.
Dr. Raymond Ng is a professor of Computer Science at the University of British Columbia. His main research area for the past two decades has been data mining, with a specific focus on health informatics and text mining. He has published over 150 peer-reviewed papers on data clustering, outlier detection, OLAP processing, health informatics and text mining. He is the recipient of two best paper awards: one from the 2001 ACM SIGKDD conference, which is the premier data mining conference worldwide, and one from the 2005 ACM SIGMOD conference, which is one of the top database conferences worldwide. He was one of the program co-chairs of the 2009 International Conference on Data Engineering, and one of the program co-chairs of the 2002 ACM SIGKDD conference. He was also one of the general co-chairs of the 2008 ACM SIGMOD conference. He was an editorial board member of the VLDB Journal and the IEEE Transactions on Knowledge and Data Engineering until 2008.