Xuerui Wang's Home Page
Research


Statistical and computational machine learning (ML), data mining (DM) for large data sets, online advertising, information retrieval (IR), topic models of text, and social network analysis (SNA).
 
PUBLICATIONS

Xuerui Wang, Wei Li, Ying Cui, Bruce Zhang and Jianchang Mao, Click-Through Rate Estimation for Rare Events in Online Advertising, Online Multimedia Advertising: Techniques and Technologies, IGI Global, 2010.

Wei Li, Xuerui Wang, Ruofei Zhang, Ying Cui, Jianchang Mao and Rong Jin, Exploitation and Exploration in a Performance based Contextual Advertising System, Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010.

Xing Wei, Fuchun Peng, Huishin Tseng, Yumao Lu, Xuerui Wang and Benoit Dumoulin, Search with Synonyms: Problems and Solutions, Proceedings of the 23rd International Conference on Computational Linguistics, 2010.

Haizheng Zhang, Ke Ke, Wei Li and Xuerui Wang, Graphical Models based Hierarchical Probabilistic Community Discovery in Large-Scale Social Networks, International Journal of Data Mining, Modelling and Management, Vol. 2, No. 2, pp. 95-116, 2010.

Xuerui Wang and Andrew McCallum, Structured Topic Models: Jointly Modeling Text and Its Accompanying Modalities, VDM Verlag, ISBN: 978-3-639-20557-2, 2009.

Xuerui Wang, Andrei Broder, Marcus Fontoura and Vanja Josifovski, A Search based Method for Forecasting Ad Impression in Contextual Advertising, Proceedings of the 18th International World Wide Web Conference, pp. 491-500, 2009.

Xuerui Wang, Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities, Ph.D. Thesis, UM-CS-PhD-2009-003, 2009.

Xuerui Wang, Andrei Broder, Evgeniy Gabrilovich, Vanja Josifovski and Bo Pang, Cross-Language Query Classification using Web Search for Exogenous Knowledge, Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, pp. 74-83, 2009.

Xuerui Wang, Andrei Broder, Marcus Fontoura and Vanja Josifovski, A Note on Search based Forecasting of Ad Volume in Contextual Advertising, Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp. 1343-1344, 2008.

Xuerui Wang, Andrei Broder, Evgeniy Gabrilovich, Vanja Josifovski and Bo Pang, Cross-lingual Query Classification: a Preliminary Study, Proceedings of the 17th ACM Conference on Information and Knowledge Management Workshop on Improving Non-English Web Searching, pp. 101-104, 2008.

Xuerui Wang, Chris Pal and Andrew McCallum, Generalized Component Analysis for Text with Heterogeneous Attributes, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 794-803, 2007.

Xuerui Wang, Andrew McCallum and Xing Wei, Topical N-grams: Phrase and Topic Discovery, with an Application to Information Retrieval, Proceedings of the 7th IEEE International Conference on Data Mining, pp. 697-702, 2007.

Haizheng Zhang, Wei Li, Xuerui Wang, C. Lee Giles, Henry C. Foley and John Yen, HSN-PAM: Finding Hierarchical Probabilistic Groups from Large-Scale Networks, Proceedings of the 7th IEEE International Conference on Data Mining Workshop on Data Mining in Web 2.0 Environments, pp. 27-32, 2007.

Andrew McCallum, Xuerui Wang and Andres Corrada-Emmanuel, Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email, Journal of Artificial Intelligence Research, Vol. 30, pp. 249-272, 2007.

Xing Wei, Jimeng Sun and Xuerui Wang, Dynamic Mixture Models for Multiple Time Series, Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2909-2914, 2007.

Andrew McCallum, Xuerui Wang and Natasha Mohanty, Joint Group and Topic Discovery from Relations and Text, Statistical Network Analysis: Models, Issues and New Directions, Lecture Notes in Computer Science 4503, pp. 28-44, 2007.

Chris Pal, Michael Kelm, Xuerui Wang, Greg Druck and Andrew McCallum, On Discriminative and Semi-Supervised Dimensionality Reduction, The 20th Annual Conference on Neural Information Processing Systems Workshop on Novel Applications of Dimensionality Reduction, 2006.

Xuerui Wang and Andrew McCallum, Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 424-433, 2006.

Xuerui Wang, David Kulp and Andrew McCallum, Inferring Gene Annotations in Gene Ontology from Gene Expression Data, UMass CS Synthesis Project Report, 2006.

Andrew McCallum, Chris Pal, Greg Druck and Xuerui Wang, Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification, Proceedings of the 21st National Conference on Artificial Intelligence, pp. 433-439, 2006.

Wei Li, Xuerui Wang and Andrew McCallum, A Continuous-Time Model of Topic Co-occurrence Trends, Proceedings of the 21st National Conference on Artificial Intelligence Workshop on Event Extraction and Synthesis, pp. 48-53, 2006.

Xuerui Wang, Natasha Mohanty and Andrew McCallum, Group and Topic Discovery from Relations and Their Attributes, Advances in Neural Information Processing Systems 18, pp. 1449-1456, 2006.

Xuerui Wang and Andrew McCallum, A Note on Topical N-grams, UMass Technical Report UM-CS-2005-071, 2005.

Chris Pal, Xuerui Wang, Michael Kelm and Andrew McCallum, Multi-Conditional Learning for Joint Probability Models with Latent Variables, The 19th Annual Conference on Neural Information Processing Systems Workshop on Advances in Structured Learning for Text and Speech Processing, 2005.

Andrew McCallum, Xuerui Wang and Chris Pal, Predictive Random Fields: Latent Variable Models Fit by Multiway Conditional Probability with Applications to Document Analysis, UMass Technical Report UM-CS-2005-053, 2005.

Xuerui Wang, Natasha Mohanty and Andrew McCallum, Group and Topic Discovery from Relations and Text, Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Workshop on Link Discovery: Issues, Approaches and Applications (LinkKDD-05), pp. 28-35, 2005.

Andrew McCallum, Andres Corrada-Emmanuel and Xuerui Wang, Topic and Role Discovery in Social Networks, Proceedings of the 19th International Joint Conference on Artificial Intelligence, pp. 786-791, 2005.

Andrew McCallum, Andres Corrada-Emmanuel and Xuerui Wang, The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email, The 18th Annual Conference on Neural Information Processing Systems Workshop on Structured Data and Representations in Probabilistic Models for Categorization, also appeared as UMass Technical Report UM-CS-2004-096, 2004.

Xuerui Wang, Rebecca Hutchinson and Tom Mitchell, Training fMRI Classifiers to Discriminate Cognitive States across Multiple Subjects, Advances in Neural Information Processing Systems 16, pp. 709-716, 2004.

Tom Mitchell, Rebecca Hutchinson, Radu Niculescu, Francisco Pereira, Xuerui Wang, Marcel Just and Sharlene Newman, Learning to Decode Cognitive States from Brain Images, Machine Learning, Vol. 57, Issue 1-2, pp. 145-175, 2004.

Tom Mitchell, Rebecca Hutchinson, Marcel Just, Radu Niculescu, Francisco Pereira and Xuerui Wang, Classifying Instantaneous Cognitive States from fMRI data, Proceedings of the American Medical Informatics Association 2003 Annual Symposium, pp. 465-469, 2003. ‘Best Paper – Foundational’ Award.

Xuerui Wang, Tom Mitchell and Rebecca Hutchinson, Using Machine Learning to Detect Cognitive States across Multiple Subjects, CMU CALD KDD Project Report, 2003.

Tom Mitchell, Rebecca Hutchinson, Marcel Just, Sharlene Newman, Radu Stefan Niculescu, Francisco Pereira and Xuerui Wang, Machine Learning of fMRI Virtual Sensors of Cognitive States, The 16th Annual Conference on Neural Information Processing Systems Workshop on Computational Neuroimaging: Foundations, Concepts & Methods, 2002.

Xuerui Wang and Tom Mitchell, Detecting Cognitive States Using Machine Learning, CMU CALD Technical Report for Summer Work, 2002.

Xuerui Wang and Wenhuang Liu, Research on CBR in Knowledge Management Systems, Computer Engineering and Applications, Vol. 38, No. 2, pp. 181-184, 2002.

Zefeng Zheng, Xuerui Wang and Wenhuang Liu, Research on a Solution of Business Questions Modeling in Decision Support Systems, Proceedings of the 2001 International Conference on Artificial Intelligence, 2001.

Xiu Li, Shouju Ren, Wenhuang Liu and Xuerui Wang, A Solution of Job-Shop Scheduling Problems based on Genetic Algorithms, Proceedings of the 2001 IEEE International Conference on Systems, Man and Cybernetics, Vol. 3, pp. 1823-1828, 2001.

Xiu Li, Xuerui Wang, Wenhuang Liu, and Lin Liao, Research on Web-based Data Warehouse using XML, Proceedings of the 3rd International Conferences on Information Technology and Information Networks, Vol. 5, pp. 42-47, 2001.

Xuerui Wang, Wenhuang Liu, Lin Lei and Shouju Ren, Research for Internet-based Integration & Self-organization Supply Chain Management, Proceedings of the 4th World Multiconference on Systemics, Cybernetics and Informatics, Volume I in Information Systems, 2000.

Lin Lei, Wenhuang Liu, Shouju Ren and Xuerui Wang, ERP-Based Business Process Reengineering, Proceedings of the 16th IFIP World Computer Congress, Information Technology for Business Management, 2000.

 

PROJECTS
Social Network Analysis from Multiple Modalities

Previous work in social network analysis has modeled the existence of links from one entity to another, but not the language content or topics on those links. Attributes of the interactions between entities play a very important role in forming a social network. For example, the Author-Recipient-Topic (ART) model learns topic distributions based on the direction-sensitive email messages sent between entities, and provides evidence not only that clearly relevant topics are discovered, but also that the ART model better predicts people’s roles. For group discovery, the Group-Topic (GT) model simultaneously discovers groups from the entity relations and topics among the corresponding text attributes. The GT model’s joint inference improves both the groups and the topics discovered by clustering in several modalities at once. The modalities are tied together by the common goal of increasing data likelihood. Significantly, joint inference allows the discovery of groups to be guided by the emerging topics, and vice versa. We are also interested in the dynamic evolution of social networks due to flux in other modalities.

   
Disruptive, Pattern-Driven Image Compression

As demand for limited bandwidth escalates, networks suffer from delays, bottlenecks, and even paralysis in transmission and storage. Because imagery makes up more than 60 percent of the information transmitted (the rest is voice and text) and its share is predicted to increase in the years ahead, a powerful software solution that compresses imagery efficiently is crucial, and more cost-effective than expensive upgrades to network infrastructure. This project, supported by the National Institute of Standards and Technology, treats an image as a unified and interrelated entity instead of as unrelated blocks of data. Data-driven compression approaches break images into small units, thereby losing spatial relations and lowering the compression ratio. The new paradigm instead simulates the cognitive process by compressing intricate networks of visual patterns. An adaptive filtering scheme partitions an image into complex but predictable structures and unpredictable "noise-like" structures. Artificial intelligence and machine learning techniques are heavily employed to capture high-quality images efficiently at superior compression ratios. This capability is achieved by training the system to recognize a repertoire of patterns common to images. The technology holds out the promise of compression ratios four times better than those of competing data-driven approaches, such as those used in JPEG-2000 and MPEG-4.

   
Scientific Data Mining to Understand Human Brain Function

New brain imaging technologies (e.g., fMRI) create a dramatic opportunity and a dramatic need for new data analysis methods to help scientists distill theories of cognitive brain function from their experiments. Our research focuses on developing data analysis tools to support brain imaging studies, and especially to support scientists developing computer-based cognitive architectures that model a broad range of human activities, from language processing, to solving algebra problems, to answering questions about images, to learning from experience. We will pursue our research agenda of developing data analysis algorithms and software while collaborating closely with two mature, separately funded cognitive architecture research efforts (4CAPS and ACT-R). In the past few years, both efforts have turned to fMRI for obvious reasons: the data provided by fMRI represents a revolutionary advance in our ability to directly observe brain activity, and hence to guide the formation of theories about human cognitive processing. Our research goal is to develop new machine learning methods that put these cognitive architecture efforts on the strongest possible footing to take advantage of brain imaging methods.

 
Multi-agent Reinforcement Learning

Learning and adaptation in multi-agent systems have received increasing attention in artificial intelligence research. Given the dynamic environments in which we want our agent teams to interact, it is becoming clear that behavioral repertoires and activities cannot simply be defined in advance. Unlike top-down models that assume an agent's state in advance, our approach to multi-agent learning is notable for its similarity to the types of learning exhibited by lower animal societies. Our method is the Profit Sharing Plan (PSP), a type of reinforcement learning algorithm. PSP allows an autonomous agent to learn a behavior progressively, without any instruction and with only delayed rewards. It differs from other approaches to learning (such as those built on Markov Decision Processes) in that it does not assume an agent's state in advance.
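
To make the idea of learning from delayed rewards concrete, here is a minimal Python sketch of a profit-sharing style learner. It is an illustration under stated assumptions, not the project's implementation: the class name ProfitSharingAgent, the geometric credit decay, the roulette-wheel action selection and the toy corridor task are all hypothetical choices made for this example. Rule weights are created lazily as observations are encountered, so no state space is declared in advance.

```python
import random
from collections import defaultdict


class ProfitSharingAgent:
    """Hypothetical profit-sharing learner (illustrative names and defaults)."""

    def __init__(self, actions, decay=0.5, initial_weight=1.0, seed=None):
        self.actions = list(actions)
        self.decay = decay  # assumed geometric credit decay per step back in time
        # Weights for (observation, action) rules appear on first use,
        # so the set of observations need not be known in advance.
        self.weights = defaultdict(lambda: initial_weight)
        self.episode = []  # rules fired since the last reward
        self.rng = random.Random(seed)

    def choose(self, observation):
        # Roulette-wheel selection: probability proportional to rule weight.
        w = [self.weights[(observation, a)] for a in self.actions]
        action = self.rng.choices(self.actions, weights=w, k=1)[0]
        self.episode.append((observation, action))
        return action

    def reinforce(self, reward):
        # Share the delayed reward backwards along the episode: the most
        # recent rule gets the full reward, earlier rules get less.
        credit = reward
        for rule in reversed(self.episode):
            self.weights[rule] += credit
            credit *= self.decay
        self.episode.clear()


def run_corridor(episodes=200, length=4, seed=0):
    """Toy task: start at cell 0, reward only on reaching the last cell."""
    agent = ProfitSharingAgent(["left", "right"], seed=seed)
    for _ in range(episodes):
        pos = 0
        for _ in range(50):  # step limit per episode
            action = agent.choose(pos)
            pos = max(0, pos - 1) if action == "left" else pos + 1
            if pos == length - 1:
                agent.reinforce(1.0)
                break
        else:
            agent.episode.clear()  # no reward reached: nothing to share
    return agent
```

Calling run_corridor() returns an agent whose weight table can be inspected. In this toy task the rule that steps right from the cell next to the goal collects the full reward on every successful episode, so its weight grows faster than that of the opposite rule.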

   
Fly through the Universe

Over the last decade, many astronomical surveys have generated terabyte-scale datasets of measurable parameters of stars and galaxies. Current methods for accessing very large databases do not scale well to terabyte sizes. We want to scale up to millions of galaxies and to take into account the illumination of each galaxy. The engineering challenge is speed: how to store, retrieve and display 500 million galaxies. We would also like to show the telescope image of a galaxy, where available, if we 'fly' close enough to it. Given information about galaxies (x, y, z, illumination), our project crudely simulates a spacecraft flying through a universe of millions of galaxies indexed by an improved R-tree.
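
The kind of lookup such a fly-through needs can be sketched in a few lines of Python. The sketch below is not the project's improved R-tree: it is a simplified, bulk-loaded bounding-box tree with a fan-out of two, used here as a stand-in to show how bounding boxes let a query skip most of the data. The names Node, build and query, the leaf size and the illumination threshold are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Galaxy = Tuple[float, float, float, float]  # (x, y, z, illumination)
Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3  # minimum corner of the bounding box
    hi: Vec3  # maximum corner of the bounding box
    points: Optional[List[Galaxy]] = None    # set on leaf nodes only
    children: Optional[List["Node"]] = None  # set on inner nodes only


def bounds(points: List[Galaxy]) -> Tuple[Vec3, Vec3]:
    lo = tuple(min(p[i] for p in points) for i in range(3))
    hi = tuple(max(p[i] for p in points) for i in range(3))
    return lo, hi


def build(points: List[Galaxy], leaf_size: int = 8) -> Node:
    """Bulk-load a tree from a non-empty list (leaf_size must be >= 1)."""
    lo, hi = bounds(points)
    if len(points) <= leaf_size:
        return Node(lo, hi, points=list(points))
    # Split at the median along the axis with the largest extent.
    axis = max(range(3), key=lambda i: hi[i] - lo[i])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(lo, hi, children=[build(ordered[:mid], leaf_size),
                                  build(ordered[mid:], leaf_size)])


def query(node: Node, qlo: Vec3, qhi: Vec3,
          min_illumination: float = 0.0) -> List[Galaxy]:
    """Return galaxies inside the box [qlo, qhi] that are bright enough."""
    # Prune any subtree whose bounding box misses the query box.
    if any(node.hi[i] < qlo[i] or node.lo[i] > qhi[i] for i in range(3)):
        return []
    if node.points is not None:
        return [p for p in node.points
                if all(qlo[i] <= p[i] <= qhi[i] for i in range(3))
                and p[3] >= min_illumination]
    found: List[Galaxy] = []
    for child in node.children:
        found.extend(query(child, qlo, qhi, min_illumination))
    return found
```

A viewer would call query with a box around the spacecraft's current position on each frame, raising min_illumination for distant regions so that only bright galaxies are drawn. A production index would also need insertion, disk paging and a larger fan-out, none of which this sketch attempts.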

Last updated on August 24, 2006