Guang's homepage

Welcome to Guang's home



Guang Xiang
Twitter, Inc.
1355 Market St, Suite 900
San Francisco, CA 94103

Education

Ph.D., Language Technologies Institute, School of Computer Science, Carnegie Mellon. Aug 2007 - Feb 2013

M.S., Machine Learning Department, School of Computer Science, Carnegie Mellon. 2009 - 2011

Research

My general research interests lie in data mining on various corpora, especially big data, to discover knowledge and interesting patterns. To that end, I rely heavily on machine learning, information retrieval, natural language processing, and other techniques.

Anti-phishing. NEW!!! Check out our online cascaded phish detector.
Company acquisition prediction based on CrunchBase profiles and TechCrunch articles. NEW!!! Find more details and download our corpus.
Offensive tweet classification on Twitter (big data with MapReduce processing)
Activity recommendation based on users' GPS data, location semantics from Foursquare, and mobile app profiles on Google Play
Automatically building a parallel Chinese-English corpus from Sina Weibo (the leading microblogging service in China) and Twitter for microblog message translation. NEW!!! With more than 500 million registered users, Sina Weibo is a rich source of data for various research tasks. Drop us a line if you want a copy of our Weibo corpus. You can also read our tutorial on how to create a Weibo account and use the Weibo API.

My advisors are Prof. Jason Hong and Prof. Carolyn Rose. My thesis committee includes Prof. Jason Hong, Prof. Carolyn Rose, Dr. Alex Hauptmann, Prof. Christos Faloutsos, and Dr. Markus Jakobsson.

I am also a big fan of Android, which is used intensively in my advisor Prof. Jason Hong's Chimps Group to build prototype systems and various mobile applications. It is really fun to learn from those experts.

Internship

Research intern at the Internet Services Research Center, Microsoft Research, Redmond, WA (Jun 2011 - Aug 2011)
Software engineering intern at Facebook, Palo Alto, CA (Jun 2010 - Sep 2010)

Selected Publications

Journal

Guang Xiang, Jason Hong, Carolyn Rose, and Lorrie Cranor. CANTINA+: A Feature-rich Machine Learning Framework for Detecting Phishing Web Sites. ACM TISSEC'11, 2011

Conference and Workshop

Wang Ling, Guang Xiang, Chris Dyer, Alan Black, and Isabel Trancoso. Microblogs as Parallel Corpora. ACL'13, 2013. [pdf]
Yingze Wang, Guang Xiang, and Shi-Kuo Chang. Sparse Multi-task Learning for Detecting Influential Nodes in an Implicit Diffusion Network. AAAI'13, 2013
Miaomiao Wen, Zeyu Zheng, Hyeju Jang, Guang Xiang, and Carolyn Rose. Extracting Events with Informal Temporal References in Personal Histories in Online Communities. ACL'13 (short paper), 2013
Guang Xiang, Zeyu Zheng, Miaomiao Wen, Jason Hong, Carolyn Rose, and Chao Liu. A Supervised Approach to Predict Company Acquisition with Factual and Topic Features Using Profiles and News Articles on TechCrunch. ICWSM'12 (short paper), 2012. [Short-version][Long-version][Our data set and usage][The official CrunchBase data (released on June 6, 2013)]
Guang Xiang, Bin Fan, Wang Ling, Carolyn Rose, and Jason Hong. Detecting Offensive Tweets via Topical Feature Discovery over a Large Scale Twitter Corpus. CIKM'12 (short paper), 2012. [pdf]
Lu Jiang, Alexander Hauptmann, and Guang Xiang. Leveraging High-level and Low-level Features for Multimedia Event Detection. ACM Multimedia 2012
Wang Ling, Nadi Tomeh, Guang Xiang, Alan Black, and Isabel Trancoso. Improving Relative-Entropy Pruning using Statistical Significance. COLING 2012
Gang Liu, Guang Xiang, Bryan Pendleton, Jason Hong, and Wenyin Liu. Smartening the Crowds: Computational Techniques for Improving Human Verification to Fight Phishing Scams. SOUPS'11, 2011
Guang Xiang, Carolyn Rose, Jason Hong, and Bryan Pendleton. A Hierarchical Adaptive Probabilistic Approach for Zero Hour Phish Detection. ESORICS'10, 2010
Jialiu Lin, Guang Xiang, Jason Hong, and Norman Sadeh. Modeling People’s Place Naming Preferences in Location Sharing. Ubicomp'10, 2010
Guang Xiang and Jason Hong. A Hybrid Phish Detection Approach by Identity Discovery and Keywords Retrieval. WWW'09, 2009

Working Papers

Guang Xiang, Jason Hong, and Carolyn Rose. A Feature-type-aware Cascaded Learning Framework for Efficient Phish Detection.

Service

To be updated

Misc

To be updated

This page is under construction.