Research


Network analysis, large graph mining, evolution of social networks

In our recent work we found surprising, counterintuitive patterns in time-evolving networks that overturn some of the basic assumptions made in the past. The main goal of observing these evolution patterns is to develop models of the processes that govern network evolution. Such models can then be fitted to real networks and used to generate realistic graphs or to give formal explanations of their properties. Our work also has a wide range of applications: we can spot anomalous graphs and outliers, design better graph sampling algorithms, forecast future graph structure, and run simulations of network evolution.
Another important aspect of this research is the study of "local" patterns and structures of propagation in networks. We aim to identify the building blocks of networks and find the patterns of influence these blocks have on the propagation of information or viruses over the network. Our recent work included a study of the spread of influence in a large person-to-person product recommendation network and its effect on purchases. We also model the propagation of information in the blogosphere and propose algorithms to efficiently find influential nodes in the network.
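To make "finding influential nodes" concrete, here is a minimal Python sketch of the generic greedy approach. It assumes the independent cascade model with a single activation probability `p` (an assumption; the text above does not fix a propagation model) and repeatedly adds the node whose addition most increases a Monte Carlo estimate of the expected cascade size. The function names are hypothetical, and this illustrates the technique rather than the exact algorithms referred to above.

```python
import random

def simulate_cascade(graph, seeds, p, rng):
    """One run of the independent cascade model: each newly activated node
    gets a single chance to activate each out-neighbour with probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def expected_spread(graph, seeds, p, runs=200, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    return sum(simulate_cascade(graph, seeds, p, rng) for _ in range(runs)) / runs

def greedy_seeds(graph, k, p, runs=200):
    """Pick k seed nodes one at a time, each time adding the node with the
    largest estimated marginal gain in expected spread."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    chosen = []
    for _ in range(k):
        base = expected_spread(graph, chosen, p, runs)
        best = max(sorted(nodes - set(chosen)),
                   key=lambda v: expected_spread(graph, chosen + [v], p, runs) - base)
        chosen.append(best)
    return chosen
```

Each greedy step re-runs many simulations for every candidate node, which is what makes the naive version slow on large networks. Because the expected spread under this model is submodular, lazily re-evaluating marginal gains is one standard way to cut that cost.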
Questions I ask in my work:
- How do large social networks evolve?
  Contrary to the common wisdom and belief of the last 30 years, we show that as a network grows and evolves it becomes denser and its diameter shrinks.
- How can we detect virus outbreaks in networks, and how can we find influential nodes in a network?
  Which blogs are the most informative, and where should sensors be placed in a city water distribution network to save the most people?
- What does the social network of the whole world look like?
  Analysis of the Microsoft Instant Messenger network and its conversations: 1 month of data, 240 million people, 1.3 billion links, 30 billion conversations, 4.5 TB of data.
- How does viral marketing work (propagation of recommendations on the social network)?
  For the first time, we had large-scale data (16 million) on exact influence propagation, i.e., people recommending products to each other and purchasing them.
- How to generate a realistic synthetic graph?
  Given a large real graph, how can we generate a similar synthetic graph? We present the Kronecker graphs model, which is analytically and computationally tractable, together with a fitting procedure that does this.
- How can we predict the quality of web search results without looking at their content?
- How does information propagate on the web?
  Propagation of information on the blogosphere.
- See other publications on these topics.
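The densification finding can be quantified by tracking the number of nodes n(t) and edges e(t) over time and fitting e(t) ∝ n(t)^a on a log-log scale: a = 1 means a constant average degree, while a > 1 means the average degree grows as the network grows. A minimal sketch of that fit, assuming one (nodes, edges) pair per snapshot of the growing graph (the function name is hypothetical):

```python
import math

def densification_exponent(snapshots):
    """Least-squares slope a in log e(t) = a * log n(t) + c, given
    (num_nodes, num_edges) pairs, one per snapshot of a growing graph.
    Needs at least two snapshots with different node counts."""
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

For example, snapshots where every node always has 10 edge endpoints on average (e = 5n) give a = 1, i.e., no densification.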
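In the stochastic form of the Kronecker graphs model, a small initiator matrix of edge probabilities is raised to its k-th Kronecker power, and each edge (u, v) is then included independently with the corresponding probability. The Python sketch below assumes that form and uses naive sampling over all node pairs, which is only practical for small graphs; the fitting procedure is not shown, and the function names are hypothetical.

```python
import random

def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def kronecker_power(initiator, k):
    """k-th Kronecker power (k >= 1) of the initiator probability matrix."""
    result = initiator
    for _ in range(k - 1):
        result = kron(result, initiator)
    return result

def sample_graph(prob, seed=0):
    """Include each directed edge (u, v) independently with probability prob[u][v]."""
    rng = random.Random(seed)
    size = len(prob)
    return [(u, v) for u in range(size) for v in range(size)
            if rng.random() < prob[u][v]]

# Usage: 2x2 initiator, 3 Kronecker powers -> 8 nodes
edges = sample_graph(kronecker_power([[0.9, 0.5], [0.5, 0.1]], 3))
```

Because the entries of a Kronecker power multiply, an initiator whose entries sum to s yields s^k expected edges after k powers; the initiator above sums to 2.0, so its third power has 8 expected edges among 8 nodes.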

pre-PhD things:

Text mining

With Natasa Milic-Frayling from Microsoft Research and Marko Grobelnik from the Jozef Stefan Institute, we are working on a method for summarizing text documents by creating a semantic graph of the original document and identifying the substructure of that graph that can be used to extract sentences for a document summary. We start with a deep syntactic analysis of the text and, for each sentence, extract logical form triples (subject-predicate-object). We then apply cross-sentence pronoun resolution, co-reference resolution, and semantic normalization to refine the set of triples and merge them into a semantic graph. This procedure is applied both to documents and to their corresponding summary extracts. We train a linear Support Vector Machine on the logical form triples to learn which triples belong to sentences in document summaries. The classifier is then used to automatically create summaries of test documents.
See our publications on this topic.
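The merging step can be illustrated with a small Python sketch. It assumes the triples have already been extracted by the syntactic analysis and uses a hypothetical `aliases` table in place of the real pronoun resolution, co-reference resolution and semantic normalization. The function names and the degree feature are illustrative only; the features actually given to the SVM are not described here.

```python
from collections import defaultdict

def build_semantic_graph(triples, aliases):
    """Merge (subject, predicate, object) triples into one graph.
    `aliases` maps a surface form to a canonical entity name; it is a
    stand-in for co-reference resolution and normalization."""
    def norm(term):
        term = term.strip().lower()
        return aliases.get(term, term)

    edges = defaultdict(set)  # (subject, object) -> set of predicates
    for subj, pred, obj in triples:
        edges[(norm(subj), norm(obj))].add(pred.strip().lower())
    return edges

def node_degrees(edges):
    """Number of distinct neighbours per node, a simple structural feature."""
    neighbours = defaultdict(set)
    for subj, obj in edges:
        neighbours[subj].add(obj)
        neighbours[obj].add(subj)
    return {node: len(ns) for node, ns in neighbours.items()}
```

Once "It" is resolved to "microsoft", triples from different sentences attach to the same node, which is what lets graph structure (rather than single sentences) inform the classifier.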
I spent the summer of 2002 in the Department of Computer Science at Royal Holloway, University of London, in John Shawe-Taylor's group. We worked on text classification with very imbalanced training sets (small Reuters categories with 10,000 negative training examples and only a few (10) positive ones). We extended linear programming boosting to handle uneven datasets and extensively compared the performance of a number of boosting strategies, concentrating on the problems posed by uneven datasets.
See the ICML 2003 paper: Publications.
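For reference, standard soft-margin linear programming boosting finds a convex combination of weak hypotheses h_j (weights a_j) that maximizes the margin ρ minus a penalty D on the slack variables ξ_i of the m training examples. One natural way to adapt it to uneven datasets, stated here as an assumption rather than as the exact formulation of the ICML 2003 paper, is to use separate penalties D₊ and D₋ for positive and negative examples, with D₊ larger so that the few positive examples are not simply sacrificed:

```latex
\begin{aligned}
\max_{a,\,\rho,\,\xi}\quad & \rho \;-\; D_{+}\sum_{i:\,y_i=+1}\xi_i \;-\; D_{-}\sum_{i:\,y_i=-1}\xi_i \\
\text{s.t.}\quad & y_i \sum_{j} a_j\, h_j(x_i) + \xi_i \;\ge\; \rho, \qquad i = 1,\dots,m, \\
& \sum_{j} a_j = 1, \qquad a_j \ge 0, \qquad \xi_i \ge 0 .
\end{aligned}
```

Setting D₊ = D₋ recovers the usual single-penalty formulation.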

Link analysis

With Janez Brank, we won the download estimation task of KDD Cup 2003, a knowledge discovery and data mining competition held in conjunction with the Ninth Annual ACM SIGKDD Conference. The competition focused on problems motivated by network mining and the analysis of usage logs. See our KDD Cup website with the technical report: http://ai.ijs.si/kddcup03.

Text-to-Speech Synthesis

Govorec is a high-quality text-to-speech synthesis system for the Slovenian language. It uses a phonetic dictionary of 500,000 words and a set of letter-to-sound rules. Our voice database consists of 1,224 diphones, and segments are concatenated automatically using the TD-PSOLA (time-domain pitch-synchronous overlap-add) algorithm. Govorec is a Microsoft SAPI 4.0-compliant text-to-speech engine.
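To illustrate the overlap-add idea behind TD-PSOLA, here is a simplified Python sketch. It assumes a voiced signal with a constant, known pitch period (in samples) and evenly spaced pitch marks; a real synthesizer such as Govorec works with variable pitch marks on concatenated diphones, and this is not its implementation. Two-period Hann-windowed segments are cut around each analysis mark and re-placed at synthesis marks spaced by the new period, so changing `new_period` changes the pitch while the duration stays roughly the same. The function names are hypothetical.

```python
import math

def hann(length):
    """Periodic Hann window: copies overlapped at hop length/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]

def td_psola(x, period, new_period):
    """Simplified TD-PSOLA for a constant-pitch signal x (list of samples).
    period: analysis pitch period in samples; new_period: synthesis period."""
    win = hann(2 * period)
    analysis_marks = list(range(period, len(x) - period + 1, period))
    y = [0.0] * len(x)
    t = period  # current synthesis pitch mark
    while t <= len(x) - period:
        # use the segment centred on the analysis mark nearest to t
        m = min(analysis_marks, key=lambda mark: abs(mark - t))
        for n in range(2 * period):
            y[t - period + n] += x[m - period + n] * win[n]
        t += new_period
    return y
```

With `new_period == period` the windows overlap-add to one, so the interior of the signal is reconstructed unchanged, which is a convenient sanity check. With a shorter period more segments overlap and the output gets louder; a real system would normalize the amplitude.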
Govorec received the award for best innovation of 2000 in the field of training, life and work of the disabled from the Government Office for Disabled and Chronically Sick of the Republic of Slovenia. Read more and download the engine.

Information retrieval

During the summer of 2001 I was at Microsoft Research working on WebTrails, an application that helps users access pages they have seen during an ongoing search and navigation session. To support access to pages in the web navigation history, we introduced web trails, or sessions: each page is part of a trail.
We provided a search facility over the pages in the web navigation history, with the following features:
- Search on the color scheme of the page (the user can choose between more specific and more general color palettes and specify colors for individual regions of the page)
- Search by example: we abstract the page color scheme and present a set of representative pages. The user can browse the tree and get more and more specific about page layout.
- Search on page contents or annotations (link text, search query, page content)
- Search based on the time the page was seen.
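One simple way to realize "more specific to more general" color palettes is to quantize each RGB channel to fewer bits for a more general palette. The Python sketch below assumes a page is summarized as a mapping from hypothetical region names to a dominant RGB color; this representation and the function names are illustrative assumptions, not how WebTrails actually stored color schemes.

```python
def quantize(rgb, bits):
    """Map an (r, g, b) colour with 0-255 channels to a coarser palette by
    keeping only the top `bits` bits of each channel (bits=1 gives 8 colours)."""
    shift = 8 - bits
    return tuple(c >> shift for c in rgb)

def matches(page_regions, query_regions, bits):
    """True if every region named in the query exists on the page and has
    the requested colour at the chosen palette granularity."""
    return all(region in page_regions and
               quantize(page_regions[region], bits) == quantize(color, bits)
               for region, color in query_regions.items())
```

At `bits=1` a dark red header query matches any reddish header, while at `bits=3` the match becomes much stricter, mirroring the choice between general and specific palettes.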
We also provided a tool to support the current navigation/search session: for each page we create a thumbnail and present the set of thumbnails to the user, which makes navigation much easier than with the standard Back/Forward buttons. The user can look at several linear views of the navigation history: time-ordered accumulation, unique pages accessed, topology of navigated pages, and time/content abstraction of the navigation history.
See WebTrails presentation slides.

Old stuff

Real-time stereoscopic computer vision: detection of human bodies using a sequence of stereo images. Back in 1999 I got a Small Vision System stereo camera from SRI and built a real-time system for tracking people and determining their position in 3-D.
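Determining a person's 3-D position from a stereo pair comes down to triangulation. For a rectified pair with focal length f (in pixels) and baseline B, a pixel with disparity d lies at depth Z = fB/d. The sketch below applies that pinhole-model relation, assuming a disparity value is already available for the pixel; the parameter names are hypothetical and this is not the original tracking system.

```python
def point_from_disparity(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Back-project left-image pixel (u, v) with the given disparity to
    camera coordinates in metres, for a rectified stereo pair.
    (cx, cy) is the principal point in pixels."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline_m / disparity
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return x, y, z
```

Since Z is inversely proportional to d, depth resolution degrades with distance: one pixel of disparity error matters far more for a person at the back of a room than for one near the camera.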
Constraint logic programming: in high school I experimented with constraint logic programming (CLP) for scheduling school timetables, and I developed a finite-domain CLP library in C++.
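The core of a finite-domain constraint solver is a domain for each variable, propagation that prunes values, and backtracking search. The Python sketch below shows that core on a toy timetabling problem, using forward checking (one of the simplest propagation schemes) and first-fail variable ordering. It is a hypothetical illustration, not the original C++ library, whose API is not described here.

```python
def solve_timetable(lessons, slots, conflicts):
    """Assign each lesson a time slot so that conflicting lessons (e.g. same
    teacher or same class) never share a slot. Returns a dict or None."""
    domains = {lesson: set(slots) for lesson in lessons}
    neighbours = {lesson: set() for lesson in lessons}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    def search(assignment, domains):
        if len(assignment) == len(lessons):
            return assignment
        # first-fail: branch on the unassigned lesson with the smallest domain
        lesson = min((other for other in lessons if other not in assignment),
                     key=lambda other: len(domains[other]))
        for slot in sorted(domains[lesson]):
            # forward checking: remove this slot from unassigned neighbours
            pruned = {name: set(dom) for name, dom in domains.items()}
            pruned[lesson] = {slot}
            consistent = True
            for other in neighbours[lesson]:
                if other not in assignment:
                    pruned[other].discard(slot)
                    if not pruned[other]:
                        consistent = False  # a domain was wiped out
                        break
            if consistent:
                result = search({**assignment, lesson: slot}, pruned)
                if result is not None:
                    return result
        return None  # no slot works: backtrack

    return search({}, domains)
```

Forward checking catches a dead end as soon as some lesson has no slot left, rather than only when that lesson is reached; stronger propagation (such as arc consistency) prunes more at a higher cost per node.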