LTI Colloquium Report on Miles Osborne's talk on "Cross-stream Event Detection in Twitter"

by Prasanna Kumar

This report is an example of constrained writing. Starting from the next paragraph, the writing will have two constraints imposed on it. In
keeping with the theme of the talk, the first constraint is that each line will only have 140 characters or less (I set the emacs word wrap
parameter to 140 characters). There may be multiple sentences on each line but no sentence is allowed to cross line boundaries. The second
constraint is that I am only allowed to use the top 1000 words in UK English (http://www.bckelk.ukfsn.org/words/uk1000n.html) to write this
report (with the exceptions of the words 'Twitter', 'tweet', and 'Wikipedia'). Well, let's begin...

--------------

This talk is on trying to use Twitter as a means of getting news about the world around us faster than human written news. 
There are many hundred thousand thousand tweets from people every day. Most tweets talk about things which are of no use to us.
Our work is to find real time good news out of the thousands of things people tweet every second. 
There are some really good tweets for news, especially the tweets from the first people at places when bad things are happening.
We have seen such cases in places where the earth moves, man made things fall to the ground, or when big brother goes from bad to worse. 
We also want to answer the question "Does Twitter really get news before the usual news places?" This is a hard question to answer.

Over all, this talk talks about how to find the news from tweets, how to do it fast, and how to do this by using many many tweets.
This talk will also talk about how to make it even better, and see if it does better or worse than the usual news.
Let us start thinking about how to find the news from tweets. We really need to find the first person who tweets about the happenings.
We should make sure to talk about this as quickly as possible, and make sure not to talk about the second, third, or other people's tweets.
We will do this by taking each story and changing it into a line of numbers and comparing it to other lines of numbers from other stories.
If two lines of numbers are close to each other, then it means that the stories are also close to each other. 
If the line of numbers for a new story is not close to any line of numbers for any other story that we have, then it is a new story.
What we are doing is a kind of nearest next door person finding. The way we talked about this earlier is a slow way of finding this.

A faster way to do this finding is by 'changing' each line of numbers into a different line of numbers which may be longer or shorter.
Lines of numbers which are close to each other before being 'changed' are still close to each other after being 'changed'. 
But seeing which lines of numbers are close to each other is a lot faster after 'changing' the lines of numbers.
Doing the finding this way means that the finding will take the same amount of time no matter how many tweets we have.
In the old way, the time the finding takes goes up with the number of tweets we have. So, our new way is much faster.
But since we have many hundred thousand thousand tweets, the new way still takes too much time. We need to make it a lot faster.
We can make it faster by running the new way at many many places at the same time. This lets us go through the great number of tweets faster.

Now, let us think about how to make this better. Only five out of one hundred tweets talk about something that is news worthy. 
Only one out of one hundred 'news' that our way finds is actually news. Our way thinks that a lot of things which are not news are news.
One way to find better news out of Twitter is to slow down and wait for a long time for our way to find the right news from the tweets.
Another way to find better news is to remember that if something is news worthy, many people will talk about it at many places. 
Also, if someone tweets something, and lots and lots of people talk about that person's tweet, then the tweet can be considered as news.
So, we can do both things together by waiting for a while to see if lots of people talk about something. This will make our way better.
The longer we wait, the better our way becomes.

A great way of making this even better is to use Wikipedia. If a particular story looks like it might be news, we look at it on Wikipedia.
If the things that the tweet talks about are changing a lot on Wikipedia, it means that a lot of people are reading it at that time.
This usually only happens if the story that lots of people are reading about on Wikipedia is news worthy.
We can also read the usual news and find tweets that came out at the same time as the news stories.

Now, going back to our first question, does Twitter really get news before the usual news places? Sometimes yes, sometimes no.
In general, Twitter gets news from the usual news places. The usual news places carry a lot of news that Twitter does not.
However, Twitter does carry news that the usual news doesn't, in cases like when really bad things happen, and people see it and tweet it.
Or when there is some news which only people in some places will care about, like something that happens in a small town. 

---------------

Whew! That was hard. 
As for the page length requirements, the above text is about 3 pages long in LibreOffice with a 12-point font and 1.5-spaced lines.
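
Outside the constraints, the "line of numbers" idea above can be sketched in code. This is only a minimal illustration, not the
speaker's system: it assumes the "line of numbers" is a vector compared by cosine similarity, and that the 'changing' step is a form of
locality-sensitive hashing (random hyperplanes here, which is my assumption). The names is_new_story_slow and HyperplaneLSH, the 0.5
threshold, and the 8-bit signature are all made up for the example.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def is_new_story_slow(vec, seen, threshold=0.5):
    """Slow way: compare against every earlier vector, so the cost grows with len(seen)."""
    return all(cosine(vec, old) < threshold for old in seen)


class HyperplaneLSH:
    """Faster way: hash each vector to a bit signature and compare only within its bucket.

    Vectors separated by a small angle tend to fall on the same side of each
    random hyperplane, so they tend to share a signature (hypothetical sketch).
    """

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # One random hyperplane (normal vector) per signature bit.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = defaultdict(list)

    def signature(self, vec):
        # Bit i records which side of hyperplane i the vector lies on.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in self.planes
        )

    def is_new_story(self, vec, threshold=0.5):
        """Return True if no stored vector in the same bucket is close; then store vec."""
        bucket = self.buckets[self.signature(vec)]
        new = all(cosine(vec, old) < threshold for old in bucket)
        bucket.append(vec)
        return new


lsh = HyperplaneLSH(dim=3)
first = lsh.is_new_story([1.0, 2.0, 0.5])     # nothing stored yet, so it counts as new
repeat = lsh.is_new_story([2.0, 4.0, 1.0])    # same direction, same bucket, not new
```

The sketch never caps the size of a bucket, so it does not by itself give the "same amount of time no matter how many tweets" behaviour
described above; a real system would have to bound how many stored vectors each lookup touches. Close vectors can also land in different
buckets, which is why such schemes usually keep several hash tables.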

Inspired by Dinosaur Comics (www.qwantz.com) and http://xkcd.com/1133/