Good News or Bad News
- Objections
  Didn't get some observations (plugging in laptop) :(
  Best results (explanations) come from using stories to predict results on the same day.
  If used to predict next-day results, then it's random.
  News stories are about what happens today, not what happens tomorrow.
  The corpus has been removed.
  It can't be used for investment, but it can be used according to the previous paper.
  Long history of these studies in finance: event studies.
  It is very hard to find patterns (everybody is looking for them).
- Example with Disney buying Marvel. The discrete jump event (flat before and after). Efficiency: the news gets incorporated into the price pretty quickly. This kind of study is an old idea in finance. In finance people typically look at specific events (merger, etc.).
- Goldman Sachs were onto identifying stock stories early on. But it's pretty challenging to stay ahead of the curve.
- The paper's goal is sentiment analysis, not stock prediction. It is a computer science paper.
- What is feature selection? Description of basic feature selection.
- The authors are ignoring frequency. This could impact the results negatively. They threw out most of the words.
- An interesting thing to do today would be to analyze lots and lots of news. News vs. blogs.
- With bag of words, lots of noise? Other sources of noise (attribution, source, etc.).
- Presence is superior to frequency?
- There is a lot of data available in this case. Could we use much more data (RSS reader) and process it efficiently?
- Biggest issue: tamp down the crazy noise.
- Are stock prices noisy and crazy (random numbers)? Excess speculation. Should we regulate it?
- Prediction is very hard, as opposed to a Monday-morning explanation. Hard question.
- Bag-of-words approach is likely to fail.
- Need a model of how the stock market works and what news would trigger what events. Discussion of how to incorporate prior knowledge into machine learning. How to encode knowledge in a way that can be used by machine learning.
- Most events have been studied, along with their connection to the stories.

Mining Concurrent Text and Time Series
UMASS Information Retrieval Group
- They use a language model vs. SVM.
- What is the story?
  A bunch of preprocessing on the data.
  A non-parametric method to discretize the series. Detect trends.
  Then they align the news stories with the trends.
  Naive Bayes corresponds to a set of naive assumptions.
  Mixture model for trends: first they pick the trend; given the trend, you pick the corresponding language model.
  The rest is about tuning parameters and smoothing.
- Objections
  Doing Naive Bayes without a prior? Numerator is independent of the data. Plays a role in smoothing.
  Always number equations.
  Where does lambda come from? It is set to the Witten-Bell estimate.
- We want to know whether predicting a document given the trend is different from predicting the document given a general language model. The problem is that they plan to make money out of the trend, and it is not clear that the correlation is valid. The maximization problem is interesting from an NLP view, but not from a "making money" perspective.
- Generative vs. discriminative
  They should be discriminative rather than generative. We care about making money, not every little thing.
  Generative model: tell me the probability of every possible world and I can model the entire world.
  Discriminative: I care about a specific thing, not about everything. I want to optimize that one thing. I don't need a model of the whole world.
  Two views of machine learning that people often seek middle ground between.
- The paper uses a fully generative model.
- To extend the model I will try to learn a model over policies.
- How was it tested?
  Buy $10,000 if positive, sell $10,000 if negative. (BAD)
  You should have a notion of money invested and then model the average return plus standard deviation.
  Are you buying based on trends or stock?
  An active-passive decomposition would be a better way to evaluate, or the Sharpe ratio, which gives a better understanding of the value of the algorithm.
  It makes the evaluation look dumb, but it doesn't bias the evaluation.
  Distributions in finance are fat-tailed; this comes from the fact that the variance is time-varying.
- For writing finance
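
A rough sketch of the evaluation suggested in the discussion (average return plus standard deviation, summarized as a Sharpe ratio), rather than counting $10,000 buys and sells. Everything named here is an assumption for illustration and not from the paper: the function names `sharpe_ratio` and `active_returns`, the 252 trading days per year, the zero risk-free rate, and the made-up return series. Reading "active" as strategy return minus benchmark return is one common interpretation of the active-passive split, not necessarily the one meant in the discussion.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns.

    Assumes at least two observations; risk_free is a per-period rate.
    """
    excess = [r - risk_free for r in returns]
    spread = stdev(excess)  # sample standard deviation
    if spread == 0:
        raise ValueError("zero variance: Sharpe ratio is undefined")
    return mean(excess) / spread * math.sqrt(periods_per_year)


def active_returns(strategy, benchmark):
    """Per-period strategy return minus benchmark (passive) return."""
    if len(strategy) != len(benchmark):
        raise ValueError("series must have the same length")
    return [s - b for s, b in zip(strategy, benchmark)]


if __name__ == "__main__":
    # Hypothetical daily returns, invented purely for illustration.
    strategy = [0.010, -0.020, 0.015, 0.000, 0.005]
    benchmark = [0.004, -0.010, 0.006, 0.001, 0.002]

    print("mean return:", mean(strategy))
    print("std of returns:", stdev(strategy))
    print("Sharpe (annualized):", sharpe_ratio(strategy))
    print("Sharpe of active part:",
          sharpe_ratio(active_returns(strategy, benchmark)))
```

Since the returns are fat-tailed with time-varying variance, a single mean and standard deviation is only a coarse summary; it is still less misleading than a fixed-dollar win/loss count.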