Some problems

Introduction to Machine Learning - 10-715

Design a streaming algorithm to find frequent items. Note that the distribution might change over time. A possible strategy is to modify the Apriori algorithm.
Use secondary information to improve collaborative filtering, e.g. for the Netflix problem you could incorporate IMDB and Wikipedia.
Financial forecasting as a high-dimensional multivariate regression problem. E.g. you could try predicting the prices of a very large number of securities at the same time, possibly using news, tweets, and financial data releases to improve the estimates beyond a simple technical analysis.
Detect trends, e.g. in the Twitter stream. Forecast tomorrow's keywords today. How quickly can you detect new events (earthquakes, assassinations, elections)?
Nonlinear function classes. Can you find efficient sets of basis functions that are both fast to compute and sufficiently nonlinear to address a large set of estimation problems?
Parallel decision trees. Can you design a data-parallel decision tree or boosted decision tree algorithm? The published results are essentially sequential in the construction of the trees. One suggestion would be to take the Random Forests algorithm, re-interpret it as a Pitman estimator sampling from the version space of consistent trees, and then extend it to other objectives.
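
As a possible starting point for the parallel decision tree problem, here is a minimal sketch of data-parallel split finding for a single tree node. It is not the Random Forests re-interpretation suggested above. It only illustrates one simple way to make the per-node work data parallel: each shard of the data computes label counts for a fixed set of candidate thresholds, the counts are merged, and the split with the lowest weighted Gini impurity is chosen. The names shard_histogram, gini, and best_split, the single real-valued feature, and the fixed threshold list are all assumptions made for illustration. Threads are used only to show the structure; in CPython they will not give a real speedup for this pure-Python loop.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def shard_histogram(shard, thresholds):
    """Per-shard statistics: label counts left of each threshold, plus totals.

    `shard` is a list of (x, y) pairs with one real feature x and a label y.
    """
    left = {t: Counter() for t in thresholds}
    total = Counter()
    for x, y in shard:
        total[y] += 1
        for t in thresholds:
            if x <= t:
                left[t][y] += 1
    return left, total


def gini(counts):
    """Gini impurity of a label-count table (0.0 for an empty table)."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(shards, thresholds, workers=4):
    """Return (threshold, weighted Gini) of the best split over all shards.

    Assumes at least one data point in total. Only the small count tables
    are combined centrally; the raw data stays in its shard.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: shard_histogram(s, thresholds), shards))

    # Merge step: counts are additive, so shard results simply sum up.
    left = {t: Counter() for t in thresholds}
    total = Counter()
    for shard_left, shard_total in results:
        total.update(shard_total)
        for t in thresholds:
            left[t].update(shard_left[t])

    n = sum(total.values())
    best = None
    for t in thresholds:
        right = total - left[t]  # counts on the right side of the split
        n_left = sum(left[t].values())
        n_right = n - n_left
        score = (n_left * gini(left[t]) + n_right * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best
```

Because the merged counts are identical to the counts computed on the full data set, the chosen split does not depend on how the data is sharded. The construction of the tree is still sequential from node to node, which is exactly the limitation the project asks you to address.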