Implementing and consuming Machine Learning at scale are difficult tasks. MLbase is a platform that addresses both problems. It consists of three components -- MLlib, MLI, and ML Optimizer.
ML Optimizer: This layer aims to
automate the task of ML pipeline construction.
The optimizer solves a search problem over feature
extractors and ML algorithms included in MLI and MLlib. The ML
Optimizer is currently under active development.
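
To make the idea of "a search problem over feature extractors and ML algorithms" concrete, here is a minimal sketch in plain Python. It is not the ML Optimizer's actual method and does not use MLI or MLlib. It simply tries every (extractor, learner) pair on toy data and keeps the pair with the lowest validation error. All names (EXTRACTORS, LEARNERS, search_pipelines, and the toy data) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical feature extractors: each maps a raw input to one scalar feature.
EXTRACTORS = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
}


def fit_mean(pairs):
    """Baseline learner: always predict the mean training label."""
    mean_y = sum(y for _, y in pairs) / len(pairs)
    return lambda f: mean_y


def fit_linear(pairs):
    """Closed-form least squares for y = slope * f + intercept."""
    n = len(pairs)
    mean_f = sum(f for f, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var = sum((f - mean_f) ** 2 for f, _ in pairs)
    cov = sum((f - mean_f) * (y - mean_y) for f, y in pairs)
    slope = cov / var if var > 0 else 0.0  # guard against a constant feature
    intercept = mean_y - slope * mean_f
    return lambda f: slope * f + intercept


# Hypothetical learners: each takes (feature, label) pairs and returns a predictor.
LEARNERS = {"mean": fit_mean, "linear": fit_linear}


def mean_squared_error(predict, pairs):
    return sum((predict(f) - y) ** 2 for f, y in pairs) / len(pairs)


def search_pipelines(extractors, learners, train, valid):
    """Exhaustively evaluate every extractor/learner pair.

    Returns (validation_error, extractor_name, learner_name) for the best pair.
    """
    best = None
    for (e_name, extract), (l_name, fit) in product(
        extractors.items(), learners.items()
    ):
        model = fit([(extract(x), y) for x, y in train])
        error = mean_squared_error(model, [(extract(x), y) for x, y in valid])
        if best is None or error < best[0]:
            best = (error, e_name, l_name)
    return best


# Toy data generated from y = 3 * x^2 + 1 (an assumption made for this sketch).
TRAIN = [(x, 3 * x * x + 1) for x in range(-3, 4)]
VALID = [(x, 3 * x * x + 1) for x in (-2.5, 0.5, 1.5)]

if __name__ == "__main__":
    print(search_pipelines(EXTRACTORS, LEARNERS, TRAIN, VALID))
```

An exhaustive loop is only workable for a handful of candidates. A practical optimizer would need a smarter search strategy, and would evaluate candidates on distributed data rather than on in-memory lists.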
MLI: An experimental API for feature
extraction and algorithm development that introduces high-level
ML programming abstractions. A prototype of MLI has been implemented against
Spark, and serves as a testbed for MLlib.
MLlib: Apache Spark's distributed ML library. MLlib
was initially developed as part of the MLbase project, and the
library is currently supported by the Spark community. Many
features in MLlib have been borrowed from ML Optimizer
and MLI, e.g., the model and algorithm APIs, multimodel training, sparse
data support, design of local / distributed matrices, etc.