MILL:  A Multiple Instance Learning Library

Developed by:
Jun Yang
School of Computer Science
Carnegie Mellon University

Version: 1.00
Last updated: 11.26.2008

Review of Multi-instance learning
Download & Installation
Quick Help
Full Help
Experiment Results


MILL (MIL Library) is an open-source toolkit for multiple instance learning algorithms written in Matlab. Multiple-instance learning (MIL) is a form of semi-supervised learning in which only incomplete knowledge about the labels of the training data is available. Specifically, instances in MIL are grouped into a set of bags. The labels of the bags are provided, but the labels of the instances in the bags are unknown. Instead, a bag is labeled positive if at least one instance in the bag is positive, and negative if all the instances in it are negative. MIL algorithms attempt to learn a classification function that can predict the labels of bags and/or instances in the testing data. Applications of MIL include molecule activity prediction, text categorization, and image classification and retrieval.
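The bag-labeling rule above (a bag is positive if any of its instances is positive) can be sketched in a few lines. This is an illustration of the standard MIL assumption only, not MILL code, and the function name is my own:

```python
def bag_label(instance_labels):
    """Derive a bag's label under the MIL assumption: a bag is
    positive (1) if at least one instance is positive, and
    negative (0) only if all of its instances are negative."""
    return 1 if any(l == 1 for l in instance_labels) else 0

# A bag with one positive instance is positive; an all-negative bag is not.
mixed_bag = [0, 1, 0]
negative_bag = [0, 0]
```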

MILL consists of the following multi-instance learning algorithms:

  • Iterated-discrim APR  [Dietterich et al., 1997]
  • Diverse Density (DD) [Maron and Lozano-Perez, 1998]
  • EM-DD [Zhang and Goldman, 2001]
  • Two SVM variants for MIL [Andrews et al., 2002]
  • Citation-kNN for MIL  [Wang and Zucker, 2000]

Moreover, it supports a variety of evaluation methods, including:

  • Cross-validation
  • Leave-one-out evaluation
  • Train only or test only
  • Train and test using separate files
  • Split the data in one file for training and testing

Literature Review of Multi-Instance Learning

To get a better idea of what multi-instance learning (MIL) is about and how MIL methods work, please read the literature review of MIL methods and applications or its PDF version.

Source Code

The software is free for any scientific use. Please contact me if you are planning to use the software for commercial purposes. The software must not be modified or distributed without prior permission of the author. If you use MILL in your scientific work, please cite it as

  • Jun Yang, MILL: A Multiple Instance Learning Library,

System Requirements

You need Matlab and Perl to run MILL, on either Windows or Linux/Unix. However, two of the SVM-based MIL algorithms (inst_MI_SVM and bag_MI_SVM) call executables from the LibSVM package, and the bundled executables are compiled for Windows. If you have the Linux/Unix versions of the LibSVM executables, you can run these algorithms on Linux/Unix as well.

Download & Installation

  1. Install Matlab on your machine (skip this if it is already installed).
  2. Download the package from
  3. Unzip the .zip file into an arbitrary directory, say $MILLRoot
  4. Add the path $MILLRoot in Matlab, using the addpath command or the menu File->Set Path.
  5. You are ready to go. 

A Quick Example

This section explains how to use the MILL software. The main module of MILL is called "MIL_run", which can be called using the following approaches:

(1) In Matlab command line, type

MIL_run('classify -t input_file [options] [-- EvaluationMethod [options]] -- BaseClassifier [options]'); 

For example, the following command uses the Diverse Density (DD) method to classify the data in, using 10 random starting points and aggregating the results of the 10 runs by averaging. 5-fold cross-validation is used as the evaluation method, and the data are shuffled and normalized to [0,1] before classification.

MIL_run('classify -t -sf 1 -n 1 -- cross_validate -t 5 -- DD -NumRuns 10 -Aggregate avg');

You might need to run "clear global preprocess;" before classification to clear the global variable "preprocess". The various options supported are explained in detail later.

(2) Write a .m file in the current directory as follows, and run it:

      global preprocess; 
      %Normalize the data
      preprocess.Normalization = 1;
      %Evaluation Method: 0: Train-Test Split; 1: Cross Validation
      preprocess.Evaluation = 1;
      preprocess.root = '.'; % Change to the directory containing your files 
      preprocess.output_file = sprintf('%s/_Result', preprocess.root);
      preprocess.input_file = sprintf('%s/', preprocess.root);
      run = MIL_run('DD -NumRuns 10 -Aggregate avg');

This example does exactly the same thing as the command-line call above.

Input Formats

The input file containing the training examples can be in two formats:

(1) Normal format (for instances with non-zero values on most of features)

<instance_name>,<bag_name>,<label>,<value> ... <value>
<instance_name>,<bag_name>,<label>,<value> ... <value>
... ...
<instance_name>,<bag_name>,<label>,<value> ... <value>

Each line of the file represents one instance, which has an instance name and a set of values as its feature vector. Instances of the same bag must have the same <bag_name>. The <label> field of each line must be set to 0 or 1, since MILL currently does not support real-valued or multi-class labels. Note that <label> denotes the label of the instance; MILL automatically derives the bag labels from the instance labels (i.e., a bag is positive if at least one of its instances is positive, otherwise negative). However, if the instance labels are unknown, please set the <label> of each instance to the label of its corresponding bag. Note that the numbers in each line must be separated by commas; blanks and tabs can be added for better readability.

An example data file is given below, where the first (negative) bag has two instances and the second (positive) one has three instances. (The feature values shown here are illustrative.)

inst1, bag1, 0, 0.2, 1.3, 0.5
inst2, bag1, 0, 0.7, 0.4, 1.8
inst3, bag2, 1, 1.5, 0.9, 0.3
inst4, bag2, 0, 0.1, 2.2, 0.6
inst5, bag2, 0, 0.8, 0.5, 1.1

The normal format can be specified using "-if 0" in the command-line mode, or by setting preprocess.InputFormat = 0 in the .m file that calls MILL. This can be skipped since it is the default.

(2) Sparse format (for instances with zero values on most of features)

<instance_name>,<bag_name>,<label>,<feature>:<value> ... <feature>:<value>
<instance_name>,<bag_name>,<label>,<feature>:<value> ... <feature>:<value>
... ...
<instance_name>,<bag_name>,<label>,<feature>:<value> ... <feature>:<value>

In the sparse format, each feature value is represented as a <feature>:<value> pair, where <feature> is the ID of the feature (IDs start from 1). Only non-zero features need to be present; zero-valued features can be skipped. Hence, lines can have different lengths, depending on the number of non-zero features in each instance.

The sparse format is especially helpful (actually, necessary) in applications such as text categorization, where each instance (say, a document or a passage) is indexed by a term vector that can have thousands of dimensions or more. Internally, if the input is in sparse format, all data are represented using sparse matrices in Matlab, which significantly improves the time and space efficiency of handling high-dimensional data.

An example data file in sparse format is given below, where each instance is represented by a 10-d feature vector. (The feature values shown here are illustrative.)

inst1, bag1, 0, 2:0.4, 7:1.2
inst2, bag1, 1, 1:0.9, 5:0.3, 10:2.1
inst3, bag2, 0, 3:1.7

The sparse format can be specified using "-if 1" in the command-line mode, or by setting preprocess.InputFormat = 1 in the .m file that calls MILL.
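For illustration, here is a hypothetical parser for the two input formats above, written in Python rather than Matlab and not part of MILL. It assumes comma separators for both formats, as stated in the normal-format description:

```python
def parse_line(line, sparse=False, num_features=None):
    """Parse one MILL-style input line into
    (instance_name, bag_name, label, feature_vector).
    For the sparse format, num_features gives the full vector length."""
    parts = [p.strip() for p in line.split(',')]
    name, bag, label = parts[0], parts[1], int(parts[2])
    if not sparse:
        # Normal format: the remaining fields are the feature values.
        features = [float(v) for v in parts[3:]]
    else:
        # Sparse format: <feature>:<value> pairs; feature IDs start from 1,
        # and absent features are zero.
        features = [0.0] * num_features
        for pair in parts[3:]:
            fid, val = pair.split(':')
            features[int(fid) - 1] = float(val)
    return name, bag, label, features
```

For example, `parse_line("inst2, bag1, 0, 2:1.5, 5:0.5", sparse=True, num_features=5)` fills positions 2 and 5 of a 5-d zero vector.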

Output Formats

There are two major output files, the prediction file and the result file, which can be specified with -p and -o respectively in the command-line mode, or by setting preprocess.pred_file and preprocess.output_file in the .m file. By default, their names are set to $(input_file).pred and $(input_file).result. The file $(input_file).pred contains the prediction for each bag in the testing data, including the probability (of being positive), the predicted label, and the true label:

Index Probability Predict     Truth
1     0.923       1           1
2     0.761       1           0
3     0.082       0           0
4     0.624       1           0
5     0.566       1           1
6     0.887       1           1
7     0.939       1           1
8     0.139       0           0
9     0.821       1           0
10    0.765       1           0
11    0.426       0           0
12    0.318       0           0

The file $(input_file).result contains the overall prediction statistics for the test set, including the classification accuracy on bag labels, and the classification accuracy on instance labels if the algorithm is able to classify instances. 

Processing Filename:
   Classifier: DD -NumRuns 10 –Aggregate avg
   Message: Cross Validation, Folder: 5, Shuffled 
   Bag label accuracy = 0.7790, Instance label accuracy = 0.8307

Options and Classifiers

The basic grammar of a MILL command is as follows. Note that "--" is used to separate the different parts of the command. Spaces MUST appear before and after each "--", otherwise MILL cannot parse the command correctly.

MIL_run('classify -t input_file [general_option] [-- EvaluationMethod [evaluation_option]] -- BaseClassifier [classifier_option]'); 

As shown above, there are three types of options: general options, options for the evaluation method, and options for the base classifier. We review them below.

1. The general options

If you run MILL from a .m file, the general options are specified by setting the fields of the global variable preprocess. The field corresponding to each option is given in parentheses after its description.

-v (def 1): Verbosity of messages. (preprocess.Vebosity)
-sf (def 0): Shuffle the data or not, 0 for no shuffle (preprocess.Shuffled)
-n (def 1): Normalize the data or not, 1 for normalize (preprocess.Normalization)
-pf (def 0): The prediction format, either 0 or 1 (preprocess.PredFormat)
-t (def ''): The input file name (preprocess.input_file)
-o (def ''): The output file name (preprocess.output_file)
-p (def ''): The prediction file name (preprocess.pred_file)
-if (def 0): The input format is normal or sparse, 1 for sparse (preprocess.InputFormat).
-oflag (def 'a'): Output flag, 'a' for appending, 'w' for overwriting (preprocess.OutputFlag)
-dir (def ''): The working directory, which is $MILLRoot. (preprocess.WorkingDir)
-distrib (def 0): Enforce the same distribution of bag labels or not, 1 for enforce. (preprocess.EnforceDistrib)

The -distrib option forces the distribution of bag labels in the testing data to match that in the training data. This is done by ranking all bags by their probability (of being positive), labeling the top-K bags as positive, and labeling the rest as negative. K is chosen so that the ratio of positive bags in the testing data equals that in the training data. Note that this only works with algorithms that output meaningful probabilities (the bag probability is set to the maximum instance probability within the bag).
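The top-K relabeling just described can be sketched as follows. This is my own illustration of the idea, not MILL's implementation, and the names are hypothetical:

```python
def enforce_distribution(probs, train_pos_ratio):
    """Relabel test bags so the positive ratio matches the training set:
    rank bags by predicted probability of being positive and label the
    top K as positive, where K = round(train_pos_ratio * num_bags)."""
    k = round(train_pos_ratio * len(probs))
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    labels = [0] * len(probs)
    for i in order[:k]:
        labels[i] = 1
    return labels

# With a 50% positive training ratio, the two highest-probability bags
# of four are labeled positive regardless of any fixed threshold.
predicted = enforce_distribution([0.9, 0.1, 0.6, 0.4], 0.5)
```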

2. The evaluation methods. The default is to split the input file equally into training and testing sets.

·         train_test_validate (default)
Desc: Split the data in the input file into a training and a testing set.
Options: -t (def -2): The training-testing split boundary, given as the number of bags from the beginning of the input file. If this option is set to a negative value x, the first 1/(-x) portion of the data is used for training and the rest for testing (e.g., -2 means the first half of the data is used for training).
·         cross_validate
Desc: Cross-validation based on n even partitions of the data.
Options: -t (def 3): The number of folds for cross-validation.
·         leave_one_out
Desc: Each time, train on all the data except one bag and test on the held-out bag. Repeat until every bag has been used for testing once, and report the average performance.
·         test_file_validate
Desc: Use the input file as the training set and an additional file as the testing set.
Options: -t (def ''): The additional testing file.
·         train_only
Desc: Use the input file for training only, with no testing.
Options: -m (def ''): The output model file.
·         test_only
Desc: Use the input file for testing only. The model is provided in an additional model file.
Options: -m (def ''): The input model file.
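The -t semantics of train_test_validate can be sketched as below. This is a hypothetical helper for illustration, not part of MILL:

```python
def num_training_bags(t, total_bags):
    """Interpret train_test_validate's -t option: a positive value is the
    number of bags (from the start of the file) used for training; a
    negative value -x means the first 1/x portion of the bags trains."""
    if t > 0:
        return t                   # first t bags train, the rest test
    return total_bags // (-t)      # e.g., t = -2 -> first half trains
```

So with 100 bags, the default -t of -2 trains on the first 50 bags.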

3. The base classifier options.

·         DD

Desc: The Diverse Density algorithm [Maron and Lozano-Perez, 1998]

Options:

-Scaling (def 1): Whether to scale (i.e., weight) the different dimensions of the feature space. 1 for scaling.

-NumRuns (def 10): The number of runs using different starting points.

-Aggregate (def 'avg'): The method of combining the classification results obtained from different starting points. 'avg' uses the average of the probabilities from the multiple runs as a data item's final probability; 'max' or 'min' uses only the results from the model that gives the maximum or minimum objective function value (i.e., the likelihood of the training data).

-Threshold (def 0.5): The threshold on probability above which a bag/instance is classified as positive.
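The -Aggregate behavior described above can be sketched as follows. This is an illustrative Python sketch under my own naming, not MILL's implementation; for 'max', the probabilities come from the single run with the largest objective value:

```python
def aggregate(run_probs, run_objectives, method='avg'):
    """Combine per-bag probabilities from several runs started at
    different points. run_probs[r][i] is run r's probability for bag i;
    run_objectives[r] is run r's final objective (training likelihood)."""
    if method == 'avg':
        # Average each bag's probability across all runs.
        n = len(run_probs)
        return [sum(run[i] for run in run_probs) / n
                for i in range(len(run_probs[0]))]
    if method == 'max':
        # Keep only the run whose objective function is largest.
        best = max(range(len(run_probs)), key=lambda r: run_objectives[r])
        return run_probs[best]
    raise ValueError("unknown aggregation method: " + method)
```

The 'min' case is symmetric to 'max' with the smallest objective.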

·         EMDD
Desc: The EM-DD algorithm [Zhang and Goldman, 2001]
Options:
 - Scaling: as in DD
 - NumRuns: as in DD
 - Aggregate: as in DD
 - Threshold: as in DD
 - IterTolerance (def 0.1): The minimum change of the objective function used as the termination criterion for each optimization (each M-step). 
·         kNN
Desc: The Citation-kNN method for MIL [Wang and Zucker, 2000]. Unlike normal kNN, it considers not only the data points that are neighbors of the point in question (references), but also the points that count the point in question among their own neighbors (citers), making the prediction more robust.

Options:

-BagDistType (def 'min'): The max or min form of the Hausdorff distance between bags; see [Wang and Zucker, 2000] for details.

-InstDistType (def 'euclidean'): The distance metric between two instances, either 'euclidean' or 'cosine'. The latter suits text categorization and similar applications.

-RefNum (def 2): The number of references (neighbors) considered.

-CiterRank (def 4): The rank that decides whether one data point is a citer of another. For example, CiterRank = 4 means that for point A to be counted as a citer of B, B must be among the 4 nearest points to A.
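The two ideas above, the minimal Hausdorff bag distance and the citer test, can be sketched as follows. This is illustrative Python under my own naming, not MILL's code:

```python
def euclid(a, b):
    """Euclidean distance between two instances (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def min_hausdorff(bag_a, bag_b, dist):
    """Minimal Hausdorff distance between two bags: the smallest
    pairwise distance between their instances."""
    return min(dist(a, b) for a in bag_a for b in bag_b)

def is_citer(a, b, bags, bag_dist, citer_rank):
    """Bag a 'cites' bag b if b is among a's citer_rank nearest bags."""
    others = sorted((x for x in bags if x is not a),
                    key=lambda x: bag_dist(a, x))
    return b in others[:citer_rank]
```

A bag's prediction then combines the labels of its references and its citers rather than references alone.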


·         iterdiscrim_APR

Desc: Dietterich et al. (1997) proposed a series of MIL algorithms that formulate learning as finding an axis-parallel rectangle (APR) in the feature space as the target concept. iterdiscrim_APR is an iterative, discriminative variant that was empirically shown to outperform all the other variants [Dietterich et al., 1997].

Options (refer to the paper for details): 

-KerelWidth (def 0.999): The width of the Gaussian kernel used in the kernel density estimation at the expansion stage.

-OutsideProb (def 0.01): The cumulative probability allowed to fall outside the expanded APR, given the density estimated by kernel density estimation.

-GridNum (def 10000): The number of sample points used in the kernel density estimation.


·         inst_MI_SVM
Desc: The mi-SVM, an instance-level SVM variant for MIL [Andrews et al., 2002]. This classifier is built on the LibSVM package.

Options:

-Kernel (def 2): The type of kernel used in the SVM: 0 for linear, 1 for polynomial, 2 for RBF, 3 for sigmoid.
-KernelParam (def 0.05): The kernel parameter, i.e., the degree of the polynomial kernel or gamma for the RBF kernel.
-CostFactor (def 1): The cost factor that trades off margin width against classification error.
-NegativeWeight (def 1): The weight for negative examples, given that the weight for positive examples is fixed to 1.

-Threshold (def 0.5): The threshold on probability above which a bag/instance is classified as positive.


·         bag_MI_SVM
Desc: The MI-SVM, a bag-level SVM variant for MIL [Andrews et al., 2002]. This classifier is built on the LibSVM package.
Options: Same as the options for inst_MI_SVM.

More Examples

Note that the following examples can be run by simply copying the command line into the Matlab command window. The example data files used in these examples are available in the MILL directory.

Example 1

Classify the data in using the iterdiscrim_APR method with a 50%-50% training-testing split (the default). Shuffle the data before classification, but do not normalize it.

MIL_run('classify -t -sf 1 -n 0 -- iterdiscrim_APR');

Example 2

Classify using the Citation-kNN approach for MIL, with the min form of the Hausdorff distance, 5 references, and rank-5 citers for each data point. Shuffle the data before classification. Use 10-fold cross-validation to evaluate the performance.

MIL_run('classify -t -sf 1 -- cross_validate -t 10 -- kNN -RefNum 5 -CiterRank 5');

Example 3

Classify the data in using the bag-level SVM variant for MIL. Use the first 50 (un-shuffled) bags in the file for training and the rest for testing. The underlying SVM uses an RBF kernel with gamma = 0.1 and cost factor 10.

MIL_run('classify -t -sf 0 -- train_test_validate -t 50 -- bag_MI_SVM -Kernel 2 -KernelParam 0.1 -CostFactor 10');

Example 4

Train on the data in and test on  using the EM-DD algorithm with default parameters.

MIL_run('classify -t -- test_file_validate -t -- EMDD');

Example 5

Classify the data in using the Diverse Density algorithm and evaluate it using the leave_one_out scheme. The DD algorithm uses 20 random starting points and combines the results of the 20 runs by averaging.

MIL_run('classify -t -- leave_one_out -- DD -NumRuns 20 -Aggregate avg');

Example 6

Train a classifier with the instance-level SVM on the data in, without testing it. Store the model in model.txt. Use a polynomial kernel of degree 2.

MIL_run('classify -t -- train_only -m model.txt -- inst_MI_SVM -Kernel 1 -KernelParam 2');

Then, test the trained model on

MIL_run('classify -t -- test_only -m model.txt -- inst_MI_SVM');

Example 7

Run 3-fold cross-validation with the instance-level SVM on a sparse data set in. Use a linear kernel for the SVM. Enforce the label distribution in the testing data to be the same as in the training data.

MIL_run('classify -t -if 1 -n 0 -distrib 1 -- cross_validate -t 3 -- inst_MI_SVM -Kernel 0');

Preliminary Experiments

Partly to fulfill the requirements of a course project, I have run some preliminary experiments with MILL on three different tasks: drug activity prediction, text categorization, and image classification. Please see the report on the preliminary experiment results.


Bug Reports

If you find bugs, or you have problems with the code that you cannot solve yourself, please contact me via email


This software is free only for non-commercial use. It must not be modified or distributed without prior permission of the author. The author is not responsible for implications arising from the use of this software.


Parts of the source code for the various evaluation methods are modified from the MATLAB Arsenal package developed by Rong Yan.


[Andrews et al., 2002] S. Andrews, I. Tsochantaridis, T. Hofmann. Support Vector Machines for Multiple-Instance Learning. NIPS 2002.

[Dietterich et al., 1997] T. G. Dietterich, R. H. Lathrop, T. Lozano-Perez. Solving the Multiple-Instance Problem with Axis-Parallel Rectangles. Artificial Intelligence Journal, 89, 1997.

[Maron and Lozano-Perez, 1998] O. Maron and T. Lozano-Pérez. A Framework for Multiple-Instance Learning. Proc. of the 1997 Conf. on Advances in Neural Information Processing Systems 10, pp. 570-576, 1998.

[Wang and Zucker, 2000] J. Wang and J.-D. Zucker. Solving the multiple-instance problem: a lazy learning approach. Proc. 17th Int'l Conf. on Machine Learning, pp. 1119-1125, 2000.

[Zhang and Goldman, 2001] Q. Zhang and S. A. Goldman. EM-DD: An improved multiple-instance learning technique. In Neural Information Processing Systems 14, 2001.


* Last modified April 28th, 2005 by Jun Yang
