"better data often beats better algorithms, and designing good features goes a long way"

Linear Methods

The essential difference between linear methods lies in their loss functions, and different loss functions can dramatically change the position of the decision hyperplane. For least-squares regression, every data point contributes its squared residual to the loss, so the fit must account for all points (and a far-away outlier can pull the hyperplane strongly). For logistic regression, points far from the decision boundary on the correct side contribute very little, since the log loss decays exponentially with the margin; most of the loss comes from misclassified or barely correctly classified examples. For SVM, correctly classified points beyond the margin contribute nothing at all, so only points inside the margin or misclassified (the support vectors) matter. The sketch below compares the three losses numerically.

Common choices for non-linear classifiers are SVMs (with kernels), decision trees (and random forests), neural networks, etc. Decision trees work well with categorical features, are easy to train, and have nonlinear decision boundaries, but they are prone to overfitting when the number of features is large (remember: a simpler model generalizes better, while a tree has many parameters). The other methods cannot work with categorical features directly unless the features are vectorized (e.g. one-hot encoded). Note that Naive Bayes is also a linear method (see http://pages.cs.wisc.edu/~jerryzhu/cs769/nb.pdf); a short sketch of why follows below.
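
To make the comparison concrete, here is a minimal sketch (assuming NumPy) that evaluates the three losses as functions of the signed margin m = y * f(x), using (1 - m)^2 as the least-squares analogue for classification:

```python
import numpy as np

# Signed margin m = y * f(x): large and positive for confidently
# correct predictions, negative for misclassified points.
m = np.linspace(-2.0, 3.0, 6)

squared = (1 - m) ** 2             # least squares: every point contributes
logistic = np.log1p(np.exp(-m))    # log loss: decays exponentially in m
hinge = np.maximum(0.0, 1 - m)     # hinge (SVM): exactly zero once m >= 1

for row in zip(m, squared, logistic, hinge):
    print("m=%5.1f  squared=%6.2f  logistic=%5.2f  hinge=%4.2f" % row)
```

At m = 3 the squared loss has grown large again (4.0) while the logistic loss is about 0.05 and the hinge loss is exactly 0, which is why a very well classified point can still move a least-squares hyperplane but leaves the SVM solution untouched.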

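The linked note gives the full argument; as a quick sketch for the Bernoulli case (binary features x_i, with p_iy = P(x_i = 1 | y)), the log-odds of Naive Bayes is linear in x:

```latex
\log \frac{P(y{=}1 \mid x)}{P(y{=}0 \mid x)}
  = \log \frac{P(y{=}1)}{P(y{=}0)}
  + \sum_i \left[ x_i \log \frac{p_{i1}}{p_{i0}}
                + (1 - x_i) \log \frac{1 - p_{i1}}{1 - p_{i0}} \right]
  = w^\top x + b
```

where w_i = log[p_i1 (1 - p_i0) / (p_i0 (1 - p_i1))] and b absorbs the prior log-odds plus the constant terms, so the decision boundary w^T x + b = 0 is a hyperplane.
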
Large Scale Linear Methods