Example YOLO output
  https://www.youtube.com/watch?v=69Ii3HjUiTM
******************************************************************************
train with labelled data
*****************************************************************************
Previous techniques
  Talk about
    k-Nearest Neighbors
    Decision Trees
    Logistic Regression
    Support Vector Machine
    Naive Bayes
    Random Forest
    Gradient Boosting
  Effect of priors
    Bayes law
*****************************************************************************
What is an image?
  W*H*3 -> huge vector of numbers
  fully connected network with that input is too big.
  translation invariance
What is a CNN?
  convolve local operators that are applied all over the image
  output is some number of features per location
What is pooling?
  convolve local operators like max/min or sum/average
  stride reduces resolution
  goals
    - better features
    - less data
Purpose of ReLU
  nonlinear processing
Backend - ``flatten'' or stop convolving
What is the purpose of fully connected layers?
  Global interactions (2 eyes, for example).
  Could do multi-scale
Other tricks
  normalization layers
  skip connections
  fixed weights?
  what about rotation invariance?
****************************************************
1 step
  YOLO
2 step
  Propose regions
  classify regions
******************
YOLO
  Real time
  Process grid cells
  many overlapping grids
    - random patch shapes (more than rectangle) of different sizes?
    - use some feature to guide patch selection
  non-maximum suppression to get rid of multiple predictions of the same thing
  not as good as much slower algorithms
    Fast RCNN
How score?
  Intersection Over Union
    IOU > 0.5 -> positive prediction
  Average Precision - area under precision vs. recall curve
    Recall = correct positive predictions made by model / total positive instances out there
    Precision = true positives / total positive predictions made by model
  Cross validation
YOLO has trouble with multiple objects in the same grid cell.
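The scoring ideas above (Intersection Over Union, the IOU > 0.5 rule, precision, recall) and non-maximum suppression can be sketched in a few lines of plain Python. This is a minimal illustration, not YOLO's actual code: the box format (x1, y1, x2, y2), the function names, and the greedy one-to-one matching of predictions to ground truth are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only
    if it does not overlap an already-kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Count a prediction as a true positive when its best unmatched
    ground-truth box has IOU above the threshold (assumed matching rule)."""
    matched = set()
    true_positives = 0
    for p in predictions:
        best, best_iou = None, iou_threshold
        for g_idx, g in enumerate(ground_truth):
            if g_idx in matched:
                continue
            value = iou(p, g)
            if value > best_iou:
                best, best_iou = g_idx, value
        if best is not None:
            matched.add(best)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Two overlapping detections of one object plus one stray detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
truth = [(0, 0, 10, 10), (50, 50, 60, 60)]

kept = non_max_suppression(boxes, scores)
precision, recall = precision_recall([boxes[i] for i in kept], truth)
```

Here the second box overlaps the first with IOU 81/119 (about 0.68), so NMS drops it. Of the two remaining predictions one matches a ground-truth box, and one of the two ground-truth boxes is found, so precision and recall are both 0.5. Average Precision would repeat this at many score thresholds and take the area under the resulting precision vs. recall curve.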
*****************
These networks have 10s of millions of parameters/weights
Multilabel classification (multiple outputs)?
*****************
Segmentation
  Bounding box -> actual borders
  train with labelled data
  Region growing, split, and merge
    - group pixels or small regions
    - local similarity
  Edge-based methods
    - find borders
    - local differences
  Threshold methods
    - classify pixels or small regions
    - mean shift
  Watershed methods
    - start region growing at local max distance
  Clustering methods
    - k-means
    - global similarity
  encoder - decoder
    convolution
    encode        bottleneck   decode
    expand
    downsampling  process      upsampling
    add skip connections
*********************************************************
Latent spaces
*********************************************************
LSTMs
  recurrent networks
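Returning to the clustering methods listed under Segmentation, here is a rough sketch of k-means applied to pixel intensities. It groups pixels by global similarity of value alone and ignores where they sit in the image. The function name, the tiny grayscale image, and the evenly spaced initial centers are assumptions chosen to keep the example deterministic; practical code usually starts from random or k-means++ centers and clusters color or feature vectors.

```python
def kmeans_segment(image, k=2, iterations=10):
    """Label each pixel of a 2D grayscale image with its nearest cluster.

    Assumption: initial centers are evenly spaced between the minimum
    and maximum intensity so the result is reproducible.
    """
    pixels = [v for row in image for v in row]
    lo, hi = min(pixels), max(pixels)
    centers = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]

    for _ in range(iterations):
        # Assignment step: each pixel joins the cluster with the closest center.
        groups = [[] for _ in range(k)]
        for v in pixels:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its pixels
        # (an empty cluster keeps its previous center).
        centers = [
            sum(g) / len(g) if g else centers[c]
            for c, g in enumerate(groups)
        ]

    labels = [
        [min(range(k), key=lambda c: abs(v - centers[c])) for v in row]
        for row in image
    ]
    return labels, centers


# Dark region on the left, bright region on the right.
image = [
    [10, 12, 200],
    [11, 198, 205],
    [9, 201, 199],
]
labels, centers = kmeans_segment(image, k=2)
```

For this image the labels come out as [[0, 0, 1], [0, 1, 1], [0, 1, 1]], with centers at 10.5 and 200.6. Because only intensity is compared, two distant pixels of the same brightness always land in the same segment, which is the contrast with the region-growing methods that rely on local similarity.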