The key steps of visual recognition are detection, localization, and classification. Detection checks whether a particular visual entity is present in the image. Localization pinpoints the location of that entity in the image, and classification assigns it to a particular class. Let us look at the task of detecting, localizing, and classifying coins in a sample image. As coins are circular, we can use the Hough Transform to detect circles in the image.

Let us quickly go over the concept of the Hough Transform. It can be considered a model-fitting technique that elegantly recognizes simple shapes like lines, circles, and ellipses. Now, consider the figure here, where the points are edge pixels in an image. If we want to fit a line to these edge pixels, a general line equation is y = mx + c. Given a single point, there are infinitely many lines that could pass through it. So how do we narrow down to our solution? One way is to repeat the same step over all the edge pixels and then see which of these lines is satisfied by the majority of the pixels. Clearly, this particular line is our solution. Its m, which corresponds to the slope, and its c, which corresponds to the y-intercept, will have the maximum number of votes in the accumulator array.

Now, y = mx + c may not be a good representation, because the slope tends to infinity for vertical lines. Polar coordinates are a better way of representing straight lines in Hough space: r = x cos θ + y sin θ, where r is the perpendicular distance from the origin to the line and θ is the angle that this perpendicular makes with the x-axis. Now, let's see how the voting goes. Every individual point in the image space votes on these five lines. As you can see, consensus is reached only on the blue line. That is where r and θ are the same for every point, so that (r, θ) pair receives the maximum number of votes.
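To make the voting step concrete, here is a minimal sketch in Python with NumPy. It is not the lecture's own code: the function name `hough_lines`, the 1-degree and 1-pixel bin sizes, and the synthetic edge points are all assumptions chosen for illustration. Each edge point votes for every (r, θ) pair satisfying r = x cos θ + y sin θ, and the accumulator cell with the most votes is taken as the line.

```python
import numpy as np

def hough_lines(points, n_theta=180):
    """Vote in (r, theta) space for every edge point (hypothetical helper)."""
    pts = np.asarray(points, dtype=float)
    # theta sampled in 1-degree steps over [0, pi) when n_theta is 180
    thetas = np.deg2rad(np.arange(n_theta) * (180.0 / n_theta))
    # r can be negative, so the r axis runs from -r_max to +r_max in 1-pixel bins
    r_max = np.ceil(np.hypot(pts[:, 0], pts[:, 1]).max())
    n_r = 2 * int(r_max) + 1
    acc = np.zeros((n_r, n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in pts:
        # all lines through (x, y): one r value for each sampled theta
        r = x * np.cos(thetas) + y * np.sin(thetas)
        rows = np.round(r + r_max).astype(int)
        acc[rows, cols] += 1
    rs = np.arange(n_r) - r_max
    return acc, rs, thetas

# Synthetic edge pixels (made up for this sketch): a vertical line x = 5,
# which y = mx + c cannot represent, plus three stray points.
line_pts = [(5, y) for y in range(100)]
stray_pts = [(20, 3), (40, 70), (60, 10)]
acc, rs, thetas = hough_lines(line_pts + stray_pts)

r_idx, t_idx = np.unravel_index(acc.argmax(), acc.shape)
peak_r, peak_theta = rs[r_idx], thetas[t_idx]
print(peak_r, np.rad2deg(peak_theta), acc.max())  # peak at r = 5, theta = 0
```

The stray points add votes elsewhere in the accumulator but never agree on a single cell, which is why the peak still corresponds to the vertical line.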
You have to understand that the set of all lines passing through a single point in image space maps to a sinusoid in the polar Hough space. The point where the sinusoids from different edge pixels intersect is our solution. If the edge pixels lie on a straight line, our solution will be a single point. In reality, because of noise and quantization, we have to look for the solution in a small region rather than at a single point. When there is no straight line, there won't be any solution. For a square, we would see four points with the maximum number of votes, corresponding to its four straight edges.

Moving on to circles, we have three parameters: a and b, representing the center of the circle, and the radius r. If we fix the radius r, then we have just two parameters, a and b, to find. Now, for any edge pixel (x_i, y_i), with the radius fixed at r, let's see how many circles we can draw that pass through that point. The locus of the centers of all those circles is, again, a circle of radius r centered at (x_i, y_i). Now we can follow exactly the same procedure we followed for the straight line. We repeat the step of plotting these circles for every edge pixel and see which points in the (a, b) space get the maximum number of votes. Those points are the centers of our circles. As we already know the radius, we can plot the solution circles. Now, in reality, you can't fix the radius, because there could be circles of different sizes. So it's important to repeat this procedure for a range of radii. The beauty of the Hough Transform is that it is robust to partial occlusion: because it is a voting-based scheme, a circle can still collect enough votes even when part of it is hidden. Detecting a circle with the Hough Transform goes hand in hand with localization, since the winning parameters give the circle's position. Now, the next task is to classify whether the coin is a quarter or a penny. For this task, we could use the color within the detected circle.
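The circle procedure described above can be sketched the same way. This is an illustrative sketch under stated assumptions, not a reference implementation: the function name `hough_circles`, the candidate radii, the image size, and the synthetic three-quarter arc (standing in for a partially occluded coin) are all made up for the example. For each candidate radius, every edge pixel votes for all centers (a, b) that lie at that distance from it.

```python
import numpy as np

def hough_circles(points, radii, shape, n_angles=360):
    """Accumulate votes for circle centres (a, b) at each candidate radius."""
    h, w = shape
    acc = np.zeros((len(radii), h, w), dtype=int)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    for k, r in enumerate(radii):
        for x, y in points:
            # possible centres form a circle of radius r around the edge pixel
            a = np.round(x - r * np.cos(angles)).astype(int)
            b = np.round(y - r * np.sin(angles)).astype(int)
            inside = (a >= 0) & (a < w) & (b >= 0) & (b < h)
            if not inside.any():
                continue
            # each edge pixel votes at most once per accumulator cell
            cells = np.unique(np.stack([b[inside], a[inside]], axis=1), axis=0)
            acc[k, cells[:, 0], cells[:, 1]] += 1
    return acc

# Synthetic edge pixels: only three quarters of a circle with centre (40, 35)
# and radius 15, imitating a partially occluded coin.
t = np.linspace(0.0, 1.5 * np.pi, 60)
points = np.stack([np.round(40 + 15 * np.cos(t)),
                   np.round(35 + 15 * np.sin(t))], axis=1)

radii = [12, 15, 18]
acc = hough_circles(points, radii, shape=(80, 80))
k, b_hat, a_hat = np.unravel_index(acc.argmax(), acc.shape)
best_radius = radii[k]
print(best_radius, a_hat, b_hat)  # radius and centre of the strongest circle
```

Because the arc still contributes most of its votes to cells near the true center, the missing quarter lowers the peak without moving it. Pixel rounding means the peak can land a cell away from the exact center, which matches the earlier remark about searching a small region rather than a single point.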
Although this problem illustrates the concepts of detection, localization, and classification very well, the technique does not generalize to other computer vision tasks like detecting faces or cats in images. This is where we seek help from machine learning techniques. Many modern computer vision problems can be solved by using a classifier. The image classification problem is the task of assigning an input image a single label from a fixed set of categories. This is one of the core problems in computer vision and, despite its simplicity, it has a large variety of practical applications. Moreover, many other seemingly distinct computer vision tasks, such as object detection and segmentation, can be reduced to image classification. Rather than relying on a hand-written set of rules, image classification is performed through a data-driven approach. Can you think of a way to write a set of rules for identifying chairs in images that also makes sure you would not misclassify chairs as cars? Trying to specify what each of the categories of interest looks like directly in code is not feasible. Instead, we can provide the computer with many examples of each class, and then develop learning algorithms that look at these examples and learn about the visual appearance of each class.
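As a toy illustration of the data-driven idea, the sketch below learns from labeled examples instead of hand-written rules. The lecture does not name a specific algorithm at this point, so the nearest-centroid classifier here is purely an assumption picked for brevity, and the two-dimensional feature vectors are synthetic stand-ins for features that would normally be computed from real images.

```python
import numpy as np

def fit_centroids(features, labels):
    """Learn one mean feature vector per class from labelled examples."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(classes, centroids, features):
    """Assign each feature vector the label of its nearest class centroid."""
    features = np.atleast_2d(np.asarray(features, dtype=float))
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Synthetic training set (not real image data): 50 examples per class,
# drawn around two well-separated points in a 2-D feature space.
rng = np.random.default_rng(0)
train_x = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                     rng.normal(4.0, 0.5, (50, 2))])
train_y = np.array(["chair"] * 50 + ["car"] * 50)

classes, centroids = fit_centroids(train_x, train_y)
pred = predict(classes, centroids, [[0.2, -0.1], [3.8, 4.1]])
print(pred)
```

Nothing in the code describes what a chair or a car looks like. The class boundaries come entirely from the examples, so adding a new category means adding labeled data, not new rules.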