**Edge Boxes: Locating Object Proposals from Edges**

Nrupatunga

**TIP**: Please read the [paper](https://pdollar.github.io/files/papers/ZitnickDollarECCV14edgeBoxes.pdf) along with this post. Reading it section by section and referring back and forth between the paper and the post is recommended for better understanding.

![](/assets/edgebox/onwall.png width="170px", border="2")

# Overview

In this post, I describe my understanding of how the **_Edge Boxes_** algorithm works. The goal of generating object proposals is to create a relatively small set of candidate bounding boxes that cover the objects in the image.

# Introduction

Object proposal generation is one of the key steps in two-stage object detection methods. A good object proposal method should obtain high recall with a modest number of candidate bounding boxes. **_Edge Boxes_** is one such object proposal method.

!!! What makes a box an object proposal candidate? If you take the edge map of an image (Figure [edge]), the number of contours wholly contained in a bounding box is a good indication that an object is present inside that box. In Figure [box], the **_green_** bounding box contains all the contours of the object (the bird) well within the box, whereas the **_red_** box has contours that overlap its boundary (part of a contour lies inside the box while the rest lies outside), so the red box is not a good object proposal candidate.

![Figure [input]: Input image](/assets/edgebox/1/input.png width=200) ![Figure [edge]: Edge map (after NMS)](/assets/edgebox/1/edgemap.png width=200) ![Figure [box]: Box candidate](/assets/edgebox/1/box_cand.png width=200)

Simply by intuition,

********************************************************************************
* .------------------------------------------------------.
* |Objectness score = #edges inside the box             |
* |                 - #edges overlapping box boundaries |
* '------------------------------------------------------'
********************************************************************************

# Algorithm

Let's dive into the algorithm and understand some of the key concepts.

**Edge Detection**: Given an image, the first step is to perform edge detection using a random-forest-based edge detector. In particular, this method suppresses the spurious edges in textured regions, which do not constitute object boundaries. Figure [rf] shows the comparison between Sobel and random-forest-based edge detection. You can observe how the random-forest-based method eliminates many of the edges in the textured regions.

![Figure [rf]: Edge Detection comparison between Sobel and Random Forests](/assets/edgebox/sobel_random_forest.png width=600)

**Non-Maximum Suppression (NMS)**: Given the dense edge response map, we perform [NMS](https://docs.opencv.org/4.5.1/da/d22/tutorial_py_canny.html) orthogonal to the edge direction to find edge peaks, as shown below. You can intuitively think of this step as reducing the thickness of the contours.

![Before NMS](/assets/edgebox/rf_nonms.png width=200) ![After NMS](/assets/edgebox/rf_nms.png width=200)

The output of the above two steps is an edge magnitude map and an orientation map (not shown in the figure).

!!! **Note**: The edge map is filtered by keeping only edges with magnitude above 0.1, and edge values at the border pixels are set to zero (we use 8-neighbors to form edge groups, so this is convenient, and the border pixels matter little for our task).

---

**Edge Groups**: Given the edge map (after NMS), edge groups are formed by inspecting the 8-neighbors of each edge pixel.
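As a rough illustration, this grouping (detailed below: greedily follow the 8-neighbor with the smallest orientation difference until the accumulated difference exceeds $\pi / 2$) might be sketched in numpy as follows. This is a simplified sketch, not the paper's or OpenCV's implementation; the function name and interface are my own.

```python
import numpy as np

def form_edge_groups(mag, ori, thresh=np.pi / 2):
    """Greedily cluster NMS'd edge pixels into edge groups (simplified sketch).

    mag: edge magnitude map (0 = no edge); ori: orientation map in radians.
    Returns an integer label map (0 = no group).
    """
    H, W = mag.shape
    labels = np.zeros((H, W), dtype=int)
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]
    group = 0
    for y in range(H):
        for x in range(W):
            if mag[y, x] == 0 or labels[y, x]:
                continue
            group += 1                      # seed a new group at this pixel
            labels[y, x] = group
            cy, cx, cum = y, x, 0.0
            while True:
                # candidate neighbors: unassigned edge pixels around the root
                cands = [(abs(ori[cy + dy, cx + dx] - ori[cy, cx]),
                          cy + dy, cx + dx)
                         for dy, dx in nbrs
                         if 0 <= cy + dy < H and 0 <= cx + dx < W
                         and mag[cy + dy, cx + dx] > 0
                         and labels[cy + dy, cx + dx] == 0]
                if not cands:
                    break
                diff, ny, nx = min(cands)   # smallest orientation difference
                cum += diff
                if cum > thresh:
                    break                   # total orientation change exceeded pi/2
                labels[ny, nx] = group      # absorb the neighbor and continue
                cy, cx = ny, nx
    return labels
```

For example, a straight horizontal line of constant orientation ends up as a single edge group, since the accumulated orientation difference stays at zero.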
Among the 8-neighbors of each edge pixel (the root pixel), we find the pixel with the minimum orientation difference from the root and add it to the edge group; this pixel then becomes the new root, and we continue the process until the sum of orientation differences in the updated edge group exceeds a threshold of $\pi / 2$. The figure below shows the color-coded edge groups obtained from this process.

![Edge map](/assets/edgebox/edge.png width=270) ![Edge groups](/assets/edgebox/coloredge.png width=270)

The next step is to merge neighboring edge groups. Merging considers a 2x2 window only, i.e., if two edge groups are connected within a 2x2 window, they are merged to form a single edge group. In our example, the changes are minimal.

!!! **Summary**: For a given input image, obtain the NMS edge map and orientation map, then cluster the edges to form edge groups.

---

**Affinity**: For the set of edge groups $s_i \in S$ obtained in the previous step, we compute the affinity score between each pair of neighboring groups.

* The affinity score is defined as: $$a(s_i, s_j) = \lvert \cos(\theta_{i} - \theta_{ij}) \ \cos(\theta_{j} - \theta_{ij}) \rvert ^\gamma, \ \gamma = 2$$
* $\theta_{i}$ and $\theta_{j}$ are the mean orientations of $s_{i}$ and $s_{j}$.
* Let $(x_{i}, y_{i})$, $(x_{j}, y_{j})$ be the mean positions of the edge groups $s_{i}$ and $s_{j}$; then $\theta_{ij}$ is the angle between $(x_{i}, y_{i})$ and $(x_{j}, y_{j})$.

If you consider the edge group shown in Figure [edgegroup], the mean (x, y) position is (19.1, 98.1) and the mean orientation is 14 degrees. This is done for all the edge groups we found previously.

![Figure [edgegroup]: One edge group](/assets/edgebox/edgegroup.png width=400)

!!! **Note**: Mean positions/orientations are calculated as a weighted combination of positions/orientations.
The weights are given by the magnitudes of the edge map.

---

**Intuition behind the affinity score:** Please note that, for the sake of understanding, we assume the weights are all 1 in the mean position and orientation calculations.

**_Case-1_**:

![Figure [case1]: Edge groups](/assets/edgebox/case1.png width=300)

* Consider two edge groups $e_{1}$ and $e_{2}$ as shown in Figure [case1]. $(x_{1}, y_{1})$, $(x_{2}, y_{2})$ are the mean positions of $e_{1}$ and $e_{2}$. Here $y_{1} = y_{2}$ (in the Cartesian coordinate system).
* We calculate the angle between the mean positions, which is $\arctan((y_{2} - y_{1}) / (x_{2} - x_{1})) = \arctan(0) = 0$ degrees.
* Similarly, we calculate $\theta_{i}$, $\theta_{j}$, the mean angles for $e_{1}$ and $e_{2}$, which are 0 degrees and 0 degrees respectively.
* So the affinity score is $\cos(0 - 0) * \cos(0 - 0) = \cos(0) * \cos(0) = 1 * 1 = 1$.
* The affinity score is 1, so the intuition is that if two edge groups follow each other, the affinity is 1, indicating they belong together and could form one single edge group.

**_Case-2_**:

![Figure [case2]: Edge groups](/assets/edgebox/case2.png width=300)

* Consider again two edge groups $e_{1}$ and $e_{2}$ as shown in Figure [case2]. $(x_{1}, y_{1})$, $(x_{2}, y_{2})$ are the mean positions of $e_{1}$ and $e_{2}$. Here $y_{1} = y_{2}$.
* We calculate the angle between the mean positions, which is $\arctan((y_{2} - y_{1}) / (x_{2} - x_{1})) = \arctan(0) = 0$ degrees.
* Similarly, we calculate $\theta_{i}$, $\theta_{j}$, the mean angles for $e_{1}$ and $e_{2}$, which are 0 degrees and 90 degrees respectively.
* So the affinity score is $\cos(0 - 0) * \cos(90 - 0) = \cos(0) * \cos(90) = 1 * 0 = 0$.
* The affinity score is 0, so the intuition is that if two edge groups do not follow each other, the affinity is 0; they do not belong together and could not form a single edge group.

So in Figure [affinity], if you consider the two edge groups shown as $e_{1}$ and $e_{2}$, they should have a high affinity, and indeed the affinity score is 0.8351.

![Figure [affinity]: Affinity between two edge groups](/assets/edgebox/affinity.png width=300)

!!! For edge groups that are not connected within a 2x2 window, the affinity is set to zero.

---

**Bounding box scoring**: At this stage, we have the edge groups and the affinity scores between neighboring edge groups. Now, given a bounding box, we determine whether it makes a good candidate for an object proposal. We define a score for each bounding box; this score indicates how well the box contains an object as a whole, and thus whether it is a good object proposal candidate.

![Figure [bbox_score]: Example of scoring a bounding box](/assets/edgebox/scoring.png width=750)

Let us understand the scoring mechanism. In Figure [bbox_score], consider the bounding boxes $b_{1}$ and $b_{2}$ (note that edge groups smaller than 2 pixels are not highlighted, for simplicity). $b_{2}$ is derived from $b_{1}$; it is basically the central portion of $b_{1}$. The edges inside $b_{2}$ are not considered, as their contribution to the objectness is not significant.

Given the bounding box $b_{1}$, we find the edge groups that intersect the bounding box edges. We can identify these by traversing along the bounding box boundary and finding the intersecting segments. In this case, edge group $s_{1}$ intersects the left edge of the bounding box. Knowing that $s_{1}$ intersects the bounding box, we also find other edge groups that are neighbors of $s_{1}$ and have a high affinity with $s_{1}$. Based on the affinity score, $s_{3}$ is the neighboring edge group of $s_{1}$.
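As an aside, the affinity used to link these groups is just the formula from the affinity section; here is a minimal numpy sketch (the function name and interface are illustrative, not from the paper's code):

```python
import numpy as np

def affinity(mean_pos_i, theta_i, mean_pos_j, theta_j, gamma=2.0):
    """Affinity a(s_i, s_j) between two neighboring edge groups.

    mean_pos_*: mean (x, y) positions of the groups;
    theta_*: mean orientations in radians.
    """
    xi, yi = mean_pos_i
    xj, yj = mean_pos_j
    theta_ij = np.arctan2(yj - yi, xj - xi)  # angle between the mean positions
    return abs(np.cos(theta_i - theta_ij) * np.cos(theta_j - theta_ij)) ** gamma

# Case-1: two collinear horizontal groups
print(affinity((0.0, 0.0), 0.0, (5.0, 0.0), 0.0))        # → 1.0
# Case-2: one horizontal group, one vertical group
print(affinity((0.0, 0.0), 0.0, (5.0, 0.0), np.pi / 2))  # ≈ 0.0
```

The two calls reproduce Case-1 and Case-2 from the intuition section above.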
$s_{2}$ and $s_{4}$ are the neighbors of $s_{3}$. We carry out this process until we have found all such neighbors. All these neighboring groups are weighted by their affinity scores, and we define the weight $W_{s}$.

So our final score for this box = (sum of edge magnitudes inside the box $b_{1}$) - (sum of edge magnitudes inside the box $b_{2}$) - (sum of the edge magnitudes from edge groups $s_{1}$, $s_{2}$, $s_{3}$, $s_{4}$, $s_{5}$, $s_{6}$, $s_{7}$, weighted by $W_{s}$).

# Conclusion

Finally, we obtain scores for all the bounding boxes using the above scoring mechanism, sort them in decreasing order of score, and pick the top-$K$ bounding boxes. In this post, I tried to highlight the key ideas of the paper. Please check out the paper for more details.

# References

[1] [Edge Boxes: Locating Object Proposals from Edges](https://pdollar.github.io/files/papers/ZitnickDollarECCV14edgeBoxes.pdf)

[2] [OpenCV Code](https://github.com/opencv/opencv_contrib/blob/master/modules/ximgproc/src/edgeboxes.cpp)
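As a closing illustration, the box scoring described in the post can be sketched as follows. This is a simplified sketch under my own assumptions (the inner box $b_{2}$ is taken as the central half of $b_{1}$, and the straddling-group weights $W_{s}$ are passed in precomputed); it is not the paper's exact scoring function.

```python
import numpy as np

def box_score(mag, labels, box, w_s):
    """Score = (magnitudes inside b1) - (magnitudes inside the central box b2)
    - (magnitudes of boundary-straddling groups, weighted by w_s).

    mag: edge magnitude map; labels: edge-group id per pixel (0 = none);
    box: (x0, y0, x1, y1) for b1; w_s: {group_id: weight} for groups that
    straddle the box boundary (assumed precomputed from the affinities).
    """
    x0, y0, x1, y1 = box
    m = mag[y0:y1, x0:x1]
    total = m.sum()                       # edges inside b1
    h, w = m.shape                        # b2: central half of b1 (assumed)
    inner = m[h // 4:h - h // 4, w // 4:w - w // 4].sum()
    # subtract the magnitudes of straddling groups, weighted by w_s
    straddle = sum(wt * mag[labels == g].sum() for g, wt in w_s.items())
    return total - inner - straddle
```

For instance, a contour fully inside $b_{1}$ but outside $b_{2}$ raises the score, while down-weighting it via `w_s` (as for a boundary-straddling group) removes its contribution again.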