**Edge Boxes: Locating Object Proposals from Edges**
Nrupatunga
**TIP**: Please read the
[paper](https://pdollar.github.io/files/papers/ZitnickDollarECCV14edgeBoxes.pdf)
along with the post. Reading section-wise and referring back and forth
between the paper and the post is recommended for better understanding.
![](/assets/edgebox/onwall.png width="170px", border="2")
# Overview
In this post, I write my understanding of how the **_Edge Boxes_**
algorithm works.
The goal of generating object proposals is to create a relatively small
set of candidate bounding boxes that cover the objects in the image.
# Introduction
Object proposal generation is one of the key steps in two-stage object
detection methods. A good object proposal method should achieve high
recall with a modest number of candidate bounding boxes. **_Edge
Boxes_** is one such object proposal method.
!!!
What makes a box an object proposal candidate?
If you take the edge map of an image (Figure [edge]), the number of
contours that are wholly contained in a bounding box is a good
indication that an object is present inside that box.
In Figure [box], the **_green_** bounding box contains all the contours
of the object (bird) well within the box, whereas the **_red_** box has
contours that overlap its boundary (part of a contour lies inside the
box while the rest lies outside), so the red box is not a good object
proposal candidate.
![Figure [input]: Input image](/assets/edgebox/1/input.png width=200) ![Figure [edge]: Edge map (after NMS)](/assets/edgebox/1/edgemap.png width=200) ![Figure [box]: Box candidate](/assets/edgebox/1/box_cand.png width=200)
Simply by intuition,
********************************************************************************
* .------------------------------------------------------.
* |Objectness score = #edges inside the box |
* | - #edges overlapping box boundaries|
* '------------------------------------------------------'
********************************************************************************
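As a toy illustration of this intuition (not the paper's actual scoring, which is developed below), one can count the edge pixels inside a candidate box and subtract those that lie on its boundary. `toy_objectness` below is a hypothetical helper operating on a binary edge map:

```python
import numpy as np

def toy_objectness(edge_map, box):
    """Toy objectness score: edge pixels strictly inside the box minus
    edge pixels lying on the box boundary. `box` is (x0, y0, x1, y1),
    inclusive; `edge_map` is a binary 2-D array."""
    x0, y0, x1, y1 = box
    region = edge_map[y0:y1 + 1, x0:x1 + 1]
    # Edge pixels on the four border rows/columns of the box.
    boundary = (region[0, :].sum() + region[-1, :].sum()
                + region[1:-1, 0].sum() + region[1:-1, -1].sum())
    inside = region.sum() - boundary
    return int(inside - boundary)

# Hollow 3x3 square contour inside a 5x5 edge map.
edges = np.zeros((5, 5), dtype=int)
edges[1:4, 1:4] = 1
edges[2, 2] = 0
full = toy_objectness(edges, (0, 0, 4, 4))  # contour fully inside
cut = toy_objectness(edges, (0, 0, 2, 4))   # contour cut by the right edge
```

A box that wholly contains the contour scores higher than one that cuts through it, which is exactly the green-box vs. red-box situation above.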
# Algorithm
Let's deep dive into the algorithm and understand some of the key
concepts.
**Edge Detection**: Given an image, the first step is to perform edge
detection based on a random forest. In particular, this method
suppresses the spurious edges in textured regions which do not
constitute an object boundary. Figure [rf] shows the comparison between
Sobel and random-forest-based edge detection. You can observe how the
random-forest-based method eliminates many of the edges in the textured
regions.
![Figure [rf]: Edge Detection comparison between Sobel and Random
Forests](/assets/edgebox/sobel_random_forest.png width=600)
**Non-Maximum Suppression (NMS)**: Given the dense edge response map, we
perform [NMS](https://docs.opencv.org/4.5.1/da/d22/tutorial_py_canny.html)
orthogonal to the edge orientation to find edge peaks, as shown below.
You can intuitively think of this step as reducing the thickness of the
contours.
![Before NMS](/assets/edgebox/rf_nonms.png width=200) ![After NMS](/assets/edgebox/rf_nms.png width=200)
The output of the above two steps is basically an edge magnitude and
orientation map (not shown in the figure).
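A minimal sketch of this orientation-aware NMS, simplified to four quantized directions (the actual implementation interpolates magnitudes along the exact orthogonal direction):

```python
import numpy as np

def nms_edges(mag, orient):
    """Suppress non-peak responses: a pixel survives only if its
    magnitude is a local maximum along the direction orthogonal to the
    edge, quantized here to four directions. `orient` holds edge
    orientations in radians."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    steps = [(0, 1), (1, 1), (1, 0), (1, -1)]  # E, SE, S, SW neighbors
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if mag[y, x] == 0:
                continue
            # Direction orthogonal to the edge, quantized to 45 degrees.
            a = (orient[y, x] + np.pi / 2) % np.pi
            dy, dx = steps[int(round(a / (np.pi / 4))) % 4]
            if mag[y, x] >= mag[y - dy, x - dx] and mag[y, x] >= mag[y + dy, x + dx]:
                out[y, x] = mag[y, x]
    return out

# A thick horizontal edge: only the central (strongest) row survives.
mag = np.zeros((5, 5))
mag[1, :], mag[2, :], mag[3, :] = 1.0, 2.0, 1.0
thin = nms_edges(mag, np.zeros((5, 5)))
```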
!!!
**Note**: The edge map is filtered by keeping only edges with magnitude
above 0.1, and the edge values at the border pixels are set to zero
(since we use 8-neighbors to form edge groups, this is convenient, and
ignoring the borders matters little for our task).
---
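The filtering in the note amounts to a couple of array operations; a sketch:

```python
import numpy as np

def filter_edge_map(mag, thresh=0.1, border=1):
    """Keep only responses above `thresh` and zero out a band of
    `border` pixels around the image, so every remaining edge pixel
    has all 8 neighbors in-bounds for grouping."""
    out = np.where(mag > thresh, mag, 0.0)
    out[:border, :] = 0.0
    out[-border:, :] = 0.0
    out[:, :border] = 0.0
    out[:, -border:] = 0.0
    return out

mag = np.full((4, 4), 0.05)     # weak texture responses everywhere
mag[1, 1], mag[0, 0] = 0.5, 0.9  # one interior and one border edge
clean = filter_edge_map(mag)
```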
**Edge Groups**: Given the edge map (after NMS), edge groups are formed
by inspecting the 8-neighbors. Among the 8-neighbors of each edge pixel
(the root pixel), we find the pixel with the minimum orientation
difference from the root pixel and add it to the edge group; that pixel
then becomes the new root pixel, and the process continues until the sum
of the orientation differences in the edge group exceeds a threshold of
$\pi / 2$. The figure below shows the color-coded edge groups obtained
from this process.
![Edge map](/assets/edgebox/edge.png width=270) ![Edge groups](/assets/edgebox/coloredge.png width=270)
The next step is to merge neighboring edge groups. Merging considers
only a 2x2 window, i.e., if two edge groups are connected within a 2x2
window, they are merged into one single edge group. In our example, the
changes are minimal.
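The greedy traversal described above can be sketched as follows (a simplified, hypothetical implementation: one walk per group, no 2x2 merging step):

```python
import numpy as np

def form_edge_groups(mag, orient, max_curve=np.pi / 2):
    """Greedy 8-neighbor edge grouping. Starting from each unassigned
    edge pixel, repeatedly step to the 8-neighbor whose orientation is
    closest to the current pixel's, stopping once the accumulated
    orientation difference exceeds `max_curve`. Returns a label map
    (0 = no edge)."""
    h, w = mag.shape
    labels = np.zeros((h, w), dtype=int)
    next_label = 1
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for sy in range(h):
        for sx in range(w):
            if mag[sy, sx] <= 0 or labels[sy, sx]:
                continue
            y, x, acc = sy, sx, 0.0
            labels[y, x] = next_label
            while True:
                best, best_d = None, None
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and mag[ny, nx] > 0 and not labels[ny, nx]):
                        d = abs(orient[ny, nx] - orient[y, x])
                        d = min(d, np.pi - d)  # orientations wrap at pi
                        if best is None or d < best_d:
                            best, best_d = (ny, nx), d
                if best is None or acc + best_d > max_curve:
                    break
                acc += best_d
                y, x = best
                labels[y, x] = next_label
            next_label += 1
    return labels

# A straight horizontal line with constant orientation -> one group.
mag = np.zeros((3, 6))
mag[1, :] = 1.0
groups = form_edge_groups(mag, np.zeros((3, 6)))
```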
!!!
**Summary**: For a given input image, obtain the NMS edge map and
orientation map, then cluster the edges to form edge groups.
---
**Affinity**: For the set of edge groups $s_i \in S$, we obtained in the
previous step, we compute the affinity score between each pair of
neighboring groups.
* Affinity score is defined as :
$$a(s_i, s_j) = \lvert \cos(\theta_{i} - \theta_{ij}) \ \cos(\theta_{j} -
\theta_{ij}) \rvert ^\gamma, \ \gamma = 2$$
* $\theta_{i}$ and $\theta_{j}$ are the mean orientations of $s_{i}$ and $s_{j}$.
* Let $(x_{i}, y_{i})$, $(x_{j}, y_{j})$ be the mean positions of the edge groups $s_{i}$ and $s_{j}$; then $\theta_{ij}$ is the angle of the line connecting $(x_{i}, y_{i})$ and $(x_{j}, y_{j})$.
If you consider the single edge group shown in Figure [edgegroup], the
mean (x, y) position is (19.1, 98.1) and the mean orientation is 14
degrees. This is done for all the edge groups found previously.
![Figure [edgegroup]: One edge group](/assets/edgebox/edgegroup.png width=400)
!!!
**Note**: Mean positions/orientations are calculated as a weighted
combination of positions/orientations, with weights given by the
magnitudes of the edge map.
---
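Concretely, the weighted means might be computed like this (a hypothetical helper; stronger edges pull the mean toward themselves, and the orientation average is kept simple here, ignoring that orientations wrap around at $\pi$):

```python
import numpy as np

def group_mean(ys, xs, orients, mags):
    """Magnitude-weighted mean position and orientation of one edge
    group. `ys`, `xs`, `orients`, `mags` are per-pixel values of the
    group's edge pixels."""
    w = np.asarray(mags, dtype=float)
    w = w / w.sum()  # normalize weights to sum to 1
    return (float(w @ np.asarray(xs, dtype=float)),
            float(w @ np.asarray(ys, dtype=float)),
            float(w @ np.asarray(orients, dtype=float)))

# Two pixels; the second has 3x the magnitude, so it dominates the mean.
x_m, y_m, o_m = group_mean(ys=[0, 0], xs=[0, 2],
                           orients=[0.0, 1.0], mags=[1, 3])
```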
**Intuition behind affinity score:**
Please note that, for the sake of understanding, we assume the weights
are all 1 in the mean position and orientation calculations.
**_Case-1_**:
![Figure [case1]: Edge groups](/assets/edgebox/case1.png width=300)
* Consider two edge groups $e_{1}$ and $e_{2}$ as shown in Figure [case1]. $(x_{1}, y_{1})$, $(x_{2}, y_{2})$ are the mean positions of $e_{1}$ and $e_{2}$. Here $y_{1} = y_{2}$ (in the Cartesian coordinate system).
* We calculate the angle between the mean positions, which will be $\arctan((y_{2} - y_{1}) / (x_{2} - x_{1})) = \arctan(0) = 0$ degrees.
* Similarly, we calculate $\theta_{i}$ and $\theta_{j}$, the mean orientations of $e_{1}$ and $e_{2}$, which will be 0 degrees and 0 degrees respectively.
* So the affinity score will be $\lvert\cos(0 - 0) \ \cos(0 - 0)\rvert^{2} = (1 \times 1)^{2} = 1$.
* The affinity score is 1, so the intuition is: if one edge group follows on from the other (they trace a common contour), the affinity is high, indicating they belong together and could form one single edge group.
**_Case-2_**:
![Figure [case2]: Edge groups](/assets/edgebox/case2.png width=300)
* Consider again two edge groups $e_{1}$ and $e_{2}$, as shown in Figure [case2]. $(x_{1}, y_{1})$, $(x_{2}, y_{2})$ are the mean positions of $e_{1}$ and $e_{2}$. Here $y_{1} = y_{2}$.
* We calculate the angle between the mean positions, which will be $\arctan((y_{2} - y_{1}) / (x_{2} - x_{1})) = \arctan(0) = 0$ degrees.
* Similarly, we calculate $\theta_{i}$ and $\theta_{j}$, the mean orientations of $e_{1}$ and $e_{2}$, which will be 0 degrees and 90 degrees respectively.
* So the affinity score will be $\lvert\cos(0 - 0) \ \cos(90 - 0)\rvert^{2} = (1 \times 0)^{2} = 0$.
* The affinity score is 0, so the intuition is: if the two edge groups do not follow each other, the affinity is 0; they do not belong together and could not form a single edge group.
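The two cases above can be checked numerically with a direct implementation of the affinity formula (positions are (x, y); a hypothetical helper, not the paper's code):

```python
import numpy as np

def affinity(theta_i, theta_j, pos_i, pos_j, gamma=2.0):
    """Affinity between two edge groups: high when both groups' mean
    orientations line up with the line joining their mean positions."""
    theta_ij = np.arctan2(pos_j[1] - pos_i[1], pos_j[0] - pos_i[0])
    return abs(np.cos(theta_i - theta_ij) * np.cos(theta_j - theta_ij)) ** gamma

# Case 1: both groups horizontal, side by side -> affinity 1.
a1 = affinity(0.0, 0.0, (0.0, 0.0), (4.0, 0.0))
# Case 2: second group vertical -> affinity (numerically) 0.
a2 = affinity(0.0, np.pi / 2, (0.0, 0.0), (4.0, 0.0))
```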
So in Figure [affinity], if you consider the two edge groups shown in
the figure as $e_{1}$ and $e_{2}$, they should have a high affinity, and
indeed the affinity score is 0.8351.
![Figure [affinity]: Affinity between two edge groups](/assets/edgebox/affinity.png width=300)
!!!
For edge groups that are not connected within a 2x2 window, the affinity
is set to zero.
---
**Bounding box scoring**:
At this stage, we have the edge groups and the affinity scores between
neighboring edge groups. Now, given a bounding box, we decide whether it
makes a good candidate for an object proposal. We define a score for
each bounding box; this score indicates how well the box contains the
object as a whole and thus whether it will be an object proposal
candidate.
![Figure [bbox_score]: Example of scoring a bounding box](/assets/edgebox/scoring.png width=750)
Let us understand the scoring mechanism. In Figure [bbox_score],
consider the bounding boxes $b_{1}$ and $b_{2}$ (note that edge groups
smaller than 2 pixels are not highlighted, for the sake of simplicity).
$b_{2}$ is derived from $b_{1}$: it is the center portion of $b_{1}$.
The edge magnitudes inside $b_{2}$ are subtracted from the score, since
a box whose center is full of edges is unlikely to tightly enclose a
whole object.
Given the bounding box $b_{1}$, we find the edge groups that intersect
the bounding box boundary. We can easily identify them by traversing
along the boundary and finding the intersecting edge segments.
In this case, edge group $s_{1}$ intersects the left edge of the
bounding box. Knowing that $s_{1}$ intersects the bounding box, we also
find the other edge groups that are neighbors of $s_{1}$ and have a high
affinity with $s_{1}$. Based on the affinity score, $s_{3}$ is the
neighboring edge group of $s_{1}$; $s_{2}$ and $s_{4}$ are the neighbors
of $s_{3}$. We carry out this process until all such connected groups
are found. All these groups are weighted by their affinity scores, which
defines the weight $W_{s}$.
So the final score for this box = (sum of edge magnitudes inside box
$b_{1}$) - (sum of edge magnitudes inside box $b_{2}$) - (sum of edge
magnitudes from edge groups $s_{1}$, $s_{2}$, $s_{3}$, $s_{4}$,
$s_{5}$, $s_{6}$, $s_{7}$, weighted by $W_{s}$).
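Putting the pieces together, the scoring just described can be sketched as follows (a simplified, hypothetical implementation; the paper additionally normalizes by the box perimeter and derives the straddling-group weights from chains of affinities):

```python
import numpy as np

def box_score(mag, labels, straddle_w, box, inner_frac=0.5):
    """Simplified box score: edge magnitudes inside the box, minus
    those in a centered inner box (b2), minus the magnitudes of
    straddling groups scaled by their weights. `labels` maps pixels to
    group ids; `straddle_w` maps a straddling group id to its weight
    (assumed precomputed from the affinity chains)."""
    x0, y0, x1, y1 = box
    total = mag[y0:y1 + 1, x0:x1 + 1].sum()
    # Centered inner box whose sides are `inner_frac` of the outer box.
    bw, bh = x1 - x0 + 1, y1 - y0 + 1
    ix0 = x0 + int(bw * (1 - inner_frac) / 2)
    iy0 = y0 + int(bh * (1 - inner_frac) / 2)
    inner = mag[iy0:iy0 + int(bh * inner_frac),
                ix0:ix0 + int(bw * inner_frac)].sum()
    # Penalize edge groups that straddle the box boundary.
    penalty = sum(w * mag[labels == g].sum() for g, w in straddle_w.items())
    return float(total - inner - penalty)

mag = np.zeros((6, 6))
mag[1, 1], mag[1, 4] = 1.0, 1.0
labels = np.zeros((6, 6), dtype=int)
labels[1, 4] = 1  # group 1 straddles the box boundary (illustrative)
s_no_penalty = box_score(mag, labels, {}, (0, 0, 5, 5))
s_penalized = box_score(mag, labels, {1: 1.0}, (0, 0, 5, 5))
```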
# Conclusion
Finally, we obtain scores for all the bounding boxes using the above
scoring mechanism, sort them in decreasing order of score, and pick the
top-$K$ bounding boxes. In this post, I tried to highlight the key ideas
of the paper; please check out the paper for more details.
# References
[1] [Edge Boxes: Locating Object Proposals from Edges](https://pdollar.github.io/files/papers/ZitnickDollarECCV14edgeBoxes.pdf)
[2] [OpenCV Code](https://github.com/opencv/opencv_contrib/blob/master/modules/ximgproc/src/edgeboxes.cpp)