Note: this overview is still being written.
A range of work in the contour-based paradigm achieves state-of-the-art performance for several object categories using contour information; see Figure 1 for an overview. The research falls into four main categories: (i) learning codebooks of contour fragments, (ii) approximating contours by piecewise segments, (iii) using local descriptions of the contour at selected interest points, and (iv) assigning entire edges to either foreground or background. Each work also uses additional techniques, for example learned deformation models, sophisticated cost functions, or probabilistic grouping.
Common evaluation datasets:
The first is the ETHZ Shape Classes dataset from Vittorio Ferrari. It consists of five object classes and a total of 255 images. All classes contain significant intra-class variations and scale changes. The images sometimes contain multiple instances of a category and have a large amount of background clutter.
Download ETHZ Shape Classes dataset v1.2
The second is the INRIA horses dataset from Frédéric Jurie and Vittorio Ferrari, which consists of 170 images with one or more horses in side view at several scales against cluttered backgrounds, and 170 images without horses.
Download INRIA horses dataset v1.03
Learning codebooks:
Shotton et al. [1] and Opelt et al. [2] concurrently proposed constructing shape fragments tailored to specific object classes. Both match a predefined fragment codebook to the query image by chamfer matching and then find detections with a star-shaped voting model. Both methods therefore inherit the sensitivity of chamfer matching to clutter and rotation. The central idea in both approaches is to learn discriminative combinations of boundary parts as weak classifiers and to combine them by boosting into a strong detector.
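To make the matching step concrete, the following is a minimal sketch of a chamfer score between a contour fragment and the edge points of a query image. It is a brute-force illustration in plain Python, not the implementation of [1] or [2]; the function names `chamfer_distance` and `best_offset` and the point-list representation are assumptions made for this example.

```python
import math


def chamfer_distance(template_pts, edge_pts, offset=(0, 0)):
    """Mean distance from each shifted template point to its nearest edge point."""
    if not template_pts or not edge_pts:
        raise ValueError("both point sets must be non-empty")
    ox, oy = offset
    total = 0.0
    for tx, ty in template_pts:
        px, py = tx + ox, ty + oy
        # Brute-force nearest neighbour; a distance transform would be used in practice.
        total += min(math.hypot(px - ex, py - ey) for ex, ey in edge_pts)
    return total / len(template_pts)


def best_offset(template_pts, edge_pts, offsets):
    """Return the candidate translation with the lowest chamfer distance."""
    return min(offsets, key=lambda o: chamfer_distance(template_pts, edge_pts, o))


# Hypothetical toy data: a short horizontal fragment and a query edge map
# that contains the same fragment shifted by (5, 3) plus one clutter point.
fragment = [(0, 0), (1, 0), (2, 0)]
query_edges = [(5, 3), (6, 3), (7, 3), (0, 10)]
candidates = [(x, y) for x in range(8) for y in range(5)]
print(best_offset(fragment, query_edges, candidates))
```

Because every template point simply snaps to its nearest edge point, dense background edges lower the score at wrong positions, and the score is not invariant to rotation of the fragment. This is the sensitivity to clutter and rotation noted above.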
Method | Dataset | Protocol | Detection rate | Notes
Shotton [1] | Cow detection | XYZ Overlap | ??% @ FPPI 1.0 | Chamfer+Boosting fragments
Opelt [2] | Cow detection | XYZ Overlap | ??% @ FPPI 1.0 | Chamfer+Boosting fragments
Piecewise approximation:
Ferrari et al. [15, 3] group approximately straight adjacent segments (kAS) that are matched jointly to the model parts. The segments are matched within a contour segmentation network, which exploits connectedness to provide combinations of multiple simple segments. In later work they also show how to learn codebooks automatically [3] and how to learn category shape models from cropped training images [16]. In a verification step they use thin-plate-spline (TPS) based matching to localize the object boundary accurately. Similarly, Ravishankar et al. [4] use short segments to approximate the outer contour of objects. Instead of straight segments, they prefer slightly curved segments, which discriminate better between segments. They further use a sophisticated scoring function that takes local deformations in scale and orientation into account. However, they break the reference template at high-curvature points to be able to match parts, which again results in disjoint approximations of the actual contour. Their verification stage uses the gradient maps as the basis for object detection, avoiding error-prone edge detection.
Method | Dataset | Protocol | Detection rate | Notes
Ferrari [15] | ETHZ | PASCAL 50% | ??% @ FPPI 1.0 | Contour segmentation network
Ferrari [16] | ETHZ | PASCAL 50% | ??% @ FPPI 1.0 | Contour segmentation network
Ferrari [16] | Caltech | PASCAL 50% | ??% @ FPPI 1.0 | Learn models
Ravishankar [4] | ETHZ | PASCAL 50% | ??% @ FPPI 1.0 | Curved segments
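The "PASCAL 50%" protocol listed above is assumed here to be the usual PASCAL VOC bounding-box criterion: a detection counts as correct if the intersection of the detected and ground-truth boxes, divided by their union, exceeds 50%. A minimal sketch under that assumption follows; the function names and the (x_min, y_min, x_max, y_max) box layout are choices made for this example.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap rectangle, clamped at zero for disjoint boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(detection, ground_truth, threshold=0.5):
    """Assumed PASCAL-style test: overlap must strictly exceed the threshold."""
    return box_iou(detection, ground_truth) > threshold


# Hypothetical boxes: the first detection overlaps well, the second does not.
gt = (0, 0, 10, 10)
print(is_correct_detection((2, 0, 12, 10), gt))  # IoU = 80 / 120
print(is_correct_detection((5, 0, 15, 10), gt))  # IoU = 50 / 150
```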
Shape-based interest points:
This category uses descriptors to capture and match coarse descriptions of the local shape around interest points. Leordeanu et al. [17] use simple features based on normal orientations and the pairwise interactions between them to learn and detect object models in images. These features are represented as pairwise relations in category-specific models that can learn hundreds of parts. Berg et al. [14] formulate object detection as a deformable shape matching problem. However, they require hand-segmented training images and do not learn deformation models in training. Along the same line are the works of Maji and Malik [5] and Ommer and Malik [6], which match geometric blur features to training images. The former use a max-margin framework to learn discriminative weights for each feature type, maximizing discrimination during the voting stage. The latter adapt the usual Hough-style center voting: Ommer and Malik move from discrete scale voting to a continuous domain in which the scale is another unknown of the voting space. Instead of multiple discrete center vectors, they formulate the votes as lines and cluster these to find scale-coherent hypotheses. Verification is done using a fast HOG-based SVM kernel (IKSVM).
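The idea of votes as lines can be illustrated with a small sketch. Assume each matched feature has a position p in the query image and an offset d to the object center stored at reference scale 1; for an unknown scale s the vote is the point p + s·d, which traces a line in (x, y, s) space. The code below fits one common center and scale to a set of such lines by least squares. This is a simplification for illustration, not the clustering procedure of [6], and all names are hypothetical.

```python
def vote_line_point(p, d, s):
    """Center predicted by a feature at position p with model offset d at scale s."""
    return (p[0] + s * d[0], p[1] + s * d[1])


def fit_scale_coherent_center(votes):
    """Least-squares center and scale for votes given as [((px, py), (dx, dy)), ...].

    Minimises sum_i || p_i + s * d_i - c ||^2 over the center c and the scale s.
    """
    n = len(votes)
    if n < 2:
        raise ValueError("need at least two votes")
    mpx = sum(p[0] for p, _ in votes) / n
    mpy = sum(p[1] for p, _ in votes) / n
    mdx = sum(d[0] for _, d in votes) / n
    mdy = sum(d[1] for _, d in votes) / n
    num = den = 0.0
    for (px, py), (dx, dy) in votes:
        qx, qy = px - mpx, py - mpy  # centered feature position
        ex, ey = dx - mdx, dy - mdy  # centered model offset
        num += qx * ex + qy * ey
        den += ex * ex + ey * ey
    if den == 0.0:
        raise ValueError("all offsets are identical; the scale is not determined")
    s = -num / den
    return (mpx + s * mdx, mpy + s * mdy), s


# Hypothetical votes generated from a center at (50, 40) and scale 2.
votes = [((30, 40), (10, 0)), ((50, 30), (0, 5)), ((70, 50), (-10, -5))]
center, scale = fit_scale_coherent_center(votes)
print(center, scale)
```

With discrete scale voting each feature would cast one center vote per tested scale; treating scale as a continuous unknown replaces these with a single line per feature, and lines that nearly intersect indicate a hypothesis that is consistent in both position and scale.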
Method | Dataset | Protocol | Detection rate | Notes
Leordeanu [17] | ETHZ | PASCAL 50% | ??% @ FPPI 1.0 | Simple pairwise features
Berg [14] | ETHZ | PASCAL 50% | ??% @ FPPI 1.0 | Deformable shapes
Maji [5] | ETHZ | PASCAL 50% | ??% @ FPPI 1.0 | Max-margin
Ommer [6] | ETHZ | PASCAL 50% | ??% @ FPPI 1.0 | Voting lines
Figure / ground assignment:
Similar in concept but not in practice are the works of Zhu et al. [8] and Lu et al. [9]. They cast the problem as figure/ground labeling of edges and decide, for a rather small set of edges, which belong to the foreground and which are background clutter. This labeling reduces the clutter and lets the verification step focus on salient edges. Lu et al. use particle filters under static observation to simultaneously group and label the edge contours. They use a new shape descriptor based on angles to measure edge contour similarity. Zhu et al. use control points along the reference contour to find possible edge contour combinations and then solve the cost functions efficiently using linear programming. They find a maximal matching between a set of query image contours and a set of salient contour parts from the reference template, which was manually split into a set of reference segments. Both approaches assume that entire edge contours are matched to the reference sets and require long salient contours. Recent work by Bai et al. [7] is also based on a background clutter removal stage, called shapeband. Shapeband is a new type of sliding window adapted to the shape of objects. It is used to provide location hypotheses and to select edge contour candidates. However, in their runtime-intensive verification step they iteratively compute shape context descriptors [18] to select similar edge contours. Another recent approach by Gu et al. [19] proposes using regions instead of local interest points or contours to better estimate the location and scale of objects.
Method | Dataset | Protocol | Detection rate | Notes
Zhu [8] | ETHZ | Ferrari 20% | ??% @ FPPI 1.0 | Set2Set
Lu [9] | ETHZ | Ferrari 20% | ??% @ FPPI 1.0 | ParticleFilter
Bai [7] | ETHZ | PASCAL 50% | ??% @ FPPI 1.0 | Shapeband
Gu [19] | ETHZ | Ferrari 50% | ??% @ FPPI 1.0 | Regions
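The detection rates in these tables are reported at 1.0 false positives per image (FPPI). As a rough sketch of how such a number can be read off a ranked list of detections, the code below lowers the score threshold until the allowed number of false positives is exceeded and reports the fraction of annotated objects found up to that point. The input format and the function name are assumptions made for this example, and ties between scores are not treated specially.

```python
def detection_rate_at_fppi(detections, num_images, num_objects, fppi=1.0):
    """Detection rate at a given number of false positives per image.

    detections: list of (score, is_true_positive) pairs over the whole test set.
    num_objects: total number of annotated object instances.
    """
    if num_images <= 0 or num_objects <= 0:
        raise ValueError("num_images and num_objects must be positive")
    max_false_positives = fppi * num_images
    tp = fp = 0
    rate = 0.0
    # Walk through the detections from the most to the least confident.
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        if fp > max_false_positives:
            break
        rate = tp / num_objects
    return rate


# Hypothetical scored detections on a test set of 2 images with 4 objects.
scored = [(0.9, True), (0.8, False), (0.7, True),
          (0.6, False), (0.5, False), (0.4, True)]
print(detection_rate_at_fppi(scored, num_images=2, num_objects=4, fppi=1.0))
```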
Efficient Partial Contour Matching
We place our method between the aforementioned approaches. We use edge contours in the query image and match them at any length, from short contour segments up to full region boundaries, using partial shape matching. In such a setting, the similarity to the prototype shape determines the complexity and length of the considered contours.
We propose a method for object category localization by partially matching edge contours to a single shape prototype of the category. Previous work in this area either relies on piecewise contour approximations, requires meaningful supervised decompositions, or matches coarse shape-based descriptions at local interest points. Our method avoids error-prone pre-processing steps by using all obtained edges in a partial contour matching setting. The matched fragments are efficiently summarized and aggregated to form location hypotheses. The efficiency and accuracy of our edge-fragment-based voting step yield high-quality hypotheses in low computation time.
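As a toy illustration of what partial matching against a single prototype means, the sketch below describes each open contour by its sequence of turning angles, which does not change under translation or rotation, and slides a query fragment along the prototype to find the best-matching part. This is a deliberately simplified stand-in, not the descriptor or matching scheme of the papers listed below; it assumes both contours are sampled at comparable spacing, and all names are hypothetical.

```python
import math


def turning_angles(points):
    """Signed turning angle at each interior point of an open polyline."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        incoming = math.atan2(y1 - y0, x1 - x0)
        outgoing = math.atan2(y2 - y1, x2 - x1)
        # Wrap the difference into [-pi, pi).
        angles.append((outgoing - incoming + math.pi) % (2 * math.pi) - math.pi)
    return angles


def best_partial_match(fragment, prototype):
    """Return (start, cost): the prototype point index where the fragment fits best
    and the mean absolute turning-angle difference at that position."""
    f = turning_angles(fragment)
    p = turning_angles(prototype)
    if not f or len(f) > len(p):
        raise ValueError("fragment needs 3+ points and must not exceed the prototype")
    best_start, best_cost = 0, float("inf")
    for start in range(len(p) - len(f) + 1):
        window = p[start:start + len(f)]
        cost = sum(abs(a - b) for a, b in zip(f, window)) / len(f)
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost


# Hypothetical prototype and a fragment equal to prototype points 3..6,
# rotated by 90 degrees and translated by (10, 10).
prototype = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (1, 3)]
fragment = [(9, 12), (8, 12), (8, 11), (7, 11)]
print(best_partial_match(fragment, prototype))
```

A fragment of any length can be tested this way, so the same routine covers short segments as well as nearly complete boundaries; the matched positions could then feed a voting step for object location.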
Using Partial Edge Contour Matches for Efficient Object Category Localization (Paper as PDF)
Hayko Riemenschneider, Michael Donoser and Horst Bischof
Proceedings of European Conference on Computer Vision (ECCV), 2010
Efficient Partial Shape Matching of Outer Contours (Paper as PDF)
Michael Donoser, Hayko Riemenschneider and Horst Bischof
Proceedings of Asian Conference on Computer Vision (ACCV), 2009