Integrating Multiple Multi-Channel CBIR Systems
(Extended Abstract)
James C. French¹, James V. S. Watson, Xiangyu Jin, W. N. Martin
Department of Computer Science, University of Virginia, Charlottesville, VA, 01-434-982-2213
Content-based image retrieval (CBIR) uses features that can be extracted from the images themselves. In previous work we have shown that using more than one representation of the images in a collection can improve the results presented to a user without changing the underlying feature extraction or search technologies. In this paper we show that we can also merge the results of multiple CBIR systems to achieve even greater retrieval effectiveness, again without changing the underlying CBIR technology. We also present an example of this combined approach and show that it can dramatically improve retrieval effectiveness in content-based image retrieval systems.
Content-based image retrieval (CBIR) systems [1,10,16] search collections of images based on features that can be extracted from the image files themselves, without manual description or indexing by humans. Identifying such features and methods for extracting them are open areas of research. Using multiple image representations, we have been able to improve the results of existing image retrieval systems without developing any such new methods. The central strategy in our approach is to provide a diversity of representations and search strategies that produce several intermediate results, which we then merge into a more effective retrieval result. Our initial work considered a diversity of representations; the current paper extends that work to consider additional search strategies. Both techniques result in substantial improvements in retrieval effectiveness, and the combination of the two is even more effective.
Our work is analogous to the work in text IR on combination-of-evidence strategies, which dates back to the early 1990s. Two approaches have generally been used. In the first approach, a diversity of queries is used to capture an information need more precisely. The several queries can be combined before searching, or issued individually with the results of each query merged afterwards. The work of Belkin et al. [3,4] adopts this approach. In earlier work we investigated the application of query diversification in CBIR systems. The second strategy is to use a diversity of representations, that is, to create several indexes over the same corpus of documents. Typically the corpus is indexed either with the same technology under varying indexing parameters or with different technologies. Queries are processed in each setting and the results are merged afterwards. The work of Fox and Shaw and of Shaw and Fox adopts this strategy. Bartell et al. also look at combining evidence in this framework.
The first approach we adopted for extending CBIR systems to combine multiple sources of evidence was to use a single CBIR technology with multiple image representations. The approach described in this paper extends that approach to use multiple CBIR technologies. As we describe in the next section, we investigate the use of a diversity of representations and CBIR technologies to achieve retrieval effectiveness gains over conventional CBIR systems. In the remainder of the paper, we describe our approach and our experimental setup, and finally discuss our results.
¹ This work was supported in part by NASA grant NAG5-12025 and by the National Science Foundation. The views expressed here are those of the authors and do not necessarily represent those of NASA or the National Science Foundation.
2. MULTIPLE VIEWPOINT SYSTEMS
A multiple viewpoint system is one that employs more than one organizational approach across a corpus of information. The idea is to provide a user with complementary access strategies under different organizations of the same data and to encapsulate these in a common interface. In earlier work we have shown the potential of this approach in text IR systems. We have also cast CBIR in this framework, referring to alternative image representations as channels.
2.1 Single Channel CBIR
This is the conventional approach to CBIR. We are given a corpus of images. From each image we extract a set of features that typically capture color, shape, and texture information, although spatial and other information might also be used. Image features might be computed globally, or they might be associated with individual objects. After extraction, the features are generally combined into a feature vector, thereby implicitly placing the image (or image objects) in a high-dimensional feature space. In the typical query-by-example approach to retrieval, a query image is presented to the system. The query image is processed in the same way as the stored images to produce a compatible representation, the query vector. Retrieval then produces a ranked list of images in order of increasing distance from the query vector. Although details vary among individual CBIR systems, the conceptual model is the same: there is a single representation for each image, and that representation is consulted when retrieving images. Thus, we have a single channel into the image collection. Figure 1 shows this simplified conceptual model. A query Q enters the CBIR system, which in turn produces a ranked list of results R. For our purposes, the CBIR system can be regarded as a black box.
Figure 1. Conventional CBIR
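As a minimal illustration of the single-channel, query-by-example model, the Python sketch below ranks stored feature vectors by increasing distance from a query vector. It assumes Euclidean distance and toy three-component vectors purely for illustration; real CBIR systems use their own features and metrics, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_example(
    query_vector: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k stored images nearest the query vector, closest first."""
    scored = [(image_id, euclidean(query_vector, vector)) for image_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[1])  # increasing distance = decreasing similarity
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-component feature vectors (e.g. crude color summaries).
    index = {"rose": [0.9, 0.1, 0.2], "sky": [0.1, 0.2, 0.9], "tulip": [0.8, 0.2, 0.3]}
    for image_id, distance in query_by_example([0.9, 0.1, 0.2], index, k=3):
        print(image_id, round(distance, 3))
```

Whatever the internals, the output is one ranked list per query, which is all the later merging steps rely on.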
2.2 Multi-channel CBIR
We defined multi-channel CBIR in our earlier work. Conceptually it is a straightforward extension of the single-channel case. We create several different representations of the images and consult some or all of them during the retrieval process. In our approach we transform the images and index the transformed images. In that earlier work we held the CBIR system constant; our multi-channel framework does not require this, and we relax the restriction in the present work. Figure 2 shows a multi-channel CBIR configuration. The dotted line encapsulates the CBIR black box of Figure 1. Four channels are shown. Details of the channel transforms are covered below. For now it suffices to note that we have used a CBIR technology to index the original images and three transforms of those images. This results in a single set of stored images (the original images) together with four indexes comprising the different representations produced by the CBIR technology. To retrieve in this multi-channel framework, we transform the query image Q to be consistent with each representation; that is, we produce four queries, Q1, Q2, Q3, and Q4, and process each separately against the appropriate index to produce results R1 through R4. At this point we can either (a) present the top k results of every channel to the user for inspection, or (b) merge the top k results from each channel and present the user with k or more merged results. We have done both and demonstrate their value with an example later.
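The retrieval step just described (one transformed query per channel, each searched against its own index) can be sketched in Python as follows. The `Channel` record, the `search` callable standing in for the black-box CBIR index, and the toy integer "images" are all hypothetical; the sketch only shows the fan-out that produces R1 through Rn.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Channel:
    """A channel: an image transform plus a black-box search over an index
    built from images passed through the same transform (hypothetical)."""
    name: str
    transform: Callable[[Any], Any]
    search: Callable[[Any, int], List[str]]


def multi_channel_query(query_image: Any, channels: List[Channel], k: int) -> Dict[str, List[str]]:
    """Build one query per channel (Q1..Qn) and return each channel's top k (R1..Rn)."""
    results: Dict[str, List[str]] = {}
    for channel in channels:
        channel_query = channel.transform(query_image)  # match this channel's representation
        results[channel.name] = channel.search(channel_query, k)[:k]
    return results


def make_toy_search(stored: Dict[str, int], transform: Callable[[int], int]) -> Callable[[int, int], List[str]]:
    """Toy stand-in for a CBIR index: each 'image' is a single int, stored
    after transformation and ranked by absolute difference from the query."""
    indexed = {image_id: transform(value) for image_id, value in stored.items()}

    def search(query: int, k: int) -> List[str]:
        return sorted(indexed, key=lambda image_id: abs(indexed[image_id] - query))[:k]

    return search


if __name__ == "__main__":
    def identity(v: int) -> int:
        return v

    def negate(v: int) -> int:
        return 255 - v

    stored = {"a": 10, "b": 200, "c": 40}
    channels = [
        Channel("C+", identity, make_toy_search(stored, identity)),
        Channel("C-", negate, make_toy_search(stored, negate)),
    ]
    print(multi_channel_query(30, channels, k=2))
```

Option (a) would display these per-channel lists side by side; option (b) would hand them to a merge step instead.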
Note that while we asserted that our approach is analogous to the multiple representation strategy used in text IR, there is a subtle difference. In the text IR approach, the same content (i.e., bit stream) is presented to all the indexing technologies. In our approach, it is the transformed content that is indexed.
2.3 Simple 4-Channel Model
The four representations chosen for our work are shown in Figure 2; the rationale for this choice was given in our earlier work. We use the original color image (C+) together with the black-and-white image (B+) and both the color (C-) and black-and-white (B-) negatives. Our four channels are, therefore, the color positive and negative and the black-and-white positive and negative images. Color is generally considered to be a three-dimensional attribute. Here we use red (R), green (G), and blue (B) as the three dimensions, so the pixel at location (x,y) is represented by $P(x,y) = (R(x,y), G(x,y), B(x,y))$. To create multiple representations we define a gray-scale operator $g(P(x,y)) = (i, i, i)$, where $i = (R(x,y) + G(x,y) + B(x,y))/3$, and a negative operator $n(v) = 2^{b} - 1 - v$ for b-bit resolution.
Figure 2. Four-Channel CBIR
We can define our channels more precisely in terms of these transformations, with n applied componentwise to a color triple:
$$
\begin{aligned}
C^{+}(x,y) &= P(x,y)\\
C^{-}(x,y) &= \big(n(R(x,y)),\, n(G(x,y)),\, n(B(x,y))\big)\\
B^{+}(x,y) &= g(P(x,y))\\
B^{-}(x,y) &= n(g(P(x,y)))
\end{aligned}
$$
The intuition for including black-and-white channels is to provide channels in which shape and texture are not dominated by color. The multiple channels are intended to be recall enhancing, while the merge operator is precision enhancing. Note that channel C+ corresponds to conventional single-channel CBIR systems and is our performance baseline.
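The four channel definitions translate directly into per-pixel functions. The Python sketch below assumes 8-bit channel values (b = 8) and floors the gray-scale mean to an integer, a rounding choice the text does not specify; images are represented, hypothetically, as nested lists of (R, G, B) tuples.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (R, G, B) pixels


def n(v: int, bits: int = 8) -> int:
    """Negative operator n(v) = 2^b - 1 - v for b-bit values."""
    return (1 << bits) - 1 - v


def g(p: Pixel) -> Pixel:
    """Gray-scale operator g(P) = (i, i, i), i = (R + G + B) / 3 (floored here)."""
    i = sum(p) // 3
    return (i, i, i)


def negate_pixel(p: Pixel) -> Pixel:
    """Apply n componentwise to a color triple."""
    return (n(p[0]), n(p[1]), n(p[2]))


CHANNELS: Dict[str, Callable[[Pixel], Pixel]] = {
    "C+": lambda p: p,                   # original color image
    "C-": negate_pixel,                  # color negative
    "B+": g,                             # black-and-white positive
    "B-": lambda p: negate_pixel(g(p)),  # black-and-white negative
}


def transform(image: Image, channel: str) -> Image:
    """Return a new image with the channel's per-pixel transform applied."""
    f = CHANNELS[channel]
    return [[f(p) for p in row] for row in image]
```

In this setting every stored image, and every query, would pass through the same transform before indexing or searching; the CBIR system itself is left untouched.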
2.4 Combining Multiple CBIR Systems
We now consider combining the results from several k-channel CBIR systems² into a single retrieval result. Figure 3 depicts two systems, CBIR1 and CBIR2, operating on the same set of images. Each of the CBIR technologies could be single- or multi-channel. A query Q is transformed, if necessary, into Q1 and Q2. These queries are processed as shown in Figure 2, and the results, R1 and R2, can be combined or presented to the user separately.
Figure 3. Multiple k-channel CBIR
2.5 A Retrieval Example
There are several practical ways to deploy this technology to end users. The two obvious strategies are to (a) keep the channels transparent and present a conventional CBIR interface (e.g., Figure 1) to users, or (b) expose the channels completely and let users decide how to exploit the new information to improve searching. In this section we give examples of the utility of our techniques in an interactive CBIR system that embodies these strategies.
2.5.1 The Scenario
We assume that a user is querying an image database for images of roses. For the purposes of this example, we regard an image as relevant if it contains a rose in the foreground. (Our testbed has been manually annotated as to foreground and background objects.) We have at our disposal two CBIR systems, CBIR1 and CBIR2. An image of a rose is given as Q0, the initial query (see Figure 4).
2.5.2 Using Two Conventional CBIR Systems
Figure 4(a) shows the top 40 images presented by CBIR1 in response to the initial query, Q0. There is only one relevant image, and it occurs at rank 25.³ Let us use that image as the next query, Q1. Figure 4(b) shows the response from CBIR1 to query Q1. Four new relevant images are returned.⁴ Now consider the response of CBIR2 to query Q0 (Figure 4(c)). No relevant images are returned. In fact, Q0 is nearly an intractable query for CBIR2: if we were to persist in looking down its result list, we would not find a relevant image until rank 599, with the next two relevant images at ranks 667 and 749. This is not to say that CBIR2 is poor technology; it is simply unable to deal with this query effectively. Every technology will have such queries, which is a powerful argument in favor of using multiple systems in tandem. If the response from CBIR2 were all that was available, the user would likely conclude that there are no images of interest in this database. Because CBIR1 found at least one relevant image, CBIR2 can be used to follow that path. The outcome of Q1 via CBIR2 is shown in Figure 4(d). Two relevant images are found, and one of them is different from those found by CBIR1 (Figure 4(b)), for a total of 5 unique relevant images using two conventional CBIR systems.
2.5.3 Using Two 4-Channel CBIR Systems
Figure 5 demonstrates the interactive use of two 4-channel CBIR systems. We begin, as with the conventional systems, by presenting Q0 to CBIR1. The outcome is shown in Figure 5(a). Each channel is labeled, and the query itself is the top-ranked image on each channel. Three relevant images have been found and, more interestingly, they were found on the black-and-white channels. In fact, both black-and-white channels have placed a relevant image immediately after the query, and all three relevant images are in the top 10. Finally, note that the most useful channel is B-, the negative black-and-white image, which places three relevant images at ranks 2, 7, and 9, versus C+, which had only one relevant image, at rank 25 (Figure 4(a)). We can see this effect with Q1 as well. Figure 6 shows the top 40 images on the B+ channel of CBIR2 produced in response to Q1. Eight relevant images are uncovered, compared with the two images on the C+ channel shown in Figure 4(d). These examples indicate the value of a diversity of representations for the retrieval process. Again, judging from Figure 5(c), CBIR2 has apparently not benefited from the multiple channels.
² The term k-channel CBIR system includes conventional CBIR systems (k=1) as well as multi-channel systems (k>1).
³ Note that all our queries are contained in the testbed, so the query itself will always be returned as the highest-ranking response. We do not include the query image in our counts of relevant images in the response set.
⁴ The relevant images are shown with a box around them in the figures. Note that all counts of relevant images are given as unique images, so there may be more images boxed in the figures than the count indicates.
(a) Top 40 images provided by conventional, one-channel CBIR system (C+). Image at rank 25 is relevant.
(b) Top 40 images in response to query by Image-25 of part (a).
(c) Top 40 images from second conventional, one-channel CBIR system (C+). No relevant images found.
(d) Top 40 images in response to Image-25 of part (a) using second CBIR system. Two images are relevant.
Figure 4. Single-channel CBIR Retrieval Results
If we were to look further down the channels of CBIR2, we would find that the first relevant image occurs at rank 54 on the B- channel. As noted earlier, that should be compared with the C+ channel, on which the first relevant image occurs at rank 599. Again, the negative channel appears to be more useful, albeit only marginally, in this case. Now suppose we use the top-ranked relevant image on the B+ and B- channels of Figure 5(a) (Image-2) as the next query, say Q2, to each of the 4-channel CBIR systems. The results are shown in Figures 5(b) and 5(d). The effect is profound. CBIR1 produces 12 relevant images. CBIR2 also contributes 12 relevant images, of which seven are different from those of CBIR1, for a total of 19 relevant images, compared with five relevant images found by the conventional approach. Figure 7 shows two possible outcomes when channels are merged. Figure 7(a) shows the top 40 images resulting from a combSUM merge of all four channels, while Figure 7(b) shows the top 40 images resulting from a combSUM merge of the two black-and-white channels. The four-channel merge has 3 relevant images, while the two-channel merge has eight. Figure 8 shows the top 10 images resulting from applying two different merge algorithms to the four channels of Figure 5(b). Both produce seven relevant images in the top 10, for a precision of 70%. There are two conclusions to take away from this example. First, using more than one technology can help overcome pathological behavior, because it decreases the chance that both technologies will fail on the same query. Figure 4(d) and Figure 5(d) make this point very clearly. Second, multiple channels can boost retrieval effectiveness. More particularly, the inclusion of black-and-white channels can provide extra resilience against a strong color bias in the CBIR technology.
2.5.4 Discussion
This section has made a circumstantial case for the utility of multi-system, multi-channel CBIR approaches, based on a qualitative assessment of a retrieval scenario.
These strategies could be used in a variety of search interfaces to provide additional functionality to existing systems. In the remainder of the paper we discuss a set of experiments that establish the superiority of these techniques over the conventional approaches.
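The counts in this example reduce to two simple computations: precision over the top k results (seven relevant images in the top 10 gives 7/10 = 70%) and the number of unique relevant images pooled across several result lists. A minimal Python sketch follows; the image identifiers are hypothetical, and, following the convention of not counting the query image, the caller is assumed to have removed it from the relevant set.

```python
from typing import Iterable, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    hits = sum(1 for image_id in ranked[:k] if image_id in relevant)
    return hits / k


def unique_relevant(result_lists: Iterable[Sequence[str]], relevant: Set[str]) -> Set[str]:
    """Relevant images found anywhere in the given lists, each counted once."""
    found: Set[str] = set()
    for ranked in result_lists:
        found.update(image_id for image_id in ranked if image_id in relevant)
    return found


if __name__ == "__main__":
    relevant = {f"r{i}" for i in range(1, 8)}  # hypothetical relevant set
    top10 = ["r1", "x1", "r2", "r3", "x2", "r4", "r5", "x3", "r6", "r7"]
    print(precision_at_k(top10, relevant, 10))  # 0.7
    cbir1 = ["r1", "x1", "r2"]
    cbir2 = ["r2", "r3", "x2"]
    print(len(unique_relevant([cbir1, cbir2], relevant)))  # 3
```

Pooling by unique identifier is what makes two lists of 12 relevant images each add up to fewer than 24 when they overlap.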
(a) Top 10 images in four-channel configuration of first CBIR system. Three images are relevant.
(b) Top 10 images on each channel in response to Image-2 (B+ and B-). Eleven images are relevant.
(c) Top 10 images in four-channel configuration of second CBIR system. No images are relevant.
(d) Top 10 images on each channel of second CBIR system in response to Image-2 (B+ and B-) of part (a). Twelve images are relevant, nine are different from part (b). Figure 5. Four-channel CBIR Retrieval Results
Figure 6. Top 40 images on B+ channel of second CBIR system. Eight images are relevant.
(a) Top 40 images from merge of all four channels of first CBIR system. Three images are relevant.
(b) Top 40 images from merge of black and white channels (B+ and B-) of first CBIR system. Seven images are relevant. Figure 7. Merged Results from Multi-channel CBIR System
Figure 8. Top 10 results from two merge algorithms operating on 4 channels of Figure 5(b).
3. EXPERIMENTAL SETUP
3.1 Testbed
3.1.1 CBIR Technologies
We used two different CBIR technologies in 4-channel configurations for the experiments reported here. Because we employ the technologies as black boxes, their internal details are unimportant. It suffices to know that they are state-of-the-art and different. (If space permits we can add details in a full paper.)
3.1.2 Test Data
Our test data consisted of 3,400 images drawn from 34 categories of the COREL image collection. Each category contains 100 images. The categories were chosen because each of their images has a salient foreground object.
3.1.3 Ground Truth
Each of the images in our testbed was labeled as to foreground and background objects. The image labeling is described elsewhere.
3.1.4 Indexing the Images
We created four indexes, one for each channel in our testbed. The images were transformed into the representation of the channel and then indexed by our CBIR systems. Thus, we have a single corpus of images over which we have four separate indexes. This was done for both CBIR technologies.
Each image in our test collection was used as a test query in each channel of our multi-system, multi-channel testbed. Since each image is annotated with labels denoting foreground and background objects, we had de facto relevance assessments. For the results reported here, we declared an image to be relevant to a query image if it had any foreground label in common with the query image. We used trec_eval to generate the performance results. Our multi-channel merging results were produced using the combSUM [5,11] approach; that is, for each image we summed its similarity values across the channels whose response sets included it. (The conditions set out by Vogt for linearly combining relevance scores apply here: our channels will be seen to have reasonable performance, and they do not rank relevant documents similarly.) We made no attempt to optimize the merging algorithm, although intuition suggests that a weighted sum would be more appropriate. Merging algorithm M1 of Figure 8 is a visual example of this algorithm. We report the multi-system merge results using both combSUM and a rank merging algorithm that depends only on the ordinal ranks of the images. This algorithm is appropriate when the CBIR systems to be combined have incompatible similarity values. Merging algorithm M2 of Figure 8 is a visual example of this algorithm. Finally, we also report the perfect merge, as if the results were merged by an oracle. In this approach, we sorted the merged list by known relevance; that is, we assumed an oracle would place the r relevant images in positions 1 through r and the non-relevant images in positions r+1, r+2, and so on. This merge represents the maximum performance achievable by any merge algorithm and gives us an operational upper bound on performance. All queries retrieved the top 100 images, and only those were used in subsequent merging.
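The three merge strategies can be sketched in Python as follows. `comb_sum` follows the description above, summing each image's similarity values over the lists in which it appears. The paper does not specify its rank merging algorithm, so `rank_merge` substitutes a Borda-style count (an image at 0-based rank r in a list of length N earns N - r points) purely as an illustration of a merge that uses only ordinal ranks. `perfect_merge` moves the relevant images to the front while otherwise preserving order. The `depth` parameter mirrors the top-100 cutoff; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Scored = Sequence[Tuple[str, float]]  # (image_id, similarity), best first


def comb_sum(result_lists: Sequence[Scored], depth: int = 100) -> List[str]:
    """combSUM: sum similarities across the lists in which each image appears."""
    totals: Dict[str, float] = defaultdict(float)
    for ranked in result_lists:
        for image_id, similarity in ranked[:depth]:
            totals[image_id] += similarity
    return sorted(totals, key=lambda image_id: totals[image_id], reverse=True)


def rank_merge(ranked_lists: Sequence[Sequence[str]], depth: int = 100) -> List[str]:
    """Borda-style stand-in for a rank-only merge (not necessarily the authors' algorithm)."""
    points: Dict[str, int] = defaultdict(int)
    for ranked in ranked_lists:
        top = ranked[:depth]
        for rank, image_id in enumerate(top):  # rank 0 is the best position
            points[image_id] += len(top) - rank
    return sorted(points, key=lambda image_id: points[image_id], reverse=True)


def perfect_merge(merged: Sequence[str], relevant: Set[str]) -> List[str]:
    """Oracle merge: relevant images first (positions 1..r), then the rest, order otherwise kept."""
    return sorted(merged, key=lambda image_id: image_id not in relevant)
```

Because combSUM adds raw similarity values, it presumes those values are comparable across lists; a rank-only merge avoids that assumption, which is why it suits systems with incompatible similarity scales.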
Finally, in our discussion of results, we adopt Sparck Jones' standard for assessing significance [12, page 397]: an effect is noticeable if the performance increase is 5-10%, and material if it is greater than 10%.
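Table 1 reports non-interpolated average precision, which the paper computed with trec_eval. A minimal Python sketch of the per-query measure follows. By default it averages the precision values observed at each retrieved relevant image, one reading of the paper's footnote definition; passing `num_relevant` instead divides by the total number of relevant images, as trec_eval's MAP measure does, so that unretrieved relevant images count as zero. The function names are hypothetical.

```python
from typing import Optional, Sequence, Set


def average_precision(
    ranked: Sequence[str],
    relevant: Set[str],
    num_relevant: Optional[int] = None,
) -> float:
    """Non-interpolated average precision for a single query.

    Precision is recorded at the rank of each relevant image retrieved. With
    num_relevant=None these values are averaged over the relevant images
    retrieved; otherwise the sum is divided by num_relevant (trec_eval style).
    Returns 0.0 when no relevant image is retrieved.
    """
    precisions = []
    hits = 0
    for rank, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        return 0.0
    denominator = num_relevant if num_relevant is not None else len(precisions)
    return sum(precisions) / denominator


def mean_average_precision(per_query: Sequence[float]) -> float:
    """System score: the mean of the per-query average precision values."""
    return sum(per_query) / len(per_query) if per_query else 0.0
```

For a list ["r1", "x", "r2"] with both r1 and r2 relevant, the precisions are 1 and 2/3, giving an average precision of 5/6.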
Table 1. Average Precision (Non-interpolated)
Configuration                   CBIR-1    CBIR-2
Conventional System (C+)        0.1049    0.1089
4 Channel combSUM Merge         0.1277    0.1323
4 Channel Rank Merge            n/a       0.1306
4 Channel Perfect Merge         0.3042    0.3049
Merged (CBIR-1 and CBIR-2)
2 Systems Rank Merge            0.1695
2 Systems combSUM Merge         0.1748
2 Systems Perfect Merge         0.4069
The results of our experiments are summarized in Table 1. For brevity we report only the non-interpolated average precision.⁵ Due to space limitations we have had to summarize our results considerably. We ran our experiments against both of our CBIR technologies, denoted generically as CBIR1 and CBIR2, using the technologies both conventionally and in 4-channel configurations. When run conventionally, the two CBIR technologies perform essentially the same (a 4% difference). Running the technologies in a 4-channel configuration boosts the performance of both by 21-22%, so again their individual performance is about the same. We also applied the rank merge algorithm to the channels of one of the technologies (CBIR2) and found that it provides essentially the same performance boost. We show the performance achievable by an oracle, the perfect merge, to give an idea of the operational upper bound on the performance of each technology in a 4-channel configuration. As can be seen, there is considerable potential for improving the merging algorithm, although it is not at all clear how much of that potential is practically realizable. Nevertheless, the performance boost obtained is substantial. When we merge the two technologies, both merging approaches achieve excellent performance gains. The combSUM merge performs about 9% better than the rank merge and is 37% (32%) better than the 4-channel performance of CBIR1 (CBIR2). The total performance improvement of both techniques combined is an impressive 67% (61%) over the conventional baseline performance of CBIR1 (CBIR2). Again, we show the perfect merge to establish an operational upper bound on the performance of the aggregate system. It is clear from these results that the two techniques, multi-channel CBIR and merging multiple technologies, can provide dramatic performance gains.
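The percentages quoted above are relative gains computed from Table 1, and Sparck Jones' thresholds can be applied to them mechanically. The Python sketch below, with hypothetical helper names, reproduces several of these figures; for instance, the 4-channel configuration of CBIR1 gives (0.1277 - 0.1049) / 0.1049, roughly 21.7%.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


def sparck_jones_label(gain_percent: float) -> str:
    """Sparck Jones' rule of thumb: 5-10% is noticeable, above 10% is material."""
    if gain_percent > 10.0:
        return "material"
    if gain_percent >= 5.0:
        return "noticeable"
    return "not noticeable"


# Average precision values taken from Table 1.
CONVENTIONAL = {"CBIR1": 0.1049, "CBIR2": 0.1089}
FOUR_CHANNEL_COMBSUM = {"CBIR1": 0.1277, "CBIR2": 0.1323}
TWO_SYSTEM_COMBSUM = 0.1748

if __name__ == "__main__":
    for system in ("CBIR1", "CBIR2"):
        channel_gain = relative_gain(FOUR_CHANNEL_COMBSUM[system], CONVENTIONAL[system])
        total_gain = relative_gain(TWO_SYSTEM_COMBSUM, CONVENTIONAL[system])
        # CBIR1: about 22% and 67%; CBIR2: about 21% and 61%.
        print(system, round(channel_gain), sparck_jones_label(channel_gain),
              round(total_gain), sparck_jones_label(total_gain))
```

All of these gains fall well into the "material" band.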
Although space limitations allow us to report only a small portion of our results here, we have run extensive additional experiments considering different list lengths, different ground truth, different channel combinations, and so on. All the evidence suggests that we can, in fact, combine the channels, even naively, to realize retrieval effectiveness gains over the conventional single-channel CBIR approach. Moreover, we can also effectively merge the results of separate CBIR technologies to achieve even greater performance gains.
We have described a simple approach for improving the retrieval effectiveness of conventional CBIR systems. Our approach treats the CBIR technology as a black box that can be used to provide different channels of retrieval results for subsequent merging or for use in interactive retrieval interfaces. The channels are implemented as additional indexes over simple image transforms. Our approach offers a simple, cost-effective strategy for boosting the performance of CBIR systems. We have also demonstrated the utility of merging the responses from multiple CBIR systems. The combination of techniques resulted in a 61-67% overall gain in retrieval effectiveness over the baseline conventional CBIR systems. The techniques are exogenous; as such, we are able to treat the CBIR technology generically, manipulating only the inputs and merging the outputs to achieve improved retrieval performance.
⁵ For each query, the precision is computed after each relevant image is retrieved. If no relevant images are retrieved, the precision is zero. All the precision values for a query are averaged to yield the query performance, and all the query values are averaged to yield the system performance.
[1] Aslandogan, Y. and Yu, C. Techniques and Systems for Image and Video Retrieval. IEEE Transactions on Knowledge and Data Engineering, 11(1), 1999, 56-63.
[2] Bartell, B., Cottrell, G., and Belew, R. Automatic Combination of Multiple Ranked Retrieval Systems. In Proceedings of ACM SIGIR (Dublin, Ireland, July 1994), 173-181.
[3] Belkin, N., Cool, C., Croft, W., and Callan, J. The Effect of Multiple Query Representations on Information Retrieval System Performance. In Proceedings of ACM SIGIR (Pittsburgh, PA, 1993), 339-346.
[4] Belkin, N., Kantor, P., Cool, C., and Quatrain, R. Combining Evidence for Information Retrieval. In Proceedings of TREC-2 (Gaithersburg, MD, March 1994), 35-44.
[5] Fox, E. and Shaw, J. Combination of Multiple Searches. In Proceedings of TREC-2 (Gaithersburg, MD, March 1994), 243-252.
[6] French, J. C., Chapin, A. C., and Martin, W. N. Multiple Viewpoints as an Approach to Digital Library Interfaces. Workshop on Document Search Interface Design and Intelligent Access in Large-scale Collections (Portland, OR, July 2002).
[7] French, J. C., Martin, W. N., and Watson, J. V. S. A Qualitative Examination of Content-Based Image Retrieval Behavior using Systematically Modified Test Images. 45th IEEE International Midwest Symposium on Circuits and Systems (Tulsa, OK, August 4-7, 2002).
[8] French, J. C., Martin, W. N., Watson, J. V. S., and Jin, X. Using Multiple Image Representations to Improve the Quality of Content-Based Image Retrieval. Technical Report CS-2003-10, Dept. of Computer Science, University of Virginia, March 2003.
[9] Powell, A. L. and French, J. C. Using Multiple Views of a Document Collection in Information Exploration. CHI'98 Workshop on Innovation and Evaluation in Information Exploration Interfaces (Los Angeles, CA, April 19, 1998).
[10] Rui, Y., Huang, T., and Chang, S.-F. Image Retrieval: Past, Present, and Future. In Proceedings of the International Symposium on Multimedia Information Processing (Taiwan, Dec. 1997).
[11] Shaw, J. and Fox, E. Combination of Multiple Searches. In Proceedings of TREC-3 (April 1995), 105-108.
[12] Sparck Jones, K. Automatic Indexing. Journal of Documentation, 30(4), 1974, 393-432.
[13] Sun, J., Sun, Z., Zhou, R., and Wang, H. A Semantic-based Image Retrieval System: VisEngine. IEEE 1st International Conference on Machine Learning and Cybernetics (Beijing, China, November 2002), 349-353.
[14] Vogt, C. When Does It Make Sense to Linearly Combine Relevance Scores? Poster in Proceedings of ACM SIGIR (Philadelphia, PA, 1997).
[15] Wenyin, L., Dumais, S., Sun, Y., Zhang, H., Czerwinski, M., and Field, B. Semi-Automatic Image Annotation. In Proceedings of Human-Computer Interaction (INTERACT 2001), 326-333.
[16] Yoshitaka, A. and Ichikawa, T. A Survey on Content-Based Retrieval for Multimedia Databases. IEEE Transactions on Knowledge and Data Engineering, 11(1), 1999, 81-93.