Bimonthly    Since 1986
ISSN 1004-9037
Indexed in:
SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
  • Table of Contents
      05 May 2011, Volume 26 Issue 3   
    Special Section on Advanced Computing Technology in China
    Preface
    Xiao-Dong Zhang
    Journal of Data Acquisition and Processing, 2011, 26 (3): 343-343. 
    Supercomputers have been ranked competitively, year by year, by computing speed, memory and storage capacity, and parallel processing scalability. Since supercomputers were introduced in the 1960s, the field has been dominated mainly by US companies, such as Cray, IBM and Hewlett-Packard, and by Japanese companies, such as NEC, Fujitsu, and Hitachi.
    This leadership pattern changed in 2004, when China's Dawning 4000A joined the top 10 of the Top 500 Supercomputer List. This machine was developed by the Institute of Computing Technology, Chinese Academy of Sciences. Since then, China has become the second country, besides the United States, capable of developing and producing ultra-high-performance supercomputers. In November 2010, the Tianhe-1A supercomputer, developed by the National University of Defense Technology in China, became the fastest in the world. Another Chinese supercomputer, Dawning Nebulae, was ranked second in the Top 500 List in June 2010.
    I am very pleased to introduce three papers representing the advanced computing technology of China in this special section. The first paper, entitled "The TianHe-1A Supercomputer: Its Hardware and Software", is written by the R&D team from the National University of Defense Technology. The authors provide technical insights into the design and realization of the TianHe-1A supercomputer, the fastest in the world as of November 2010. The second paper, entitled "Dawning Nebulae: A PetaFLOPS Supercomputer with a Heterogeneous Structure", is written by the R&D team from the Institute of Computing Technology, Chinese Academy of Sciences. The authors describe how they achieved ultra-high performance with a heterogeneous structure in a large-scale cluster.
    In addition to developing advanced supercomputing technology, Chinese researchers have made a 10-year independent effort to develop CPU chips. This core technology has been mainly owned by major US companies, such as Intel, AMD, and NVIDIA. The third paper, entitled "The Godson Processors: Its Research, Development, and Contributions", is written by several architects of the Godson processor chips. They give a roadmap of the Godson project, including its history, the advancement of the technology, and its unique technical merits.
    All the authors of the three papers are young researchers, representing the technical competence and strong capability of the new generation of computer scientists and engineers in China. I would like to thank the JCST editorial management team for their help and professional service in publishing this special section in a timely manner.
    The TianHe-1A Supercomputer: Its Hardware and Software
    Xue-Jun Yang (杨学军), Senior Member, CCF, Member, ACM, IEEE, Xiang-Ke Liao (廖湘科), Senior Member CCF, Member, ACM, Kai Lu
    Journal of Data Acquisition and Processing, 2011, 26 (3): 344-351. 
    This paper presents an overview of the TianHe-1A (TH-1A) supercomputer, which was built by the National University of Defense Technology of China (NUDT). TH-1A adopts a hybrid architecture that integrates CPUs and GPUs, and its interconnect is a proprietary high-speed communication network. The theoretical peak performance of TH-1A is 4700 TFlops, and its LINPACK test result is 2566 TFlops. It was ranked No. 1 on the TOP500 List released in November 2010. TH-1A is now deployed at the National Supercomputer Center in Tianjin and provides high performance computing services. TH-1A has played an important role in many applications, such as oil exploration, weather forecasting, and biomedical research.
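    The two figures quoted in this abstract imply a sustained-to-peak ratio, which can be checked with a few lines of arithmetic. The function name below is illustrative only and does not come from the paper.

```python
# Figures quoted in the abstract above, in TFlops.
PEAK_TFLOPS = 4700.0
LINPACK_TFLOPS = 2566.0


def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Return the sustained-to-peak ratio as a fraction."""
    if rpeak <= 0:
        raise ValueError("rpeak must be positive")
    return rmax / rpeak


efficiency = linpack_efficiency(LINPACK_TFLOPS, PEAK_TFLOPS)
print(f"LINPACK efficiency: {efficiency:.1%}")
```

    With these inputs the ratio is roughly 55% of theoretical peak.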
    Dawning Nebulae: A PetaFLOPS Supercomputer with a Heterogeneous Structure
    Ning-Hui Sun (孙凝辉), Member, CCF, IEEE, Jing Xing (邢晶), Zhi-Gang Huo (霍志刚), Member, CCF, ACM, Guang-Ming Tan
    Journal of Data Acquisition and Processing, 2011, 26 (3): 352-362. 
    Dawning Nebulae is a heterogeneous system composed of 9280 multi-core x86 CPUs and 4640 NVIDIA Fermi GPUs. With a Linpack performance of 1.271 petaFLOPS, it was ranked second in the TOP500 List released in June 2010. In this paper, key issues in the system design of Dawning Nebulae are introduced. System tuning methodologies aimed at a petaFLOPS Linpack result are presented, including algorithmic optimization and communication improvement. The design of its file I/O subsystem, including HVFS and the underlying DCFS3, is also described. Performance evaluations show that the Linpack efficiency of each node reaches 69.89%, and that 1024-node aggregate read and write bandwidths exceed 100 GB/s and 70 GB/s respectively. The success of Dawning Nebulae has demonstrated the viability of the CPU/GPU heterogeneous structure for future supercomputer designs.
    The Godson Processors: Its Research, Development, and Contributions
    Wei-Wu Hu (胡伟武), Senior Member, CCF, Yan-Ping Gao (高燕萍), Member, CCF, Tian-Shi Chen (陈天石), and Jun-Hua Xiao (肖俊华), Member, CCF
    Journal of Data Acquisition and Processing, 2011, 26 (3): 363-372. 
    The Godson project with an R&D history of 10 years is an independent national program of China that aims at developing advanced microprocessor technologies based on fundamental research and commercialization of the chip technology. We will give a comprehensive presentation of the Godson project, including its history, technical roadmaps, and several unique technical merits.
    Special Section on High-Performance Computing for Embedded Multi-Core Systems
    Preface
    Min-Yi Guo, Zi-Li Shao, Edwin Hsing-Mean Sha
    Journal of Data Acquisition and Processing, 2011, 26 (3): 373-374. 
    Multi-core processors bring a major technical innovation in computing hardware. The leap from single-core to multi-core technology has permanently altered the concept of computing. Embedded devices with multi-core technology will rapidly spread throughout the world, and the shift from single-core to multi-core poses many challenges. This special section aims to address some of these challenges by including three invited papers and six regular papers.
    Unified UDispatch: A User Dispatching Tool for Multicore Systems
    Tang-Hsun Tu and Chih-Wen Hsueh, Member, IEEE
    Journal of Data Acquisition and Processing, 2011, 26 (3): 375-391. 
    In a multicore environment, multithreading is often used to improve application performance. However, even in many simple applications, performance might degrade when the number of threads increases. Users usually attribute this phenomenon to the overhead of creating or terminating threads. In our observation, how the threads are dispatched to the multiple cores might have a more significant effect. We formally defined the problems of using threads as multithreading anomalies, and presented a novel user dispatching mechanism (UDispatch) which provides controllability in user space to improve application performance. By modifying the application source code with the UDispatch application programming interface (API), application performance can be improved significantly. However, since the application source code might not be available, or might be too complicated to modify, we provided an extension, called UDispatch+, to dispatch threads without any modification of the application source code. In this paper, UDispatch and UDispatch+ are integrated and wrapped for more portability and introduced as a tool called Unified UDispatch (UUD), with more detailed experiments and description. It can dispatch the application threads to specific cores at the discretion of users, with up to 171.8% performance improvement on a 4-core machine.
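    The general idea of dispatching threads to specific cores from user space can be sketched with standard operating system affinity calls. This is not the UDispatch API, which the abstract does not describe; `plan_dispatch`, `pin_current_thread`, and `run` are hypothetical names, the round-robin policy is an assumption, and `os.sched_setaffinity` is only available on some platforms (such as Linux), so the sketch falls back to doing nothing elsewhere.

```python
import os
import threading


def plan_dispatch(num_threads, cores):
    """Map thread index -> core id, round-robin over the given cores."""
    cores = list(cores)
    if not cores:
        raise ValueError("need at least one core")
    return {t: cores[t % len(cores)] for t in range(num_threads)}


def pin_current_thread(core):
    """Best-effort pin of the calling thread; return True on success."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # affinity control is not exposed on this platform
    try:
        os.sched_setaffinity(0, {core})  # 0 means the calling thread on Linux
        return True
    except OSError:
        return False


def worker(index, core, results):
    """Pin this thread to its planned core and record the outcome."""
    results[index] = (core, pin_current_thread(core))


def run(num_threads):
    """Start num_threads workers, each pinned according to the plan."""
    if hasattr(os, "sched_getaffinity"):
        cores = sorted(os.sched_getaffinity(0))
    else:
        cores = list(range(os.cpu_count() or 1))
    plan = plan_dispatch(num_threads, cores)
    results = {}
    threads = [
        threading.Thread(target=worker, args=(i, plan[i], results))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return plan, results


if __name__ == "__main__":
    plan, results = run(4)
    for index in sorted(results):
        core, pinned = results[index]
        print(f"thread {index} -> core {core} (pinned: {pinned})")
```

    Separating the plan from the pinning step keeps the policy (which thread goes where) testable without touching the scheduler.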
    Data Transmission with the Battery Utilization Maximization
    Che-Wei Chang (张哲维), Tie-Fei Zhang (章铁飞), Chuan-Yue Yang (杨川岳), Ying-Jheng Chen (陈应正), Shih-Hao Hung (洪士灏), Member, IEEE, Tei-Wei Kuo (郭大维), Fellow, IEEE, and Tian-
    Journal of Data Acquisition and Processing, 2011, 26 (3): 392-404. 
    With the growing popularity of 3G-powered devices, there are growing demands for energy-efficient data transmission strategies for various embedded systems. Different from past work on energy-efficient real-time task scheduling, we explore strategies to maximize the amount of data transmitted by a 3G module under a given battery capacity. In particular, we present algorithms under different workload configurations, with and without timing constraint considerations. Experiments were then conducted to verify the validity of the strategies and to develop insights into energy-efficient data transmission.
    Leakage-Aware Modulo Scheduling for Embedded VLIW Processors
    Yong Guan (关永) and Jingling Xue (薛京灵), Senior Member, IEEE, Member, ACM
    Journal of Data Acquisition and Processing, 2011, 26 (3): 405-417. 
    As semiconductor technologies move down to the nanometer scale, leakage power has become a significant component of the total power consumption. In this paper, we present a leakage-aware modulo scheduling algorithm to achieve leakage energy saving for applications with loops on Very Long Instruction Word (VLIW) architectures. The proposed algorithm is designed to maximize the idleness of function units integrated with dual-threshold domino logic, and to reduce the number of transitions between the active and sleep modes. We have implemented our technique in the Trimaran compiler and conducted experiments using a set of embedded benchmarks from DSPstone and MiBench on the cycle-accurate VLIW simulator of Trimaran. The results show that our technique achieves significant leakage energy saving compared with a previously published DAG-based (Directed Acyclic Graph) leakage-aware scheduling algorithm.
    Energy Efficient Block-Partitioned Multicore Processors for Parallel Applications
    Xuan Qi (祁轩) and Da-Kai Zhu (朱大开), Member, IEEE
    Journal of Data Acquisition and Processing, 2011, 26 (3): 418-433. 
    Due to the increasing power consumption of modern computing systems, energy management has become an important research area in the last decade. Recently, multicore has emerged as an energy-efficient architecture that exploits the parallelism in modern applications. However, as the number of cores on a single chip continues to increase, effectively managing the energy efficiency of multicore-based systems has become a grand challenge. In this paper, based on the voltage island and dynamic voltage and frequency scaling (DVFS) techniques, we investigate the energy efficiency of block-partitioned multicore processors, where cores are grouped into blocks and the cores on one block share a DVFS-enabled power supply. Depending on the number of cores on each block, we study both symmetric and asymmetric block configurations. We develop a system-level power model (which can support various power management techniques) and derive both block-wide and system-wide energy-efficient frequencies for systems with block-partitioned multicore processors. Based on the power model, we prove that, for embarrassingly parallel applications, having all cores on a single block can achieve the same energy savings as the individual block configuration (where each core forms a single block and has its own power supply). However, for applications with limited degrees of parallelism, we show the superiority of the buddy-asymmetric block configuration, where the number of required blocks (and power supplies) is logarithmically related to the number of cores on the chip, in that it can achieve the same amount of energy savings as the individual block configuration. The energy efficiency of different block configurations is further evaluated through extensive simulations with both synthetic applications and a real-life application.
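    Why an energy-efficient frequency exists at all can be illustrated with a common textbook power model. The sketch below is not the paper's system-level power model, which the abstract does not spell out; it assumes power = `p_static + c_eff * f**exponent` and a fixed amount of work in cycles, and all names and constants are hypothetical. Under that assumption, energy is `work * (p_static / f + c_eff * f**(exponent - 1))`, and setting its derivative to zero gives `f**exponent = p_static / ((exponent - 1) * c_eff)`.

```python
def energy(freq, work_cycles, p_static, c_eff, exponent=3.0):
    """Energy to finish work_cycles at frequency freq under the assumed model."""
    if freq <= 0:
        raise ValueError("freq must be positive")
    run_time = work_cycles / freq
    power = p_static + c_eff * freq ** exponent
    return power * run_time


def energy_efficient_frequency(p_static, c_eff, exponent=3.0):
    """Frequency minimizing energy() for the assumed model (exponent > 1)."""
    return (p_static / (c_eff * (exponent - 1.0))) ** (1.0 / exponent)


# Hypothetical, normalized constants chosen only for illustration.
P_STATIC, C_EFF, WORK = 2.0, 1.0, 100.0
f_ee = energy_efficient_frequency(P_STATIC, C_EFF)
for f in (0.5 * f_ee, f_ee, 1.5 * f_ee):
    print(f"f = {f:.2f}: energy = {energy(f, WORK, P_STATIC, C_EFF):.1f}")
```

    Running slower than this frequency wastes static energy over a longer run time, while running faster wastes dynamic energy, which is the trade-off that a shared per-block power supply has to balance across its cores.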
    A Resource-Efficient Communication Architecture for Chip Multiprocessors on FPGAs
    Xiaofang (Maggie) Wang, Member, IEEE, and Swetha Thota
    Journal of Data Acquisition and Processing, 2011, 26 (3): 434-447. 
    Significant advances in field-programmable gate arrays (FPGAs) have made it viable to explore innovative multiprocessor solutions on a single FPGA chip. For multiprocessors, an efficient communication network that matches the needs of the target application is always critical to the overall performance. Wormhole packet-switching network-on-chip (NoC) solutions are replacing conventional shared buses to deal with scalability and complexity challenges coming along with the increasing number of processing elements (PEs). However, the quest for high performance networks has led to very complex and resource-expensive NoC designs, leaving little room for the real computing force, i.e., PEs. Moreover, many techniques offer very small performance gains or none at all when network traffic is light while increasing the resource usage of routers. We argue that computation is still the primary task of multiprocessors and sufficient resources should be reserved for PEs. This paper presents our novel design and implementation of a resource-efficient communication network for multiprocessors on FPGAs. We reduce not only the required number of routers for a given number of PEs by introducing a new PE-router topology, but also the resource requirement of each router. Our communication network relies on the NEWS channels to transfer packets in a pipelined fashion following the path determined by the routing network. The implementation results on various Xilinx FPGAs show good performance in the typical range of network load for multiprocessor applications.
    VERTAF/Multi-Core: A SysML-Based Application Framework for Multi-Core Embedded Software Development
    Chao-Sheng Lin (林朝圣), Chun-Hsien Lu (吕俊贤), Shang-Wei Lin (林尚威), Yean-Ru Chen (陈盈如), and Pao-Ann Hsiung (熊博安), Senior Member, ACM, IEEE
    Journal of Data Acquisition and Processing, 2011, 26 (3): 448-462. 
    Multi-core processors are rapidly becoming prevalent in personal computing and embedded systems. Nevertheless, the programming environment for multi-core processor-based systems is still quite immature and lacks efficient tools. In this work, we present a new VERTAF/Multi-Core framework and show how software code can be automatically generated from SysML models of multi-core embedded systems. We illustrate how model-driven design based on SysML can be seamlessly integrated with Intel's Threading Building Blocks (TBB) and the Quantum Framework (QF) middleware. We use a digital video recording system to illustrate the benefits of the framework. Our experiments show how SysML/QF/TBB help in making multi-core embedded system programming model-driven, easy, and efficient.
    Configuration Reusing in On-Line Task Scheduling for Reconfigurable Computing Systems
    Maisam Mansub Bassiri and Hadi Shahriar Shahhoseini
    Journal of Data Acquisition and Processing, 2011, 26 (3): 463-473. 
    Reconfigurable computing systems can be reconfigured at runtime and support partial reconfigurability, which makes it possible to execute tasks in a true multitasking manner. To manage such systems at runtime, a reconfigurable operating system is needed. The main part of this operating system is the resource management unit, which performs on-line scheduling and placement of hardware tasks at runtime. Reconfiguration overhead is an important obstacle that limits the performance of on-line scheduling algorithms in reconfigurable computing systems and increases the overall execution time. Configuration reusing (task reusing) can decrease reconfiguration overhead considerably, particularly in periodic applications or applications in which the probability of task recurrence is high. In this paper, we present a technique called reusing-based scheduling (RBS) for on-line scheduling and placement, in which configuration reusing is considered a main characteristic in order to reduce reconfiguration overhead and decrease the total execution time of the tasks. Several experiments have been conducted on the proposed algorithm. The results show considerable improvement in the overall execution time of the tasks.
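    The effect of configuration reusing on total execution time can be illustrated with a deliberately simplified model. This is not the RBS algorithm from the paper; it assumes tasks run one after another, a fixed reconfiguration cost, a fixed number of reconfigurable regions, and least-recently-used eviction, and the task names and numbers are made up.

```python
from collections import OrderedDict


def total_time(tasks, num_regions, reconfig_cost, reuse=True):
    """Total time for tasks given as (config_name, exec_time) pairs.

    A configuration already resident in a region is reused for free when
    reuse is True; otherwise loading it costs reconfig_cost.
    """
    resident = OrderedDict()  # resident configurations, oldest first
    elapsed = 0
    for config, exec_time in tasks:
        if reuse and config in resident:
            resident.move_to_end(config)  # hit: skip reconfiguration
        else:
            if config not in resident and len(resident) >= num_regions:
                resident.popitem(last=False)  # evict least recently used
            resident[config] = None
            resident.move_to_end(config)
            elapsed += reconfig_cost
        elapsed += exec_time
    return elapsed


# Hypothetical periodic workload: configuration names and times are invented.
TASKS = [("fft", 5), ("fir", 3), ("fft", 5), ("fir", 3), ("aes", 4), ("fft", 5)]
with_reuse = total_time(TASKS, num_regions=2, reconfig_cost=4, reuse=True)
without_reuse = total_time(TASKS, num_regions=2, reconfig_cost=4, reuse=False)
print(f"with reuse: {with_reuse}, without reuse: {without_reuse}")
```

    The gap between the two totals grows with the recurrence probability of tasks, which matches the abstract's remark about periodic applications.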
    Partitioning the Conventional DBT System for Multiprocessors
    Ru-Hui Ma (马汝辉), Hai-Bing Guan (管海兵), Member, CCF, Er-Zhou Zhu (朱二周), Hong-Bo Yang (杨洪波), Yin-Dong Yang (杨吟东), and A-Lei Liang (梁阿磊), Member, CCF
    Journal of Data Acquisition and Processing, 2011, 26 (3): 474-490. 
    Performance improvement from ever-increasing transistor counts is gradually reaching a predicament, since software cannot logically and efficiently utilize hardware resources such as multiple cores. This is an inevitable problem in dynamic binary translation (DBT) systems as well. Although special-purpose hardware, used as an auxiliary tool through interfaces provided by the DBT system, enables higher performance, it has a significant limitation: it cannot be widely reused on other platforms. To overcome this drawback, we focus on building a compatible software architecture that achieves higher performance without platform dependence. In this paper, we propose a novel multithreaded architecture for DBT systems that partitions distinct function modules in order to make adequate use of multiprocessor resources. This new architecture decouples the common DBT system (DBTs) working routine into dynamic translation, optimization, and translated code execution phases, and then assigns them to different threads so that they execute concurrently. Several efficient novel methods are presented to handle difficult issues such as the communication mechanism, cache layout, and mutual exclusion between threads. Experimental results using SPECint 2000 indicate that this new architecture can speed up the traditional DBT system by about 10.75% on average, with better CPU utilization.
    Energy Efficiency of a Multi-Core Processor by Tag Reduction
    Long Zheng (郑龙), Mian-Xiong Dong (董冕雄), Student Member, IEEE, Kaoru Ota, Hai Jin (金海), Senior Member, IEEE, Member, ACM, Song Guo, Senior Member, IEEE, Member, ACM, and Jun Ma (马俊), Student Member, IEEE
    Journal of Data Acquisition and Processing, 2011, 26 (3): 491-503. 
    We consider the energy saving problem for caches on a multi-core processor. Previous research on low-power processors has produced various methods to reduce power dissipation, and tag reduction is one of them. This paper extends the tag reduction technique from a single-core processor to a multi-core processor and investigates the potential for energy saving on multi-core processors. We formulate our approach as an equivalent problem: finding an assignment of all instruction pages in physical memory to a set of cores such that the tag-reduction conflicts for each core are avoided or reduced as far as possible. We then propose three algorithms using different heuristics for this assignment problem. We collect experimental data from a real operating system instead of the traditional approach of using a processor simulator, which cannot simulate operating system functions and the full memory hierarchy. Experimental results show that, compared to a system without tag reduction, our proposed algorithms save up to 83.93% of total energy on an 8-core processor and 76.16% on a 4-core processor on average. They also significantly outperform the tag-reduction-based algorithm on a single-core processor.
    Architecture and High Performance Computer Systems
    Accurate and Simplified Prediction of AVF for Delay and Energy Efficient Cache Design
    An-Guo Ma (马安国), Yu Cheng (成玉), and Zuo-Cheng Xing (邢座程), Senior Member, CCF
    Journal of Data Acquisition and Processing, 2011, 26 (3): 504-519. 
    With continuous technology scaling, on-chip structures are becoming more and more susceptible to soft errors. Architectural vulnerability factor (AVF) has been introduced to quantify the architectural vulnerability of on-chip structures to soft errors. Recent studies have found that designing soft error protection techniques with the awareness of AVF is greatly helpful to achieve a tradeoff between performance and reliability for several structures (i.e., issue queue, reorder buffer). Cache is one of the most susceptible components to soft errors and is commonly protected with error correcting codes (ECC). However, protecting caches closer to the processor (i.e., L1 data cache (L1D)) using ECC could result in high overhead. Protecting caches without accurate knowledge of the vulnerability characteristics may lead to over-protection. Therefore, designing AVF-aware ECC is attractive for designers to balance among performance, power and reliability for cache, especially at early design stage. In this paper, we improve the methodology of cache AVF computation and develop a new AVF estimation framework, soft error reliability analysis based on SimpleScalar. Then we characterize dynamic vulnerability behavior of L1D and detect the correlations between L1D AVF and various performance metrics. We propose to employ Bayesian additive regression trees to accurately model the variation of L1D AVF and to quantitatively explain the important effects of several key performance metrics on L1D AVF. Then, we employ bump hunting technique to reduce the complexity of L1D AVF prediction and extract some simple selecting rules based on several key performance metrics, thus enabling a simplified and fast estimation of L1D AVF. Based on the simplified and fast estimation of L1D AVF, intervals of high L1D AVF can be identified online, enabling us to develop the AVF-aware ECC technique to reduce the overhead of ECC. 
    Experimental results show that, compared with the traditional ECC technique, which provides complete ECC protection throughout the entire lifetime of a program, the AVF-aware ECC technique reduces the L1D access latency by 35% and power consumption by 14% on average for the SPEC2K benchmarks.
    Physical Implementation of the Eight-Core Godson-3B Microprocessor
    Ru Wang (王茹), Bao-Xia Fan (范宝峡), Liang Yang (杨梁), Yan-Ping Gao (高燕萍), Dong Liu (刘动), Bin Xiao (肖斌), Jiang-Mei Wang (王江嵋), Yi-Fu Zhang (张译夫), Hong Wang (王宏), and Wei-Wu Hu (胡伟武)
    Journal of Data Acquisition and Processing, 2011, 26 (3): 520-527. 
    The Godson-3B processor is a powerful processor designed for high performance servers including Dawning Servers. It offers significantly improved performance over previous Godson-3 series CPUs by incorporating eight CPU cores and vector computing units. It contains 582.6M transistors within a 300 mm² area in 65 nm technology and is implemented in parallel with full hierarchical design flows. In Godson-3B, advanced clock distribution mechanisms including GALS (Globally Asynchronous Locally Synchronous) and clock mesh are adopted to obtain an OCV-tolerant clock network. Custom-designed de-skew modules are also implemented to provide further latency balancing after fabrication. Power reduction in Godson-3B is achieved by MLMM (Multi Level Multi Mode) clock gating and multi-threshold-voltage cell substitution schemes. The highest frequency of Godson-3B is 1.05 GHz and the peak performance is 128 GFlops (double-precision) or 256 GFlops (single-precision) with 40 W power consumption.
    Computer Graphics and Visualization
    Automatic Narrow-Deep Feature Recognition for Mould Manufacturing
    Zheng-Ming Chen (陈正鸣), Senior Member, CCF, Kun-Jin He (何坤金), Member, CCF, and Jing Liu (刘景), Member, CCF
    Journal of Data Acquisition and Processing, 2011, 26 (3): 528-537. 
    Moulds usually contain narrow, long, deep areas that must be machined by special machining processes. To identify these areas automatically, an automatic narrow-deep feature (NF) recognition method is put forward. First, the narrow-deep feature is defined, and feature hints are extracted from the mould according to the characteristics of the narrow-deep feature. Second, the elementary constituent faces (ECF) of a feature are found on the basis of the feature hint. By extending and clipping the ECF, the feature faces are obtained incrementally through geometric reasoning. Finally, related basic narrow-deep features (BNF) are combined heuristically. The proposed NF recognition method provides an intelligent connection between CAD and CAPP for machining narrow-deep areas in moulds.
    Automatic Cage Building with Quadric Error Metrics
    Zheng-Jie Deng (邓正杰), Xiao-Nan Luo (罗笑南), and Xiao-Ping Miao (苗晓萍), Member, IEEE
    Journal of Data Acquisition and Processing, 2011, 26 (3): 538-547. 
    Modern computer graphics applications usually require high-resolution object models for realistic rendering. However, it is expensive and difficult to deform such models in real time. In order to reduce the computational cost during deformations, a dense model is often manipulated through a simplified structure, called a cage, which envelops the model. However, cages are usually built interactively by users, which is tedious and time-consuming. In this paper, we introduce a novel method that can build cages automatically for both 2D polygons and 3D triangular meshes. The method consists of two steps: 1) simplifying the input model with quadric error metrics and quadratic programming to build a coarse cage; 2) removing the self-intersections of the coarse cage with Delaunay partitions. With this new method, a user can build a cage to envelop an input model either entirely or partially, with the approximate number of vertices the user specifies. Experimental results show that, compared to other cage building methods with the same number of vertices, cages built by our method are more similar to the input models. Thus, the dense models can be manipulated with higher accuracy through our cages.
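    The quadric error metric mentioned in step 1 is, in its standard form, the sum of squared distances from a point to a set of planes, stored as a 4x4 matrix. The sketch below shows only that standard metric for triangle planes; it does not reproduce the paper's quadratic programming step or its enveloping constraint, and the function names are illustrative.

```python
import math


def plane_from_triangle(p0, p1, p2):
    """Return (a, b, c, d) with a unit normal so that ax + by + cz + d = 0."""
    u = [p1[i] - p0[i] for i in range(3)]
    w = [p2[i] - p0[i] for i in range(3)]
    n = [
        u[1] * w[2] - u[2] * w[1],
        u[2] * w[0] - u[0] * w[2],
        u[0] * w[1] - u[1] * w[0],
    ]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("degenerate triangle")
    a, b, c = (x / length for x in n)
    d = -(a * p0[0] + b * p0[1] + c * p0[2])
    return (a, b, c, d)


def plane_quadric(plane):
    """Outer product of the plane vector with itself (a 4x4 matrix)."""
    return [[pi * pj for pj in plane] for pi in plane]


def add_quadrics(q1, q2):
    """Element-wise sum; quadrics of several planes accumulate this way."""
    return [[q1[i][j] + q2[i][j] for j in range(4)] for i in range(4)]


def quadric_error(q, point):
    """Evaluate v^T Q v for the homogeneous point v = (x, y, z, 1)."""
    v = (point[0], point[1], point[2], 1.0)
    return sum(v[i] * q[i][j] * v[j] for i in range(4) for j in range(4))


# Two triangles lying in the planes z = 0 and x = 0.
q = add_quadrics(
    plane_quadric(plane_from_triangle((0, 0, 0), (1, 0, 0), (0, 1, 0))),
    plane_quadric(plane_from_triangle((0, 0, 0), (0, 1, 0), (0, 0, 1))),
)
print(quadric_error(q, (2.0, 5.0, 3.0)))  # squared distances 3**2 + 2**2
```

    A simplification step would place a merged vertex where this error is small, which is why the accumulated matrix is a convenient summary of the local surface.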
    As-Rigid-As-Possible Surface Morphing
    Ya-Shu Liu (刘亚珠), Han-Bing Yan (严寒冰), and Ralph R. Martin
    Journal of Data Acquisition and Processing, 2011, 26 (3): 548-557. 
    This paper presents a new morphing method based on the “as-rigid-as-possible” approach. Unlike the original as-rigid-as-possible method, we avoid the need to construct a consistent tetrahedral mesh, and instead require a consistent triangle surface mesh, from which we create a tetrahedron for each surface triangle. Our new approach has several significant advantages. First, it is much easier to create a consistent triangle mesh than to create a consistent tetrahedral mesh. Second, the equations arising from our approach can be solved much more efficiently than the corresponding equations for a tetrahedral mesh. Finally, by incorporating the translation vector in the energy functional controlling interpolation, our new method does not need the user to arbitrarily fix any vertex to obtain a solution, allowing artists automatic control of interpolated mesh positions.
    PM-DFT: A New Local Invariant Descriptor Towards Image Copy Detection
    He-Fei Ling (凌贺飞), Senior Member, CCF, Member, ACM, IEEE, Li-Yun Wang (王丽云), Ling-Yu Yan (严灵毓), Fu-Hao Zou (邹复好), and Zheng-Ding Lu (卢正鼎)
    Journal of Data Acquisition and Processing, 2011, 26 (3): 558-567. 
    Currently, global-features-based image copy detection is vulnerable to geometric transformations like cropping, shift, and rotations. To resolve this problem, some algorithms based on local descriptors have been proposed. However, the local descriptors, which were originally designed for object recognition, are not suitable for copy detection because they cause the problems of false positives and ambiguities. Instead of relying on the local gradient statistic as many existing descriptors do, we propose a new invariant local descriptor based on local polar-mapping and discrete Fourier transform. Then based on this descriptor, we propose a new framework of copy detection, in which virtual prior attacks and attack weight are employed for training and selecting only a few robust features. This consequently improves the storage and detection efficiency. In addition, it is worth noting that the feature matching takes the locations and orientations of interest points into consideration, which increases the number of matched regions and improves the recall. Experimental results show that the new descriptor is more robust and distinctive, and the proposed copy detection scheme using this descriptor can substantially enhance the accuracy and recall of copy detection and lower the false positives and ambiguities.
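    One reason polar mapping pairs well with the discrete Fourier transform is a general property: rotating a patch about its centre becomes a circular shift along the angular axis of its polar samples, and the DFT magnitude is unchanged by a circular shift. The sketch below demonstrates only that property on one ring of made-up samples; it is not the PM-DFT descriptor itself, whose construction the abstract does not detail, and the function names are illustrative.

```python
import cmath


def dft_magnitudes(samples):
    """Magnitudes of the discrete Fourier transform of a real sequence."""
    n = len(samples)
    return [
        abs(sum(samples[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n)))
        for f in range(n)
    ]


def circular_shift(samples, shift):
    """Rotate the sequence to the right by shift positions."""
    shift %= len(samples)
    return samples[-shift:] + samples[:-shift]


# Hypothetical intensities sampled at 8 equally spaced angles on one ring.
ring = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
rotated = circular_shift(ring, 3)  # stands in for a rotation of the patch

original_mags = dft_magnitudes(ring)
rotated_mags = dft_magnitudes(rotated)
max_difference = max(abs(a - b) for a, b in zip(original_mags, rotated_mags))
print(f"largest magnitude difference: {max_difference:.2e}")
```

    The shift changes only the phase of each frequency component, so a descriptor built from magnitudes does not need the patch orientation to be estimated first.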
Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China

E-mail: info@sjcjycl.cn
 
  Copyright ©2015 JCST, All Rights Reserved