Bimonthly    Since 1986
ISSN 1004-9037
Indexed in:
SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
  • Table of Contents
      05 January 2013, Volume 28 Issue 1   
    Special Section on Selected Papers from NPC 2011
    Preface
    Erik Altman, Wei-Song Shi
    Journal of Data Acquisition and Processing, 2013, 28 (1): 1-2. 
    Abstract   PDF(143KB) ( 1857 )  
    As the program committee co-chairs of the IFIP 2011 International Conference on Network and Parallel Computing (NPC 2011), we are very happy to organize a special section of five best-paper nominees selected from the conference. The committee reviewed the 54 submissions over a 6-week period from mid-April to the end of May 2011, with an average of 3.7 reviews per paper. The committee then held a virtual, e-mail-based PC meeting over the first two weeks of June to discuss the papers and make selections for the conference. The EasyChair reviewing system used for the conference facilitated this virtual meeting and discussion. Comments made by any PC member on the EasyChair site were e-mailed to all other PC members assigned to that paper, to the two of us as PC chairs, and to any other PC members interested in that paper. In addition, we tried to seed discussions, especially on controversial papers, by commenting on the key points of reviews and highlighting differences of opinion. To avoid any hint of bias in selecting papers, discussion was double-blind throughout. There was clear consensus on almost all papers; in the small set of cases where disagreement persisted, we relied on discussion and the majority view of the PC members who had reviewed the paper.
    Pinned OS/Services: A Case Study of XML Parsing on Intel SCC
    Jie Tang, Chen Liu, Shao-Shan Liu, Zhi-Min Gu, and Jean-Luc Gaudiot
    Journal of Data Acquisition and Processing, 2013, 28 (1): 3-13. 
    Abstract   PDF(747KB) ( 2091 )  
    Nowadays, we are heading towards integrating hundreds to thousands of cores on a single chip. However, traditional system software and middleware are not well suited to managing and providing services at such a large scale. To improve the scalability and adaptability of operating system and middleware services on future many-core platforms, we propose pinned OS/services. By porting each OS and runtime system (middleware) service to a separate core (a form of special hardware acceleration), we expect to achieve maximal performance gain and energy efficiency in many-core environments. As a case study, we target XML (Extensible Markup Language), one of the most widely used standards for data transfer and storage. We have implemented and evaluated the design by porting the XML parsing service onto the Intel 48-core Single-Chip Cloud Computer (SCC) platform. The results show that it provides considerable energy savings. However, we also identified heavy performance penalties introduced on the memory side, which bloat the parsing service. Hence, as a further step, we propose a memory-side hardware accelerator for XML parsing. With this specialized hardware design, we can further enhance the performance gain and energy efficiency: performance improves by 20% with a 12.27% energy reduction.
    Reorder Write Sequence by Hetero-Buffer to Extend SSD's Lifespan
    Zhi-Guang Chen, Nong Xiao, Fang Liu, and Yi-Mo Du
    Journal of Data Acquisition and Processing, 2013, 28 (1): 14-27. 
    Abstract   PDF(983KB) ( 2802 )  
    The limited lifespan is the Achilles' heel of solid state drives (SSDs) based on NAND flash. NAND flash has two drawbacks that degrade an SSD's lifespan. One is the out-of-place update; the other is the sequential-write constraint within a block. SSDs usually employ a write buffer to extend their lifetime. However, existing write buffer schemes pay attention only to the first drawback and neglect the second. We propose a hetero-buffer architecture that covers both aspects simultaneously. The hetero-buffer consists of two components: dynamic random access memory (DRAM) and the reorder area. The DRAM endeavors to reduce write traffic as much as possible by pursuing a higher hit ratio (overcoming the first drawback). The reorder area focuses on reordering the write sequence (overcoming the second drawback). Our hetero-buffer outperforms traditional write buffers for two reasons. First, the DRAM can adopt an existing superior cache replacement policy and thus achieve a higher hit ratio. Second, the hetero-buffer reorders the write sequence, which has not been exploited by traditional write buffers. Beyond these optimizations, our hetero-buffer also considers the working environment of the write buffer, which traditional write buffers likewise neglect; this improves the hetero-buffer further. The performance is evaluated via trace-driven simulations. Experimental results show that SSDs employing the hetero-buffer achieve a longer lifespan on most workloads.
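The two-stage idea in this abstract can be sketched in a few lines: an LRU "DRAM" stage absorbs repeated writes, and a reorder stage sorts evicted pages so each block is written in ascending page order. The geometry, capacity, and eviction policy below are toy assumptions for illustration, not the paper's implementation.

```python
from collections import OrderedDict

PAGES_PER_BLOCK = 4  # toy geometry; real NAND blocks hold 64-256 pages

class HeteroBuffer:
    """Illustrative two-stage write buffer: an LRU "DRAM" stage that absorbs
    re-writes (first drawback), and a reorder stage that sorts evicted pages
    so each block is written in ascending page order (second drawback)."""

    def __init__(self, dram_capacity):
        self.dram = OrderedDict()   # page_no -> data, kept in LRU order
        self.capacity = dram_capacity
        self.reorder = []           # pages evicted from the DRAM stage
        self.flushed = []           # what finally reaches the flash

    def write(self, page_no, data):
        if page_no in self.dram:    # hit: the re-write never reaches flash
            self.dram.move_to_end(page_no)
        self.dram[page_no] = data
        if len(self.dram) > self.capacity:
            self.reorder.append(self.dram.popitem(last=False))  # evict LRU

    def flush(self):
        # Drain both stages sorted by (block, in-block page offset), so the
        # sequential-write constraint within each block is respected.
        pending = self.reorder + list(self.dram.items())
        pending.sort(key=lambda p: (p[0] // PAGES_PER_BLOCK,
                                    p[0] % PAGES_PER_BLOCK))
        self.flushed.extend(pending)
        self.dram.clear()
        self.reorder.clear()

buf = HeteroBuffer(dram_capacity=2)
for page in (5, 1, 5, 4, 0):        # out-of-order writes, one re-write
    buf.write(page, "d%d" % page)
buf.flush()
print([p for p, _ in buf.flushed])  # -> [0, 1, 4, 5]: block-ordered, page 5 written once
```

Note how the re-write of page 5 is absorbed in the DRAM stage (reducing write traffic) while the reorder stage turns the out-of-order stream into per-block sequential writes.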
    CSWL: Cross-SSD Wear-Leveling Method in SSD-Based RAID Systems for System Endurance and Performance
    Yi-Mo Du, Nong Xiao, Fang Liu, and Zhi-Guang Chen
    Journal of Data Acquisition and Processing, 2013, 28 (1): 28-41. 
    Abstract   PDF(1091KB) ( 3555 )  
    Flash memory has limited erasure/program cycles. Hence, to sustain their advertised capacity over time, flash-based solid state drives (SSDs) must prolong their lifespan through a wear-leveling mechanism. As a very important part of the flash translation layer (FTL), wear leveling is usually implemented in SSD controllers; this is called internal wear leveling. However, there is no wear leveling among the SSDs in SSD-based redundant array of independent disks (RAID) systems, so some SSDs wear out faster than others. Once an SSD fails, reconstruction must be triggered immediately, and the cost of this process is so high that both system reliability and availability are seriously affected. We therefore propose cross-SSD wear leveling (CSWL) to enhance the endurance of the entire SSD-based RAID system. Under random-access workloads, parity stripes suffer far more updates, because updating a data stripe forces the modification of all related parity stripes. Based on this principle, we introduce an age-driven parity distribution scheme that guarantees wear leveling among flash SSDs and thereby prolongs the endurance of the RAID system. Furthermore, age-driven parity distribution benefits performance by maintaining better load balance. With insignificant overhead, CSWL can significantly improve both the lifespan and the performance of SSD-based RAID.
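The age-driven principle (steer update-heavy parity chunks toward the least-worn drive) can be sketched with a toy wear model. The erase counts and the one-erase-per-parity-update cost below are invented for illustration and are not the paper's model.

```python
def parity_disk(erase_counts):
    """Place the parity chunk of the next stripe on the least-worn SSD
    (lowest erase count), breaking ties by drive index."""
    return min(range(len(erase_counts)), key=lambda d: (erase_counts[d], d))

# Toy wear model: four SSDs with skewed initial wear; every stripe update
# rewrites its parity, costing the hosting drive one erase.
erase_counts = [120, 80, 100, 60]
for _ in range(200):
    erase_counts[parity_disk(erase_counts)] += 1

print(erase_counts)  # -> [140, 140, 140, 140]: wear has leveled out
```

Because parity absorbs most of the random-update traffic, biasing its placement by age is enough to equalize wear across the array in this sketch; a static parity layout would instead keep widening the initial gap.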
    Power Adjusting Algorithm: A New Cross-Layer Power Saving Mechanism for Mobile Ad-Hoc Networks
    Jin-Tao Meng, Jian-Rui Yuan, Sheng-Zhong Feng, and Lian-Sheng Tan
    Journal of Data Acquisition and Processing, 2013, 28 (1): 42-53. 
    Abstract   PDF(1052KB) ( 2024 )  
    Power saving is one of the key issues in Mobile Ad-Hoc Networks (MANETs). It can be realized in the Medium Access Control (MAC) layer and the network layer. However, previous work has mainly focused on the MAC layer or the network layer alone, aiming to improve channel utilization by adopting variable-range transmission power control. In this paper we address power saving in both the MAC layer and the network layer, and propose a Power Adjusting Algorithm (PAA). In the presence of host mobility, PAA conserves energy by adjusting the transmission power to maintain the route's connectivity, and by periodically restarting route discovery to find a new route with better energy efficiency. Analyzing the operation of PAA, we find that the length of the route-discovery restarting period is a critical parameter for power saving, and we abstract an energy consumption model to find the optimal restarting period by analyzing the algorithm's energy consumption. PAA thus handles mobility in a MANET by adjusting the transmission power while saving energy by restarting route discovery periodically, balancing the energy spent on route discovery against that spent on packet delivery. Simulation results show that PAA saves nearly 40% energy compared with the Dynamic Source Routing protocol when the maximum speed of mobile hosts exceeds 8 m/s.
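The trade-off that determines the optimal restarting period can be seen in a toy energy model: transmit power grows with hop distance, while each rediscovery pays a fixed cost but resets the hop. All constants here are illustrative assumptions, not the paper's model.

```python
ALPHA = 2.0             # path-loss exponent (free-space value)
DISCOVERY_COST = 50.0   # assumed energy cost of one route rediscovery

def total_energy(drift_per_step, steps, restart_period):
    """Energy to send one packet per step while the next hop drifts away;
    every restart_period steps a rediscovery resets the hop distance."""
    energy, d = 0.0, 1.0
    for t in range(steps):
        if t and t % restart_period == 0:
            energy += DISCOVERY_COST   # pay for a rediscovery...
            d = 1.0                    # ...and get a short hop back
        energy += d ** ALPHA           # minimum power to reach the hop
        d += drift_per_step            # mobility stretches the hop
    return energy

# Restarting too often wastes discovery energy; too rarely wastes transmit
# power on stretched hops. An intermediate period is cheapest.
costs = {p: round(total_energy(0.5, 60, p), 1) for p in (2, 6, 30)}
print(costs)  # -> {2: 1547.5, 6: 797.5, 30: 5257.5}
```

The interior minimum is exactly the quantity PAA's energy consumption model optimizes: the period that balances rediscovery cost against the quadratic growth of transmit power.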
    Energy Efficient Run-Time Incremental Mapping for 3-D Networks-on-Chip
    Xiao-Hang Wang, Peng Liu, Mei Yang, Maurizio Palesi, Ying-Tao Jiang, and Michael C. Huang
    Journal of Data Acquisition and Processing, 2013, 28 (1): 54-71. 
    Abstract   PDF(2178KB) ( 2083 )  
    3-D Networks-on-Chip (NoC) emerge as a potent solution to address both the interconnection and design complexity problems facing future Multiprocessor System-on-Chips (MPSoCs). Effective run-time mapping on such 3-D NoC-based MPSoCs can be quite challenging, as the arrival order and task graphs of the target applications are typically not known a priori, and the problem is further complicated by stringent energy requirements for NoC systems. This paper therefore presents an energy-aware run-time incremental mapping algorithm (ERIM) for 3-D NoC which minimizes the energy consumption due to data communication among processor cores, while reducing the fragmentation effect on the incoming applications to be mapped and simultaneously satisfying the thermal constraints imposed on each incoming application. Specifically, incoming applications are mapped to cuboid tile regions for lower communication energy consumption and minimal routing. Fragment tiles left by system fragmentation can be reclaimed for better resource utilization. Extensive experiments have been conducted to evaluate the performance of the proposed algorithm ERIM, and the results are compared against an optimal mapping algorithm (branch-and-bound) and two heuristic algorithms (TB and TL). The experiments show that ERIM outperforms the TB and TL methods with significant energy savings (more than 10%), much reduced average response time, and improved system utilization.
    Architecture and VLSI Design
    Design and Implementation of an Extended Collectives Library for Unified Parallel C
    Carlos Teijeiro, Guillermo L. Taboada, Juan Touriño, Ramón Doallo, José C. Mouriño, Damián A. Mallón, and Brian Wibecan
    Journal of Data Acquisition and Processing, 2013, 28 (1): 72-89. 
    Abstract   PDF(2418KB) ( 1509 )  
    Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared-memory view that simplifies code development while taking advantage of the scalability of distributed-memory architectures. UPC thus allows programmers to write parallel applications for hybrid shared/distributed memory architectures, such as multi-core clusters, in a more productive way, accessing remote memory through high-level language constructs such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source and destination thread, or the definition of the amount of data transferred by each particular thread. The library fulfills demands made by the UPC developer community and implements portable algorithms, independent of the specific UPC compiler/runtime being used. A representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results confirm the suitability of the new library to provide easier programming without trading off performance, thus achieving high productivity in parallel programming and harnessing the performance of hybrid shared/distributed memory architectures in high performance computing.
    MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs
    Yan Li, Yun-Quan Zhang, Yi-Qun Liu, Guo-Ping Long, and Hai-Peng Jia
    Journal of Data Acquisition and Processing, 2013, 28 (1): 90-105. 
    Abstract   PDF(8117KB) ( 2586 )  
    Fourier methods have revolutionized many fields of science and engineering, such as astronomy, medical imaging, seismology and spectroscopy, and the fast Fourier transform (FFT) is a computationally efficient method of computing a Fourier transform. The emerging class of high performance computing architectures, such as GPUs, seeks to achieve much higher performance and efficiency by exposing a hierarchy of distinct memories to software. However, the complexity of GPU programming poses a significant challenge to developers. In this paper, we propose an automatic performance-tuning framework for FFT on various OpenCL GPUs, and implement a high-performance library named MPFFT based on this framework. For power-of-two length FFTs, our library substantially outperforms the clAmdFft library on AMD GPUs and achieves performance comparable to the CUFFT library on NVIDIA GPUs. Furthermore, our library also supports non-power-of-two sizes. For 3D non-power-of-two FFTs, our library is 1.5x to 28x faster than FFTW with 4 threads and achieves a 20.01x average speedup over CUFFT 4.0 on Tesla C2050.
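The empirical auto-tuning pattern behind such libraries (benchmark candidate kernel configurations, keep the fastest) is independent of FFTs. A minimal generic version is sketched below; the candidate set and the toy chunked-summation kernel are invented for illustration and have nothing to do with MPFFT's actual search space.

```python
import time

def autotune(candidates, run, repeats=3):
    """Empirically pick the fastest configuration: time each candidate a few
    times and keep the best observed run (min damps timer noise)."""
    timings = {}
    for cfg in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            run(cfg)
            best = min(best, time.perf_counter() - t0)
        timings[cfg] = best
    return min(timings, key=timings.get)

# Toy kernel: chunked summation, with the chunk size as the tunable knob.
# In a real FFT auto-tuner the knobs would be radix, tile size, work-group
# size, etc., and the winner would be cached per transform size and device.
data = list(range(1 << 16))
def chunked_sum(chunk):
    return sum(sum(data[i:i + chunk]) for i in range(0, len(data), chunk))

best = autotune((256, 1024, 4096), chunked_sum)
print("fastest chunk size:", best)
```

The winning configuration depends on the machine, which is precisely why such libraries tune empirically at install or first-run time instead of hard-coding one choice.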
    Resources Snapshot Model for Concurrent Transactions in Multi-Core Processors
    Lei Zhao, Ji-Wen Yang
    Journal of Data Acquisition and Processing, 2013, 28 (1): 106-118. 
    Abstract   PDF(478KB) ( 1607 )  
    Transaction parallelism in database systems is an attractive way of improving transaction performance. There exist two levels of transaction parallelism: the inter-transaction level and the intra-transaction level. With the advent of multi-core processors, new hopes of improving transaction parallelism appear on the scene. The greatest execution efficiency of concurrent transactions comes from minimizing the dependencies among them; these dependencies stand in the way of exploiting parallelism. In this paper, we present the Resource Snapshot Model (RSM) for resource modeling at both levels. We propose a non-restarting scheduling algorithm at the inter-transaction level and a processor assignment algorithm at the intra-transaction level for multi-core processors. Through these algorithms, the execution performance of transaction streams is improved in a parallel system with multiple heterogeneous processors that have different numbers of cores.
    TSV Minimization for Circuit-Partitioned 3D SoC Test Wrapper Design
    Yuan-Qing Cheng, Lei Zhang, Yin-He Han, and Xiao-Wei Li
    Journal of Data Acquisition and Processing, 2013, 28 (1): 119-128. 
    Abstract   PDF(1392KB) ( 2547 )  
    Semiconductor technology continues advancing, but global on-chip interconnects do not scale at the same pace as transistors, which has become the major bottleneck for the performance and integration of future giga-scale ICs. Three-dimensional (3D) integration has been proposed to sustain Moore's law by incorporating through-silicon vias (TSVs) to integrate different circuit modules in the vertical direction, and it is believed to be one of the most promising techniques for tackling the interconnect scaling problem. Its unique characteristics open many research opportunities; in this paper we focus on test wrapper optimization for the individual circuit-partitioned embedded cores within 3D System-on-Chips (SoCs). First, we use existing 2D SoC algorithms to minimize the test time for individual embedded cores. In addition, the vertical interconnects, i.e., the TSVs used to construct the test wrapper, should be taken into consideration as well, because TSVs typically employ bonding pads to tackle the misalignment problem and thus occupy significant planar chip area, which may cause routing congestion. We propose a series of heuristic algorithms to reduce the number of TSVs used in test wrapper chain construction without negatively affecting the test time. The method consists of two steps, scan chain allocation and functional input/output insertion, both of which can reduce the TSV count significantly. Extensive experimental evaluations show that the test wrapper chain structure designed by our method reduces the number of test TSVs dramatically: by as much as 60.5% compared with the random method and 26% compared with the intuitive method.
    Information Security
    Collision Attack on the Full Extended MD4 and Pseudo-Preimage Attack on RIPEMD
    Gao-Li Wang
    Journal of Data Acquisition and Processing, 2013, 28 (1): 129-143. 
    Abstract   PDF(370KB) ( 1620 )  
    The cryptographic hash functions Extended MD4 and RIPEMD are double-branch hash functions, consisting of two parallel branches. Extended MD4 was proposed by Rivest in 1990, and RIPEMD was devised in the framework of the RIPE project (RACE Integrity Primitives Evaluation, 1988~1992). On the basis of differential analysis and the meet-in-the-middle attack principle, this paper proposes a collision attack on the full Extended MD4 and a pseudo-preimage attack on the full RIPEMD, respectively. The collision attack on Extended MD4 has a complexity of 2^37, and a collision instance is presented. The pseudo-preimage attack on RIPEMD has a complexity of 2^125.4, which improves on the complexity of a brute-force attack. The results of this study will also be beneficial to the analysis of other double-branch hash functions such as RIPEMD-160.
    A New Batch Verifying Scheme for Identifying Illegal Signatures
    Adrian Atanasiu
    Journal of Data Acquisition and Processing, 2013, 28 (1): 144-151. 
    Abstract   PDF(249KB) ( 1803 )  
    The concept of batch verifying multiple digital signatures is to find a method by which multiple digital signatures can be verified simultaneously with lower time complexity than verifying each signature separately. In this article, we analyze the complexity of the batch verifying schemes defined by Li, Hwang and Chen in 2010, and propose a new batch verifying multiple digital signature scheme in two variants: one for RSA, which extends Harn's scheme with an algorithm for identifying illegal signatures, and the other adapted to a modified Elliptic Curve Digital Signature Algorithm (ECDSA) protocol.
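The core batch-verification idea for RSA can be shown with textbook (unpadded) RSA: a single modular exponentiation checks the whole batch, and when it fails, a divide-and-conquer search localizes the bad signature. This is a sketch in the spirit of Harn-style batch verification, not the scheme proposed in the article; the tiny key below is a classroom example and the search assumes at most one bad signature.

```python
import math

# Classroom RSA key: n = 61 * 53, and e*d = 1 (mod lcm(60, 52)) = (mod 780).
n, e, d = 3233, 17, 413
msgs = [42, 1234, 99, 2718]
sigs = [pow(m, d, n) for m in msgs]   # s_i = m_i^d mod n (no padding/hashing)

def batch_ok(ms, ss):
    """One check for the whole batch: (prod s_i)^e == prod m_i (mod n)."""
    return pow(math.prod(ss), e, n) == math.prod(ms) % n

def find_bad(ms, ss):
    """Assuming at most one bad signature, recurse into the failing half;
    returns the index of the bad signature, or None if the batch verifies."""
    if batch_ok(ms, ss):
        return None
    if len(ms) == 1:
        return 0
    mid = len(ms) // 2
    left = find_bad(ms[:mid], ss[:mid])
    if left is not None:
        return left
    right = find_bad(ms[mid:], ss[mid:])
    return mid + right if right is not None else None

assert batch_ok(msgs, sigs)
sigs[2] = (sigs[2] + 1) % n           # corrupt one signature
print(find_bad(msgs, sigs))           # -> 2
```

With k signatures, a clean batch costs one big check instead of k, and localizing a single forgery takes O(log k) batch checks; identifying multiple simultaneous forgeries is exactly the harder case that identification algorithms like the one in this article address.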
    A Secure Scalar Product Protocol Against Malicious Adversaries
    Bo Yang, Yong Yu, and Chung-Huang Yang
    Journal of Data Acquisition and Processing, 2013, 28 (1): 152-158. 
    Abstract   PDF(272KB) ( 1750 )  
    A secure scalar product protocol is a specific type of secure multi-party computation problem. With such a protocol, two parties can jointly compute the scalar product of their private vectors without revealing any information about their private vectors to each other. The secure scalar product protocol is of great importance in many privacy-preserving applications, such as privacy-preserving data mining, privacy-preserving cooperative statistical analysis, and privacy-preserving geometry computation. In this paper, we give an efficient scalar product protocol that is secure in the presence of malicious adversaries, based on two important tools: proofs of knowledge of a discrete logarithm and verifiable encryption. The security of the new protocol is proved under standard simulation-based definitions. Compared with existing schemes, our scheme offers higher efficiency by avoiding inefficient cut-and-choose proofs.
    Cryptanalysis of Reduced-Round DASH
    Shu-Sheng Liu, Zheng Gong, Li-Bin Wang
    Journal of Data Acquisition and Processing, 2013, 28 (1): 159-164. 
    Abstract   PDF(634KB) ( 1310 )  
    At ACISP 2008, Billet et al. proposed the hash family DASH, which draws on the designs of Rijndael and RC6. The DASH family has two variants that support 256-bit and 512-bit output lengths, respectively. This paper presents the first third-party cryptanalysis of DASH-256, with a focus on the underlying block cipher A256. In particular, we study distinguishers based on differential and boomerang attacks. As a result, we build a distinguishing attack on the compression function of DASH-256 with 8-round A256 using differential cryptanalysis. Finally, we obtain a boomerang distinguisher for 9-round A256.
    Software Engineering
    Fuzzy Self-Adaptation of Mission-Critical Software Under Uncertainty
    Qi-Liang Yang, Jian Lv, Xian-Ping Tao, Xiao-Xing Ma, Jian-Chun Xing, and Wei Song
    Journal of Data Acquisition and Processing, 2013, 28 (1): 165-187. 
    Abstract   PDF(2064KB) ( 1879 )  
    Mission-critical software (MCS) must provide continuous, online services to ensure the successful accomplishment of critical missions. Self-adaptation is particularly desirable for assuring the quality of service (QoS) and availability of MCS under uncertainty. Few techniques have so far addressed the issue of MCS self-adaptation, and most existing approaches to software self-adaptation fail to take uncertainty into account in the self-adaptation loop. To tackle this problem, we propose a fuzzy-control-based approach, Software Fuzzy Self-Adaptation (SFSA), to address the challenge of MCS self-adaptation under uncertainty. First, we present the SFSA conceptual framework, consisting of sensing, deciding and acting stages, and establish a formal model of SFSA to lay a rigorous mathematical foundation for our approach. Second, we develop a novel SFSA implementation technique together with its supporting tool, the SFSA toolkit, to automate the realization of SFSA. Finally, we demonstrate the effectiveness of our approach through the development of an adaptive MCS application in process control systems. Validation experiments show that the fuzzy-control-based approach proposed in this work is effective and incurs low overhead.
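The sense-decide-act loop with fuzzy decisions can be illustrated in miniature: sensed values are fuzzified through membership functions, a small rule base fires, and a weighted average defuzzifies the result into an action. The linguistic terms, membership shapes, and rule outputs below are invented for illustration and are not taken from SFSA.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def adapt_pool_delta(latency_ms):
    """One sense-decide-act step: fuzzify the sensed latency, fire two rules,
    and defuzzify by weighted average into a worker-pool size change.
    Fuzziness is what absorbs sensor noise: nearby latencies yield nearby
    actions instead of a hard threshold flip."""
    low  = tri(latency_ms, -1, 0, 100)     # rule: "latency is low"  -> shrink by 2
    high = tri(latency_ms, 50, 150, 1000)  # rule: "latency is high" -> grow by 4
    if low + high == 0.0:
        return 0.0                         # no rule fires: leave the pool alone
    return (low * -2 + high * 4) / (low + high)

print(adapt_pool_delta(50))   # -> -2.0 (clearly low: shrink)
print(adapt_pool_delta(100))  # -> 4.0  (clearly high: grow)
print(adapt_pool_delta(75))   # -> 1.0  (borderline: blended response)
```

The borderline case shows the characteristic behavior under uncertainty: both rules fire partially, and the action interpolates between them rather than oscillating between two extremes.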
    Towards a Formal Semantics for UML/MARTE State Machines Based on Hierarchical Timed Automata
    Yu Zhou, Luciano Baresi, and Matteo Rossi
    Journal of Data Acquisition and Processing, 2013, 28 (1): 188-202. 
    Abstract   PDF(934KB) ( 1854 )  
    UML is a widely used, general-purpose modeling language, but its lack of a rigorous semantics prevents thorough analysis of designed solutions and thus precludes the discovery of significant problems at design time. To bridge this gap, the paper investigates the underlying semantics of UML state machine diagrams, together with the time-related modeling elements of MARTE (the profile for modeling and analysis of real-time embedded systems), and proposes a formal operational semantics based on extended hierarchical timed automata. The approach is illustrated with a simple example from the automotive domain. Verification is accomplished by translating designed models into the input language of the UPPAAL model checker.
    Nondeterministic Probabilistic Petri Net — A New Method to Study Qualitative and Quantitative Behaviors of System
    Yang Liu, Huai-Kou Miao, Hong-Wei Zeng, Yan Ma, and Pan Liu
    Journal of Data Acquisition and Processing, 2013, 28 (1): 203-216. 
    Abstract   PDF(1201KB) ( 2417 )  
    Many variants of Petri nets exist at present, and some of them can model systems with both functional and performance specifications, such as stochastic Petri nets, generalized stochastic Petri nets and probabilistic Petri nets. In this paper, we utilize an extended Petri net to model and verify systems that exhibit probability and nondeterminism in addition to functional aspects. Using the probabilistic Petri net as a reference, we propose a new mixed model, the NPPN (Nondeterministic Probabilistic Petri Net) system, which can model and verify systems with both qualitative and quantitative behaviors. We then develop a process algebra for the NPPN system to give its algebraic semantics, and an action-based PCTL (Probabilistic Computation Tree Logic) to give its logical semantics. Afterwards, we present rules for the compositional operation of NPPN systems based on the NPPN process algebra, and a model checking algorithm based on the action-based PCTL. To put the NPPN system into practice, we have developed a friendly, visual tool for modeling, analyzing, simulating, and verifying NPPN systems using action-based PCTL. The usefulness and effectiveness of the NPPN system are illustrated by modeling and model checking an elaborate model of a travel-arrangements workflow.
Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China

E-mail: info@sjcjycl.cn
 
  Copyright ©2015 JCST, All Rights Reserved