Loading...
Bimonthly    Since 1986
ISSN 1004-9037
/
Indexed in:
SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
  • Table of Content
      05 November 2013, Volume 28 Issue 6   
    For Selected: View Abstracts Toggle Thumbnails
    Special Section on Cloud Data Management
    Preface
    Xiao-Feng Meng, Fusheng Wang
    Journal of Data Acquisition and Processing, 2013, 28 (6): 929-931. 
    Abstract   PDF(192KB) ( 1188 )  
    Cloud computing and big data have become increasingly popular and are changing our way of thinking about the world by providing new insights and creating new forms of value. The research of cloud data management is to address the challenges in managing large collections of data in the cloud computing environment, and identifying information of value to business, science, government, and society. The huge volumes of data in cloud computing environments pose major infrastructure challenges, including data storage at Petabyte scale, massively parallel query execution, facilities for analytical processing, online query processing, resource optimization, data privacy and security.
    Non-Intrusive Elastic Query Processing in the Cloud
    Ticiana L. Coelho da Silva, Mario A. Nascimento, José Antônio F. de Macêdo, Flávio R. C. Sousa, and Javam C. Machado
    Journal of Data Acquisition and Processing, 2013, 28 (6): 932-947. 
    Abstract   PDF(1239KB) ( 2037 )  
    Cloud computing is a very promising paradigm of service-oriented computing. One major benefit of cloud computing is its elasticity, i.e., the system's capacity to provide and remove resources automatically at runtime. For that, it is essential to design and implement an efficient and effective technique that takes full advantage of the system's potential flexibility. This paper presents a non-intrusive approach that monitors the performance of relational database management systems in a cloud infrastructure, and automatically makes decisions to maximize the efficiency of the provider's environment while still satisfying agreed upon\service level agreements" (SLAs). Our experiments conducted on Amazon's cloud infrastructure, confirm that our technique is capable of automatically and dynamically adjusting the system's allocated resources observing the SLA.
    An Energy-Aware Heuristic Scheduling for Data-Intensive Workflows in Virtualized Datacenters
    Peng Xiao, Zhi-Gang Hu, and Yan-Ping Zhang
    Journal of Data Acquisition and Processing, 2013, 28 (6): 948-961. 
    Abstract   PDF(1949KB) ( 1868 )  
    With the development of cloud computing, more and more data-intensive workflows have been deployed on virtualized datacenters. As a result, the energy spent on massive data accessing grows rapidly. In this paper, an energyaware scheduling algorithm is proposed, which introduces a novel heuristic called Minimal Data-Accessing Energy Path for scheduling data-intensive workflows aiming to reduce the energy consumption of intensive data accessing. Extensive experiments based on both synthetical and real workloads are conducted to investigate the effectiveness and performance of the proposed scheduling approach. The experimental results show that the proposed heuristic scheduling can significantly reduce the energy consumption of storing/retrieving intermediate data generated during the execution of data-intensive workflow. In addition, it exhibits better robustness than existing algorithms when cloud systems are in presence of I/O-intensive workloads.
    A Framework for Supporting Tree-Like Indexes on the Chord Overlay
    Ming-Dong Zhu, De-Rong Shen, Yue Kou, Tie-Zheng Nie, and Ge Yu
    Journal of Data Acquisition and Processing, 2013, 28 (6): 962-972. 
    Abstract   PDF(776KB) ( 1558 )  
    With the explosive growth of data, to support efficient data management including queries and updates, the database system is expected to provide tree-like indexes, such as R-tree, M-tree, B+-tree, according to different types of data. In the distributed environment, the indexes have to be scattered across the compute nodes to improve reliability and scalability. Indexes can speed up queries, but they incur maintenance cost when updates occur. In the distributed environment, each compute node maintains a subset of an index tree, so keeping the communication cost small is more crucial, or else it occupies lots of network bandwidth and the scalability and availability of the database system are affected. Further, to achieve the reliability and scalability for queries, several replicas of the index are needed, but keeping the replicas consistent is not straightforward. In this paper, we propose a framework supporting tree-like indexes, based on Chord overlay, which is a popular P2P structure. The framework dynamically tunes the number of replicas of index to balance the query cost and the update cost. Several techniques are designed to improve the efficiency of updates without the cost of performance of the queries. We implement M-tree and R-tree in our framework, and extensive experiments on reallife and synthetic datasets verify the efficiency and scalability of our framework.
    HEDC++:An Extended Histogram Estimator for Data in the Cloud
    Ying-Jie Shi, Xiao-Feng Meng, Fusheng Wang, and Yan-Tao Gan
    Journal of Data Acquisition and Processing, 2013, 28 (6): 973-988. 
    Abstract   PDF(579KB) ( 1578 )  
    With increasing popularity of cloud-based data management, improving the performance of queries in the cloud is an urgent issue to solve. Summary of data distribution and statistical information has been commonly used in traditional databases to support query optimization, and histograms are of particular interest. Naturally, histograms could be used to support query optimization and efficient utilization of computing resources in the cloud. Histograms could provide helpful reference information for generating optimal query plans, and generate basic statistics useful for guaranteeing the load balance of query processing in the cloud. Since it is too expensive to construct an exact histogram on massive data, building an approximate histogram is a more feasible solution. This problem, however, is challenging to solve in the cloud environment because of the special data organization and processing mode in the cloud. In this paper, we present HEDC++, an extended histogram estimator for data in the cloud, which provides efficient approximation approaches for both equi-width and equi-depth histograms. We design the histogram estimate workflow based on an extended MapReduce framework, and propose novel sampling mechanisms to leverage the sampling efficiency and estimate accuracy. We experimentally validate our techniques on Hadoop and the results demonstrate that HEDC++ can provide promising histogram estimate for massive data in the cloud.
    Partition-Based Online Aggregation with Shared Sampling in the Cloud
    Yu-Xiang Wang, Jun-Zhou Luo, Ai-Bo Song, and Fang Dong
    Journal of Data Acquisition and Processing, 2013, 28 (6): 989-1011. 
    Abstract   PDF(4769KB) ( 2005 )  
    Online aggregation is an attractive sampling-based technology to response aggregation queries by an estimate to the final result, with the confidence interval becoming tighter over time. It has been built into a MapReduce-based cloud system for big data analytics, which allows users to monitor the query progress, and save money by killing the computation early once sufficient accuracy has been obtained. However, there are several limitations that restrict the performance of online aggregation generated from the gap between the current mechanism of MapReduce paradigm and the requirements of online aggregation, such as: 1) the low sampling efficiency due to the lack of consideration of skewed data distribution for online aggregation in MapReduce, and 2) the large redundant I/O cost of online aggregation caused by the independent job execution mechanism of MapReduce. In this paper, we present OLACloud, a MapReduce-based cloud system to well support online aggregation for different data distributions and large-scale concurrent query processing. We propose a content-aware repartition method with a fair-allocation block placement strategy to increase the sampling efficiency and guarantee the storage and computation load balancing simultaneously. We also develop a shared sampling method to share the sampling opportunities among multiple queries to reduce redundant I/O cost. We also implement OLACloud in Hadoop, and conduct an extensive experimental study on the TPC-H benchmark for skewed data distribution. Our results demonstrate the efficiency and effectiveness of OLACloud.
    Application-Aware Client-Side Data Reduction and Encryption of Personal Data in Cloud Backup Services
    Yin-Jin Fu, Nong Xiao, Xiang-Ke Liao, and Fang Liu
    Journal of Data Acquisition and Processing, 2013, 28 (6): 1012-1024. 
    Abstract   PDF(1355KB) ( 3200 )  
    Cloud backup has been an important issue ever since large quantities of valuable data have been stored on the personal computing devices. Data reduction techniques, such as deduplication, delta encoding, and Lempel-Ziv (LZ) compression, performed at the client side before data transfer can help ease cloud backup by saving network bandwidth and reducing cloud storage space. However, client-side data reduction in cloud backup services faces efficiency and privacy challenges. In this paper, we present Pangolin, a secure and efficient cloud backup service for personal data storage by exploiting application awareness. It can speedup backup operations by application-aware client-side data reduction technique, and mitigate data security risks by integrating selective encryption into data reduction for sensitive applications. Our experimental evaluation, based on a prototype implementation, shows that our scheme can improve data reduction efficiency over the state-of-the-art methods by shortening the backup window size to 33%~75%, and its security mechanism for sensitive applications has negligible impact on backup window size.
    Architecture and VLSI Design
    A Temporal Locality-Aware Page-Mapped Flash Translation Layer
    Youngjae Kim, Aayush Gupta, and Bhuvan Urgaonkar
    Journal of Data Acquisition and Processing, 2013, 28 (6): 1025-1044. 
    Abstract   PDF(3837KB) ( 1810 )  
    The poor performance of random writes has been a cause of major concern which needs to be addressed to better utilize the potential of flash in enterprise-scale environments. We examine one of the important causes of this poor performance: the design of the flash translation layer (FTL) which performs the virtual-to-physical address translations and hides the erase-before-write characteristics of flash. We propose a complete paradigm shift in the design of the core FTL engine from the existing techniques with our Demand-Based Flash Translation Layer (DFTL) which selectively caches pagelevel address mappings. Our experimental evaluation using FlashSim with realistic enterprise-scale workloads endorses the utility of DFTL in enterprise-scale storage systems by demonstrating: 1) improved performance, 2) reduced garbage collection overhead and 3) better overload behavior compared with hybrid FTL schemes which are the most popular implementation methods. For example, a predominantly random-write dominant I/O trace from an OLTP application running at a large financial institution shows a 78% improvement in average response time (due to a 3-fold reduction in operations of the garbage collector), compared with the hybrid FTL scheme. Even for the well-known read-dominant TPC-H benchmark, for which DFTL introduces additional overheads, we improve system response time by 56%. Moreover, interestingly, when write-back cache on DFTL-based SSD is enabled, DFTL even outperforms the page-based FTL scheme, improving their response time by 72% in Financial trace.
    RevivePath:Resilient Network-on-Chip Design Through Data Path Salvaging of Router
    Yin-He Han, Cheng Liu, Hang Lu, Wen-Bo Li, Lei Zhang, and Xiao-Wei Li
    Journal of Data Acquisition and Processing, 2013, 28 (6): 1045-1053. 
    Abstract   PDF(1835KB) ( 1446 )  
    Network-on-Chip (NoC) with excellent scalability and high bandwidth has been considered to be the most promising communication architecture for complex integration systems. However, NoC reliability is getting continuously challenging for the shrinking semiconductor feature size and increasing integration density. Moreover, a single node failure in NoC might destroy the network connectivity and corrupt the entire system. Introducing redundancies is an efficient method to construct a resilient communication path. However, prior work based on redundancies, either results in limited reliability with coarse grain protection or involves even larger hardware overhead with fine grain. In this paper, we notice that data path such as links, buffers and crossbars in NoC can be divided into multiple identical parallel slices, which can be utilized as inherent redundancy to enhance reliability. As long as there is one fault-free slice left available, the proposed salvaging scheme named as RevivePath, can be employed to make the overall data path still functional. Furthermore, RevivePath uses the direct redundancy to protect the control path such as switch arbiter, routing computation, to provide a full fault-tolerant scheme to the whole router. Experimental results show that it achieves quite high reliability with graceful performance degradation even under high fault rate.
    Low Power State Assignment Algorithm for FSMs Considering Peak Current Optimization
    Lun-Yao Wang, Zhu-Fei Chu, and Yin-Shui Xia
    Journal of Data Acquisition and Processing, 2013, 28 (6): 1054-1062. 
    Abstract   PDF(1408KB) ( 1698 )  
    Finite state machine (FSM) plays a vital role in the sequential logic design. In an FSM, the high peak current which is drawn by state transitions can result in large voltage drop and electromigration which significantly affect circuit reliability. Several published papers show that the peak current can be reduced by post-optimization schemes or Boolean satisfiability (SAT)-based formulations. However, those methods of reducing the peak current either increase the overall power dissipation or are not efficient. This paper has proposed a low power state assignment algorithm with upper bound peak current constraints. First the peak current constraints are weighted into the objective function by Lagrangian relaxation technique with Lagrangian multipliers to penalize the violation. Second, Lagrangian sub-problems are solved by a genetic algorithm with Lagrangian multipliers updated by the subgradient optimization method. Finally, a heuristic algorithm determines the upper bound of the peak current, and achieves optimization between peak current and switching power. Experimental results of International Workshop on Logic and Synthesis (IWLS) 1993 benchmark suites show that the proposed method can achieve up to 45.27% reduction of peak current, 6.31% reduction of switching power, and significant reduction of run time compared with previously published results.
    Theoretical Computer Science
    A Shape Graph Logic and A Shape System
    Zhao-Peng Li, Yu Zhang, and Yi-Yun Chen
    Journal of Data Acquisition and Processing, 2013, 28 (6): 1063-1084. 
    Abstract   PDF(1684KB) ( 1311 )  
    Analysis and verification of pointer programs are still difficult problems so far. This paper uses a shape graph logic and a shape system to solve these problems in two stages. First, shape graphs at every program point are constructed using an analysis tool. Then, they are used to support the verification of other properties (e.g., orderedness). Our prototype supports automatic verification of programs manipulating complex data structures such as splay trees, treaps, AVL trees and AA trees, etc. The proposed shape graph logic, as an extension to Hoare logic, uses shape graphs directly as assertions. It can be used in the analysis and verification of programs manipulating mutable data structures. The benefit using shape graphs as assertions is that it is convenient for acquiring the relations between pointers in the verification stage. The proposed shape system requires programmers to provide lightweight shape declarations in recursive structure type declarations. It can help rule out programs that construct shapes deviating from what programmers expect (reflected in shape declarations) in the analysis stage. As a benefit, programmers need not provide specifications (e.g., pre-/post-conditions, loop invariants) about pointers. Moreover, we present a method doing verification in the second stage using traditional Hoare logic rules directly by eliminating aliasing with the aid of shape graphs. Thus, verification conditions could be discharged by general theorem provers.
    Clausal Presentation of Theories in Deduction Modulo
    Jian-Hua Gao
    Journal of Data Acquisition and Processing, 2013, 28 (6): 1085-1096. 
    Abstract   PDF(356KB) ( 1139 )  
    Resolution modulo is an extension of first-order resolution in which rewrite rules are used to rewrite clauses during the search. In the first version of this method, clauses are rewritten to arbitrary propositions. These propositions are needed to be dynamically transformed into clauses. This unpleasant feature can be eliminated when the rewrite system is clausal, i.e., when it rewrites clauses to clauses. We show in this paper how to transform any rewrite system into a clausal one, preserving the existence of cut free proofs of any sequent.
    The Infinite Evolution Mechanism of ε-Bisimilarity
    Yan-Fang Ma and Min Zhang
    Journal of Data Acquisition and Processing, 2013, 28 (6): 1097-1105. 
    Abstract   PDF(394KB) ( 1139 )  
    In this paper, we focus on the convergence mechanism of ε-bisimulation under probabilistic processes to discuss the dynamic correctness of the software. Firstly, ε-limit bisimulation is defined for reflecting the dynamic relation between software specification and implementation. Some special ε-limit bisimulations are showed. Secondly, ε-bisimulation limit is proposed, which states the specification is the limit of implementation under ε-bisimulation. The uniqueness of ε-bisimulation limit and consistence with ε-bisimulation are presented. Finally, the substitutivity laws of ε-bisimulation limit with various combinators are proved.
    Natural Language Processing
    Improving Syntactic Parsing of Chinese with Empty Element Recovery
    Guo-Dong Zhou, and Pei-Feng Li
    Journal of Data Acquisition and Processing, 2013, 28 (6): 1106-1116. 
    Abstract   PDF(374KB) ( 1640 )  
    This paper puts forward and explores the problem of empty element (EE) recovery in Chinese from the syntactic parsing perspective, which has been largely ignored in the literature. First, we demonstrate why EEs play a critical role in syntactic parsing of Chinese and how EEs can better benefit syntactic parsing of Chinese via re-categorization from the syntactic perspective. Then, we propose two ways to automatically recover EEs: a joint constituent parsing approach and a chunk-based dependency parsing approach. Evaluation on the Chinese TreeBank (CTB) 5.1 corpus shows that integrating EE recovery into the Charniak parser achieves a significant performance improvement of 1.29 in F1-measure. To the best of our knowledge, this is the first close examination of EEs in syntactic parsing of Chinese, which deserves more attention in the future with regard to its specific importance.
    Semantic Role Labeling of Chinese Nominal Predicates with Dependency-Driven Constituent Parse Tree Structure
    Hong-Ling Wang, and Guo-Dong Zhou
    Journal of Data Acquisition and Processing, 2013, 28 (6): 1117-1126. 
    Abstract   PDF(370KB) ( 1408 )  
    This paper explores a tree kernel based method for semantic role labeling (SRL) of Chinese nominal predicates via a convolution tree kernel. In particular, a new parse tree representation structure, called dependency-driven constituent parse tree (D-CPT), is proposed to combine the advantages of both constituent and dependence parse trees. This is achieved by directly representing various kinds of dependency relations in a CPT-style structure, which employs dependency relation types instead of phrase labels in CPT (Constituent Parse Tree). In this way, D-CPT not only keeps the dependency relationship information in the dependency parse tree (DPT) structure but also retains the basic hierarchical structure of CPT style. Moreover, several schemes are designed to extract various kinds of necessary information, such as the shortest path between the nominal predicate and the argument candidate, the support verb of the nominal predicate and the head argument modified by the argument candidate, from D-CPT. This largely reduces the noisy information inherent in D-CPT. Finally, a convolution tree kernel is employed to compute the similarity between two parse trees. Besides, we also implement a feature-based method based on D-CPT. Evaluation on Chinese NomBank corpus shows that our tree kernel based method on D-CPT performs significantly better than other tree kernel-based ones and achieves comparable performance with the state-of-the-art feature-based ones. This indicates the effectiveness of the novel D-CPT structure in representing various kinds of dependency relations in a CPT-style structure and our tree kernel based method in exploring the novel D-CPT structure. This also illustrates that the kernel-based methods are competitive and they are complementary with the featurebased methods on SRL.
SCImago Journal & Country Rank
 

ISSN 1004-9037

         

Home
Editorial Board
Author Guidelines
Subscription
Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China

E-mail: info@sjcjycl.cn
 
  Copyright ©2015 JCST, All Rights Reserved