Bimonthly    Since 1986
ISSN 1004-9037
Indexed in:
SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
  • Table of Contents
      15 November 2018, Volume 33 Issue 6   
    Computer Networks and Distributed Computing
    An Efficient Message Dissemination Scheme for Minimizing Delivery Delay in Delay Tolerant Networks
    En Wang, Wei-Shih Yang, Yong-Jian Yang, Jie Wu, Wen-Bin Liu
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1101-1124. 
    Delay tolerant networks (DTNs) are sparse and highly mobile wireless networks in which no stable connectivity can be assumed. Most DTN users have several points of interest (PoIs), and they disseminate messages to other users at the same PoI through WiFi. In DTNs, some time-sensitive messages (disaster warnings, search notices, etc.) need to be rapidly propagated among specific users or areas. Therefore, finding a path from the source to the destination with the shortest delay is the key problem. Taking the dissemination cost into consideration, we propose an efficient message dissemination strategy for minimizing delivery delay (MDMD) in DTNs, which first defines each user's activeness according to the user's transit habits among different PoIs. Then, based on the activeness, an optimal user in each PoI is selected to constitute the path with the shortest delay. Finally, a variant of MDMD that accounts for the inactive state (users traveling between PoIs) is proposed to enhance applicability. Simulation results show that, compared with other dissemination strategies, MDMD achieves the lowest average delay and comparable average hop counts, with the delivery ratio reaching 100% given sufficient simulation time.
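    The path-selection step can be illustrated with a few lines of code. The sketch below is not the paper's algorithm: it assumes that expected pairwise forwarding delays between PoIs (derived in the paper from user activeness) are already known, and simply runs Dijkstra's algorithm over that PoI graph to obtain a shortest-delay path.
```python
import heapq

def shortest_delay_path(expected_delay, source, destination):
    """Dijkstra over a PoI graph.

    expected_delay: dict mapping PoI -> {neighbor PoI: expected forwarding delay}
                    (hypothetical input; the paper derives delays from user activeness).
    Returns (total_delay, path) from source to destination.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, poi = heapq.heappop(heap)
        if poi in visited:
            continue
        visited.add(poi)
        if poi == destination:
            break
        for nxt, w in expected_delay.get(poi, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = poi
                heapq.heappush(heap, (nd, nxt))
    if destination not in dist:
        return float("inf"), []
    path, node = [destination], destination
    while node != source:
        node = prev[node]
        path.append(node)
    return dist[destination], path[::-1]
```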
    A Task Allocation Method for Stream Processing with Recovery Latency Constraint
    Hong-Liang Li, Jie Wu, Zhen Jiang, Xiang Li, Xiao-Hui Wei
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1125-1139. 
    Stream processing applications continuously process large amounts of online streaming data in real time or near real time and have strict latency constraints. However, the continuous processing makes them vulnerable to failures, and recoveries may slow down the entire processing pipeline and break the latency constraints. The upstream backup scheme is one of the most widely applied fault-tolerant schemes for stream processing systems. It introduces complex backup dependencies among tasks, which increases the difficulty of controlling recovery latencies. Moreover, when dependent tasks are located on the same processor, they fail together in processor-level failures, bringing extra recovery latency that amplifies the impact of failures. This paper studies the relationship between the task allocation and the recovery latency of a stream processing application. We present a correlated failure effect model that describes the recovery latency of a stream topology under processor-level failures for a given task allocation plan. We introduce the recovery-latency-aware task allocation problem (RTAP), which seeks task allocation plans for stream topologies that achieve guaranteed recovery latencies. We discuss the difference between RTAP and classic task allocation problems and present a heuristic algorithm with a computational complexity of O(n log² n) to solve the problem. Extensive experiments were conducted to verify the correctness and effectiveness of our approach, which improves resource usage by 15%-20% on average.
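    To make the notion of correlated failures concrete, the sketch below evaluates an allocation plan with a heavily simplified stand-in for the paper's correlated failure effect model: it assumes each task has a fixed replay time and counts that time twice when a task's upstream backup is co-located on the failed processor (so the backed-up state itself is lost). The data layout and the penalty rule are assumptions for illustration only.
```python
def worst_case_recovery_latency(allocation, upstream, replay_time):
    """Approximate the worst processor-level recovery latency under upstream backup.

    allocation: dict task -> processor id.
    upstream: dict task -> set of tasks holding its backup (upstream neighbors).
    replay_time: dict task -> time to rebuild the task's state from its backup.
    """
    worst = 0.0
    for p in set(allocation.values()):
        failed = {t for t, proc in allocation.items() if proc == p}
        latency = 0.0
        for t in failed:
            latency += replay_time[t]
            if any(u in failed for u in upstream.get(t, ())):
                latency += replay_time[t]  # backup lost too: pay the replay cost again
        worst = max(worst, latency)
    return worst
```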
    CRL: Efficient Concurrent Regeneration Codes with Local Reconstruction in Geo-Distributed Storage Systems
    Quan-Qing Xu, Wei-Ya Xi, Khai Leong Yong, Chao Jin
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1140-1151. 
    As a typical erasure coding choice, Reed-Solomon (RS) codes pay for their high reliability and storage efficiency with a high repair cost, which makes them unsuitable for geo-distributed storage systems. We present a novel family of concurrent regeneration codes with local reconstruction (CRL) in this paper. The CRL codes enjoy three benefits. Firstly, they minimize the network bandwidth required for node repair. Secondly, they reduce the number of accessed nodes by calculating parities from a subset of data chunks and using an implied parity chunk. Thirdly, they are faster than existing erasure codes for reconstruction in geo-distributed storage systems. In addition, we demonstrate how the CRL codes overcome the limitations of Reed-Solomon codes, and we show analytically that they achieve an excellent trade-off between chunk locality and minimum distance. Furthermore, we present theoretical analysis of the CRL codes, including latency analysis and reliability analysis. Through quantitative comparisons of data reconstruction times, we show that CRL(6, 2, 2) needs only 0.657x the time of Azure LRC(6, 2, 2), where there are six data chunks, two global parities, and two local parities, and that CRL(10, 4, 2) needs only 0.656x the time of HDFS-Xorbas(10, 4, 2), where there are 10 data chunks, four local parities, and two global parities. Our experimental results evaluate the performance of CRL in two kinds of environments: 1) its in-memory encoding and decoding throughputs are at least 57.25% and 66.85% higher than those of its competitors, and 2) its encoding and decoding throughputs in JBOD (Just a Bunch Of Disks) are at least 1.46x and 1.21x higher than those of its competitors. We also show that CRL's encoding and decoding throughputs are 28.79% and 30.19% higher than those of LRC in a geo-distributed environment.
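    The benefit of local reconstruction is that a single lost chunk is rebuilt from a small local group instead of contacting nodes across regions. The sketch below illustrates only that generic local-parity idea (as in LRC-style codes), not the exact CRL construction: data chunks are split into local groups, each protected by an XOR parity.
```python
def xor_chunks(chunks):
    """Bytewise XOR of equal-length byte strings."""
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)

def make_local_parities(data_chunks, group_size):
    """Split data chunks into local groups and compute one XOR parity per group."""
    groups = [data_chunks[i:i + group_size]
              for i in range(0, len(data_chunks), group_size)]
    return groups, [xor_chunks(g) for g in groups]

def repair_single_loss(group, parity, lost_index):
    """Rebuild one lost chunk from the surviving chunks of its local group only."""
    survivors = [c for i, c in enumerate(group) if i != lost_index]
    return xor_chunks(survivors + [parity])

# toy usage: 6 data chunks, 2 local groups of 3
data = [bytes([i]) * 4 for i in range(6)]
groups, parities = make_local_parities(data, 3)
assert repair_single_loss(groups[0], parities[0], 1) == data[1]
```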
    More Requests, Less Cost: Uncertain Inter-Datacenter Traffic Transmission with Multi-Tier Pricing
    Xiao-Dong Dong, Sheng Chen, Lai-Ping Zhao, Xiao-Bo Zhou, Heng Qi, Ke-Qiu Li
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1152-1163. 
    Under the multi-tier pricing scheme offered by most cloud service providers (CSPs), cloud users typically select a transmission service level high enough to ensure the quality of service (QoS), because of the severe penalty for missing the transmission deadline. This leads to the so-called over-provisioning problem, which increases the cloud user's transmission cost. Since cloud users may not be aware of their traffic demand before accessing the network, the over-provisioning problem becomes even more serious. In this paper, we investigate how to reduce the transmission cost from the perspective of cloud users, especially when they do not know their traffic demand before the transmission deadline. The key idea is to split a long-term transmission request into several short-term ones. By selecting the most suitable transmission service level for each short-term request, a cost-efficient inter-datacenter transmission service level selection framework is obtained. We further formulate the transmission service level selection problem as a linear programming problem and solve it online with Lyapunov optimization. We evaluate the proposed approach with real traffic data. The experimental results show that our method can reduce the transmission cost by up to 65.04%.
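    The per-request selection step can be pictured with a simple greedy choice. The sketch below is only a stand-in for the paper's Lyapunov-based online algorithm: given hypothetical tier bandwidths and prices and the volume that must leave within an interval, it picks the cheapest tier that still meets the demand.
```python
def cheapest_tier(tiers, volume_gb, interval_hours):
    """Pick the cheapest service level that can ship `volume_gb` within the interval.

    tiers: list of (name, bandwidth_gb_per_hour, price_per_hour) tuples
           (hypothetical units and pricing model).
    Returns (name, cost) or None if no tier is fast enough.
    """
    feasible = [(name, price * interval_hours)
                for name, bw, price in tiers
                if bw * interval_hours >= volume_gb]
    return min(feasible, key=lambda t: t[1]) if feasible else None

# toy usage: three tiers, 40 GB to move within a 2-hour slot
tiers = [("bronze", 10, 1.0), ("silver", 25, 2.2), ("gold", 60, 4.5)]
print(cheapest_tier(tiers, 40, 2))   # -> ('silver', 4.4)
```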
    Computer Graphics and Multimedia
    A Geometry-Based Point Cloud Reduction Method for Mobile Augmented Reality System
    Hao-Ren Wang, Juan Lei, Ao Li, Yi-Hong Wu
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1164-1177. 
    In this paper, a geometry-based point cloud reduction method is proposed, and a real-time mobile augmented reality system is explored for applications in urban environments. We formulate a new objective function that combines the point reconstruction errors with constraints on the spatial point distribution. Based on this formulation, a mixed integer programming scheme is used to solve the point reduction problem. The mobile augmented reality system explored in this paper consists of an offline stage and an online stage. At the offline stage, we build the localization database using structure from motion and compress the point cloud with the proposed point cloud reduction method. At the online stage, we compute the camera pose in real time by combining an image-based localization algorithm with a continuous pose tracking algorithm. Experimental results on benchmark and real data show that, compared with existing methods, the geometry-based point cloud reduction method selects a point cloud subset that helps the image-based localization method achieve a higher success rate. Experiments conducted on a mobile platform also show that the reduced point cloud not only shortens the time needed for initialization and re-initialization but also keeps the memory footprint small, resulting in a scalable and real-time mobile augmented reality system.
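    The flavor of the trade-off (keep well-observed points, but keep them spatially spread) can be shown with a greedy stand-in; the paper instead solves a mixed integer program. The point schema and the per-voxel quota below are assumptions for illustration.
```python
from collections import defaultdict

def reduce_point_cloud(points, budget, voxel_size, per_voxel_quota):
    """Greedy point selection: prefer well-observed points, enforce spatial spread.

    points: list of dicts {"xyz": (x, y, z), "num_observations": int} (hypothetical schema).
    budget: total number of points to keep.
    """
    def voxel(p):
        return tuple(int(c // voxel_size) for c in p["xyz"])

    kept, per_voxel = [], defaultdict(int)
    for p in sorted(points, key=lambda q: q["num_observations"], reverse=True):
        v = voxel(p)
        if per_voxel[v] < per_voxel_quota:
            kept.append(p)
            per_voxel[v] += 1
            if len(kept) == budget:
                break
    return kept
```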
    Isometric 3D Shape Partial Matching Using GD-DNA
    Guo-Guang Du, Cong-Li Yin, Ming-Quan Zhou, Zhong-Ke Wu, Ya-Chun Fan, Fu-Qing Duan, Peng-Bo Zhou
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1178-1191. 
    Isometric 3D shape partial matching has attracted a great amount of interest, with a plethora of applications ranging from shape recognition to texture mapping. In this paper, we propose a novel isometric 3D shape partial matching algorithm using the geodesic disk Laplace spectrum (GD-DNA). It transforms the partial matching problem into a geodesic disk matching problem. Firstly, the largest enclosed geodesic disk extracted from the partial shape is matched with geodesic disks from the full shape using the Laplace spectrum of the geodesic disk. Secondly, the Generalized Multi-Dimensional Scaling (GMDS) algorithm and Euclidean embedding are applied to establish the final point correspondences between the partial and the full shape using the matched geodesic disk pair. The proposed GD-DNA is discriminative for matching geodesic disks, and it effectively solves the anchor point selection problem in challenging partial shape matching tasks. Experimental results on the Shape Retrieval Contest 2016 (SHREC'16) benchmark validate the proposed method, and comparisons with isometric partial matching algorithms in the literature show that our method achieves higher precision.
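    Matching disks by their Laplace spectra amounts to comparing a few leading eigenvalues. The sketch below uses the combinatorial graph Laplacian of a disk's mesh connectivity as a stand-in for the spectrum used in the paper, and compares truncated spectra with an L2 distance.
```python
import numpy as np

def laplace_spectrum(adjacency, k=20):
    """First k eigenvalues of the combinatorial Laplacian L = D - A of a mesh patch.

    adjacency: symmetric 0/1 matrix of the patch's vertex connectivity.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    eigvals = np.linalg.eigvalsh(L)   # sorted ascending
    return eigvals[:k]

def spectrum_distance(adj_a, adj_b, k=20):
    """L2 distance between the truncated spectra of two geodesic-disk patches."""
    sa, sb = laplace_spectrum(adj_a, k), laplace_spectrum(adj_b, k)
    n = min(len(sa), len(sb))
    return float(np.linalg.norm(sa[:n] - sb[:n]))
```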
    Back-to-Front Ordering of Triangles in Digital Terrain Models over Regular Grids
    Jesús Alonso, Robert Joan-Arinyo
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1192-1203. 
    Visiting the triangles that make up a digital terrain model is a core operation in fields such as animation and video games, as well as in generating profiles, cross-sections, and contours in civil engineering. Performing the visit efficiently is an issue, especially when the output of the traversal depends on additional parameters or information that changes over time, for example, a moving point of view. In this work we report a set of rules that, given a digital terrain model defined over a regular grid and an arbitrary point of view outside the terrain, define a total back-to-front order on the set of digital terrain model triangles with respect to that point. The set of rules is minimal, complete, and correct. To assess how the rules perform, we have implemented a CPU-based algorithm for realistically rendering height fields defined over regular grids. The algorithm does not use the z-buffer or shaders featured by our graphics card. We show how our algorithm is implemented and present visual results obtained from synthetic and real data. We further compare the algorithm's performance with that of two alternatives: a naive algorithm that visits triangles according to grid indices and does not solve the hidden line problem, and the z-buffer provided by the graphics card of our computer. Our algorithm allows real-time interaction when the point of view moves arbitrarily in 3D space, and its performance is as good as that of the graphics card's z-buffer.
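    For intuition only, a back-to-front visit of a regular grid can be approximated with a per-frame distance sort from the viewpoint's ground projection; the paper's contribution is a rule set that defines the total order without such a sort, so the snippet below is merely a naive reference, not the reported rules.
```python
def back_to_front_cells(rows, cols, viewpoint_xy):
    """Grid cells sorted far-to-near from the viewpoint's ground projection.

    Each cell (i, j) covers the unit square [j, j+1] x [i, i+1]; cell centers are
    compared by squared distance to viewpoint_xy = (x, y).
    """
    vx, vy = viewpoint_xy
    def sq_dist(cell):
        i, j = cell
        cx, cy = j + 0.5, i + 0.5
        return (cx - vx) ** 2 + (cy - vy) ** 2
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    return sorted(cells, key=sq_dist, reverse=True)

def cell_triangles(cell):
    """Split a grid cell into its two triangles (vertices as (row, col) indices)."""
    i, j = cell
    return [((i, j), (i + 1, j), (i, j + 1)),
            ((i + 1, j), (i + 1, j + 1), (i, j + 1))]
```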
    Data Management and Data Mining
    Modeling Topic-Based Human Expertise for Crowd Entity Resolution
    Sai-Sai Gong, Wei Hu, Wei-Yi Ge, Yu-Zhong Qu
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1204-1218. 
    Entity resolution (ER) aims to identify whether two entities in an ER task refer to the same real-world thing. Crowd ER uses humans, in addition to machine algorithms, to obtain the truths of ER tasks. However, inaccurate or erroneous results are likely to be generated when humans give unreliable judgments. Previous studies have found that correctly estimating human accuracy or expertise in crowd ER is crucial to truth inference. However, many of them assume that humans have consistent expertise over all tasks and ignore the fact that humans may have varied expertise on different topics (e.g., music versus sport). In this paper, we deal with crowd ER in the Semantic Web area. We identify multiple topics of ER tasks and model human expertise on different topics. Furthermore, we leverage similar task clustering to enhance the topic modeling and expertise estimation. We propose a probabilistic graphical model that computes ER task similarity, estimates human expertise, and infers the task truths in a unified framework. Our evaluation results on real-world and synthetic datasets show that, compared with several state-of-the-art approaches, our proposed model achieves higher accuracy on task truth inference and is more consistent with humans' real expertise.
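    The core idea of per-topic expertise can be sketched with a much simpler alternating scheme than the paper's unified graphical model: estimate task truths by a vote weighted with per-topic worker accuracies, then re-estimate those accuracies from agreement with the current truths. All names and the initialization values below are assumptions.
```python
from collections import defaultdict

def estimate_expertise(answers, task_topic, n_iters=20):
    """Alternately estimate task truths and per-topic worker accuracies.

    answers: dict (worker, task) -> label in {0, 1} (1 = "same entity").
    task_topic: dict task -> topic id.
    Returns (truths, accuracy) with accuracy[(worker, topic)] in (0, 1).
    """
    tasks = {t for _, t in answers}
    workers = {w for w, _ in answers}
    truths = {t: 0.5 for t in tasks}
    acc = {(w, task_topic[t]): 0.8 for w in workers for t in tasks}

    for _ in range(n_iters):
        # E-step: weighted vote per task using per-topic accuracies
        for t in tasks:
            votes = [(acc[(w, task_topic[t])], lab)
                     for (w, tt), lab in answers.items() if tt == t]
            weight = sum(a for a, _ in votes) or 1.0
            truths[t] = sum(a * lab for a, lab in votes) / weight
        # M-step: accuracy = agreement with rounded truths, per (worker, topic)
        agree, total = defaultdict(float), defaultdict(int)
        for (w, t), lab in answers.items():
            key = (w, task_topic[t])
            agree[key] += 1.0 if lab == round(truths[t]) else 0.0
            total[key] += 1
        for key in acc:
            if total[key]:
                acc[key] = (agree[key] + 1.0) / (total[key] + 2.0)  # Laplace smoothing
    return truths, acc
```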
    Time and Location Aware Points of Interest Recommendation in Location-Based Social Networks
    Tie-Yun Qian, Bei Liu, Liang Hong, Zhen-Ni You
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1219-1230. 
    The widespread use of location-based social networks brings about a huge volume of user check-in data, which facilitates the recommendation of points of interest (POIs). Recent advances in distributed representation shed light on learning low-dimensional dense vectors to alleviate the data sparsity problem. Current studies on representation learning for POI recommendation embed both users and POIs in a common latent space, and a user's preference is inferred from the distance/similarity between the user and a POI. Such an approach is not in accordance with the semantics of users and POIs, as they are inherently different objects. In this paper, we present a novel translation-based, time and location aware (TransTL) representation, which models the spatial and temporal information as a relationship connecting users and POIs. Our model generalizes recent advances in knowledge graph embedding. The basic idea is that the embedding of a (time, location) pair corresponds to a translation from user embeddings to POI embeddings. Since the POI embedding should be close to the user embedding plus the relationship vector, recommendation can be performed by selecting the top-k POIs most similar to the translated POI, which are all objects of the same type. We conduct extensive experiments on two real-world datasets. The results demonstrate that our TransTL model achieves state-of-the-art performance and is much more robust to data sparsity than the baselines.
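    The translation-based scoring is compact enough to show directly: a POI is ranked by how close its embedding is to the user embedding plus the time-and-location relation vector. The toy numpy embeddings and names below are assumptions; training of the embeddings is omitted.
```python
import numpy as np

def recommend_pois(user_vec, relation_vec, poi_vecs, k=5):
    """Rank POIs by || u + r - p ||; smaller is better (translation-based scoring).

    user_vec: (d,) user embedding.
    relation_vec: (d,) embedding of the (time slot, location region) relation.
    poi_vecs: dict poi_id -> (d,) embedding.
    """
    target = user_vec + relation_vec
    scored = sorted(poi_vecs.items(),
                    key=lambda item: np.linalg.norm(target - item[1]))
    return [poi for poi, _ in scored[:k]]
```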
    Privacy-Preserving Algorithms for Multiple Sensitive Attributes Satisfying t-Closeness
    Rong Wang, Yan Zhu, Tung-Shou Chen, Chin-Chen Chang
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1231-1242. 
    Although k-anonymity is a good way of publishing microdata for research purposes, it cannot resist several common attacks, such as attribute disclosure and the similarity attack. To resist these attacks, many refinements of k-anonymity have been proposed, with t-closeness being one of the strictest privacy models. While most existing t-closeness models address the case in which the original data have only one sensitive attribute, data with multiple sensitive attributes are more common in practice. In this paper, we close this gap with two proposed algorithms for multiple sensitive attributes that make the published data satisfy t-closeness. Based on the observation that the values of the sensitive attributes in any equivalence class must be spread as widely as possible over the entire data for the published data to satisfy t-closeness, both algorithms partition records into groups in terms of sensitive attributes, using different methods: one uses a clustering method, while the other leverages principal component analysis. Then, according to the similarity of quasi-identifier attributes, records are selected from different groups to construct an equivalence class, which reduces the loss of information as much as possible during anonymization. Our proposed algorithms are evaluated using a real dataset. The results show that the first proposed algorithm is slower on average than the second one but preserves more of the original information. In addition, compared with related approaches, both proposed algorithms achieve stronger privacy protection and incur less information loss.
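    The t-closeness condition itself is easy to check once the anonymized groups are built: the distribution of the sensitive attribute inside each equivalence class must stay within distance t of its distribution over the whole table. The sketch below uses the variational distance for a categorical attribute (one of the distance choices allowed by the original t-closeness formulation); it does not implement the paper's partitioning algorithms.
```python
from collections import Counter

def variational_distance(values_a, values_b):
    """0.5 * sum |P(v) - Q(v)| over the union of observed categories."""
    pa, pb = Counter(values_a), Counter(values_b)
    na, nb = len(values_a), len(values_b)
    cats = set(pa) | set(pb)
    return 0.5 * sum(abs(pa[c] / na - pb[c] / nb) for c in cats)

def satisfies_t_closeness(equivalence_classes, whole_column, t):
    """Check t-closeness of one categorical sensitive attribute.

    equivalence_classes: list of lists of sensitive values, one list per class.
    whole_column: all sensitive values in the table.
    """
    return all(variational_distance(ec, whole_column) <= t
               for ec in equivalence_classes)
```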
    Regular Paper
    Updatable Identity-Based Hash Proof System Based on Lattices and Its Application to Leakage-Resilient Public-Key Encryption Schemes
    Qi-Qi Lai, Bo Yang, Yong Yu, Zhe Xia, Yan-Wei Zhou, Yuan Chen
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1243-1260. 
    The identity-based hash proof system is a basic and important primitive that is widely used to construct cryptographic schemes and protocols secure against key-leakage attacks. In this paper, we introduce the concept of an updatable identity-based hash proof system, in which the related master secret key and the identity secret key can be updated securely. We then instantiate this primitive based on lattices in the standard model. Moreover, we present an application of this new primitive by giving a generic construction of leakage-resilient public-key encryption schemes with anonymity. This construction can be considered as an integration of the bounded-retrieval model and the continual leakage model. Compared with existing leakage-resilient schemes, our construction is not only more efficient but also tolerates much more key leakage.
    Generalized Tweakable Even-Mansour Cipher and Its Applications
    Ping Zhang, Hong-Gang Hu
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1261-1277. 
    This paper describes a generalized tweakable blockcipher HPH (Hash-Permutation-Hash), which is based on a public random permutation P and a family of almost-XOR-universal hash functions H = {H_K : K ∈ K} as a tweak and key schedule, and is defined as y = HPH_K((t_1, t_2), x) = P(x ⊕ H_K(t_1)) ⊕ H_K(t_2), where K is a key randomly chosen from a key space K, (t_1, t_2) is a tweak chosen from a valid tweak space T, x is a plaintext, and y is a ciphertext. We prove that HPH is a secure strong tweakable pseudorandom permutation (STPRP) using the H-coefficients technique. We then focus on the security of HPH against multi-key and related-key attacks and prove that HPH achieves both multi-key STPRP security and related-key STPRP security. HPH can be extended to a wide range of applications: it can be directly applied to authentication and authenticated encryption modes. We apply HPH to PMAC1 and OPP, obtain an improved authentication mode HPMAC and a new authenticated encryption mode OPH, and prove that the two modes achieve single-key security, multi-key security, and related-key security.
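    The HPH data flow is short enough to show directly. The toy sketch below mirrors y = P(x ⊕ H_K(t_1)) ⊕ H_K(t_2) on 16-bit blocks; the fixed byte permutation P and the keyed hash H are placeholder stand-ins (the hash in particular is not almost-XOR-universal), so this illustrates only the structure, not a secure instantiation.
```python
import random

BLOCK_BITS = 16
MASK = (1 << BLOCK_BITS) - 1

# toy public permutation P over 16-bit blocks (fixed seed, known to everyone)
_rng = random.Random(0)
_P = list(range(1 << BLOCK_BITS))
_rng.shuffle(_P)
_P_INV = [0] * len(_P)
for i, v in enumerate(_P):
    _P_INV[v] = i

def H(key, tweak):
    """Toy keyed hash (NOT almost-XOR-universal; placeholder only)."""
    return (key * (tweak + 1) + (key >> 3)) & MASK

def hph_encrypt(key, t1, t2, x):
    """y = P(x XOR H_K(t1)) XOR H_K(t2)."""
    return _P[(x ^ H(key, t1)) & MASK] ^ H(key, t2)

def hph_decrypt(key, t1, t2, y):
    """x = P^{-1}(y XOR H_K(t2)) XOR H_K(t1)."""
    return _P_INV[(y ^ H(key, t2)) & MASK] ^ H(key, t1)

# round-trip check
k, t1, t2, x = 0x1A2B, 3, 7, 0xBEEF
assert hph_decrypt(k, t1, t2, hph_encrypt(k, t1, t2, x)) == x
```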
    Mining Patterns from Change Logs to Support Reuse-Driven Evolution of Software Architectures
    Aakash Ahmad, Claus Pahl, Ahmed B. Altamimi, Abdulrahman Alreshidi
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1278-1306. 
    Modern software systems are subject to continuous evolution under frequently varying requirements and changes in their operational environments. Lehman's law of continuing change demands that long-living software evolve continuously, accommodating changes in the existing software to prolong its productive life and economic value. Reusable knowledge and practices have proven successful for continuous development and effective, efficient evolution of software. However, challenges such as the empirical acquisition and systematic application of reusable knowledge and practices must be addressed to enable or enhance software evolution. We investigate architecture change logs, i.e., mining histories of architecture-centric software evolution, to discover change patterns that 1) support the reusability of architectural changes and 2) enhance the efficiency of the architecture evolution process. We model architecture change logs as a graph and apply graph-based formalism (i.e., graph mining techniques) to discover software architecture change patterns. We have developed a prototype that enables tool-driven automation and user decision support during software evolution, and we have used the ISO/IEC 9126 model to qualitatively evaluate the proposed solution. The evaluation results suggest that the proposed solution 1) enables the reuse of frequent architectural changes and 2) enhances the efficiency of the architecture-centric software evolution process. The proposed solution promotes research efforts to exploit the history of architectural changes to empirically discover knowledge that can guide architecture-centric software evolution.
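    As a rough illustration of mining recurring change patterns from logs, the sketch below counts frequent contiguous operation sequences (n-grams) across change sessions and keeps those above a support threshold as candidate reusable patterns. This is a simplification, not the graph-based formalism used in the paper; the operation names are hypothetical.
```python
from collections import Counter

def frequent_change_patterns(sessions, n=3, min_support=2):
    """Count contiguous length-n operation sequences across change-log sessions.

    sessions: list of operation lists, e.g. [["AddComponent", "AddConnector", ...], ...].
    Returns {pattern_tuple: support} for patterns seen in at least min_support sessions.
    """
    support = Counter()
    for ops in sessions:
        seen = {tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)}
        support.update(seen)   # count each pattern at most once per session
    return {p, : c for p, c in support.items() if c >= min_support} if False else \
           {p: c for p, c in support.items() if c >= min_support}
```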
    A New Method for Sentiment Analysis Using Contextual Auto-Encoders
    Hanen Ameur, Salma Jamoussi, Abdelmajid Ben Hamadou
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1307-1319. 
    Sentiment analysis, a hot research topic, presents new challenges for understanding users' opinions and judgments expressed online. It aims to classify subjective texts by assigning them a polarity label. In this paper, we introduce a novel machine learning framework using auto-encoder networks to predict the sentiment polarity label at both the word level and the sentence level. Inspired by the dimensionality reduction and feature extraction capabilities of auto-encoders, we propose a new model for distributed word vector representation, "PMI-SA", which takes pointwise mutual information (PMI) word vectors as input. The resulting continuous word vectors are combined to represent a sentence. An unsupervised sentence embedding method, called Contextual Recursive Auto-Encoders "CoRAE", is also developed for learning sentence representations. CoRAE follows the basic idea of recursive auto-encoders to deeply compose the vectors of the words constituting a sentence, but without relying on any syntactic parse tree. The CoRAE model recursively combines each word with its context words (the previous and next neighbors) while respecting word order. A support vector machine classifier with a fine-tuning technique is also used to show that our deep compositional representation model CoRAE significantly improves the accuracy of the sentiment analysis task. Experimental results demonstrate that CoRAE remarkably outperforms several competitive baseline methods on two databases, namely, the Sanders Twitter corpus and a Facebook comments corpus. The CoRAE model achieves an efficiency of 83.28% with the Facebook dataset and 97.57% with the Sanders dataset.
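    The PMI-SA pipeline starts from pointwise mutual information scores between words and their context words. A minimal sketch of building sparse PMI vectors from co-occurrence counts follows; the auto-encoder compression and the recursive CoRAE composition are omitted, and the windowed counting scheme is an assumption.
```python
import math
from collections import Counter, defaultdict

def pmi_vectors(sentences, window=2):
    """Build PMI(w, c) vectors from word/context co-occurrence counts.

    sentences: list of token lists.  Returns dict word -> {context: PMI score}.
    Only observed co-occurrence pairs are stored (unseen pairs left implicit at 0).
    """
    word_count, pair_count = Counter(), Counter()
    for tokens in sentences:
        word_count.update(tokens)
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pair_count[(w, tokens[j])] += 1
    total_pairs = sum(pair_count.values())
    total_words = sum(word_count.values())
    vectors = defaultdict(dict)
    for (w, c), n in pair_count.items():
        p_wc = n / total_pairs
        p_w, p_c = word_count[w] / total_words, word_count[c] / total_words
        vectors[w][c] = math.log(p_wc / (p_w * p_c))
    return vectors
```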
    Search-Based Cost-Effective Software Remodularization
    Rim Mahouachi
    Journal of Data Acquisition and Processing, 2018, 33 (6): 1320-1336. 
    Software modularization is a technique used to divide a software system into independent modules (packages) that are expected to be cohesive and loosely coupled. However, as software systems evolve over time to meet new requirements, their modularizations become complex and gradually lose quality. It is therefore challenging to automatically optimize the distribution of classes among packages, a task known as remodularization. To alleviate this issue, we introduce a new approach that optimizes software modularization by moving classes to more suitable packages. In addition to improving design quality and preserving semantic coherence, our approach takes the refactoring effort into consideration as an objective in itself while optimizing software modularization. We adapt the Elitist Non-dominated Sorting Genetic Algorithm (NSGA-II) of Deb et al. to find the best sequence of refactorings that 1) maximizes structural quality, 2) maximizes the semantic cohesiveness of packages (evaluated by a semantic measure based on WordNet), and 3) minimizes the refactoring effort. We report the results of an evaluation of our approach on open-source projects and show that it produces coherent and useful sequences of recommended refactorings, both in terms of quality metrics and from the developers' point of view.
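    The search evaluates each candidate sequence of move-class refactorings against three objectives. The sketch below shows only a hypothetical fitness evaluation (coupling-based structural quality, a placeholder semantic score, and effort as the number of moves); the NSGA-II search loop and the WordNet-based measure from the paper are not reproduced here.
```python
def apply_moves(packages, moves):
    """Return a new class->package mapping after applying (class, target_package) moves."""
    new_map = dict(packages)
    for cls, target in moves:
        new_map[cls] = target
    return new_map

def fitness(packages, dependencies, semantic_sim, moves):
    """Three objectives for a refactoring sequence (for a multi-objective GA to minimize).

    packages: dict class -> package (current modularization).
    dependencies: list of (class_a, class_b) usage dependencies.
    semantic_sim: callable (class_a, class_b) -> similarity in [0, 1]
                  (placeholder for the WordNet-based measure).
    moves: list of (class, target_package) refactorings.
    Returns (coupling, negated_semantic_cohesion, effort).
    """
    layout = apply_moves(packages, moves)
    coupling = sum(1 for a, b in dependencies if layout[a] != layout[b])
    same_pkg = [(a, b) for a, b in dependencies if layout[a] == layout[b]]
    cohesion = sum(semantic_sim(a, b) for a, b in same_pkg) / max(1, len(same_pkg))
    effort = len(moves)
    return coupling, -cohesion, effort
```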

    Identification and Processing of PII Data, Applying Deep Learning Models with Improved Accuracy and Efficiency
    Mainak Mitra, Soumit Roy
    Journal of Data Acquisition and Processing, 2018, 34 (6): 

Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190, P.R. China

E-mail: info@sjcjycl.cn
 