Bimonthly    Since 1986
ISSN 1004-9037
Indexed in:
SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
  • Table of Contents
      05 September 2017, Volume 32 Issue 5   
    Special Section on Crowdsourced Data Management
    Preface
    Xiaofang Zhou, Guo-Liang Li
    Journal of Data Acquisition and Processing, 2017, 32 (5): 843-844. 
    Abstract   PDF(84KB) ( 580 )  
    COSSET+: Crowdsourced Missing Value Imputation Optimized by Knowledge Base
    Hong-Zhi Wang, Zhi-Xin Qi, Ruo-Xi Shi, Jian-Zhong Li, Hong Gao
    Journal of Data Acquisition and Processing, 2017, 32 (5): 845-857. 
    Abstract   PDF(1383KB) ( 870 )  
    Missing value imputation with crowdsourcing is a novel data-cleaning method for capturing missing values that can hardly be filled by automatic approaches. However, the time cost and overhead of crowdsourcing are high, so we need to reduce cost while guaranteeing the accuracy of crowdsourced imputation. To achieve this optimization goal, we present COSSET+, a crowdsourced framework optimized by a knowledge base. We combine the advantages of a knowledge-based filter and a crowdsourcing platform to capture missing values. Since the number of crowdsourced values affects the cost of COSSET+, we aim to select only part of the missing values for crowdsourcing. We prove that this crowd value selection problem is NP-hard and develop an approximation algorithm for it. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed approaches.
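    The abstract does not detail the approximation algorithm, but budgeted selection problems of this kind are often attacked greedily by benefit-per-cost ratio. The sketch below is a hypothetical illustration of that general idea, not the paper's actual algorithm; the function name, the benefit/cost model, and the budget parameter are all assumptions.

```python
def select_crowd_values(candidates, budget):
    """Greedy benefit-per-cost selection of missing values to send
    to the crowd; unselected values are left to the knowledge-base
    filter. candidates: (value_id, expected_benefit, cost) tuples."""
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen, spent = [], 0.0
    for value_id, benefit, cost in ranked:
        if spent + cost <= budget:
            chosen.append(value_id)
            spent += cost
    return chosen
```

    With candidates [("a", 10, 5), ("b", 9, 1), ("c", 1, 5)] and budget 6, the greedy order is b (ratio 9), a (ratio 2), c (ratio 0.2), so "b" and "a" are crowdsourced and "c" is left to the knowledge base.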
    Crowd-Guided Entity Matching with Consolidated Textual Data
    Zhi-Xu Li, Qiang Yang, An Liu, Guan-Feng Liu, Jia Zhu, Jia-Jie Xu, Kai Zheng, Min Zhang
    Journal of Data Acquisition and Processing, 2017, 32 (5): 858-876. 
    Abstract   PDF(1225KB) ( 827 )  
    Entity Matching (EM) identifies records referring to the same entity within or across databases. Existing methods that use only structured attribute values (such as numeric, date, or short string values) may fail when the structured information is insufficient to reflect the matching relationships between records. Nowadays, more and more databases have unstructured textual attributes containing extra Consolidated Textual information (CText for short) about a record, but little work has been done on using CText for EM. Conventional string similarity metrics such as edit distance or bag-of-words are unsuitable for measuring the similarity between CTexts, since each CText contains hundreds or thousands of words, while existing topic models do not work well because there are no obvious gaps between topics in a CText. In this paper, we propose a novel co-occurrence-based topic model to identify sub-topics within each CText, and then measure the similarity between CTexts along multiple sub-topic dimensions. To avoid ignoring hidden but important sub-topics, we let the crowd help decide the weight of different sub-topics in EM. Our empirical study on two real-world datasets, using the Amazon Mechanical Turk crowdsourcing platform, shows that our method outperforms state-of-the-art EM methods and text understanding models.
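    As an illustration of comparing records along multiple sub-topic dimensions with crowd-supplied weights, the following sketch computes a weighted cosine similarity between two CTexts' sub-topic score vectors. The vector representation and this particular weighting scheme are illustrative assumptions; the paper's co-occurrence-based topic model is not reproduced here.

```python
import math

def subtopic_similarity(vec_a, vec_b, weights):
    """Crowd-weighted cosine similarity between two CTexts, each
    represented as a vector of per-sub-topic scores. weights holds
    the crowd-decided importance of each sub-topic dimension."""
    wa = [w * a for w, a in zip(weights, vec_a)]
    wb = [w * b for w, b in zip(weights, vec_b)]
    dot = sum(x * y for x, y in zip(wa, wb))
    norm_a = math.sqrt(sum(x * x for x in wa))
    norm_b = math.sqrt(sum(y * y for y in wb))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

    Two records whose CTexts emphasize the same sub-topics score near 1.0; records with disjoint sub-topics score 0.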
    Improving the Quality of Crowdsourced Image Labeling via Label Similarity
    Yi-Li Fang, Hai-Long Sun, Peng-Peng Chen, Ting Deng
    Journal of Data Acquisition and Processing, 2017, 32 (5): 877-889. 
    Abstract   PDF(1205KB) ( 976 )  
    Crowdsourcing is an effective method for obtaining large databases of manually labeled images, which is especially important for image understanding with supervised machine learning algorithms. However, for several kinds of image labeling tasks, e.g., dog breed recognition, it is hard to achieve high-quality results. We therefore optimize the crowdsourcing workflow in two respects: task allocation and result inference. For task allocation, we design a two-round crowdsourcing framework that contains a decision mechanism based on information entropy to determine whether to perform a second round of task allocation. For result inference, after quantifying the similarity of all labels, two graphical models are proposed to describe the labeling process, and corresponding inference algorithms are designed to further improve the quality of image labeling. Extensive experiments were conducted on real-world tasks in CrowdFlower and on synthetic datasets. The experimental results demonstrate the superiority of these approaches over state-of-the-art methods.
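    One plausible reading of the entropy-based decision mechanism is sketched below: compute the normalized Shannon entropy of the first-round label distribution for an image and request a second round only when worker disagreement is high. The normalization and the threshold value are illustrative assumptions, not the paper's exact mechanism.

```python
import math
from collections import Counter

def needs_second_round(labels, threshold=0.8):
    """Decide whether an image needs a second labeling round.

    Computes the Shannon entropy of the first-round label
    distribution, normalized by its maximum possible value, and
    triggers a second round only when disagreement is above the
    threshold."""
    counts = Counter(labels)
    n = len(labels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return entropy / max_entropy > threshold
```

    Unanimous labels give zero entropy (no second round); an even split between two breeds gives maximal entropy and triggers reallocation.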
    Budget-Aware Dynamic Incentive Mechanism in Spatial Crowdsourcing
    Jia-Xu Liu, Yu-Dian Ji, Wei-Feng Lv, Ke Xu
    Journal of Data Acquisition and Processing, 2017, 32 (5): 890-904. 
    Abstract   PDF(899KB) ( 950 )  
    The ubiquitous deployment of GPS-equipped devices and mobile networks has spurred the popularity of spatial crowdsourcing. Many spatial crowdsourcing tasks require crowd workers to collect data from different locations. Since workers tend to select locations nearby or aligned with their routines, the data they collect are usually unevenly distributed across the region. To encourage workers to choose remote locations and thus avoid imbalanced data collection, we investigate incentive mechanisms in spatial crowdsourcing. We propose a price adjustment function and two algorithms, DFBA and DABA, which use price leverage to mitigate the imbalanced data collection problem. Extensive evaluations on both synthetic and real-world datasets demonstrate that the proposed incentive mechanisms effectively balance the popularity of different locations.
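    A price adjustment function of the kind the abstract describes might, as a minimal hypothetical sketch, raise the reward for under-covered locations in proportion to their remaining collection deficit. The functional form and the alpha parameter below are assumptions for illustration only, not the paper's DFBA/DABA pricing.

```python
def adjusted_price(base_price, target_count, collected_count, alpha=0.5):
    """Scale a location's reward by its remaining collection deficit:
    a fully covered location stays at the base price, while an empty
    one receives the maximum bonus base_price * (1 + alpha)."""
    deficit = max(0.0, 1.0 - collected_count / target_count)
    return base_price * (1.0 + alpha * deficit)
```

    A remote location with no samples yet pays 1.5x the base price, nudging workers away from already popular spots.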
    Privacy-preserving Task Assignment in Spatial Crowdsourcing
    An Liu, Zhi-Xu Li, Guan-Feng Liu, Kai Zheng, Min Zhang, Qing Li, Xiangliang Zhang
    Journal of Data Acquisition and Processing, 2017, 32 (5): 905-918. 
    Abstract   PDF(1195KB) ( 1073 )  
    With the progress of mobile devices and wireless networks, spatial crowdsourcing (SC) is emerging as a promising approach for problem solving. In SC, spatial tasks are assigned to and performed by a set of human workers. To enable effective task assignment, however, both workers and task requesters are required to disclose their locations to untrusted SC systems. In this paper, we study the problem of assigning workers to tasks in a way that preserves location privacy for both workers and task requesters. We first combine the Paillier cryptosystem with Yao's garbled circuits to construct a secure protocol that assigns the nearest worker to a task. As this protocol cannot scale to a large number of workers, we then use Geohash, a hierarchical spatial index, to design a more efficient protocol that securely finds approximate nearest workers. We show theoretically that both protocols are secure against semi-honest adversaries. Through extensive experiments on two real-world datasets, we demonstrate the efficiency and effectiveness of our protocols.
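    To illustrate the Geohash idea behind the second protocol: nearby points tend to share long geohash prefixes, so approximate nearest-worker search can rank workers by common-prefix length. The sketch below is a plaintext illustration only; the paper's protocol performs this matching securely, and all cryptography is omitted here.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string by
    alternately bisecting the longitude and latitude ranges."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    is_lon, bits, ch, out = True, 0, 0, []
    while len(out) < precision:
        rng, val = (lon_rng, lon) if is_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # five bits per base32 character
            out.append(_BASE32[ch])
            bits, ch = 0, 0
    return "".join(out)

def approx_nearest(task_hash, worker_hashes):
    """Pick the worker whose geohash shares the longest prefix with
    the task's geohash: a longer shared prefix means a closer cell."""
    def shared_prefix(h):
        n = 0
        for a, b in zip(task_hash, h):
            if a != b:
                break
            n += 1
        return n
    return max(worker_hashes, key=shared_prefix)
```

    Note that prefix length only approximates proximity (cells on opposite sides of a cell boundary can be close yet share a short prefix), which is why the protocol finds approximate, not exact, nearest workers.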
    FIMI: A Constant Frugal Incentive Mechanism for Time Window Coverage in Mobile Crowdsensing
    Jia Xu, Jian-Ren Fu, De-Jun Yang, Li-Jie Xu, Lei Wang, Tao Li
    Journal of Data Acquisition and Processing, 2017, 32 (5): 919-935. 
    Abstract   PDF(1001KB) ( 691 )  
    Mobile crowdsensing has become an efficient paradigm for performing large-scale sensing tasks. An incentive mechanism is important for a mobile crowdsensing system to stimulate participants and achieve good service quality. In this paper, we explore truthful incentive mechanisms that minimize the total payment in a novel scenario where the platform needs complete sensing data over a Requested Time Window (RTW). We model this scenario as a reverse auction and design FIMI, a constant Frugal Incentive Mechanism for tIme window coverage. FIMI consists of two phases: candidate selection and winner selection. The candidate selection phase selects the two most competitive disjoint feasible user sets. The winner selection phase then finds all interchangeable user sets through a graph-theoretic approach; for every pair of such user sets, FIMI chooses one by weighted cost. We further extend FIMI to the scenario where the RTW needs to be covered more than once. Through both rigorous theoretical analysis and extensive simulations, we demonstrate that the proposed mechanisms achieve RTW feasibility (or RTW multi-coverage), computational efficiency, individual rationality, truthfulness, and constant frugality.
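    RTW feasibility, i.e., whether a selected user set's sensing intervals jointly cover the requested time window, can be checked with a standard interval-coverage sweep. This is a minimal sketch of that feasibility check under an assumed data layout; the auction and winner-selection logic of FIMI is not shown.

```python
def covers_window(intervals, window):
    """Check whether users' sensing intervals jointly cover the
    requested time window (start, end) with no gaps.

    intervals: list of (start, end) tuples, one per selected user.
    """
    start, end = window
    reached = start
    for s, e in sorted(intervals):
        if s > reached:      # a gap before this interval begins
            return False
        reached = max(reached, e)
        if reached >= end:
            return True
    return reached >= end
```

    For example, intervals (0, 3) and (2, 5) cover the window (0, 5), while (0, 2) and (3, 5) leave a gap and fail the feasibility check.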
    Computer Networks and Distributed Computing
    An Approach to Automatic Performance Prediction for Cloud-Enhanced Mobile Applications with Sparse Data
    Wei-Qing Liu, Jing Li
    Journal of Data Acquisition and Processing, 2017, 32 (5): 936-956. 
    Abstract   PDF(2253KB) ( 722 )  
    In mobile cloud computing (MCC), offloading compute-intensive parts of a mobile application onto the cloud is an attractive way to enhance application performance. To make good offloading decisions, history-based machine learning techniques have been proposed to predict application performance under various offloading schemes. However, the data sparsity problem is common in realistic MCC scenarios but has rarely been addressed by existing work. In this paper, we employ a two-phase hybrid framework, designed to be robust to data sparsity, to predict the performance of cloud-enhanced mobile applications. In the first phase, several multi-layer neural networks trained on historical execution records automatically predict intermediate parameters for each execution of an application. The models learned by these neural networks can be shared among different applications, which alleviates data sparsity. In the second phase, based on the predicted intermediate parameters and the application topology, the estimated values of the performance metrics are calculated deterministically. This deterministic algorithm can partially guarantee prediction accuracy for newly published applications even with no execution records. We evaluate our approach with a cloud-enhanced object recognition application and show that it precisely predicts application performance and is robust to data sparsity.
    Evaluation of Remote-I/O Support for a DSM-Based Computation Offloading Scheme
    Yuhun Jun, Jaemin Lee, Euiseong Seo
    Journal of Data Acquisition and Processing, 2017, 32 (5): 957-973. 
    Abstract   PDF(1180KB) ( 808 )  
    Computation offloading enables mobile devices to execute rich applications by using the abundant computing resources of powerful server systems. The distributed shared memory based (DSM-based) computation offloading approach is expected to be especially popular in the near future because it can dynamically migrate running threads to computing nodes and does not require any modifications of existing applications to do so. The current DSM-based computation offloading scheme, however, has focused on efficiently offloading computationally intensive applications and has not considered the significant performance degradation caused by processing the I/O requests issued by offloaded threads. Because most mobile applications are interactive and thus yield frequent I/O requests, efficient handling of I/O operations is critically important. In this paper, we quantitatively analyze the performance degradation caused by I/O processing in DSM-based computation offloading schemes using representative commodity applications. To remedy the performance degradation, we apply a remote I/O scheme based on remote device support to computation offloading. The proposed approach improves the execution time by up to 43.6% and saves up to 17.7% of energy consumption in comparison with the existing offloading schemes. Selective compression of the remote I/O scheme reduces the network traffic by up to 53.5%.
    Flexible CP-ABE Based Access Control on Encrypted Data for Mobile Users in Hybrid Cloud System
    Wen-Min Li, Xue-Lei Li, Qiao-Yan Wen, Shuo Zhang, Hua Zhang
    Journal of Data Acquisition and Processing, 2017, 32 (5): 974-990. 
    Abstract   PDF(958KB) ( 853 )  
    In hybrid cloud computing, encrypted data access control provides a fine-grained way for organizations to enforce policies close to their organizational policies. This paper presents an improved CP-ABE (ciphertext-policy attribute-based encryption) scheme to construct an encrypted data access control solution suitable for mobile users in a hybrid cloud system. In our improvement, we split the original decryption key into a control key, a secret key, and a set of transformation keys. The private cloud, managed by the organization administrator, takes charge of updating the transformation keys using the control key, which supports flexible access management and attribute alteration. Meanwhile, both the mobile user's single secret key and the ciphertext remain unchanged even if the user's attributes have been revoked. In addition, we modify the access control list by adding attributes with the corresponding control key and transformation keys, so as to manage user privileges according to the system version. Finally, our analysis shows that the scheme is secure, flexible, and efficient for mobile hybrid cloud computing.
    MimiBS: Mimicking Base-Station to Provide Location Privacy Protection in Wireless Sensor Networks
    Yawar Abbas Bangash, Ling-Fang Zeng, Dan Feng
    Journal of Data Acquisition and Processing, 2017, 32 (5): 991-1007. 
    Abstract   PDF(1378KB) ( 765 )  
    In a wireless sensor network (WSN), the sink node/base station (BS) gathers data from surrounding nodes and sends them to a remote server via a gateway. Because the BS holds important data, it is necessary to hide its location from inside and outside attackers. To provide BS location anonymity against local and global adversaries, we propose a novel technique called MimiBS (Mimicking Base-Station). The key idea is to integrate aggregator nodes (ANs) with sensor nodes (SNs), while fine-tuning the TTL (time to live) value of fake packets and setting a threshold for the real packet counter rpctr. MimiBS creates multiple traffic hotspots (zones), shifting the focus from the single BS hotspot to the newly created AN hotspots. Multiple traffic hotspots confuse an adversary trying to determine the real BS location. We defend BS location anonymity against traffic analysis and traffic tracing attacks. MimiBS gives the illusion of multiple BSs, so even if the attacker learns about an AN, he/she cannot distinguish the real BS from the ANs. MimiBS outperforms BLAST (base-station location anonymity and security technique), RW (random walk), and SP (shortest path) under routing with and without fake packets, and with and without energy considerations.
    Regular Paper
    Differentially Private Event Histogram Publication on Sequences over Graphs
    Ning Wang, Yu Gu, Jia Xu, Fang-Fang Li, Ge Yu
    Journal of Data Acquisition and Processing, 2017, 32 (5): 1008-1024. 
    Abstract   PDF(672KB) ( 861 )  
    The big data era is coming with strong and ever-growing demands on analyzing personal information and footprints in the cyber world. To enable such analysis without the risk of privacy leaks, differential privacy (DP) has risen quickly in recent years as the first practical privacy protection model with rigorous theoretical guarantees. This paper discusses how to publish differentially private histograms of events in the time series domain, where each individual contributes a sequence of personal events over a graph with events as edges. Such individual-generated sequences commonly appear in formalized industrial workflows, online game logs, and spatio-temporal trajectories, whose direct publication may compromise personal privacy. While existing DP mechanisms mainly target normalized domains with fixed and aligned dimensions, our problem raises new challenges because the sequences can follow arbitrary paths on the graph. To tackle the problem, we reformulate it within a three-step framework, which 1) carefully truncates the original sequences, trading off the error introduced by truncation against that introduced by the noise added to guarantee privacy, 2) decomposes the event graph into path sub-domains based on the given query workload, and 3) employs a deeply optimized tree-based histogram construction approach for each sub-domain to benefit from less noise addition. We present a careful analysis of our framework to support thorough optimization of each step, and verify the significant improvements of our proposals over state-of-the-art solutions.
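    The first step of the framework, truncating each sequence so that one person contributes a bounded number of events before noise is added, can be sketched with the standard Laplace mechanism. The flat histogram and the fixed truncation length below are simplifications; the paper's workload-aware decomposition and tree-based construction are not shown.

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def dp_event_histogram(sequences, max_len, epsilon):
    """Truncate each event sequence to max_len, then publish noisy
    per-event counts. Truncation bounds each person's contribution,
    so the count sensitivity is max_len and Laplace(max_len/epsilon)
    noise suffices for epsilon-differential privacy."""
    counts = {}
    for seq in sequences:
        for event in seq[:max_len]:  # step 1: truncation
            counts[event] = counts.get(event, 0) + 1
    scale = max_len / epsilon
    return {e: c + laplace_noise(scale) for e, c in counts.items()}
```

    The trade-off in step 1 is visible here: a smaller max_len discards more real events (truncation error) but allows a smaller noise scale (perturbation error).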
    Visual Specification and Analysis of Contract-Based Software Architectures
    Mert Ozkaya
    Journal of Data Acquisition and Processing, 2017, 32 (5): 1025-1043. 
    Abstract   PDF(5929KB) ( 874 )  
    XCD is a design-by-contract based architecture description language that supports modular specifications in terms of components and connectors (i.e., interaction protocols). XCD is supported by a translator that produces formal models in SPIN's ProMeLa verification language, which can then be formally analysed with SPIN's model checker. XCD is extended with a visual notation set called VXCD. VXCD extends UML's component diagram and adapts it to XCD's structure, contractual behaviour, and interaction protocol specifications. Visual VXCD specifications can be translated into textual XCD specifications for formal analysis. To illustrate VXCD, the well-known gas station system is used: the system is specified contractually with VXCD's visual notation set and then formally analysed with SPIN's model checker for a number of properties, including deadlock and race conditions.
    Three-Layer Joint Modeling of Chinese Trigger Extraction with Constraints on Trigger and Argument Semantics
    Pei-Feng Li, Guo-Dong Zhou
    Journal of Data Acquisition and Processing, 2017, 32 (5): 1044-1056. 
    Abstract   PDF(366KB) ( 752 )  
    As a subtask of Information Extraction (IE), which aims to extract structured information from text, event extraction recognizes event trigger mentions of predefined event types and their arguments. In general, event extraction can be divided into two subtasks: trigger extraction and argument extraction. Currently, the frequent presence of unannotated trigger mentions and poor-context trigger mentions poses critical challenges for Chinese trigger extraction. This paper proposes a novel three-layer joint model that integrates three components of trigger extraction, i.e., trigger identification, event type determination, and event subtype determination. In this way, different evidence on distinct pseudo samples can be captured to eliminate the harmful effects of unannotated trigger mentions. In addition, the paper introduces various linguistically driven constraints on trigger and argument semantics into the joint model to recover poor-context trigger mentions. The experimental results show that our joint model significantly outperforms the state of the art in both Chinese trigger extraction and Chinese event extraction as a whole.
 


         


E-mail: info@sjcjycl.cn
 
  Copyright ©2015 JCST, All Rights Reserved