Loading...
Bimonthly    Since 1986
ISSN 1004-9037
/
Indexed in:
SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
  • Table of Content
      16 November 2019, Volume 34 Issue 6   
    For Selected: View Abstracts Toggle Thumbnails
    Data Management and Data Mining
    HybridTune: Spatio-Temporal Performance Data Correlation for Performance Diagnosis of Big Data Systems
    Rui Ren, Jiechao Cheng, Xi-Wen He, Lei Wang, Jian-Feng Zhan, Wan-Ling Gao, Chun-Jie Luo
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1167-1184. 
    With tremendous growing interests in Big Data, the performance improvement of Big Data systems becomes more and more important. Among many steps, the first one is to analyze and diagnose performance bottlenecks of the Big Data systems. Currently, there are two major solutions. One is the pure data-driven diagnosis approach, which may be very time-consuming; the other is the rule-based analysis method, which usually requires prior knowledge. For Big Data applications like Spark workloads, we observe that the tasks in the same stages normally execute the same or similar codes on each data partition. On basis of the stage similarity and distributed characteristics of Big Data systems, we analyze the behaviors of the Big Data applications in terms of both system and micro-architectural metrics of each stage. Furthermore, for different performance problems, we propose a hybrid approach that combines prior rules and machine learning algorithms to detect performance anomalies, such as straggler tasks, task assignment imbalance, data skew, abnormal nodes and outlier metrics. Following this methodology, we design and implement a lightweight, extensible tool, named HybridTune, and measure the overhead and anomaly detection effectiveness of HybridTune using the BigDataBench benchmarks. Our experiments show that the overhead of HybridTune is only 5%, and the accuracy of outlier detection algorithm reaches up to 93%. Finally, we report several use cases diagnosing Spark and Hadoop workloads using BigDataBench, which demonstrates the potential use of HybridTune.
    An Efficient Framework for Multiple Subgraph Pattern Matching Models
    Jiu-Ru Gao, Wei Chen, Jia-Jie Xu, An Liu, Zhi-Xu Li, Hongzhi Yin, Lei Zhao
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1185-1202. 
    With the popularity of storing large data graph in cloud, the emergence of subgraph pattern matching on a remote cloud has been inspired. Typically, subgraph pattern matching is defined in terms of subgraph isomorphism, which is an NP-complete problem and sometimes too strict to find useful matches in certain applications. And how to protect the privacy of data graphs in subgraph pattern matching without undermining matching results is an important concern. Thus, we propose a novel framework to achieve the privacy-preserving subgraph pattern matching in cloud. In order to protect the structural privacy in data graphs, we firstly develop a k-automorphism model based method. Additionally, we use a cost-model based label generalization method to protect label privacy in both data graphs and pattern graphs. During the generation of the k-automorphic graph, a large number of noise edges or vertices might be introduced to the original data graph. Thus, we use the outsourced graph, which is only a subset of a k-automorphic graph, to answer the subgraph pattern matching. The efficiency of the pattern matching process can be greatly improved in this way. Extensive experiments on real-world datasets demonstrate the high efficiency of our framework.
    Interval Estimation for Aggregate Queries on Incomplete Data
    An-Zhen Zhang, Jian-Zhong Li, Hong Gao
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1203-1216. 
    Incomplete data has been a longstanding issue in the database community, and the subject is yet poorly handled by both theories and practices. One common way to cope with missing values is to complete their imputation (filling in) as a preprocessing step before analyses. Unfortunately, not a single imputation method could impute all missing values correctly in all cases. Users could hardly trust the query result on such complete data without any confidence guarantee. In this paper, we propose to directly estimate the aggregate query result on incomplete data, rather than to impute the missing values. An interval estimation, composed of the upper and the lower bound of aggregate query results among all possible interpretations of missing values, is presented to the end users. The ground-truth aggregate result is guaranteed to be among the interval. We believe that decision support applications could benefit significantly from the estimation, since they can tolerate inexact answers, as long as there are clearly defined semantics and guarantees associated with the results. Our main techniques are parameter-free and do not assume prior knowledge about the distribution and missingness mechanisms. Experimental results are consistent with the theoretical results and suggest that the estimation is invaluable to better assess the results of aggregate queries on incomplete data.
    Adversarial Heterogeneous Network Embedding with Metapath Attention Mechanism
    Chun-Yang Ruan, Ye Wang, Jiangang Ma, Yanchun Zhang, Xin-Tian Chen
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1217-1229. 
    Heterogeneous information network (HIN)-structured data provide an effective model for practical purposes in real world. Network embedding is fundamental for supporting the network-based analysis and prediction tasks. Methods of network embedding that are currently popular normally fail to effectively preserve the semantics of HIN. In this study, we propose AGA2Vec, a generative adversarial model for HIN embedding that uses attention mechanisms and meta-paths. To capture the semantic information from multi-typed entities and relations in HIN, we develop a weighted meta-path strategy to preserve the proximity of HIN. We then use an autoencoder and a generative adversarial model to obtain robust representations of HIN. The results of experiments on several real-world datasets show that the proposed approach outperforms state-of-the-art approaches for HIN embedding.
    Modeling Temporal Dynamics of Users' Purchase Behaviors for Next Basket Prediction
    Pengfei Wang, Yongfeng Zhang, Shuzi Niu, Jiafeng Guo
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1230-1240. 
    Next basket prediction attempts to provide sequential recommendations to users based on a sequence of the user's previous purchases. Ideally, a good prediction model should be able to explore the personalized preference of the users, as well as the sequential relations of the items. This goal of modeling becomes even more challenging when both factors are time-dependent. However, existing methods either take these two aspects as static, or only consider temporal dynamics for one of the two aspects. In this work, we propose the dynamic representation learning approach for time-dependent next basket recommendation, which jointly models the dynamic nature of user preferences and item relations. To do so, we explicitly model the transaction timestamps, as well as the dynamic representations of both users and items, so as to capture the personalized user preference on each individual item dynamically. Experiments on three real-world retail datasets show that our method significantly outperforms several state-of-the-art methods for next basket recommendation.
    Artificial Intelligence and Pattern Recognition
    Large-Scale Estimation of Distribution Algorithms with Adaptive Heavy Tailed Random Projection Ensembles
    Momodou L. Sanyang, Ata Kabán
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1241-1257. 
    We present new variants of Estimation of Distribution Algorithms (EDA) for large-scale continuous optimisation that extend and enhance a recently proposed random projection (RP) ensemble based approach. The main novelty here is to depart from the theory of RPs that require (sub-)Gaussian random matrices for norm-preservation, and instead for the purposes of high-dimensional search we propose to employ random matrices with independent and identically distributed entries drawn from a t-distribution. We analytically show that the implicitly resulting high-dimensional covariance of the search distribution is enlarged as a result. Moreover, the extent of this enlargement is controlled by a single parameter, the degree of freedom. For this reason, in the context of optimisation, such heavy tailed random matrices turn out to be preferable over the previously employed (sub-)Gaussians. Based on this observation, we then propose novel covariance adaptation schemes that are able to adapt the degree of freedom parameter during the search, and give rise to a flexible approach to balance exploration versus exploitation. We perform a thorough experimental study on high-dimensional benchmark functions, and provide statistical analyses that demonstrate the state-of-the-art performance of our approach when compared with existing alternatives in problems with 1 000 search variables.
    Progressive Furniture Model Decimation with Texture Preservation
    Zhi-Guang Pan, Chu-Hua Xian, Shuo Jin, Gui-Qing Li
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1258-1268. 
    In digital furniture design, skillful designers usually use professional software to create new furniture designs with various textures and then take advantage of rendering tools to produce eye-catching design results. Generally, a finegrained furniture model holds many geometric details, inducing significant time cost to model rendering and large data size for storage that are not desired in application scenarios where efficiency is greatly emphasized. To accelerate the rendering process while keeping good rendering results as many as possible, we develop a novel decimation technique which not only reduces the number of faces on furniture models, but also retains their geometric and texture features. Two metrics are utilized in our approach to measure the distortion of texture features. Considering these two metrics as guidance for decimation, high texture distortion can be avoided in simplifying the geometric models. Therefore, we are able to build multi-level representations with different detail levels based on the initial design. Our experimental results show that the developed technique can achieve excellent visual effects on the decimated furniture model.
    Weakly- and Semi-Supervised Fast Region-Based CNN for Object Detection
    Xing-Gang Wang, Jia-Si Wang, Peng Tang, Wen-Yu Liu
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1269-1278. 
    Learning an effective object detector with little supervision is an essential but challenging problem in computer vision applications. In this paper, we consider the problem of learning a deep convolutional neural network (CNN) based object detector using weakly-supervised and semi-supervised information in the framework of fast region-based CNN (Fast R-CNN). The target is to obtain an object detector as accurate as the fully-supervised Fast R-CNN, but it requires less image annotation effort. To solve this problem, we use weakly-supervised training images (i.e., only the image-level annotation is given) and a few proportions of fully-supervised training images (i.e., the bounding box level annotation is given), that is a weakly- and semi-supervised (WASS) object detection setting. The proposed solution is termed as WASS R-CNN, in which there are two main components. At first, a weakly-supervised R-CNN is firstly trained; after that semi-supervised data are used for finetuning the weakly-supervised detector. We perform object detection experiments on the PASCAL VOC 2007 dataset. The proposed WASS R-CNN achieves more than 85% of a fully-supervised Fast R-CNN's performance (measured using mean average precision) with only 10% of fully-supervised annotations together with weak supervision for all training images. The results show that the proposed learning framework can significantly reduce the labeling efforts for obtaining reliable object detectors.
    Computer Graphics and Multimedia
    A Geometric Strategy Algorithm for Orthogonal Projection onto a Parametric Surface
    Xiaowu Li, Zhinan Wu, Feng Pan, Juan Liang, Jiafeng Zhang, Linke Hou
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1279-1293. 
    In this paper, we investigate how to compute the minimum distance between a point and a parametric surface, and then to return the nearest point (foot point) on the surface as well as its corresponding parameter, which is also called the point projection problem of a parametric surface. The geometric strategy algorithm (hereafter GSA) presented consists of two parts as follows. The normal curvature to a given parametric surface is used to find the corresponding foot point firstly, and then the Taylor's expansion of the parametric surface is employed to compute parameter increments and to get the iteration formula to calculate the orthogonal projection point of test point to the parametric surface. Our geometric strategy algorithm is essentially dependent on the geometric property of the normal curvature, and performs better than existing methods in two ways. Firstly, GSA converges faster than existing methods, such as the method to turn the problem into a root-finding of nonlinear system, subdividing methods, clipping methods, geometric methods (tangent vector and geometric curvature) and hybrid second-order method, etc. Specially, it converges faster than the classical Newton's iterative method. Secondly, GSA is independent of the initial iterative value, which we prove in Theorem 1. Many numerical examples confirm GSA's robustness and efficiency.
    Artistic Augmentation of Photographs with Droplets
    Mo-Han Zhang, Jin-Hui Yu, Kang Zhang, Jun-Song Zhang
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1294-1306. 
    Artistic augmentation of photographs with water droplets aims at generating aesthetic yet realistic images, and thus differs from traditional augmented reality in two aspects. One difference lies in the adoption of a new image as the environment map in order to render reflected or refracted effects on the surface of inserted water droplets. The other difference is in modeling of water droplets including hanging droplets and resting droplets. These differences raise two research challenges:1) how to adjust the brightness and colors of the new environment map to maintain visual consistency between the new environment map and the original input image; 2) how to model hanging and resting droplets aesthetically. This paper proposes a framework that addresses these two challenges and demonstrates the effectiveness of our framework by generating example augmented images.
    Automatic Diabetic Retinopathy Screening via Cascaded Framework Based on Image- and Lesion-Level Features Fusion
    Cheng-Zhang Zhu, Rong Hu, Bei-Ji Zou, Rong-Chang Zhao, Chang-Long Chen, Ya-Long Xiao
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1307-1318. 
    The early detection of diabetic retinopathy is crucial for preventing blindness. However, it is time-consuming to analyze fundus images manually, especially considering the increasing amount of medical images. In this paper, we propose an automatic diabetic retinopathy screening method using color fundus images. Our approach consists of three main components:edge-guided candidate microaneurysms detection, candidates classification using mixed features, and diabetic retinopathy prediction using fused features of image level and lesion level. We divide a screening task into two sub-classification tasks:1) verifying candidate microaneurysms by a naive Bayes classifier; 2) predicting diabetic retinopathy using a support vector machine classifier. Our approach can effectively alleviate the imbalanced class distribution problem. We evaluate our method on two public databases:Lariboisière and Messidor, resulting in an area under the curve of 0.908 on Lariboisière and 0.832 on Messidor. These scores demonstrate the advantages of our approach over the existing methods.
    Computer Networks and Distributed Computing
    Security Attacks in Named Data Networking: A Review and Research Directions
    Naveen Kumar, Ashutosh Kumar Singh, Abdul Aleem, Shashank Srivastava
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1319-1350. 
    Contents such as audios, videos, and images, contribute most of the Internet traffic in the current paradigm. Secure content sharing is a tedious issue. The existing security solutions do not secure data but secure the communicating endpoints. Named data networking (NDN) secures the data by enforcing the data publisher to sign the data. Any user can verify the data by using the public key of the publisher. NDN is resilient to most of the probable security attacks in the TCP/IP model due to its new architecture. However, new types of attacks are possible in NDN. This article surveys the most significant security attacks in NDN such as interest flooding attacks, cache privacy attacks, cache pollution attacks, and content poisoning attacks. Each attack is classified according to their behavior and discussed for their detection techniques, countermeasures, and the affected parameters. The article is an attempt to help new researchers in this area to gather the domain knowledge of NDN. The article also provides open research issues that could be addressed by researchers.
    An Efficient Approach for Mitigating Covert Storage Channel Attacks in Virtual Machines by the Anti-Detection Criterion
    Chong Wang, Nasro Min-Allah, Bei Guan, Yu-Qi Lin, Jing-Zheng Wu, Yong-Ji Wang
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1351-1365. 
    Covert channels have been an effective means for leaking confidential information across security domains and numerous studies are available on typical covert channels attacks and defenses. Existing covert channel threat restriction solutions are based on the threat estimation criteria of covert channels such as capacity, accuracy, and short messages which are effective in evaluating the information transmission ability of a covert (storage) channel. However, these criteria cannot comprehensively reflect the key factors in the communication process such as shared resources and synchronization and therefore are unable to evaluate covertness and complexity of increasingly upgraded covert storage channels. As a solution, the anti-detection criterion was introduced to eliminate these limitations of cover channels. Though effective, most threat restriction techniques inevitably incur high performance overhead and hence become impractical. In this work, we avoid such overheads and present a restriction algorithm based on the anti-detection criterion to restrict threats that are associated with covert storage channels in virtual machines while maintaining the resource efficiency of the systems. Experimental evaluation shows that our proposed solution is able to counter covert storage channel attacks in an effective manner. Compared with Pump, a well-known traditional restriction algorithm used in practical systems, our solution significantly reduces the system overhead.
    Theory and Algorithms
    Tightly Secure Public-Key Cryptographic Schemes from One-More Assumptions
    Ge Wu, Jian-Chang Lai, Fu-Chun Guo, Willy Susilo, Fu-Tai Zhang
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1366-1379. 
    A tightly secure cryptographic scheme refers to a construction with a tight security reduction to a hardness assumption, where the reduction loss is a small constant. A scheme with tight security is preferred in practice since it could be implemented using a smaller parameter to improve efficiency. Recently, Bader et al. (EUROCRYPT 2016) have proposed a comprehensive study on the impossible tight security reductions for certain (e.g., key-unique) public-key cryptographic schemes in the multi-user with adaptive corruptions (MU-C) setting built upon non-interactive assumptions. The assumptions of one-more version, such as one-more computational Diffie-Hellman (n-CDH), are variants of the standard assumptions and have found various applications. However, whether it is possible to have tightly secure key-unique schemes from the one-more assumptions or the impossible tight reduction results also hold for these assumptions remains unknown. In this paper, we give affirmative answers to the above question, i.e., we can have efficient key-unique public-key cryptographic schemes with tight security built upon the one-more assumptions. Specifically, we propose a digital signature scheme and an encryption scheme, both of which are key-unique and have tight MU-C security under the one-more computational Diffie-Hellman (n-CDH) assumption. Our results also reflect from another aspect that there indeed exists a gap between the standard assumptions and their one-more version counterparts.
    2019 Contents
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1380-1383. 
    Abstract   PDF(74KB) ( 215 )  
    2019 Author Index
    Journal of Data Acquisition and Processing, 2019, 34 (6): 1384-1388. 
    Abstract   PDF(39KB) ( 69 )  

    Article

    EFFICIENCY ASSESSMENT OF SEARCH ENGINES WITH IMPROVED VSM & ENTROPY BASED LINK OPTIMIZATION ALGORITHM
    Siddharth Ghansela
    Journal of Data Acquisition and Processing, 2022, 37 (4): 1389-1398 . 
    A NONINVASIVE METHOD OF EVALUATING INVARIANT BASAL CELL LAYER FINGER PRINT FEATURE IDENTIFICATION USING MINUTIAE MATCHING TECHNIQUE
    Manjit Sandhu1, Sandeep Kaur*2, Mahesh Chander3, Tajinder Kaur4
    Journal of Data Acquisition and Processing, 2022, 37 (4): 1399-1413 . 
SCImago Journal & Country Rank
 

ISSN 1004-9037

         

Home
Editorial Board
Author Guidelines
Subscription
Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China

E-mail: info@sjcjycl.cn
 
  Copyright ©2015 JCST, All Rights Reserved