Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Distributed by:
China: All Local Post Offices
  • Table of Contents
      20 September 2020, Volume 35 Issue 5   
    Special Section on Software Systems 2020—Part 1
    Mining Design Pattern Use Scenarios and Related Design Pattern Pairs: A Case Study on Online Posts
    Dong Liu, Zhi-Lei Ren, Zhong-Tian Long, Guo-Jun Gao, He Jiang
    Journal of Data Acquisition and Processing, 2020, 35 (5): 963-978. 
    In common design pattern collections, e.g., design pattern books, design patterns are documented with templates that consist of multiple attributes, such as intent, structure, and sample code. To serve modern developers, the descriptions of design patterns, especially certain attributes, should keep pace with current programming technologies. Examples are “known uses”, which exemplifies the use scenarios of design patterns in practice, and “related patterns”, which describes the relatedness between a design pattern and the others within a context. However, it is not easy to update the contents of these attributes manually due to the diversity of the programming technologies. To address this problem, in this work, we conducted a case study to mine design pattern use scenarios and related design pattern pairs from Stack Overflow posts to enrich the two attributes. We first extracted the question posts relevant to each design pattern by identifying the design pattern tags. Then, the topics of the posts were discovered by applying topic modeling techniques. Finally, by analyzing the topics specified for each design pattern, we detected 195 design pattern use scenarios and 70 related design pattern pairs, involving 61 design patterns in total. These findings are associated with a variety of popular software frameworks and programming techniques. They could complement the existing design pattern collections and help developers better understand the usage and relatedness of design patterns in today's programming practice.
    FATOC: Bug Isolation Based Multi-Fault Localization by Using OPTICS Clustering
    Yong-Hao Wu, Zheng Li, Yong Liu, Xiang Chen
    Journal of Data Acquisition and Processing, 2020, 35 (5): 979-998. 
    Bug isolation is a popular approach for multi-fault localization (MFL), where all failed test cases are clustered into several groups, and then the failed test cases in each group combined with all passed test cases are used to localize only a single fault. However, existing clustering algorithms cannot always obtain completely correct clustering results, which is a potential threat for bug isolation based MFL approaches. To address this issue, we first analyze the influence of the accuracy of the clustering on the performance of MFL, and the results of a controlled study indicate that using the clustering algorithm with the highest accuracy can achieve the best performance of MFL. Moreover, previous studies on clustering algorithms also show that the elements in a higher density cluster have a higher similarity. Based on the above motivation, we propose a novel approach FATOC (One-Fault-at-a-Time via OPTICS Clustering). In particular, FATOC first leverages the OPTICS (Ordering Points to Identify the Clustering Structure) clustering algorithm to group failed test cases, and then identifies a cluster with the highest density. OPTICS clustering is a density-based clustering algorithm, which can reduce the misgrouping and calculate a density value for each cluster. Such a density value of each cluster is helpful for finding a cluster with the highest clustering effectiveness. FATOC then combines the failed test cases in this cluster with all passed test cases to localize a single fault through the traditional spectrum-based fault localization (SBFL) formula. After this fault is localized and fixed, FATOC will use the same method to localize the next single fault, until all the test cases are passed. Our evaluation results show that FATOC can significantly outperform the traditional SBFL technique and the state-of-the-art MFL approach MSeer on 804 multi-faulty versions from nine real-world programs. Specifically, FATOC's performance is 10.32% higher than that of traditional SBFL when using the Ochiai formula, in terms of the A-EXAM metric. Besides, the results also indicate that, when checking 1%, 3% and 5% of the statements of all subject programs, FATOC can locate 36.91%, 48.50% and 66.93% of all faults respectively, which is also better than the traditional SBFL and the MFL approach MSeer.
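    For readers unfamiliar with spectrum-based fault localization, the following is a minimal sketch of the Ochiai suspiciousness score applied to one cluster of failed tests plus all passed tests, as the abstract describes. It is not the authors' implementation: the OPTICS clustering step is omitted, and the names `ochiai`, `rank_statements`, and the `coverage` mapping are hypothetical and chosen only for illustration.

```python
import math


def ochiai(ef, ep, nf):
    """Ochiai suspiciousness of one statement.

    ef: failed tests that execute the statement
    ep: passed tests that execute the statement
    nf: failed tests that do not execute the statement
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def rank_statements(coverage, failed_cluster, passed):
    """Rank statements by Ochiai score, most suspicious first.

    coverage is an assumed mapping from test id to the set of
    statements that test executes; failed_cluster and passed are
    sets of test ids.
    """
    total_failed = len(failed_cluster)
    statements = set().union(*(coverage[t] for t in failed_cluster | passed))
    scores = {}
    for s in statements:
        ef = sum(1 for t in failed_cluster if s in coverage[t])
        ep = sum(1 for t in passed if s in coverage[t])
        scores[s] = ochiai(ef, ep, total_failed - ef)
    # Ties are broken by statement id so the order is deterministic.
    return sorted(scores, key=lambda s: (-scores[s], s))


coverage = {"t1": {1, 2}, "t2": {2, 3}, "t3": {1, 3}}
ranking = rank_statements(coverage, failed_cluster={"t1", "t2"}, passed={"t3"})
```

    In this toy input, statement 2 is executed by both failed tests and by no passed test, so it is ranked first.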
    Predicted Robustness as QoS for Deep Neural Network Models
    Yue-Huan Wang, Ze-Nan Li, Jing-Wei Xu, Ping Yu, Taolue Chen, Xiao-Xing Ma
    Journal of Data Acquisition and Processing, 2020, 35 (5): 999-1015. 
    The adoption of deep neural network (DNN) models as an integral part of real-world software systems necessitates explicit consideration of their quality-of-service (QoS). It is well-known that DNN models are prone to adversarial attacks, and thus it is vitally important to be aware of how robust a model's prediction is for a given input instance. A fragile prediction, even with high confidence, is not trustworthy in light of the possibility of adversarial attacks. We propose that DNN models should produce a robustness value as an additional QoS indicator, along with the confidence value, for each prediction they make. Existing approaches for robustness computation are based on adversarial searching, which is usually too expensive to be exercised in real time. In this paper, we propose to predict, rather than to compute, the robustness measure for each input instance. Specifically, our approach inspects the output of the neurons of the target model and trains another DNN model to predict the robustness. We focus on convolutional neural network (CNN) models in the current research. Experiments show that our approach is accurate, with only 10%–34% additional errors compared with the offline heavy-weight robustness analysis. It also significantly outperforms some alternative methods. We further validate the effectiveness of the approach when it is applied to detect adversarial attacks and out-of-distribution input. Our approach demonstrates a better performance than, or at least is comparable to, the state-of-the-art techniques.
    EasyModel: A Refinement-Based Modeling and Verification Approach for Self-Adaptive Software
    De-Shuai Han, Qi-Liang Yang, Jian-Chun Xing, Guang-Lian Ma
    Journal of Data Acquisition and Processing, 2020, 35 (5): 1016-1046. 
    Self-adaptive software (SAS) is gaining popularity as it can reconfigure itself in response to the dynamic changes in the operational context or itself. However, early modeling and formal analysis of SAS systems become increasingly difficult, as the system scale and complexity are rapidly increasing. To tackle the modeling difficulty of SAS systems, we present a refinement-based modeling and verification approach called EasyModel. EasyModel integrates the intuitive Unified Modeling Language (UML) model with the stepwise refinement Event-B model. Concretely, EasyModel: 1) creates a UML profile called AdaptML that provides an explicit description of SAS characteristics, 2) proposes a refinement modeling mechanism for SAS systems that can deal with system modeling complexity, 3) offers a model transformation approach and bridges the gap between the design model and the formal model of SAS systems, and 4) provides an efficient way to verify and guarantee the correct behaviour of SAS systems. To validate EasyModel, we present an example application and a subject-based experiment. The results demonstrate that EasyModel can effectively reduce the modeling and formal verification difficulty of SAS systems, and can incorporate the intuitive merit of UML and the correct-by-construction merit of Event-B.
    Computer Networks and Distributed Computing
    Detecting Anomalous Bus-Driving Behaviors from Trajectories
    Zhao-Yang Wang, Bei-Hong Jin, Tingjian Ge, Tao-Feng Xue
    Journal of Data Acquisition and Processing, 2020, 35 (5): 1047-1063. 
    In urban transit systems, discovering anomalous bus-driving behaviors in time is an important technique for monitoring the safety risk of public transportation and improving the satisfaction of passengers. This paper proposes a two-phase approach named Cygnus to detect anomalous driving behaviors from bus trajectories, which utilizes sensor data collected from smartphones as well as subjective assessments from bus passengers gathered by crowd sensing. By optimizing support vector machines, Cygnus discovers the anomalous bus trajectory candidates in the first phase, and in the second phase distinguishes real anomalies from the candidates and identifies the types of driving anomalies. To improve the anomaly detection performance and robustness, Cygnus introduces virtual labels of trajectories and proposes a correntropy-based policy to improve the robustness to noise, combines the unsupervised anomaly detection and supervised classification, and further refines the classification procedure, thus forming an integrated and practical solution. Extensive experiments are conducted on real-world bus trajectories. The experimental results demonstrate that Cygnus detects anomalous bus-driving behaviors in an effective, robust, and timely manner.
    Fault-Tolerant Hamiltonicity and Hamiltonian Connectivity of BCube with Various Faulty Elements
    Gui-Juan Wang, Cheng-Kuan Lin, Jian-Xi Fan, Jing-Ya Zhou, Bao-Lei Cheng
    Journal of Data Acquisition and Processing, 2020, 35 (5): 1064-1083. 
    BCube is an important kind of data center network. Hamiltonicity and Hamiltonian connectivity have significant applications in communication networks. So far, there have been many results concerning fault-tolerant Hamiltonicity and fault-tolerant Hamiltonian connectivity in some data center networks. However, these results only consider faulty edges and faulty servers. In this paper, we study the fault-tolerant Hamiltonicity and the fault-tolerant Hamiltonian connectivity of BCube(n, k) when considering faulty servers, faulty links/edges, and faulty switches. For any integers n ≥ 2 and k ≥ 0, let BC_{n,k} be the logic structure of BCube(n, k) and F be the union of faulty elements of BC_{n,k}. Let f_v, f_e, and f_s be the number of faulty servers, faulty edges, and faulty switches of BCube(n, k), respectively. We show that BC_{n,k} - F is fault-tolerant Hamiltonian if f_v + f_e + (n-1)f_s ≤ (n-1)(k+1) - 2 and BC_{n,k} - F is fault-tolerant Hamiltonian-connected if f_v + f_e + (n-1)f_s ≤ (n-1)(k+1) - 3. To the best of our knowledge, this paper is the first work which takes faulty switches into account to study the fault-tolerant Hamiltonicity and the fault-tolerant Hamiltonian connectivity in data center networks.
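    The two sufficient conditions stated in the abstract can be checked with simple arithmetic. The sketch below only evaluates those inequalities for given fault counts; it does not construct BCube or search for Hamiltonian cycles, and the function names are hypothetical. Because the conditions are sufficient, a `False` result means the stated bound gives no guarantee, not that the property fails.

```python
def weighted_faults(n, fv, fe, fs):
    """Left-hand side of both bounds: fv + fe + (n - 1) * fs."""
    return fv + fe + (n - 1) * fs


def _check_parameters(n, k):
    if n < 2 or k < 0:
        raise ValueError("the result is stated for n >= 2 and k >= 0")


def within_hamiltonian_bound(n, k, fv, fe, fs):
    """True if fv + fe + (n-1)fs <= (n-1)(k+1) - 2."""
    _check_parameters(n, k)
    return weighted_faults(n, fv, fe, fs) <= (n - 1) * (k + 1) - 2


def within_hamiltonian_connected_bound(n, k, fv, fe, fs):
    """True if fv + fe + (n-1)fs <= (n-1)(k+1) - 3."""
    _check_parameters(n, k)
    return weighted_faults(n, fv, fe, fs) <= (n - 1) * (k + 1) - 3


# BCube(4, 2): the budgets are 3 * 3 - 2 = 7 and 3 * 3 - 3 = 6.
# One faulty switch counts as n - 1 = 3 units of the budget.
cycle_ok = within_hamiltonian_bound(4, 2, fv=2, fe=2, fs=1)
connected_ok = within_hamiltonian_connected_bound(4, 2, fv=2, fe=2, fs=1)
```

    With two faulty servers, two faulty edges, and one faulty switch in BCube(4, 2), the weighted fault count is 7, which meets the first bound but exceeds the second.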
    A Spatiotemporal Causality Based Governance Framework for Noisy Urban Sensory Data
    Bi-Ying Yan, Chao Yang, Pan Deng, Qiao Sun, Feng Chen, Yang Yu
    Journal of Data Acquisition and Processing, 2020, 35 (5): 1084-1098. 
    Urban sensing is one of the fundamental building blocks of urban computing. It uses various types of sensors deployed in different geospatial locations to continuously and cooperatively monitor the natural and cultural environment in urban areas. Nevertheless, issues such as uneven distribution, low sampling rate and high failure ratio of sensors often make their readings less reliable. This paper provides an innovative framework to detect the noisy data as well as to repair them from a spatial-temporal causality perspective rather than to deal with them individually. This can be achieved by connecting data through monitored objects, using the Skip-gram model to estimate spatial correlation and long short-term memory to estimate temporal correlation. The framework consists of three major modules: 1) a space embedded Bidirectional Long Short-Term Memory (BiLSTM)-based sequence labeling module to detect the noisy data and the latent missing data; 2) a space embedded BiLSTM-based sequence predicting module calculating the value of the missing data; 3) an object characteristics fusion repairing module to correct the spatially and temporally dislocated sensory data. The approach is evaluated with real-world data collected by over 3,000 electronic traffic bayonet devices at citywide scale in a medium-sized city in China, and the result is superior to those of several referenced approaches. With a 12.9% improvement in data accuracy over the raw data, the proposed framework plays a significant role in various real-world use cases in urban governance, such as criminal investigation, traffic violation monitoring, and equipment maintenance.
    Minimum Time Extrema Estimation for Large-Scale Radio-Frequency Identification Systems
    Xiao-Jun Zhu, Li-Jie Xu, Xiao-Bing Wu, Bing Chen
    Journal of Data Acquisition and Processing, 2020, 35 (5): 1099-1114. 
    We consider the extrema estimation problem in large-scale radio-frequency identification (RFID) systems, where there are thousands of tags and each tag contains a finite value. The objective is to design an extrema estimation protocol with the minimum execution time. Because the standard binary search protocol wastes much time due to interframe overhead, we propose a parameterized protocol and treat the number of slots in a frame as an unknown parameter. We formulate the problem and show how to find the best parameter to minimize the worst-case execution time. Finally, we propose two rules to further reduce the execution time. The first is to find and remove redundant frames. The second is to concatenate a frame from minimum value estimation with a frame from maximum value estimation to reduce the total number of frames. Simulations show that, in a typical scenario, the proposed protocol reduces execution time by 79% compared with the standard binary search protocol.
    Artificial Intelligence and Pattern Recognition
    Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node
    Nuo Qun, Hang Yan, Xi-Peng Qiu, Xuan-Jing Huang
    Journal of Data Acquisition and Processing, 2020, 35 (5): 1115-1126. 
    Semi-Markov conditional random fields (Semi-CRFs) have been successfully utilized in many segmentation problems, including Chinese word segmentation (CWS). The advantage of Semi-CRF lies in its inherent ability to exploit properties of segments instead of individual elements of sequences. Despite its theoretical advantage, Semi-CRF is still not the best choice for CWS because its computational complexity is quadratic in the sentence length. In this paper, we propose a simple yet effective framework to help Semi-CRF achieve comparable performance with CRF-based models under similar computational complexity. Specifically, we first adopt a bi-directional long short-term memory (BiLSTM) on character level to model the context information, and then use a simple but effective fusion layer to represent the segment information. Besides, to model arbitrarily long segments within linear time complexity, we also propose a new model named Semi-CRFRelay. The direct modeling of segments makes the combination with word features easy and the CWS performance can be enhanced merely by adding publicly available pre-trained word embeddings. Experiments on four popular CWS datasets show the effectiveness of our proposed methods. The source codes and pre-trained embeddings of this paper are available on https://github.com/fastnlp/fastNLP/.
    Comparison Between Deep Learning Models and Traditional Machine Learning Approaches for Facial Expression Recognition in Ageing Adults
    Andrea Caroppo, Alessandro Leone, Pietro Siciliano
    Journal of Data Acquisition and Processing, 2020, 35 (5): 1127-1146. 
    Facial expression recognition is one of the most active research areas in computer vision, since facial expression is one of the non-verbal communication channels through which one understands the mood or mental state of a person. Thus, it has been used in various fields such as human-robot interaction, security, computer graphics animation, and ambient assistance. Nevertheless, it remains a challenging task since existing approaches lack generalizability and almost all studies ignore the effects of facial attributes, such as age, on expression recognition even though research indicates that facial expression manifestation varies with age. Recently, much progress has been made on this topic, and great improvements in the classification task were achieved with the emergence of deep learning methods. Such approaches have shown how hierarchies of features can be directly learned from original data, thus avoiding classical hand-designed feature extraction methods that generally rely on manual operations with labelled data. However, research papers systematically exploring the performance of existing deep architectures for the task of classifying expression of ageing adults are absent in the literature. In the present work, an attempt is made to fill this gap by considering three recent deep convolutional neural network models (VGG-16, AlexNet and GoogLeNet/Inception V1) and evaluating their performance on four benchmark datasets (FACES, Lifespan, CIFE, and FER2013), which also contain facial expressions performed by elderly subjects. As a baseline for comparison, two traditional machine learning approaches based on handcrafted feature extraction are evaluated on the same datasets. An exhaustive and rigorous experimentation was carried out, focused on the concept of “transfer learning”, which consists of replacing the output level of the deep architectures considered with new output levels appropriate to the number of classes (facial expressions), and training three different classifiers (i.e., Random Forest, Support Vector Machine and Linear Regression). The VGG-16 deep architecture in combination with the Random Forest classifier was found to be the best in terms of accuracy for each dataset and for each considered age group. Moreover, the experimentation stage showed that the deep learning approach significantly improves on the baseline approaches considered, and the most noticeable improvement was obtained when considering facial expressions of ageing adults.
    Regular Paper
    Machine Learning Techniques for Software Maintainability Prediction: Accuracy Analysis
    Sara Elmidaoui, Laila Cheikhi, Ali Idri, Alain Abran
    Journal of Data Acquisition and Processing, 2020, 35 (5): 1147-1174. 
    Maintaining software once implemented on the end-user side is laborious and, over its lifetime, is most often considerably more expensive than the initial software development. The prediction of software maintainability has emerged as an important research topic to address industry expectations for reducing costs, in particular, maintenance costs. Researchers and practitioners have been working on proposing and identifying a variety of techniques ranging from statistical to machine learning (ML) for better prediction of software maintainability. This review has been carried out to analyze the empirical evidence on the accuracy of software product maintainability prediction (SPMP) using ML techniques. This paper analyzes and discusses the findings of 77 selected studies published from 2000 to 2018 according to the following criteria: maintainability prediction techniques, validation methods, accuracy criteria, overall accuracy of ML techniques, and the techniques offering the best performance. The review process followed the well-known systematic review process. The results show that ML techniques are frequently used in predicting maintainability. In particular, artificial neural network (ANN), support vector machine/regression (SVM/R), regression & decision trees (DT), and fuzzy & neuro fuzzy (FNF) techniques are more accurate in terms of PRED and MMRE. The N-fold and leave-one-out cross-validation methods, and the MMRE and PRED accuracy criteria are frequently used in empirical studies. In general, ML techniques outperformed non-machine learning techniques, e.g., regression analysis (RA) techniques, while FNF outperformed SVM/R, DT, and ANN in most experiments. However, while many techniques were reported superior, no specific one can be identified as the best.
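    The two accuracy criteria named in the abstract have standard definitions: MMRE is the mean magnitude of relative error, and PRED(q) is the fraction of predictions whose relative error is at most q. The sketch below illustrates them under those standard definitions; the function names and the sample values are hypothetical and are not taken from the reviewed studies.

```python
def mre(actual, predicted):
    """Magnitude of relative error for one observation (actual must be nonzero)."""
    return abs(actual - predicted) / abs(actual)


def mmre(actuals, predictions):
    """Mean magnitude of relative error; lower is better."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals, predictions, q=0.25):
    """PRED(q): share of predictions with MRE <= q; higher is better."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(1 for e in errors if e <= q) / len(errors)


# Made-up maintainability values, for illustration only.
actuals = [100, 200, 50]
predictions = [110, 150, 50]
overall_mmre = mmre(actuals, predictions)      # mean of 0.10, 0.25, 0.00
pred_25 = pred(actuals, predictions, q=0.25)   # all three are within 25%
```

    A technique is usually considered more accurate when it has a lower MMRE and a higher PRED at a fixed threshold such as 0.25 or 0.30.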
    Evaluating and Improving Linear Regression Based Profiling: On the Selection of Its Regularization
    Xiang-Jun Lu, Chi Zhang, Da-Wu Gu, Jun-Rong Liu, Qian Peng, Hai-Feng Zhang
    Journal of Data Acquisition and Processing, 2020, 35 (5): 1175-1197. 
    Side-channel attacks (SCAs) play an important role in the security evaluation of cryptographic devices. As a form of SCA, profiled differential power analysis (DPA) is among the most powerful and efficient, taking advantage of a profiling phase that learns features from a controlled device. Linear regression (LR) based profiling, a special profiling method proposed by Schindler et al., could be extended to generic-emulating DPA by on-the-fly profiling. The formal extension, named the SLR-based method, was proposed by Whitnall et al. Later, to improve the SLR-based method, Wang et al. introduced a method based on ridge regression. However, the constant format of the L2 penalty still limits the performance of profiling. In this paper, we generalize the ridge-based method and propose a new strategy of using variable regularization. We then analyze from a theoretical point of view why we should not use a constant penalty format for all cases. Roughly speaking, our work reveals the underlying mechanism of how different formats affect the profiling process in the side-channel context. Therefore, by selecting a proper regularization, we could push the limits of LR-based profiling. Finally, we conduct simulation-based and practical experiments to confirm our analysis. Specifically, the results of our practical experiments show that the proper formats of regularization are different among real devices.
    Evaluating and Constraining Hardware Assertions with Absent Scenarios
    Hui-Na Chao, Hua-Wei Li, Xiaoyu Song, Tian-Cheng Wang, Xiao-Wei Li
    Journal of Data Acquisition and Processing, 2020, 35 (5): 1198-1216. 
    Mining from simulation data of the golden model in hardware design verification is an effective solution to assertion generation. Since the simulation data is inherently incomplete, it is necessary to evaluate the truth values of the mined assertions. This paper presents an approach to evaluating and constraining hardware assertions with absent scenarios. A Belief-failRate metric is proposed to predict the truth/falseness of generated assertions. By considering both the occurrences of free variable assignments and the conflicts of absent scenarios, we use the metric to rank true assertions higher and false assertions lower. Our Belief-failRate guided assertion constraining method improves the quality of generated assertions. The experimental results show that the Belief-failRate framework performs better than the existing methods. In addition, the assertion evaluating and constraining procedure can find more assertions that cover new design functionality in comparison with the previous methods.
    Item Cold-Start Recommendation with Personalized Feature Selection
    Yi-Fan Chen, Xiang Zhao, Jin-Yuan Liu, Bin Ge, Wei-Ming Zhang
    Journal of Data Acquisition and Processing, 2020, 35 (5): 1217-1230. 
    The problem of recommending new items to users (often referred to as item cold-start recommendation) remains a challenge due to the absence of users' past preferences for these items. Item features from side information are typically leveraged to tackle the problem. Existing methods formulate regression methods, taking item features as input and user ratings as output. These methods are confronted with the issue of overfitting when item features are high-dimensional, which greatly impedes the recommendation experience. Availing of high-dimensional item features, in this work, we opt for feature selection to solve the problem of recommending top-N new items. Existing feature selection methods find a common set of features for all users, which fails to differentiate users' preferences over item features. To personalize feature selection, we propose to select item features discriminately for different users. We study the personalization of feature selection at the level of the user or user group. We fulfill the task by proposing two embedded feature selection models. The process of personalized feature selection filters out the dimensions that are irrelevant to recommendations or unappealing to users. Experimental results on real-life datasets with high-dimensional side information reveal that the proposed method is effective in singling out features that are crucial to top-N recommendation and hence improving performance.
Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China

E-mail: info@sjcjycl.cn
  Copyright ©2015 JCST, All Rights Reserved