Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Distributed by:
China: All Local Post Offices
  • Table of Contents
      20 November 2020, Volume 35 Issue 6   
    Tao Xie, Zhi Jin, Xuan-Dong Li, Gang Huang, Hausi A. Muller, Jun Pang, Li-Jun Zhang
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1231-1233. 
    ProSy: API-Based Synthesis with Probabilistic Model
    Bin-Bin Liu, Wei Dong, Jia-Xin Liu, Ya-Ting Zhang, Dai-Yan Wang
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1234-1257. 
    Program synthesis is an exciting topic that aims to generate programs satisfying user intent automatically. However, in most cases, only small programs for simple or domain-specific tasks can be synthesized. The major obstacle to synthesis lies in the huge search space. A common practice in addressing this problem is using a domain-specific language, while many approaches still aim to synthesize programs in general programming languages. With the rapid growth of reusable libraries, component-based synthesis provides a promising way forward, such as synthesizing Java programs composed solely of APIs (application programming interfaces). However, efficiently searching for proper solutions to complex tasks remains a challenge. Given an unfamiliar programming task, programmers would search for API usage knowledge in various coding resources to reduce the search space. Considering this, we propose a novel approach named ProSy to synthesize API-based programs in Java. The key novelty is to retrieve related knowledge from Javadoc and Stack Overflow and then construct a probabilistic reachability graph. It assigns higher probabilities to APIs that are more likely to be used in implementing the given task. In the synthesis process, the program sketch with a higher probability is considered first; thus, the number of explored reachable paths is decreased. Some extension and optimization strategies are further studied in the paper. We implement our approach and conduct several experiments on it. We compare ProSy with SyPet and other state-of-the-art API-based synthesis approaches. The experimental results show that ProSy reduces the synthesis time of SyPet by up to 80%.
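    As a loose illustration of the probability-guided search idea (not ProSy's actual algorithm or data structures; the API graph below is made up), a best-first search can enumerate API-call sketches in descending order of probability by ordering paths on the sum of their negative log-probabilities:

```python
import heapq
import math

def best_first_sketches(api_graph, start, target, k=3):
    """Enumerate type-reachable API call chains from `start` to `target`,
    highest-probability sketches first (path probabilities multiply, so we
    order by the sum of negative log-probabilities)."""
    heap = [(0.0, start, [])]          # (cost, current type, APIs used so far)
    results = []
    while heap and len(results) < k:
        cost, ty, path = heapq.heappop(heap)
        if ty == target and path:
            results.append((math.exp(-cost), path))
            continue
        for api, out_ty, prob in api_graph.get(ty, []):
            if len(path) < 4:          # bound the sketch length
                heapq.heappush(heap, (cost - math.log(prob), out_ty, path + [api]))
    return results

# Hypothetical toy graph: edges are (API name, result type, probability).
graph = {
    "String": [("Paths.get", "Path", 0.7), ("new File", "File", 0.3)],
    "Path":   [("Files.readAllLines", "List<String>", 0.9)],
    "File":   [("new FileReader", "Reader", 0.8)],
}
for p, sketch in best_first_sketches(graph, "String", "List<String>"):
    print(round(p, 2), " -> ".join(sketch))
```

    Because high-probability sketches surface first, low-probability paths are never expanded when the budget k is reached early, which mirrors the abstract's point about decreasing the number of explored reachable paths.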
    Learning Human-Written Commit Messages to Document Code Changes
    Yuan Huang, Nan Jia, Hao-Jie Zhou, Xiang-Ping Chen, Zi-Bin Zheng, Ming-Dong Tang
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1258-1277. 
    Commit messages are important complementary information for understanding code changes. To address message scarcity, several approaches have been proposed to generate commit messages automatically. However, most of these approaches focus on generating a summary of the changed software entities at a superficial level, without considering the intent behind the code changes (e.g., the existing approaches cannot generate a message such as “fixing null pointer exception”). Considering that developers often describe the intent behind a code change when writing the message, we propose ChangeDoc, an approach that reuses existing messages in version control systems for automatic commit message generation. Our approach includes syntax, semantic, pre-syntax, and pre-semantic similarities. For a given commit without a message, it is able to discover the most similar past commit in a large commit repository and recommend its message as the message of the given commit. Our repository contains half a million commits collected from SourceForge. We evaluate our approach on commits from 10 projects. The results show that 21.5% of the messages recommended by ChangeDoc can be used directly without modification, and 62.8% require only minor modifications. To evaluate the quality of the commit messages recommended by ChangeDoc, we performed two empirical studies involving a total of 40 participants (10 professional developers and 30 students). The results indicate that the recommended messages are very good approximations of the ones written by developers and often include important intent information that is not included in the messages generated by other tools.
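    A minimal stand-in for the retrieval step can be sketched as follows; token-level Jaccard similarity replaces ChangeDoc's four syntax/semantic similarities, and the commits are made up:

```python
def tokens(diff):
    """Crude lexical tokenization of a diff."""
    return set(diff.lower().split())

def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend_message(new_diff, past_commits):
    """Return the message of the past commit whose diff is most similar
    to the new, message-less diff."""
    new_toks = tokens(new_diff)
    best = max(past_commits, key=lambda c: jaccard(new_toks, tokens(c["diff"])))
    return best["message"]

past = [
    {"diff": "if (ptr != null) ptr.close();", "message": "fix null pointer exception"},
    {"diff": "cache.put(key, value);",        "message": "add result caching"},
]
print(recommend_message("if (conn != null) conn.close();", past))
```

    Reusing a human-written message this way is what lets the recommended text carry intent (“fix null pointer exception”) rather than a mechanical summary of changed entities.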
    Automatically Identifying Calling-Prone Higher-Order Functions of Scala Programs to Assist Testers
    Yi-Sen Xu, Xiang-Yang Jia, Fan Wu, Lingbo Li, Ji-Feng Xuan
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1278-1294. 
    For the rapid development of internetware, functional programming languages, such as Haskell and Scala, can be used to implement complex domain-specific applications. In functional programming languages, a higher-order function is a function that takes functions as parameters or returns a function. Using higher-order functions in programs can increase the generality and reduce the redundancy of source code. To test a higher-order function, a tester needs to check the requirements and write another function as the test input. However, due to the complex structure of higher-order functions, testing them is a time-consuming and labor-intensive task. Testers have to spend a considerable amount of manual effort testing all higher-order functions. Such testing is infeasible if the time budget is limited, such as the period before a project release. In practice, not every higher-order function is actually called. We refer to higher-order functions that are about to be called as calling-prone ones. Calling-prone higher-order functions should be tested first. In this paper, we propose an automatic approach, namely Phof, which predicts whether a higher-order function in a Scala program will be called in the future, i.e., identifies calling-prone higher-order functions. Our approach can assist testers in reducing the number of higher-order functions of Scala programs under test. In Phof, we extract 24 features from source code and logs to train a predictive model based on known higher-order function calls. We empirically evaluated our approach on 4,832 higher-order functions from 27 real-world Scala projects. Experimental results show that Phof, based on the random forest algorithm and the Synthetic Minority Oversampling Technique (SMOTE), performs well in predicting calls of higher-order functions. Our work can be used to support the scheduling of limited test resources.
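    The core of SMOTE, the oversampling strategy the abstract names, is simple enough to sketch in a few lines: each synthetic minority sample lies on the segment between a real minority sample and one of its k nearest minority neighbours. This is a generic sketch with made-up feature vectors, not Phof's pipeline:

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples: pick a minority point x,
    pick one of its k nearest minority neighbours n, and interpolate
    x + t * (n - x) for a random t in [0, 1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        n = rng.choice(neighbours)
        t = rng.random()
        out.append(tuple(a + t * (b - a) for a, b in zip(x, n)))
    return out

# Hypothetical feature vectors of the minority class ("will be called").
called = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1)]
print(smote(called, n_new=2))
```

    Balancing the classes this way is what lets a random forest learn the rare "will be called" class instead of defaulting to the majority label.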
    Reachability of Patterned Conditional Pushdown Systems
    Xin Li, Patrick Gardy, Yu-Xin Deng, Hiroyuki Seki
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1295-1311. 
    Conditional pushdown systems (CPDSs) extend pushdown systems by associating each transition rule with a regular language over the stack alphabet. The goal is to model program verification problems that need to examine the runtime call stack of programs. Examples include security property checking of programs with stack inspection, compatibility checking of HTML5 parser specifications, etc. Esparza et al. proved that the reachability problem of CPDSs is EXPTIME-complete, which prevents the existence of an algorithm tractable for all instances in general. Driven by the practical applications of CPDSs, we study the reachability of patterned CPDSs (pCPDSs), a practically important subclass of CPDSs in which each transition rule carries a regular expression obeying certain patterns. First, we present new saturation algorithms for solving state and configuration reachability of pCPDSs. The algorithms exhibit exponential-time complexity in the size of atomic patterns in the worst case. Next, we show that the reachability of pCPDSs carrying simple patterns is solvable in fixed-parameter polynomial time and space. This answers the question of whether there exist tractable reachability analysis algorithms of CPDSs tailored for those practical instances that admit efficient solutions, such as stack inspection without exception handling. We have evaluated the proposed approach, and our experiments show that the pattern-driven algorithm scales steadily on pCPDSs with simple patterns.
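    For intuition, here is the classic pre* saturation procedure for plain (unconditional) pushdown systems, on which such algorithms are built; the paper's algorithms additionally track the regular pattern attached to each rule, which this sketch omits. Transitions of a P-automaton are saturated until a fixpoint:

```python
def pre_star(rules, trans):
    """Saturation for pre* of a plain pushdown system.
    rules: set of ((p, g), (p2, w)) meaning <p, g s> -> <p2, w s>, |w| <= 2;
    trans: P-automaton transitions {(state, stack symbol, state)}.
    Rule: if (p, g) -> (p2, w) and the automaton has a path p2 --w--> q,
    add the transition p --g--> q."""
    trans = set(trans)
    changed = True
    while changed:
        changed = False
        for (p, g), (p2, w) in rules:
            targets = set()
            if len(w) == 0:            # pop: the empty path ends in p2 itself
                targets.add(p2)
            elif len(w) == 1:          # replace: follow one letter from p2
                targets = {q for (s, a, q) in trans if s == p2 and a == w[0]}
            else:                      # push: follow two letters from p2
                mid = {q for (s, a, q) in trans if s == p2 and a == w[0]}
                targets = {q for (s, a, q) in trans if s in mid and a == w[1]}
            for q in targets:
                if (p, g, q) not in trans:
                    trans.add((p, g, q))
                    changed = True
    return trans

rules = {
    (("q", "b"), ("q", ())),          # pop:  <q, b s> -> <q, s>
    (("p", "a"), ("q", ())),          # pop:  <p, a s> -> <q, s>
    (("q", "c"), ("p", ("a", "b"))),  # push: <q, c s> -> <p, a b s>
}
# Automaton with a single (final) state "q", initially accepting only <q, eps>.
print(sorted(pre_star(rules, set())))
```

    The saturated automaton accepts exactly the configurations that can reach an accepted one, e.g. ⟨q, c⟩ via ⟨q, c⟩ ⇒ ⟨p, ab⟩ ⇒ ⟨q, b⟩ ⇒ ⟨q, ε⟩.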
    Specification and Verification of the Zab Protocol with TLA+
    Jia-Qi Yin, Hui-Biao Zhu, Yuan Fei
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1312-1323. 
    ZooKeeper Atomic Broadcast (Zab) is an atomic broadcast protocol specially designed for ZooKeeper that additionally supports crash recovery. The protocol has been widely adopted by well-known Internet companies, but there are few studies on its correctness and credibility, and thus we utilize formal methods to study its correctness. In this paper, Zab, Paxos, and Raft are analyzed and compared to help better understand the Zab protocol. Then we model the Zab protocol with TLA+ and verify three properties abstracted from the specification by the model checker TLC, including two liveness properties and one safety property. The experimental results show that the design of the protocol conforms to the original requirements. This paper fills a gap in the formal analysis of the Zab protocol.
    Modelling and Verification of Real-Time Publish and Subscribe Protocol Using UPPAAL and Simulink/Stateflow
    Qian-Qian Lin, Shu-Ling Wang, Bo-Hua Zhan, Bin Gu
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1324-1342. 
    The Real-Time Publish and Subscribe (RTPS) protocol implements message exchange over an unreliable transport in the Data Distribution Service (DDS). Formal modelling and verification of the protocol provide stronger guarantees of its correctness and efficiency than testing alone. In this paper, we build formal models for the RTPS protocol using Uppaal and Simulink/Stateflow. Modelling using Simulink/Stateflow allows analyzing the protocol through simulation, as well as generating executable code. Modelling using Uppaal allows us to verify properties of the model stated in TCTL (Timed Computation Tree Logic), as well as to estimate its performance using statistical model checking. We further describe a procedure for translation from Stateflow to timed automata, where a subset of the major features of Stateflow is supported, and prove the soundness statement that the Stateflow model is a refinement of the translated timed automata model. As a consequence, any property in a certain fragment of TCTL that we have verified for the timed automata model in Uppaal is preserved for the original Stateflow model.
    Jupiter Made Abstract, and Then Refined
    Heng-Feng Wei, Rui-Ze Tang, Yu Huang, Jian Lv
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1343-1364. 
    Collaborative text editing systems allow multiple users to concurrently edit the same document, which can be modeled by a replicated list object. In the literature, there is a family of operational transformation (OT)-based Jupiter protocols for replicated lists, including AJupiter, XJupiter, and CJupiter. They are hard to understand due to the subtle OT technique, and little work has been done on formal verification of complete Jupiter protocols. Worse still, they use quite different data structures; it is unclear how they relate to each other, and it would be laborious to verify each Jupiter protocol separately. In this work, we make contributions towards a better understanding of Jupiter protocols and the relation among them. We first identify the key OT issue in Jupiter and present a generic solution. We summarize several techniques for carrying out the solution, including the data structures to maintain OT results and to guide OTs. Then, we propose an implementation-independent AbsJupiter protocol. Finally, we establish the (data) refinement relation among these Jupiter protocols (AbsJupiter included). We also formally specify and verify the family of Jupiter protocols and the refinement relation among them using TLA+ (TLA stands for “Temporal Logic of Actions”) and the TLC model checker. To our knowledge, this is the first work to formally specify and verify a family of OT-based Jupiter protocols and the refinement relation among them. It would be helpful to promote a rigorous study of OT-based protocols.
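    The OT technique at the heart of these protocols can be illustrated with the textbook inclusion transformation for character operations on a list. This is a generic sketch, not the Jupiter protocols' actual data structures; it shows how two replicas converge after exchanging concurrent operations:

```python
def transform(op1, op2):
    """Inclusion transform IT(op1, op2): rewrite op1 so that it applies
    correctly after concurrent op2 has already been applied.
    Operations are (kind, position, char) with kind "ins" or "del"."""
    kind1, pos1, ch1 = op1
    kind2, pos2, _ = op2
    if kind2 == "ins":                 # op2 inserted before/at our position
        if pos1 > pos2 or (pos1 == pos2 and kind1 == "del"):
            pos1 += 1
    else:                              # op2 deleted before our position
        if pos1 > pos2:
            pos1 -= 1
    return (kind1, pos1, ch1)

def apply(doc, op):
    kind, pos, ch = op
    return doc[:pos] + ch + doc[pos:] if kind == "ins" else doc[:pos] + doc[pos + 1:]

# Two replicas of "abc" act concurrently: replica 1 inserts "X" at 0,
# replica 2 deletes the "c" at index 2. Each applies the other's
# transformed operation and both converge to the same document.
o1, o2 = ("ins", 0, "X"), ("del", 2, "")
r1 = apply(apply("abc", o1), transform(o2, o1))
r2 = apply(apply("abc", o2), transform(o1, o2))
print(r1, r2)  # both replicas converge
```

    Without the transform, replica 1 would delete index 2 of "Xabc" and lose the wrong character; maintaining and reusing such transforms consistently is exactly the "key OT issue" the abstract refers to.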
    Verifying ReLU Neural Networks from a Model Checking Perspective
    Wan-Wei Liu, Fu Song, Tang-Hao-Ran Zhang, Ji Wang
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1365-1381. 
    Neural networks, as an important computing model, have wide application in the artificial intelligence (AI) domain. From the perspective of computer science, such a computing model requires a formal description of its behaviors, particularly the relation between input and output. In addition, such specifications ought to be verified automatically. ReLU (rectified linear unit) neural networks are intensively used in practice. In this paper, we present ReLU Temporal Logic (ReTL), whose semantics is defined with respect to ReLU neural networks and which can specify value-related properties of the network. We show that the model checking problem for the Σ2Π2 fragment of ReTL, which can express properties such as output reachability, is decidable in EXPSPACE. We have also implemented our algorithm in a prototype tool, and experimental results demonstrate the feasibility of the presented model checking approach.
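    The kind of output-reachability property such a logic expresses can be checked approximately by interval bound propagation, a standard, cheap over-approximation. This sketch is a generic technique, not the paper's decision procedure, and the one-layer network is made up:

```python
def relu_interval(layers, lo, hi):
    """Propagate an input box [lo, hi] through ReLU layers (W, b),
    yielding a sound over-approximation of the reachable outputs:
    if the target value lies outside the output box, it is unreachable."""
    for W, b in layers:
        new_lo, new_hi = [], []
        for row, bias in zip(W, b):
            # Minimize/maximize the affine form over the box, coordinate-wise.
            l = bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row))
            h = bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row))
            new_lo.append(max(l, 0.0))   # ReLU clamps below at 0
            new_hi.append(max(h, 0.0))
        lo, hi = new_lo, new_hi
    return lo, hi

# Hypothetical network: 2 inputs, one layer of 2 ReLU neurons.
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0])]
print(relu_interval(layers, lo=[0.0, 0.0], hi=[1.0, 1.0]))
```

    Interval propagation is incomplete (the box may be loose), which is why exact procedures such as the one in the paper incur much higher complexity.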
    Modular Verification of SPARCv8 Code
    Jun-Peng Zha, Xin-Yu Feng, Lei Qiao
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1382-1405. 
    Inline assembly code is common in system software to interact with the underlying hardware platforms. The safety and correctness of the assembly code are crucial to guarantee the safety of the whole system. In this paper, we propose a practical Hoare-style program logic for verifying SPARC (Scalable Processor Architecture) assembly code. The logic supports modular reasoning about the main features of the SPARCv8 ISA (instruction set architecture), including delayed control transfers, delayed writes to special registers, and register windows. It also supports relational reasoning for refinement verification. We have applied it to verify that there is a contextual refinement between a context switch routine in SPARCv8 and a switch primitive. The program logic and its soundness proof have been mechanized in Coq.
    Automatic Buffer Overflow Warning Validation
    Feng-Juan Gao, Yu Wang, Lin-Zhang Wang, Zijiang Yang, Xuan-Dong Li
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1406-1427. 
    Static buffer overflow detection techniques tend to report too many false positives, fundamentally due to the lack of software execution information. It is very time-consuming to manually inspect all the static warnings. In this paper, we propose BovInspector, a framework for automatically validating static buffer overflow warnings and providing suggestions for the automatic repair of true buffer overflow warnings in C programs. Given the program source code and the static buffer overflow warnings, BovInspector first performs warning reachability analysis. Then, BovInspector executes the source code symbolically under the guidance of reachable warnings. Each reachable warning is validated and classified by checking whether all the path conditions and the buffer overflow constraints can be satisfied simultaneously. For each validated true warning, BovInspector suggests automatic repairs based on 11 repair strategies. BovInspector is complementary to prior static buffer overflow discovery schemes. Experimental results on real open-source programs show that BovInspector can automatically validate on average 60% of the total warnings reported by static tools.
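    The validation criterion, a warning is a true positive iff its path conditions and overflow constraint are simultaneously satisfiable, can be sketched as follows. Brute force over a small input domain stands in for the paper's symbolic execution, and the two warnings are made up:

```python
def validate_warnings(warnings, domain):
    """Classify each warning as a true positive iff some input satisfies
    both its path condition and its buffer overflow constraint."""
    results = {}
    for name, path_cond, overflow in warnings:
        results[name] = any(path_cond(n) and overflow(n) for n in domain)
    return results

BUF = 8  # hypothetical destination buffer size
warnings = [
    # copy of n bytes into buf[8] guarded by `if (n < 16)`:
    # overflows whenever 8 <= n < 16, so this warning is a true positive.
    ("w1", lambda n: n < 16, lambda n: n >= BUF),
    # same copy guarded by `if (n < 8)`: the guard rules the overflow out,
    # so the static warning is a false positive.
    ("w2", lambda n: n < BUF, lambda n: n >= BUF),
]
print(validate_warnings(warnings, domain=range(32)))
```

    Symbolic execution replaces the brute-force loop with a constraint solver, so the same check scales to real path conditions over unbounded inputs.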
    Predicting Code Smells and Analysis of Predictions: Using Machine Learning Techniques and Software Metrics
    Mohammad Y. Mhawish, Manjari Gupta
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1428-1445. 
    Code smell detection is essential for improving software quality, enhancing software maintainability, and decreasing the risk of faults and failures in a software system. In this paper, we propose a code smell prediction approach based on machine learning techniques and software metrics. The local interpretable model-agnostic explanations (LIME) algorithm is further used to explain the machine learning models' predictions and improve their interpretability. The datasets obtained from Fontana et al. were reformed and used to build binary-label and multi-label datasets. The results of 10-fold cross-validation show that the performance of tree-based algorithms (mainly random forest) is higher than that of kernel-based and network-based algorithms. Genetic-algorithm-based feature selection methods enhance the accuracy of these machine learning algorithms by selecting the most relevant features in each dataset. Moreover, parameter optimization based on the grid search algorithm significantly enhances the accuracy of all these algorithms. Overall, machine learning techniques have high potential for predicting code smells, which contributes to detecting these smells and enhancing software quality.
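    The grid search tuning credited with the accuracy gains is an exhaustive sweep over a parameter grid. A minimal sketch, with a made-up scoring function standing in for cross-validated accuracy:

```python
from itertools import product

def grid_search(param_grid, score):
    """Exhaustive grid search: evaluate every combination of parameter
    values and return the best-scoring configuration."""
    names = sorted(param_grid)
    return max(
        (dict(zip(names, vals)) for vals in product(*(param_grid[n] for n in names))),
        key=score,
    )

# Hypothetical grid for a random forest; the toy objective peaks at
# n_trees=200, max_depth=8 (in practice `score` is cross-validated accuracy).
grid = {"n_trees": [50, 100, 200], "max_depth": [4, 8, 16]}
score = lambda p: -abs(p["n_trees"] - 200) - 10 * abs(p["max_depth"] - 8)
print(grid_search(grid, score))
```

    The sweep is embarrassingly parallel but grows multiplicatively with each added parameter, which is why grids are usually kept coarse.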
    Regular Paper
    Neural Explainable Recommender Model Based on Attributes and Reviews
    Yu-Yao Liu, Bo Yang, Hong-Bin Pei, Jing Huang
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1446-1460. 
    Explainable recommendation, which can provide reasonable explanations for recommendations, is increasingly important in many fields. Although traditional embedding-based models can learn many implicit features, resulting in good performance, they cannot provide the reason for their recommendations. Existing explainable recommender methods can be divided into two main types. The first type highlights reviews written by users to provide an explanation; the second type takes attribute information into consideration. These approaches each consider only one aspect and do not make the best use of the existing information. In this paper, we propose a novel neural explainable recommender model based on attributes and reviews (NERAR) that combines the processing of attribute features and review features. We employ a tree-based model to extract and learn attribute features from auxiliary information, and then we use a time-aware gated recurrent unit (T-GRU) to model user review features and process item review features based on a convolutional neural network (CNN). Extensive experiments on Amazon datasets demonstrate that our model outperforms state-of-the-art recommendation models in recommendation accuracy. The presented examples also show that our model can offer more reasonable explanations. Crowd-sourcing-based evaluations are conducted to verify our model's superiority in explainability.
    Topic Modeling Based Warning Prioritization from Change Sets of Software Repository
    Jung-Been Lee, Taek Lee, Hoh Peter In
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1461-1479. 
    Many existing warning prioritization techniques seek to reorder static analysis warnings so that true positives are presented first. However, an excessive amount of time is still required to investigate and fix the prioritized warnings, because some are not actually true positives or are irrelevant to the code context and topic. In this paper, we propose a warning prioritization technique that reflects the various latent topics of bug-related code blocks. Our main aim is to build a prioritization model that assigns separate warning priorities depending on the topic of the change sets, in order to identify more true positive warnings. For the performance evaluation of the proposed model, we employ a performance metric called the warning detection rate, widely used in warning prioritization studies, and compare the proposed model with other competitive techniques. Additionally, the effectiveness of our model is verified by applying our technique to eight industrial projects of a real global company.
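    Under its common definition (the paper's exact formulation may differ), the warning detection rate measures the fraction of all true-positive warnings found within a fixed inspection budget on the ranked list:

```python
def warning_detection_rate(ranked, inspected):
    """Fraction of all true-positive warnings found within the first
    `inspected` warnings of the ranking; `ranked` is a list of booleans
    (True = the warning turned out to be a real bug)."""
    total = sum(ranked)
    return sum(ranked[:inspected]) / total if total else 0.0

# Hypothetical ranking produced by a prioritization model.
ranked = [True, True, False, True, False, False]
print(warning_detection_rate(ranked, 3))
```

    A better prioritizer pushes True entries to the front, so its detection rate rises faster as the inspection budget grows.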


    Dr. G. Indra Navaroj, Mrs. A. Ananthakumari, Mrs. S. N. Sheebha
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1531-1540.
    G. Mallikarjun
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1541-1550.
    Vijay Kumar, Pratik Ranjan, Abha Kumari, Raj Anwit
    Journal of Data Acquisition and Processing, 2020, 35 (6): 1551-1556.




E-mail: info@sjcjycl.cn
  Copyright ©2015 JCST, All Rights Reserved