Journal of Data Acquisition and Processing

Table of Contents

05 September 2016, Volume 31 Issue 5

Special Section on Software Systems 2016

Preface

Tao Xie

Journal of Data Acquisition and Processing, 2016, 31 (5): 849-850.

Roundtable: Research Opportunities and Challenges for Large-Scale Software Systems

Xusheng Xiao, Jian-Guang Lou, Shan Lu, David C. Shepherd, Xin Peng, Qian-Xiang Wang

Journal of Data Acquisition and Processing, 2016, 31 (5): 851-860.

For this special section on software systems, six research leaders in the field, serving as guest editors, discuss important issues that will shape its future research directions. The essays in this roundtable article cover research opportunities and challenges for large-scale software systems, such as querying organization-wide software behaviors (Xusheng Xiao), logging and log analysis (Jian-Guang Lou), engineering reliable cloud distributed systems (Shan Lu), usage data (David C. Shepherd), clone detection and management (Xin Peng), and code search and beyond (Qian-Xiang Wang). (Tao Xie, Leading Editor of Software Systems)

Debugging Concurrent Software: Advances and Challenges

Jeff Huang, Charles Zhang

Journal of Data Acquisition and Processing, 2016, 31 (5): 861-868.

Concurrency debugging is an extremely important yet challenging problem that has been hampering developer productivity and software reliability in the multicore era. We have worked on this problem over the past eight years and have developed several effective methods and automated tools to help developers debug shared-memory concurrent programs. This article discusses challenges in concurrency debugging and summarizes our research contributions in four important directions: concurrency bug reproduction, detection, understanding, and fixing. It also discusses other recent advances in tackling these challenges.

Prioritizing Test Cases for Memory Leaks in Android Applications

Ju Qian, Di Zhou

Journal of Data Acquisition and Processing, 2016, 31 (5): 869-882.

Mobile applications can usually access only a limited amount of memory. Improper use of memory can cause memory leaks, which may lead to performance slowdowns or even cause applications to be killed unexpectedly. Although a large body of research has been devoted to techniques for diagnosing memory leaks after they have been discovered, it is still challenging to discover the leaks in the first place. Testing is the most widely used technique for failure discovery. However, traditional testing techniques are not directed at the discovery of memory leaks. They may spend a lot of time testing executions that are unlikely to leak and can therefore be inefficient. To address this problem, we propose a novel approach that prioritizes the test cases in a given test suite according to their likelihood of causing memory leaks. It first builds a prediction model, based on machine learning over selected code features, to determine whether a test can potentially lead to memory leaks. Then, each input test case is partly run to obtain its code features and predict its likelihood of causing leaks. The most suspicious test cases are suggested to run first in order to reveal memory leak faults as soon as possible. Experimental evaluation on several Android applications shows that our approach is effective.
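The prioritization step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, weights, and logistic scoring model below are hypothetical placeholders standing in for a model that would be learned from real code features.

```python
import math

# Hypothetical feature weights; in practice these would come from a trained model.
WEIGHTS = {
    "activity_switches": 0.8,
    "bitmap_allocations": 1.2,
    "listener_registrations": 0.5,
}
BIAS = -3.0


def leak_likelihood(features):
    """Map a test's code features to a score in (0, 1) with a logistic model."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def prioritize(test_features):
    """Return test names ordered from most to least likely to reveal a leak."""
    return sorted(
        test_features,
        key=lambda name: leak_likelihood(test_features[name]),
        reverse=True,
    )


# Hypothetical test suite: features gathered from a partial run of each test.
suite = {
    "t_login": {"activity_switches": 1},
    "t_gallery": {"bitmap_allocations": 4, "activity_switches": 2},
    "t_settings": {},
}
ordered = prioritize(suite)
print(ordered)
```

The design point is only that the ranking is driven by the predicted likelihood, so the tests most suspected of leaking are run first.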

Summarizing Software Artifacts: A Literature Review

Najam Nazar, Yan Hu, He Jiang

Journal of Data Acquisition and Processing, 2016, 31 (5): 883-909.

This paper presents a literature review in the field of summarizing software artifacts, focusing on bug reports, source code, mailing lists, and developer discussion artifacts. From Jan. 2010 to Apr. 2016, numerous summarization techniques, approaches, and tools have been proposed to satisfy the ongoing demand for improving software performance and quality and for helping developers understand the problems at hand. Since the aforementioned artifacts contain both structured and unstructured data, researchers have applied different machine learning and data mining techniques to generate summaries. This paper therefore first provides a general perspective on the state of the art, describing the types of artifacts, the approaches to summarization, and the common portions of experimental procedures shared among these artifacts. Moreover, we discuss the applications of summarization, i.e., what tasks have been achieved through summarization. Next, this paper presents tools that have been developed for summarization tasks or employed during them. In addition, we present the summarization evaluation methods employed in the selected studies, as well as other important factors used to evaluate generated summaries, such as adequacy and quality. We also briefly present modern communication channels and the complementarities and commonalities among different software artifacts. Finally, we discuss challenges applicable to the existing studies in general, as well as future research directions. This survey of existing studies will give future researchers broad and useful background knowledge on the main aspects of this research field.

What Security Questions Do Developers Ask? A Large-Scale Study of Stack Overflow Posts

Xin-Li Yang, David Lo, Xin Xia, Zhi-Yuan Wan, Jian-Ling Sun

Journal of Data Acquisition and Processing, 2016, 31 (5): 910-924.

Security has always been a popular and critical topic, and with the rapid development of information technology it continues to attract attention. However, since security has a long history, it covers a wide range of topics that change considerably over time, from classic cryptography to the recently popular mobile security. There is a need to investigate security-related topics and trends, which can serve as a guide for security researchers, educators, and practitioners. To address this need, we conduct a large-scale study of security-related questions on Stack Overflow. Stack Overflow is a popular online question and answer site where software developers communicate, collaborate, and share information with one another. Many different topics appear among the numerous questions posted on Stack Overflow, and security-related questions account for a large proportion of them. We first use two heuristics, based on the tags of the posts, to extract the security-related questions from the dataset. We then use an advanced topic model, Latent Dirichlet Allocation (LDA) tuned using a Genetic Algorithm (GA), to cluster the security-related questions based on their texts. After obtaining the topics of the security-related questions, we use their metadata to perform various analyses. We summarize all the topics into five main categories and also investigate the popularity and difficulty of the different topics. Based on the results of our study, we draw several implications for researchers, educators, and practitioners.

Critical Success Factors to Improve the Game Development Process from a Developer's Perspective

Saiqa Aleem, Luiz Fernando Capretz, Faheem Ahmed

Journal of Data Acquisition and Processing, 2016, 31 (5): 925-950.

The software game development industry is growing enormously and gaining importance day by day. This growth imposes severe pressure and a number of issues and challenges on the game development community. Game development is a complex process, and one important choice is to consider the developer's perspective in order to produce good-quality software games by improving the game development process. The objective of this study is to provide a better understanding of the developer's dimension as a factor in software game success. It focuses mainly on an empirical investigation of the effect of key developer factors on the software game development process and eventually on the quality of the resulting game. A quantitative survey was developed and conducted to identify key developer factors for an enhanced game development process, and it was used to test the research model and hypotheses. The results provide evidence that game development organizations must deal with multiple key factors to remain competitive and to handle high pressure in the software game industry. The main contribution of this paper is an empirical investigation of the influence of key developer factors on the game development process.

A Feature Model Based Framework for Refactoring Software Product Line Architecture

Mohammad Tanhaei, Jafar Habibi, Seyed-Hassan Mirian-Hosseinabadi

Journal of Data Acquisition and Processing, 2016, 31 (5): 951-986.

Software product line (SPL) is an approach used to develop a range of software products with a high degree of similarity. In this approach, a feature model is usually used to keep track of similarities and differences. Over time, as modifications are made to the SPL, inconsistencies with the feature model could arise. The first approach to dealing with these inconsistencies is refactoring. Refactoring consists of small steps which, when accumulated, may lead to large-scale changes in the SPL, resulting in features being added to or eliminated from the SPL. In this paper, we propose a framework for refactoring SPLs, which helps keep SPLs consistent with the feature model. After some introductory remarks, we describe a formal model for representing the feature model. We express various refactoring patterns applicable to the feature model and the SPL formally, and then introduce an algorithm for finding them in the SPL. In the end, we use a real-world case study of an SPL to illustrate the applicability of the framework introduced in the paper.

Theory and Algorithms

An Efficient Approach for Solving Optimization over Linear Arithmetic Constraints

Li Chen, Jing-Zheng Wu, Yin-Run Lv, Yong-Ji Wang

Journal of Data Acquisition and Processing, 2016, 31 (5): 987-1011.

Satisfiability Modulo Theories (SMT) has been widely investigated over the last decade. Recently, researchers have extended SMT to the optimization problem over linear arithmetic constraints. To the best of our knowledge, Symba and OPT-MathSAT are the two most efficient solvers available for this problem. The key algorithms used by Symba and OPT-MathSAT consist of a loop of two procedures: 1) critical finding, which detects a critical point that is very likely to be globally optimal, and 2) global checking, which confirms that the critical point is indeed globally optimal. In this paper, we propose a new approach based on the Simplex method, which is widely used in operations research. Our fundamental idea is to find several critical points by constructing and solving a series of linear problems with the Simplex method. Our approach replaces the critical finding algorithms in Symba and OPT-MathSAT, reducing the runtime of critical finding and the number of executions of global checking. The correctness of our approach is proved. The experiment evaluates our implementation against Symba and OPT-MathSAT on a critical class of problems in real-time systems. Our approach outperforms Symba on 99.6% of benchmarks and is superior to OPT-MathSAT in large-scale cases where the number of tasks is more than 24. The experimental results demonstrate that our approach is competitive and has great potential for the optimization problem.

Secure Channel Free ID-Based Searchable Encryption for Peer-to-Peer Group

Xiao-Fen Wang, Yi Mu, Rongmao Chen, Xiao-Song Zhang

Journal of Data Acquisition and Processing, 2016, 31 (5): 1012-1027.

Data sharing and searching are important functionalities in cloud storage. In this paper, we show how to securely and flexibly search and share cloud data among a group of users without a group manager. We formalize a novel cryptosystem, secure channel free searchable encryption in a peer-to-peer group, which features secure cloud data sharing and searching for group members in an identity-based setting. Our scheme allows group members to join or leave the group dynamically. We present two schemes: a basic scheme and an enhanced scheme. We formally prove that our basic scheme achieves consistency and indistinguishability against the chosen keyword and ciphertext attack and the outsider's keyword guessing attack, respectively. The enhanced scheme achieves forward secrecy, which allows a user's search rights over previously shared data to be revoked.

Tolerating Permanent State Transition Faults in Asynchronous Sequential Machines

Jung-Min Yang

Journal of Data Acquisition and Processing, 2016, 31 (5): 1028-1037.

Corrective control theory lays a novel foundation for the fault-tolerant control of asynchronous sequential machines. In this paper, we present a corrective control scheme for tolerating permanent state transition faults in the dynamics of asynchronous sequential machines. When a fault occurs, the asynchronous machine may become stuck at a faulty state and stop responding to the external input. We analyze the detectability of the considered faults and present the necessary and sufficient condition for the existence of a controller that overcomes any permanent transition fault. Fault tolerance is realized by using potential reachability and asynchronous mechanisms in the machine. A case study on an asynchronous counter is provided to illustrate the proposed fault detection and tolerance scheme.

Computer Architecture and Systems

UiLog: Improving Log-Based Fault Diagnosis by Log Analysis

De-Qing Zou, Hao Qin, Hai Jin

Journal of Data Acquisition and Processing, 2016, 31 (5): 1038-1052.

In modern computer systems, system event logs have always been the primary source for checking system status. As computer systems become more and more complex, interactions between software and hardware become more frequent. The components generate an enormous amount of log information, including running reports and fault information. This volume of data is a great challenge for manual analysis. In this paper, we implement a log management and analysis system that helps system administrators understand the real-time status of the entire system, classify logs into different fault types, and determine the root cause of faults. In addition, we improve the existing fault correlation analysis method based on the results of system log classification. We apply the system in a cloud computing environment for evaluation. The results show that our system can classify fault logs automatically and effectively. With the proposed system, administrators can easily detect the root cause of faults.

Data Management and Data Mining

Topological Features Based Entity Disambiguation

Chen-Chen Sun, De-Rong Shen, Yue Kou, Tie-Zheng Nie, Ge Yu

Journal of Data Acquisition and Processing, 2016, 31 (5): 1053-1068.

This work proposes an unsupervised entity disambiguation solution based on topological features. Most existing studies leverage semantic information to resolve ambiguous references. However, semantic information is not always accessible because of privacy concerns, or it is too expensive to access. We consider the problem in a setting where only the relationships between references are available. A structure similarity algorithm via random walk with restarts is proposed to measure the similarity of references. Disambiguation is treated as a clustering problem, and a family of graph walk based clustering algorithms is introduced to group ambiguous references. We evaluate our solution extensively on two real datasets and show its advantage in accuracy over two state-of-the-art approaches.
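As a rough illustration of the random walk with restarts mentioned in this abstract, the sketch below computes proximity scores from one seed node on a small undirected graph by power iteration. It is the textbook formulation, not the paper's structure similarity algorithm; the graph, the node names, and the restart probability of 0.15 are assumptions chosen for the example.

```python
def rwr_scores(adj, seed, restart=0.15, max_iters=500, tol=1e-12):
    """Random walk with restart: stationary visit probabilities from `seed`.

    adj maps each node to a list of its neighbours. At every step the walker
    jumps back to `seed` with probability `restart`, otherwise it moves to a
    uniformly chosen neighbour of the current node.
    """
    nodes = list(adj)
    scores = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] = restart  # restart mass (total probability is 1)
        for u in nodes:
            nbrs = adj[u]
            if not nbrs:
                # A node with no neighbours sends its walkers back to the seed.
                nxt[seed] += (1.0 - restart) * scores[u]
                continue
            share = (1.0 - restart) * scores[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Hypothetical reference graph: a triangle (a, b, c) bridged to a chain (d, e).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
scores = rwr_scores(graph, "a")
print(sorted(scores, key=scores.get, reverse=True))
```

Nodes in the same tightly connected region as the seed receive higher scores than distant ones, which is the property a clustering step could exploit when only relationships between references are known.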