Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Distributed by:
China: All Local Post Offices
  • Table of Contents
      05 July 2009, Volume 24 Issue 4   
    Special Section on Selected Papers from WAIM2008
    Self-Switching Classification Framework for Titled Documents
    Hang Guo, Li-Zhu Zhou, Member, ACM, and Ling Feng, Member, CCF, ACM, IEEE
    Journal of Data Acquisition and Processing, 2009, 24 (4): 615-625. 
    Abstract   PDF(447KB) ( 2235 )  

    Ambiguous words are words with multiple meanings, such as "apple" and "window". In text classification they are usually removed by feature reduction methods such as Information Gain. Sometimes there are too many ambiguous words in the corpus to throw them all away, as when classifying documents from the Web. In this paper we present a method to classify titled documents with the help of ambiguous words. Titled documents are documents with a simple structure consisting of a title and an excerpt; news items, messages, and paper abstracts with titles are examples. Instead of introducing another feature reduction method, we describe a framework that makes the best use of ambiguous words in titled documents. The framework improves the performance of a traditional bag-of-words classifier with the help of a bag-of-word-pairs classifier, and is implemented using one of the most popular classifiers, Multinomial Naive Bayes (MNB), as an example. Experiments on three real-life datasets show that within our framework the MNB model performs much better than the traditional MNB classifier and a naive weighted algorithm that simply puts more weight on words in the title.
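
    As a concrete illustration of the two representations the framework draws on, the sketch below builds bag-of-words and bag-of-word-pairs features for a titled document. The function names and the all-pairs scheme are hypothetical simplifications for illustration, not the authors' exact feature construction.

```python
def bag_of_words(title, excerpt):
    """Plain unigram features over the whole titled document."""
    return (title + " " + excerpt).lower().split()

def bag_of_word_pairs(title, excerpt):
    """Pair each title word with each excerpt word, so that an
    ambiguous excerpt word (e.g. "apple") is read in the context
    of the title it appears under."""
    return [(t, w)
            for t in title.lower().split()
            for w in excerpt.lower().split()]

words = bag_of_words("Apple earnings", "apple posts record profit")
pairs = bag_of_word_pairs("Apple earnings", "apple posts record profit")
```

    Either feature list can then be fed to a multinomial model such as MNB; the word pairs let the classifier distinguish "apple" under a finance title from "apple" under a recipe title.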

    Combining Local Scoring and Global Aggregation to Rank Entities for Deep Web Queries
    Yue Kou, Student Member, CCF, De-Rong Shen, Senior Member, CCF, Ge Yu, Senior Member, CCF, Member, ACM, IEEE, and Tie-Zheng Nie, Student Member, CCF
    Journal of Data Acquisition and Processing, 2009, 24 (4): 626-637. 
    Abstract   PDF(1250KB) ( 2169 )  

    With the rapid growth of Web databases, it is necessary to extract and integrate large-scale data available in the Deep Web automatically. Current Web search engines conduct page-level ranking, which is becoming inadequate for entity-oriented vertical search. In this paper, we present an entity-level ranking mechanism called LG-ERM for Deep Web queries, based on local scoring and global aggregation. Unlike traditional approaches, LG-ERM considers more ranking factors, including the uncertainty of entity extraction, the style information of the entities, the importance of the Web sources, and the entity relationships. By combining local scoring and global aggregation in ranking, query results become more accurate and effective in meeting users' needs. Experiments demonstrate the feasibility and effectiveness of the key techniques of LG-ERM.
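
    The split between local scoring and global aggregation can be sketched as follows. The scoring formula and the weights are hypothetical stand-ins for LG-ERM's actual factors, shown only to make the two-stage structure concrete.

```python
def local_score(extraction_confidence, style_score):
    """Score an entity within a single Web source, combining the
    uncertainty of entity extraction with style information.
    (A hypothetical product combination, for illustration only.)"""
    return extraction_confidence * style_score

def global_rank(observations):
    """Aggregate per-source local scores into a global ranking,
    weighting each source by its importance.
    observations: list of (entity, source_importance, local_score)."""
    totals = {}
    for entity, importance, score in observations:
        totals[entity] = totals.get(entity, 0.0) + importance * score
    return sorted(totals, key=totals.get, reverse=True)

ranking = global_rank([
    ("iPhone 3GS", 1.0, local_score(0.9, 0.8)),   # trusted source
    ("iPhone 3GS", 0.5, local_score(0.6, 0.9)),   # weaker source
    ("iPod nano",  0.5, local_score(0.9, 0.9)),
])
```

    An entity seen in several important sources accumulates evidence across them, which is the intended effect of the global aggregation step.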

    Constraints-Aware Scheduling for Transactional Services Composition
    An Liu, Hai Liu, Qing Li, Member, CCF, Senior Member, IEEE, Liu-Sheng Huang, Senior Member, CCF, and Ming-Jun Xiao, Member, CCF
    Journal of Data Acquisition and Processing, 2009, 24 (4): 638-651. 
    Abstract   PDF(833KB) ( 2204 )  

    Composite Web services need transactional support to guarantee consistent and reliable execution. Due to the long-running and inter-organizational characteristics of Web services, current approaches to transactional Web services composition adopt a compensation mechanism to maintain atomicity. A common assumption is that a compensation operation can be applied at any time with no cost. However, compensation operations are typically associated with temporal and cost constraints, which make the compensation mechanism problematic in this new environment. To address this problem, we distinguish two types of scheduling for transactional Web services composition: time-aware scheduling and cost-aware scheduling. We devise several scheduling algorithms that ensure the atomicity of composite services when compensation operations have temporal constraints, and that help composite services maintain atomicity with minimum compensation cost when compensation operations have cost constraints. We benchmark our algorithms by simulation, and the results show that they decrease the compensation cost and in turn improve the QoS of transactional services composition.
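
    To make the temporal-constraint case concrete, the sketch below checks whether a set of completed steps can still all be compensated before their individual deadlines, assuming compensations run sequentially. It uses the classic earliest-deadline-first order; this is a simplification we introduce for illustration, not one of the paper's algorithms.

```python
def compensation_feasible(now, steps):
    """steps: list of (compensation_duration, deadline) pairs.
    Running compensations in earliest-deadline-first order is
    optimal for sequential deadline feasibility, so if EDF misses
    a deadline, no order meets them all."""
    t = now
    for duration, deadline in sorted(steps, key=lambda s: s[1]):
        t += duration
        if t > deadline:
            return False
    return True
```

    A composition engine could run such a check before committing to abort-and-compensate, falling back to forward recovery when compensation can no longer meet its temporal constraints.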

    System Π: A Native RDF Repository Based on the Hypergraph Representation for RDF Data Model
    Gang Wu, Juan-Zi Li, Member, CCF, ACM, Jian-Qiang Hu, and Ke-Hong Wang, Member, CCF
    Journal of Data Acquisition and Processing, 2009, 24 (4): 652-664. 
    Abstract   PDF(773KB) ( 2450 )  

    RDF is the data interchange layer for the Semantic Web. In order to manage the increasing amount of RDF data, an RDF repository should provide not only the necessary scalability and efficiency, but also sufficient inference capabilities. Though existing RDF repositories have made progress towards these goals, there is still ample room for improving the overall performance. In this paper, we propose a native RDF repository, System Π, to pursue a better tradeoff among system scalability, query efficiency, and inference capabilities. System Π takes a hypergraph representation of RDF as the data model for its persistent storage, which effectively avoids the cost of data model transformation when accessing RDF data. Based on this native storage scheme, a set of efficient semantic query processing techniques is designed. First, several indices are built to accelerate RDF data access, including a value index, a labeling scheme for transitive closure computation, and three triple indices. Second, we propose a hybrid inference strategy under the adopted semantics to support inference for OWL-Lite with relatively low computational complexity. Finally, we extend the SPARQL algebra to express inference semantics explicitly in the logical query plan by defining new algebra operators. In addition, MD5 hashes of URIs and a schema-level cache are introduced as practical implementation techniques. Performance evaluation on the LUBM benchmark and a real data set shows that System Π achieves a better combined metric value than other comparable systems.
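
    The triple indices mentioned above can be pictured as permutations of (subject, predicate, object), each serving a different access pattern with constant-time dictionary lookups. The sketch below is a generic illustration of that idea, not System Π's actual storage layout.

```python
class TripleIndex:
    """Store each RDF triple under three orderings -- SPO, POS and
    OSP -- so that lookups bound on different components are fast."""
    def __init__(self):
        self.spo, self.pos, self.osp = {}, {}, {}

    def add(self, s, p, o):
        self.spo.setdefault(s, {}).setdefault(p, set()).add(o)
        self.pos.setdefault(p, {}).setdefault(o, set()).add(s)
        self.osp.setdefault(o, {}).setdefault(s, set()).add(p)

    def objects(self, s, p):       # answers (s, p, ?o)
        return self.spo.get(s, {}).get(p, set())

    def subjects(self, p, o):      # answers (?s, p, o)
        return self.pos.get(p, {}).get(o, set())

idx = TripleIndex()
idx.add("ex:alice", "rdf:type", "ex:Student")
idx.add("ex:bob", "rdf:type", "ex:Student")
idx.add("ex:alice", "ex:knows", "ex:bob")
```

    A SPARQL triple pattern picks whichever permutation has its bound components as a prefix, which is why native stores keep several redundant orderings.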

    Database and Knowledge-Based Systems
    Improved Integrity Constraints Checking in Distributed Databases by Exploiting Local Checking
    Ali A. Alwan, Hamidah Ibrahim, and Nur Izura Udzir
    Journal of Data Acquisition and Processing, 2009, 24 (4): 665-674. 
    Abstract   PDF(512KB) ( 1974 )  

    Most previous studies on checking integrity constraints in distributed databases derive simplified forms of the initial integrity constraints with the sufficiency property, since a sufficient test is known to be cheaper than the complete test of the initial integrity constraint: it involves less data transfer across the network and can always be evaluated at the target site (a single site). These studies are limited, as they depend strictly on the assumption that an update operation is executed at the site where the relation specified in the update is located, which is not always true. Hence the sufficient test, proven by previous studies to be a local test, is no longer appropriate. This paper proposes an approach to checking integrity constraints in a distributed database by exploiting as much as possible the local information stored at the target site. The approach derives support tests as an alternative to the existing complete and sufficient tests proposed by previous researchers, with the intention of increasing the amount of local checking regardless of the location of the submitted update operation. Several analyses were performed to evaluate the approach, and the results show that support tests benefit distributed databases where local constraint checking can be achieved.

    Cooperative Answering of Fuzzy Queries
    Narjes Hachani, Mohamed Ali Ben Hassine, Hanène Chettaoui, and Habib Ounelli
    Journal of Data Acquisition and Processing, 2009, 24 (4): 675-686. 
    Abstract   PDF(1024KB) ( 2203 )  

    The majority of existing information systems deal with crisp data through crisp database systems. Traditional database management systems (DBMSs) do not take imprecision into account, so they lack a certain flexibility: queries retrieve only the elements that precisely match the given Boolean query. That is, an element belongs to the result if the query is true for that element; otherwise, no answers are returned to the user. The aim of this paper is to present a cooperative approach to handling empty answers of fuzzy conjunctive queries by drawing on Formal Concept Analysis (FCA) theory and fuzzy logic. We present an architecture that combines FCA and databases. The processing of fuzzy queries allows detecting the minimal reasons for empty answers. We also use a concept lattice to provide the user with the nearest answers in the case of query failure.
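
    The "minimal reasons" of an empty answer are the minimal subsets of query predicates whose conjunction already matches nothing. The brute-force sketch below enumerates them directly; the paper derives them through the concept lattice instead, and the predicate names and data here are purely illustrative.

```python
from itertools import combinations

def minimal_empty_reasons(predicates, rows):
    """Return the minimal subsets of predicates whose conjunction
    has an empty answer over rows. Brute force, for illustration."""
    def empty(subset):
        return not any(all(p(row) for p in subset) for row in rows)
    reasons = []
    for k in range(1, len(predicates) + 1):
        for combo in combinations(predicates, k):
            # Skip supersets of an already-found (hence minimal) reason.
            if any(set(r) <= set(combo) for r in reasons):
                continue
            if empty(combo):
                reasons.append(combo)
    return reasons

rows = [{"age": 25, "salary": 900}, {"age": 48, "salary": 400}]
young   = lambda r: r["age"] < 30
senior  = lambda r: r["age"] > 40
wealthy = lambda r: r["salary"] > 2000
reasons = minimal_empty_reasons([young, senior, wealthy], rows)
```

    Here the query fails for two independent reasons: no row is wealthy, and no row is both young and senior. Reporting these to the user is more cooperative than an empty result set.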

    Prefetching J+-Tree: A Cache-Optimized Main Memory Database Index Structure
    Hua Luan, Student Member, CCF, Xiao-Yong Du, Member, CCF, and Shan Wang, Senior Member, CCF, Member, ACM
    Journal of Data Acquisition and Processing, 2009, 24 (4): 687-707. 
    Abstract   PDF(762KB) ( 2469 )  

    As the speed gap between main memory and modern processors continues to widen, cache behavior becomes more important for main memory database systems (MMDBs). Indexing is a key component of MMDBs. Unfortunately, the predominant indexes --- B+-trees and T-trees --- have been shown to utilize the cache poorly, which has triggered the development of many cache-conscious indexes, such as CSB+-trees and pB+-trees. Most of these cache-conscious indexes are variants of conventional B+-trees and have better cache performance than B+-trees. In this paper, we develop a novel J+-tree index, inspired by the Judy structure, an associative array data structure, and propose a more cache-optimized index --- the Prefetching J+-tree (pJ+-tree) --- which applies prefetching to the J+-tree to accelerate range scan operations. The J+-tree stores all keys in its leaf nodes and keeps the reference values of the leaf nodes in a Judy structure, so the J+-tree not only retains the advantages of Judy (such as fast single-value search) but also outperforms it in other respects; for example, J+-trees achieve better performance on range queries than Judy. The pJ+-tree exploits prefetching techniques to further improve the cache behavior of J+-trees and yields a speedup of 2.0 on range scans. Our extensive experimental study shows that, compared with B+-trees, CSB+-trees, pB+-trees, and T-trees, pJ+-trees provide better performance in both time (search, scan, and update) and space.

    Cache-Conscious Data Cube Computation on a Modern Processor
    Hua Luan, Student Member, CCF, Xiao-Yong Du, Member, CCF, and Shan Wang, Senior Member, CCF, Member, ACM
    Journal of Data Acquisition and Processing, 2009, 24 (4): 708-722. 
    Abstract   PDF(690KB) ( 2074 )  

    Data cube computation is an important problem in the field of data warehousing and OLAP (online analytical processing). Although it has been studied extensively in the past, most cubing algorithms are designed without considering CPU and cache behavior. In this paper, we first propose a cache-conscious cubing approach called CC-Cubing to efficiently compute data cubes on a modern processor. This method enhances both CPU and cache performance. It adopts an integrated depth-first and breadth-first partitioning order and partitions multiple dimensions simultaneously. The partitioning scheme improves data spatial locality and increases the utilization of cache lines. Software prefetching techniques are then applied in the sorting phase to hide the expensive cache misses associated with data scans. In addition, a cache-aware method is used in CC-Cubing to switch the sort algorithm dynamically. Our performance study shows that CC-Cubing outperforms BUC, Star-Cubing, and MM-Cubing in most cases. Then, in order to fully utilize an SMT (simultaneous multithreading) processor, we present a thread-based CC-Cubing-SMT method. This parallel method improves on the single-threaded CC-Cubing algorithm by up to 27%.
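
    For reference, the computation being optimized is the full data cube: one aggregation per subset of dimensions. The naive sketch below makes a separate scan per group-by; CC-Cubing's contribution is ordering and partitioning this work so that it stays cache-friendly, which this sketch deliberately ignores.

```python
from itertools import combinations

def compute_cube(rows, dims, measure):
    """Naive full cube: aggregate `measure` for every group-by over
    every subset of `dims`. Returns {dim_subset: {key: total}}."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = {}
            for row in rows:
                key = tuple(row[d] for d in group)
                agg[key] = agg.get(key, 0) + row[measure]
            cube[group] = agg
    return cube

cube = compute_cube(
    [{"city": "A", "year": 1, "sales": 10},
     {"city": "A", "year": 2, "sales": 5},
     {"city": "B", "year": 1, "sales": 3}],
    ("city", "year"), "sales")
```

    With d dimensions there are 2^d group-bys, so the scan order and data layout dominate the cost, which is exactly what cache-conscious cubing algorithms exploit.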

    Optimization Techniques for RFID Complex Event Processing
    Hai-Long Liu, Student Member, CCF, Qun Chen, Member, CCF, and Zhan-Huai Li, Senior Member, CCF
    Journal of Data Acquisition and Processing, 2009, 24 (4): 723-733. 
    Abstract   PDF(667KB) ( 2350 )  

    One research problem crucial to wider adoption of Radio Frequency Identification (RFID) technology is how to efficiently transform sequences of RFID readings into meaningful business events. In contrast to traditional events, RFID readings are usually of high volume and velocity, and carry attributes representing the objects read, the occurrence times, and the reading spots. Based on these characteristics and the Non-deterministic Finite Automata (NFA) implementation framework, this paper studies the performance issues of RFID complex event processing and proposes corresponding optimization techniques: (1) taking advantage of negation events or exclusiveness between events to prune intermediate results, thus reducing memory consumption; (2) reordering the join operations between events according to the different selectivities of complex events, improving overall efficiency and achieving higher stream throughput; and (3) utilizing a slot-based or B+-tree-based approach to optimize processing performance under the time-window constraint. We present analytical results for these techniques and validate their effectiveness through experiments.
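
    The NFA-style evaluation that these optimizations target can be sketched as follows: partial matches advance through the pattern as readings arrive, and the time window prunes stale runs. This is a minimal unoptimized sketch, not the paper's engine.

```python
def match_sequence(events, pattern, window):
    """Match SEQ(pattern) over a time-ordered stream of (type, time)
    readings under a time-window constraint, NFA style."""
    runs = []          # partial matches: (next_index, start_time, matched)
    results = []
    for etype, t in events:
        new_runs = []
        for idx, start, seq in runs:
            if t - start > window:          # window expired: prune run
                continue
            if etype == pattern[idx]:       # advance this partial match
                if idx + 1 == len(pattern):
                    results.append(seq + [(etype, t)])
                else:
                    new_runs.append((idx + 1, start, seq + [(etype, t)]))
            else:
                new_runs.append((idx, start, seq))
        if etype == pattern[0]:             # start a new run
            if len(pattern) == 1:
                results.append([(etype, t)])
            else:
                new_runs.append((1, t, [(etype, t)]))
        runs = new_runs
    return results

matches = match_sequence([("A", 1), ("B", 2)], ["A", "B"], 5)
late = match_sequence([("A", 1), ("B", 10)], ["A", "B"], 5)
```

    The pruning step is where the paper's techniques bite: negation and exclusiveness discard runs earlier, and slot- or B+-tree-based structures find expired runs without scanning them all.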

    Artificial Intelligence
    Edge-Aware Level Set Diffusion and Bilateral Filtering Reconstruction for Image Magnification
    Hua Huang, Senior Member, CCF, Member, IEEE, Yu Zang, Senior Member, CCF, Member, IEEE, Paul L. Rosin, and Chun Qi, Senior Member, CCF
    Journal of Data Acquisition and Processing, 2009, 24 (4): 734-744. 
    Abstract   PDF(22131KB) ( 2530 )  

    In this paper we propose an image magnification reconstruction method. In recent years many interpolation algorithms have been proposed for image magnification, but each suffers from defects to some degree, such as jaggies and blurring. To address these problems, we propose a post-processing step consisting of edge-aware level set diffusion and bilateral filtering. After the initial interpolation, the contours of the image are identified. Next, edge-aware level set diffusion is applied to these significant contours to remove jaggies, followed by bilateral filtering at the same locations to reduce the blurring created by the initial interpolation and the level set diffusion. Together these steps produce sharp contours without jaggies while preserving the details of the image. Results show that the overall RMS error of our method barely increases while contour smoothness and sharpness are substantially improved.
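
    To see why bilateral filtering can reduce blur without destroying edges, consider the 1-D sketch below: each sample is averaged with its neighbours, weighted by both spatial distance and intensity difference, so the averaging effectively stops at an edge. This is the textbook filter shown for intuition, not the paper's full reconstruction pipeline.

```python
import math

def bilateral_filter_1d(signal, sigma_s, sigma_r, radius):
    """1-D bilateral filter: weights fall off with spatial distance
    (sigma_s) and with intensity difference (sigma_r), so samples
    across an edge contribute almost nothing."""
    out = []
    for i, v in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)
                         - ((signal[j] - v) ** 2) / (2 * sigma_r ** 2))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out

# A noisy step edge: smoothing flattens each side but keeps the step.
smoothed = bilateral_filter_1d([0, 2, 1, 100, 99, 101],
                               sigma_s=2.0, sigma_r=5.0, radius=2)
```

    With a small range parameter sigma_r, the two sides of the step are smoothed independently, which is the edge-preserving behaviour the post-processing relies on.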

    Facial Expression Recognition of Various Internal States via Manifold Learning
    Young-Suk Shin, Member, IEEE
    Journal of Data Acquisition and Processing, 2009, 24 (4): 745-752. 
    Abstract   PDF(3306KB) ( 2543 )  

    Emotions are becoming increasingly important in human-centered interaction architectures, and recognition of facial expressions, which are central to human-computer interaction, is both natural and desirable. However, facial expressions convey mixed emotions that are continuous rather than discrete and vary from moment to moment. This paper presents a novel method for recognizing facial expressions of various internal states via manifold learning, in service of human-centered interaction studies. We first give a critical review of widely used emotion models, and then extract facial expression features of various internal states via locally linear embedding (LLE). Recognition is carried out in the pleasure-displeasure and arousal-sleep dimensions of a two-dimensional model of emotion. The recognition results show that internal-state expressions mapped into the embedding space by the LLE algorithm effectively capture the structure of the two-dimensional model of emotion. Our research thus establishes that the relationships among facial expressions of various internal states can be elaborated in the two-dimensional model of emotion via the locally linear embedding algorithm.

    A Logic-Program-Based Negotiation Mechanism
    Wu Chen, Ming-Yi Zhang, and Mao-Nian Wu
    Journal of Data Acquisition and Processing, 2009, 24 (4): 753-760. 
    Abstract   PDF(410KB) ( 1994 )  

    This paper presents a logic-program-based mechanism for negotiation between two agents. In this mechanism an extended logic program (ELP) is regarded as an agent. The negotiation process between two agents is then modelled as multiple encounters between two ELPs, each of which selects an answer set as its initial demand. Both agents mutually revise their original demands by accepting part of the opponent's demand and/or giving up part of their own. The overall dynamics can be regarded as mutual updates between two extended logic programs. A procedure for reaching an appropriate negotiation solution is put forward, and conditions for the existence and termination of an appropriate negotiation are given. Properties of a negotiation solution are discussed, including its weak Pareto optimality.

    Computer Network and Internet
    Survivability Evaluation in Large-Scale Mobile Ad-Hoc Networks
    San-Cheng Peng, Student Member, CCF, Wei-Jia Jia, Member, ACM, Senior Member, IEEE, and Guo-Jun Wang, Senior Member, CCF
    Journal of Data Acquisition and Processing, 2009, 24 (4): 761-774. 
    Abstract   PDF(525KB) ( 1772 )  

    Survivability refers to the ability of a network system to fulfill critical services in a timely manner to end users in the presence of failures and/or attacks. In order to establish a highly survivable system, it is necessary to measure its survivability to evaluate the performance of the system's services under adverse conditions. According to survivability requirements of large-scale mobile ad-hoc networks (MANETs), we propose a novel model for quantitative evaluation on survivability. The proposed model considers various types of faults and connection states of mobile hosts, and uses the continuous time Markov chain (CTMC) to describe the survivability of MANETs in a precise manner. We introduce the reliability theory to perform quantitative analysis and survivability evaluation of segment-by-segment routing (SSR), multipath-based segment-by-segment routing (MP-SSR), and segment-by-segment-based multipath routing (SS-MPR) in large-scale MANETs. The proposed model can be used to analyze the network performance much more easily than a simulation-based approach. Numerical validation shows that the proposed model can be used to obtain a better evaluation result on the survivability of large-scale MANETs.
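
    For small state spaces, the CTMC machinery underlying such models reduces to solving the global balance equations πQ = 0 together with Σπ = 1. The sketch below does this for a toy two-state up/down model with pure-Python Gaussian elimination; it illustrates only the mathematics, not the paper's survivability model.

```python
def steady_state(Q):
    """Steady-state distribution of a CTMC with generator matrix Q
    (rows sum to 0): solve pi Q = 0 with sum(pi) = 1 by replacing
    one balance equation with the normalization constraint."""
    n = len(Q)
    A = [[Q[j][i] for j in range(n)] for i in range(n)]   # transpose
    b = [0.0] * n
    A[n - 1] = [1.0] * n                                  # sum(pi) = 1
    b[n - 1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / A[r][r]
    return x

# Two-state up/down model: failure rate 1, repair rate 2.
pi = steady_state([[-1.0, 1.0], [2.0, -2.0]])
```

    The resulting π gives the long-run fraction of time spent in each state; in survivability models, summing π over the states in which critical services remain available yields the evaluation metric.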

    Cost-Sensitive and Load-Balancing Gateway Placement in Wireless Mesh Networks with QoS Constraints
    Feng Zeng, Student Member, CCF, and Zhi-Gang Chen, Member, CCF
    Journal of Data Acquisition and Processing, 2009, 24 (4): 775-785. 
    Abstract   PDF(694KB) ( 2441 )  

    In wireless mesh networks (WMNs), gateway placement is key to network performance, QoS, and construction cost. This paper focuses on optimizing cost and load balance in the gateway placement strategy while ensuring QoS requirements. First, we define a metric for load balance on the gateways and formulate the minimum-cost, load-balancing gateway placement problem. Second, we propose two placement algorithms. One is a cost-sensitive heuristic that selects gateway candidates according to the capacity/cost ratio of the nodes and optimizes the load balance on the gateways through scanning and shifting; the other is a genetic algorithm that can find the global optimal solution. The two algorithms differ in computing complexity and solution quality, and thus provide a trade-off for WMN design. Finally, simulations show that both algorithms outperform existing ones: compared with OPEN/CLOSE, the average cost of the gateway placements they generate is decreased by 8%~32%, and the load variance on the gateways by 77%~86%. For the genetic algorithm, the performance improvement comes at the price of increased CPU execution time.
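
    The capacity/cost candidate-selection step of the heuristic can be sketched greedily as below. The node model and the stopping rule are hypothetical simplifications: the actual algorithm also enforces QoS constraints and rebalances load by scanning and shifting.

```python
def place_gateways(nodes, demand):
    """Greedy sketch: pick gateway candidates in decreasing
    capacity/cost ratio until total capacity covers the demand.
    nodes: list of (name, capacity, cost)."""
    chosen, covered = [], 0
    for name, capacity, cost in sorted(nodes,
                                       key=lambda n: n[1] / n[2],
                                       reverse=True):
        if covered >= demand:
            break
        chosen.append(name)
        covered += capacity
    return chosen

gateways = place_gateways([("g1", 10, 2), ("g2", 8, 1), ("g3", 4, 4)],
                          demand=15)
```

    Favouring capacity per unit cost keeps the placement cheap; the load-balancing pass then redistributes traffic among the chosen gateways.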

    Algorithm and Complexity
    Combination of Qualitative Information with 2-Tuple Linguistic Representation in DSmT
    Xin-De Li, Member, IEEE, Florentin Smarandache, Jean Dezert, and Xian-Zhong Dai
    Journal of Data Acquisition and Processing, 2009, 24 (4): 786-797. 
    Abstract   PDF(538KB) ( 2033 )  

    Modern systems for information retrieval, fusion, and management increasingly need to deal with information coming from human experts, usually expressed qualitatively in natural language with linguistic labels. In this paper, we propose and use two new 2-tuple linguistic representation models (a distribution function model (DFM) and an improved Herrera-Martínez model) jointly with the fusion rules developed in Dezert-Smarandache Theory (DSmT), in order to efficiently combine qualitative information expressed in terms of qualitative belief functions. Both models preserve precision and improve the efficiency of fusing linguistic information that expresses a global expert opinion, but DFM is more general and efficient than the improved Herrera-Martínez model, especially for unbalanced linguistic labels. Simple examples show how the 2-tuple qualitative fusion rules are performed and what their advantages are.
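
    For readers unfamiliar with the 2-tuple representation, the standard Herrera-Martínez translation functions are sketched below: a value β on a label scale s_0..s_g becomes a pair (s_i, α) with i = round(β) and symbolic translation α = β − i in [−0.5, 0.5). This is the textbook definition, shown for orientation only.

```python
def to_2tuple(beta):
    """Delta(beta) = (s_i, alpha): i is the closest label index,
    alpha the symbolic translation left over."""
    i = int(round(beta))
    return i, beta - i

def from_2tuple(i, alpha):
    """Inverse transformation: Delta^{-1}(s_i, alpha) = i + alpha."""
    return i + alpha

label, alpha = to_2tuple(2.3)   # -> label s_2, translation 0.3
```

    Because the pair (s_i, α) carries the exact numeric value, aggregation results can be converted back without the rounding loss of a plain label index, which is the precision-preservation property the abstract refers to.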

    A New Fuzzy Set Theory Satisfying All Classical Set Formulas
    Qing-Shi Gao, Xiao-Yu Gao, and Yue Hu
    Journal of Data Acquisition and Processing, 2009, 24 (4): 798-804. 
    Abstract   PDF(425KB) ( 4589 )  

    A new fuzzy set theory, C-fuzzy set theory, is introduced in this paper. It is a particular case of the classical set theory and satisfies all formulas of the classical set theory. Adding to the C-fuzzy set system the limitation that all fuzzy sets must be "non-uniformly inclusive" of each other yields a family of sub-systems, the Z-fuzzy set family. It can be proved that the Z0-fuzzy set system, one of the Z-fuzzy set systems, is equivalent to Zadeh's fuzzy set system. Analysis shows that: 1) it is inappropriate that Zadeh's fuzzy set system defines the relations A = B and A ⊆ B between two fuzzy sets A and B pointwise on the membership functions, as μA(x) = μB(x) and μA(x) ≤ μB(x) respectively, because this makes all fuzzy sets "non-uniformly inclusive"; 2) it is also inappropriate to define the union and intersection of two fuzzy sets as the max and min of their grades of membership, because this prevents fuzzy sets from correctly reflecting the different kinds of fuzzy phenomena in the natural world. One then has to work around the problem by inventing unnatural functions that are hard to understand, such as replacing max and min for union and intersection with min{a+b, 1} and max{a+b−1, 0}, but these functions are incorrect in the inclusive case; and if both pairs of definitions are used together, not only are they unnatural, but they are still unable to cover all possible set relationships in the natural world; and 3) it is incorrect to define the set complement as μ(x) = 1 − μA(x), because it can be proved that the set complement cannot exist in Zadeh's fuzzy set system, and this definition causes confusion in logic and thinking. It is seriously mistaken to believe that the logic of fuzzy sets necessarily goes against classical and normal thinking, logic, and conception. The C-fuzzy set theory proposed in this paper overcomes all of the above errors and shortcomings, and more reasonably reflects fuzzy phenomena in the natural world. It satisfies all relations, formulas, and operations of the classical set theory, and is consistent with normal, natural, and classical thinking, logic, and concepts.
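
    The disputed operations are easy to state in code. The sketch below gives Zadeh's max/min operations and the 1 − μ complement, and demonstrates the behaviour the abstract objects to: for a constant membership of 0.5, A ∪ A^c also has membership 0.5 rather than 1, so the classical law of the excluded middle fails.

```python
def zadeh_union(mu_a, mu_b):
    """Union as the pointwise max of membership grades."""
    return lambda x: max(mu_a(x), mu_b(x))

def zadeh_intersection(mu_a, mu_b):
    """Intersection as the pointwise min of membership grades."""
    return lambda x: min(mu_a(x), mu_b(x))

def zadeh_complement(mu_a):
    """Complement as 1 - mu_A(x)."""
    return lambda x: 1.0 - mu_a(x)

mu = lambda x: 0.5                    # a maximally fuzzy set
excluded_middle = zadeh_union(mu, zadeh_complement(mu))   # A or not-A
contradiction = zadeh_intersection(mu, zadeh_complement(mu))  # A and not-A
```

    Classically, A ∪ A^c should be the whole space (membership 1) and A ∩ A^c empty (membership 0); here both come out at 0.5, which is the kind of departure from classical set formulas that C-fuzzy set theory is designed to avoid.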


E-mail: info@sjcjycl.cn
  Copyright ©2015 JCST, All Rights Reserved