Bimonthly    Since 1986
ISSN 1004-9037
Indexed in:
SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
  • Table of Contents
      15 September 2002, Volume 17 Issue 5   
    Articles
    Progress in the Development of National Knowledge Infrastructure
    CAO Cungen (曹存根)
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(2609KB) ( 2214 )  
    This paper presents the recent progress in a long-term research project, called the National Knowledge Infrastructure (NKI). Initiated in early 2000, the project aims to develop a multi-domain shareable knowledge base for knowledge-intensive applications. To develop NKI, we have used domain-specific ontologies as a solid basis and have built more than 600 ontologies. Using these ontologies and our knowledge acquisition methods, we have extracted about 1.1 million domain assertions. For users to access the NKI knowledge, we have developed a uniform multi-modal human-knowledge interface. We have also implemented a knowledge application programming interface through which various applications can share the NKI knowledge.
    Compressed Data Cube for Approximate OLAP Query Processing
    FENG Yu (冯玉) and WANG Shan (王珊)
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(371KB) ( 1815 )  
    Approximate query processing has emerged as an approach to dealing with the huge data volumes and complex queries of the data warehouse environment. In this paper, we present a novel method that provides approximate answers to OLAP queries. Our method builds a compressed (approximate) data cube with a clustering technique and uses this compressed data cube to answer queries directly, thereby improving query performance. We also provide the query-answering algorithm and the confidence intervals of query results. An extensive experimental study with the OLAP Council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling.
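    A minimal sketch of the general idea of answering an aggregate query from cluster summaries. The Cluster fields and the estimator below are illustrative assumptions, not the paper's exact construction or its confidence-interval machinery.

```python
# Minimal sketch of answering an aggregate query from a compressed cube.
# The cluster summary (centroid, count, measure_sum) and the estimator are
# illustrative assumptions, not the paper's exact construction.
from dataclasses import dataclass

@dataclass
class Cluster:
    centroid: tuple       # one representative value per dimension
    count: int            # number of tuples summarized by the cluster
    measure_sum: float    # sum of the measure over the cluster

def approx_sum(clusters, predicate):
    """Estimate SUM(measure) over tuples matching `predicate`,
    crediting a whole cluster when its centroid matches."""
    return sum(c.measure_sum for c in clusters if predicate(c.centroid))

cube = [Cluster((2001, "east"), 500, 120_000.0),
        Cluster((2001, "west"), 300, 80_000.0)]
print(approx_sum(cube, lambda d: d[0] == 2001))  # 200000.0
```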
    Designing a Top-Level Ontology of Human Beings: A Multi-Perspective Approach
    TIAN Wen (田雯), GU Fang (顾芳) and CAO Cungen (曹存根)
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(599KB) ( 1721 )  
    Knowledge about human beings is an integral and significant part of any intelligent agent. Delimiting, modeling and acquiring such knowledge are the central topics of this paper. Because of the tremendous complexity of knowledge about human beings, we introduce a top-level ontology of human beings from the perspectives of psychology, sociology, physiology and pathology. This ontology is not only an explicit conceptualization of human beings, but also an efficient way of acquiring and organizing relevant knowledge.
    Ontology-Based Semantic Cache in AOKB
    ZHENG Hong (郑红), LU Ruqian (陆汝钤), JIN Zhi (金芝) and HU Sikang (胡思康)
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(311KB) ( 1528 )  
    When querying a large-scale knowledge base, a major technique for improving performance is to preload knowledge so as to minimize the number of round trips to the knowledge base. In this paper, an ontology-based semantic cache is proposed for an agent- and ontology-oriented knowledge base (AOKB). In AOKB, an ontology is the collection of relationships among a group of knowledge units (agents and/or other sub-ontologies). When some agent A is loaded, its relationships with other knowledge units are examined, and those that have a tight semantic tie with A are preloaded at the same time, including agents and sub-ontologies in the same ontology as A. The preloaded agents and ontologies are kept in a semantic cache in memory. Test results show that up to a 50% reduction in running time is achieved.
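    The following sketch illustrates the preloading idea on the cache hit/miss path. The `ties` map and the 0.7 threshold are hypothetical stand-ins for the paper's measure of semantic tie strength.

```python
# Minimal sketch of ontology-driven preloading, assuming a `ties` map that
# scores the semantic tie between knowledge units (not the paper's metric).
class SemanticCache:
    def __init__(self, kb, ties, threshold=0.7):
        self.kb = kb            # unit name -> knowledge unit (stub)
        self.ties = ties        # (unit, unit) -> tie strength in [0, 1]
        self.threshold = threshold
        self.cache = {}

    def load(self, name):
        if name in self.cache:                    # cache hit: no round trip
            return self.cache[name]
        unit = self.cache[name] = self.kb[name]   # fetch the requested agent
        for other in self.kb:                     # preload tightly tied units
            if other not in self.cache and \
               self.ties.get((name, other), 0.0) >= self.threshold:
                self.cache[other] = self.kb[other]
        return unit

kb = {"A": "agent A", "B": "agent B", "C": "agent C"}
cache = SemanticCache(kb, {("A", "B"): 0.9, ("A", "C"): 0.2})
cache.load("A")              # also preloads B, but not C
print(sorted(cache.cache))   # ['A', 'B']
```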
    Formal Ontology: Foundation of Domain Knowledge Sharing and Reusing
    LU Ruqian (陆汝钤) and JIN Zhi (金芝)
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(402KB) ( 2095 )  
    Domain analysis is the activity of identifying and representing the relevant information in a domain, so that the information can be shared and reused in similar systems. Until now, however, no efficient approaches have been available for capturing and representing the results of domain analysis, or for sharing and reusing the domain knowledge. This paper proposes an ontology-oriented approach to formalizing domain models. The architecture of the multiple-layer domain knowledge base is also discussed. Finally, some genetic algorithm-based methods are given for supporting knowledge sharing and reusing.
    Relationship Between Support Vector Set and Kernel Functions in SVM
    ZHANG Ling (张铃) and ZHANG Bo (张钹)
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(301KB) ( 1428 )  
    Based on a constructive learning approach, covering algorithms, we investigate the relationship between support vector sets and kernel functions in support vector machines (SVM). An interesting result is obtained. That is, in the linearly non-separable case, any sample of a given sample set K can become a support vector under a certain kernel function. The result shows that when the sample set K is linearly non-separable, although the chosen kernel function satisfies Mercer's condition its corresponding support vector set is not necessarily the subset of K that plays a crucial role in classifying K. For a given sample set, what is the subset that plays the crucial role in classification? In order to explore the problem, a new concept, boundary or boundary points, is defined and its properties are discussed. Given a sample set K, we show that the decision functions for classifying the boundary points of K are the same as that for classifying the K itself. And the boundary points of K only depend on K and the structure of the space at which K is located and independent of the chosen approach for finding the boundary. Therefore, the boundary point set may become the subset of K that plays a crucial role in classification. These results are of importance to understand the principle of the support vector machine (SVM) and to develop new learning algorithms.
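    The paper's formal definition of the boundary is not reproduced here. As a rough illustration under an assumed nearest-opposite-class-neighbor heuristic, one can check that an SVM trained only on such boundary-like points decides much like one trained on all of K:

```python
# Rough illustration (not the paper's formal construction): approximate the
# boundary points of K with a nearest-opposite-class-neighbor heuristic, then
# check that an SVM trained on them agrees with one trained on all of K.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

# A point is "boundary-like" if a sample of the opposite class appears
# among its k nearest neighbors.
k = 10
d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
neighbors = np.argsort(d, axis=1)[:, 1:k + 1]        # skip self at column 0
is_boundary = (y[neighbors] != y[:, None]).any(axis=1)

full = SVC(kernel="rbf").fit(X, y)
boundary = SVC(kernel="rbf").fit(X[is_boundary], y[is_boundary])
agreement = (full.predict(X) == boundary.predict(X)).mean()
print(f"{is_boundary.sum()} boundary points, decision agreement {agreement:.2%}")
```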
    Kernel Projection Algorithm for Large-Scale SVM Problems
    WANG Jiaqi (王家琦), TAO Qing (陶卿) and WANG Jue (王珏)
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(372KB) ( 2204 )  
    The Support Vector Machine (SVM) has become a very effective method in statistical machine learning, and it has been proved that training an SVM amounts to solving the Nearest Point Pair Problem (NPP) between two disjoint closed convex sets. Keerthi later pointed out that it is difficult to apply classical geometric algorithms directly to SVM, and so designed a new geometric algorithm for SVM. In this article, a new algorithm for geometrically solving SVM, the Kernel Projection Algorithm, is presented based on a theorem on fixed points of the projection mapping. The new algorithm makes it easy to apply classical geometric algorithms to solving SVM and is more understandable than Keerthi's. Experiments show that the new algorithm can also handle large-scale SVM problems. Geometric algorithms for SVM, such as Keerthi's, require that the two closed convex sets be disjoint; otherwise the algorithms are meaningless. In this article, this requirement is guaranteed in theory by using a theoretical result on universal kernel functions.
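    For intuition, here is a Gilbert-style iteration for the NPP formulation: find the minimum-norm point of the Minkowski difference of the two convex hulls, whose norm is the hull distance. This is a generic geometric sketch, not the paper's Kernel Projection Algorithm, and it omits the kernel mapping.

```python
# Generic Gilbert-style sketch of the NPP behind SVM training (illustrative):
# minimum-norm point of conv(P) - conv(Q) gives the distance between hulls.
import numpy as np

def npp_distance(P, Q, iters=200):
    D = (P[:, None, :] - Q[None, :, :]).reshape(-1, P.shape[1])  # difference set
    w = D[0].astype(float)                  # start at some difference vector
    for _ in range(iters):
        z = D[np.argmin(D @ w)]             # support point in direction -w
        dz = w - z
        denom = dz @ dz
        if denom < 1e-12:                   # w is already the support point
            break
        t = np.clip((w @ dz) / denom, 0.0, 1.0)   # exact line search on [w, z]
        w = w + t * (z - w)
    return np.linalg.norm(w)

P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Q = np.array([[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
print(npp_distance(P, Q))   # ~3.54, the distance between the two triangles
```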
    Toward Effective Knowledge Acquisition with First-Order Logic Induction
    ZHANG Xiaolong (张晓龙) and Masayuki Numao
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(440KB) ( 1307 )  
    Knowledge acquisition with machine learning techniques is a fundamental requirement for knowledge discovery from databases and data mining systems. Two techniques in particular, inductive learning and theory revision, have been used toward this end. A method that combines both approaches to effectively acquire theories (regularities) from a set of training examples is presented. Inductive learning is used to acquire new regularities from the training examples, and theory revision is used to improve an initial theory. In addition, a theory preference criterion that combines an MDL-based heuristic with the Laplace estimate has been successfully employed in selecting the most promising theory. The resulting algorithm, developed by integrating inductive learning and theory revision under this criterion, is able to deal with complex problems, obtaining theories that are useful in terms of their predictive accuracy.
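    A minimal sketch of the Laplace estimate used in such preference criteria. The way it is combined with the MDL heuristic here (`mdl_cost`, `alpha`) is a hypothetical placeholder, not the paper's exact formula.

```python
# Minimal sketch of the Laplace estimate for scoring a clause/theory.
# The MDL combination below is an assumed placeholder, not the paper's.
def laplace_estimate(pos, neg):
    """Smoothed expected accuracy of a rule covering `pos` positive and
    `neg` negative examples, for a two-class problem."""
    return (pos + 1) / (pos + neg + 2)

def preference_score(pos, neg, mdl_cost, alpha=1.0):
    # Higher accuracy is better; a longer (costlier) theory is penalized.
    return laplace_estimate(pos, neg) - alpha * mdl_cost

print(laplace_estimate(20, 0))   # ~0.955: covers 20 positives cleanly
print(laplace_estimate(2, 0))    # 0.75: small coverage is trusted less
```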
    A Reduction Algorithm Meeting Users' Requirements
    ZHAO Kai (赵凯) and WANG Jue (王珏)
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(423KB) ( 2201 )  
    Generally, a database encompasses various kinds of knowledge and is shared by many users, and different users may prefer different kinds of knowledge. It is therefore important for a data mining algorithm to output specific knowledge according to the user's current requirements (preferences). We call this kind of data mining requirement-oriented knowledge discovery (ROKD). When rough set theory is used in data mining, the ROKD problem is how to find a reduct and the corresponding rules that interest the user. Since reducts and rules are generated in the same way, this paper is only concerned with how to find a particular reduct. The user's requirement is described by an order over the attributes, called an attribute order, which expresses the importance of the attributes to the user: more important attributes are placed before less important ones. The problem then becomes how to find a reduct that includes the attributes appearing early in the attribute order. An approach to this problem is proposed, and its completeness for reducts is proved. After that, three kinds of attribute order are developed to describe various user requirements.
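    As an illustration (not the paper's proved-complete algorithm), a greedy search can bias a reduct toward the anterior attributes by trying to discard attributes from the least important end of the order first:

```python
# Illustrative order-biased reduct search: drop attributes from the least
# important end first while the decision table stays consistent.
def consistent(rows, decisions, attrs):
    """True if rows that agree on `attrs` always share a decision."""
    seen = {}
    for row, d in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, d) != d:
            return False
    return True

def reduct(rows, decisions, attribute_order):
    attrs = list(attribute_order)
    for a in reversed(attribute_order):          # least important first
        trial = [x for x in attrs if x != a]
        if consistent(rows, decisions, trial):   # still discerns all decisions
            attrs = trial
    return attrs

rows = [{"a": 0, "b": 0, "c": 1}, {"a": 0, "b": 1, "c": 1},
        {"a": 1, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 0}]
print(reduct(rows, [0, 0, 1, 1], ["a", "b", "c"]))  # ['a']
```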
    ARMiner: A Data Mining Tool Based on Association Rules
    ZHOU Haofeng (周皓峰), ZHU Jianqiu (朱建秋), ZHU Yangyong (朱扬勇) and SHI Baile (施伯乐)
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(1181KB) ( 3783 )  
    In this paper, ARMiner, a data mining tool based on association rules, is introduced. Beginning with the system architecture, the characteristics and functions are discussed in detail, including data transfer, concept hierarchy generalization, mining rules with negative items, and the re-development of the system. An example of the tool's application is also shown. Finally, some issues for future research are presented.
    A Semi-Structured Document Model for Text Mining
    YANG Jianwu (杨建武) and CHEN Xiaoou (陈晓鸥)
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(343KB) ( 2486 )  
    A semi-structured document carries more structural information than an ordinary document, and the relations among semi-structured documents can be fully utilized. In order to take advantage of the structure and link information in semi-structured documents for better mining, a structured link vector model (SLVM) is presented in this paper, in which a document is represented by a vector whose elements are determined by terms, document structure and neighboring documents. For brevity and clarity, text mining based on SLVM is described through the K-means procedure: calculating document similarity and calculating cluster centers. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model, with the F value increasing from 0.65--0.73 to 0.82--0.86.
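    A sketch of a structure-aware similarity in the spirit of SLVM; the per-field weights and the neighbor (link) term are illustrative assumptions rather than the paper's exact model.

```python
# Illustrative structure-aware document similarity: per-field cosine weighted
# by structural element, plus a term for linked (neighboring) documents.
import math

def cosine(u, v):
    terms = set(u) | set(v)
    dot = sum(u.get(t, 0) * v.get(t, 0) for t in terms)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def slvm_similarity(d1, d2, field_weights, link_weight=0.2, link_sim=0.0):
    # Weighted sum of per-field cosines, blended with neighbor similarity.
    s = sum(w * cosine(d1.get(f, {}), d2.get(f, {}))
            for f, w in field_weights.items())
    return (1 - link_weight) * s + link_weight * link_sim

doc1 = {"title": {"svm": 2, "kernel": 1}, "body": {"svm": 5, "margin": 2}}
doc2 = {"title": {"svm": 1},              "body": {"svm": 3, "kernel": 1}}
print(slvm_similarity(doc1, doc2, {"title": 0.6, "body": 0.4}))
```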
    Squeezer: An Efficient Algorithm for Clustering Categorical Data
    HE Zengyou (何增有), XU Xiaofei (徐晓飞) and DENG Shengchun (邓胜春)
    Journal of Data Acquisition and Processing, 2002, 17 (5): 0-0. 
    Abstract   PDF(412KB) ( 4290 )  
    This paper presents Squeezer, a new efficient algorithm for clustering categorical data, which produces high-quality clustering results and at the same time achieves good scalability. The Squeezer algorithm reads each tuple t in sequence and either assigns t to an existing cluster (initially there are none) or creates a new cluster with t, as determined by the similarities between t and the clusters. Owing to these characteristics, the proposed algorithm is extremely suitable for clustering data streams, where, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence so far using a small amount of memory and time. Outliers can also be handled efficiently and directly in Squeezer. Experimental results on real-life and synthetic datasets verify the superiority of Squeezer.
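    A minimal sketch of the one-pass Squeezer loop. The attribute-value-frequency similarity and the threshold semantics are simplified from the paper's definitions.

```python
# Minimal sketch of the one-pass Squeezer loop over categorical tuples;
# the similarity and threshold are simplified from the paper.
from collections import Counter

def squeezer(tuples, threshold):
    clusters = []   # each cluster: [one Counter per attribute, size]
    for t in tuples:
        def sim(cluster):
            counts, size = cluster
            # Sum over attributes of the support of t's value in the cluster.
            return sum(c[v] / size for c, v in zip(counts, t))
        best = max(clusters, key=sim, default=None)
        if best is not None and sim(best) >= threshold:
            counts, size = best
            for c, v in zip(counts, t):        # fold t into the summary
                c[v] += 1
            best[1] = size + 1
        else:                                  # open a new cluster for t
            clusters.append([[Counter([v]) for v in t], 1])
    return clusters

data = [("red", "small"), ("red", "small"), ("blue", "large")]
print(len(squeezer(data, threshold=1.0)))   # 2 clusters
```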
Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190, P.R. China

E-mail: info@sjcjycl.cn
 
  Copyright ©2015 JCST, All Rights Reserved