Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Distributed by:
China: All Local Post Offices
  • Table of Contents
      15 January 2004, Volume 19 Issue 1   
    An Algorithm for Finding Conserved Secondary Structure Motifs in Unaligned RNA Sequences
    Giulio Pavesi,Giancarlo Mauri,and Graziano Pesole
    Journal of Data Acquisition and Processing, 2004, 19 (1): 0-0. 
    Several experiments and observations have revealed the fact that small local distinct structural features in RNA molecules are correlated with their biological function, for example, in post-transcriptional regulation of gene expression. Thus, finding similar structural features in a set of RNA sequences known to play the same biological function could provide substantial information concerning which parts of the sequences are responsible for the function itself. Unfortunately, finding common structural elements in RNA molecules is a very challenging task, even if limited to secondary structure. The main difficulty lies in the fact that in nearly all the cases the structure of the molecules is unknown, has to be somehow predicted, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on the preliminary alignment of the sequences and attempt to predict common structures (either local or global, or for some selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but function less well when similarity is limited to small and local elements, like single stem-loop motifs. Instead of aligning the sequences, the algorithm we present directly searches for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information concerning sequence similarity in the motifs can be used either as a search constraint, or a posteriori, by post-processing the output. The search for the regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns having a symmetric layout, such as those forming stem-loop structures. 
    Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of non-coding RNA, such as Iron Responsive Elements (IRE) in the untranslated regions of ferritin mRNA, and the domain IV stem-loop structure in SRP RNA.
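    To make the idea of a stem-loop search concrete, the following is a minimal brute-force sketch, not the affix-tree algorithm of the paper. It scans a single RNA sequence for windows whose first bases can pair with their last bases around an unpaired loop. The function name `find_stem_loops`, its parameters, and the choice to allow G-U wobble pairs are illustrative assumptions.

```python
# Assumption: Watson-Crick pairs plus G-U wobble pairs count as valid base pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, stem_len=4, min_loop=3, max_loop=8):
    """Return (start, loop_len) for every window that can fold into a hairpin.

    A window qualifies when its first `stem_len` bases pair, in antiparallel
    order, with its last `stem_len` bases. This is a naive scan for
    illustration only; it does no indexing and ignores folding energy.
    """
    hits = []
    n = len(seq)
    for start in range(n):
        for loop_len in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop_len  # exclusive end of the window
            if end > n:
                break
            left = seq[start:start + stem_len]
            right = seq[end - stem_len:end]
            # Base i of the left arm pairs with base i from the end of the right arm.
            if all((a, b) in PAIRS for a, b in zip(left, reversed(right))):
                hits.append((start, loop_len))
    return hits
```

    Running the scan over each sequence in a set and comparing the reported windows gives a rough feel for the search space that an indexing structure would need to prune.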
    Outlier Analysis for Gene Expression Data
    Chao Yan, Guo-Liang Chen, and Yi-Fei Shen
    Journal of Data Acquisition and Processing, 2004, 19 (1): 0-0. 
    Rapid developments in technologies that generate arrays of gene data enable a global view of the transcription levels of hundreds of thousands of genes simultaneously. Outlier detection for gene data is important, but it is made difficult by high dimensionality. The sparsity of data in high-dimensional space makes every point a relatively good outlier under traditional distance-based definitions, so finding outliers in high-dimensional data is more complex. In this paper, some basic outlier analysis algorithms are discussed and a new genetic algorithm is presented. The algorithm finds the best dimension projections based on a revised cell-based algorithm and gives explanations for its solutions. It can solve the outlier detection problem for gene expression data and for other high-dimensional data as well.
    Verbumculus and the Discovery of Unusual Words
    Alberto Apostolico, Fang-Cheng Gong, and Stefano Lonardi
    Journal of Data Acquisition and Processing, 2004, 19 (1): 0-0. 
    Measures relating word frequencies and expectations have long been of interest in Bioinformatics studies. With sequence data becoming massively available, exhaustive enumeration of such measures has become conceivable, and yet it poses a significant computational burden even when limited to words of bounded maximum length. In addition, the display of the huge tables possibly resulting from these counts poses practical problems of visualization and inference. Verbumculus is a suite of software tools for the efficient and fast detection of over- or under-represented words in nucleotide sequences. The inner core of Verbumculus rests on subtly interwoven properties of statistics, pattern matching and combinatorics on words, which enable one to limit drastically and a priori the set of over- or under-represented candidate words of all lengths in a given sequence, thereby rendering it more feasible both to detect and visualize such words in a fast and practically useful way. This paper first describes the facility and then reports experimental results, ranging from simulations on synthetic data to the discovery of regulatory elements in the upstream regions of a set of yeast genes. The software Verbumculus is accessible at http://www.cs.ucr.edu/~stelo/Verbumculus/ or http://wwwdbl.dei.unipd.it/Verbumculus/
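    As a rough illustration of what scoring over- or under-represented words can mean, the sketch below computes a z-score for every observed word of length k. It is not the Verbumculus method: it assumes an i.i.d. letter model estimated from the sequence itself and a simple binomial variance that ignores word self-overlap. The function name `word_scores` is hypothetical.

```python
from collections import Counter
from math import prod, sqrt


def word_scores(seq, k):
    """Return {word: z-score} for each word of length k that occurs in seq.

    Assumptions (for illustration only): letters are independent and
    identically distributed with probabilities estimated from seq, and the
    count of a word is treated as binomial, ignoring overlap effects.
    """
    n_windows = len(seq) - k + 1
    letter_freq = Counter(seq)
    total = len(seq)
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    scores = {}
    for word, observed in counts.items():
        p = prod(letter_freq[c] / total for c in word)  # probability of the word at one position
        expected = n_windows * p
        variance = expected * (1 - p)
        scores[word] = (observed - expected) / sqrt(variance) if variance > 0 else 0.0
    return scores
```

    Positive scores mark words seen more often than the simple model predicts, negative scores words seen less often. Words that never occur are not scored here, which is one of the gaps an exhaustive tool has to close.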
    The Complexity of Checking Consistency of Pedigree Information and Related Problems
    Luca Aceto, Jens A. Hansen, Anna Ingolfsdottir, Jacob Johnsen and John Knudsen
    Journal of Data Acquisition and Processing, 2004, 19 (1): 0-0. 
    Consistency checking is a fundamental computational problem in genetics. Given a pedigree and information on the genotypes of (some of) the individuals in it, the aim of consistency checking is to determine whether these data are consistent with the classic Mendelian laws of inheritance. This problem arose originally from the geneticists' need to filter erroneous information out of their input data, and is well motivated from both a biological and a sociological viewpoint. This paper shows that consistency checking is NP-complete, even when restricted to a single gene and three alleles. Several other results on the computational complexity of problems from genetics that are related to consistency checking are also offered. In particular, it is shown that checking the consistency of pedigrees over two alleles, and of pedigrees without loops, can be done in polynomial time.
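    The Mendelian condition itself is simple to state in code when every individual is genotyped. The sketch below covers only that easy, fully typed case for a single gene; the hard problem discussed in the abstract arises when some genotypes are missing and must be chosen consistently. The names `trio_consistent` and `pedigree_consistent` and the dictionary layout are assumptions made for illustration.

```python
def trio_consistent(father, mother, child):
    """True if the child's two alleles can be split as one from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def pedigree_consistent(genotypes, parents):
    """Check every child against its parents.

    genotypes: dict mapping individual -> (allele, allele), all individuals typed.
    parents:   dict mapping child -> (father, mother); founders are absent as keys.
    """
    return all(
        trio_consistent(genotypes[f], genotypes[m], genotypes[c])
        for c, (f, m) in parents.items()
    )
```

    For example, a child with genotype ("B", "B") is inconsistent with a mother typed ("A", "A"), whatever the father carries.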
    Encoding of Primary Structures of Biological Macromolecules Within a Data Mining Perspective
    Mondher Maddouri and Mourad Elloumi
    Journal of Data Acquisition and Processing, 2004, 19 (1): 0-0. 
    An encoding method has a direct effect on the quality and the representation of the knowledge discovered by data mining systems. Biological macromolecules are encoded by strings of characters, called primary structures. Since data mining systems usually use relational tables to encode data, these strings have to be re-encoded and transformed into relational tables. In this paper, we present a comparative study of the existing static encoding methods, which are based on biologists' know-how, and our new dynamic encoding method, which is based on the construction of Discriminant and Minimal Substrings (DMS). Different classification methods are used in this study. The experimental results show that our dynamic encoding method is more efficient than the static ones for encoding biological macromolecules within a data mining perspective.
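    One plausible reading of re-encoding strings into a relational table is a presence/absence matrix with one row per sequence and one column per substring. The sketch below shows only that final step and assumes the substrings are already given; how the paper constructs its Discriminant and Minimal Substrings is not reproduced here. The function name `encode` is hypothetical.

```python
def encode(sequences, substrings):
    """Return a 0/1 table: row i, column j is 1 if substrings[j] occurs in sequences[i].

    Assumption: the list of substrings is supplied by some earlier step;
    this function does not select or construct them.
    """
    return [[int(s in seq) for s in substrings] for seq in sequences]
```

    The resulting rows can be fed to a standard classifier as fixed-length feature vectors, regardless of how long the original sequences are.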
    Membrane Automata with Priorities
    Ludvek Cienciala and Lucie Ciencialova
    Journal of Data Acquisition and Processing, 2004, 19 (1): 0-0. 
    In this paper, one-way P automata with priorities are introduced. Such automata are P systems in which the membranes are only allowed to consume objects from their parent membranes, under given conditions. The result of a computation of such a system is the set of multiset sequences consumed by the skin membrane. The rules, associated in some order with each membrane, cannot modify objects; they can only move them through a membrane. We show that P automata with priorities and two membranes can accept every recursively enumerable language.
    Trends in Computing with DNA
    Natasa Jonoska
    Journal of Data Acquisition and Processing, 2004, 19 (1): 0-0. 
    As an emerging research area, DNA computation, or more generally biomolecular computation, extends into other fields such as nanotechnology and material design, and is developing into a new sub-discipline of science and engineering. This paper provides a brief survey of some concepts and developments in this area. In particular, several approaches are described for biomolecular solutions of the satisfiability problem (using bit strands, DNA tiles and graph self-assembly). Theoretical models such as the primer splicing systems as well as the recent model of forbidding and enforcing are also described. We review some experimental results of self-assembly of DNA nanostructures and nanomechanical devices as well as the design of an autonomous finite state machine.
    Integer Programming Models for Computational Biology Problems
    Giuseppe Lancia
    Journal of Data Acquisition and Processing, 2004, 19 (1): 0-0. 
    Recent years have seen an impressive increase in the use of Integer Programming models for the solution of optimization problems originating in Molecular Biology. In this survey, some of the most successful Integer Programming approaches are described, and a broad overview of application areas in modern Computational Molecular Biology is given.
E-mail: info@sjcjycl.cn