Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Distributed by:
China: All Local Post Offices
  • Table of Contents
      05 January 2010, Volume 25 Issue 1   
    Special Issue on Computational Challenges from Modern Molecular Biology
    Ying Xu, Ming Li, and Tao Jiang
    Journal of Data Acquisition and Processing, 2010, 25 (1): 1-2. 

    Bioinformatics is considered one of the fastest-growing fields in science today, thanks to the rapidly expanding and advancing capabilities for collecting biological data from cellular organisms using high-throughput measurement technologies. These data reflect different aspects of living organisms, such as the existence, structure, functionality and functional states of biological molecules and assemblies under designed experimental conditions. The enormous amount of information hidden in these data allows computational scientists to begin elucidating, in a systematic manner, the internal structures and control mechanisms of biological systems at various levels, such as cell, tissue, organ, organism and ecosystem, and possibly even to derive the organizational and operating principles of such systems. Scientists have begun to draw comparisons between the relationship of physics and mathematics and that of biology and computational science, and believe that biology could in the future be taught, like physics, "as a set of basic systems ... duplicated and adapted to a very wide range of cellular and organismal functions, following basic evolutionary principles constrained by the Earth's geological history" (T. F. Smith, The challenges facing genomic informatics. In Current Topics in Computational Molecular Biology, T. Jiang, Y. Xu and M. Q. Zhang (eds.), pp. 3-8, MIT Press, Cambridge, Massachusetts, 2002). It is clearly exciting to possibly play a role in helping to transform biological science from a purely experimental science into a science like physics. Yet the gap between where we are now and where we want to be is enormous! It is generally believed that computational scientists can and should play essential roles in bridging this gap by offering new techniques, frameworks and possibly theories for solving the variety of computational challenges arising in modern biology.
    In this special issue, we invited 11 teams of leading researchers working at the forefront of bioinformatics and computational biology to share their visions of the computational challenges we face in different areas of modern biology; the three editors contributed three additional articles. It is not our intention to provide comprehensive coverage of the computational challenges across all areas of bioinformatics and computational biology; instead, we hope that the 14 articles provide a sample of the challenging issues. Our ultimate goal is to attract more computational scientists to study these and related computational problems, and to help shape the future of a new branch of computational science in which biological problems, rather than the traditional physics-oriented or physics-inspired problems, will be the focus.
    The 14 articles cover the following areas of bioinformatics and computational biology:
    1. new generation sequencing techniques and computational challenges arising from the associated problems such as genome assembly (Schwartz and Waterman);
    2. evolutionary studies of gene orders in ancestral genomes using a phylogenomic approach (Sankoff et al.);
    3. elucidation and understanding of epigenomic modifications at a genome scale, discussed from two different perspectives by Zhang and Smith and by Liang;
    4. orthologous gene mapping across multiple genomes, a fundamental technique in comparative genomics (Jiang);
    5. elucidation of microbial genomic structures and associated computational challenges (Xu);
    6. computational challenges from the emerging field of metagenomics (Wooley and Ye);
    7. domain-based functional studies of proteins and associated computational challenges (Rendon et al.);
    8. opportunities and challenges in developing a new generation of accurate prediction techniques for protein tertiary structures (Li);
    9. challenges from large-scale proteomic studies through mass spectrometry data analyses (Ma);
    10. computational challenges in elucidation of protein interactomic data (Wong and Liu);
    11. challenges in reconstruction of gene networks based on time-course microarray gene expression data (Yamaguchi, Imoto and Miyano);
    12. study of dynamics of complex biochemical systems using chemical master equations and associated challenges (Liang and Qian); and
    13. computational challenges from biological text mining, a technique that will play increasingly important roles as scientists start to mine the published literature in the way people have been mining organized data in databases (Dai et al.).
    A large number of open questions and challenges in these areas are discussed. We hope that these articles will get many computational scientists excited about the expanding field of bioinformatics and inspire them to study some of the open problems discussed in this special issue.
    We would like to take this opportunity to thank all the authors who have taken time from their busy schedules to write for this special issue. We would also like to thank the editorial office of JCST, particularly Ms. Xiaoxian Wan, for encouraging us to edit this special issue. Throughout this project, Ms. Joan Yantko and Dr. Fenglou Mao of the Computational Systems Biology Lab at the University of Georgia provided timely help in coordinating with the authors and in setting up an internal website to facilitate communication between the editors and the authors, as well as among the authors. We are grateful for their help.

    New Generations: Sequencing Machines and Their Computational Challenges
    David C. Schwartz and Michael S. Waterman
    Journal of Data Acquisition and Processing, 2010, 25 (1): 3-9. 

    New generation sequencing systems are changing how molecular biology is practiced. The widely promoted $1000 genome will be a reality, with attendant changes for healthcare, including personalized medicine. More broadly, the genomes of many new organisms, with large samplings from populations, will be commonplace. What is less appreciated are the explosive demands on computation, for both CPU cycles and storage, as well as the need for new computational methods. In this article we survey some of these developments and demands.
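To make the storage demand concrete, here is a back-of-envelope sketch. The genome size (3.1 billion bases, roughly the size of a human genome), the 30x coverage and the 2-bit-per-base packing are illustrative assumptions, not figures from the article; real read files also carry quality scores, read names and other metadata, so actual storage is considerably larger.

```python
def raw_sequence_bytes(genome_size_bp: int, coverage: float, bits_per_base: int = 2) -> float:
    """Bytes needed to store the sequenced bases alone (no quality scores, no read names)."""
    total_bases = genome_size_bp * coverage   # each base is sequenced `coverage` times on average
    return total_bases * bits_per_base / 8    # 8 bits per byte


if __name__ == "__main__":
    size = raw_sequence_bytes(3_100_000_000, 30)
    print(f"{size / 1e9:.2f} GB")             # prints "23.25 GB"
```

Multiplying this by the number of individuals in a population-scale study shows why storage, and not only CPU time, becomes a bottleneck.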

    Issues in the Reconstruction of Gene Order Evolution
    David Sankoff, Chunfang Zheng, Adriana Muñoz, Zhenyu Yang, Zaky Adam, Robert Warren, Vicky Choi, and Qian Zhu
    Journal of Data Acquisition and Processing, 2010, 25 (1): 10-25. 

    As genomes evolve over hundreds of millions of years, the chromosomes become rearranged, with segments of some chromosomes inverted, while other chromosomes reciprocally exchange chunks from their ends. These rearrangements lead to the scrambling of the elements of one genome with respect to another descended from a common ancestor. Multidisciplinary work undertakes to mathematically model these processes and to develop statistical analyses and mathematical algorithms to understand the scrambling in the chromosomes of two or more related genomes. A major focus is the reconstruction of the gene order of the ancestral genomes.
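One of the simplest ways to quantify how scrambled two gene orders are is the breakpoint distance: count the adjacencies of one genome that are not preserved, in either orientation, in the other. The sketch below assumes a single linear chromosome per genome, the same set of genes in both, and genes encoded as signed integers (the sign giving the strand). It only illustrates the scrambling being measured, not the ancestral reconstruction methods the article surveys.

```python
def adjacencies(genome: list[int]) -> set[tuple[int, int]]:
    """All adjacencies of a signed linear gene order, recorded in both reading directions."""
    adj = set()
    for a, b in zip(genome, genome[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read on the opposite strand
    return adj


def breakpoint_distance(g1: list[int], g2: list[int]) -> int:
    """Number of adjacencies of g1 that are broken in g2."""
    adj2 = adjacencies(g2)
    return sum(1 for a, b in zip(g1, g1[1:]) if (a, b) not in adj2)


if __name__ == "__main__":
    ancestor = [1, 2, 3, 4, 5]
    descendant = [1, -3, -2, 4, 5]                     # one inversion of genes 2..3
    print(breakpoint_distance(ancestor, descendant))   # prints 2
```

A single inversion creates at most two breakpoints, which is why breakpoint counts give a quick lower-bound flavor of how many rearrangements separate two genomes.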

    Challenges in Understanding Genome-Wide DNA Methylation
    Michael Q. Zhang and Andrew D. Smith, Member, ACM
    Journal of Data Acquisition and Processing, 2010, 25 (1): 26-34. 

    DNA methylation is a chemical modification of the bases in genomes. This modification, most frequently found at CpG dinucleotides in eukaryotes, has been identified as having multiple critical functions in broad and diverse species of animals and plants, while it mysteriously appears to be lacking in several other well-studied species. DNA methylation has well-known and important roles in genome stability and defense, and changes in its pattern are highly correlated with gene regulation. Much evidence has linked abnormal DNA methylation to human diseases. Most prominently, aberrant DNA methylation is a common feature of cancer genomes. Elucidating the precise functions of DNA methylation therefore has great biomedical significance. Here we provide an update on experimental technologies for detecting DNA methylation on a genomic scale. We also discuss new prospects and challenges that computational biologists will face when analyzing DNA methylation data.
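As a small illustration of the basic bookkeeping behind genome-scale methylation data, the sketch below locates CpG dinucleotides in a sequence and computes a per-site methylation level as the fraction of reads supporting methylation. It assumes bisulfite-style counts, in which a methylated cytosine is read as C and an unmethylated one as T; the function names and the toy sequence are hypothetical.

```python
from typing import Optional


def cpg_positions(seq: str) -> list[int]:
    """0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """Fraction of reads showing C (methylated) at a CpG site; None if the site is uncovered."""
    total = c_reads + t_reads
    return c_reads / total if total else None


if __name__ == "__main__":
    print(cpg_positions("ACGTTCGA"))  # prints [1, 5]
    print(methylation_level(3, 1))    # prints 0.75
```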

    Genome-Wide Analysis of Epigenetic Modifications
    Shoudan Liang
    Journal of Data Acquisition and Processing, 2010, 25 (1): 35-41. 

    In plants and animals, gene expression can be altered by changes that do not alter the sequence of nucleotides in DNA but rather modify the chemical structure of either the DNA or the histones that interact with the DNA. These so-called epigenetic modifications are not transient, but persist through cell divisions. Rapidly advancing technologies, such as next-generation DNA sequencing, have dramatically increased our ability to survey epigenetic markers throughout an entire genome. These techniques are revealing in great detail that the many forms and stages of cancer are characterized by a massive number of epigenetic changes. Interpreting such epigenetic marks in cell differentiation and in carcinogenesis is computationally challenging. We review several examples of epigenetic data analysis and discuss the need for computational methods that will enable us to learn from the data the relationships between different kinds of histone modifications and DNA methylation.
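A first step toward learning relationships between histone modifications and DNA methylation is simply to ask whether two marks co-vary across the genome. The sketch below computes a Pearson correlation between two per-bin signals; the binning, the signal values and the choice of Pearson correlation are assumptions for illustration, and real analyses must also handle normalization and missing bins.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length signals (e.g. one value per genomic bin)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length signals with at least two bins")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant signal has no defined correlation")
    return cov / (sx * sy)


if __name__ == "__main__":
    histone_mark = [0.2, 0.9, 0.4, 0.8, 0.1]  # hypothetical per-bin enrichment
    methylation = [0.7, 0.1, 0.6, 0.3, 0.9]   # hypothetical per-bin methylation level
    print(round(pearson(histone_mark, methylation), 3))
```

A strongly negative value in such a toy comparison would suggest the two marks tend to be mutually exclusive across bins; correlation alone, of course, says nothing about mechanism.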

    Some Algorithmic Challenges in Genome-Wide Ortholog Assignment
    Tao Jiang, Fellow, ACM
    Journal of Data Acquisition and Processing, 2010, 25 (1): 42-52. 

    Genome-scale assignment of orthologous genes is a fundamental and challenging problem in computational biology and has a wide range of applications in comparative genomics, functional genomics, and systems biology. Many methods based on sequence similarity, phylogenetic analysis, chromosomal syntenic information, and genome rearrangement have been proposed in recent years for ortholog assignment. Although these methods produce results that largely agree with each other, their results may still contain significant differences. In this article, we consider the recently proposed parsimony approach for assigning orthologs between closely related genomes based on genome rearrangement, which essentially attempts to transform one genome into another by the smallest number of genome rearrangement events including reversal, translocation, fusion, and fission, as well as gene duplication events. We will highlight some of the challenging algorithmic problems that arise in the approach including (i) minimum common substring partition, (ii) signed reversal distance with duplicates, and (iii) signed transposition distance with duplicates. The most recent progress towards the solution of these problems will be reviewed and some open questions will be posed. We will also discuss some possible extensions of the approach to the simultaneous comparison of multiple genomes.
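The first of these problems, minimum common substring partition, asks for the fewest blocks into which two strings with the same gene content can be cut so that both yield the same multiset of blocks. The brute-force sketch below only makes the definition concrete. It ignores gene orientation (the signed variant is the one relevant to the rearrangement setting), treats each gene as a single character, and enumerates partitions in exponential time, so it is usable only on tiny inputs.

```python
from collections import Counter
from itertools import combinations
from typing import Iterator, Optional


def partitions(s: str) -> Iterator[list[str]]:
    """All ways to cut s into contiguous blocks, fewest blocks first."""
    n = len(s)
    for cuts in range(n):                                   # number of cut points
        for pos in combinations(range(1, n), cuts):
            bounds = (0, *pos, n)
            yield [s[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]


def can_tile(s: str, pool: Counter) -> bool:
    """Can s be written as a concatenation using every block in pool exactly once?"""
    if not s:
        return sum(pool.values()) == 0
    for block in list(pool):
        if pool[block] > 0 and s.startswith(block):
            pool[block] -= 1
            ok = can_tile(s[len(block):], pool)
            pool[block] += 1                                # undo the choice before returning or retrying
            if ok:
                return True
    return False


def min_common_partition(a: str, b: str) -> Optional[list[str]]:
    """Blocks of a minimum common substring partition of a and b, or None if none exists."""
    if Counter(a) != Counter(b):
        return None                                         # different gene content: no common partition
    if not a:
        return []
    for blocks in partitions(a):
        if can_tile(b, Counter(blocks)):
            return blocks                                   # first hit has the fewest blocks
    return None


if __name__ == "__main__":
    print(min_common_partition("abcd", "cdab"))  # prints ['ab', 'cd']
```

Because partitions are generated in order of increasing block count, the first partition of `a` that also tiles `b` is a minimum one.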

    Computational Challenges in Deciphering Genomic Structures of Bacteria
    Ying Xu
    Journal of Data Acquisition and Processing, 2010, 25 (1): 53-70. 

    This article addresses how the functionalities of the cellular machinery of a bacterium might have constrained the genomic arrangement of its genes during evolution, and how we can study such problems using computational approaches that take full advantage of the rapidly increasing pool of sequenced bacterial genomes, potentially leading to a much improved understanding of why a bacterial genome is organized the way it is. This article discusses a number of challenging computational problems in elucidating genomic structures at multiple levels and the information encoded through these structures, aimed at an ultimate understanding of the rules governing bacterial genome organization.
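At the lowest level of organization, bacterial genes that lie close together on the same strand are often co-transcribed. The sketch below groups consecutive same-strand genes whose intergenic gap is at most a threshold. The `Gene` record, the 50 bp threshold and the coordinates are hypothetical, and this simple distance rule is a stand-in for illustration, not the article's method.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate (bp)
    end: int     # rightmost coordinate (bp)
    strand: str  # "+" or "-"


def group_adjacent_genes(genes: list[Gene], max_gap: int = 50) -> list[list[str]]:
    """Group consecutive same-strand genes separated by at most max_gap bp."""
    groups: list[list[Gene]] = []
    for g in sorted(genes, key=lambda gene: gene.start):
        if groups:
            prev = groups[-1][-1]
            if g.strand == prev.strand and g.start - prev.end <= max_gap:
                groups[-1].append(g)  # close enough on the same strand: extend current group
                continue
        groups.append([g])            # otherwise start a new group
    return [[g.name for g in group] for group in groups]


if __name__ == "__main__":
    genes = [Gene("geneA", 100, 900, "+"), Gene("geneB", 920, 1500, "+"),
             Gene("geneC", 1600, 2000, "+"), Gene("geneD", 2010, 2500, "-")]
    print(group_adjacent_genes(genes))  # prints [['geneA', 'geneB'], ['geneC'], ['geneD']]
```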

    Metagenomics: Facts and Artifacts, and Computational Challenges
    John C. Wooley and Yuzhen Ye
    Journal of Data Acquisition and Processing, 2010, 25 (1): 71-81. 

    Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. By enabling an analysis of populations including many (so-far) unculturable and often unknown microbes, metagenomics is revolutionizing the field of microbiology, and has excited researchers in many disciplines that could benefit from the study of environmental microbes, including those in ecology, environmental sciences, and biomedicine. Specific computational and statistical tools have been developed for metagenomic data analysis and comparison. New studies, however, have revealed various kinds of artifacts present in metagenomics data caused by limitations in the experimental protocols and/or inadequate data analysis procedures, which often lead to incorrect conclusions about a microbial community. Here, we review some of the artifacts, such as overestimation of species diversity and incorrect estimation of gene family frequencies, and discuss emerging computational approaches to address them. We also review potential challenges that metagenomics may encounter with the extensive application of next-generation sequencing (NGS) techniques.
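Overestimation of species diversity can arise when sequencing errors make reads from one organism look like many distinct sequences. The toy sketch below contrasts a naive count of distinct reads with a greedy clustering that merges each read into a more abundant representative within a small Hamming distance. Equal-length reads, the one-mismatch threshold and the read set are assumptions, and real pipelines use more careful error models.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length reads")
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(reads: list[str], max_mismatches: int = 1) -> list[str]:
    """Return cluster representatives, visiting distinct reads from most to least abundant."""
    centroids: list[str] = []
    for seq, _count in Counter(reads).most_common():
        if all(hamming(seq, c) > max_mismatches for c in centroids):
            centroids.append(seq)  # too different from every existing cluster: start a new one
    return centroids


if __name__ == "__main__":
    reads = ["ACGT"] * 5 + ["ACGA"] + ["TTTT"] * 3  # "ACGA" mimics a sequencing error
    print(len(set(reads)))             # prints 3 (naive richness)
    print(len(greedy_cluster(reads)))  # prints 2 (after merging the likely error)
```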

    Understanding the "Horizontal Dimension" of Molecular Evolution to Annotate, Classify, and Discover Proteins with Functional Domains
    Gloria Rendon, Mao-Feng Ger, Ruth Kantorovitz, Shreedhar Natarajan, Jeffrey Tilson, and Eric Jakobsson, Fellow, APS
    Journal of Data Acquisition and Processing, 2010, 25 (1): 82-94. 

    Protein evolution proceeds by two distinct processes: 1) individual mutation and selection for adaptive mutations and 2) rearrangement of entire domains within proteins into novel combinations, producing new protein families that combine functional properties in ways that did not previously exist. Domain rearrangement poses a challenge to sequence alignment-based search methods, such as BLAST, in predicting homology, since the methodology implicitly assumes that related proteins differ from each other primarily by individual mutations. Moreover, there is ample evidence that the evolutionary process has used (and continues to use) domains as building blocks; therefore, it seems fitting to use computational, domain-based methods to reconstruct that process. A challenge and opportunity for computational biology is how to use knowledge of evolutionary domain recombination to characterize families of proteins whose evolutionary history includes such recombination, to discover novel proteins, and to infer protein-protein interactions. In this paper we review techniques and databases that exploit our growing knowledge of "horizontal" protein evolution, and suggest possible areas of future development. We illustrate the power of domain-based methods and possible directions of future development with a case history in progress that aims to facilitate a particular approach to understanding microbial pathogenicity.
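Treating a protein as an ordered list of domains gives a simple representation of the building-block view described above. The sketch below groups proteins that share an identical domain architecture and scores the domain overlap between two proteins with a Jaccard index, which ignores domain order and so can stay high when domains are shuffled, a case in which residue-level alignment may struggle. The protein identifiers and domain labels are hypothetical, and this is not one of the specific techniques or databases reviewed in the paper.

```python
from collections import defaultdict


def jaccard(domains_a: list[str], domains_b: list[str]) -> float:
    """Shared-domain score that ignores domain order and copy number."""
    sa, sb = set(domains_a), set(domains_b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def group_by_architecture(proteins: dict[str, list[str]]) -> dict[tuple[str, ...], list[str]]:
    """Proteins with exactly the same ordered domain list fall into one group."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for name, domains in proteins.items():
        groups[tuple(domains)].append(name)
    return dict(groups)


if __name__ == "__main__":
    proteins = {"p1": ["SH3", "SH2", "kinase"],
                "p2": ["SH3", "SH2", "kinase"],
                "p3": ["kinase", "SH2"]}
    print(group_by_architecture(proteins))
    print(jaccard(proteins["p1"], proteins["p3"]))
```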

    Can We Determine a Protein Structure Quickly?
    Ming Li, Fellow, ACM, IEEE, Royal Society of Canada
    Journal of Data Acquisition and Processing, 2010, 25 (1): 95-106. 

    Can we determine a high-resolution protein structure quickly, say, in a week? I will show that this is possible with current technologies together with the new computational tools discussed in this article. We have three potential paths to explore:
  • X-ray crystallography. While this method has produced most of the protein structures in the PDB (Protein Data Bank), the nasty trial-and-error crystallization step remains a prohibitive obstacle.
  • NMR (Nuclear Magnetic Resonance) spectroscopy. While NMR experiments are relatively easy to do, interpreting the NMR data for structure calculation takes several months on average.
  • In silico protein structure prediction. Can we actually predict high-resolution structures consistently? If the predicted models continue to be labeled as "predicted" and still need to be experimentally verified by wet-lab methods, then this method can at best serve only as a screening tool.
    I investigate the question of "quick protein structure determination" from a computer scientist's point of view and actually answer the more relevant question: "What can a computer scientist effectively contribute to this goal?"
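Whichever path is taken, a computed or predicted model is usually judged by its deviation from a reference structure. The sketch below computes the root-mean-square deviation (RMSD) over matched atoms. It assumes the two coordinate sets are already optimally superimposed (a full comparison would first apply a superposition step such as the Kabsch algorithm), and the coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(model: list[Coord], reference: list[Coord]) -> float:
    """RMSD between matched atoms of two pre-superimposed structures (same units as the input)."""
    if len(model) != len(reference) or not model:
        raise ValueError("need the same non-zero number of matched atoms")
    sq = sum((mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
             for (mx, my, mz), (rx, ry, rz) in zip(model, reference))
    return math.sqrt(sq / len(model))


if __name__ == "__main__":
    print(rmsd([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
               [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]))    # prints 0.0
    print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # prints 5.0
```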

    Challenges in Computational Analysis of Mass Spectrometry Data for Proteomics
    Bin Ma
    Journal of Data Acquisition and Processing, 2010, 25 (1): 107-123. 

    Mass spectrometry is an analytical technique for determining the composition of a sample. Recently it has become a primary tool for protein identification, protein quantification, and the characterization of post-translational modifications in proteomics research. Both the size and the complexity of the data produced by this experimental technique pose great computational challenges for data analysis. This article reviews some of these challenges and serves as an entry point for those who want to study the area in general.
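Most peptide identification methods compare observed peaks with masses computed from candidate sequences. The sketch below computes a peptide's monoisotopic neutral mass and its singly charged b- and y-ion m/z values from standard monoisotopic residue masses rounded to five decimals. It assumes unmodified residues (including cysteine) and charge 1 only, and is meant to show the arithmetic, not any particular search engine.

```python
# Standard monoisotopic residue masses in daltons (rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def b_ions(seq: str) -> list[float]:
    """Singly charged b-ion m/z values b1..b(n-1) (N-terminal fragments)."""
    out, running = [], 0.0
    for aa in seq[:-1]:
        running += RESIDUE_MASS[aa]
        out.append(running + PROTON)
    return out


def y_ions(seq: str) -> list[float]:
    """Singly charged y-ion m/z values y1..y(n-1) (C-terminal fragments, which keep the water)."""
    out, running = [], 0.0
    for aa in reversed(seq[1:]):
        running += RESIDUE_MASS[aa]
        out.append(running + WATER + PROTON)
    return out


if __name__ == "__main__":
    pep = "PEPTIDE"
    print(round(peptide_mass(pep), 4))
    print([round(m, 4) for m in b_ions(pep)])
    print([round(m, 4) for m in y_ions(pep)])
```

Complementary fragments satisfy b_i + y_(n-i) = M + 2 x proton mass, a relation that spectrum-interpretation methods commonly exploit.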

    Protein Interactome Analysis for Countering Pathogen Drug Resistance
    Limsoon Wong and Guimei Liu
    Journal of Data Acquisition and Processing, 2010, 25 (1): 124-130. 

    Drug-resistant varieties of pathogens are now a recognized global threat. Insights into the routes to drug resistance in these pathogens are critical for developing more effective antibacterial drugs. A systems-level analysis of the genes, proteins, and interactions involved is an important step toward gaining such insights. This paper discusses some of the computational challenges that must be surmounted to enable such an analysis, viz., the unreliability of bacterial interactome maps, the paucity of bacterial interactome maps, and the identification of pathways to bacterial drug resistance.
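Because many high-throughput interactions are spurious, one common idea is to trust an interaction more when its two proteins share many interaction partners. The sketch below scores each edge by the Jaccard overlap of the endpoints' other neighbors. This particular index, the protein names and the edge list are illustrative assumptions rather than the specific reliability measures the authors discuss.

```python
from collections import defaultdict


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from a list of interacting protein pairs."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shared_partner_score(graph: dict[str, set[str]], a: str, b: str) -> float:
    """Jaccard overlap of the partners of a and b, excluding each other (0 = no shared partners)."""
    na, nb = graph[a] - {b}, graph[b] - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


if __name__ == "__main__":
    edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("C", "E")]
    g = build_graph(edges)
    for a, b in edges:
        print(a, b, round(shared_partner_score(g, a, b), 2))
```

In this toy network the A-B edge, embedded in a dense neighborhood, scores 1.0, while the isolated C-E edge scores 0.0 and would be the first candidate for experimental re-checking.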

    Network-Based Predictions and Simulations by Biological State Space Models: Search for Drug Mode of Action
    Rui Yamaguchi, Seiya Imoto, and Satoru Miyano
    Journal of Data Acquisition and Processing, 2010, 25 (1): 131-153. 

    Since time-course microarray data are short but contain a large number of genes, most statistical models must be extended to handle such statistically irregular situations. We introduce biological state space models, which are established as suitable computational models for constructing gene networks from microarray gene expression data. This article elucidates the theory and methodology of our biological state space models, together with some representative analyses, including the discovery of drug mode of action. Through these applications we show the whole strategy of biological state space model analysis, involving experimental design of time-course data, model building, and analysis of the estimated networks.
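A state space model separates hidden states from noisy observations, for example x_t = F x_(t-1) + v_t for the state and y_t = H x_t + w_t for the observation, with Gaussian noises v_t and w_t of variances Q and R. The sketch below runs the standard Kalman filter for a one-dimensional version with known parameters. The biological state space models in the article are multivariate and their parameters must be estimated from time-course data, so this only illustrates the filtering step; all parameter values are hypothetical.

```python
def kalman_filter(ys: list[float], F: float, H: float, Q: float, R: float,
                  x0: float, P0: float) -> list[float]:
    """Filtered state means for a scalar linear-Gaussian state space model."""
    x, P = x0, P0
    means = []
    for y in ys:
        # Predict: propagate the previous estimate through the state equation.
        x = F * x
        P = F * P * F + Q
        # Update: correct the prediction with the new observation.
        S = H * P * H + R  # predicted observation variance
        K = P * H / S      # Kalman gain
        x = x + K * (y - H * x)
        P = (1 - K * H) * P
        means.append(x)
    return means


if __name__ == "__main__":
    observations = [4.8, 5.3, 4.9, 5.1, 5.0]  # hypothetical noisy readings of a constant level
    est = kalman_filter(observations, F=1.0, H=1.0, Q=0.0, R=1.0, x0=0.0, P0=100.0)
    print([round(v, 2) for v in est])
```

With Q = 0 and a vague prior (large P0), the filtered mean behaves like a running average of the observations, which is an easy sanity check for the update equations.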

    Computational Cellular Dynamics Based on the Chemical Master Equation: A Challenge for Understanding Complexity
    Jie Liang and Hong Qian
    Journal of Data Acquisition and Processing, 2010, 25 (1): 154-168. 

    Modern molecular biology has always been a great source of inspiration for computational science. Half a century ago, the challenge of understanding macromolecular dynamics led the way for computation to become part of the tool set for studying molecular biology. Twenty-five years ago, the demands of genome science inspired an entire generation of computer scientists with an interest in discrete mathematics to join the field that is now called bioinformatics. In this paper, we shall lay out a new mathematical theory for the dynamics of biochemical reaction systems in a small volume (i.e., mesoscopic) in terms of a stochastic, discrete-state, continuous-time formulation called the chemical master equation (CME). Similar to the wavefunction in quantum mechanics, the dynamically changing probability landscape associated with the state space provides a fundamental characterization of the biochemical reaction system. The stochastic trajectories of the dynamics are best known through simulations using the Gillespie algorithm. In contrast to the Metropolis algorithm, this Monte Carlo sampling technique does not follow a process with detailed balance. We shall show several examples of how CMEs are used to model cellular biochemical systems. We shall also illustrate the computational challenges involved: multiscale phenomena, the interplay between stochasticity and nonlinearity, and how macroscopic determinism arises from mesoscopic dynamics. We point out recent advances in computing solutions to the CME, including exact solution of the steady-state landscape and stochastic differential equations that offer alternatives to the Gillespie algorithm. We argue that the CME is an ideal system from which one can learn to understand "complex behavior" and complexity theory, and from which important biological insight can be gained.
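The Gillespie algorithm samples exact trajectories of the stochastic process whose probability distribution the CME describes. The sketch below applies the direct method to a hypothetical birth-death system (production of X at constant rate k, degradation at rate gamma times the copy number x). Its stationary distribution is known to be Poisson with mean k/gamma, so the time-averaged copy number of a long run should come out close to k/gamma. Rates, run length and seed are illustrative choices.

```python
import math
import random


def gillespie_birth_death(k: float, gamma: float, x0: int, t_end: float, seed: int = 0):
    """Direct-method SSA for 0 -> X (rate k) and X -> 0 (rate gamma * x)."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        a_birth, a_death = k, gamma * x                  # reaction propensities
        a_total = a_birth + a_death
        if a_total == 0:
            break                                         # no reaction can fire
        tau = -math.log(1.0 - rng.random()) / a_total     # exponential waiting time
        if t + tau > t_end:
            break
        t += tau
        if rng.random() * a_total < a_birth:              # pick which reaction fires
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states


def time_average(times: list[float], states: list[int], t_end: float) -> float:
    """Average copy number weighted by how long each state was occupied."""
    total = 0.0
    for i, (t, x) in enumerate(zip(times, states)):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += x * (t_next - t)
    return total / t_end


if __name__ == "__main__":
    times, states = gillespie_birth_death(k=10.0, gamma=1.0, x0=0, t_end=2000.0, seed=1)
    print(round(time_average(times, states, 2000.0), 2))  # should be close to k / gamma = 10
```

Each trajectory is one sample; recovering the full probability landscape this way requires many runs, which is one motivation for the direct CME solution methods mentioned above.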

    New Challenges for Biological Text-Mining in the Next Decade
    Hong-Jie Dai, Yen-Ching Chang, Richard Tzong-Han Tsai, and Wen-Lian Hsu, Fellow, IEEE
    Journal of Data Acquisition and Processing, 2010, 25 (1): 169-inside back cover. 

    The massive migration of scholarly publications from traditional paper journals to online outlets has benefited biologists because of the ease of access it provides. However, due to the sheer volume of available biological literature, researchers are finding it increasingly difficult to locate needed information. As a result, recent biology contests, notably JNLPBA and BioCreAtIvE, have focused on evaluating various methods by which the literature may be navigated. Among these methods, text-mining technology has shown the most promise. With recent advances in text-mining technology and the fact that publishers are now making the full texts of articles available in XML format, text-mining systems (TMSs) can be adapted to accelerate literature curation, maintain the integrity of information, and ensure proper linkage of data to other resources. Even so, several new challenges have emerged in relation to full-text analysis, life-science terminology, complex relation extraction, and information fusion. These challenges must be overcome for text mining to be more effective. In this paper, we identify the challenges, discuss how they might be overcome, and consider the resources that may be helpful in achieving that goal.
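Many text-mining pipelines begin by recognizing entity mentions, such as gene names, in free text. The sketch below is a deliberately simple dictionary matcher that prefers the longest lexicon entry at each position. The lexicon, whitespace tokenization and punctuation stripping are assumptions, and it does not address the terminology variation the authors identify as a key challenge.

```python
def dictionary_ner(text: str, lexicon: list[str]) -> list[str]:
    """Greedy longest-match lookup of lexicon terms in whitespace-tokenized text."""
    if not lexicon:
        return []
    entries = {tuple(term.lower().split()) for term in lexicon}
    max_len = max(len(entry) for entry in entries)
    tokens = [tok.strip(".,;:()") for tok in text.split()]
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):  # longest candidate first
            if tuple(t.lower() for t in tokens[i:i + n]) in entries:
                hits.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            i += 1                                              # no entry starts at this token
    return hits


if __name__ == "__main__":
    lexicon = ["tumor protein p53", "p53", "BRCA1"]
    text = "Mutations in tumor protein p53 and BRCA1 were found."
    print(dictionary_ner(text, lexicon))  # prints ['tumor protein p53', 'BRCA1']
```

Preferring the longest match keeps "tumor protein p53" as one mention instead of reporting the shorter "p53" inside it.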


E-mail: info@sjcjycl.cn
  Copyright ©2015 JCST, All Rights Reserved