Loading...
Bimonthly    Since 1986
ISSN 1004-9037
/
Indexed in:
SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
  • Table of Content
      15 September 2005, Volume 20 Issue 5   
    For Selected: View Abstracts Toggle Thumbnails
    Articles
    Single-Cycle Bit Permutations with MOMR Execution
    Ruby B. Lee, Xiao Yang and Zhi-Jie Jerry Shi
    Journal of Data Acquisition and Processing, 2005, 20 (5): 577-585 . 
    Abstract   PDF(848KB) ( 1735 )  
    Secure computing paradigms impose new architectural challenges for general-purpose processors. Cryptographic processing is needed for secure communications, storage, and computations. We identify two categories of operations in symmetric-key and public-key cryptographic algorithms that are not common in previous general-purpose workloads: advanced bit operations within a word and multi-word operations. We define MOMR (Multiple Operands Multiple Results) execution} or datarich execution as a unified solution to both challenges. It allows arbitrary n-bit permutations to be achieved in one or two cycles, rather than O(n) cycles as in existing RISC processors. It also enables significant acceleration of multi-word multiplications needed by public-key ciphers. We propose two implementations of MOMR: one employs only hardware changes while the other uses Instruction Set Architecture (ISA) support. We show that MOMR execution leverages available resources in typical multi-issue processors with minimal additional cost. Multi-issue processors enhanced with MOMR units provide additional speedup over standard multi-issue processors with the same datapath. MOMR is a general architectural solution for word-oriented processor architectures to incorporate datarich operations.
    The Value of a Small Microkernel for Dreamy Memory and the RAMpage Memory Hierarchy
    Philip Machanick
    Journal of Data Acquisition and Processing, 2005, 20 (5): 586-595 . 
    Abstract   PDF(510KB) ( 1708 )  
    This paper explores potential for the RAMpage memory hierarchy to use a microkernel with a small memory footprint, in a specialized cache-speed static RAM (tightly-coupled memory, TCM). Dreamy memory is DRAM kept in low-power mode, unless referenced. Simulations show that a small microkernel suits RAMpage well, in that it achieves significantly better speed and energy gains than a standard hierarchy from adding TCM. RAMpage, in its best 128KB L2 case, gained 11% speed using TCM, and reduced energy 14%. Equivalent conventional hierarchy gains were under 1%. While 1MB L2 was significantly faster against lower-energy cases for the smaller L2, the larger SRAM's energy does not justify the speed gain. Using a 128KB L2 cache in a conventional architecture resulted in a best-case overall run time of 2.58s, compared with the best dreamy mode run time (RAMpage without context switches on misses) of 3.34s, a speed penalty of 29%. Energy in the fastest 128KB L2 case was 2.18J vs. 1.50J, a reduction of 31%. The same RAMpage configuration without dreamy mode took 2.83s as simulated, and used 2.39J, an acceptable trade-off (penalty under 10%) for being able to switch easily to a lower-energy mode.
    A Novel Memory Structure for Embedded Systems: Flexible Sequential and Random Access Memory
    Ying Chen, Karthik Ranganathan, Vasudev V. Pai, David J. Lilja, and Kia Bazargan
    Journal of Data Acquisition and Processing, 2005, 20 (5): 596-606 . 
    Abstract   PDF(837KB) ( 1257 )  
    The on-chip memory performance of embedded systems directly affects the system designers' decision about how to allocate expensive silicon area. A novel memory architecture, flexible sequential and random access memory (FSRAM), is investigated for embedded systems. To realize sequential accesses, small ``links'' are added to each row in the RAM array to point to the next row to be prefetched. The potential cache pollution is ameliorated by a small sequential access buffer (SAB). To evaluate the architecture-level performance of FSRAM, we ran the Mediabench benchmark programs on a modified version of the SimpleScalar simulator. Our results show that the FSRAM improves the performance of a baseline processor with a 16KB data cache up to 55%, with an average of 9%; furthermore, the FSRAM reduces 53.1% of the data cache miss count on average due to its prefetching effect. We also designed RTL and SPICE models of the FSRAM, which show that the FSRAM significantly improves memory access time, while reducing power consumption, with negligible area overhead.
    A Power-Aware Branch Predictor by Accessing the BTB Selectively
    Cheol Hong Kim, Sung Woo Chung, and Chu Shik Jhon
    Journal of Data Acquisition and Processing, 2005, 20 (5): 607-614 . 
    Abstract   PDF(801KB) ( 1363 )  
    Microarchitects should consider power consumption, together with accuracy, when designing a branch predictor, especially in embedded processors. This paper proposes a power-aware branch predictor, which is based on the gshare predictor, by accessing the BTB (Branch Target Buffer) selectively. To enable the selective access to the BTB, the PHT (Pattern History Table) in the proposed branch predictor is accessed one cycle earlier than the traditional PHT if the program is executed sequentially without branch instructions. As a side effect, two predictions from the PHT are obtained through one access to the PHT, resulting in more power savings. In the proposed branch predictor, if the previous instruction was not a branch and the prediction from the PHT is untaken, the BTB is not accessed to reduce power consumption. If the previous instruction was a branch, the BTB is always accessed, regardless of the prediction from the PHT, to prevent the additional delay/accuracy decrease. The proposed branch predictor reduces the power consumption with little hardware overhead, not incurring additional delay and never harming prediction accuracy. The simulation results show that the proposed branch predictor reduces the power consumption by 29--47%.
    Performance Evaluation of Different Data Value Prediction Schemes
    Yong Xiao and Xing-Ming Zhou
    Journal of Data Acquisition and Processing, 2005, 20 (5): 615-623 . 
    Abstract   PDF(1612KB) ( 1382 )  
    Data value prediction has been widely accepted as an effective mechanism to break data hazards for high performance processor design. Several works have reported promising performance potential. However, there is hardly enough information that is presented in a clear way about performance comparison of these prediction mechanisms. This paper investigates the performance impact of four previously proposed value predictors, namely last value predictor, stride value predictor, two-level value predictor and hybrid (stride+two-level) predictor. The impact of misprediction penalty, which has been frequently ignored, is discussed in detail. Several other implementation issues, including instruction window size, issue width and branch predictor are also addressed and simulated. Simulation results indicate that data value predictors act differently under different configurations. In some cases, simpler schemes may be more beneficial than complicated ones. In some particular cases, value prediction may have negative impact on performance.
    Micro-Task Processing in Heterogeneous Reconfigurable Systems
    Sebastian Wallner
    Journal of Data Acquisition and Processing, 2005, 20 (5): 624-634 . 
    Abstract   PDF(1704KB) ( 1637 )  
    New reconfigurable computing architectures are introduced to overcome some of the limitations of conventional microprocessors and fine-grained reconfigurable devices (e.g., FPGAs). One of the new promising architectures are Configurable System-on-Chip (CSoC) solutions. They were designed to offer high computational performance for real-time signal processing and for a wide range of applications exhibiting high degrees of parallelism. The programming of such systems is an inherently challenging problem due to the lack of an programming model. This paper describes a novel heterogeneous system architecture for signal processing and data streaming applications. It offers high computational performance and a high degree of flexibility and adaptability by employing a micro Task Controller (mTC) unit in conjunction with programmable and configurable hardware. The hierarchically organized architecture provides a programming model, allows an efficient mapping of applications and is shown to be easy scalable to future VLSI technologies. Several mappings of commonly used digital signal processing algorithms for future telecommunication and multimedia systems and implementation results are given for a standard-cell ASIC design realization in 0.18 micron 6-layer UMC CMOS technology.
    Multiple-Morphs Adaptive Stream Architecture
    Mei Wen , Nan Wu, Hai-Yan Li, and Chun-Yuan Zhang
    Journal of Data Acquisition and Processing, 2005, 20 (5): 635-646 . 
    Abstract   PDF(1056KB) ( 1839 )  
    In modern VLSI technology, hundreds of thousands of arithmetic units fit on a 1cm$^{2}$ chip. The challenge is supplying them with instructions and data. Stream architecture is able to solve the problem well. However, the applications suited for typical stream architecture are limited. This paper presents the definition of regular stream and irregular stream, and then describes MASA (Multiple-morphs Adaptive Stream Architecture) prototype system which supports different execution models according to applications' stream characteristics. This paper first discusses MASA architecture and stream model, and then explores the features and advantages of MASA through mapping stream applications to hardware. Finally MASA is evaluated by ten benchmarks. The result is encouraging.
    Power Efficient Sub-Array in Reconfigurable VLSI Meshes
    Ji-Gang Wu and Thambipillai Srikanthan
    Journal of Data Acquisition and Processing, 2005, 20 (5): 647-653 . 
    Abstract   PDF(469KB) ( 1343 )  
    Given an m x n mesh-connected VLSI array with some faulty elements, the reconfiguration problem is to find a maximum-sized fault-free sub-array under the row and column rerouting scheme. This problem has already been shown to be NP-complete. In this paper, new techniques are proposed, based on heuristic strategy, to minimize the number of switches required for the power efficient sub-array. Our algorithm shows that notable improvements in the reduction of the number of long interconnects could be realized in linear time and without sacrificing on the size of the sub-array. Simulations based on several random and clustered fault scenarios clearly reveal the superiority of the proposed techniques.
    User-Level Device Drivers: Achieved Performance
    Ben Leslie, Peter Chubb, Nicholas Fitzroy-Dale, Stefan Gotz, Charles Gray, Luke Macpherson, Daniel Potts, Yue-Ting Shen, Kevin Elphinstone, and Gernot Heiser
    Journal of Data Acquisition and Processing, 2005, 20 (5): 654-664 . 
    Abstract   PDF(516KB) ( 2221 )  
    Running device drivers as unprivileged user-level code, encapsulated into their own process, has often been proposed as a technique for increasing system robustness. However, in the past, systems based on user-level drivers have generally exhibited poor I/O performance. Consequently, user-level device drivers have never caught on to any significant degree. In this paper we demonstrate that it is possible to build systems which employ user-level device drivers, without significant performance degradation, even for high-bandwidth devices such as Gigabit Ethernet.
    Hardware-Software Collaborative Techniques for Runtime Profiling and Phase Transition Detection
    Youfeng Wu and Yong-Fong Lee
    Journal of Data Acquisition and Processing, 2005, 20 (5): 665-675 . 
    Abstract   PDF(756KB) ( 1521 )  
    Dynamic optimization relies on runtime profile information to improve the performance of program execution. Traditional profiling techniques incur significant overhead and are not suitable for dynamic optimization. In this paper, a new profiling technique is proposed, that incorporates the strength of both software and hardware to achieve near-zero overhead profiling. The compiler passes profiling requests as a few bits of information in branch instructions to the hardware, and the processor executes profiling operations asynchronously in available free slots or on dedicated hardware. The compiler instrumentation of this technique is implemented using an Itanium research compiler. The result shows that the accurate block profiling incurs very little overhead to the user program in terms of the program scheduling cycles. For example, the average overhead is 0.6% for the SPECint95 benchmarks. The hardware support required for the new profiling is practical. The technique is extended to collect edge profiles for continuous phase transition detection. It is believed that the hardware-software collaborative scheme will enable many profile-driven dynamic optimizations for EPIC processors such as the Itanium processors.
    Secure Application-Aware Service Differentiation in Public Area Wireless Networks
    Weisong Shi , Sharun Santhosh, and Hanping Lufei
    Journal of Data Acquisition and Processing, 2005, 20 (5): 676-688 . 
    Abstract   PDF(1025KB) ( 1348 )  
    We are witnessing the increasing demand for pervasive Internet access from public area wireless networks (PAWNs). As their popularity grows, the inherent untrusted nature of public places} and the diverse service requirements of end users are two key issues that need to be addressed. We have proposed two approaches to address these issues. First, the Home-based Authentication Protocol (HAP) that provides a framework by which to establish trust between a nomadic client and a service provider using a trusted third party (home). Second, we argue that the best-effort-based service model provided by many access points is not enough to satisfy the end user fairness and to maximize the wireless link utilization for a diverse user population. We have proposed an application-aware service differentiation (AASD) mechanism that takes both application semantics and user requirements into consideration. Our analysis of this framework shows several fruitful results. The total authentication latency increases with the number of clients but at a rate that is much less than linear increasing latency. Also, in comparison with two other bandwidth allocation approaches, the best effort and static access control, our proposed application-aware service differentiation method, outperforms them in terms of the client fairness and wireless bandwidth utilization.
    Techniques for Determining the Geographic Location of IP Addresses in ISP Topology Measurement
    Yu Jiang, Bin-Xing Fang, Ming-Zeng Hu, and Xiang Cui
    Journal of Data Acquisition and Processing, 2005, 20 (5): 689-701 . 
    Abstract   PDF(690KB) ( 1579 )  
    A brief survey on the state-of-the-art research of determining geographic location of IP addresses is presented. The problem of determining the geographic location of routers in Internet Service Provider (ISP) topology measurement is discussed when there is inadequate information such as domain names that could be used. Nine empirical inference rules are provided, and they are respectively (1) rule of mutual inference, (2) rule of locality, (3) rule of ping-pong assignment, (4) rule of bounding from both sides, (5) rule of preferential exit deny, (6) rule of unreachable/timeout, (7) rule of relay hop assignment, (8) rule of following majority, and (9) rule of validity checking based on interface-finding. In totally 2,563 discovered router interfaces of a national ISP topology, only 6.4% of them can be located by their corresponding domain names. In contrast, after exercising these nine empirical inference rules, 38% of them have been located. Two methods have mainly been employed to evaluate the effectiveness of these inference rules. One is to compare the measured topology graph with the graph published by the corresponding ISP. The other is to contact the administrator of the corresponding ISP for the verification of IP address locations of some key routers. The conformity between the locations inferred by the rules and those determined by domain names as well as those determined by whois information is also examined. Experimental results show that these empirical inference rules play an important role in determining the geographic location of routers in ISP topology measurement.
    RMAC: A Reliable MAC Protocol Supporting Multicast for Wireless Ad Hoc Networks
    Wei-Sheng Si and Cheng-Zhi Li
    Journal of Data Acquisition and Processing, 2005, 20 (5): 702-712 . 
    Abstract   PDF(1188KB) ( 1631 )  
    This paper presents a new reliable MAC protocol called ``RMAC'' supporting reliable multicast for wireless ad hoc networks. By utilizing the busy tones to realize the multicast reliability, RMAC has three novelties: (1) it uses a variable-length control frame to stipulate an order for the receivers to respond, thus solving the feedback collision problem; (2) it extends the usage of busy tone for preventing data frame collisions into the multicast scenario; and (3) it introduces a new usage of busy tone for acknowledging data frames positively. In addition, RMAC is generalized into a comprehensive MAC protocol that provides both reliable and unreliable services for all the three modes of communications: unicast, multicast, and broadcast, making it capable of supporting various upper-layer protocols. The evaluation shows that RMAC achieves high reliability with very limited overhead. RMAC is also compared with other reliable MAC protocols, showing that RMAC not only provides higher reliability but also involves lower cost.
    Approximation Algorithms for Steiner Connected Dominating Set
    Ya-Feng Wu, Yin-Long Xu , and Guo-Liang Chen
    Journal of Data Acquisition and Processing, 2005, 20 (5): 713-716 . 
    Abstract   PDF(299KB) ( 1323 )  
    Steiner connected dominating set ( SCDS}) is a generalization of the famous connected dominating set problem, where only a specified set of required vertices has to be dominated by a connected dominating set, and known to be NP-hard. This paper firstly modifies the SCDS algorithm of Guha and Khuller and achieves a worst case approximation ratio of (2+1/(m-1))H((\D , k))+O(1), which outperforms the previous best result (c+1)H((\D, k))+O(1) in the case of m 1+1/(c-1), where c is the best approximation ratio for Steiner tree, \D is the maximum degree of the graph,k is the cardinality of the set of required vertices, m is an optional integer satisfying 0 m (D, k) and H is the harmonic function. This paper also proposes another approximation algorithm which is based on a greedy approach. The second algorithm can establish a worst case approximation ratio of 2\ln(\min(\D, k))+O(1), which can also be improved to 2\ln k if the optimal solution is greater than \frac{c\cdot e^{2c+1}}{2(c+1)}.
    Dynamic Fuzzy Controlled RWA Algorithm for IP/GMPLS over WDM Networks
    I-Shyan Hwang, I-Feng Huang, and Shin-Cheng Yu
    Journal of Data Acquisition and Processing, 2005, 20 (5): 717-727 . 
    Abstract   PDF(1309KB) ( 1670 )  
    This paper proposes a dynamic RWA scheme using fuzzy logic control on IP/GMPLS over WDM networks to achieve the best quality of network transmission. The proposed algorithm dynamically allocates network resources and reserves partial bandwidth based on the current network status, which includes the request bandwidth, average utilization for each wavelength and its coefficient of variance (C.V.) of data traffic, to determine whether the connection can be set up. Five fuzzy sets for request bandwidth, average rate and C.V. of data traffic are used to divide the variable space: very large (LP), large (SP), normal (ZE), small (SN), and very small (LN). Setting the fuzzy limit is a key part in the proposed algorithm. The simulation of scenarios in this paper has two steps. In the first step, the adaptive fuzzy limits are evaluated based on average transmission cost pertaining to ten network statuses. The second step is to compare the proposed algorithm with periodic measurement of traffic (PMT) in ATM networks in six network situations to show that the proposed FC-RWA algorithm can provide better network transmission.
    Clonal Strategy Algorithm Based on the Immune Memory
    Ruo-Chen Liu, Li-Cheng Jiao, and Hai-Feng Du
    Journal of Data Acquisition and Processing, 2005, 20 (5): 728-734 . 
    Abstract   PDF(667KB) ( 1760 )  
    Based on the clonal selection theory and immune memory mechanism in the natural immune system, a novel artificial immune system algorithm, Clonal Strategy Algorithm based on the Immune Memory (CSAIM) is proposed in this paper. The algorithm realizes the evolution of antibody population and the evolution of memory unit at the same time, and by using clonal selection operator, the global optimal computation can be combined with the local searching. According to antibody-antibody (Ab-Ab) affinity and antibody-antigen (Ab-Ag) affinity, the algorithm can allot adaptively the scales of memory unit and antibody population. It is proved theoretically that the CSAIM is convergent with probability 1. And with the computer simulations of eight benchmark functions and one instance of traveling salesman problem (TSP), it is shown that CSAIM has the strong abilities in having high convergence speed, enhancing the diversity of the population and avoiding the premature convergence to some degree.
SCImago Journal & Country Rank
 

ISSN 1004-9037

         

Home
Editorial Board
Author Guidelines
Subscription
Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China

E-mail: info@sjcjycl.cn
 
  Copyright ©2015 JCST, All Rights Reserved