

# PERFORMANCE ANALYSIS OF NOVEL PARALLEL FIR FILTER ARCHITECTURE FOR NOISE REDUCTION IN ECG SIGNAL PROCESSING FOR TIME CRITICAL APPLICATIONS

Kunjan D. Shinde\*

Research Scholar, Dept. of E&CE, SDM College of Engineering & Technology, Dharwad, VTU Belagavi, Karnataka, INDIA. ORCID: 0000-0002-0064-2981, kunjan18m@gmail.com

Vijaya C

Professor & Head, Dept. of E&CE, SDM College of Engineering & Technology, Dharwad, VTU Belagavi, Karnataka, INDIA. ORCID: 0000-0001-6167-5974, vijayac26@gmail.com

Abstract. High-performance computation is in growing demand as technology advances, and processing biomedical signals requires numerous signal processing algorithms tailored to the end application. Filtering is a simple but essential operation in most signal-processing algorithms. To maintain signal quality, the ECG signal undergoes several stages of filtering, because ECG signals contain multiple frequency components and noise. Implementing an FIR filter is a challenging task for time-critical applications. Parallel FIR filter architectures meet the high computational demand of time-critical applications, but they consume a large area and many resources. The presented work compares the performance of a conventional parallel FIR filter with that of a proposed parallel FIR filter architecture on a reconfigurable platform. The improvement in resource utilization is notable: the number of DSP48E1 slices consumed by the proposed design remains constant as the level of parallelism changes from 2 to 4, 8, and 16 for an FIR filter of order 16. The study shows the architectural impact of the proposed work, and the comparative analysis indicates that the proposed FIR filter architecture uses fewer resources and improves the path delay by 64%, while DSP slice utilization is reduced by up to 96%. The work is carried out on a low-power-grade xc7vx690t-2Lffg1930 Virtex 7 series FPGA, and Xilinx ISE 14.7 and PlanAhead 14.7 are used for the design and analysis of the parallel FIR filter developed for pre-processing of ECG signals.

**Keywords.** Bio-medical Signal Processing, Denoising ECG signals, FIR Filter Architectures, Parallel Filter Architectures, FIR on Reconfigurable Architecture, High-speed FIR Filter, FPGA.

## Introduction

Processing of biomedical signals such as the electrocardiogram (ECG) is a crucial undertaking in the field of medicine, because ECG signals and their integrity reflect the health and characteristics of the heart. Various conditions of the heart can be understood from the morphology of such signals. ECG is the method of cardiovascular diagnosis that clinicians utilize most frequently. ECG analysis is selected because it is straightforward, accurate, non-invasive, and has an excellent temporal resolution. ECG signals, on the other hand, must be filtered to make them useful for diagnosis, because they are contaminated by a number of noise sources during signal acquisition using electrodes. The American Heart Association (AHA) has specified the adoption of a digital low-pass filter (LPF) as a vital component of its recommendations for standardizing electrocardiography [15,16]. Electromyographic (EMG) noise, interference from other electronic equipment, medical implants, and other sources are all removed by the LPF. FIR filters are preferred in many biomedical signal processing applications because of their linear phase response and stable characteristics.

The presented work is developed for pre-processing of ECG signals and removal of high-frequency noise components using parallel FIR filters. The parallel FIR filters are designed to meet the high computational demand of time-critical applications. The presented work showcases conventional and proposed parallel FIR filter architectures of filter order 16 with levels of parallelism of 2, 4, 8, and 16.

The existing work on ECG signal processing is focused on block-level optimizations, while the presented work focuses on architectural enhancements. Most state-of-the-art FIR filter designs are not based on parallel architectures and are not suitable for time-critical applications. The filter architectures of existing works suffer from bottlenecks caused by the way the data is processed within the filter architecture. The proposed parallel architecture gives an improved methodology to process the data efficiently while consuming fewer resources and producing multiple samples in less time than the conventional parallel architecture. The major drawback of parallel architecture is thus eliminated by the proposed parallel FIR filter architecture.

An FPGA platform is chosen to assess the implementability of denoising ECG signals using parallel FIR filters. The use of DSP slices and of the other resources is captured in the comparative analysis of the paper. The proposed parallel FIR filter is architecturally enhanced, but no block-level optimization is carried out in any form, since it could obscure the improvement due to the architectural enhancement. As future scope, block-level optimization can be applied to the constant multiplication, shift unit, and adder tree stages, which contribute to the critical path of the proposed design.

The rest of the paper is organized as follows. In section II, various state-of-the-art FIR filters developed for biomedical ECG signal processing are reviewed and the levels of optimization in existing architectures are examined. Section III gives the design of the conventional parallel architecture and section IV gives the design of the proposed parallel FIR filter architecture. Section V gives the results and discussion with details of the experimental setup, simulation results, and comparative analysis. The conclusion, statements, and references are at the end of the paper.

## Literature Review

The following is the related work carried out by several researchers in the field of biomedical signal processing and its implementation on related platforms. To validate the implementability of a design, FPGAs provide a suitable environment with fast time-to-market scope. Several platforms apart from FPGA are also considered for the study to understand the importance of suitable techniques and methods for FIR filter design.

The bottlenecks and critical paths of several FIR filter topologies are predicted and verified on an FPGA platform in [1], with emphasis on the use of symmetric filter coefficients for reduced-area filter design. A new parallel FIR filter design is suggested in [2], and its effects are evaluated for a number of benchmark FIR filters, with the performance metrics reported in comparative remarks. With the novel parallel FIR filter architecture of [2], a parallel architecture may be designed with lower area requirements and faster filtering. The parallel FIR filter design of [2] is utilized in the presented work for ECG denoising and for the pre-conditioning of the ECG signals under study. It is shown in [13] that the new parallel FIR filter design effectively reduces the AWGN added to real-time audio data; there, the parallel FIR filter architecture of [2] is validated for audio-based applications with filter order 100. Together, [1,2,13] capture the importance of architectural enhancements for versatile FIR filters and end applications. The raw ECG signal is taken from the MIT-BIH database, with sample details provided in [17], and is used to test the functioning of the presented work.

The importance of digital signal processing and its effect on VLSI design are discussed in [3,14], along with a summary of the many existing solutions and potential techniques to implement DSP algorithms. Additionally, different DSP algorithms are developed and implemented on a VLSI platform, with systematic optimization techniques considered for each algorithm. The construction of an FIR filter using windowing techniques and the differences between IIR and FIR filter approaches are thoroughly analyzed in [11].

The emphasis in [4] is on filters created especially for mobile ECG acquisition devices. The primary criteria for portable systems are low power consumption and small design. The filter is intended to minimize high-frequency noise in the ECG, and the area-power complexity is handled using a Vedic Multiplier (VM) based on the Urdhva-Tiryagbhyam sutra, with optimization carried out using a methodical LSB quantization methodology. This 16x16 Vedic Multiplier is used in filters of orders 16, 32, and 64. The filter architecture is assessed on the Artix-7 FPGA xc7a200tfbg676-2 and comparisons are made. High-frequency synthetic noise was added to the ECG signal, which was then filtered using the Vedic FIR filter developed. VLSI implementations of the Vedic sutra Urdhva-Tiryagbhyam are known to provide low-power designs [5-8], and the architecture is optimized for DSP applications by product bit quantization as indicated in [4]. In the VM architecture of [8], and in several publications [5-8] on related platforms, the Carry Save Adder (CSA) replaces the conventional Ripple Carry Adder (RCA) for higher speed. The FIR filter was also developed using the VM-CSA architecture. Reference [5] compares several arithmetic topologies, including the Brent-Kung Adder (BKA), Wallace Tree Multiplier (WTM), and VM, for 16th-order FIR filters. In spite of the Wallace Tree's superior performance, it is determined that VM-based FIR filters use less energy and provide better Energy Delay Products (EDP). The ECG signal is processed on an FPGA platform, as in [9-8], and information on a filter with different windowing techniques is presented. This literature is considered because of the flexible FPGA platform and the type of response observed. The design considerations for a multiplier are explained in [11]. Because of their versatility, array multipliers may be further customized for DSP applications. The datasheet for the FPGA being utilized, which gives information on the technology and its characteristics, is supplied in [12]. Reference [19] presents an FIR filter variant with block-level optimization and shows that a Wallace tree multiplier/carry skip adder combination is more efficient than other multiplier/adder combinations.

The work of [20] uses a system generator to denoise the ECG signal and gives a comparative study of various filtering methods with lower filter orders; area and power metrics are captured, but an analysis of the delay metric is not provided. In [21], a cascaded FIR filter is developed and its functionality is validated for ECG signal processing. Reference [22] gives the FIR filter realization with symmetric convolution and explores the FFA algorithm; filter orders and specifications vary widely across the related literature referred to for this study.

## Design of Parallel FIR Filter Architecture

An FIR filter described by its difference equation is given in eq.1, and eq.2 gives the expanded form for an Nth-order (N+1 tap) FIR filter [1,2]. These equations provide the implementation details required for VLSI/FPGA platforms.

$$y(n) = \sum_{k=0}^{N} b_k x(n-k) \tag{1}$$

Expanding eq.1 gives

$$y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2) + \dots + b_N x(n-N) \tag{2}$$

The notations used are: y(n) is the output sample, N is the filter order, $b_k$ are the filter coefficients, and x(n) is the input sample.
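As a software illustration of eq.1, the difference equation can be sketched as a direct-form loop. This is a minimal sketch and not the RTL used in the presented work; the function name `fir_direct` and the convention that samples before n = 0 are zero are assumptions made for the example.

```python
def fir_direct(b, x):
    """Direct-form FIR: y(n) = sum over k of b[k] * x(n - k).

    Assumption for this sketch: x(n) = 0 for n < 0.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k, bk in enumerate(b):
            if n - k >= 0:          # skip samples before the start of the signal
                acc += bk * x[n - k]
        y.append(acc)
    return y


# Two unit impulses, at n = 0 and n = 3, through a 3-tap filter
print(fir_direct([1, 2, 3], [1, 0, 0, 1]))
```

Each output sample costs one multiplication per coefficient, which is the per-block cost that a parallel architecture then replicates.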

To construct a parallel FIR filter a common approach is to replicate the primary design block with the level of parallelism and thereby produce multiple samples as output. In a parallel FIR filter, multiple samples are processed and produced simultaneously by the filter design.

The area requirements are calculated by multiplying the level of parallelism by the cost of implementing a single block, with some overheads in realization. The complexity of the parallel filter realization increases with an increase in the level of parallelism and filter order. Hence the resource demand is very high for parallel filters, and due to such area constraints parallel architectures for filter design are not often preferred for low and moderate computing devices. Applications with high computational demand often make use of parallel architectures, provided that the area requirements can be optimized effectively.
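The replication idea and its area cost can be sketched in software as follows. This is an illustrative model only: `conventional_parallel_fir` and `multiplier_count` are hypothetical names, and the multiplier count is a rough proxy that ignores the realization overheads and any optimization a synthesis tool may apply (for example to zero-valued or symmetric coefficients).

```python
def fir_output(b, x, n):
    """One replicated FIR block: y(n) from eq.1, with x(n) = 0 for n < 0."""
    return sum(bk * x[n - k] for k, bk in enumerate(b) if n - k >= 0)


def conventional_parallel_fir(b, x, L):
    """Process x in blocks of L samples using L copies of the FIR datapath."""
    y = []
    for start in range(0, len(x), L):
        # In hardware these evaluations would run side by side in one clock cycle.
        stop = min(start + L, len(x))
        y.extend(fir_output(b, x, n) for n in range(start, stop))
    return y


def multiplier_count(num_taps, L):
    """Rough area proxy: every replicated block needs its own multipliers."""
    return num_taps * L


for L in (2, 4, 8, 16):
    print(L, multiplier_count(17, L))
```

The outputs are identical to those of a single direct-form filter; only the number of samples produced per step, and the hardware needed to do so, changes with L.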

Figure 1 depicts the block diagram of the typical parallel processing FIR filter architecture. The MIMO unit is basically the block where hardware replication occurs, leading to the numerous inputs and outputs that are observed from the block. In order to provide parallel samples into and obtain serial samples out of the MIMO unit, serial-to-parallel converter blocks and parallel-to-serial converter blocks are required respectively.

In other words, the MIMO unit significantly improves the operating speed by producing multiple outputs, while a non-parallel setup produces only one sample in the same time [1].



Fig.1 Block diagram of Conventional Parallel FIR filter architecture

To estimate the time required to compute the output sample, the critical path of the parallel system is to be calculated. The critical path through a conventional parallel architecture can be computed using eq.3 [3,1].

$$T_{parallel} = \frac{1}{L} T_{clk} \ge \frac{1}{L} T_{Critical\_path} \tag{3}$$

where $T_{parallel} \neq T_{clk}$, as multiple outputs are computed in a single clock cycle. It is to be noted from eq.3 that as the level of parallelism increases the system becomes faster; in other words, more output samples are generated in the same amount of time as the parallelism is increased.
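The arithmetic behind eq.3 can be checked with a few lines of code. The clock period and critical path used here (10 ns and 9.5 ns) are purely illustrative values, not measurements from the presented work, and the function names are hypothetical.

```python
def effective_sample_period(t_clk_ns, L):
    """Eq.3: time per output sample when L outputs are produced per clock."""
    return t_clk_ns / L


def meets_timing(t_clk_ns, t_critical_path_ns):
    """Eq.3 constraint: the clock period must cover the critical path."""
    return t_clk_ns >= t_critical_path_ns


T_CLK_NS = 10.0   # illustrative clock period
T_CRIT_NS = 9.5   # illustrative critical path

print("timing met:", meets_timing(T_CLK_NS, T_CRIT_NS))
for L in (1, 2, 4, 8, 16):
    print(L, effective_sample_period(T_CLK_NS, L), "ns per sample")
```

The clock period itself does not shrink with L; the gain comes from dividing that period among L output samples.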

## Proposed Parallel FIR Filter Architecture and its Design

The major concern of conventional parallel architecture is the large area requirement; this can be resolved by improving the data flow in the parallel filter architecture. From eq.1 and eq.2, it is to be noted that delayed versions of the input samples are multiplied with filter coefficients, which are constant for a given end application. In the conventional parallel architecture, at a given instant of time and for a predefined level of parallelism, the input samples being multiplied can be pulled up to a single instance where the constant multiplication is realized. Some arrangement therefore has to be made to hold the samples whose multiplication with a filter coefficient occurs in advance. Later, the stored partial products can be put in line to give the required output sample. Multiple stages of the adder tree can be introduced to produce multiple output samples in line with the level of parallelism, and the length of the storage unit and the constant multiplication can be interlinked with the level of parallelism.

The details mentioned above can be described in a general block diagram as shown in Fig.2 [2, 13]. The store/shift unit and the constant multiplication block are essential components of the proposed architecture. The adder tree block pulls up all the stored partial products, and multiple output samples can be produced simultaneously by implementing multiple adder trees, whose number is directly related to the level of parallelism.



Fig.2 Block diagram of proposed parallel FIR filter architecture

To understand the impact of the proposed architecture, the critical path is evaluated and the equations needed to set up the proposed architecture are obtained. The input sample first goes through the constant multiplication stage, and the result of the multiplication is then stored in the store/shift unit. The adder tree captures the stored values and computes the desired output sample, and finally the samples are fed to the parallel-to-serial converter to put all the output samples in line. The critical blocks in the proposed architecture are the constant multiplication unit, the adder tree unit, and the storage unit, and hence the performance of these blocks is to be taken into account while computing the critical path.

Equation 4 provides the critical path for the proposed design involving the critical blocks mentioned above, and eq.5 provides the total delay (speed metric), which is derived in the same way as for the conventional parallel architecture in eq.3. Eq.6 is used to determine the length of the shift unit needed for the realization of the proposed parallel FIR filter architecture. Detailed analysis and design of the proposed parallel FIR filter are provided in [2].

$$T_{Critical\_path\_proposed} = T_{Constant\_multiplication\_unit} + T_{store/shift\_unit} + T_{adder\_tree\_unit} \tag{4}$$

$$T_{parallel\_proposed} = \frac{1}{L} T_{clk} \ge \frac{1}{L} T_{Critical\_path\_proposed} \tag{5}$$

$$L_{store/shift\_unit} = N + L - 1 \tag{6}$$

where L is the level of parallelism and N is the filter order.
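The data flow described for the proposed architecture (multiply each sample by the constants once, store the products, then let L adder trees pick the products they need) can be modelled in software as below. This is one possible behavioural reading of the block diagram and not the design of [2]: the function names are hypothetical, and the sketch assumes the buffer holds the N + L - 1 stored product vectors of eq.6 plus the newest one, and that trailing samples which do not fill a block of L are dropped.

```python
from collections import deque


def store_length(N, L):
    """Eq.6: length of the store/shift unit for filter order N, parallelism L."""
    return N + L - 1


def proposed_parallel_fir(b, x, L):
    """Behavioural sketch: multiply once, store, then sum with L adder trees."""
    N = len(b) - 1                          # filter order (N + 1 coefficients)
    depth = store_length(N, L) + 1          # stored vectors plus the newest one
    zero = [0] * (N + 1)
    store = deque([zero] * depth, maxlen=depth)
    y = []
    for i, sample in enumerate(x):
        # Constant multiplication unit: the sample meets every coefficient once.
        store.append([bk * sample for bk in b])
        if (i + 1) % L == 0:                # a full block of L samples has arrived
            for j in range(L):              # one adder tree per output sample
                newest = depth - 1 - (L - 1 - j)
                y.append(sum(store[newest - k][k] for k in range(N + 1)))
    return y


print(store_length(16, 4))
print(proposed_parallel_fir([1, 2, 3], [1, 0, 0, 1, 5, 2], 2))
```

In this model the number of multiplications per input sample depends only on the number of coefficients and not on L, which is in line with the constant DSP slice usage reported for the proposed design.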

## Results and Discussions

The presented work is developed for denoising of ECG signals. ECG signals undergo several noise-elimination stages, and in the proposed work the FIR filter is designed for pre-filtering of ECG signals to remove the high-frequency noise component and prepare the signal for application-specific processing. The filter was designed with the Matlab filterDesigner tool using the specifications given in the experimental setup. The Xilinx Virtex 7 series FPGA xc7vx690t-2Lffg1930 is chosen for the study because its low power grade is suitable for battery-operated devices and end applications. Xilinx ISE 14.7 is used for synthesis and to obtain the post-synthesis implementation details of the presented work. Xilinx PlanAhead 14.7 is used to capture the post-implementation details, from which the path delay and on-chip power consumption of the design are tabulated. The ECG dataset of [18] from the MIT-BIH platform is used as a standard reference for the presented work, and the source is freely available for research-related activity.

#### **Experimental Setup**

Reference [17] gives the standard recommendations for filter design to process and denoise the ECG signal under study. The proposed FIR filter is constructed in line with [2].

Fig.3 shows the setup environment for the presented work. The Matlab tool is used to capture the analog signature of the ECG signal; the filtered version of the ECG sample mentioned in [18] is considered, and a known amount of noise is introduced. The noisy input samples are written into a text file. With the help of the Xilinx ISE environment and file I/O operations, the text file is read and the input samples are processed with the designed filter. The filtered output samples are recorded in an output text file, which is then read in the Matlab environment for calculating the SNR and for further processing of the signal.



Fig.3 Experimental setup of the presented work: Denoising of ECG signal using parallel FIR filter architecture

The following are the technical details of the FIR filter used in the presented work.

Generating FIR filter coefficients using the filterDesigner tool in Matlab: FIR filter coefficients are generated using the filterDesigner (fdatool) tool available in Matlab. The generated coefficients are fractional and are scaled by $2^{16}$ (65536). The filter specifications are as follows, and the setup of filterDesigner is shown in fig.4.

| Parameter              | Value             |
|------------------------|-------------------|
| Filter type            | Low Pass Filter   |
| Filter order           | 16                |
| Level of parallelism   | 2, 4, 8, and 16   |
| Sampling rate          | 400 Hz            |
| Cutoff frequency       | 100 Hz            |
| FIR coefficient method | Window (Blackman) |



Fig.4 Setup of FIR filter requirements on filterDesigner tool in Matlab

- Adding Noise: AWGN of value 10 is added to the raw ECG signal [17] read in Matlab, and the rounded (integer) version of the noisy samples is written to a text file for processing in the Xilinx environment.
- Signal to Noise ratio calculation: It is calculated using the built-in function SNR available in Matlab.
- File I/O operations: These are invoked in both the Matlab and Xilinx environments, and the stimulus block is developed using Verilog HDL coding.
- Input and output Signal Precision: The input samples are set to a precision of 16-bit signed representation and the output is captured with full-scale 32-bit signed representation.
- Filter Coefficients: Table 1 depicts the FIR filter coefficients obtained from the Matlab filterDesigner/fda tool, and the rounded version of the filter coefficients is used for RTL coding. It can be noted that the round-off noise is very low and the resultant rounded filter coefficients can be effectively utilized for the denoising of ECG signals.
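The noise-injection and SNR steps listed above can be sketched as follows. This is an illustration under stated assumptions rather than the Matlab script used in the presented work: the function names are hypothetical, `sigma` (the noise standard deviation) is an assumed reading of the noise amount, the SNR is taken as the ratio of signal power to the power of the difference between the noisy and clean signals, and the sample values are made up for the example.

```python
import math
import random


def add_awgn(signal, sigma, seed=0):
    """Add zero-mean Gaussian noise and round to integers for the text file."""
    rng = random.Random(seed)           # fixed seed keeps the example repeatable
    return [round(s + rng.gauss(0.0, sigma)) for s in signal]


def to_int16(value):
    """Clamp a sample to the 16-bit signed input range."""
    return max(-32768, min(32767, value))


def snr_db(clean, noisy):
    """SNR in dB: signal power over the power of (noisy - clean)."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


clean = [0, 120, 480, 900, 400, -150, -60, 0]        # made-up sample values
noisy = [to_int16(v) for v in add_awgn(clean, sigma=10.0)]
print("\n".join(str(v) for v in noisy))              # one integer per line
print("SNR (dB):", snr_db(clean, noisy))
```

Writing one integer per line is only one convenient text format for a file-based test bench; the actual format used between Matlab and the HDL stimulus block is not specified in the paper.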

#### Table.1 FIR filter coefficients scaled and rounded version for RTL coding

| Filter coefficient | Original value obtained from FDA tool | Scaled version (b<sub>i</sub> × 65536) | Rounded version of filter coefficient | Round-off noise |
|--------------------|---------------------------------------|----------------------------------------|---------------------------------------|-----------------|
| b<sub>0</sub>      | 0                                     | 0                                      | 0                                     | 0               |
| b<sub>1</sub>      | -0.00066                              | -43.5811                               | -44                                   | 6.39E-06        |
| b<sub>2</sub>      | 1.29E-18                              | 8.48E-14                               | 0                                     | 1.29E-18        |
| b<sub>3</sub>      | 0.010952                              | 717.7496                               | 718                                   | -3.8E-06        |
| b<sub>4</sub>      | -6.6E-18                              | -4.3E-13                               | 0                                     | -6.6E-18        |
| b<sub>5</sub>      | -0.05884                              | -3856.4                                | -3856                                 | -6.1E-06        |
| b<sub>6</sub>      | 1.51E-17                              | 9.88E-13                               | 0                                     | 1.51E-17        |
| b<sub>7</sub>      | 0.298639                              | 19571.59                               | 19572                                 | -6.2E-06        |
| b<sub>8</sub>      | 0.499836                              | 32757.28                               | 32757                                 | 4.33E-06        |
| b<sub>9</sub>      | 0.298639                              | 19571.59                               | 19572                                 | -6.2E-06        |
| b<sub>10</sub>     | 1.51E-17                              | 9.88E-13                               | 0                                     | 1.51E-17        |
| b<sub>11</sub>     | -0.05884                              | -3856.4                                | -3856                                 | -6.1E-06        |
| b<sub>12</sub>     | -6.6E-18                              | -4.3E-13                               | 0                                     | -6.6E-18        |
| b<sub>13</sub>     | 0.010952                              | 717.7496                               | 718                                   | -3.8E-06        |
| b<sub>14</sub>     | 1.29E-18                              | 8.48E-14                               | 0                                     | 1.29E-18        |
| b<sub>15</sub>     | -0.00066                              | -43.5811                               | -44                                   | 6.39E-06        |
| b<sub>16</sub>     | 0                                     | 0                                      | 0                                     | 0               |
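The scaling and rounding steps behind Table 1 can be reproduced approximately with a textbook windowed-sinc design. This is a sketch, not the filterDesigner implementation: it assumes an ideal low-pass response multiplied by a Blackman window and normalized to unity gain at DC, and the function name `blackman_lowpass` is hypothetical. The round-off noise is computed as the original coefficient minus the rounded coefficient divided by 65536.

```python
import math


def blackman_lowpass(order, cutoff_hz, fs_hz):
    """Windowed-sinc low-pass coefficients (Blackman window, unity DC gain)."""
    fc = cutoff_hz / fs_hz                      # cutoff in cycles per sample
    mid = order / 2
    h = []
    for n in range(order + 1):
        m = n - mid
        if m == 0:
            ideal = 2 * fc                      # centre tap of the ideal response
        else:
            ideal = math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / order)
                  + 0.08 * math.cos(4 * math.pi * n / order))
        h.append(ideal * window)
    gain = sum(h)
    return [v / gain for v in h]


SCALE = 1 << 16                                 # 65536, the scale factor of Table 1
coeffs = blackman_lowpass(16, 100.0, 400.0)
rounded = [round(c * SCALE) for c in coeffs]
round_off = [c - r / SCALE for c, r in zip(coeffs, rounded)]

print(rounded)
print(max(abs(e) for e in round_off))
```

With the specifications of the experimental setup (order 16, 100 Hz cutoff, 400 Hz sampling rate), the rounded integers from this sketch agree with the rounded column of Table 1, and every round-off error stays below half of one scaled step.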

#### **Simulation Results**

The FIR filter with the above specifications is designed in Matlab 2018, and its magnitude vs. frequency response is shown in fig.5. The signal bandwidth at the pre-processing stage is kept larger because the filtered signal is later used for application-specific feature extraction.





The functional verification of the presented work is carried out in Xilinx ISE 14.7. With the help of file I/O operations, the ECG signal is read from a text file and the filtered version of the ECG signal is written to a text file for further analysis. Because the coefficients are scaled by 65536 as depicted in table 1, the filtered ECG signal carries the same scale factor, and proper care has to be taken while scaling it down. Due to the limitation of the Xilinx ISE 14.7 simulation environment, the input and output signals cannot be visualized in analog form.

[Figure: Xilinx ISE 14.7 simulation waveform from 35 ns to 65 ns, showing the signals F_out(31:0), A[15:0], clk, outfile0[31:0], outfile3[31:0], and count[2:0]]

Fig.6 Simulation results of ECG signal processed in Xilinx ISE 14.7 environment

Fig.6 gives a simulation of the ECG signal under study in numerical form. The same set of results is obtained from the rest of the conventional and proposed parallel FIR filter architectures, and hence only a single sample copy of the data is displayed.
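The scaling-down step mentioned for the filtered output can be illustrated with integer arithmetic. This is a minimal sketch that uses the rounded coefficients of Table 1; the function names are hypothetical, and rounding to nearest before the 16-bit right shift is an assumption, since the paper does not state how the scaled output is brought back to the input range.

```python
# Rounded coefficients from Table 1 (original values multiplied by 65536)
COEFFS_Q16 = [0, -44, 0, 718, 0, -3856, 0, 19572, 32757,
              19572, 0, -3856, 0, 718, 0, -44, 0]


def fir_fixed_point(samples, coeffs=COEFFS_Q16):
    """Integer FIR: each output still carries the 65536 coefficient scale."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def scale_down(value, shift=16):
    """Divide by 2**shift, rounding to the nearest integer."""
    return (value + (1 << (shift - 1))) >> shift


raw = fir_fixed_point([100] * 40)       # constant input, so the output settles
print(raw[-1], scale_down(raw[-1]))
```

For a constant input the settled output equals the input multiplied by the sum of the integer coefficients, so scaling down recovers the input level to within rounding.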

#### **Comparative Analysis**

Table 2 gives the comparative analysis of the conventional parallel FIR filter architecture and the proposed parallel FIR filter architecture with levels of parallelism of 2, 4, 8, and 16. Table 2 depicts the device utilization of the Xilinx Virtex 7 series FPGA xc7vx690t-2Lffg1930, which is a low-power grade FPGA. The number of slices, LUTs, and DSP48 slices gives the utilization of the FPGA for the area metric, while the minimum period and path delay give the timing details. The Block size column gives the level of parallelism for the parallel architectures, and the No. of Outputs Generated column gives the number of output samples generated.

It is important to read the path delay and minimum period together with the block size and the number of output samples generated. The minimum period and path delay are almost constant across block sizes of the conventional parallel architecture, because multiple outputs are computed within the same timing budget. The values being almost constant does not indicate a degradation of the parallel architecture; rather, the parallel architecture increases the speed of operation by increasing the number of samples produced by the system. Similarly, in the proposed parallel architecture, the path delay and minimum period remain nearly constant while the number of output samples increases with the block size.

#### Table.2 Post synthesis implementation details of various parallel FIR filter architectures on xc7vx690t-2Lffg1930 Xilinx Virtex 7 series FPGA


| Parallel Architecture | Block size | Slices | LUTs | DSP48E1s | Minimum Period (ns) | Maximum Frequency (MHz) | Path Delay (ns) | Logic Delay (ns) | Route Delay (ns) | No. of Outputs Generated |
|-----------------------|------------|--------|------|----------|---------------------|-------------------------|-----------------|------------------|------------------|--------------------------|
| Conventional          | 1          | 4      | 16   | 9        | 12.392              | 80.7                    | 12.392          | 12.392           | 0                | 1                        |
|                       | 2          | 71     | 64   | 18       | 0.926               | 1079.7                  | 12.813          | 12.474           | 0.339            | 2                        |
|                       | 4          | 99     | 64   | 38       | 0.964               | 1036.9                  | 12.813          | 12.474           | 0.339            | 4                        |
|                       | 8          | 108    | 60   | 80       | 0.975               | 1025.1                  | 12.813          | 12.474           | 0.339            | 8                        |
|                       | 16         | 205    | 335  | 158      | 0.975               | 1025.1                  | 13.540          | 12.733           | 0.807            | 16                       |
| Proposed              | 1          | 104    | 371  | 5        | 1.069               | 935.4                   | 4.602           | 2.302            | 2.300            | 1                        |
|                       | 2          | 234    | 555  | 5        | 1.069               | 935.4                   | 4.609           | 2.302            | 2.307            | 2                        |
|                       | 4          | 411    | 1038 | 5        | 1.069               | 935.4                   | 4.609           | 2.302            | 2.307            | 4                        |
|                       | 8          | 797    | 2141 | 5        | 1.069               | 935.4                   | 5.354           | 2.832            | 2.522            | 8                        |
|                       | 16         | 1665   | 3756 | 5        | 1.069               | 935.4                   | 5.354           | 2.832            | 2.522            | 16                       |

From table 2, it is to be noted that the path delay of the proposed work is improved by an average of 64%, while the minimum period is slightly increased, by 9%. A parallel architecture of block size one can be treated as the conventional direct form architecture (since the direct form architecture is used in the realization of the conventional parallel filter). Comparing the conventional parallel architecture of block size 1 with the rest of the variants, the parallel architectures are 10 times faster, with multiple samples being produced in the same instant.
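The relative changes discussed above can be recomputed from the tabulated path delay and DSP48E1 columns. The snippet below is only a convenience for the reader: the dictionaries copy the values listed for the conventional and proposed architectures, and `reduction_percent` is a hypothetical helper.

```python
# Path delay (ns) and DSP48E1 count per block size, copied from the table
CONVENTIONAL = {1: (12.392, 9), 2: (12.813, 18), 4: (12.813, 38),
                8: (12.813, 80), 16: (13.540, 158)}
PROPOSED = {1: (4.602, 5), 2: (4.609, 5), 4: (4.609, 5),
            8: (5.354, 5), 16: (5.354, 5)}


def reduction_percent(old, new):
    """Relative reduction of `new` with respect to `old`, in percent."""
    return 100.0 * (old - new) / old


for block in sorted(CONVENTIONAL):
    conv_delay, conv_dsp = CONVENTIONAL[block]
    prop_delay, prop_dsp = PROPOSED[block]
    print(block,
          round(reduction_percent(conv_delay, prop_delay), 2),
          round(reduction_percent(conv_dsp, prop_dsp), 2))
```

The per-block-size figures make it easy to see where the largest path delay and DSP slice savings occur as the level of parallelism grows.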

To validate the path delay of table 2 and to measure the path delay after implementation, table 3 is prepared. The post-synthesis implementation details are obtained after the Xilinx implementation stage, and with the help of PlanAhead 14.7 the device implementation can be visualized. The timing report gives the path delay and its distribution in terms of logic and route delay, and the on-chip power is also captured for the parallel FIR filter architectures presented in this paper.

It is observed that all the parallel FIR filters used here consume the same amount of power (286.90 mW), even though the implementation details change with block size and filter architecture. It is interesting to note that the post-synthesis path delay is slightly reduced for the proposed parallel FIR filter and slightly increased for the conventional parallel architecture.

#### Table.3 Path delay and power estimation from device implementation details captured on Xilinx PlanAhead 14.7

| Parallel architecture | Block size | Path delay (ns) | Logic delay (ns) | Route delay (ns) | Total on-chip power (mW) |
|-----------------------|------------|-----------------|------------------|------------------|--------------------------|
| Conventional          | 1          | 11.447          | 11.397           | 0.050            | 286.90                   |
| Conventional          | 2          | 17.344          | 14.533           | 2.811            | 286.90                   |
| Conventional          | 4          | 17.510          | 14.538           | 2.972            | 286.90                   |
| Conventional          | 8          | 18.805          | 14.492           | 4.313            | 286.90                   |
| Conventional          | 16         | 20.553          | 14.844           | 5.709            | 286.90                   |
| Proposed              | 1          | 3.946           | 2.080            | 1.866            | 286.90                   |
| Proposed              | 2          | 4.146           | 1.984            | 2.162            | 286.90                   |
| Proposed              | 4          | 4.513           | 2.031            | 2.482            | 286.90                   |
| Proposed              | 8          | 4.731           | 2.092            | 2.639            | 286.90                   |
| Proposed              | 16         | 5.296           | 2.099            | 3.197            | 286.90                   |
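The relative path delay reduction can be recomputed directly from the table above. The sketch below is an illustrative calculation, not part of the original tool flow: it copies the post-implementation path delays from the table and reports, for each block size, the percentage by which the proposed architecture is faster than the conventional one. The helper name `delay_reduction_percent` is an assumption introduced here.

```python
# Post-implementation path delays (ns) copied from the table above, keyed by block size.
conventional_ns = {1: 11.447, 2: 17.344, 4: 17.510, 8: 18.805, 16: 20.553}
proposed_ns = {1: 3.946, 2: 4.146, 4: 4.513, 8: 4.731, 16: 5.296}


def delay_reduction_percent(baseline_ns, improved_ns):
    """Percentage by which improved_ns is shorter than baseline_ns."""
    return (baseline_ns - improved_ns) / baseline_ns * 100.0


# One reduction figure per block size, conventional design as the baseline.
reductions = {
    size: delay_reduction_percent(conventional_ns[size], proposed_ns[size])
    for size in conventional_ns
}

for size, value in reductions.items():
    print(f"Block size {size:2d}: path delay reduced by {value:.1f}%")

mean_reduction = sum(reductions.values()) / len(reductions)
print(f"Mean reduction over all block sizes: {mean_reduction:.1f}%")
```

These figures come from the device implementation stage, so they need not coincide with the average improvement quoted for the synthesis-stage results of Table 1.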

The comparative analysis provides detailed insight into the realization of the conventional parallel and proposed parallel FIR filter architectures. It is observed that the proposed parallel FIR filter architecture for denoising of ECG signals uses fewer resources and improves the path delay by up to 64%, while DSP slice utilization is reduced by 44% to 96% depending on the block size.

## Conclusion

Signal processing is a complex task involving various algorithms for dedicated feature extraction; in biomedical signal processing, the morphology of the signal plays a vital role. FIR filters are suitable for applications that require the phase and shape of the signal to be preserved. Conventional FIR filters may not be suitable for time-critical applications. Parallel FIR filters meet the high computational demand but require a large area for realization. The presented work uses a novel parallel FIR filter that is architecturally enhanced to meet the high computational demand and can be realized with considerably less hardware. The presented work is developed for denoising of ECG signals, preprocessing the ECG signal to remove high-frequency noise. The comparative study gives the resource requirements and the performance analysis of the conventional parallel and proposed parallel FIR filter architectures for levels of parallelism of 2, 4, 8, and 16. From the study presented, it is observed that the proposed work uses a constant number of DSP slices irrespective of the level of parallelism, which is a major advantage of the proposed architecture, while block-level enhancements can be further explored.
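To illustrate what the level of parallelism means in a block FIR filter, the following is a minimal software sketch. It is not the proposed architecture: it only shows that computing L outputs per iteration gives the same result as the direct form. The 17-tap moving-average coefficients and the test signal are placeholder assumptions, not the coefficients or ECG data used in this work.

```python
def fir_direct(x, h):
    """Direct-form FIR: one output sample per iteration."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


def fir_block(x, h, block_size):
    """Block FIR: each outer iteration produces block_size output samples.

    In hardware the inner loop would be unrolled into parallel datapaths;
    here it only illustrates which inputs each output depends on.
    """
    taps = len(h)
    # Prepend zeros so every output sees a full window of past inputs.
    padded = [0.0] * (taps - 1) + list(x)
    y = [0.0] * len(x)
    for start in range(0, len(x), block_size):
        stop = min(start + block_size, len(x))
        for n in range(start, stop):
            window = padded[n:n + taps]  # x[n-taps+1] ... x[n]
            y[n] = sum(c * s for c, s in zip(h, reversed(window)))
    return y


# Placeholder order-16 (17-tap) moving-average low-pass filter and test signal.
h = [1.0 / 17] * 17
x = [float((i * 7) % 11) for i in range(50)]

reference = fir_direct(x, h)
for block_size in (2, 4, 8, 16):
    result = fir_block(x, h, block_size)
    same = all(abs(a - b) < 1e-9 for a, b in zip(result, reference))
    print(f"L = {block_size:2d}: matches direct form: {same}")
```

The block size changes only how many outputs are produced per iteration, not the filter response, which is why resource usage and path delay are the quantities compared across block sizes.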

## Funding

The authors declare that no funding was received for this research work from any source.

## Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

## Conflict of Interest

The authors declare that there is no conflict of interest regarding the research work presented. No data sets were generated or created in the process.

## Acknowledgement

I would like to express my gratitude to my supervisor, Vijaya C, who guided me throughout this project. I wish to acknowledge the suggestions provided by the doctoral committee. I extend my gratitude to the principal and management of SDM College of Engineering and Technology, Dharwad for the academic environment provided.

## References

- Kunjan D. Shinde, and C. Vijaya. "Bottlenecks in Finite Impulse Response Filter Architectures on a Reconfigurable Platform." Recent Advances in Artificial Intelligence and Data Engineering. Springer, Singapore, 2022. 309-325. DOI: 10.1007/978-981-16-3342-3\_26.
- Kunjan D. Shinde and Dr. Vijaya C, "A High Speed and Area Efficient Parallel FIR Filter Architecture for Time Critical Applications", Microprocessor and Microsystem Journal, Elsevier, unpublished, May 2022.

- Keshab K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley student edition, 2013, ISBN: 978-81-265-1098-6, pp. 1, 5-6, 31-40, 63-83.
- S. Janwadkar and R. Dhavse, "Strategic Reduction of Area and Power in FIR Filter Architecture for ECG Signal Acquisition," 2020 IEEE 17th India Council International Conference (INDICON), New Delhi, India, 2020, pp. 1-7, doi: 10.1109/INDICON49873.2020.9342386.
- Mittal, A. Nandi, and D. Yadav, "Comparative Study of 16-order FIR Filter Design using Different Multiplication Techniques," IET Circ., Device Syst., vol. 11, no. 3, pp. 196-200, 2017
- M. Sumalatha, P. Naganjaneyulu, and K. S. Prasad, "Low Power and Low Area VLSI Implementation of Vedic Design FIR filter for ECG Signal Denoising," Microprocess. Microsyst., vol. 71, p. 102883, 2019.
- N. S. Rai, B. S. Pannagashree, Y. P. Meghana, A. P. Chavan, and H. V. R. Aradhya, "Design and implementation of 16 tap fir filter for dsp applications," in 2018 Second Int. Conf. Advances in Electronics Computers and Communications (ICAECC), 2018, pp. 1-5.
- Mittal and A. Nandi, "Design of 16-bit FIR Filter using Vedic Multiplier with Carry Save Adder," in 44th IRF Int. Conf., 2015, pp.57-60.
- R. Popa, "ECG Signal Filtering in FPGA," 2019 6th International Symposium on Electrical and Electronics Engineering (ISEEE), 2019, pp. 1-6, doi: 10.1109/ISEEE48094.2019.9136119.
- Rahul Sharma, Rajesh Mehra, Chandni, "FPGA based Asynchronous FIR Filter Design for ECG Signal Processing", International Journal of Computer Applications (0975 – 8887) Volume 156 – No 7, December 2016.
- Asha, K.A., Shinde, K.D. (2016). Performance Analysis and Implementation of Array Multiplier using various Full Adder Designs for DSP Applications: A VLSI Based Approach. In: Corchado Rodriguez, J., Mitra, S., Thampi, S., El-Alfy, ES. (eds) Intelligent Systems Technologies and Applications 2016. ISTA 2016. Advances in Intelligent Systems and Computing, vol 530. Springer, Cham. https://doi.org/10.1007/978-3-319-47952-1\_59
- Xilinx FPGA details and datasheet accessed on 11-April-2022: ARTIX 7 family https://www.xilinx.com/products/silicon-devices/fpga/artix-7.html
- Kunjan D. Shinde and Dr. Vijaya C, "Architecturally Enhanced Parallel FIR Filter Architecture for Time Critical Applications: An Impact Analysis for Audio Signal Processing on Reconfigurable Platform", AEUE - International Journal of Electronics & Communication, Elsevier, unpublished, Dec 2022.
- Vijaya C and Uday kumar., "Digital Signal Processing", Elite printers & publishers, ISBN 1234567156882.
- P. Kligfield et al., "Recommendations for the Standardization and Interpretation of the Electrocardiogram," J. Am. Coll. Cardiol., vol. 49, no. 10, pp. 1109-1127, Mar. 2007.
- Rangaraj M. Rangayyan, Biomedical Signal Analysis: A Case-Study Approach. Wiley India Edition, pp. 73-175, Jan. 2010.

- Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000 Jun 13;101(23):E215-20. doi: 10.1161/01.cir.101.23.e215. PMID: 10851218.
- https://www.physionet.org/physiobank/database/html/mitdbdir/records.htm#100 Accessed on 13/09/2022
- A. AlJuffri, A. S. Badawi, M. S. BenSaleh, A. M. Obeid and S. M. Qasim, "FPGA implementation of scalable microprogrammed FIR filter architectures using Wallace tree and Vedic multipliers," 2015 Third International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE), Beirut, Lebanon, 2015, pp. 159-162, doi: 10.1109/TAEECE.2015.7113619.
- Kirti, Harsh Sohal, Shruti Jain, "FPGA implementation of collateral and sequence preprocessing modules for low power ECG denoising module", Informatics in Medicine Unlocked, Volume 28, 2022, 100838, ISSN 2352-9148, https://doi.org/10.1016/j.imu.2021.100838.
- Prashar, N., M. Sood, and S. Jain. "Design and performance analysis of cascade digital filter for ECG signal processing." International Journal of Innovative Technology and Exploring Engineering (IJITEE) 8.8 (2019): 2659-2665.
- Y. -C. Tsao and K. Choi, "Area-Efficient Parallel FIR Digital Filter Structures for Symmetric Convolutions Based on Fast FIR Algorithm," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 2, pp. 366-371, Feb. 2012, doi: 10.1109/TVLSI.2010.2095892.