

Volume XXVII 2024 ISSUE no.1 MBNA Publishing House Constanta 2024



SBNA PAPER • OPEN ACCESS

# An energy-efficient gated Johnson counter using redundant switching technique

To cite this article: Denis Sami, Deyan Levski and Aneliya Manukova, Scientific Bulletin of Naval Academy, Vol. XXVII 2024, pg. 145-149.

Submitted: 26.04.2024 Revised: 20.06.2024 Accepted: 07.08.2024

Available online at www.anmb.ro

ISSN: 2392-8956; ISSN-L: 1454-864X


### Denis Sami, Deyan Levski, Aneliya Manukova

Photolitics OOD, ul. Aleksandrovska 55, 7000 Ruse, Bulgaria, Department of Electronics, University of Ruse, Bulgaria

Abstract. This paper proposes an approach to increase the energy efficiency of gated Johnson counters by eliminating the redundant switching consumption of the unused counter-chain cells. Two techniques that aim to achieve this goal are analysed, and the analysis shows that efficiency improves almost exponentially as the counter resolution increases, up to a limiting asymptote. To demonstrate the effectiveness of the proposed approach, a 5-bit Johnson counter architecture with 16 stages is designed. The design is extensively analysed and simulated up to a 1 GHz clock frequency and implemented in a single-channel SS-ADC as part of a CIS realized in a 0.11 µm 1P4M BSI CMOS process. The obtained results show that the proposed approach trades area for energy efficiency and is practically suitable only for counters with resolution $\leq 8$ bits.

## 1. Introduction

As the size of systems-on-chip continues to shrink and system speeds increase, low-power circuit design has become a critical concern in VLSI design. This is especially true for synchronous sequential logic, which is widely used in various systems, including image sensors, and in areas where the reliability of microelectronics is of paramount importance [1]. Among the conventional building blocks of sequential logic, counters are a core component, and their power efficiency has a large impact on system performance [2], [3].

As electronic devices take a growing share of the consumer basket, the shift from standalone to integrated electronics demands more knowledge and experience in digital microelectronics design, and in sequential logic in particular. To meet the labor-market needs of the microelectronics industry, in which Johnson counters are heavily represented, these counters are purposefully included and studied in the university curriculum [4], [5].

Conventional counter architectures are typically designed using either binary or thermometer output coding, in a synchronous or asynchronous switching domain. Most efforts in clock power management focus on reducing voltage swings, optimal buffering, and clock distribution and routing [6]. Often, however, the inefficiency comes from unnecessary gate activity, which leads to high power dissipation; for this reason clock gating techniques have been extensively analyzed by researchers [7], [8].

The aim of this paper is to propose a low-power technique for a Johnson counter that reduces the size of the whole counter chain and increases consumption efficiency without speed degradation. The studied system has lower power dissipation and a simpler design than the conventional one. The efficiency of the system is verified by designing and comparing various counter architectures, followed by additional power analysis with a SPICE engine. The counter is further implemented in a single-channel single-slope ADC architecture that forms part of a final CIS imager product.

## 2. Architecture overview

The proposed Johnson counter architecture, also known as a walking-ring counter, is shown in Fig. 1. Its output follows a thermometer-code structure. The counter implementation consists of a chain of serially connected modified D flip-flop (DFF) cells.
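The walking-ring behaviour can be illustrated with a minimal behavioural sketch. It models an ideal, ungated ring only; the function names `johnson_states` and `toggled_cells` are hypothetical and do not come from the paper. It shows that the ring passes through twice as many states as it has cells, and that exactly one cell changes value per clock.

```python
def johnson_states(stages):
    """Return the 2*stages states of an ideal Johnson (walking-ring) counter.

    The ring starts cleared (all zeroes). On every clock each cell takes the
    value of its predecessor, and the first cell takes the inverted output of
    the last cell.
    """
    q = [0] * stages
    states = []
    for _ in range(2 * stages):
        states.append(tuple(q))
        q = [1 - q[-1]] + q[:-1]   # shift right, feed back the inverted last bit
    return states


def toggled_cells(before, after):
    """Count how many cells changed value between two consecutive states."""
    return sum(a != b for a, b in zip(before, after))
```

For `stages = 4` the sequence is 0000, 1000, 1100, 1110, 1111, 0111, 0011, 0001, after which it repeats. Since only one cell toggles per transition, every other cell that still receives a clock edge is switching redundantly, which is the activity the gating logic targets.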

Fig. 2 a) shows a single cell of the Johnson counter. It is composed of a master-slave DFF based on a tri-state structure, with extra combinational logic in the periphery. The modified DFF cell includes OR, NAND and MUX gates, which together form this additional combinational logic. The added logic implements a gating technique that generates a local clock pulse for each modified cell, which in turn eliminates redundant switching in the counter chain. With this technique the consumption efficiency increases with resolution, making the counter consumption approximately fixed and not strongly resolution-dependent.

To make the clock gating feature compatible with double data rate (DDR) counting, the periphery logic is designed so that two clock paths lead to the clock input of the conventional DFF. The enable of each clock path is self-configured and depends on the input data: each periphery cell receives feedback from the inverted output of the DFF, the input data (from the previous cell), and the state of the main (master) clock. The inverter shown in the block diagram is not a physical component but represents the logical inversion of the clock that is fed to a NAND input. The inversion is applied to only one input of the NAND path in order to maintain the same clock phase (polarity) on both paths, so that no glitches occur when switching between them.



The timing diagram in Fig. 2 b) illustrates the operating principle of a single gated counter cell. The individual cell is initialized by toggling the *CLEAR* signal to a low state, so that the outputs are $Q = 0$ and $\overline{Q} = 1$. The initial state is followed by switching activity that fills the counter with 'ones', where the local clock is generated by the NAND gate, denoted *Y*. The switching activity continues until the counter overflows, which toggles the last cell in the DFF chain; the output of this cell, which is shorted to the first cell input, realizes DDR counting. The subsequent counting activity feeds 'zeroes' into the chain, with the local clock now gated by the OR gate, denoted *X*, instead of the NAND gate. The generated gCLK is fed to the clock input of the DFF through the MUX, whose control signal is self-triggered by *Q* of the last chain cell. The operation outlined here is analyzed in detail in the following section in order to verify the efficiency of the architecture.
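The gating principle described in this section can be sketched as a small behavioural model. The exact gate wiring is not given in the text, so the model assumes that the NAND path computes Y = NAND(D, not Q, not CLK), that the OR path computes X = OR(D, not Q, CLK), and that the MUX selects X when *Q* of the last cell is high. The function names `gated_clock` and `simulate` are hypothetical. A cell is treated as clocked when its local clock is low while the master clock is low and high once the master clock rises.

```python
def gated_clock(d, q, clk, sel):
    """Local clock gCLK of one cell for a given master clock level.

    Assumed wiring (an interpretation, not confirmed by the source):
      Y = NAND(D, not Q, not CLK)  -> selected while 'ones' are fed (sel = 0)
      X = OR(D, not Q, CLK)        -> selected while 'zeroes' are fed (sel = 1)
    """
    q_bar = 1 - q
    y = 1 - (d & q_bar & (1 - clk))
    x = d | q_bar | clk
    return x if sel else y


def simulate(stages, cycles):
    """Return, per master clock cycle, the clocked cells and the new chain state."""
    q = [0] * stages                  # state after CLEAR
    trace = []
    for _ in range(cycles):
        sel = q[-1]                   # MUX control: Q of the last chain cell
        d = [1 - q[-1]] + q[:-1]      # ring feedback into the first cell
        clocked = [
            i for i in range(stages)
            if gated_clock(d[i], q[i], 0, sel) == 0    # gCLK low while CLK is low
            and gated_clock(d[i], q[i], 1, sel) == 1   # gCLK high once CLK rises
        ]
        for i in clocked:
            q[i] = d[i]               # only clocked cells capture their input
        trace.append((clocked, tuple(q)))
    return trace
```

Calling `simulate(4, 8)` walks the chain through one full fill-with-ones and fill-with-zeroes period. Under the stated assumptions only the cell whose input differs from its stored value sees a local clock edge, and both paths idle high while the master clock is high, which is consistent with the glitch-free switching between paths described above.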

## 3. Energy Analysis

To analyze the system effectively, the analysis is split into two steps: first the DFF shown in Fig. 3 is analyzed as an individual cell, and then the analysis is repeated with the extra combinational logic included. The DFF used in the proposed architecture is composed of tri-state inverters, conventional CMOS inverters, a NAND gate, and complementary switches. The consumption analysis of the DFF is separated into two main aspects:

• Static activity of the DFF - this occurs when the previous and present DFF states are the same. The power consumption is determined only by the master sub-latch, which contains two face-to-face tri-state inverters and a conventional CMOS inverter. When the input data of the DFF is unchanged, the slave keeps its stored value, so no switching activity occurs and hence there is no consumption ($P_{Slave} = 0\,\mathrm{W}$). This can be summarized as:

$$P_{nonSw} = P_{Master} + P_{Slave} = 2P_{tri-state} + P_{inv} \tag{1}$$

where  $P_{nonSw}$ ,  $P_{Master}$ ,  $P_{Slave}$  represents non-switching power consumption that is the sum of master and slave latch consumption.



Fig. 3: Structure of DFF

When the clock gating technique with the extra combinational periphery is included, the redundant switching of cells whose data remains unchanged is eliminated, so that the clock consumption is determined only by the DFF that is dynamically switching.

• Dynamic activity of the DFF - also known as switching activity, this occurs when the present data is opposite to the previous data. This activity causes both DFF halves (master and slave) to consume what is referred to as dynamic (switching or toggling) power. This configuration gives the maximum consumption, which can be summarized as:

$$P_{Sw} = P_{Master} + P_{Slave} = 2P_{tri-state} + 2P_{inv} + P_{NAND} \tag{2}$$

By the nature of the architecture, a free-running Johnson counter mostly has only a single cell that is dynamically switching. This helps to keep the power consumption independent of resolution. This statement holds up to 8-bit resolution. As the chain grows, the number of cells increases drastically, which requires large drivers to drive the global signals to the counter cells, and most of the consumption then comes from these drivers.
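One way to read Eqs. (1) and (2) at chain level is sketched below. It assumes that, per clock, one cell toggles and all remaining cells are either clocked while holding their data (ungated chain) or receive no clock at all (ideally gated chain). The per-gate values are placeholders in arbitrary units, not figures from the paper; the class and function names are hypothetical, and the overhead of the gating logic and of the clock drivers is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatePower:
    """Per-gate power in arbitrary units (hypothetical values, not from the paper)."""
    tri_state: float
    inv: float
    nand: float


def p_non_switching(p: GatePower) -> float:
    # Eq. (1): only the master latch consumes; the slave contributes zero.
    return 2 * p.tri_state + p.inv


def p_switching(p: GatePower) -> float:
    # Eq. (2): master and slave both toggle.
    return 2 * p.tri_state + 2 * p.inv + p.nand


def chain_power_ungated(p: GatePower, cells: int) -> float:
    # Assumption: one cell toggles, the other cells are clocked but hold their data.
    return p_switching(p) + (cells - 1) * p_non_switching(p)


def chain_power_gated(p: GatePower) -> float:
    # Assumption: ideal gating, so idle cells receive no clock and add nothing.
    return p_switching(p)
```

With every gate set to one unit, a 16-cell ungated chain evaluates to 50 units against 5 units for the ideally gated chain. The ungated figure grows linearly with the number of cells while the gated one does not, which is the resolution independence the analysis argues for.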

Further consumption analysis that includes the extra combinational logic is needed in order to determine the overall consumption of a single cell, which is equal to the overall consumption of the counter system. The consumption of an individually switching cell is determined by the digital components along the clock path, which in turn depends on whether 'ones' or 'zeroes' are being fed in, as follows:

• During the initial state of operation, all cells in the chain are in reset, so the outputs of the DFFs are $Q = 0$ and $\overline{Q} = 1$. The operation of the counter starts by feeding 'ones', with the clock path running through the NAND gate, so that the switching consumption is determined by:

$$P_{ones} = P_{NAND} + P_{DFF} \tag{3}$$

where  $P_{ones}$ ,  $P_{NAND}$  and  $P_{DFF}$  are respectively consumption of the NAND and DFF. That operation continues until all cells are overflowed by 'ones' so that second stage of operation occur.

• The second stage of operation starts by feeding 'zeroes', with the clock path running through the OR gate, so that the switching consumption is determined by:

$$P_{zeroes} = P_{OR} + P_{DFF} \tag{4}$$

where  $P_{zeroes}$ ,  $P_{OR}$  and  $P_{DFF}$  are respectively consumption of the OR and DFF. That operation stage continues until all cells are under-flowed by 'zeroes' so that counter starts from beginning.

To sum up, the overall power consumption $P_{prop}$ is equal to:

$$P_{prop} = n(P_{ones} + P_{zeroes}) \tag{5}$$

where $n$ is the counter resolution.

To further reduce the power consumption, the use of a DDR technique is a must: it halves the number of cells in the chain, from $2^{n}$ to $2^{n-1}$, which also leads to a compact layout design.
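As a worked instance of this halving, the two cell counts are read here as $2^{n}$ and $2^{n-1}$, an interpretation that matches the 16 stages quoted for the 5-bit design in the abstract. $N_{SDR}$ and $N_{DDR}$ are hypothetical symbols for the chain length without and with DDR counting:

$$N_{SDR} = 2^{n} = 2^{5} = 32, \qquad N_{DDR} = 2^{n-1} = 2^{4} = 16 \qquad (n = 5)$$

Under this reading each additional bit of resolution doubles the chain length, so the saving from DDR counting is a constant factor of two rather than a change in how the chain scales.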

## 4. Experimental results

To assess the efficiency of the proposed technique, it is compared with a conventional ripple-carry (asynchronous) counter, a synchronous binary counter, and a conventional synchronous Johnson counter. The timing diagram in Fig. 4 shows the current magnitude during the transition between "01111" and "10000" for all examined counter types. It is clearly evident that the synchronous binary counter performs worst, while the clock-gated Johnson counter offers the best power efficiency.

Fig. 4: a) Ripple-carry counter; b) Synchronous binary counter; c) Conventional Johnson counter; d) Proposed counter

Due to the gating method, the consumption efficiency increases with the counter resolution. To verify this statement, 2-, 4-, 6-, 8-, and 10-bit counter architectures were designed and simulated. The results are shown in the table in Fig. 5 a) and in the consumption comparison graph in Fig. 5 b). The graphs as a function of resolution indicate that power consumption with the clock gating method stays relatively constant as resolution increases. All results were obtained from simulations using the Spectre tool with a supply voltage of 1.5 V, a typical 1P4M 110 nm CMOS process with typical process parameters, at room temperature, and a capacitive load of 50 fF at the output count nodes.

The proposed 5-bit clock-gated double data rate counter is physically designed with overall dimensions of $40\,\mu m \times 55\,\mu m$. The compact single cell measures $18.5\,\mu m \times 4\,\mu m$.

## 5. Conclusion

An important conclusion to draw from the presented analysis is that the clock gating method applied to Johnson counting, although it greatly improves power consumption, comes at a significant cost in silicon area compared to binary counting. This is probably the largest trade-off of the technique, and it limits its use in practical applications that require counter resolutions greater than 8 bits.

Regardless of the above, this work proposed a CMOS synchronous counting method that reduces redundant switching power consumption during counting, resulting in improved energy efficiency. The conducted simulations and system analysis, as well as the comparison between four different 5-bit architectures at 1 GHz count speed, verify the design.

The whole system can be a standalone counter, or it can be part of a larger system such as a time-to-digital or analog-to-digital converter, a sequencer, or even a CPU. The proposed counter architecture is implemented as part of a single-slope analog-to-digital converter.

## 6. Acknowledgement

The work was supported by Photolitics. Their cooperation is hereby gratefully acknowledged. The article presents results from Project 2024–EEA-02.

## References

[1] Neil H. E. Weste and David Money Harris. CMOS VLSI Design: A Circuits and Systems Perspective (4th Edition). Addison Wesley, 2010.

[2] K. Roy and S.C. Prasad. Circuit activity based logic synthesis for low power reliable operations. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1(4):503–513, 1993.

[3] L. Benini and G. de Micheli. State assignment for low power dissipation. In Proceedings of IEEE Custom Integrated Circuits Conference - CICC '94, pages 136–139, 1994.

[4] Adriana Borodzhieva, Ivanka Tsvetkova, and Dimitar Dimitrov. Active learning for teaching "synthesis and analysis of counters" in the course "digital electronics". In 2021 17th Conference on Electrical Machines, Drives and Power Systems (ELMA), pages 1–6, 2021.

[5] Iordan Ivanov Stoev, Adriana Naydenova Borodzhieva, and Valentin Angelov Mutkov. FPGA implementation of Johnson counters applied in the educational process. In 2018 IEEE 24th International Symposium for Design and Technology in Electronic Packaging (SIITME), pages 99–103, 2018.

[6] E.G. Friedman. Clock distribution design in VLSI circuits - an overview. In 1993 IEEE International Symposium on Circuits and Systems, pages 1475–1478 vol.3, 1993.

[7] Qing Wu, M. Pedram, and Xunwei Wu. Clock-gating and its application to low power design of sequential circuits. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 47(3):415–420, 2000.

[8] Massoud Pedram. Power minimization in IC design: Principles and applications. ACM Transactions on Design Automation of Electronic Systems, 1, 03 2003.