Published on April 17th, 2012

Simultaneous Control of In-rush Current and Dynamic IR Drop in MTCMOS Designs Using Hamiltonian-cycle Routing

Multi-threshold CMOS (MTCMOS) is currently the most popular methodology for implementing a power gating design that reduces standby leakage power by turning off inactive circuit domains. However, power-gated domains may suddenly surge to a high current active mode while in the cycle. This article covers, among other topics, why designers should sequentially turn on all on-chip power switches to mitigate peak in-rush current. The article proposes an efficient power switch routing framework, using a feasible Hamiltonian-cycle routing among power switches to minimize in-rush current and dynamic IR drop simultaneously.

I. DESIGN CHALLENGE AND PROBLEM FORMULATION

The traditional TSP Solver is incapable of Power-switch Routing
Several TSP solvers have been developed to find a Hamiltonian path with minimal length. However, current public TSP solvers (such as Concorde, GOBLIN, or LKH) are all performed based on a complete graph where any two nodes are connected to each other and a Hamiltonian path can always be easily found. Due to Manhattan-distance constraint, the connection graph of power-switch switch routing is not complete and hence the existing TSP solver cannot be directly applied to solve the power-switch switch routing.

In order to solve the power-switch routing with a general TSP solver, we try the following method based on a modified complete graph. First, we assign each edge's weight as the distance between the two switches of the edge if this distance is smaller than the given Manhattan-distance constraint. Then, to further lead the TSP result to satisfy the Manhattan-distance constrains, , we assign an excessively large weight to each edge when the distance between two switches exceeds the constraint. In our experiment, this excessively large weight is set to the summation of the distance between any two switches. Therefore, when a TSP solver tries to minimize the path's length, those constraint-violated edges can be avoided if the TSP algorithm is optimal. Unfortunately, the TSP solvers cannot lead to a Hamiltonian path without going through a constraint-violated edge. We will show the experimental results later.

A. Problem Formulation of the Proposed Framework
Power-switch routing is performed after the power-switch allocation is completed. The physical location of each power switch and each hard macro can be determined from the design data base through the Tcl interface of an APR tool. The proposed framework will try to find a Hamiltonian-cycle route covering all power switches without violating the Manhattan-distance constraint. We also try to avoid the connections that travel across a hard macro since the routing resource on top of the hard macro is limited and its resulting wire length may be longer than expected although its Manhattan distance is under the constraint. The complete problem formulation of the proposed switch-routing framework is summarized as follows.

Fig. 1. (a) Cyclic and (b) acyclic power-switch tour.

II. THE PROPOSED FRAMEWORK

Cyclic and Acyclic Power-Switch Tour
We classify the power-switch tours into cyclic tours and acyclic tours as shown in Figure 1. A cyclic power-switch tour is a sequence of power switches that are visited in order not to violate Manhattan distance constraint and ends at its starting switch. On the other hand, an acyclic power-switch tour ends at a power switch other than its starting switch. In our proposed framework, two cyclic tours can merge into a larger cyclic tour, while an acyclic tour cannot be merged with another cyclic tour.

Fig. 2. The overall algorithm of the proposed switch-routing framework

A. Overall Flow
Basically, our switch-routing framework applies a divide-and-conquer algorithm to recursively merge smaller cyclic sub-tours of power switches into a larger sub-tour until forming one cyclic tour visiting all power switches. Therefore, the most critical task is to turn the given power switches into several non-overlapping cyclic sub-tours, that can then be merged into one. Figure 2 illustrates the overall flow of the proposed switch-routing framework.

Fig. 3. (a) Around-macro (b) column-based group.

B. Group Power Switches
Two types of power-switch groups, around-macro groups and column-based groups as shown in Figure 3, are used in the proposed algorithm. Each hard macro has its own around-macro group. The power switches placed around a hard macro is assigned to its around-macro group. Note that one power switch may be assigned to two around-macro groups simultaneously since it may be located in between two hard macros. Next, the rest of the power switches are divided into different column-based groups dependent upon its column index in the position matrix. The power switches in a column-based group will not overlap with the power switches in any other group.

Fig. 4. An example of splitting two overlapping groups.

C. Split Overlapping Switch Groups
The power switches in two around-macro groups may overlap. Such a scenario happens when one row (or column) of switches are embedded in between two hard macros. The objective of this step is to separate those overlapped switches into the two groups by interweaving the switches along the X-axis (or Y-axis). Figure 4 illustrates an example of splitting two overlapping switch groups into two non-overlapping ones.

Fig. 5. Touring switches in an around-macro group.

Tour Switch inside a Group
Tour Around-Macro Groups
We first try to tour all the switches in an around-macro group clockwise starting from the bottom left as shown in Figure 5. If no two adjacent switches exceed the Manhattan-distance constraint, we can form a cyclic sub-tour. Otherwise, the first sub-tour ends at one end of the violating edge, and we start the next sub-tour at the other end of the violating edge. Note that these two sub-tours are both acyclic. We repeat the above process until all switches are visited.

Fig. 6. Touring switches of combined column-based groups.

E.2 Tour Column-based Groups
When touring the column-based groups, we first combine as many adjacent column-based groups as possible and form an acyclic sub-tour as shown in Figure 6(a). If one end switch of a column, s1, cannot be directly connected to the end switch of its adjacent column, s2, we will first connect the s1 to a relay switch, sr between s1 and s2, and then connect sr to s2, as highlighted by the dash line in Figure 6(a). We will repeat this process until the relay switch can be connected to s2. Next, we will create a path starting from the end switch of the acyclic sub-tour back to the start switch while removing any corresponding connection of the original acyclic sub-tour. An example of creating such a cyclic sub-tour is illustrated in Figure 6(b). Lastly, for each remaining dangling column-based groups, we tour the switches from bottom to top to form an acyclic sub-tour.

Fig. 7. Using interlacing to transform an acyclic (a) column-based sub-tour or (b) around-macro sub-tour into a cyclic sub-tour

Transform Acyclic Sub-tour into Cyclic Ones
For each acyclic sub-tour, we interlace the order of its switches with a cyclic one. Figure 7(a) and (b) illustrate how to interlace the switch order to transform an acyclic column-based sub-tour and around-macro sub-tour into a cyclic one.

Fig. 8. The algorithm of unitization.

D. Unitization
Figure 8 illustrates the unitization algorithm. First, we schedule the merging list for around-macro sub-tours and column-based sub-tours, respectively, based on the locations of sub-tours' starting point (from lower left to upper right by default). Then we append the merging list of the column-based macros to the end of the merging list of the around-macro sub-tours. Next, we iteratively use the first available sub-tour in the merging list as the current discard sub-tour. We then attempt to identify a pair of adjacent switches, called the connecting switches, inside the discard sub-tour which can satisfy the following two conditions: (1) there exist a pair of connecting switches in another nearby sub-tour, denoted as the augmented sub-tour where either connecting switch in the discard sub-tour can correspond to a connecting switch in the augmented sub-tour within the search range, and (2) the Manhattan distance between the corresponding connecting switches in the two sub-tours is minimum. Once a pair of feasible connecting switches can be found in the discard sub-tour, we can merge it into the corresponding augmented sub-tour by reconnecting the connecting switches as shown in Figure 9. Finally, we remove the discard sub-tour from the merging list and repeat the above process for the next available sub-tour in the merging list until all the sub-tours are tried.

If more than one sub tour is left in the merging list after this iteration, we will increase the searching range and repeat the above iteration until only one sub-tour is left. The remaining sub-tour is the final Hamiltonian-cycle routing for the power switches.

Fig. 9. An example of a merging operation.

E. Merging Operation
Figure 9 illustrates how a merging operation in the proposed algorithm can merge the discard sub-tour into the augmented sub-tour. Note that the modified augmented sub-tour is still cyclic after the merging operation.

Fig. 10. Power-up sequence can be formulated as a 2-D weighting table.

Power Switch Reordering for Dynamic IR Prevention
Different power up sequence may result in different dynamic IR drops. The power-up sequence problem can be formulated as a function subject to the locations of voltage source, sleep enable signal, power-gated domain and the related active domains (denoted as the IR degrading, as shown in Fig. 10). Power switches are clustered, weighed and assembled with a heuristic to prevent the dynamic IR (during placement stage). A rule of thumb is to place the sleep enable signal of a power-gated domain near the voltage sources and far away from the active domains.

III. Experimental Results

The experiments in this section were conducted based on three power-gated domains in a 4G-application MTCMOS design which were implemented using a TSMC 40nm low-power cell library. The Manhattan-distance constraint is set to 80μm. First, Table 1 shows the experimental result after applying the proposed switch-routing framework. In Table 1, Column 5 and 6 list the wire length of the resulting switch routing and the total wire length of the domain reported by Encounter. The result in Table 1 shows that the proposed switch-routing framework can always generate a feasible Hamiltonian-cycle routing without violating the Manhattan-distance constraint for all three cases. Note that the wire length of the switch routing is at most 0.8% of the total wire length in a domain, showing that minimizing the wire length of switch routing is indeed a secondary issue when we designing a power-gated domain.

Next, we attempted to solve the switch-routing problem by applying the state-of-the-art (Concorde) TSP solver based on a modified complete graph as described in Section I-B to solve the switch-routing problem and check whether this advanced TSP solver can generate a feasible Hamiltonian-cycle without going through any constraint-violated edge. The TSP solver can always iteratively fine-tune an existing solution to obtain a better solution. However, this TSP solver requires a two-dimensional matrix to store all edges' weight of the complete graph.

Table 2 compares the maximum and total Manhattan-distance of the resulting Hamiltonian-cycle, the number of constraint-violated edges, and the runtime between the proposed framework and the TSP solver. For Case 1 and Case 3, the TSP solver cannot generate any result since building the corresponding data structure to store all edge's weight will run out of memory at a workstation with 32GB main memory and 16 AMD64 cores. For Case 2, the resulting Hamiltonian cycle reported by the TSP solver still contains few edges violating the constraint after running 100 iterations. Its runtime is 7243 seconds, which is 2796 times of the runtime consumed by the proposed framework. Even though we keep on running more iteration for over one day, the three constraint-violated edges still cannot be removed. This result demonstrates that a feasible Hamiltonian-cycle routing can be efficiently and effectively obtained by applying the proposed switch-routing framework, and cannot be easily obtained by applying a general TSP solver.

A. Problem Formulation of the Proposed Framework

Power-switch routing is performed after the power-switch allocation is completed. The physical location of each power switch and each hard macro can be determined from the design data base through the Tcl interface of an APR tool. The proposed framework will try to find a Hamiltonian-cycle route covering all power switches without violating the Manhattan-distance constraint. We also try to avoid the connections that travel across a hard macro since the routing resource on top of the hard macro is limited and its resulting wire length may be longer than expected although its Manhattan distance is under the constraint. The complete problem formulation of the proposed switch-routing framework is summarized as follows.

III. Experimental Results

The experiments in this section were conducted based on three power-gated domains in a 4G-application MTCMOS design which were implemented using a TSMC 40nm low-power cell library. The Manhattan-distance constraint is set to 80μm. First, Table 1 shows the experimental result after applying the proposed switch-routing framework. In Table 1, Column 5 and 6 list the wire length of the resulting switch routing and the total wire length of the domain reported by Encounter. The result in Table 1 shows that the proposed switch-routing framework can always generate a feasible Hamiltonian-cycle routing without violating the Manhattan-distance constraint for all three cases. Note that the wire length of the switch routing is at most 0.8% of the total wire length in a domain, showing that minimizing the wire length of switch routing is indeed a secondary issue when we designing a power-gated domain.

Next, we attempted to solve the switch-routing problem by applying the state-of-the-art (Concorde) TSP solver based on a modified complete graph as described in Section I-B to solve the switch-routing problem and check whether this advanced TSP solver can generate a feasible Hamiltonian-cycle without going through any constraint-violated edge. The TSP solver can always iteratively fine-tune an existing solution to obtain a better solution. However, this TSP solver requires a two-dimensional matrix to store all edges' weight of the complete graph.

Table 2 compares the maximum and total Manhattan-distance of the resulting Hamiltonian-cycle, the number of constraint-violated edges, and the runtime between the proposed framework and the TSP solver. For Case 1 and Case 3, the TSP solver cannot generate any result since building the corresponding data structure to store all edge's weight will run out of memory at a workstation with 32GB main memory and 16 AMD64 cores. For Case 2, the resulting Hamiltonian cycle reported by the TSP solver still contains few edges violating the constraint after running 100 iterations. Its runtime is 7243 seconds, which is 2796 times of the runtime consumed by the proposed framework. Even though we keep on running more iteration for over one day, the three constraint-violated edges still cannot be removed. This result demonstrates that a feasible Hamiltonian-cycle routing can be efficiently and effectively obtained by applying the proposed switch-routing framework, and cannot be easily obtained by applying a general TSP solver.

Shi-Hao Chen received the B.S. and M.S. degrees in electronic engineering from Chung Yuan Christian University, Chung-Li, Taiwan, R.O.C., in 1996 and 1998, respectively. He is currently pursuing the Ph.D. degree in computer science at National Tsing Hua University, Hsinchu, Taiwan, R.O.C.

He is currently a Deputy Director of Design Service Division, Global Unichip Corporation (GUC), The Flexible ASIC LeaderTM, Hsinchu, Taiwan. His research interests include design flow automation, physical design, and low power design methodologies.

Yi-Ming Wang received the B.S. degree in electronics engineering from National Chiao-Tung University, Hsinchu, Taiwan, R.O.C., in 2010.

He is currently pursuing the M.S. degree in electronics engineering from National Chiao-Tung University, Hsinchu, Taiwan, R.O.C. His research interests include power-gating methodology, power-switch routing, physical design automation, and digital testing.



Tech Videos

©2018 Extension Media. All Rights Reserved. PRIVACY POLICY | TERMS AND CONDITIONS

Extension Media websites place cookies on your device to give you the best user experience. By using our websites, you agree to placement of these cookies and to our Privacy Policy. Please click here to accept.