Silicon Valley Engineer

JS-Blogger-Client: inserted post

This is the body of the blog post. I can include HTML tags.

Labels: Label1, Label2

Reading Process information from FABs

DEVICE CORNERS		SSS	TTT	FFF	TFS	TSF
POLY	STD - 4nm			X
POLY	STD		X		X	X
POLY	STD + 4nm	X
NFET	15%			X
NFET	7.5%				X
NFET	ISAT(STD)		X
NFET	-7.5%					X
NFET	-15%	X
PFET	15%			X
PFET	7.5%					X
PFET	ISAT(STD)		X
PFET	-7.5%				X
PFET	-15%	X

Reading the process table from a fab is simple. I learned this a few days ago from a friend. Here is the trick.

The device corners specified in the table above are SSS, TTT, FFF, TFS and TSF. Take the TFS device corner as an example. In this case T means typical poly, F means fast NFET, and S means slow PFET.

Let's look at only the POLY Section of the above table.

DEVICE CORNERS		SSS	TTT	FFF	TFS	TSF
POLY	STD - 4nm			X
POLY	STD		X		X	X
POLY	STD + 4nm	X

This indicates that there are three possible channel lengths: STD, 4nm shorter than STD, and 4nm longer than STD.

A longer channel reduces leakage but gives worse performance, whereas a shorter channel gives better performance but more leakage.

Therefore, for the SSS process the first S refers to the increased poly length, i.e. STD + 4nm. For TFS the channel length is typical, so it has a cross in the STD row.

DEVICE CORNERS		SSS	TTT	FFF	TFS	TSF
NFET	15%			X
NFET	7.5%				X
NFET	ISAT(STD)		X
NFET	-7.5%					X
NFET	-15%	X

Similarly, if we look at the NFET portion of the table, it is clear that for the SSS process the NFETs have 15% less ISAT (saturation current) than the typical NFET. The important thing to note here is that for the TSF process the NFETs sit at -7.5%, so they are faster than the SSS NFETs by 7.5% of the typical ISAT.
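
To make the reading rule concrete, here is a small Python sketch of the table as a lookup. The numbers are transcribed from the table above; the names CORNER_TABLE, decode_corner and isat_gap are my own and only for illustration.

```python
# Corner data transcribed from the table above.
# POLY entries are channel-length offsets in nm relative to STD;
# NFET/PFET entries are ISAT offsets in percent relative to ISAT(STD).
CORNER_TABLE = {
    "SSS": {"POLY": +4, "NFET": -15.0, "PFET": -15.0},
    "TTT": {"POLY": 0, "NFET": 0.0, "PFET": 0.0},
    "FFF": {"POLY": -4, "NFET": +15.0, "PFET": +15.0},
    "TFS": {"POLY": 0, "NFET": +7.5, "PFET": -7.5},
    "TSF": {"POLY": 0, "NFET": -7.5, "PFET": +7.5},
}

SPEED = {"S": "slow", "T": "typical", "F": "fast"}


def decode_corner(name):
    """Split a three-letter corner name into POLY, NFET and PFET speeds."""
    poly, nfet, pfet = name
    return {"POLY": SPEED[poly], "NFET": SPEED[nfet], "PFET": SPEED[pfet]}


def isat_gap(device, corner_a, corner_b):
    """ISAT of corner_a minus ISAT of corner_b, in percent of ISAT(STD)."""
    return CORNER_TABLE[corner_a][device] - CORNER_TABLE[corner_b][device]


print(decode_corner("TFS"))
print(isat_gap("NFET", "TSF", "SSS"))
```

Here decode_corner("TFS") gives typical poly, fast NFET and slow PFET, and the NFET gap between TSF and SSS comes out as 7.5.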

Details of the CPF Implementation

The CPF specification describes a method called PSO (power shut-off), which is what we implement when we write the CPF file.
The overall procedure once the RTL is complete is to write the CPF specification file along with the synthesis script. We have to use the Cadence synthesis tool (RTL Compiler, version RC72USR1 or above). In the synthesis script we simply do the following:
1. Read the RTL
2. Read the CPF(read_cpf)
3. Commit the CPF (commit_cpf)
4. Synthesize (synthesize)
5. Reports and analysis
6. Write the Conformal LEC scripts (write_do_lec) to perform the formal verification. This is needed to make sure that the functional intent of the block is still identical after the CPF is applied.

CPF Syntax
In the CPF we perform the following steps to implement what was described previously.
1. Define the characterization sets needed for the three different power domains using the command "define_library_set". Since there are three power domains, I would expect to create at least three library sets using this command.
a. The first set is for the TOP power domain, which runs at the nominal voltage (1.08V) but interacts with Mult4, which runs at 0.81V. Therefore
define_library_set -name set1 -libraries "STANDARD SS + Step Down Level Shifters"
b. The second set is for the LS power domain, which runs at 0.81V and also interacts with TOP, which runs at 1.08V. Therefore
define_library_set -name set2 -libraries "SSLV libs + Step UP Level Shifters"
c. The third set is for the LOW power domain, with the standard SS libraries, the retention flops and the power gating cell library
define_library_set -name set3 -libraries "SS libs + Retention Flop+ Power Gating .libs"

2. Define the power domains, i.e. TOP, LS and LOW, using the command create_power_domain. An example of this command is
create_power_domain -name LOW -instances mult_5 -shutoff_condition "pse"

3. After creating the power domains, we create the different nominal conditions using the CPF command "create_nominal_condition". There are potentially 4 nominal conditions.
a. "high" means 1.08V (create_nominal_condition -name high -voltage 1.08 -library_set set1)
b. "on" means 1.08V (create_nominal_condition -name on -voltage 1.08 -library_set set3)
c. "level_shift" means 0.81V (create_nominal_condition -name level_shift -voltage 0.81 -library_set set2)
d. "off" means 0V (create_nominal_condition -name off -voltage 0.0)

The difference between the "high" and "on" nominal conditions is that "high" uses set1, i.e. the standard slow-corner (SS) library plus the step-down level shifters needed to talk to the LS domain, whereas "on" uses set3, i.e. the SS library plus retention flops and power gates.

4. After defining the power domains and the nominal conditions, we apply the nominal conditions to the power domains using the command create_power_mode. Based on the above nominal conditions we have at least two power modes.
create_power_mode -name shift -domain_conditions {TOP@high LOW@on LS@level_shift}
create_power_mode -name SO -domain_conditions {TOP@high LOW@off LS@level_shift}

5. The last step in the CPF is to define all the rules needed for the proper implementation of level shifters (create_level_shifter_rule), retention flops (create_state_retention_rule), isolation (create_isolation_rule) and power switches (create_power_switch_rule).
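
As a sanity check on steps 3 and 4, the nominal conditions and power modes can be modelled as plain lookups in Python. This is only a sketch of the bookkeeping, not something the Cadence tools run. The voltages, domain names and mode names come from the commands above; the helper functions are my own. A pair reported by level_shifted_pairs only needs level shifters if signals actually cross between those two domains.

```python
# Nominal conditions from step 3: name -> supply voltage in volts.
NOMINAL_CONDITIONS = {"high": 1.08, "on": 1.08, "level_shift": 0.81, "off": 0.0}

# Power modes from step 4: mode -> {power domain: nominal condition}.
POWER_MODES = {
    "shift": {"TOP": "high", "LOW": "on", "LS": "level_shift"},
    "SO": {"TOP": "high", "LOW": "off", "LS": "level_shift"},
}


def domain_voltages(mode):
    """Voltage of every power domain in the given power mode."""
    return {dom: NOMINAL_CONDITIONS[cond] for dom, cond in POWER_MODES[mode].items()}


def shut_off_domains(mode):
    """Domains that are powered down in this mode (candidates for isolation)."""
    return sorted(d for d, v in domain_voltages(mode).items() if v == 0.0)


def level_shifted_pairs(mode):
    """Pairs of powered domains that sit at different voltages in this mode."""
    volts = domain_voltages(mode)
    powered = sorted(d for d, v in volts.items() if v > 0.0)
    return [
        (a, b)
        for i, a in enumerate(powered)
        for b in powered[i + 1:]
        if volts[a] != volts[b]
    ]


for mode in POWER_MODES:
    print(mode, domain_voltages(mode), shut_off_domains(mode), level_shifted_pairs(mode))
```

In the SO mode only LOW is reported as shut off, while LS and TOP still sit at different voltages, which is why the level shifter rules are needed in both modes.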

Around the Block(CPF) Versus In the Rows(UPF) Implementation

Now, in order to implement the mult block defined in the previous post, we have to employ either CPF or UPF. Currently, UPF only supports power gating using the "Power Gates in the Rows" approach and CPF only supports power gating using the "Around the Block" approach.

I will first discuss the flow using CPF, and maybe in the future using UPF as well.

The picture above shows the implementation of this MULT block using CPF (around the block), with the following three power domains.
1. Power domain TOP, which contains everything except MULT4 and MULT5.
2. Power domain LS (MULT4), which runs at 0.81V only (upper right corner).
3. Power domain LOW (MULT5), which can shut down based on the top-level input "pse" (lower left corner).
Notice how MULT4 (power domain LS) has separate power rings around it. This is a separate supply, lower than the nominal 1.08V supply used in the rest of the block.
Also, MULT5 (power domain LOW) has power gates around the block (the CPF implementation).

Implementing A simple 4 bit Multiplier using CPF and UPF

Here is a very simple 4-bit multiplier design in which we have implemented five 4-bit multipliers; one of the multipliers is selected based on the "S_opa" and "S_opb" signals.

Now, we can implement power saving techniques to one or more of the multipliers and see the impact of CPF/UPF during the Synthesis or P&R flow.

Changing Mult4 to run at Lower voltage

In this particular example we first divide the multiplier into two power domains, i.e. TOP and LS (Level Shifter).

The TOP domain is synthesized at SS/1.08V/125C, whereas the LS power domain is synthesized at SS/0.81V/125C.

We can see from the picture that Mult4 (power domain LS) talks to the TOP power domain. This is only possible if we have step-down level shifters on the signals opa and opb going into Mult4 and another set of step-up level shifters on prod4 coming out of Mult4. Since it is a 4-bit multiplier, we need at least

Step up Level Shifter = 8 (prod4 is 8 bits wide)
Step Down Level Shifter = 8 (4 bits each for opa and opb)

Also notice that you need a separate 0.81V power supply for this block (Mult4).

Changing Mult5 to Shutdown (LOW)

In this part of the design we want to shut down Mult5 based on the signal pse (power shutdown enable). This creates another power domain (LOW).
During the shutdown it is also required that the contents of the flops are retained. Therefore we need retention flops.
The shutdown block also requires power gating cells to gate the power based on the signal "pse".

RTL IMPLEMENTATION of the TOP level
// Top level: five 4-bit multipliers. S_opa and S_opb route the operands to one
// multiplier and S_prod selects which product drives the output.
module mult (clk, S_opa, S_opb, S_prod, opa, opb, prod, ice, pse, pge);
// ice, pse and pge are not referenced in this RTL; pse is the shutoff condition used in the CPF
input clk, ice, pse, pge;
input [2:0] S_opa, S_opb, S_prod;
input [3:0] opa, opb;
output [7:0] prod;
// Nets driven by instance outputs must be wires, not regs
wire [3:0] opa_1, opa_2, opa_3, opa_4, opa_5, opb_1, opb_2, opb_3, opb_4, opb_5;
wire [7:0] prod_1, prod_2, prod_3, prod_4, prod_5;

// mult_4 goes into power domain LS and mult_5 into power domain LOW
multiplier mult_1 (.clk(clk), .opa(opa_1), .opb(opb_1), .prod(prod_1));
multiplier mult_2 (.clk(clk), .opa(opa_2), .opb(opb_2), .prod(prod_2));
multiplier mult_3 (.clk(clk), .opa(opa_3), .opb(opb_3), .prod(prod_3));
multiplier mult_4 (.clk(clk), .opa(opa_4), .opb(opb_4), .prod(prod_4));
multiplier mult_5 (.clk(clk), .opa(opa_5), .opb(opb_5), .prod(prod_5));
// Operand routing and product selection
demux15 demux15_1 (.out1(opa_1), .out2(opa_2), .out3(opa_3), .out4(opa_4), .out5(opa_5), .in(opa), .cntrl(S_opa));
demux15 demux15_2 (.out1(opb_1), .out2(opb_2), .out3(opb_3), .out4(opb_4), .out5(opb_5), .in(opb), .cntrl(S_opb));
mux51 mux51 (.out(prod), .in1(prod_1), .in2(prod_2), .in3(prod_3), .in4(prod_4), .in5(prod_5), .cntrl(S_prod));
endmodule

CPF and UPF standards from Cadence and Synopsys

Controlling power dissipation in digital ASICs at shrinking geometries is the key to success for most microprocessor and wireless companies. Whether you are Intel or a small startup in the wireless industry, power is the main concern.
Total power is a function of switching activity, capacitance, voltage and the type of transistor. We can also say that
Total Power = Dynamic Power + Leakage Power
where Dynamic Power = Switching Power + Short Circuit (sc) Power = C f V^2 + V(sc) I(sc) f
and Leakage Power = Subthreshold Current + Gate Induced Drain Leakage (GIDL) + Gate Oxide Leakage + Diode Reverse Bias Current

Dynamic power is generally reduced by reducing switching, which means techniques like DVFS (Dynamic Voltage and Frequency Scaling), power/clock gating or shut-off, whereas leakage is reduced by techniques like back biasing or multi-threshold libraries.
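
To get a feel for why voltage scaling matters, here is a rough Python sketch of the switching term C f V^2 only. The capacitance and frequency are made-up numbers, and short circuit power and leakage are ignored. The two voltages are the 1.08V and 0.81V supplies used elsewhere in these posts.

```python
def switching_power(c_farads, f_hz, v_volts):
    """Switching term of dynamic power, C * f * V^2, in watts."""
    return c_farads * f_hz * v_volts ** 2


# Hypothetical values, chosen only to make the arithmetic concrete.
C_SWITCHED = 1e-9   # switched capacitance in farads
F_CLOCK = 100e6     # clock frequency in hertz

p_nominal = switching_power(C_SWITCHED, F_CLOCK, 1.08)
p_low = switching_power(C_SWITCHED, F_CLOCK, 0.81)
ratio = p_low / p_nominal

print(f"switching power at 0.81V is {ratio:.4f} of the value at 1.08V")
```

Because the term is quadratic in V, the ratio depends only on the voltages: (0.81/1.08)^2 = 0.5625, i.e. roughly 44% less switching power at the same capacitance and frequency.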

CPF (Common Power Format) from Cadence and UPF (Unified Power Format) from Synopsys are two different ways of attacking the problems above. The common theme behind the two formats is to add power constraints to the RTL logic, similar to the timing constraints (SDC) that the EDA vendors added in the late 90s and the first decade of the 21st century. It is now very common to keep the timing constraints separate from the RTL logic in a different file. The generally accepted format of this file is called SDC (Synopsys Design Constraints).

You can get the CPF and UPF standards from the Cadence and Synopsys web sites. These are publicly available standards.

Library vendors also have to increase the number of cells and characterized libraries needed to implement these new standards.

The cells needed in general are retention flops, level shifters, always-on cells and power gating cells.

Characterization Requirement
Normally, the logic libraries supplied by the vendors for the TSMC 65nm LP process contain the following characterized libraries.

Corners	Voltage	Temperature	Comments
SSLT	1.08v	-40C	Temperature Inversion
SS	1.08v	125C	Normal SS Corner
TT	1.2v	25C	Normal TT Corner
FF	1.32v	-40C	Normal FF Corner
FFHTHV	1.32v	125C	Leakage Corner

Now suppose the library vendor introduces another voltage domain, i.e. 0.81V. Then the standard characterized corners required will be

Corners	Voltage	Temperature	Comments
SSLT	1.08v	-40C	Temperature Inversion
SS	1.08v	125C	Normal SS Corner
SSLTLV	0.81v	-40C	Temp Inv Low V
SSLV	0.81v	125C	Low V Corner
TT	1.2v	25C	Normal TT Corner
FF	1.32v	-40C	Normal FF Corner
FFHTHV	1.32v	125C	Leakage Corner

All the cells provided by the library vendor need to be characterized for at least these corners, except for the level shifters. Level shifters normally work between two domains, so we need additional characterization points for them.

Corners	Voltage Range	Temperature	Comments
SSLV	0.81v to 0.81v	125C	SSLV_0.81_to_0.81
	0.81v to 1.08v	125C	SSLV_0.81_to_1.08
SSLTLV	0.81v to 0.81v	-40C	SSLTLV_0.81_to_0.81
	0.81v to 1.08v	-40C	SSLTLV_0.81_to_1.08
SS	1.08v to 0.81v	125C	SS_1.08_to_0.81
	1.08v to 1.08v	125C	SS_1.08_to_1.08
SSLT	1.08v to 0.81v	-40C	SSLT_1.08_to_0.81
	1.08v to 1.08v	-40C	SSLT_1.08_to_1.08
TT	1.2v to 1.2v	25C	TT_1.2_to_1.2
FF	1.32v to 1.32v	-40C	FF_1.32_to_1.32
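
The level shifter characterization points follow a simple pattern: each corner is paired with every output voltage it has to drive. Here is a small Python sketch that lists them, assuming the CORNER_from_to_to naming used in the Comments column. The helper ls_corner_name and the LS_POINTS list are my own and not part of any vendor flow.

```python
def ls_corner_name(corner, v_in, v_out):
    """Build a level shifter corner name such as SS_1.08_to_0.81."""
    return f"{corner}_{v_in:g}_to_{v_out:g}"


# (corner, input voltage, output voltages to characterize), taken from the table above.
LS_POINTS = [
    ("SSLV", 0.81, [0.81, 1.08]),
    ("SSLTLV", 0.81, [0.81, 1.08]),
    ("SS", 1.08, [0.81, 1.08]),
    ("SSLT", 1.08, [0.81, 1.08]),
    ("TT", 1.2, [1.2]),
    ("FF", 1.32, [1.32]),
]

names = [
    ls_corner_name(corner, v_in, v_out)
    for corner, v_in, v_outs in LS_POINTS
    for v_out in v_outs
]

for name in names:
    print(name)
```

This gives ten characterization points for the level shifters, compared with seven corners for the ordinary cells.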

Chap 1. Introduction to benchmarking Logic libraries

I have been involved in ASIC support and benchmarking digital designs for the last 8-10 years. The dilemma of benchmarking is that vendors always try to pick the design best suited to their own requirements, whereas an unbiased end customer, who is looking for a fair evaluation, is trying to establish that the product he is buying will meet his needs.

Evaluation can take various forms, which may include
1. Evaluate the name of the vendor in the industry
2. Evaluate the product and determine the fair price
3. Evaluate the product by the face value or by the marketing collateral.

Looking at all this, I decided to pursue a neutral path for the evaluation of standard cells and memories, which are the main product line of Virage Logic, where I am currently employed.

Here is what is needed to go this route.
1. Download a fairly comprehensive Digital Design from the opencores.org site that can meet the needs of most of the customers.
2. After downloading, the first step is to make sure that the Verilog RTL and the vectors supplied with the design are set up correctly.
3. Setup Virage Std. Cell and Memory compilers for the TSMC 0.18um and also provide hooks for the other vendors as well.
4. Replace all the existing memories with the Virage memories and keep the hooks for the other vendor libraries.
5. Simulate the design using the simulation vectors provided with the design to make sure the memories are inserted correctly
6. Setup Synopsys design compiler and perform the synthesis of the design with Virage Logic library.
7. Setup Place and Route(P&R) tools and then perform the P&R.
8. During this whole exercise the documentation should be very comprehensive, and all the hooks should be correctly provided in the simulation, synthesis and P&R scripts so that anybody can replace them as needed.
9. Create a web site or a blog and upload the complete design with the documentation.

In the coming chapters we will cover the details of the selected design and give a comprehensive explanation of the scripts used during the process.

Interesting web sites

My urdu web site - www.geocities.com/sohailabbas4
Jang News paper - www.jang.com.pk

Introduction

My name is Sohail Syed and in the picture I have my two sons with me.

Sunday, October 31, 2010

JS-Blogger-Client: inserted post

Thursday, June 05, 2008

Reading Process information from FABs

Wednesday, May 21, 2008

Details of the CPF Implementation

Around the Block(CPF) Versus In the Rows(UPF) Implementation

Implementing A simple 4 bit Multiplier using CPF and UPF

CPF and UPF standards from Cadence and Synopsys

Sunday, February 05, 2006

Chap 1. Introduction to benchmarking Logic libraries

Thursday, March 31, 2005

Interesting web sites

Introduction