Wednesday, January 6, 2016

Notes on FPGA Architecting for Speed (Timing)

Reference Book : Advanced FPGA Design by Steve Kilts
This post only records the parts that matter to me; for details, see the reference book. It really is a good book!

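Three main factors determine the speed of an FPGA design: throughput, latency, and timing.
Throughput :  the amount of data processed per second (bits per second)
Latency :  the time between data input and the corresponding processed output (time or clock cycles)
Timing :  the logic delay between sequential elements (clock period or frequency). If a design does not "meet timing", the critical path delay is greater than the target clock period.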


The largest delay between any two sequential elements in the system determines the system's max speed.

Tclk-q is time from clock arrival until data arrives at Q;
Tlogic is propagation delay through logic between flip-flops;
Trouting is routing delay between flip-flops;
Tsetup is minimum time data must arrive at D before the next rising edge of clock (setup time);
Tskew is propagation delay of clock between the launch flip-flop and the capture flip-flop.
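
The five terms above combine into the maximum clock frequency. The relation below is the usual register-to-register timing budget written with those symbols; the numbers in the worked line are made up purely for illustration.

```latex
F_{max} = \frac{1}{T_{clk-q} + T_{logic} + T_{routing} + T_{setup} - T_{skew}}

% Illustrative values only (not from the book):
% T_{clk-q} = 0.5 ns, T_{logic} = 3 ns, T_{routing} = 1 ns, T_{setup} = 0.5 ns, T_{skew} = 0 ns
F_{max} = \frac{1}{(0.5 + 3 + 1 + 0.5 - 0)\ \text{ns}} = \frac{1}{5\ \text{ns}} = 200\ \text{MHz}
```

Shrinking any term in the denominator raises the max speed, which is what each of the methods below aims at (mostly by reducing Tlogic).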

There are five methods for raising the max speed: Add Register Layers, Parallel Structures, Flatten Logic Structures, Register Balancing, and Reorder Paths.


Add Register Layers
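Split the critical path into several shorter paths by inserting registers, but first confirm that the added clock cycles do not violate the design specification or break functionality.
For example, Y <= A* X+B* X1+C* X2; is split into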
prod1 <= A * X;
prod2 <= B * X1;
prod3 <= C * X2;
Y <= prod1 + prod2 + prod3;
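
As a fuller sketch of the split above, the module below wraps the four assignments in a clocked block. The module name, port names and 8-bit widths are assumptions for illustration and are not taken from the book.

```verilog
// Illustrative sketch: module name, ports and widths are assumed.
module fir_pipelined(
  input             clk,
  input      [7:0]  A, B, C,      // coefficients
  input      [7:0]  X, X1, X2,    // current and delayed samples
  output reg [17:0] Y);           // wide enough for three 16-bit products

  // Extra register layer: each product gets its own register.
  reg [15:0] prod1, prod2, prod3;

  always @(posedge clk) begin
    // Stage 1: the three multiplies run side by side.
    prod1 <= A * X;
    prod2 <= B * X1;
    prod3 <= C * X2;
    // Stage 2: sum the products registered on the previous edge.
    Y <= prod1 + prod2 + prod3;
  end
endmodule
```

The longest register-to-register path is now either one multiplier or one three-input adder rather than both in series, at the cost of one extra clock cycle of latency on Y.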


Parallel Structures
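This method takes logic that is evaluated serially and processes it in parallel. For example, the operands of an 8-bit multiplier can be split into 4-bit halves whose partial products are computed simultaneously and then combined into the result.

The book's example is fairly large; it is on page 9. This method reduces path delay.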
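
The idea can be sketched with a small multiplier. This is not the book's example: the module name, ports and the choice of an 8x8 multiply built from four 4x4 partial products are assumptions made here for illustration.

```verilog
// Illustrative sketch, not the book's example: names and widths are assumed.
module mult8_parallel(
  input             clk,
  input      [7:0]  X, Y,
  output reg [15:0] P);

  // Split each 8-bit operand into 4-bit halves.
  wire [3:0] XH = X[7:4], XL = X[3:0];
  wire [3:0] YH = Y[7:4], YL = Y[3:0];

  // Four independent 4x4 partial products, computed in parallel.
  reg [7:0] pHH, pHL, pLH, pLL;

  always @(posedge clk) begin
    pHH <= XH * YH;
    pHL <= XH * YL;
    pLH <= XL * YH;
    pLL <= XL * YL;
    // X*Y = XH*YH*256 + (XH*YL + XL*YH)*16 + XL*YL,
    // using the partial products registered on the previous edge.
    P <= (pHH << 8) + ((pHL + pLH) << 4) + pLL;
  end
endmodule
```

Each path now contains either a narrow multiplier or an adder tree instead of one wide multiplier, and P is valid two clock edges after X and Y are applied.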


Flatten Logic Structures
This method is similar to Parallel Structures, but it applies to priority encoding, as in the example below. Synthesis and layout tools are smart enough to duplicate logic to reduce fanout, but they are not smart enough to break up logic structures that are coded in a serial fashion.


// reference from : Advanced FPGA Design by Steve Kilts

module regwrite(
 input [3:0] ctrl,
 input clk,in,
 output reg [3:0] rout);
 
always@(posedge clk)
 if(ctrl[0]) rout[0] <= in;
 else if(ctrl[1]) rout[1] <= in;
 else if(ctrl[2]) rout[2] <= in;
 else if(ctrl[3]) rout[3] <= in;

endmodule



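Written this way, the tool synthesizes priority logic: each control bit is gated by all the bits ahead of it, which adds path delay. When the control signals are already mutually exclusive, for example strobes from an address decoder, that priority is unnecessary.
The author therefore recommends breaking up the if/else chain so that every condition is treated equally. Each control bit then acts only on its own bit of rout, so the signals no longer affect one another and the path delay drops; the behavior matches the priority version whenever at most one control bit is asserted at a time.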


// reference from : Advanced FPGA Design by Steve Kilts
module regwrite(
 input [3:0] ctrl,
 input clk,in,
 output reg [3:0] rout);
 
always@(posedge clk)
begin
 if(ctrl[0]) rout[0] <= in;
 if(ctrl[1]) rout[1] <= in;
 if(ctrl[2]) rout[2] <= in;
 if(ctrl[3]) rout[3] <= in;
end

endmodule


Register Balancing
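This method reduces the largest delay between two registers by spreading the logic evenly across stages.
For example, written as below, the critical path lies in Sum <= rA + rB + rC;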
rA <= A;
rB <= B;
rC <= C;
Sum <= rA + rB + rC;
Rewriting it in the balanced form below moves one addition into the first stage and shortens the critical path delay:
rABSum <= A + B;
rC <= C;
Sum <= rABSum + rC;


Reorder Paths
When several paths combine with the critical path, they can be rearranged so that the critical path sits closer to the destination register.
Consider the example below.


// reference from : Advanced FPGA Design by Steve Kilts

module randomlogic(
 input [7:0] A,B,C,
 input clk,
 input Cond1, Cond2,
 output reg [7:0] Out);
 
always@(posedge clk)
begin
 if(Cond1)
  Out <= A;
 else if(Cond2 && (C < 8))
  Out <= B;
 else
  Out <= C;
end 
endmodule

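Assume the critical path runs from C to Out: after the comparison, C has to pass through two gates before it reaches the mux (the if/else).
If the code is reordered as follows, one level of logic is removed; the figure on page 15 of the book makes this clearer.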


// reference from : Advanced FPGA Design by Steve Kilts
module randomlogic(
 input [7:0] A,B,C,
 input clk,
 input Cond1, Cond2,
 output reg [7:0] Out);
 
wire CondB = (Cond2 & !Cond1);

always@(posedge clk)
begin
 if(CondB && (C < 8))
  Out <= B;
 else if(Cond1)
  Out <= A;
 else
  Out <= C;
end 
endmodule


The takeaway is to put the more complex comparison first, which reduces the critical path delay.
