Reference Book : Advanced FPGA Design by Steve Kilts
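This post only records the parts that matter most to me. For the details, please see the reference book. It really is a good book!
Three main factors determine the speed of an FPGA design: throughput, latency, and timing.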
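Throughput : the amount of data processed per second (bits per second)
Latency : the time between data going in and the processed data coming out (time or clock cycles)
Timing : the logic delay between sequential elements (clock period or frequency). If a design does not "meet timing", its critical path delay is longer than the target clock period.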
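To make these three definitions concrete, here is a minimal C sketch of the arithmetic behind them. It is not from the book: the function names and the simplified timing model (which ignores setup time, clock-to-Q delay, and clock skew) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Maximum clock frequency in MHz for a critical path delay given in ns.
 * Simplified model: setup time, clock-to-Q delay and skew are ignored. */
double fmax_mhz(double critical_path_ns)
{
    assert(critical_path_ns > 0.0);
    return 1000.0 / critical_path_ns;
}

/* A design "meets timing" when the critical path fits in one clock period. */
bool meets_timing(double critical_path_ns, double clock_period_ns)
{
    return critical_path_ns <= clock_period_ns;
}

/* Throughput in bits per second: bits produced per result, divided by the
 * clock cycles spent on each result, multiplied by the clock rate in Hz. */
double throughput_bps(double bits_per_result, double cycles_per_result,
                      double clock_hz)
{
    assert(cycles_per_result > 0.0);
    return bits_per_result / cycles_per_result * clock_hz;
}
```

For example, with a hypothetical 100 MHz clock (10 ns period), an 8 ns critical path meets timing while a 12 ns path does not, and a design that emits one 8-bit result every clock reaches 800 Mbit/s.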
The key to high throughput is pipelining. The book illustrates this with a for loop.
C (example from the reference book)
XPower = 1;
for (i = 0; i < 3; i++)
    XPower = X * XPower;
This is usually written as the RTL below. It is the same as in the book, except that I tweaked the assign and added begin/end.
// reference from : Advanced FPGA Design by Steve Kilts
module power3(
    input [7:0] X,
    input clk, start,          // start is assumed to be a one-clock pulse
    output reg [7:0] XPower,
    output finished);

    reg [7:0] ncount;

    // finished goes high once the down-counter reaches zero
    assign finished = (ncount == 0) ? 1'b1 : 1'b0;

    always @(posedge clk)
    begin
        if (start)
        begin
            // load the input and start counting down
            XPower <= X;
            ncount <= 2;
        end
        else if (!finished)
        begin
            // reuse the same multiplier on every iteration
            ncount <= ncount - 1;
            XPower <= XPower * X;
        end
    end
endmodule
For the example above:
Throughput : 8/3 bits per clock (each 8-bit result takes 3 clock cycles to come out)
Latency : 3 clocks
Timing : the delay of one multiplier
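The 3-clock figure can be checked with a small cycle-by-cycle model of the iterative module. This is a minimal sketch in C, not code from the book: the names `iter_state`, `iter_clock`, and `iter_power3` are made up for illustration, and the model assumes `start` is high for exactly one clock.

```c
#include <stdint.h>

/* The two registers of the iterative power3 module. */
typedef struct {
    uint8_t xpower;  /* XPower register */
    uint8_t ncount;  /* ncount down-counter */
} iter_state;

/* finished is combinational: high when the counter has reached zero. */
int iter_finished(const iter_state *s)
{
    return s->ncount == 0;
}

/* Model one rising clock edge. finished is sampled before the update,
 * which matches how nonblocking assignments read the old register values. */
void iter_clock(iter_state *s, uint8_t x, int start)
{
    if (start) {
        s->xpower = x;
        s->ncount = 2;
    } else if (!iter_finished(s)) {
        s->ncount = (uint8_t)(s->ncount - 1);
        s->xpower = (uint8_t)(s->xpower * x);  /* truncated to 8 bits */
    }
}

/* Run one complete computation. Stores the result in *result and returns
 * the number of clock edges it took. */
int iter_power3(uint8_t x, uint8_t *result)
{
    iter_state s = {0, 0};
    int clocks = 0;

    iter_clock(&s, x, 1);          /* one-clock start pulse */
    clocks++;
    while (!iter_finished(&s)) {   /* keep clocking until finished */
        iter_clock(&s, x, 0);
        clocks++;
    }
    *result = s.xpower;
    return clocks;
}
```

Calling `iter_power3(3, &r)` leaves 27 in `r` and returns 3, and no new input can be accepted while those three clocks are in progress. Note that the result is truncated to 8 bits, just as in the RTL, so an input of 7 yields 87 (343 modulo 256).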
Written in pipelined form instead:
// reference from : Advanced FPGA Design by Steve Kilts
module power3(
    input [7:0] X,
    input clk,
    output reg [7:0] XPower);

    reg [7:0] XPower1, XPower2;
    reg [7:0] X1, X2;

    always @(posedge clk)
    begin
        // Pipeline stage 1
        X1 <= X;
        XPower1 <= X;
        // Pipeline stage 2
        X2 <= X1;
        XPower2 <= XPower1 * X1;
        // Pipeline stage 3: uses the stage 2 registers
        XPower <= XPower2 * X2;
    end
endmodule
Throughput : 8/1 bits per clock (a new result comes out every clock cycle)
Latency : 3 clocks
Timing : the delay of one multiplier
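The same kind of cycle-by-cycle model shows why the pipelined throughput is one result per clock while the latency stays at 3 clocks. Again this is only a sketch in C with made-up names (`pipe_state`, `pipe_clock`, `pipe_run`), not code from the book. It assumes that stage 3 multiplies the stage 2 registers (`XPower2` and `X2`), which is what the pipeline needs in order to output X cubed.

```c
#include <stdint.h>

/* All registers of the pipelined power3 module. */
typedef struct {
    uint8_t x1, x2;            /* delayed copies of the input X */
    uint8_t xpower1, xpower2;  /* partial results of stages 1 and 2 */
    uint8_t xpower;            /* stage 3 output register */
} pipe_state;

/* Model one rising clock edge. Every next value is computed from the old
 * state first, mirroring nonblocking assignments in the RTL. */
void pipe_clock(pipe_state *s, uint8_t x)
{
    pipe_state next;

    next.x1 = x;                                   /* stage 1 */
    next.xpower1 = x;
    next.x2 = s->x1;                               /* stage 2 */
    next.xpower2 = (uint8_t)(s->xpower1 * s->x1);
    next.xpower = (uint8_t)(s->xpower2 * s->x2);   /* stage 3 */
    *s = next;
}

/* Feed n inputs on consecutive clocks, then two more clocks to drain the
 * pipeline. out[i] receives the cube of in[i]. Returns the clock count. */
int pipe_run(const uint8_t *in, uint8_t *out, int n)
{
    pipe_state s = {0, 0, 0, 0, 0};
    int clocks = 0;

    for (int t = 0; t < n + 2; t++) {
        uint8_t x = (t < n) ? in[t] : 0;  /* feed zeros while draining */
        pipe_clock(&s, x);
        clocks++;
        if (t >= 2)                       /* first result after 3 edges */
            out[t - 2] = s.xpower;
    }
    return clocks;
}
```

Feeding the four inputs 2, 3, 4, 5 takes 6 clocks in this model (3 for the first result, then one more per additional result), whereas an iterative design that spends 3 clocks on each input would need 12.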
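Of course, pipelining also has a drawback: it needs additional registers and one more multiplier.
Conclusion: to increase a design's throughput, rewrite the loop as a pipeline, at the cost of a larger design area.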