前面,我们介绍过对裸机程序进行RTL仿真,那些裸机程序规模比较小,只有几KB大小。
另外,我们也已经实现了针对O_board的SoC进行了RTL仿真(http://blog.csdn.net/rill_zhen/article/details/21190757),本小节,我们将实现在ML501平台上对linux进行RTL仿真。
针对ML501的ORPSoC工程中,默认配置的DDR2的仿真模型与实际板子上使用的DDR2 SDRAM的参数不一致,我们要进行修改。
要想对DDR2 SDRAM的仿真模型进行修改,我们首先要弄明白几个概念。
RANK,BANK,row,,column。这几个都是逻辑上的概念。
此外还有channel,module,chip,device等物理上的概念。
对于ML501使用的DDR2 SDRAM来说,其具体参数如下所示:
通过查看内存条,我们可以看到如下内容:MT4HTF3264HY-667F1 1RX16 256MB PC-5300S,
其中3263是指内存条的organization:32Megx64,x64表示整个内存条的数据线(DQ)宽度是64bit。
667表示内存条的speed grade。PC-5300也是speed grade。
1RX16表示内存条上面的4个device,每个数据宽度是16,16X4正好是64bit。
256MB,毫无疑问,表示内存条的容量是256M bytes。
通过内存条上面的标示,我们就可以获得很多信息,此外,通过查看其数据手册,我们会得到更详细的参数:
RANK:是single rank。
BANK:BA是2bit,说明bank数量是4,每个bank的大小是256MB/4=64MB。
row:宽度是[12:0],一共13bit。
column:宽度是[9:0],一共10bit。
确定了我们实际使用的内存条的参数之后,我们就可以修改仿真模型的具体参数了。
需要注意的是ddr2_model.v只是一个timing model,具体的storage,需要我们自己根据实际情况来定。
这里需要修改的是MEM_BITS,由于ddr2_model.v是一个device的仿真模型,每个device中包含4个四分之一的bank,共64MB,所以对于如下定义:
// Memory Storage
`ifdef MAX_MEM
reg [BL_MAX*DQ_BITS-1:0] memory [0:`MAX_SIZE-1];
`else// [8 * 16 -1:0] [0:(1<<22) -1]==>26bit==>64MB
reg [BL_MAX*DQ_BITS-1:0] memory [0:`MEM_SIZE-1];
reg [`MAX_BITS-1:0] address [0:`MEM_SIZE-1];
reg [MEM_BITS:0] memory_index;
reg [MEM_BITS:0] memory_used;
`endif完整的参数,如下所示:
/****************************************************************************************
*
* Disclaimer This software code and all associated documentation, comments or other
* of Warranty: information (collectively "Software") is provided "AS IS" without
* warranty of any kind. MICRON TECHNOLOGY, INC. ("MTI") EXPRESSLY
* DISCLAIMS ALL WARRANTIES EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
* TO, NONINFRINGEMENT OF THIRD PARTY RIGHTS, AND ANY IMPLIED WARRANTIES
* OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. MTI DOES NOT
* WARRANT THAT THE SOFTWARE WILL MEET YOUR REQUIREMENTS, OR THAT THE
* OPERATION OF THE SOFTWARE WILL BE UNINTERRUPTED OR ERROR-FREE.
* FURTHERMORE, MTI DOES NOT MAKE ANY REPRESENTATIONS REGARDING THE USE OR
* THE RESULTS OF THE USE OF THE SOFTWARE IN TERMS OF ITS CORRECTNESS,
* ACCURACY, RELIABILITY, OR OTHERWISE. THE ENTIRE RISK ARISING OUT OF USE
* OR PERFORMANCE OF THE SOFTWARE REMAINS WITH YOU. IN NO EVENT SHALL MTI,
* ITS AFFILIATED COMPANIES OR THEIR SUPPLIERS BE LIABLE FOR ANY DIRECT,
* INDIRECT, CONSEQUENTIAL, INCIDENTAL, OR SPECIAL DAMAGES (INCLUDING,
* WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION,
* OR LOSS OF INFORMATION) ARISING OUT OF YOUR USE OF OR INABILITY TO USE
* THE SOFTWARE, EVEN IF MTI HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
* DAMAGES. Because some jurisdictions prohibit the exclusion or
* limitation of liability for consequential or incidental damages, the
* above limitation may not apply to you.
*
* Copyright 2003 Micron Technology, Inc. All rights reserved.
*
****************************************************************************************/
// Parameters current with 512Mb datasheet rev N
// Timing parameters based on Speed Grade
// SYMBOL UNITS DESCRIPTION
`define sg37E
`define x16
//`define MAX_MEM
`ifdef sg37E
parameter TCK_MIN = 3750; // tCK ps Minimum Clock Cycle Time
parameter TJIT_PER = 125; // tJIT(per) ps Period JItter
parameter TJIT_DUTY = 125; // tJIT(duty) ps Half Period Jitter
parameter TJIT_CC = 250; // tJIT(cc) ps Cycle to Cycle jitter
parameter TERR_2PER = 175; // tERR(nper) ps Accumulated Error (2-cycle)
parameter TERR_3PER = 225; // tERR(nper) ps Accumulated Error (3-cycle)
parameter TERR_4PER = 250; // tERR(nper) ps Accumulated Error (4-cycle)
parameter TERR_5PER = 250; // tERR(nper) ps Accumulated Error (5-cycle)
parameter TERR_N1PER = 350; // tERR(nper) ps Accumulated Error (6-10-cycle)
parameter TERR_N2PER = 450; // tERR(nper) ps Accumulated Error (11-50-cycle)
parameter TQHS = 400; // tQHS ps Data hold skew factor
parameter TAC = 500; // tAC ps DQ output access time from CK/CK#
parameter TDS = 100; // tDS ps DQ and DM input setup time relative to DQS
parameter TDH = 225; // tDH ps DQ and DM input hold time relative to DQS
parameter TDQSCK = 450; // tDQSCK ps DQS output access time from CK/CK#
parameter TDQSQ = 300; // tDQSQ ps DQS-DQ skew, DQS to last DQ valid, per group, per access
parameter TIS = 250; // tIS ps Input Setup Time
parameter TIH = 375; // tIH ps Input Hold Time
parameter TRC = 55000; // tRC ps Active to Active/Auto Refresh command time
parameter TRCD = 15000; // tRCD ps Active to Read/Write command time
parameter TWTR = 7500; // tWTR ps Write to Read command delay
parameter TRP = 15000; // tRP ps Precharge command period
parameter TRPA = 15000; // tRPA ps Precharge All period
parameter TXARDS = 6; // tXARDS tCK Exit low power active power down to a read command
parameter TXARD = 2; // tXARD tCK Exit active power down to a read command
parameter TXP = 2; // tXP tCK Exit power down to a non-read command
parameter TANPD = 3; // tANPD tCK ODT to power-down entry latency
parameter TAXPD = 8; // tAXPD tCK ODT power-down exit latency
parameter CL_TIME = 15000; // CL ps Minimum CAS Latency
`endif // ------ ----- -----------
`ifdef x16
`ifdef sg37E
parameter TFAW = 50000; // tFAW ps Four Bank Activate window
`endif
`endif
// Timing Parameters
// Mode Register
parameter AL_MIN = 0; // AL tCK Minimum Additive Latency
parameter AL_MAX = 6; // AL tCK Maximum Additive Latency
parameter CL_MIN = 3; // CL tCK Minimum CAS Latency
parameter CL_MAX = 7; // CL tCK Maximum CAS Latency
parameter WR_MIN = 2; // WR tCK Minimum Write Recovery
parameter WR_MAX = 8; // WR tCK Maximum Write Recovery
parameter BL_MIN = 4; // BL tCK Minimum Burst Length
parameter BL_MAX = 8; // BL tCK Minimum Burst Length
// Clock
parameter TCK_MAX = 8000; // tCK ps Maximum Clock Cycle Time
parameter TCH_MIN = 0.48; // tCH tCK Minimum Clock High-Level Pulse Width
parameter TCH_MAX = 0.52; // tCH tCK Maximum Clock High-Level Pulse Width
parameter TCL_MIN = 0.48; // tCL tCK Minimum Clock Low-Level Pulse Width
parameter TCL_MAX = 0.52; // tCL tCK Maximum Clock Low-Level Pulse Width
// Data
parameter TLZ = TAC; // tLZ ps Data-out low-impedance window from CK/CK#
parameter THZ = TAC; // tHZ ps Data-out high impedance window from CK/CK#
parameter TDIPW = 0.35; // tDIPW tCK DQ and DM input Pulse Width
// Data Strobe
parameter TDQSH = 0.35; // tDQSH tCK DQS input High Pulse Width
parameter TDQSL = 0.35; // tDQSL tCK DQS input Low Pulse Width
parameter TDSS = 0.20; // tDSS tCK DQS falling edge to CLK rising (setup time)
parameter TDSH = 0.20; // tDSH tCK DQS falling edge from CLK rising (hold time)
parameter TWPRE = 0.35; // tWPRE tCK DQS Write Preamble
parameter TWPST = 0.40; // tWPST tCK DQS Write Postamble
parameter TDQSS = 0.25; // tDQSS tCK Rising clock edge to DQS/DQS# latching transition
// Command and Address
parameter TIPW = 0.6; // tIPW tCK Control and Address input Pulse Width
parameter TCCD = 2; // tCCD tCK Cas to Cas command delay
parameter TRAS_MIN = 40000; // tRAS ps Minimum Active to Precharge command time
parameter TRAS_MAX =70000000; // tRAS ps Maximum Active to Precharge command time
parameter TRTP = 7500; // tRTP ps Read to Precharge command delay
parameter TWR = 15000; // tWR ps Write recovery time
parameter TMRD = 2; // tMRD tCK Load Mode Register command cycle time
parameter TDLLK = 200; // tDLLK tCK DLL locking time
// Refresh
parameter TRFC_MIN = 105000; // tRFC ps Refresh to Refresh Command interval minimum value
parameter TRFC_MAX =70000000; // tRFC ps Refresh to Refresh Command Interval maximum value
// Self Refresh
parameter TXSNR = TRFC_MIN + 10000; // tXSNR ps Exit self refesh to a non-read command
parameter TXSRD = 200; // tXSRD tCK Exit self refresh to a read command
parameter TISXR = TIS; // tISXR ps CKE setup time during self refresh exit.
// ODT
parameter TAOND = 2; // tAOND tCK ODT turn-on delay
parameter TAOFD = 2.5; // tAOFD tCK ODT turn-off delay
parameter TAONPD = 2000; // tAONPD ps ODT turn-on (precharge power-down mode)
parameter TAOFPD = 2000; // tAOFPD ps ODT turn-off (precharge power-down mode)
parameter TMOD = 12000; // tMOD ps ODT enable in EMR to ODT pin transition
// Power Down
parameter TCKE = 3; // tCKE tCK CKE minimum high or low pulse width
// Size Parameters based on Part Width
`ifdef x16
parameter ADDR_BITS = 13; // Address Bits
parameter ROW_BITS = 13; // Number of Address bits
parameter COL_BITS = 10; // Number of Column bits
parameter DM_BITS = 2; // Number of Data Mask bits
parameter DQ_BITS = 16; // Number of Data bits
parameter DQS_BITS = 2; // Number of Dqs bits
parameter TRRD = 10000; // tRRD Active bank a to Active bank b command time
`endif
`ifdef QUAD_RANK
`define DUAL_RANK // also define DUAL_RANK
parameter CS_BITS = 4; // Number of Chip Select Bits
parameter RANKS = 4; // Number of Chip Select Bits
`else `ifdef DUAL_RANK
parameter CS_BITS = 2; // Number of Chip Select Bits
parameter RANKS = 2; // Number of Chip Select Bits
`else
parameter CS_BITS = 2; // Number of Chip Select Bits
parameter RANKS = 1; // Number of Chip Select Bits
`endif `endif
// Size Parameters
parameter BA_BITS = 2; // Set this parmaeter to control how many Bank Address bits
// if MEM_BITS== 14, a DQ=16 each part, DQ=64 total (4 parts) => 1MB total (256KB each)
// if MEM_BITS== 15, a DQ=16 each part, DQ=64 total (4 parts) => 2MB total (512KB each)
// if MEM_BITS== 16, a DQ=16 each part, DQ=64 total (4 parts) => 4MB total (1MB each)
// if MEM_BITS== 17, a DQ=16 each part, DQ=64 total (4 parts) => 8MB total (2MB each)
//parameter MEM_BITS = 14; // Number of write data bursts can be stored in memory. The default is 2^10=1024.
parameter MEM_BITS = 22; // Number of write data bursts can be stored in memory. //256MB total(64MB each),Rill modify from 17 to 22 140410
parameter AP = 10; // the address bit that controls auto-precharge and precharge-all
parameter BL_BITS = 3; // the number of bits required to count to MAX_BL
parameter BO_BITS = 2; // the number of Burst Order Bits
// Simulation parameters
parameter STOP_ON_ERROR = 1; // If set to 1, the model will halt on command sequence/major errors
parameter DEBUG = 0; // Turn on Debug messages
parameter BUS_DELAY = 0; // delay in nanoseconds
parameter RANDOM_OUT_DELAY = 0; // If set to 1, the model will put a random amount of delay on DQ/DQS during reads
parameter RANDOM_SEED = 711689044; //seed value for random generator.
parameter RDQSEN_PRE = 2; // DQS driving time prior to first read strobe
parameter RDQSEN_PST = 1; // DQS driving time after last read strobe
parameter RDQS_PRE = 2; // DQS low time prior to first read strobe
parameter RDQS_PST = 1; // DQS low time after last valid read strobe
parameter RDQEN_PRE = 0; // DQ/DM driving time prior to first read data
parameter RDQEN_PST = 0; // DQ/DM driving time after last read data
parameter WDQS_PRE = 1; // DQS half clock periods prior to first write strobe
parameter WDQS_PST = 1; // DQS half clock periods after last valid write strobe
目前,我们已经建立的和实际硬件一致的仿真模型,但是我们在仿真前,要把linux的镜像实现load到仿真模型中才行,这就需要了解DDR2 SDRAM的内部组织结构,了解BL_MAX,BL_BITS,DQ_BITS等参数的具体含义,了解DDR2 SDRAM的读写过程和时序。这些内容请参考《memory system - cache dram disk》一书。这里不再赘述。
对于仿真linux而言,由于编译时指定的内存大小是32MB,所以,我在preload时也只load32MB,一个bank是64MB,所以我们只需要load bank0即可,但是bank0是分布在4个device里的。
下面是修改后的orpsoc_testbench.v的部分代码:
`ifdef XILINX_DDR2
`ifndef GATE_SIM
defparam dut.xilinx_ddr2_0.xilinx_ddr2_if0.ddr2_mig0.SIM_ONLY = 1;
`endif
always @( * ) begin
ddr2_ck_sdram <= #(TPROP_PCB_CTRL) ddr2_ck_fpga;
ddr2_ck_n_sdram <= #(TPROP_PCB_CTRL) ddr2_ck_n_fpga;
ddr2_a_sdram <= #(TPROP_PCB_CTRL) ddr2_a_fpga;
ddr2_ba_sdram <= #(TPROP_PCB_CTRL) ddr2_ba_fpga;
ddr2_ras_n_sdram <= #(TPROP_PCB_CTRL) ddr2_ras_n_fpga;
ddr2_cas_n_sdram <= #(TPROP_PCB_CTRL) ddr2_cas_n_fpga;
ddr2_we_n_sdram <= #(TPROP_PCB_CTRL) ddr2_we_n_fpga;
ddr2_cs_n_sdram <= #(TPROP_PCB_CTRL) ddr2_cs_n_fpga;
ddr2_cke_sdram <= #(TPROP_PCB_CTRL) ddr2_cke_fpga;
ddr2_odt_sdram <= #(TPROP_PCB_CTRL) ddr2_odt_fpga;
ddr2_dm_sdram_tmp <= #(TPROP_PCB_DATA) ddr2_dm_fpga;//DM signal generation
end // always @ ( * )
// Model delays on bi-directional BUS
genvar dqwd;
generate
for (dqwd = 0;dqwd < DQ_WIDTH;dqwd = dqwd+1) begin : dq_delay
wiredelay #
(
.Delay_g (TPROP_PCB_DATA),
.Delay_rd (TPROP_PCB_DATA_RD)
)
u_delay_dq
(
.A (ddr2_dq_fpga[dqwd]),
.B (ddr2_dq_sdram[dqwd]),
.reset (rst_n)
);
end
endgenerate
genvar dqswd;
generate
for (dqswd = 0;dqswd < DQS_WIDTH;dqswd = dqswd+1) begin : dqs_delay
wiredelay #
(
.Delay_g (TPROP_DQS),
.Delay_rd (TPROP_DQS_RD)
)
u_delay_dqs
(
.A (ddr2_dqs_fpga[dqswd]),
.B (ddr2_dqs_sdram[dqswd]),
.reset (rst_n)
);
wiredelay #
(
.Delay_g (TPROP_DQS),
.Delay_rd (TPROP_DQS_RD)
)
u_delay_dqs_n
(
.A (ddr2_dqs_n_fpga[dqswd]),
.B (ddr2_dqs_n_sdram[dqswd]),
.reset (rst_n)
);
end
endgenerate
assign ddr2_dm_sdram = ddr2_dm_sdram_tmp;
//parameter NUM_PROGRAM_WORDS=1048576;
parameter NUM_PROGRAM_WORDS=8388608; //Rill modify from 1048576
integer ram_ptr, program_word_ptr, k;
reg [31:0] tmp_program_word;
reg [31:0] program_array [0:NUM_PROGRAM_WORDS-1]; // 1M words = 4MB//8M words = 32MB
reg [8*16-1:0] ddr2_ram_mem_line; //8*16-bits= 8 shorts (half-words)
genvar i, j;
generate
// if the data width is multiple of 16
for(j = 0; j < CS_NUM; j = j+1) begin : gen_cs // Loop of 1
for(i = 0; i < DQS_WIDTH/2; i = i+1) begin : gen // Loop of 4 (DQS_WIDTH=8)
initial
begin
`ifdef PRELOAD_RAM
`include "ddr2_model_preload.v"
`endif
end
ddr2_model u_mem0
(
.ck (ddr2_ck_sdram[CLK_WIDTH*i/DQS_WIDTH]),
.ck_n (ddr2_ck_n_sdram[CLK_WIDTH*i/DQS_WIDTH]),
.cke (ddr2_cke_sdram[j]),
.cs_n (ddr2_cs_n_sdram[CS_WIDTH*i/DQS_WIDTH]),
.ras_n (ddr2_ras_n_sdram),
.cas_n (ddr2_cas_n_sdram),
.we_n (ddr2_we_n_sdram),
.dm_rdqs (ddr2_dm_sdram[(2*(i+1))-1 : i*2]),
.ba (ddr2_ba_sdram),
.addr (ddr2_a_sdram),
.dq (ddr2_dq_sdram[(16*(i+1))-1 : i*16]),
.dqs (ddr2_dqs_sdram[(2*(i+1))-1 : i*2]),
.dqs_n (ddr2_dqs_n_sdram[(2*(i+1))-1 : i*2]),
.rdqs_n (),
.odt (ddr2_odt_sdram[ODT_WIDTH*i/DQS_WIDTH])
);
end
end
endgenerate
`endif// File intended to be included in the generate statement for each DDR2 part.
// The following loads a vmem file, "sram.vmem" by default, into the SDRAM.
// Wait until the DDR memory is initialised, and then magically
// load it
$display("%t: wait phy_init_done",$time);
@(posedge dut.xilinx_ddr2_0.xilinx_ddr2_if0.phy_init_done);
$display("%t: Loading DDR2",$time);
$readmemh("sram.vmem", program_array);
/* Now dish it out to the DDR2 model‘s memory */
for(ram_ptr = 0 ; ram_ptr < 64*1024/*4096*/ ; ram_ptr = ram_ptr + 1)
begin
// Construct the burst line, with every second word from where we
// started, and picking the correct half of the word with i%2
program_word_ptr = ram_ptr * 16 + (i/2) ; // Start on word0 or word1
tmp_program_word = program_array[program_word_ptr];
ddr2_ram_mem_line[15:0] = tmp_program_word[15 + ((i%2)*16):((i%2)*16)];
program_word_ptr = program_word_ptr + 2;
tmp_program_word = program_array[program_word_ptr];
ddr2_ram_mem_line[31:16] = tmp_program_word[15 + ((i%2)*16):((i%2)*16)];
program_word_ptr = program_word_ptr + 2;
tmp_program_word = program_array[program_word_ptr];
ddr2_ram_mem_line[47:32] = tmp_program_word[15 + ((i%2)*16):((i%2)*16)];
program_word_ptr = program_word_ptr + 2;
tmp_program_word = program_array[program_word_ptr];
ddr2_ram_mem_line[63:48] = tmp_program_word[15 + ((i%2)*16):((i%2)*16)];
program_word_ptr = program_word_ptr + 2;
tmp_program_word = program_array[program_word_ptr];
ddr2_ram_mem_line[79:64] = tmp_program_word[15 + ((i%2)*16):((i%2)*16)];
program_word_ptr = program_word_ptr + 2;
tmp_program_word = program_array[program_word_ptr];
ddr2_ram_mem_line[95:80] = tmp_program_word[15 + ((i%2)*16):((i%2)*16)];
program_word_ptr = program_word_ptr + 2;
tmp_program_word = program_array[program_word_ptr];
ddr2_ram_mem_line[111:96] = tmp_program_word[15 + ((i%2)*16):((i%2)*16)];
program_word_ptr = program_word_ptr + 2;
tmp_program_word = program_array[program_word_ptr];
ddr2_ram_mem_line[127:112] = tmp_program_word[15 + ((i%2)*16):((i%2)*16)];
// Put this assembled line into the RAM using its memory writing TASK
// (bank ,row , { col }, data
u_mem0.memory_write(2‘b00,ram_ptr[19:7], {ram_ptr[6:0],3‘b000},ddr2_ram_mem_line);
//$display("Writing 0x%h, ramline=%d",ddr2_ram_mem_line, ram_ptr);
end // for (ram_ptr = 0 ; ram_ptr < ...
$display("(%t) * DDR2 RAM %1d preloaded",$time, i);
首先,program_array[]是连续线性的,但是4个device的组织不是连续线性的,所以在调用memory_write()之前一定要变成DDR2 SDRAM实际的组织形式。
此外,由于我们只preload了32MB,小于一个bank,所以bank的地址我们一直是2‘b00,如果以后需要仿真的程序规模超过一个bank的大小了,那么就需要修改bank地址了。
修改orpsocv2/sw/makefile.inc中,是指使用现成的elf文件,生成vmem文件。具体修改方法,前面已经介绍过了,这里不再赘述。
执行:make rtl-test TEST=linux PRELOAD_RAM=1
即可得到linux的仿真结果,和实际下板的结果相同。
毫无疑问,由于linux程序规模很大,如果要等到linux启动完成,需要等待很久。
下面是部分输出:
之前搞嵌入式,linux的启动信息很熟悉,但是如果想知道linux启动过程中,几乎是不可能的,现在板子上所有设备的每个clock的状态,通过RTL仿真,即可实现。
enjoy!
OpenRisc-66-基于ORPSoC对linux进行RTL仿真,布布扣,bubuko.com
OpenRisc-66-基于ORPSoC对linux进行RTL仿真
原文:http://blog.csdn.net/rill_zhen/article/details/23381949