RISC-V Based SoC Design with Open-Source Openlane IC Design Tool

This is the second post in my series on the open-source IC design flow. The first post showed how to use open-source tools, namely Openlane, to harden a RISC-V CPU core and generate its GDS output. The well-known PicoRV32 RISC-V core was chosen for this purpose. You can find the first post at this link:

https://www.mehmetburakaykenar.com/open-source-ic-design-flow-for-an-open-source-risc-v-core/444/

In the PicoRV32 GitHub repo, there is a directory named picosoc, where the PicoRV32 RISC-V core is combined with a few peripherals, such as a flash memory controller, SRAM and UART, to demonstrate the functionality and efficiency of the core. The repo also contains an FPGA implementation of picosoc that deploys the SoC on the Lattice iCE40-HX8K. In this post, I will show how to implement picosoc with Openlane and generate GDS outputs.

We have already hardened our RISC-V core in the previous post and have its GDS file. Now we need to import this IP and add extra logic for the peripheral interfaces and memories. The question is: what are the necessary and appropriate configuration parameters and methodology for integrating pre-hardened IPs into our top design?

Fortunately, the Openlane documentation has a section named “Chip Level Integration”:

https://openlane.readthedocs.io/en/latest/usage/chip_integration.html

According to this document, a chip consists of the “Chip Core” and the “Chip IO”. The Chip Core includes the hard macros (for example, the RISC-V core we hardened in the previous post) and the rest of the design. The Chip IO includes IO pads, power pads and corner pads. In this post, we are interested in the Chip Core design. I am copying the relevant part of the documentation here as an image:

Another good reference is the Efabless Caravel User Project documentation, which also uses Openlane:

https://caravel-user-project.readthedocs.io/en/latest/#caravel-integration

The Caravel documentation states that there are 3 chip hardening options. In their reference, the user project wrapper is the top module:

1) Hardening the user macro(s) first, then inserting it in the user project wrapper with no standard cells on the top level

2) Flattening the user macro(s) with the user_project_wrapper

3) Placing multiple macros in the wrapper along with standard cells on the top level

Since we already have a pre-hardened macro, only option 1 or 3 is possible for hardening the chip core, i.e. the top module. Now we need to examine our top module, picosoc.v, and decide which method is more appropriate for this design. Or maybe we can try both methods, see the differences and learn the configuration parameters 🙂

To use the first integration method, the top module may contain only module instantiations, which is not the case for picosoc.v: there are a few logic assignments in the RTL code alongside the module instantiations. So, by default, method 3 seems more appropriate for picosoc. However, we can gather all this logic into a new module and instantiate it in the top module, so that we can harden every module and integrate them into picosoc using method 1. Other than the logic assignments, these are the module instantiations in the picosoc module:

picorv32: This is the module we hardened previously.

spimemio: Flash memory interface.

simpleuart: UART transceiver interface.

picosoc_mem: Memory for the CPU.

We need to be careful about the configuration parameters for methods 1 and 3. Efabless recently published a webinar recording (Dec 11 2023) that gave some insights into the integration methods. I took these images from that YouTube video:

Method 1:

Method 3:

Okay, let’s first focus on Method 1, where we will harden all sub-modules and then instantiate them in the top module with no standard logic cells. Be careful: the top module may contain only module instantiations, with no Verilog assignments, always blocks or anything else that can synthesize into logic.

Method 1: Hardening the user macro(s) first, then inserting it in the user project wrapper with no standard cells on the top level

We need to create a new module, which will be hardened as a macro, and move all the logic assignments into it. Let’s first analyze the logic in the picosoc.v module step by step:

reg [31:0] irq;
wire irq_stall = 0;
wire irq_uart = 0;

always @* begin
   irq = 0;
   irq[3] = irq_stall;
   irq[4] = irq_uart;
   irq[5] = irq_5;
   irq[6] = irq_6;
   irq[7] = irq_7;
end

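This 32-bit irq net is connected to the irq input of the picorv32 module. From the assignments we see that only 3 bits come from input ports (irq_5, irq_6 and irq_7) and all other bits are constant zero. So we can drop the always block and use a concatenation in the picorv32 module instantiation. My first attempt looked like this (note that it is missing the curly braces around the concatenation, which the linter complains about later in this post):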

	) cpu (
		.clk         (clk        ),
		.resetn      (resetn     ),
		.mem_valid   (mem_valid  ),
		.mem_instr   (mem_instr  ),
		.mem_ready   (mem_ready  ),
		.mem_addr    (mem_addr   ),
		.mem_wdata   (mem_wdata  ),
		.mem_wstrb   (mem_wstrb  ),
		.mem_rdata   (mem_rdata  ),
		.irq         (24'h000000, irq_7, irq_6, irq_5, 5'b00000)
	);
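
As a quick sanity check of the bit positions, the two forms can be compared in a few lines of Python. This is only a sketch: the function names are made up for illustration, and it models the intended, correctly braced concatenation {24'h000000, irq_7, irq_6, irq_5, 5'b00000}.

```python
def irq_from_always_block(irq_5, irq_6, irq_7, irq_stall=0, irq_uart=0):
    """Mirror the always @* block: start from zero, then set bits 3..7."""
    irq = 0
    irq |= (irq_stall & 1) << 3
    irq |= (irq_uart & 1) << 4
    irq |= (irq_5 & 1) << 5
    irq |= (irq_6 & 1) << 6
    irq |= (irq_7 & 1) << 7
    return irq


def irq_from_concat(irq_5, irq_6, irq_7):
    """Mirror {24'h000000, irq_7, irq_6, irq_5, 5'b00000}, MSB field first."""
    fields = [(0, 24), (irq_7, 1), (irq_6, 1), (irq_5, 1), (0, 5)]
    value = 0
    width = 0
    for bits, nbits in fields:
        value = (value << nbits) | (bits & ((1 << nbits) - 1))
        width += nbits
    assert width == 32, "concatenation must be exactly 32 bits wide"
    return value


# Exhaustively compare both forms for the three external interrupt inputs.
for n in range(8):
    i5, i6, i7 = n & 1, (n >> 1) & 1, (n >> 2) & 1
    assert irq_from_always_block(i5, i6, i7) == irq_from_concat(i5, i6, i7)
```

Both forms agree for all eight input combinations, which is why the always block can be replaced as long as irq_stall and irq_uart stay tied to zero.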

The other logic assignments in the picosoc module handle memory address decoding. We can move all of these assignments into a new module.

wire mem_valid;
wire mem_instr;
wire mem_ready;
wire [31:0] mem_addr;
wire [31:0] mem_wdata;
wire [3:0] mem_wstrb;
wire [31:0] mem_rdata;

wire spimem_ready;
wire [31:0] spimem_rdata;

reg ram_ready;
wire [31:0] ram_rdata;

assign iomem_valid = mem_valid && (mem_addr[31:24] > 8'h 01);
assign iomem_wstrb = mem_wstrb;
assign iomem_addr = mem_addr;
assign iomem_wdata = mem_wdata;

wire spimemio_cfgreg_sel = mem_valid && (mem_addr == 32'h 0200_0000);
wire [31:0] spimemio_cfgreg_do;

wire        simpleuart_reg_div_sel = mem_valid && (mem_addr == 32'h 0200_0004);
wire [31:0] simpleuart_reg_div_do;

wire        simpleuart_reg_dat_sel = mem_valid && (mem_addr == 32'h 0200_0008);
wire [31:0] simpleuart_reg_dat_do;
wire        simpleuart_reg_dat_wait;

assign mem_ready = (iomem_valid && iomem_ready) || spimem_ready || ram_ready || spimemio_cfgreg_sel ||
		simpleuart_reg_div_sel || (simpleuart_reg_dat_sel && !simpleuart_reg_dat_wait);

assign mem_rdata = (iomem_valid && iomem_ready) ? iomem_rdata : spimem_ready ? spimem_rdata : ram_ready ? ram_rdata :
		spimemio_cfgreg_sel ? spimemio_cfgreg_do : simpleuart_reg_div_sel ? simpleuart_reg_div_do :
		simpleuart_reg_dat_sel ? simpleuart_reg_dat_do : 32'h 0000_0000;

always @(posedge clk)
	ram_ready <= mem_valid && !mem_ready && mem_addr < 4*MEM_WORDS;
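
To make the address map behind these assignments easier to see, here is a small Python model of the select signals. It is a sketch, not part of the flow: the function names are mine, MEM_WORDS uses the value 256 from this post, and the flash window is taken from the spimemio.valid expression in picosoc.v.

```python
MEM_WORDS = 256  # same value as the parameter used in this post


def decode(mem_valid, mem_addr, mem_words=MEM_WORDS):
    """Return the select signals that picosoc.v derives from the address."""
    addr = mem_addr & 0xFFFFFFFF
    return {
        # RAM request condition used for ram_ready (without the !mem_ready term)
        "ram": bool(mem_valid and addr < 4 * mem_words),
        # flash window passed to spimemio.valid
        "flash": bool(mem_valid and 4 * mem_words <= addr < 0x0200_0000),
        # assign iomem_valid = mem_valid && (mem_addr[31:24] > 8'h01)
        "iomem_valid": bool(mem_valid and (addr >> 24) > 0x01),
        "spimemio_cfgreg_sel": bool(mem_valid and addr == 0x0200_0000),
        "simpleuart_reg_div_sel": bool(mem_valid and addr == 0x0200_0004),
        "simpleuart_reg_dat_sel": bool(mem_valid and addr == 0x0200_0008),
    }


def active(mem_addr):
    """Names of the selects that are high for a valid access to mem_addr."""
    return sorted(name for name, on in decode(True, mem_addr).items() if on)


print(active(0x0000_0010))  # inside the RAM window
print(active(0x0010_0000))  # inside the flash window
print(active(0x0200_0004))  # UART divider register
print(active(0x0300_0000))  # external iomem region
```

Note that the three register addresses also raise iomem_valid, because the comparison on mem_addr[31:24] is true for every address from 0x0200_0000 upwards.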

I will name this new module mem_decode. Let’s work out its I/Os. These are the memory interface ports of the picorv32 module:

output reg        mem_valid;
output reg        mem_instr;
input             mem_ready;
output reg [31:0] mem_addr ;
output reg [31:0] mem_wdata;
output reg [ 3:0] mem_wstrb;
input      [31:0] mem_rdata;

mem_valid, mem_instr, mem_addr, mem_wdata and mem_wstrb are outputs of the picorv32 module, while mem_ready and mem_rdata are inputs. So their directions will be reversed in the mem_decode module.

We need a clock, which is used in the always block, but no reset.

spimem_ready and spimem_rdata are outputs of the spimemio module, so they are inputs for the mem_decode module.

The MEM_WORDS parameter is used, so we need to define it:

parameter integer MEM_WORDS = 256;

ram_rdata is an output of the picosoc_mem module, so it will be an input for mem_decode. The iomem ports will also be defined in mem_decode:

output        iomem_valid,
input         iomem_ready,
output [ 3:0] iomem_wstrb,
output [31:0] iomem_addr,
output [31:0] iomem_wdata,
input  [31:0] iomem_rdata,

We need these ports for the UART:

output simpleuart_reg_div_sel
input [31:0] simpleuart_reg_div_do
input [31:0] simpleuart_reg_dat_do
input simpleuart_reg_dat_wait

Finally, we can start hardening the modules instantiated inside picosoc. Let’s recall the modules:

picorv32: This is the module we hardened previously.

spimemio: Flash memory interface.

simpleuart: UART transceiver interface.

picosoc_mem: Memory for the CPU.

mem_decode: Wrapper for standard logic cells inside the picosoc.

picorv32 has already been hardened in the previous post. Let’s work on spimemio. But first we need to think about the layout of the chip. This is important, since there will be wire connections between the macros, and a careless IO placement on the macros can result in long wires. So let’s analyze which macro communicates with which, and which macros have external ports, so that we can place those external IOs near the edges of the chip.

We can place the spimemio macro near the bottom side of the chip, since it has the flash pins, which are connected to external IOs. We can place the flash pins on the bottom edge and arrange the other pins around them. Here is the instantiation of the spimemio module:

spimemio spimemio (
.clk    (clk),
.resetn (resetn),
.valid  (mem_valid && mem_addr >= 4*MEM_WORDS && mem_addr < 32'h 0200_0000),
.ready  (spimem_ready),
.addr   (mem_addr[23:0]),
.rdata  (spimem_rdata),

.flash_csb    (flash_csb   ),
.flash_clk    (flash_clk   ),

.flash_io0_oe (flash_io0_oe),
.flash_io1_oe (flash_io1_oe),
.flash_io2_oe (flash_io2_oe),
.flash_io3_oe (flash_io3_oe),

.flash_io0_do (flash_io0_do),
.flash_io1_do (flash_io1_do),
.flash_io2_do (flash_io2_do),
.flash_io3_do (flash_io3_do),

.flash_io0_di (flash_io0_di),
.flash_io1_di (flash_io1_di),
.flash_io2_di (flash_io2_di),
.flash_io3_di (flash_io3_di),

.cfgreg_we(spimemio_cfgreg_sel ? mem_wstrb : 4'b 0000),
.cfgreg_di(mem_wdata),
.cfgreg_do(spimemio_cfgreg_do)
);

There are 142 IO pins on the spimemio module. An even distribution would mean 35 or 36 pins per edge. We have 14 flash* pins that need to be placed on the bottom. The 36 cfgreg_we and cfgreg_do pins can go on the right side, since they will be connected to the mem_decode module, which is on the right side of the chip. spimem_rdata, which is 32 bits wide, will also be connected to mem_decode; we can place these pins on the top. The 24-bit mem_addr will be connected to picosoc_mem, so we can place most of its bits on the left. The remaining pins are spread over the edges:

BOTTOM:

flash_csb
flash_clk
flash_io0_oe
flash_io1_oe
flash_io2_oe
flash_io3_oe
flash_io0_do
flash_io1_do
flash_io2_do
flash_io3_do
flash_io0_di
flash_io1_di
flash_io2_di
flash_io3_di

RIGHT:

cfgreg_we [3:0]
cfgreg_do [31:0]
addr [23:16]

TOP:

rdata [31:0]
clk
resetn
valid
ready

LEFT:

cfgreg_di [31:0]
addr [15:0]
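
The pin budget above is easy to get wrong by hand, so here is a small Python tally of the plan. It is a sketch: the dictionary simply restates the BOTTOM/RIGHT/TOP/LEFT lists with their bus widths, and the names are for illustration only.

```python
# Pin widths per edge, copied from the BOTTOM/RIGHT/TOP/LEFT lists above.
SPIMEMIO_PINS = {
    "BOTTOM": {
        "flash_csb": 1,
        "flash_clk": 1,
        **{f"flash_io{i}_{kind}": 1
           for i in range(4) for kind in ("oe", "do", "di")},
    },
    "RIGHT": {"cfgreg_we": 4, "cfgreg_do": 32, "addr[23:16]": 8},
    "TOP": {"rdata": 32, "clk": 1, "resetn": 1, "valid": 1, "ready": 1},
    "LEFT": {"cfgreg_di": 32, "addr[15:0]": 16},
}


def pins_per_edge(plan):
    """Sum the bus widths assigned to each edge."""
    return {edge: sum(widths.values()) for edge, widths in plan.items()}


counts = pins_per_edge(SPIMEMIO_PINS)
total = sum(counts.values())
print(counts, total)
```

The total matches the 142 pins of the module, with 14 on the bottom, 44 on the right, 36 on the top and 48 on the left, so the plan is not perfectly even but every pin is accounted for.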

After making the necessary configuration and pin changes, do not forget to add the power pins to the module:

`ifdef USE_POWER_PINS	
	inout vccd1;	// User area 1 1.8V supply
	inout vssd1;	// User area 1 digital ground
`endif

Then run Openlane with:

> make mount
> ./flow.tcl -design spimemio

I got my first error in this flow in Step 2 STA:

I checked line 23 and saw that I had left a “;” instead of a “,” at the end of the line. After fixing it and rerunning, the flow finished successfully. However, we got a warning in the floorplan step, saying that “Current core area is too small for the power grid settings chosen”.

The default PDN pitch values are 180 um for both vertical and horizontal straps. It is suggested to use a minimum size of 200×200 um for macro hardening. Here, relative floorplanning generated a 149.5×149.6 um core area. We can either modify the PDN pitch values or increase the area constraint in the configuration. I will use a 200×200 um DIE_AREA and rerun the flow with these configuration settings:

set ::env(FP_SIZING) "absolute"
set ::env(DIE_AREA) "0 0 200 200"

We got no warning about the PDN pitch this time. Here is the GDS view of the spimemio macro:

Now let’s harden the simpleuart macro. There are 139 pins on the simpleuart module. We can distribute the pins evenly over the edges of the macro. We can set DIE_AREA to 200×200 um, the same as spimemio. These are the pins of the simpleuart module:

input clk,
input resetn,

output ser_tx,
input  ser_rx,

input   [3:0] reg_div_we,
input  [31:0] reg_div_di,
output [31:0] reg_div_do,

input         reg_dat_we,
input         reg_dat_re,
input  [31:0] reg_dat_di,
output [31:0] reg_dat_do,
output        reg_dat_wait

BOTTOM:

reg_dat_di [31:0]
reg_dat_wait
reg_dat_we
reg_dat_re
clk
resetn

RIGHT:

reg_dat_do [31:0]

TOP:

reg_div_do [31:0]

LEFT:

ser_tx
ser_rx
reg_div_we [3:0]
reg_div_di [31:0]

And again, don’t forget to add the power pins. The first error is in the Linter step:

I checked the log file and found that I had given the wrong name for the module:

So, I changed the module name in config.tcl and reran.

set ::env(DESIGN_NAME) "simpleuart"

I got the second error in the IO Placement step, about pins that did not match:

I added reg_div_we to the pin_order.cfg file and reran.

This time the flow finished successfully. Here is the GDS output for simpleuart:

For picosoc_mem we have these pins:

input clk,
input [3:0] wen,
input [21:0] addr,
input [31:0] wdata,
output reg [31:0] rdata

There are 91 pins (1 + 4 + 22 + 32 + 32), which I distributed as follows:

BOTTOM:

rdata [31:0]

RIGHT:

wdata [31:0]

TOP:

wen [3:0]
clk

LEFT:

addr [21:0]
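
A side assignment like this can be turned into a pin order file with a few lines of Python. This is a sketch under assumptions: I am assuming the file uses #N, #S, #E and #W section markers and treats each entry as a regular expression, which is why the bus brackets are escaped with a backslash. Please check the Openlane documentation for the exact format; the function name is made up for illustration.

```python
# Side assignment for picosoc_mem, copied from the lists above: (name, width).
PLAN = {
    "S": [("rdata", 32)],            # BOTTOM
    "E": [("wdata", 32)],            # RIGHT
    "N": [("wen", 4), ("clk", 1)],   # TOP
    "W": [("addr", 22)],             # LEFT
}


def pin_order_cfg(plan):
    """Render '#<side>' headers, each followed by one pin entry per line."""
    lines = []
    for side, pins in plan.items():
        lines.append(f"#{side}")
        for name, width in pins:
            if width == 1:
                lines.append(name)
            else:
                # Bus bits get escaped brackets, e.g. wen\[3\]
                lines.extend(f"{name}\\[{bit}\\]" for bit in range(width))
        lines.append("")
    return "\n".join(lines)


cfg = pin_order_cfg(PLAN)
print(cfg)
```

Generating the file this way keeps the escaping consistent for every bus bit, and counting the emitted lines is a cheap check that no pin was left out.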

Our first error is in IO Placement:

I checked pin_order.cfg and noticed that I had forgotten to add the \ escape characters in the wen signals:

I fixed this and reran. Then we got our second error in the global placement step:

It seems a 200×200 um DIE_AREA is not enough. I let Openlane decide the area with the FP_SIZING relative option and reran. This time the flow completed successfully, but it took a long time: 35 minutes. We can get the run time info from runs/RUN*/cmds.log. Since we let Openlane define the DIE_AREA, we need to check the area of the macro. We can get this info from the DEF file, and we see that it is nearly 1000×1000 um:

DIEAREA ( 0 0 ) ( 929715 940435 ) ;
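
The DIEAREA values are in database units, not microns. Here is a small Python sketch for converting them. It assumes 1000 database units per micron (the UNITS DISTANCE MICRONS value in the DEF header), which is consistent with the "nearly 1000×1000 um" reading above; the function name is hypothetical.

```python
import re


def die_size_um(def_text, dbu_per_micron=1000):
    """Return (width, height) in microns from a DEF DIEAREA statement."""
    match = re.search(
        r"DIEAREA\s*\(\s*(-?\d+)\s+(-?\d+)\s*\)\s*\(\s*(-?\d+)\s+(-?\d+)\s*\)",
        def_text,
    )
    if match is None:
        raise ValueError("no DIEAREA statement found")
    x0, y0, x1, y1 = (int(value) for value in match.groups())
    return (x1 - x0) / dbu_per_micron, (y1 - y0) / dbu_per_micron


width, height = die_size_um("DIEAREA ( 0 0 ) ( 929715 940435 ) ;")
print(f"{width:.3f} x {height:.3f} um")
```

Under that assumption the macro is about 929.7×940.4 um.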

Here is the GDS output of the picosoc_mem macro:

The picosoc_mem macro actually defines a 1 kB memory. Since it is implemented through RTL synthesis into standard cells, it takes too much area. For SKY130 we actually have OpenRAM and DFFRAM for memory macros. OpenRAM has some issues for now, and DFFRAM usage is encouraged. You can find more info on OpenRAM and DFFRAM in their GitHub repos:

https://github.com/VLSIDA/OpenRAM

https://github.com/AUCOHL/DFFRAM
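
To get a feeling for why the synthesized memory is so large, here is a back-of-the-envelope calculation in Python. It is a rough sketch that assumes every memory bit ends up as one storage cell when the memory is synthesized from RTL, and it uses the die size read from the DEF file above with 1000 database units per micron.

```python
MEM_WORDS = 256   # parameter value used in this post
WORD_BITS = 32    # width of wdata/rdata


def memory_budget(words, word_bits, die_w_um, die_h_um):
    """Return (bits, bytes, die area per bit in um^2) for an RTL memory."""
    bits = words * word_bits
    area_per_bit = (die_w_um * die_h_um) / bits
    return bits, bits // 8, area_per_bit


bits, size_bytes, area_per_bit = memory_budget(
    MEM_WORDS, WORD_BITS, 929.715, 940.435
)
print(bits, size_bytes, round(area_per_bit, 1))
```

So the 1 kB memory needs 8192 storage bits, and on this macro each bit costs on the order of 100 um² of die area once the address decoding, read muxing and routing are included.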

Finally, let’s harden the mem_decode macro.

The mem_decode macro has 373 pins, which is very high considering the logical functionality of the module. We can place the mem* pins on the left, the iomem* signals on the right, and split the other pins between top and bottom. We need to be careful about the aspect ratio. A square shape does not seem like a good idea; if we want a rectangle, we need to change the aspect ratio configuration parameter from 1 to another value. The FP_ASPECT_RATIO parameter is calculated as (height/width), so we can start with a value of 2 or 2.5 and evaluate the result.
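
To see what a given height/width ratio means in microns, here is a small Python sketch. The 200×200 um area is only an example value chosen for illustration, and the function name is made up.

```python
import math


def core_dimensions(area_um2, aspect_ratio):
    """Return (width, height) in um for a core with height/width == aspect_ratio."""
    width = math.sqrt(area_um2 / aspect_ratio)
    height = aspect_ratio * width
    return width, height


# Same area, three different aspect ratios (example area only).
for ratio in (1, 2, 2.5):
    w, h = core_dimensions(200 * 200, ratio)
    print(f"ratio {ratio}: {w:.1f} x {h:.1f} um")
```

A ratio above 1 makes the macro taller than it is wide, which gives more length on the left and right edges for the wide mem* and iomem* buses.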

I ran the flow and got the first error in the Linter step:

I checked linter.log:

I added the “assign” keyword to line 44 and reran:

assign spimemio_cfgreg_sel = mem_valid && (mem_addr == 32'h 0200_0000);

I got new errors in the Linter step:

I noticed that I had already declared these signals as ports of the module, so I commented out the wire declarations and reran.

The flow finished, but it generated a very small macro (around 50×50 um), and I don’t want it that small. So I changed FP_SIZING and gave it a 200×600 um macro size. Here is the GDS output of the macro:

Finally, we have all the macros for our picosoc design. Now it is time to prepare the configuration file for picosoc.

First, let’s look at some examples from open-source GitHub repos.

https://github.com/efabless/openlane-ci-designs/tree/6676a20db8775e0ca9a6df099e807b4951b8da6f/manual_macro_placement_test

In this manual_macro_placement_test design, 2 spm macros are instantiated in the design.v module. For method 1 of chip integration, we need the GDS and LEF files of the hardened macros, and we define the paths of these files in the configuration script. We also need another file, macro_placement.cfg, where we define the placement of the macros in the chip floorplan.

Another example repo is the Efabless caravel_user_project:

https://github.com/efabless/caravel_user_project/tree/main/openlane/user_project_wrapper

I named our design “picosoc_method1”. For Verilog source files, we will only have picosoc.v, where we instantiate all the macros and there is no standard logic. The Openlane documentation lists the files of the hardened macros that must be given in the config file for integration:

You can find detailed information about the macro placement configuration file in the Openlane documentation:

https://github.com/The-OpenROAD-Project/OpenLane/blob/master/docs/source/reference/configuration.md#macro-placement-configuration

Here are the options for macro placement:

Here is the file structure for picosoc:

We can try the first run with this floorplan:

I will first leave 200 um from the edges and 100 um between macros. So macro.cfg will be:

cpu 1300 400 N
spimemio 1600 200 N
simpleuart 1300 200 N
mem_decode 1900 400 N
picosoc_mem 200 200 N
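
Before running the flow, a placement like this can be checked for overlaps with a short Python script. This is a sketch under assumptions: I treat the coordinates in macro.cfg as the lower-left corner of each macro, the die size is the 2300×1400 um planned for this first run, and the cpu size is not given in this post, so the 500×500 um used here is only a placeholder. The other sizes come from the hardening runs above.

```python
from itertools import combinations

DIE = (2300, 1400)  # planned DIE_AREA for the first run, in um

# instance: (x, y, width, height) in um. Origins come from macro.cfg above.
# spimemio and simpleuart were hardened at 200x200 um, mem_decode at
# 200x600 um, and picosoc_mem is rounded up from its DEF. The cpu size is
# NOT given in this post, so 500x500 is a placeholder assumption.
MACROS = {
    "cpu": (1300, 400, 500, 500),
    "spimemio": (1600, 200, 200, 200),
    "simpleuart": (1300, 200, 200, 200),
    "mem_decode": (1900, 400, 200, 600),
    "picosoc_mem": (200, 200, 930, 941),
}


def overlaps(a, b, spacing=0):
    """True if rectangles a and b are closer than `spacing` in both axes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (ax < bx + bw + spacing and bx < ax + aw + spacing and
            ay < by + bh + spacing and by < ay + ah + spacing)


def check(macros, die, spacing=0):
    """Return a list of human-readable placement problems (empty if none)."""
    problems = []
    for name, (x, y, w, h) in macros.items():
        if x < 0 or y < 0 or x + w > die[0] or y + h > die[1]:
            problems.append(f"{name} leaves the die")
    for (name_a, a), (name_b, b) in combinations(macros.items(), 2):
        if overlaps(a, b, spacing):
            problems.append(f"{name_a} overlaps {name_b}")
    return problems


print(check(MACROS, DIE) or "no overlaps found")
```

The result depends entirely on the sizes you enter, so replace the placeholder cpu size with the real one from its DEF or LEF file before trusting it. Passing a non-zero spacing also flags macros that abut or sit too close to leave room for routing.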

I will set DIE_AREA to 2300×1400 um.

After preparing pin_placement.cfg, I ran the flow. The first error is in Linting.

The irq concatenation was missing its curly braces, so I corrected it and reran:

.irq ({24'h000000, irq_7, irq_6, irq_5, 5'b00000})

Then I got a Linting error complaining about the parameters of the picorv32 macro. I realized that I had removed them while hardening the picorv32 macro, so I just deleted the parameter passing.

After removing the parameters, I passed the linting step, then got an error in the Synthesis step. There are warnings in synthesis and 2 of them are important. I realized that I had forgotten to add spimemio_cfgreg_sel and simpleuart_reg_dat_sel to the ports of mem_decode, even though I had added their assignment logic. I also needed to remove the reg declaration for mem_ready. So I need to add these ports and then reharden mem_decode first.

I corrected mem_decode and reran. This time I got an error that I had not seen before:

I checked synthesis log file and found this:

Well, mem_decode, picorv32, picosoc_mem, simpleuart and spimemio are OK, but what are these $ge, $logic_and, $logic_not, $lt and $mux? They look like logic cells, but I thought I had only instantiated modules. Let’s look more closely at picosoc.v. Haaaaah! I see some bad mistakes!

.valid (mem_valid && mem_addr >= 4*MEM_WORDS && mem_addr < 32'h 0200_0000),
.cfgreg_we(spimemio_cfgreg_sel ? mem_wstrb : 4'b 0000),
.reg_div_we (simpleuart_reg_div_sel ? mem_wstrb : 4'b 0000),
.reg_dat_we (simpleuart_reg_dat_sel ? mem_wstrb[0] : 1'b 0),
.reg_dat_re (simpleuart_reg_dat_sel && !mem_wstrb),
.wen((mem_valid && !mem_ready && mem_addr < 4*MEM_WORDS) ? mem_wstrb : 4'b0),

Well, these port connections contain logic! I did not notice them until I got this error. So, how do we resolve this? The affected pins are:

spimemio: valid (input)
spimemio: cfgreg_we [3:0] (input)
simpleuart: reg_div_we [3:0] (input)
simpleuart: reg_dat_we (input)
simpleuart: reg_dat_re (input)
picosoc_mem: wen [3:0] (input)

I think I can move this logic into mem_decode and generate outputs that drive these input signals of the spimemio, simpleuart and picosoc_mem modules. Let’s check the signals needed to generate this logic:

mem_valid, mem_ready, mem_addr, mem_wstrb, spimemio_cfgreg_sel, simpleuart_reg_div_sel, simpleuart_reg_dat_sel

These are all I/Os of mem_decode. So modifying only the mem_decode macro seems enough to handle this problem.

I will add extra output ports for these 6 signals in mem_decode:

extra_spimemio_valid
extra_spimemio_cfgreg_we [3:0]
extra_simpleuart_reg_div_we [3:0]
extra_simpleuart_reg_dat_we
extra_simpleuart_reg_dat_re
extra_picosoc_mem_wen [3:0]

I added these logic assignments to mem_decode:

assign extra_spimemio_valid = mem_valid && mem_addr >= 4*MEM_WORDS && mem_addr < 32'h 0200_0000;
assign extra_spimemio_cfgreg_we = spimemio_cfgreg_sel ? mem_wstrb : 4'b 0000;
assign extra_simpleuart_reg_div_we = simpleuart_reg_div_sel_int ? mem_wstrb : 4'b 0000;
assign extra_simpleuart_reg_dat_we = simpleuart_reg_dat_sel_int ? mem_wstrb[0] : 1'b0;
assign extra_simpleuart_reg_dat_re = simpleuart_reg_dat_sel_int && !mem_wstrb;
assign extra_picosoc_mem_wen = (mem_valid && !mem_ready_int && mem_addr < 4*MEM_WORDS) ? mem_wstrb : 4'b 0000;
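
When moving expressions like these by hand, it helps to have a small reference model to compare against. Here is a Python sketch that mirrors the original port expressions from picosoc.v (the ones listed before the move). The function and key names are mine, and MEM_WORDS uses the value 256 from this post. Note that wen is a 4-bit byte enable, so the mem_wstrb selection has to be kept.

```python
MEM_WORDS = 256  # same value as the parameter used in this post


def mem_decode_enables(mem_valid, mem_ready, mem_addr, mem_wstrb,
                       cfgreg_sel, reg_div_sel, reg_dat_sel):
    """Python mirror of the port expressions found in picosoc.v."""
    in_ram = mem_addr < 4 * MEM_WORDS
    return {
        "spimemio_valid":
            int(bool(mem_valid and not in_ram and mem_addr < 0x0200_0000)),
        "spimemio_cfgreg_we": mem_wstrb if cfgreg_sel else 0,
        "simpleuart_reg_div_we": mem_wstrb if reg_div_sel else 0,
        "simpleuart_reg_dat_we": (mem_wstrb & 1) if reg_dat_sel else 0,
        "simpleuart_reg_dat_re": int(bool(reg_dat_sel and mem_wstrb == 0)),
        # 4-bit byte enable: strobes pass through only for a pending RAM access
        "picosoc_mem_wen":
            mem_wstrb if (mem_valid and not mem_ready and in_ram) else 0,
    }


# A full-word store to RAM, then a load from the same address.
store = mem_decode_enables(1, 0, 0x40, 0b1111, 0, 0, 0)
load = mem_decode_enables(1, 0, 0x40, 0b0000, 0, 0, 0)
print(bin(store["picosoc_mem_wen"]), bin(load["picosoc_mem_wen"]))
```

A store must forward all four strobes to the memory and a load must not write anything, which is a quick way to confirm that the 4-bit wen output was not accidentally reduced to a single-bit condition.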

Then I replaced these expressions in picosoc.v with the new output signals of mem_decode. I also need to reharden mem_decode, and must not forget to modify the pin configuration file.

I got an error in STA:

I realized that I had copied the RTL files instead of the gate-level files as the Verilog black-box files. I copied the gate-level Verilog files from results/final/verilog/gl/#.v and reran.

Then I got an error in manual macro placement step:

I realized that in the macro.cfg file I had written the module name instead of the instance name:

So, I changed the name in macro.cfg from picosoc_mem to memory and reran. Then I got an error in the PDN step:

I realized that in the macro hooks in config.tcl I had also forgotten to change the name from picosoc_mem to memory.

Then I got an error in detailed routing step:

It seems we can try again after increasing the area and modifying the macro positions. But first, let’s check the floorplan with the openroad GUI and see if everything is OK:

> make mount
> openroad -gui

Oohhhhhh, I made a mistake in the cpu and mem_decode positions :l I updated macro.cfg as:

cpu 1400 600 N
spimemio 1800 200 N
simpleuart 1400 200 N
mem_decode 2200 600 N
memory 200 200 N

This was the routing congestion view:

Hope this time it works. Rerun:

I forgot to change DIE_AREA in config.tcl :l

I changed it to 2600×1500 um and reran:

This time I got an error in LVS. I think it could be due to power connection issues of the macros:

In the lef.lvs.log file, I found that there are clock buffers, which we can’t have, since we do not enable power rails for standard logic cells. I had forgotten to disable CTS in config.tcl, so I disabled it and reran.

Still got LVS errors:

Then I realized I don’t have power connections in picosoc.v :l

I added power connections in picosoc.v and reran. Again an error in LVS. Oohhhh, I forgot to connect the power pins of the macros in picosoc.v :l I added them and reran. A hint from me: if you get an LVS error, first check the power connections!

I still have LVS errors. So let’s check lvs.log:

This conb standard logic cell is unexpected. I need to check the netlist. I also have a net mismatch: the layout is missing 4 nets.

I checked the synthesis result and became suspicious of the constant assignments:

I think I can’t make these zero assignments on irq. Possibly, I need to add an output to the mem_decode macro and connect it to the cpu macro. But let me check the power connections as well. I checked the netlist after routing and saw that the constant assignment indeed causes issues:

So, this logic cell could cause problems in LVS. There were 4 net mismatches, possibly the power connections of the sky130_fd_sc_hd__conb_1 cell, since we don’t enable power rails. OK, let’s add irq as an output of mem_decode and connect it that way.

I added these lines to mem_decode:

And finally, the green flow complete 🙂

Let’s see some views, first from openroad with the ODB and then from klayout with the GDS.

Without pins and power/gnd straps the view is:

Klayout GDS view:

We can see the met5 (gold) horizontal and met4 (dark blue) vertical power straps for VDD and GND. There are empty regions inside the chip, since this is method 1, where we do not insert any standard cells in the top-level design, just the hardened macros and the nets connecting them. In the next post, if I have the energy, free time and motivation, I plan to design this SoC with only the CPU hardened and the other modules as synthesized logic cells, which is method 3.

I think I need to stop here for this tutorial on creating a RISC-V based SoC with open-source IC design tools, namely Openlane. It took hours to get these good-looking views 🙂 By the way, this design is far from a final, optimized version. The difficulty starts when you want to optimize your design for area, speed or power. For example, can we fit this design into a smaller area? From the results, it seems so. What about the maximum frequency with this 130 nm technology? This result is just the visible part of the iceberg. Also, we haven’t verified this design yet! Normally, verification takes more time and engineering resources than design. So be aware that this is just a simple example to show how to build an SoC with Openlane. I tried to show each and every error that I encountered during this design flow, so you can see how much time and how many iterations it took to get a successful result.

You can find all the files used in this tutorial in my github repo:

https://github.com/mbaykenar/openlane-designs

Regards,

Mehmet Burak AYKENAR

You can connect with me via LinkedIn, just send me an invitation:

https://tr.linkedin.com/in/mehmet-burak-aykenar-73326419a
