INTEGER DIVISION in FPGAs with VHDL APPROACH

If you google “addition in vhdl”, “subtraction in vhdl”, or “multiply in vhdl” (or “… in fpga”), you get tons of results, tutorials, and example code, sometimes with detailed utilization and timing analysis. But have you ever searched for “division in fpga/vhdl/verilog”?

Well, I did! First you need to choose what kind of data you are going to divide: integers or floating-point numbers. Floating-point addition, subtraction, multiplication, and division require detailed analysis and elaboration, so in this post I am only aiming at integer types. If you want floating-point arithmetic in FPGAs, I can point you to the Xilinx Floating-Point IP (https://www.xilinx.com/products/intellectual-property/floating_pt.html, PG060) or to Jidan Al-Eryani’s open-source FPU on opencores.org (https://opencores.org/projects/fpu100).

I don’t want to go into a deep analysis of integer addition, subtraction, or multiplication. As I mentioned, you can find tons of example code and algorithms for those. Integer division in an FPGA is not impossible, but most advice says you should avoid the division operation in your designs if possible. You can find LUT-based division approaches for FPGAs, such as the one described here (https://surf-vhdl.com/how-to-implement-division-in-vhdl/). There is also a detailed VHDL integer division implementation using the Newton-Raphson method by Jari Honkanen (https://hardwaredescriptions.com/category/vhdl-integer-arithmetic/division/).

In this post, I will consider two cases: 1) a variable number divided by a constant, and 2) a variable number divided by another variable number. Of course, I won’t pick a constant divisor such as 2, 4, or 8, because in that case division is just a shift operation. I will use two 8-bit numbers for the dividend and divisor and an 8-bit quotient. I will use the ‘/’ operator, let the Vivado synthesizer choose whatever logic or algorithm it wants, and analyze the results. I will also use the Xilinx Divider Generator IP (https://www.xilinx.com/products/intellectual-property/divider.html#documentation, PG151) to compare the results. I use Vivado 2020.1 for synthesis and implementation, and the switches and LEDs of the NEXYS4 DDR board for the constraints.
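To illustrate why power-of-two divisors are not interesting here, the following is a minimal sketch of an unsigned divide-by-4 written as a right shift. The entity name divide_by_pow2 and the generic shift_amount are made up for this example and are not part of the designs in this post; it uses shift_right from IEEE.NUMERIC_STD.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical example entity: divides an unsigned 8-bit value by 2**shift_amount.
entity divide_by_pow2 is
    generic (
        shift_amount : natural := 2   -- 2 means divide by 4
    );
    port (
        dividend : in  STD_LOGIC_VECTOR (7 downto 0);
        quotient : out STD_LOGIC_VECTOR (7 downto 0)
    );
end divide_by_pow2;

architecture Behavioral of divide_by_pow2 is
begin

    -- A logical right shift by a constant is only wiring: the upper bits of
    -- the dividend are routed to the lower bits of the quotient and the
    -- vacated upper bits are filled with zeros.
    quotient <= std_logic_vector(shift_right(unsigned(dividend), shift_amount));

end Behavioral;
```

Since the shift amount is a constant, no arithmetic logic should be needed for this, which is why the rest of the post uses a divisor like 5 instead.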

Case 1 – Divide by constant: First I simply wrote the VHDL code below and tried to synthesize it:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity divide_by_constant is
generic (
divisor		: integer := 5
);
port ( 
dividend	: in STD_LOGIC_VECTOR (7 downto 0);
quotient	: out STD_LOGIC_VECTOR (7 downto 0)
);
end divide_by_constant;

architecture Behavioral of divide_by_constant is

begin

quotient	<= dividend / divisor;

end Behavioral;

Vivado gives the error “[Synth 8-944] 0 definitions of operator “/” match here”. I have seen this error many times, and it usually reminds me that I forgot the STD_LOGIC_ARITH package. So I added both the STD_LOGIC_ARITH and STD_LOGIC_UNSIGNED packages and re-ran synthesis. However, I got the same error. I was curious and opened the std_logic_arith.vhd file to see whether a division operator is defined for the std_logic_vector data type, and found that there is no function definition for the “/” operator. There are definitions for “+”, “-”, “*”, “<=”, “>=”, “/=”, “CONV_INTEGER” and “CONV_STD_LOGIC_VECTOR”, but not for “/”. Then I tried converting the input to an integer, dividing, and converting the result back to std_logic_vector. The code looks like this:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity divide_by_constant is
generic (
divisor		: integer := 5
);
port ( 
dividend	: in STD_LOGIC_VECTOR (7 downto 0);
quotient	: out STD_LOGIC_VECTOR (7 downto 0)
);
end divide_by_constant;

architecture Behavioral of divide_by_constant is

begin

quotient	<= CONV_STD_LOGIC_VECTOR( (CONV_INTEGER(dividend) / divisor), 8);

end Behavioral;

This time Vivado synthesizes the design with no errors or warnings. I opened the synthesis report; the utilization was 9 LUTs. The cell usage statistics were:

Report Cell Usage:
+------+------+------+
|      |Cell  |Count |
+------+------+------+
|1     |LUT3  |     1|
|2     |LUT4  |     1|
|3     |LUT5  |     2|
|4     |LUT6  |     6|
|5     |MUXF7 |     1|
|6     |IBUF  |     8|
|7     |OBUF  |     8|
+------+------+------+

I assigned the inputs to the first 8 switches and the outputs to the first 8 LEDs, and then ran implementation. To see the total combinational delay, I opened “Report Timing Summary”. Because there is no clock and no flip-flop is inferred as source or destination, the way to see the combinational path is to go to “Unconstrained Paths” -> NONE to NONE -> Setup. The worst path was from dividend[6] to quotient[0], with Total Delay: 9.554 ns, Logic Delay: 3.840 ns and Net Delay: 5.713 ns.

The schematic view shows that there is a LUT6 and a LUT5 in the path, along with the I/O buffers.
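As a side note on the missing “/” operator: the standard IEEE.NUMERIC_STD package does define “/” for the unsigned type, so the same divider can be written without the CONV_ functions. The sketch below is an alternative to the code above, not the version that was synthesized for the numbers in this post; the entity name divide_by_constant_ns is made up, and I have not compared its utilization against the 9-LUT result.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical variant of divide_by_constant that relies on numeric_std only.
entity divide_by_constant_ns is
    generic (
        divisor : positive := 5   -- positive excludes 0, so no divide-by-zero
    );
    port (
        dividend : in  STD_LOGIC_VECTOR (7 downto 0);
        quotient : out STD_LOGIC_VECTOR (7 downto 0)
    );
end divide_by_constant_ns;

architecture Behavioral of divide_by_constant_ns is
begin

    -- numeric_std provides "/" for (unsigned, natural); the result has the
    -- same width as the unsigned operand, so it fits the 8-bit output.
    quotient <= std_logic_vector(unsigned(dividend) / divisor);

end Behavioral;
```

Declaring the generic as positive instead of integer is a design choice of this sketch: it makes a zero or negative divisor an elaboration-time error rather than something to debug later.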

Now let’s look at Xilinx’s “Divider Generator” IP.

The IP description says that the LUT-Mult algorithm is suitable for very small operands, the Radix-2 algorithm is suitable for small to medium operands, and the High Radix algorithm is based on XtremeDSP slices and so is well suited to larger operands (that is, above about 16 bits wide). From this description, I assume LUT-Mult is fine for my 8-bit division. In the IP configuration window, I chose the options shown below:

I rewrote the code to instantiate the Divider Generator IP:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity divide_by_constant is
generic (
divisor		: integer := 5
);
port ( 
clk			: in std_logic;
dividend	: in STD_LOGIC_VECTOR (7 downto 0);
quotient	: out STD_LOGIC_VECTOR (7 downto 0)
);
end divide_by_constant;

architecture Behavioral of divide_by_constant is

component div_gen_0 IS
PORT (
aclk 					: IN STD_LOGIC;
s_axis_divisor_tvalid 	: IN STD_LOGIC;
s_axis_divisor_tdata 	: IN STD_LOGIC_VECTOR(7 DOWNTO 0);
s_axis_dividend_tvalid 	: IN STD_LOGIC;
s_axis_dividend_tdata 	: IN STD_LOGIC_VECTOR(7 DOWNTO 0);
m_axis_dout_tvalid 	: OUT STD_LOGIC;
m_axis_dout_tuser 	: OUT STD_LOGIC_VECTOR(0 DOWNTO 0);
m_axis_dout_tdata 	: OUT STD_LOGIC_VECTOR(15 DOWNTO 0)
);
END component;

signal tvalid		: std_logic := '0';
signal div_by_zero	: std_logic_vector (0 downto 0) := (others => '0');
signal result		: STD_LOGIC_VECTOR(15 DOWNTO 0) := (others => '0');

begin

div_ip : div_gen_0
PORT MAP (
aclk 			=> clk,
s_axis_divisor_tvalid 	=> '1',
s_axis_divisor_tdata 	=> CONV_STD_LOGIC_VECTOR(divisor, 8),
s_axis_dividend_tvalid 	=> '1',
s_axis_dividend_tdata 	=> dividend,
m_axis_dout_tvalid 	=> tvalid,
m_axis_dout_tuser 	=> div_by_zero,
m_axis_dout_tdata 	=> result
);

quotient	<= result(15 downto 8);

end Behavioral;

I also added a clock constraint for this code. The utilization results were very different:

It used 1 DSP, 0.5 BRAM, and 9 FFs. For the first version of the code, one could try adding the USE_DSP attribute to steer the synthesizer toward DSP slices instead of LUTs.
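For a constant divisor there is also a well-known third option that neither of the two versions above spells out: multiply by a scaled reciprocal and shift. The sketch below is only an illustration under stated assumptions. It is fixed to a divisor of 5 and 8-bit unsigned inputs, the entity name divide_by_5_recip and the constant RECIP are made up, and I am not claiming this is what Vivado or the Divider Generator does internally. The constant 205 is 1024/5 rounded up; for x = 5k + r the product is 1024k + (k + 205r), and since k + 205r is at most 871 for 8-bit inputs, dropping the low 10 bits gives exactly k.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical example: unsigned 8-bit divide-by-5 as multiply-and-shift.
entity divide_by_5_recip is
    port (
        dividend : in  STD_LOGIC_VECTOR (7 downto 0);
        quotient : out STD_LOGIC_VECTOR (7 downto 0)
    );
end divide_by_5_recip;

architecture Behavioral of divide_by_5_recip is

    -- 205 = ceil(2**10 / 5); valid for this divisor and width only.
    constant RECIP : unsigned(7 downto 0) := to_unsigned(205, 8);

    signal product : unsigned(15 downto 0);

begin

    -- 8-bit x 8-bit unsigned multiply gives a 16-bit product.
    product <= unsigned(dividend) * RECIP;

    -- Dropping the low 10 bits divides by 1024; the remaining 6 bits are
    -- zero-extended to the 8-bit output width.
    quotient <= std_logic_vector(resize(product(15 downto 10), 8));

end Behavioral;
```

A different divisor or operand width needs its own constant and shift, and the exactness argument has to be rechecked for that pair.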

I don’t have time for a deeper utilization or timing analysis of these two methods for the first case. I just wanted to show two different ways to implement integer division with a constant divisor. Now it is time for the second case.

Case 2 – Division of two numbers: For the first method, which uses the “/” operator, I just edited the divide_by_constant.vhd file, added another 8-bit input for the divisor, and changed the one assignment line:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity divide_by_number is
port ( 
dividend	: in STD_LOGIC_VECTOR (7 downto 0);
divisor		: in STD_LOGIC_VECTOR (7 downto 0);
quotient	: out STD_LOGIC_VECTOR (7 downto 0)
);
end divide_by_number;

architecture Behavioral of divide_by_number is

begin

quotient <= CONV_STD_LOGIC_VECTOR( (CONV_INTEGER(dividend) / CONV_INTEGER(divisor)), 8);

end Behavioral;

I modified the constraints and used the next 8 switches for the divisor. The utilization report is shown below:

Well, 69 LUTs is not a big problem. However, the timing analysis is a disaster. Let’s first look at the schematic:

You can see the chain of combinational elements; I didn’t even count the combinational depth. The timing analysis window gives the logic and routing delay as:

So, using the “/” operator seems fine for utilization, but you will possibly get timing errors. Let’s look at the IP use case. The code is below:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity divide_by_number is
port ( 
clk		: in std_logic;
dividend	: in STD_LOGIC_VECTOR (7 downto 0);
divisor		: in STD_LOGIC_VECTOR (7 downto 0);
quotient	: out STD_LOGIC_VECTOR (7 downto 0)
);
end divide_by_number;

architecture Behavioral of divide_by_number is

component div_gen_0 IS
PORT (
aclk 			: IN STD_LOGIC;
s_axis_divisor_tvalid 	: IN STD_LOGIC;
s_axis_divisor_tdata 	: IN STD_LOGIC_VECTOR(7 DOWNTO 0);
s_axis_dividend_tvalid 	: IN STD_LOGIC;
s_axis_dividend_tdata 	: IN STD_LOGIC_VECTOR(7 DOWNTO 0);
m_axis_dout_tvalid 	: OUT STD_LOGIC;
m_axis_dout_tuser 	: OUT STD_LOGIC_VECTOR(0 DOWNTO 0);
m_axis_dout_tdata 	: OUT STD_LOGIC_VECTOR(15 DOWNTO 0)
);
END component;

signal tvalid		: std_logic := '0';
signal div_by_zero	: std_logic_vector (0 downto 0) := (others => '0');
signal result		: STD_LOGIC_VECTOR(15 DOWNTO 0) := (others => '0');

begin

div_ip : div_gen_0
PORT MAP (
aclk 			=> clk,
s_axis_divisor_tvalid 	=> '1',
s_axis_divisor_tdata 	=> divisor,
s_axis_dividend_tvalid 	=> '1',
s_axis_dividend_tdata 	=> dividend,
m_axis_dout_tvalid 	=> tvalid,
m_axis_dout_tuser 	=> div_by_zero,
m_axis_dout_tdata 	=> result
);

quotient	<= result(15 downto 8);

end Behavioral;

The utilization is the same as in case 1.

By the way, for some reason Vivado does not show the utilization after synthesis; I need to implement the design to see the utilization values. It could be a bug, or it could be because of the IP. I am not sure.
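If the IP is not an option, another way to avoid the long combinational path of the “/” operator is a hand-written sequential divider that produces one quotient bit per clock. The following is a minimal sketch of the classic shift-and-subtract (restoring) scheme for 8-bit unsigned operands. It is not the author's design and not a description of the Divider Generator internals; the entity name divide_sequential and the start/done handshake are assumptions made for this example, and no utilization or timing numbers were collected for it.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical example: 8-bit unsigned restoring divider, 8 clocks per result.
entity divide_sequential is
    port (
        clk       : in  std_logic;
        start     : in  std_logic;                      -- one-clock pulse, sampled when idle
        dividend  : in  STD_LOGIC_VECTOR (7 downto 0);
        divisor   : in  STD_LOGIC_VECTOR (7 downto 0);  -- assumed nonzero by the caller
        quotient  : out STD_LOGIC_VECTOR (7 downto 0);
        remainder : out STD_LOGIC_VECTOR (7 downto 0);
        done      : out std_logic                       -- one-clock pulse when outputs are valid
    );
end divide_sequential;

architecture rtl of divide_sequential is

    signal rem_reg : unsigned(7 downto 0) := (others => '0');  -- partial remainder
    signal quo_reg : unsigned(7 downto 0) := (others => '0');  -- dividend in, quotient out
    signal den_reg : unsigned(7 downto 0) := (others => '0');  -- latched divisor
    signal count   : integer range 0 to 8 := 0;                -- remaining iterations, 0 = idle

begin

    process (clk)
        variable trial : unsigned(8 downto 0);
    begin
        if rising_edge(clk) then
            done <= '0';
            if count = 0 then
                -- Idle: latch the operands when start is asserted.
                if start = '1' then
                    rem_reg <= (others => '0');
                    quo_reg <= unsigned(dividend);
                    den_reg <= unsigned(divisor);
                    count   <= 8;
                end if;
            else
                -- Shift the next dividend bit (MSB first) into the remainder.
                trial := rem_reg & quo_reg(7);
                if trial >= den_reg then
                    -- Subtraction succeeds: keep the difference, quotient bit is 1.
                    rem_reg <= resize(trial - den_reg, 8);
                    quo_reg <= quo_reg(6 downto 0) & '1';
                else
                    -- Subtraction would go negative: keep the remainder, quotient bit is 0.
                    rem_reg <= trial(7 downto 0);
                    quo_reg <= quo_reg(6 downto 0) & '0';
                end if;
                if count = 1 then
                    done <= '1';
                end if;
                count <= count - 1;
            end if;
        end if;
    end process;

    quotient  <= std_logic_vector(quo_reg);
    remainder <= std_logic_vector(rem_reg);

end rtl;
```

The trade-off is latency for timing: each clock cycle contains only one 9-bit compare and subtract, but a result takes 8 cycles after start. The sketch does not flag a zero divisor, so a real design would need a check comparable to the div_by_zero output of the IP.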

I have written a simple testbench that handles a few cases to check that the module works properly. The waveform looks fine.
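The testbench itself is not listed in this post. As a rough idea of what such a check could look like, here is a minimal self-checking sketch for the combinational, “/”-operator version of divide_by_number from case 2. The testbench name tb_divide_by_number and the 10 ns settle time are assumptions for this example, and it assumes the combinational version (without the clk port) is the one compiled into the work library. The divisor starts at 1 and the loops skip 0, because dividing by zero is a run-time error in simulation.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical self-checking testbench for the combinational divide_by_number.
entity tb_divide_by_number is
end tb_divide_by_number;

architecture sim of tb_divide_by_number is

    signal dividend : std_logic_vector(7 downto 0) := (others => '0');
    signal divisor  : std_logic_vector(7 downto 0) := x"01";  -- never drive 0
    signal quotient : std_logic_vector(7 downto 0);

begin

    dut : entity work.divide_by_number
        port map (
            dividend => dividend,
            divisor  => divisor,
            quotient => quotient
        );

    stimulus : process
    begin
        -- Exhaustive check over all dividends and all nonzero divisors.
        for a in 0 to 255 loop
            for b in 1 to 255 loop
                dividend <= std_logic_vector(to_unsigned(a, 8));
                divisor  <= std_logic_vector(to_unsigned(b, 8));
                wait for 10 ns;  -- let the combinational output settle
                assert to_integer(unsigned(quotient)) = a / b
                    report "mismatch for " & integer'image(a) & " / " & integer'image(b)
                    severity error;
            end loop;
        end loop;
        report "all cases checked";
        wait;  -- stop the stimulus process
    end process;

end sim;
```

For the IP-based version, the same idea would need a clock generator and would have to wait for m_axis_dout_tvalid before comparing, which this sketch does not cover.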

Well, this is the first post on the “FPGA DESIGN” page of my website. I hope I can find time to write more posts like this one. I hope someone will benefit from the information in this post and save some design time on integer division.

Regards,

Mehmet Burak AYKENAR

You can connect with me via LinkedIn: just send me an invitation.

https://tr.linkedin.com/in/mehmet-burak-aykenar-73326419a
