Mastering Verilog: Thinking in Hardware

Generated by AI | 2026.05

Question: How to study Verilog? What open-source projects and runnable code examples exist?

Answer:

Verilog’s syntax can be learned in a few days. The hard part is thinking in hardware: everything runs in parallel, and time is explicit. Here’s a fast path.

Mental Model Shift First

Software mindset:          Hardware mindset:
─────────────────          ────────────────
Sequential execution       Everything runs simultaneously
Functions call each other  Modules exist in parallel always
Variables store values     Wires carry signals RIGHT NOW
Time is implicit           Time is clock edges, explicit

This is the #1 stumbling block. A Verilog module isn’t a function — it’s a physical circuit that is always on.
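To make the "always on" idea concrete, here is a minimal sketch in which nothing ever calls the gate, yet its output tracks the inputs whenever they change. The module names `and_gate` and `always_on_tb` are made up for illustration.

```verilog
// A continuous assignment: y is driven from a and b at all times.
module and_gate(input a, b, output y);
    assign y = a & b;
endmodule

// Testbench: only the inputs are changed; y is never assigned here.
module always_on_tb;
    reg  a, b;
    wire y;

    and_gate g(.a(a), .b(b), .y(y));

    initial begin
        // $monitor prints a line whenever any listed signal changes
        $monitor("t=%0t a=%b b=%b y=%b", $time, a, b, y);
        a = 0; b = 0;
        #10 a = 1;
        #10 b = 1;   // y rises here without any "call" to and_gate
        #10 a = 0;   // y falls again
        #10 $finish;
    end
endmodule
```

Saved as one file, this should compile and run with `iverilog -o always_on always_on.v && ./always_on`.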

Verilog in 50 Lines — Core Syntax

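// --- WIRES vs REGS ---
wire a;          // net: continuously driven by an assign or a module/gate output
reg  b;          // variable: assigned inside always/initial blocks; it becomes a
                 // flip-flop only when the block is clocked (a reg assigned in
                 // an always @(*) block is still combinational logic)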

// --- MODULES (the basic unit, like a class but hardware) ---
module adder #(parameter WIDTH=8) (
    input  [WIDTH-1:0] a,
    input  [WIDTH-1:0] b,
    output [WIDTH-1:0] sum,
    output             carry
);
    assign {carry, sum} = a + b;  // combinational, always active
endmodule

// --- ALWAYS BLOCKS ---
// Combinational (no clock):
always @(*) begin
    case (sel)
        2'b00: out = a;
        2'b01: out = b;
        default: out = 0;
    endcase
end

// Sequential (clocked flip-flop):
always @(posedge clk or posedge rst) begin
    if (rst)
        count <= 8'h00;    // non-blocking assignment <=
    else
        count <= count + 1;
end

// --- BLOCKING vs NON-BLOCKING (critical distinction) ---
// = blocking:     executes sequentially within always block (use in comb)
// <= non-blocking: all RHS evaluated first, then assigned (use in sequential)

// --- INSTANTIATION ---
adder #(.WIDTH(16)) my_adder (
    .a(x), .b(y), .sum(result), .carry(c_out)
);
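The blocking vs non-blocking distinction in the listing above is easiest to see in a register swap. The sketch below is illustrative only: the module names `swap_nb`, `swap_blk`, and `swap_tb` and the load values are assumptions, and the blocking version is written that way on purpose to show the failure.

```verilog
// Non-blocking: both right-hand sides are sampled before either register
// updates, so x and y exchange values on each clock edge.
module swap_nb(input clk, load, input [3:0] a0, b0, output reg [3:0] x, y);
    always @(posedge clk) begin
        if (load) begin
            x <= a0;
            y <= b0;
        end else begin
            x <= y;
            y <= x;   // reads the OLD x
        end
    end
endmodule

// Blocking: statements take effect in order, so y receives the value
// x was just given and the swap is lost.
module swap_blk(input clk, load, input [3:0] a0, b0, output reg [3:0] x, y);
    always @(posedge clk) begin
        if (load) begin
            x = a0;
            y = b0;
        end else begin
            x = y;
            y = x;    // reads the NEW x
        end
    end
endmodule

module swap_tb;
    reg clk = 0, load = 1;
    wire [3:0] xn, yn, xb, yb;

    swap_nb  u_nb (.clk(clk), .load(load), .a0(4'd1), .b0(4'd2), .x(xn), .y(yn));
    swap_blk u_blk(.clk(clk), .load(load), .a0(4'd1), .b0(4'd2), .x(xb), .y(yb));

    always #5 clk = ~clk;

    initial begin
        @(posedge clk); #1 load = 0;   // first edge loads x=1, y=2
        @(posedge clk); #1;            // second edge attempts the swap
        $display("non-blocking: x=%0d y=%0d", xn, yn);  // x=2 y=1
        $display("blocking:     x=%0d y=%0d", xb, yb);  // x=2 y=2
        $finish;
    end
endmodule
```

The non-blocking version describes two flip-flops feeding each other, which is what a swap needs in hardware. The blocking version collapses into both registers loading the old `y`.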

Simulation Tools — Run Verilog Now

Option 1: Icarus Verilog + GTKWave (local, free)

brew install icarus-verilog gtkwave
# or
sudo apt install iverilog gtkwave

Write a module + testbench, simulate:

// counter.v
module counter(
    input clk, rst,
    output reg [7:0] count
);
    always @(posedge clk or posedge rst)
        if (rst) count <= 0;
        else     count <= count + 1;
endmodule

// counter_tb.v  (testbench)
`timescale 1ns/1ps   // makes #5 mean 5 ns; without it the time unit is simulator-dependent
module counter_tb;
    reg clk, rst;
    wire [7:0] count;

    counter dut(.clk(clk), .rst(rst), .count(count));

    // generate clock
    initial clk = 0;
    always #5 clk = ~clk;  // toggle every 5 ns: 10 ns period = 100 MHz

    initial begin
        $dumpfile("counter.vcd");  // waveform output
        $dumpvars(0, counter_tb);
        rst = 1; #20;
        rst = 0; #200;
        $finish;
    end

    // monitor output
    initial $monitor("t=%0t count=%d", $time, count);
endmodule

iverilog -o sim counter.v counter_tb.v
./sim
gtkwave counter.vcd   # see the waveform
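Waveforms are good for exploring, but a testbench can also check the result itself. The sketch below is one way to do that: it carries its own copy of the counter so it compiles as a single file, compares the output against a simple reference value on every clock, and prints PASS or FAIL. The name `counter_check_tb` and the 300-cycle run length are assumptions chosen for illustration.

```verilog
`timescale 1ns/1ps

// Copy of the counter so this file is self-contained
module counter(input clk, rst, output reg [7:0] count);
    always @(posedge clk or posedge rst)
        if (rst) count <= 0;
        else     count <= count + 1;
endmodule

module counter_check_tb;
    reg clk = 0, rst = 1;
    wire [7:0] count;
    reg  [7:0] expected;   // 8-bit reference value, wraps like the DUT
    integer errors;

    counter dut(.clk(clk), .rst(rst), .count(count));

    always #5 clk = ~clk;

    initial begin
        errors   = 0;
        expected = 0;
        #12 rst = 0;                   // release reset between clock edges
        repeat (300) begin             // more than 256 cycles, so wrap-around is covered
            @(posedge clk);
            expected = expected + 1;
            #1;                        // let the non-blocking update settle
            if (count !== expected) begin
                errors = errors + 1;
                $display("MISMATCH t=%0t count=%0d expected=%0d",
                         $time, count, expected);
            end
        end
        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

Compile this file on its own (for example `iverilog -o check counter_check.v && ./check`) rather than together with counter.v, since both define a module named `counter`.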

Option 2: EDA Playground — Zero Install, Browser-Based

Go to edaplayground.com — paste Verilog, run in browser. Supports Icarus, Synopsys VCS, Cadence Xcelium. Best for quick experiments.

Option 3: Verilator — Fastest Simulator, C++ Output

brew install verilator

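# Verilator compiles Verilog → C++ → native binary
# Often far faster than Icarus on large designs (the exact speedup depends on the design)
# counter_tb.cpp is a C++ harness you write yourself; it is not the Verilog testbench above
# (Verilator 5 and later can also build a Verilog testbench directly with --binary)
verilator --cc counter.v --exe counter_tb.cpp --build
./obj_dir/Vcounter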

Many open-source hardware projects, including several RISC-V cores, use Verilator for simulation.

Structured Learning Path

Stage 1: Combinational Logic (Week 1)

Build these in order, simulate each:

// 1. Half adder
module half_adder(input a, b, output sum, carry);
    assign sum   = a ^ b;
    assign carry = a & b;
endmodule

// 2. Full adder
module full_adder(input a, b, cin, output sum, cout);
    assign {cout, sum} = a + b + cin;
endmodule

// 3. 4-to-1 MUX
module mux4(input [3:0] d, input [1:0] sel, output reg out);
    always @(*) out = d[sel];
endmodule

// 4. Priority encoder
module priority_enc(input [3:0] in, output reg [1:0] out, output valid);
    assign valid = |in;
    always @(*) casez(in)
        4'b1???: out = 2'd3;
        4'b01??: out = 2'd2;
        4'b001?: out = 2'd1;
        default: out = 2'd0;
    endcase
endmodule
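"Simulate each" is easiest with an exhaustive testbench, since these blocks have only a handful of input combinations. As a sketch, the example below builds a full adder structurally from two half adders and checks all eight input combinations against plain arithmetic. The names `full_adder_struct` and `full_adder_tb` are hypothetical.

```verilog
module half_adder(input a, b, output sum, carry);
    assign sum   = a ^ b;
    assign carry = a & b;
endmodule

// Full adder wired from two half adders and an OR gate
module full_adder_struct(input a, b, cin, output sum, cout);
    wire s1, c1, c2;
    half_adder ha1(.a(a),  .b(b),   .sum(s1),  .carry(c1));
    half_adder ha2(.a(s1), .b(cin), .sum(sum), .carry(c2));
    assign cout = c1 | c2;
endmodule

module full_adder_tb;
    reg  [2:0] v;          // {a, b, cin}
    reg  [1:0] expected;   // reference result from ordinary addition
    wire sum, cout;
    integer i, errors;

    full_adder_struct dut(.a(v[2]), .b(v[1]), .cin(v[0]),
                          .sum(sum), .cout(cout));

    initial begin
        errors = 0;
        for (i = 0; i < 8; i = i + 1) begin
            v = i;                          // low 3 bits of i
            expected = v[2] + v[1] + v[0];  // evaluated at 2 bits wide
            #1;                             // let combinational logic settle
            if ({cout, sum} !== expected) begin
                errors = errors + 1;
                $display("MISMATCH a=%b b=%b cin=%b -> cout=%b sum=%b",
                         v[2], v[1], v[0], cout, sum);
            end
        end
        if (errors == 0) $display("PASS: all 8 cases match");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

The same loop-and-compare pattern carries over to the MUX and the priority encoder; only the reference expression changes.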

Stage 2: Sequential Logic (Week 2)

// 1. D flip-flop (fundamental building block)
module dff(input clk, rst, d, output reg q);
    // if/else is the form synthesis tools recognize as an asynchronous reset
    always @(posedge clk or posedge rst)
        if (rst) q <= 1'b0;
        else     q <= d;
endmodule

// 2. Shift register
module shift_reg #(parameter N=8)(
    input clk, rst, sin,
    output reg [N-1:0] q
);
    always @(posedge clk or posedge rst)
        if (rst) q <= {N{1'b0}};
        else     q <= {q[N-2:0], sin};  // shift left, new bit enters at q[0]
endmodule

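// 3. FIFO: critical for GPU memory pipelines
// DEPTH must be a power of two so the pointers wrap around naturally.
module fifo #(parameter DEPTH=16, WIDTH=8)(
    input clk, rst, wr_en, rd_en,
    input [WIDTH-1:0] din,
    output reg [WIDTH-1:0] dout,
    output full, empty
);
    reg [WIDTH-1:0] mem [0:DEPTH-1];
    reg [$clog2(DEPTH)-1:0] wr_ptr, rd_ptr;  // index 0..DEPTH-1
    reg [$clog2(DEPTH):0]   count;           // one extra bit so it can hold DEPTH

    assign full  = (count == DEPTH);
    assign empty = (count == 0);

    wire do_wr = wr_en && !full;
    wire do_rd = rd_en && !empty;

    always @(posedge clk or posedge rst) begin
        if (rst) begin
            wr_ptr <= 0; rd_ptr <= 0; count <= 0;
        end else begin
            if (do_wr) begin
                mem[wr_ptr] <= din;
                wr_ptr <= wr_ptr + 1;
            end
            if (do_rd) begin
                dout <= mem[rd_ptr];
                rd_ptr <= rd_ptr + 1;
            end
            // Single update of count: a simultaneous read and write leaves it unchanged
            case ({do_wr, do_rd})
                2'b10:   count <= count + 1;
                2'b01:   count <= count - 1;
                default: count <= count;
            endcase
        end
    end
endmodule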

Stage 3: FSM (Week 3) — How Control Logic Works

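// Simple FSM: detect sequence "1011" (overlapping matches allowed)
module seq_detector(input clk, rst, in, output reg detected);
    // Plain Verilog state encoding; typedef enum is SystemVerilog only
    localparam [1:0] S0 = 2'd0,  // nothing matched
                     S1 = 2'd1,  // seen "1"
                     S2 = 2'd2,  // seen "10"
                     S3 = 2'd3;  // seen "101"
    reg [1:0] state, next;

    always @(posedge clk or posedge rst)
        if (rst) state <= S0;
        else     state <= next;

    always @(*) begin
        next = S0; detected = 0;
        case (state)
            S0: next = in ? S1 : S0;
            S1: next = in ? S1 : S2;
            S2: next = in ? S3 : S0;
            // "101" followed by 1 completes "1011" (Mealy output: depends on in)
            S3: begin detected = in; next = in ? S1 : S2; end
        endcase
    end
endmodule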

FSMs sit at the heart of most control logic: GPU control units, memory controllers, and cache arbiters are typically built from them. This is not academic.
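A "1011" detector can also be written Moore-style, where the output depends only on the current state and a fifth state is reached once the whole pattern has arrived. The sketch below is one such variant in plain Verilog-2001, with a testbench that shifts in the stream 1011011. The module names, the state encoding, and the test stream are assumptions for illustration.

```verilog
module seq1011_moore(input clk, rst, in, output detected);
    localparam [2:0] S0 = 3'd0,  // nothing matched
                     S1 = 3'd1,  // seen "1"
                     S2 = 3'd2,  // seen "10"
                     S3 = 3'd3,  // seen "101"
                     S4 = 3'd4;  // seen "1011"
    reg [2:0] state, next;

    assign detected = (state == S4);   // Moore output: state only

    always @(posedge clk or posedge rst)
        if (rst) state <= S0;
        else     state <= next;

    always @(*) begin
        case (state)
            S0:      next = in ? S1 : S0;
            S1:      next = in ? S1 : S2;
            S2:      next = in ? S3 : S0;
            S3:      next = in ? S4 : S2;
            S4:      next = in ? S1 : S2;  // trailing "1" can start a new match
            default: next = S0;
        endcase
    end
endmodule

module seq1011_tb;
    reg clk = 0, rst = 1, in = 0;
    wire detected;
    reg [6:0] stream;
    integer i, hits;

    seq1011_moore dut(.clk(clk), .rst(rst), .in(in), .detected(detected));

    always #5 clk = ~clk;

    initial begin
        stream = 7'b1011011;   // contains "1011" twice, overlapping
        hits   = 0;
        #12 rst = 0;
        for (i = 6; i >= 0; i = i - 1) begin
            @(negedge clk) in = stream[i];  // drive away from the active edge
            @(posedge clk); #1;             // check just after the state updates
            if (detected) begin
                hits = hits + 1;
                $display("detected after bit %0d", 7 - i);
            end
        end
        $display("hits=%0d", hits);         // hits=2 (after bits 4 and 7)
        $finish;
    end
endmodule
```

The Moore form costs one extra state and asserts `detected` one clock after the final bit arrives. In exchange, the output is registered state rather than a combinational function of the input.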

Real Open-Source Projects to Read

1. picorv32 — Minimal RISC-V CPU (~3000 lines)

The best first real CPU to read. Clean, well-commented, single file.

git clone https://github.com/YosysHQ/picorv32
wc -l picorv32.v   # ~3000 lines — read all of it

Key things to find inside:

The fetch/decode/execute FSM
The register file (32 x 32-bit registers)
How memory transactions work
The multiply/divide unit

# Simulate it running actual RISC-V code
# (building the test firmware needs a RISC-V GCC toolchain; see the repo README)
cd picorv32
make test   # runs Icarus simulation

2. VexRiscv — Pipelined RISC-V in SpinalHDL

More realistic, pipelined, plugin-based architecture:

git clone https://github.com/SpinalHDL/VexRiscv
# Written in SpinalHDL (Scala-based like Chisel)
# Generates clean Verilog
sbt "runMain vexriscv.demo.GenFull"
# outputs VexRiscv.v — read the generated Verilog

3. MIAOW — Open-Source AMD GCN GPU

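Relevant background for your MI300X interest: MIAOW implements a subset of AMD’s Southern Islands (first-generation GCN) ISA, an ancestor of the CDNA architecture used in MI300X.

git clone https://github.com/VerticalResearchGroup/miaow
# Directory names can differ between versions of the repo, so locate the RTL first:
find miaow -name "*.v" | head -50
# Look for the instruction fetch and decode stages, the ALU and
# floating-point units, the memory pipeline, and the simulation infrastructure.

Start with the fetch stage, since that is where instructions enter the pipeline.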

4. Gemmini — Systolic Array (Direct AI Chip Relevance)

git clone https://github.com/ucb-bar/gemmini
# Written in Chisel, generates Verilog
# Loosely analogous to what MI300X matrix cores do
ls gemmini/src/main/scala/gemmini/
# (inside a Chipyard checkout the same sources sit under generators/gemmini/)
# Systolic array PE, mesh, controller

5. OpenTitan — Production-Quality SystemVerilog

An open-source silicon root-of-trust project stewarded by lowRISC, with Google as a founding partner. Real-world code quality:

git clone https://github.com/lowRISC/opentitan
# 500k+ lines of SystemVerilog
# AES, SHA, RISC-V core, USB, I2C — all open
ls hw/ip/aes/rtl/   # AES hardware accelerator

Your Concrete First Week

# Day 1: Install tools
brew install icarus-verilog gtkwave verilator yosys   # yosys is needed on Day 6

# Day 2: Write + simulate counter, adder, mux
# (use the code above, add testbenches, view waveforms)

# Day 3: Build a FIFO, simulate with random write/read

# Day 4: Read picorv32.v top to bottom
git clone https://github.com/YosysHQ/picorv32
grep -n "always @" picorv32/picorv32.v  # list every always block, then look for the state machines

# Day 5: Run picorv32 simulation
cd picorv32 && make test

# Day 6: Synthesize something with Yosys
yosys -p "read_verilog counter.v; synth -top counter; stat"
# See how many gates your RTL becomes

# Day 7: Browse MIAOW GPU source
git clone https://github.com/VerticalResearchGroup/miaow

Key Mental Models to Lock In

Concept	What it maps to in software
`wire`	A live value, like a return value always being computed
`reg`	A variable assigned inside an `always` block; it holds state only when that block is clocked
`always @(*)`	A pure function, recomputed when inputs change
`always @(posedge clk)`	An event loop tick
Module instantiation	Spawning a goroutine that runs forever
Testbench	Your `main()` / test harness
`$display`	`printf`
`$finish`	`exit(0)`

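The hardest concept: in the hardware itself there is no scheduler. Every always block runs in parallel, forever, simultaneously. (A simulator does have an event scheduler, but only to mimic that parallelism.) Synthesis figures out how to wire gates and flip-flops to implement that behavior.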

References

HDLBits is especially good — it’s LeetCode for Verilog, with instant simulation feedback in the browser.
