TD4 4-Bit CPU: A Complete Deep-Dive Tutorial

Author: Yinhuan Yuan

Table of Contents

  1. Introduction
  2. Historical Context
  3. Architecture Overview
  4. Component Breakdown
  5. Instruction Set Architecture
  6. Data Path Analysis
  7. Control Logic
  8. Clock and Reset Circuitry
  9. Step-by-Step Operation
  10. Building the TD4
  11. Example Programs
  12. Extensions and Modifications

1. Introduction

What is the TD4?

The TD4 (often written as TD4-4BIT-CPU) is a minimalist 4-bit CPU designed for educational purposes. The design comes from the Japanese book "CPU no Tsukurikata" (CPUの創りかた, "How to Make a CPU") by Watanabe Tetsuya, published in 2003.

Why Study the TD4?

The TD4 is exceptional for learning because:

  • Minimal complexity: Only 10 ICs total (all 74HC series)
  • No microcode: Direct hardware instruction decoding
  • Transparent operation: Every signal can be traced and understood
  • Buildable: Can be constructed on a breadboard in a few hours
  • Complete: Despite its simplicity, it's a fully functional stored-program computer

Design Philosophy

The TD4 strips a CPU down to its absolute essentials:

  • 4-bit data bus
  • 4-bit address space (16 bytes of program memory)
  • 4 registers (2 general-purpose, 1 output, 1 program counter)
  • 16 opcode slots (4-bit opcode), 12 of which are defined instructions
  • Two working registers (A and B) feeding a single adder, plus a carry flag

2. Historical Context

The Book and Its Impact

Watanabe Tetsuya's book became a cult classic in Japan among electronics hobbyists and students. It walks readers through building a CPU from scratch using only commonly available 74HC logic ICs.

Educational Lineage

The TD4 fits into a lineage of educational CPUs:

Simple Logic Gates
   Adders/ALUs
    SAP-1 (Simple As Possible)
      TD4 ←── You are here
    HACK (Nand2Tetris)
  Real CPUs (6502, Z80, etc.)

The TD4 occupies a sweet spot: complex enough to be a real CPU, simple enough to fully understand.


3. Architecture Overview

Block Diagram

                    ┌─────────────────────────────────────────────────────────┐
                    │ TD4 CPU                                                 │
   ┌────────┐       │  ┌─────┐    ┌─────┐                                     │
   │ Clock  │───────┼─▶│ PC  │───▶│ ROM │──┬─ OPCODE[7:4] ──▶ Instruction     │
   └────────┘       │  └─────┘    └─────┘  │                   Decoder        │
                    │     ▲                │                      │           │
   ┌────────┐       │     │                │                      ▼           │
   │ Reset  │───────┼─────┤                │              Control Signals     │
   └────────┘       │     │                │                      │           │
                    │     │                └─ IM[3:0] ───────────────┐        │
                    │     │                    (Immediate)           │        │
                    │  ┌──┴──┐                                       ▼        │
                    │  │ +1  │◀── Carry ◀── ┌─────┐    ┌─────┐    ┌─────┐    │
                    │  └─────┘              │ ALU │◀───│ MUX │◀───│ A/B │    │
                    │                       │ ADD │    │ 4:1 │    │ Reg │    │
   ┌────────┐       │                       └──┬──┘    └─────┘    └─────┘    │
   │  IN    │───────┼──────────────────────────┼──────────▲                   │
   │ Port   │       │                          │          │                   │
   └────────┘       │                          ▼          │                   │
                    │                       ┌─────┐       │                   │
   ┌────────┐       │                       │ OUT │───────┘                   │
   │  OUT   │◀──────┼───────────────────────│ Reg │                           │
   │ Port   │       │                       └─────┘                           │
   └────────┘       │                                                         │
                    └─────────────────────────────────────────────────────────┘

Register Set

Register  Name             Size   Description
A         Accumulator A    4-bit  General purpose register
B         Accumulator B    4-bit  General purpose register
OUT       Output Register  4-bit  Connected to output port (directly drives LEDs)
PC        Program Counter  4-bit  Points to current instruction (0-15)
C         Carry Flag       1-bit  Set when ALU produces carry-out

Memory Map

Address    Content
0x0        Instruction 0
0x1        Instruction 1
...        ...
0xF        Instruction 15  (only 16 bytes total!)

4. Component Breakdown

Complete Parts List

Qty  Part Number  Function
4    74HC161      4-bit binary counter (program counter, plus registers A, B, and OUT)
2    74HC153      Dual 4-to-1 multiplexer
1    74HC283      4-bit binary adder
1    74HC74       Dual D flip-flop (carry flag)
1    74HC540      Octal buffer with inverted outputs
1    74HC10       Triple 3-input NAND
1    ROM          16×8 (can use DIP switches, EEPROM, or diode matrix)

4.1 74HC161 - Program Counter

The 74HC161 is a synchronous 4-bit binary counter with parallel load capability.

        ┌───────────────┐
   CLR ─┤1           16├─ VCC
   CLK ─┤2           15├─ RCO (Ripple Carry Out)
    D0 ─┤3           14├─ Q0
    D1 ─┤4           13├─ Q1
    D2 ─┤5           12├─ Q2
    D3 ─┤6           11├─ Q3
   ENP ─┤7           10├─ ENT
   GND ─┤8            9├─ LOAD
        └───────────────┘

In TD4, this IC serves as the Program Counter:

  • On each clock pulse (with ENP and ENT high), it counts up: 0→1→2→...→15→0
  • When LOAD is asserted low, it loads a new address (for JMP instruction)
  • CLR resets to 0 (system reset)

Key insight: The PC doesn't just count—it can also be loaded with an immediate value for jumps!
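
The count/load/clear behaviour described above can be sketched in a few lines of Python. This is a behavioural model only, not a timing-accurate one; the class name `Counter161` and its method names are made up for this illustration.

```python
class Counter161:
    """Behavioural sketch of a 74HC161 used as the TD4 program counter."""

    def __init__(self):
        self.q = 0  # 4-bit output Q3..Q0

    def clear(self):
        # CLR on the 74HC161 is asynchronous: Q goes to 0 without a clock edge.
        self.q = 0

    def clock(self, load_n=1, d=0, enp=1, ent=1):
        """Apply one rising clock edge and return the new count.

        load_n is active low: 0 loads d, 1 lets the counter count.
        """
        if load_n == 0:
            self.q = d & 0xF              # parallel load (the jump case)
        elif enp and ent:
            self.q = (self.q + 1) & 0xF   # count up, wrapping 15 -> 0
        return self.q


pc = Counter161()
trace = [pc.clock() for _ in range(3)]    # three normal increments
trace.append(pc.clock(load_n=0, d=0xA))   # jump: load address 10
trace.append(pc.clock())                  # counting resumes from the new address
print(trace)
```

The printed trace is `[1, 2, 3, 10, 11]`: three increments, one load, and one increment from the loaded address.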

4.2 74HC153 - Data Selector (MUX)

The 74HC153 contains two independent 4-to-1 multiplexers.

        ┌───────────────┐
   1Ea ─┤1           16├─ VCC
    S1 ─┤2           15├─ 2Ea
  1I3a ─┤3           14├─ S0
  1I2a ─┤4           13├─ 2I3a
  1I1a ─┤5           12├─ 2I2a
  1I0a ─┤6           11├─ 2I1a
   1Ya ─┤7           10├─ 2I0a
   GND ─┤8            9├─ 2Ya
        └───────────────┘

In TD4, two 74HC153s form a 4-bit wide 4:1 MUX:

Select bits S1 S0:
   00 → Select Register A
   01 → Select Register B
   10 → Select Input Port
   11 → Select 0000 (constant zero)

This MUX selects what goes to one input of the ALU.

4.3 74HC283 - 4-Bit Adder (The ALU)

The 74HC283 is a 4-bit binary full adder with fast carry.

        ┌───────────────┐
    Σ2 ─┤1           16├─ VCC
    B2 ─┤2           15├─ B3
    A2 ─┤3           14├─ A3
    Σ1 ─┤4           13├─ Σ3
    A1 ─┤5           12├─ A4
    B1 ─┤6           11├─ B4
    C0 ─┤7           10├─ Σ4
   GND ─┤8            9├─ C4 (Carry Out)
        └───────────────┘

The TD4's ALU is JUST an adder!

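This is a crucial simplification. The TD4 has no subtraction, no AND, no OR: just ADD. With only 16 bytes of program ROM and no data memory it is far from Turing complete, but the adder, the carry flag, and the conditional jump are enough to write small programs with loops and branches.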

Operation:

A inputs  ← Selected by MUX (A, B, IN, or 0)
B inputs  ← Immediate value from instruction [3:0]
Σ outputs ← A + B + Cin (Cin is tied low, so Σ = A + B)
C4        → Carry out (stored in carry flip-flop)
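
As a minimal sketch of this data path, the MUX and the adder can be modelled as two small Python functions. The function names `mux4` and `add4` are chosen for this example and do not come from the book.

```python
def mux4(sel, a, b, inp):
    """4-bit wide 4:1 multiplexer: 00 -> A, 01 -> B, 10 -> IN, 11 -> 0000."""
    return (a, b, inp, 0)[sel & 0b11] & 0xF


def add4(x, im, cin=0):
    """4-bit addition in the style of a 74HC283: returns (sum, carry_out)."""
    total = (x & 0xF) + (im & 0xF) + (cin & 1)
    return total & 0xF, total >> 4


# ADD A, 8 with A = 9: 9 + 8 = 17, so the sum wraps to 1 and carry is set.
print(add4(mux4(0b00, a=9, b=2, inp=5), 8))
# MOV A, 6: the MUX supplies 0, so the adder simply passes the immediate.
print(add4(mux4(0b11, a=9, b=2, inp=5), 6))
```

The two calls print `(1, 1)` and `(6, 0)`. The second case shows why "load immediate" needs no extra hardware: it is an addition with zero.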

4.4 74HC74 - D Flip-Flops (Registers)

The 74HC74 contains two independent D flip-flops with preset and clear.

        ┌───────────────┐
  1CLR ─┤1           14├─ VCC
   1D  ─┤2           13├─ 2CLR
  1CLK ─┤3           12├─ 2D
  1PRE ─┤4           11├─ 2CLK
   1Q  ─┤5           10├─ 2PRE
  1Q̄   ─┤6            9├─ 2Q
  GND  ─┤7            8├─ 2Q̄
        └───────────────┘

In TD4:

  • Registers A, B, and OUT are not built from 74HC74s: in the original TD4 each is a 74HC161 used as a loadable 4-bit register (the counter's built-in flip-flops hold the value)
  • One 74HC74 flip-flop stores the carry flag

4.5 74HC540 - Octal Buffer (Output Register)

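Used to drive the output LEDs and isolate the output register from the LED load. The 74HC540 inverts its inputs, so a 1 in the OUT register appears as a low level at the buffer output; wire each LED from VCC through a resistor to the buffer output if a 1 should light it.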

4.6 74HC10 - Triple 3-Input NAND (Instruction Decoder)

The instruction decoder converts the 4-bit opcode into control signals.


5. Instruction Set Architecture

Instruction Format

Each instruction is 8 bits:

  7   6   5   4   3   2   1   0
┌───┬───┬───┬───┬───┬───┬───┬───┐
│OP3│OP2│OP1│OP0│IM3│IM2│IM1│IM0│
└───┴───┴───┴───┴───┴───┴───┴───┘
│←── OPCODE ───→│←─ IMMEDIATE ─→│

  • OPCODE [7:4]: Determines the operation
  • IMMEDIATE [3:0]: 4-bit immediate value (0-15)
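
Splitting an instruction byte into its two fields is a shift and a mask. A small sketch (the function name `split_instruction` is made up here):

```python
def split_instruction(byte):
    """Return (opcode, immediate) for one 8-bit TD4 instruction."""
    opcode = (byte >> 4) & 0xF   # bits 7..4
    immediate = byte & 0xF       # bits 3..0
    return opcode, immediate


print(split_instruction(0b0011_0001))  # opcode 3, immediate 1
print(split_instruction(0xE1))         # opcode 14, immediate 1
```

In hardware no computation is needed for this step: the upper four ROM data lines simply go to the decoder and the lower four go to the adder.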

Complete Instruction Set

Binary  Hex  Mnemonic   Operation        Description
0000    0    ADD A, Im  A ← A + Im       Add immediate to A
0001    1    MOV A, B   A ← B + 0        Copy B to A
0010    2    IN A       A ← IN + 0       Read input port to A
0011    3    MOV A, Im  A ← 0 + Im       Load immediate to A
0100    4    MOV B, A   B ← A + 0        Copy A to B
0101    5    ADD B, Im  B ← B + Im       Add immediate to B
0110    6    IN B       B ← IN + 0       Read input port to B
0111    7    MOV B, Im  B ← 0 + Im       Load immediate to B
1000    8    (unused)                    Reserved
1001    9    OUT B      OUT ← B + 0      Output B to port
1010    A    (unused)                    Reserved
1011    B    OUT Im     OUT ← 0 + Im     Output immediate to port
1100    C    (unused)                    Reserved
1101    D    (unused)                    Reserved
1110    E    JNC Im     if C=0: PC ← Im  Jump if no carry
1111    F    JMP Im     PC ← Im          Unconditional jump
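
With only twelve instructions, hand assembly is practical, but a tiny assembler makes the encoding explicit. The sketch below is a hypothetical helper, not a tool from the book; the dictionary `OPCODES` and the function `assemble` are names invented for this example, and the accepted syntax is just the mnemonic style used in the table above.

```python
OPCODES = {
    "ADD A,Im": 0x0, "MOV A,B": 0x1, "IN A": 0x2, "MOV A,Im": 0x3,
    "MOV B,A": 0x4, "ADD B,Im": 0x5, "IN B": 0x6, "MOV B,Im": 0x7,
    "OUT B": 0x9, "OUT Im": 0xB, "JNC Im": 0xE, "JMP Im": 0xF,
}


def assemble(line):
    """Encode one assembly line such as 'MOV A, 1' into an instruction byte."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip().upper() for op in rest.split(",") if op.strip()]
    imm = 0
    # Swap a numeric operand for the placeholder 'Im' and remember its value.
    for i, op in enumerate(operands):
        if op.isdigit():
            imm = int(op)
            operands[i] = "Im"
    if not 0 <= imm <= 15:
        raise ValueError(f"immediate out of range: {imm}")
    key = f"{mnemonic.upper()} {','.join(operands)}"
    if key not in OPCODES:
        raise ValueError(f"unknown instruction: {line!r}")
    return (OPCODES[key] << 4) | imm


for text in ("MOV A, 1", "MOV A, B", "OUT B", "OUT 10", "JNC 1", "JMP 0"):
    print(f"{text:10s} -> {assemble(text):02X}")
```

Instructions without an immediate operand are encoded with the low nibble set to 0, matching the `+ 0` in the Operation column.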

Understanding the Instruction Encoding

The brilliance of TD4's instruction encoding lies in how opcodes directly generate control signals:

OPCODE bits:
  OP3 OP2 OP1 OP0
   │   │   │   │
   │   │   └───┴── MUX select (which register feeds ALU)
   │   │              00 = A
   │   │              01 = B  
   │   │              10 = IN
   │   │              11 = 0 (constant)
   │   │
   └───┴────────── Destination select
                     00 = A register
                     01 = B register
                     10 = OUT register
                     11 = PC (jump)

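This is incredibly elegant! The opcode bits ARE the control signals (almost). The one exception is JNC (1110): its low bits alone would select IN, so the decoder forces SEL0 high whenever OP3 = 1, which makes the MUX supply the constant 0 for both jump instructions.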


6. Data Path Analysis

The Central Data Path

                              ┌─────────────────────────┐
                 INSTRUCTION  │   ┌───────┬───────┐     │
                              │   │OPCODE │ IMMED │     │
                              │   │ [7:4] │ [3:0] │     │
                              │   └───┬───┴───┬───┘     │
                              │       │       │         │
                              │       ▼       │         │
                              │   ┌───────┐   │         │
                              │   │DECODE │   │         │
                              │   └───┬───┘   │         │
                              │       │       │         │
                              └───────┼───────┼─────────┘
                                      │       │
            ┌─────────────────────────┼───────┼──────────────────┐
            │                         │       │                  │
            │  SELECT                 │       │  IMMEDIATE       │
            │    │                    │       │    │             │
            │    ▼                    │       │    ▼             │
            │  ┌────┐   ┌────┐        │       │  ┌────┐          │
            │  │REG │   │REG │        │       │  │    │          │
            │  │ A  │   │ B  │        │       │  │    │          │
            │  └─┬──┘   └─┬──┘        │       │  │    │          │
            │    │        │           │       │  │    │          │
            │    ▼        ▼           │       │  │    │          │
            │  ┌────────────┐         │       │  │    │          │
INPUT ──────┼─▶│  4:1 MUX   │─────────┼───────┼─▶│ADD │──┐       │
  PORT      │  └────────────┘         │       │  │    │  │       │
            │         ▲               │       │  │    │  │       │
            │         │               │       │  └────┘  │       │
            │   0000 ─┘               │       │    │     │       │
            │                         │       │    │COUT │       │
            │                         ▼       │    ▼     │       │
            │                    ┌────────┐   │  ┌───┐   │       │
            │                    │DEST SEL│   │  │ C │   │       │
            │                    └────────┘   │  │FLG│   │       │
            │                         │       │  └───┘   │       │
            │    ┌────────────────────┼───────┘          │       │
            │    │        │           │                  │       │
            │    ▼        ▼           ▼                  │       │
            │  ┌────┐   ┌────┐   ┌────────┐              │       │
            │  │REG │   │REG │   │OUT REG │              │       │
            │  │ A  │   │ B  │   └────┬───┘              │       │
            │  └────┘   └────┘        │                  │       │
            │                         ▼                  │       │
            │                    ┌────────┐              │       │
            │                    │ OUTPUT │              │       │
            │                    │  PORT  │              │       │
            │                    └────────┘              │       │
            │                                            │       │
            │            ┌──────────────────────────────┘       │
            │            │                                       │
            │            ▼                                       │
            │       ┌────────┐      ┌─────┐                      │
            │       │   PC   │──────│ ROM │                      │
            │       │74HC161 │◀─────│     │                      │
            │       └────────┘      └─────┘                      │
            │            ▲                                       │
            │            │                                       │
            │     CLK ───┘                                       │
            │                                                    │
            └────────────────────────────────────────────────────┘

Signal Flow for Each Instruction Type

ADD A, Im (Opcode 0000)

Step 1: PC outputs address → ROM outputs instruction
Step 2: Opcode 0000 decoded:
        - MUX select = 00 (choose A)
        - Dest select = 00 (load A)
Step 3: ALU computes: A + Immediate
Step 4: Rising clock edge:
        - Result loaded into A
        - Carry loaded into C flag
        - PC increments

JMP Im (Opcode 1111)

Step 1: PC outputs address → ROM outputs instruction
Step 2: Opcode 1111 decoded:
        - MUX select = 11 (choose 0)
        - Dest select = 11 (load PC)
        - PC LOAD signal asserted
Step 3: ALU computes: 0 + Immediate = Immediate
Step 4: Rising clock edge:
        - PC loads Immediate value (not increment!)
        - Next instruction fetched from new address

7. Control Logic

Instruction Decoder Truth Table

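OP3  OP2  OP1  OP0  LOAD_A  LOAD_B  LOAD_OUT  LOAD_PC  SEL1  SEL0
0    0    0    0    1       0       0         0        0     0
0    0    0    1    1       0       0         0        0     1
0    0    1    0    1       0       0         0        1     0
0    0    1    1    1       0       0         0        1     1
0    1    0    0    0       1       0         0        0     0
0    1    0    1    0       1       0         0        0     1
0    1    1    0    0       1       0         0        1     0
0    1    1    1    0       1       0         0        1     1
1    0    0    1    0       0       1         0        0     1
1    0    1    1    0       0       1         0        1     1
1    1    1    0    0       0       0         ~C       1     1
1    1    1    1    0       0       0         1        1     1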

Decoder Logic Equations

From the truth table, we can derive:

LOAD_A  = ~OP3 & ~OP2
LOAD_B  = ~OP3 & OP2
LOAD_OUT = OP3 & ~OP2 & OP0
LOAD_PC = OP3 & OP2 & OP1 & (OP0 | ~CARRY)
SEL1    = OP1
SEL0    = OP0 | OP3

These equations can be implemented with just a few NAND gates!

The JNC (Jump if No Carry) Logic

JNC instruction (opcode 1110):
  LOAD_PC = OP3 & OP2 & OP1 & ~OP0 & ~CARRY
          = "It's a JNC" AND "Carry is clear"
          
JMP instruction (opcode 1111):
  LOAD_PC = OP3 & OP2 & OP1 & OP0
          = Always load PC (unconditional)

Combined:

LOAD_PC = OP3 & OP2 & OP1 & (OP0 | ~CARRY)
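
The decoder can be checked mechanically against the truth table. The sketch below writes the equations with active-high outputs, as in the table; the function name `decode_controls` is made up for this example, and `SEL0` is taken as `OP0 | OP3` because that is what the table's JNC row (SEL = 11) requires.

```python
def decode_controls(opcode, carry):
    """Return the control signals for a 4-bit opcode and the carry flag."""
    op3, op2, op1, op0 = [(opcode >> bit) & 1 for bit in (3, 2, 1, 0)]
    no_carry = 1 - (carry & 1)
    return {
        "LOAD_A": (1 - op3) & (1 - op2),
        "LOAD_B": (1 - op3) & op2,
        "LOAD_OUT": op3 & (1 - op2) & op0,
        "LOAD_PC": op3 & op2 & op1 & (op0 | no_carry),
        "SEL": (op1 << 1) | (op0 | op3),   # SEL1 SEL0 packed into two bits
    }


print(decode_controls(0b1110, carry=0))  # JNC taken: LOAD_PC = 1
print(decode_controls(0b1110, carry=1))  # JNC not taken: LOAD_PC = 0
print(decode_controls(0b1001, carry=0))  # OUT B: LOAD_OUT = 1, SEL = 01
```

With these equations the four unused opcodes (8, A, C, D) assert no load signal at all, so they behave like a NOP in this model. Whether a physical build behaves the same way depends on how its decoder treats those don't-care rows.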

8. Clock and Reset Circuitry

Clock Generator Options

Option 1: 555 Timer Oscillator

     VCC ──┬────────────────┐
           │                │
           R1          ┌────┴────┐
           │           │    8    │
           ├───────────┤7        │
           │           │   555  3├────── CLK OUT
           R2          │         │
           │           │         │
           ├───────────┤6        │
           ├───────────┤2        │
           │           │    1    │
           C           └────┬────┘
           │                │
          GND              GND

(Pin 4, RESET, is also tied to VCC.)

f ≈ 1.44 / ((R1 + 2×R2) × C)

For a clock of roughly 1.5 Hz (slow enough to observe):

  • R1 = 10kΩ
  • R2 = 470kΩ
  • C = 1µF
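
Plugging the listed values into the formula is a quick sanity check before soldering. A minimal sketch (the function name `astable_frequency` is made up for this example):

```python
def astable_frequency(r1_ohm, r2_ohm, c_farad):
    """Approximate 555 astable frequency: f = 1.44 / ((R1 + 2*R2) * C)."""
    return 1.44 / ((r1_ohm + 2 * r2_ohm) * c_farad)


f = astable_frequency(r1_ohm=10e3, r2_ohm=470e3, c_farad=1e-6)
print(f"{f:.2f} Hz, period {1 / f:.2f} s")
```

This prints a frequency of about 1.52 Hz. Doubling C to 2.2 µF or so would bring the clock below 1 Hz if that is easier to follow by eye.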

Option 2: Manual Clock (Push Button)

VCC
 │
10kΩ
 │
 ├─────────── CLK OUT
 │
─┴─
───  Push Button
 │
GND

Add debounce circuit:

VCC ── 10kΩ ──┬────────┬── 74HC14 ── 74HC14 ─── CLK OUT
              │        │
            Button     C 0.1µF     (Schmitt triggers for clean edges)
              │        │
             GND      GND

Reset Circuit

         VCC
          │
         10kΩ
          │
          ├───────────── RESET (to all CLR inputs)
          │
       ───┴───
       ───────  Reset Button
          │
         GND

Power-on reset (automatic):

VCC ──┬── 10kΩ ──┬─── RESET
      │          │
      │         ─┴─ 10µF
     ─┴─         │
     ───        GND
     GND

9. Step-by-Step Operation

Execution Cycle

The TD4 executes each instruction in a single clock cycle:

        ┌───────────────────────────────────────────────────┐
        │                    CLOCK CYCLE                    │
        │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌───────┐ │
        │  │ FETCH   │  │ DECODE  │  │ EXECUTE │  │ STORE │ │
        │  │         │  │         │  │         │  │       │ │
        │  │ PC→ROM  │  │ Opcode  │  │  ALU    │  │ Regs  │ │
        │  │         │→ │ →Ctrl   │→ │ Compute │→ │ Load  │ │
        │  └─────────┘  └─────────┘  └─────────┘  └───────┘ │
        │                                                   │
        │◀────────────────── ~1 clock ─────────────────────▶│
        └───────────────────────────────────────────────────┘

Clock:  ────┐     ┌─────────────────┐     ┌─────
            │     │                 │     │
            └─────┘                 └─────┘
                  ▲                       ▲
                  │                       │
            Registers loaded        Next cycle

Detailed Timing

Time ──────────────────────────────────────────────────▶

CLK:    ─────┐          ┌──────────────┐          ┌─────
             │          │              │          │
             └──────────┘              └──────────┘

PC:     ════════════════╪═════════════════════════╪═════
            ADDR N-1    │         ADDR N          │ ADDR N+1

ROM:    ════════════════╪═════════════════════════╪═════
          OP|IM (prev)  │      OP|IM (curr)       │

ALU:    ~~~~~~~~~~~~~~~~│~~~~~~~~~~~~~════════════│~~~~~
                        │ (computing) (result ready)

Regs:   ════════════════╪═════════════════════════╪═════
         (previous val) │     (previous val)      │ (new val)
                                                  ▲
                                            Register loads
                                            on rising edge

Worked Example: Running a Program

Let's trace through this program:

Address  Instruction    Assembly
0x0      0011 0001     MOV A, 1      ; A = 1
0x1      0111 0010     MOV B, 2      ; B = 2
0x2      0001 0000     MOV A, B      ; A = B (A = 2)
0x3      0000 0011     ADD A, 3      ; A = A + 3 = 5
0x4      1011 0000     OUT 0         ; Output = 0 (clear)
0x5      1001 0000     OUT B         ; Output = B = 2
0x6      1111 0000     JMP 0         ; Loop forever

Cycle 0:

PC = 0 → ROM[0] = 0011_0001
Opcode 0011 → MOV A, Im
Immediate = 1
ALU: 0 + 1 = 1
On clock edge: A ← 1, PC ← 1

Cycle 1:

PC = 1 → ROM[1] = 0111_0010
Opcode 0111 → MOV B, Im
Immediate = 2
ALU: 0 + 2 = 2
On clock edge: B ← 2, PC ← 2

Cycle 2:

PC = 2 → ROM[2] = 0001_0000
Opcode 0001 → MOV A, B
MUX selects B (value 2)
ALU: 2 + 0 = 2
On clock edge: A ← 2, PC ← 3

Cycle 3:

PC = 3 → ROM[3] = 0000_0011
Opcode 0000 → ADD A, Im
MUX selects A (value 2)
ALU: 2 + 3 = 5
On clock edge: A ← 5, PC ← 4

Cycle 4:

PC = 4 → ROM[4] = 1011_0000
Opcode 1011 → OUT Im
ALU: 0 + 0 = 0
On clock edge: OUT ← 0, PC ← 5
LEDs show: 0000

Cycle 5:

PC = 5 → ROM[5] = 1001_0000
Opcode 1001 → OUT B
MUX selects B (value 2)
ALU: 2 + 0 = 2
On clock edge: OUT ← 2, PC ← 6
LEDs show: 0010

Cycle 6:

PC = 6 → ROM[6] = 1111_0000
Opcode 1111 → JMP
Immediate = 0
LOAD_PC asserted
On clock edge: PC ← 0 (not PC + 1!)

Cycle 7:

PC = 0  (Program repeats from beginning)
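
The same trace can be reproduced with a short software model. The sketch below is an illustrative simulator, not a cycle-accurate description of the hardware: the function name `run_td4` is invented here, the decoder follows the truth table in Section 7, and it assumes the carry flag is rewritten on every clock edge.

```python
def run_td4(rom, steps, in_port=0):
    """Run a TD4 program; return (pc, a, b, out, carry) after each clock edge.

    rom is a list of up to 16 instruction bytes; unprogrammed addresses read 0x00.
    """
    rom = list(rom) + [0x00] * (16 - len(rom))
    a = b = out = pc = carry = 0
    trace = []
    for _ in range(steps):
        opcode, imm = rom[pc] >> 4, rom[pc] & 0xF
        op3, op2, op1, op0 = [(opcode >> bit) & 1 for bit in (3, 2, 1, 0)]

        # Decode (combinational): MUX select and load enables.
        sel = (op1 << 1) | (op0 | op3)
        load_a = (1 - op3) & (1 - op2)
        load_b = (1 - op3) & op2
        load_out = op3 & (1 - op2) & op0
        load_pc = op3 & op2 & op1 & (op0 | (1 - carry))

        # Execute (combinational): selected source plus the immediate.
        total = (a, b, in_port & 0xF, 0)[sel] + imm
        result = total & 0xF

        # Rising clock edge: at most one destination loads.
        if load_a:
            a = result
        if load_b:
            b = result
        if load_out:
            out = result
        pc = result if load_pc else (pc + 1) & 0xF
        carry = total >> 4  # assumption: carry updates on every instruction
        trace.append((pc, a, b, out, carry))
    return trace


program = [0x31, 0x72, 0x10, 0x03, 0xB0, 0x90, 0xF0]
for cycle, state in enumerate(run_td4(program, 7)):
    print(cycle, state)
```

After cycle 3 the state is `(4, 5, 2, 0, 0)` (PC = 4, A = 5, B = 2), and after cycle 6 the PC is back at 0 with OUT holding 2, matching the hand trace.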

10. Building the TD4

Breadboard Layout

┌─────────────────────────────────────────────────────────────┐
│ POWER RAILS                                                 │
│ + ─────────────────────────────────────────────────────── + │
│ - ─────────────────────────────────────────────────────── - │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐         │
│  │ 74HC161 │  │ 74HC161 │  │ 74HC153 │  │ 74HC153 │         │
│  │   PC    │  │  REG A  │  │  MUX    │  │  MUX    │         │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘         │
│                                                             │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐         │
│  │ 74HC161 │  │ 74HC283 │  │ 74HC10  │  │ 74HC74  │         │
│  │  REG B  │  │  ADDER  │  │ DECODER │  │  CARRY  │         │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘         │
│                                                             │
│  ┌─────────┐  ┌─────────┐  ┌────────────────────────────┐   │
│  │ 74HC161 │  │ 74HC540 │  │     ROM (DIP SWITCHES)     │   │
│  │ OUT REG │  │ OUT BUF │  │ 16 x 8 bits = 128 switches │   │
│  └─────────┘  └─────────┘  │   OR use AT28C16 EEPROM    │   │
│                            └────────────────────────────┘   │
│                                                             │
│ [CLOCK] [RESET]  [INPUT SWITCHES 0-3]  [OUTPUT LEDS 0-3]    │
│ + ─────────────────────────────────────────────────────── + │
│ - ─────────────────────────────────────────────────────── - │
└─────────────────────────────────────────────────────────────┘

Wiring Checklist

Power Connections (FIRST!)

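  • VCC to +5V on every IC (pin 16 on the 16-pin parts, pin 14 on the 74HC74 and 74HC10, pin 20 on the 74HC540)
  • GND to ground on every IC (pin 8 on the 16-pin parts, pin 7 on the 74HC74 and 74HC10, pin 10 on the 74HC540)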
  • Add 0.1µF decoupling capacitor near each IC
  • Add 10µF capacitor across main power rails

Program Counter (74HC161 #1)

  • CLK (pin 2) ← System clock
  • CLR (pin 1) ← Reset (active low)
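  • LOAD (pin 9) ← Instruction decoder (active low; asserted for JMP and for a taken JNC)
  • D0-D3 (pins 3-6) ← ALU output (for JMP address)
  • Q0-Q3 (pins 14,13,12,11) → ROM address inputs
  • ENP, ENT (pins 7,10) ← VCC, so the PC counts on every clock (LOAD takes priority over counting when a jump is taken)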

ROM Connections

  • A0-A3 ← PC outputs
  • D0-D3 → Immediate value to ALU B-input
  • D4-D7 → Opcode to instruction decoder

ALU (74HC283)

  • A1-A4 (pins 5,3,14,12) ← MUX output (selected register)
  • B1-B4 (pins 6,2,15,11) ← Immediate from ROM
  • Σ1-Σ4 (pins 4,1,13,10) → Data bus to registers
  • C0 (pin 7) ← GND (or carry-in for future expansion)
  • C4 (pin 9) → Carry flag flip-flop

MUX (74HC153 × 2)

Connect in parallel to create 4-bit wide MUX:

  • S0, S1 ← Decoder outputs SEL0, SEL1 (SEL1 = OP1; SEL0 = OP0 OR OP3, per the truth table in Section 7)
  • I0 inputs ← Register A outputs
  • I1 inputs ← Register B outputs
  • I2 inputs ← Input port switches
  • I3 inputs ← GND (constant 0)
  • Y outputs → ALU A-inputs

ROM Programming Options

Option A: DIP Switch Matrix (Most Educational)

For each address (0-15):
  - 8 DIP switches set the instruction
  - Address selected by PC through a decoder

       Address 0    Address 1    ...   Address 15
      ┌────────┐   ┌────────┐        ┌────────┐
      │████████│   │████████│        │████████│
      │▓▓▓▓░░░░│   │░░░░░░░░│        │▓▓▓▓▓▓▓▓│
      └────────┘   └────────┘        └────────┘
       0011_0001    0111_0010          1111_0000
       MOV A,1      MOV B,2            JMP 0

Option B: Diode Matrix ROM (Classic Approach)

                Address Lines (from PC)
                A3  A2  A1  A0
                │   │   │   │
        ┌───────┼───┼───┼───┼───────┐
    D7 ─┤   ◄───┼───●───┼───┼───►   │
    D6 ─┤   ◄───┼───┼───●───┼───►   │
    D5 ─┤   ◄───●───┼───┼───●───►   │
    D4 ─┤   ◄───┼───┼───┼───┼───►   │
    D3 ─┤   ◄───┼───●───●───┼───►   │
    D2 ─┤   ◄───●───┼───┼───┼───►   │
    D1 ─┤   ◄───┼───┼───●───●───►   │
    D0 ─┤   ◄───●───●───┼───┼───►   │
        └───────────────────────────┘
        
    ● = 1N4148 diode (cathode to data line)
    No diode = logic 0
    Diode present = logic 1

Option C: EEPROM (AT28C16 or similar)

  • Most convenient for reprogramming
  • Use EEPROM programmer or Arduino to write
  • Only need 16 bytes of the 2KB capacity
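
Whichever programmer is used, it needs a byte image of the program. The sketch below builds a 16-byte image from a list of instruction bytes and pads the unused addresses; the function name `build_image` and the choice of 0x00 as filler are assumptions of this example (0x00 decodes as ADD A, 0).

```python
def build_image(code, size=16, fill=0x00):
    """Return a ROM image of exactly `size` bytes, padded with `fill`."""
    if len(code) > size:
        raise ValueError(f"program has {len(code)} bytes, ROM holds {size}")
    if any(not 0 <= byte <= 0xFF for byte in code):
        raise ValueError("every instruction must fit in one byte")
    return bytes(code) + bytes([fill]) * (size - len(code))


# The worked example from Section 9.
program = [0x31, 0x72, 0x10, 0x03, 0xB0, 0x90, 0xF0]
image = build_image(program)
print(" ".join(f"{byte:02X}" for byte in image))
```

On a part with more address lines than the TD4 drives, such as the AT28C16, the unused upper address pins would need to be tied to a fixed level so that the PC always addresses the same 16 bytes.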

Test Points to Add

Add test LEDs or probe points for debugging:

  1. Clock signal
  2. PC outputs (Q0-Q3)
  3. ALU outputs (Σ1-Σ4)
  4. Carry flag
  5. Each control signal (LOAD_A, LOAD_B, LOAD_OUT, LOAD_PC)

11. Example Programs

Program 1: LED Counter

; Counts 0-15 on output LEDs, then repeats
; Address  Binary      Hex   Assembly
    0      0011_0000   30    MOV A, 0      ; A = 0
    1      1001_0000   90    OUT B         ; Bug: outputs B, but the count is in A
    2      0000_0001   01    ADD A, 1      ; A = A + 1
    3      1110_0001   E1    JNC 1         ; If no overflow, goto 1
    4      1111_0000   F0    JMP 0         ; Overflow! Reset

Wait: there's no OUT A instruction! We need a workaround that copies A into B before each OUT B:

; Corrected LED Counter
    0      0011_0000   30    MOV A, 0      ; A = 0
    1      0100_0000   40    MOV B, A      ; B = A
    2      1001_0000   90    OUT B         ; Output B
    3      0000_0001   01    ADD A, 1      ; A++
    4      1110_0001   E1    JNC 1         ; Loop until overflow
    5      1111_0000   F0    JMP 0         ; Start over

Program 2: Alternating Pattern

; Alternates between 0101 and 1010 on LEDs
    0      1011_0101   B5    OUT 5         ; Output 0101
    1      1011_1010   BA    OUT 10        ; Output 1010  
    2      1111_0000   F0    JMP 0         ; Repeat

Program 3: Input Echo

; Reads input switches and displays on LEDs
    0      0010_0000   20    IN A          ; Read input to A
    1      0100_0000   40    MOV B, A      ; Copy to B
    2      1001_0000   90    OUT B         ; Display B
    3      1111_0000   F0    JMP 0         ; Repeat

Program 4: Addition Calculator

; Adds two 4-bit numbers from input
; First input, press button, second input, press button, shows sum
; (Simplified version - assumes clock is manual button)
    0      0010_0000   20    IN A          ; First number
    1      0110_0000   60    IN B          ; Second number
    2      0000_0000   00    ADD A, 0      ; A = A + 0 (no effect on A)
    3      0001_0000   10    MOV A, B      ; A = B (the first number is overwritten)
    4      0000_0000   00    ADD A, 0      ; Dummy: what we need is ADD A, B, which doesn't exist

Hmm, TD4 can't directly add A and B. Let's try a different approach:

; Add input to running total
    0      0011_0000   30    MOV A, 0      ; Clear accumulator
    1      0110_0000   60    IN B          ; Read input to B
    2      0100_0000   40    MOV B, A      ; Overwrites the input just read (B = A = 0)
    3      0010_0000   20    IN A          ; Overwrites the total with the new input
    ; Dead end: every write to A or B either replaces it or adds a constant

Key insight: TD4's limitations show why real CPUs need more instructions!

Program 5: Fibonacci Sequence (Partial)

; Generates Fibonacci: 1, 1, 2, 3, 5, 8, 13... (mod 16)
; F(n) = F(n-1) + F(n-2)
; Uses A as F(n-1), B as F(n-2)

    0      0011_0001   31    MOV A, 1      ; A = 1 (F1)
    1      0111_0001   71    MOV B, 1      ; B = 1 (F0)
    2      0100_0000   40    MOV B, A      ; B = A (the old B is lost: there is no temp register)
    ; Can't do: A = A + temp
    ; TD4 limitation: can only add IMMEDIATE to register, not register to register!

This reveals TD4's fundamental limitation: No register-to-register addition. One adder input is hard-wired to the immediate field, so a sum can only combine a register (or the IN port) with a constant stored in the instruction.

Program 6: Practical LED Chaser

; "Knight Rider" style LED pattern: 1-2-4-8-4-2-1-2-4-8-...
    0      1011_0001   B1    OUT 1         ; 0001
    1      1011_0010   B2    OUT 2         ; 0010
    2      1011_0100   B4    OUT 4         ; 0100
    3      1011_1000   B8    OUT 8         ; 1000
    4      1011_0100   B4    OUT 4         ; 0100
    5      1011_0010   B2    OUT 2         ; 0010
    6      1111_0000   F0    JMP 0         ; Loop

12. Extensions and Modifications

12.1 Adding Subtraction

TD4 can only add. To subtract, use two's complement:

A - B = A + (~B + 1)

Example: 7 - 3
  7 = 0111
  3 = 0011
 ~3 = 1100
~3+1= 1101 (which is -3 in two's complement)
7 + (-3) = 0111 + 1101 = 1 0100 → drop the carry → 0100 = 4

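Subtracting a constant already works on the stock TD4: ADD A, 13 computes A - 3 (mod 16). What the TD4 can't do without hardware mods is compute ~B for a value held in a register.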

Hardware modification: Add XOR gates before ALU B-input, controlled by a SUB signal, and drive the adder's carry-in (C0) from the same signal to supply the +1.
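
The two's complement identity is easy to confirm in software. The sketch below performs the subtraction using only a 4-bit add, the way the 74HC283 would; the function name `sub4_via_add` is made up for this example.

```python
def sub4_via_add(a, k):
    """Compute (a - k) mod 16 with a single 4-bit addition.

    Returns (result, carry_out, value_added).
    """
    neg_k = (~k + 1) & 0xF            # two's complement of k in 4 bits
    total = (a & 0xF) + neg_k
    return total & 0xF, total >> 4, neg_k


print(sub4_via_add(7, 3))   # 7 - 3: adds 13, result 4, carry out 1
print(sub4_via_add(2, 5))   # 2 - 5: adds 11, result 13 (-3 mod 16), carry out 0
```

The carry out is 1 when no borrow occurred and 0 when the result wrapped below zero, which is the opposite sense from an addition overflow. A program that subtracts this way and then uses JNC is therefore branching on "a borrow happened".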

12.2 Expanding to 8-bit

To make an 8-bit version:

  • Replace the single 74HC283 with two cascaded 74HC283s (the low adder's carry-out C4 feeds the high adder's carry-in C0)
  • Double all register widths
  • Expand MUX to 8-bit wide
  • Expand ROM data width to 12 bits (4 opcode + 8 immediate)
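
Cascading two 4-bit adders works because the carry out of the low nibble becomes the carry in of the high nibble. A minimal sketch of that chaining (the function names `add4` and `add8` are made up for this example):

```python
def add4(x, y, cin=0):
    """One 4-bit adder stage: returns (sum, carry_out)."""
    total = (x & 0xF) + (y & 0xF) + (cin & 1)
    return total & 0xF, total >> 4


def add8(x, y):
    """8-bit addition built from two chained 4-bit stages."""
    low, carry = add4(x & 0xF, y & 0xF)          # low nibble, carry-in tied low
    high, carry_out = add4(x >> 4, y >> 4, carry)  # high nibble takes the ripple carry
    return (high << 4) | low, carry_out


print(add8(0x3C, 0x27))   # 60 + 39 = 99 (0x63), no carry out
print(add8(200, 100))     # 300 wraps to 44 with carry out 1
```

The first call returns `(99, 0)` and the second `(44, 1)`. In hardware the only extra wire compared with two independent adders is the one from C4 of the low chip to C0 of the high chip.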

12.3 Adding More Instructions

Current unused opcodes (8, A, C, D) could implement:

  • NOP: No operation (useful for timing)
  • AND: Logical AND (needs new ALU hardware)
  • NOT: Bitwise invert A
  • SHL: Shift left (multiply by 2)

12.4 Memory for Data (RAM)

TD4 has no RAM—only ROM for instructions. To add data memory:

┌───────────────────────────────────────────────┐
│ TD4 + RAM                                     │
│                                               │
│  ROM (instructions) ◄──── PC                  │
│          │                                    │
│          ▼                                    │
│       Decoder                                 │
│          │                                    │
│          ▼                                    │
│  RAM (data) ◄─────── Address from instruction │
│      ▲   │                                    │
│      │   ▼                                    │
│      └─ ALU ◄──── Registers                   │
│                                               │
└───────────────────────────────────────────────┘

New instructions needed:

  • LOAD addr: A ← RAM[addr]
  • STORE addr: RAM[addr] ← A

12.5 Stack and Subroutines

For CALL/RET functionality:

  1. Add a stack pointer register (SP)
  2. Add RAM for stack storage
  3. Implement PUSH, POP, CALL, RET

This significantly increases complexity but enables recursion!


Appendix A: Quick Reference Card

┌─────────────────────────────────────────────────────────────┐
│                     TD4 QUICK REFERENCE                     │
├─────────────────────────────────────────────────────────────┤
│ REGISTERS                                                   │
│   A, B    : 4-bit general purpose                           │
│   OUT     : 4-bit output port                               │
│   PC      : 4-bit program counter (0-15)                    │
│   C       : 1-bit carry flag                                │
├─────────────────────────────────────────────────────────────┤
│ INSTRUCTION FORMAT                                          │
│   [OPCODE:4][IMMEDIATE:4]                                   │
├─────────────────────────────────────────────────────────────┤
│ OPCODE  MNEMONIC      OPERATION                             │
│   0     ADD A, Im     A ← A + Im                            │
│   1     MOV A, B      A ← B                                 │
│   2     IN A          A ← Input Port                        │
│   3     MOV A, Im     A ← Im                                │
│   4     MOV B, A      B ← A                                 │
│   5     ADD B, Im     B ← B + Im                            │
│   6     IN B          B ← Input Port                        │
│   7     MOV B, Im     B ← Im                                │
│   9     OUT B         Output ← B                            │
│   B     OUT Im        Output ← Im                           │
│   E     JNC Im        if C=0: PC ← Im                       │
│   F     JMP Im        PC ← Im                               │
├─────────────────────────────────────────────────────────────┤
│ UNUSED OPCODES: 8, A, C, D                                  │
└─────────────────────────────────────────────────────────────┘

Appendix B: Comparison with Other Educational CPUs

Feature             TD4          SAP-1     HACK           6502
Data width          4-bit        8-bit     16-bit         8-bit
Address space       16 bytes     16 bytes  32K×2          64KB
Registers           2 GP + OUT   1 (A)     2 (A,D)        3 (A,X,Y)
Instructions        12           5         ~28            56
ALU operations      ADD only     ADD, SUB  ADD, AND, NOT  Full
RAM                 No           No        16K            Yes
Stack               No           No        Call stack     Yes
IC count            ~10          ~15       ~25            1
Clock cycles/instr  1            6         1-2            2-7

Appendix C: Troubleshooting Guide

Symptom            Possible Cause              Solution
Nothing works      Power not connected         Check VCC and GND
Random behavior    Missing decoupling caps     Add 0.1µF near each IC
PC doesn't count   Clock not reaching 161      Check CLK wiring
Wrong instruction  ROM programmed incorrectly  Verify ROM contents
Carry always 0/1   Carry FF not connected      Check 74HC74 wiring
Jump doesn't work  LOAD not reaching PC        Trace decoder output
Output stuck       OUT register not loading    Check LOAD_OUT signal

Conclusion

The TD4 is a masterpiece of minimalist design. With just 10 ICs, it demonstrates:

  1. Fetch-Decode-Execute cycle in hardware
  2. Stored-program operation (instructions fetched from memory by a program counter)
  3. Control signal generation from opcodes
  4. Conditional branching with flags
  5. Register transfer operations

Building and understanding the TD4 gives you intuition that transfers directly to understanding real CPUs—you'll never look at a computer the same way again.

Suggested next steps after TD4:

  1. Build it on a breadboard
  2. Write several programs
  3. Add hardware modifications
  4. Move to 8-bit (SAP-1 or similar)
  5. Study the HACK computer (Nand2Tetris)
  6. Explore real vintage CPUs (6502, Z80)

Happy building! 🔧