How Registers Connect to the ALU: A Complete Deep Dive
Author: Yinhuan Yuan
Let me explain the intricate dance between registers and the ALU: this is where the magic of computation actually happens!
The Big Picture: Data Flow Architecture
Overview of Connections
The Complete Data Path:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ A │ │ D │ │ PC │
│ Register│ │ Register│ │ Counter │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
│ A_Out[15:0] │ D_Out[15:0] │ PC_Out[14:0]
│ │ │
│ │ ▼
│ │ ┌──────────┐
│ │ │ ROM │
│ │ │ (Instr.) │
│ │ └────┬─────┘
│ │ │
│ │ │ Inst[15:0]
│ │ │
│ │ ▼
│ │ ┌──────────┐
│ │ │ Control │
│ │ │ Unit │
│ │ └────┬─────┘
│ │ │
│ │ Control Signals
│ │ │
▼ ▼ ▼
┌────────────────────────────────────────────┐
│ │
│ ALU (16-bit) │
│ │
│ X Input ◄───┐ Y Input ◄───┐ │
│ │ │ │
│ [A or M] [D] │
│ │ │ │
└──────────────┼────────────────────┼───────┘
│ │
│ ALU_Out[15:0] │
│ (Result) │
│ │
▼ ▼
Data Bus [15:0] ◄────► RAM
│
│
┌─────────┼─────────┐
│ │ │
▼ ▼ ▼
To A To D To M
(via LOAD_A)(LOAD_D) (WRITE_M)
The key insight: Registers feed the ALU, and ALU results feed back to registers. This creates a computational feedback loop.
Connection 1: D Register to ALU (Y Input)
The Direct Connection
This is the simplest connection:
Physical Connection:
────────────────────
D Register (U3, U4 - 74HC574s)
│
│ Q pins output D_Out[15:0]
│
├─ U3.Q0 (pin 19) ──► D_Out[0] ───┐
├─ U3.Q1 (pin 18) ──► D_Out[1] ───┤
├─ U3.Q2 (pin 17) ──► D_Out[2] ───┤
├─ U3.Q3 (pin 16) ──► D_Out[3] ───┤
├─ U3.Q4 (pin 15) ──► D_Out[4] ───┤
├─ U3.Q5 (pin 14) ──► D_Out[5] ───┤
├─ U3.Q6 (pin 13) ──► D_Out[6] ───┤
├─ U3.Q7 (pin 12) ──► D_Out[7] ───┤
│ │
├─ U4.Q0 (pin 19) ──► D_Out[8] ───┤
├─ U4.Q1 (pin 18) ──► D_Out[9] ───┤
├─ U4.Q2 (pin 17) ──► D_Out[10] ───┤
├─ U4.Q3 (pin 16) ──► D_Out[11] ───┤
├─ U4.Q4 (pin 15) ──► D_Out[12] ───┤
├─ U4.Q5 (pin 14) ──► D_Out[13] ───┤
├─ U4.Q6 (pin 13) ──► D_Out[14] ───┤
└─ U4.Q7 (pin 12) ──► D_Out[15] ───┤
│
16 wires │
│
▼
ALU Y Input Processing
(U9-U16 - the Y path)
│
┌─────────▼──────────┐
│ Zero Select (zy) │
│ 74HC157 × 4 │
└─────────┬──────────┘
│
┌─────────▼──────────┐
│ Negate (ny) │
│ 74HC86 × 4 │
└─────────┬──────────┘
│
Y_PROC[15:0]
(to ALU core)
Why This is Simple
Advantages:
──────────
✓ Direct connection (no multiplexing)
✓ D is ALWAYS available to ALU
✓ No switching delays
✓ Clean signal path
✓ Easy to route on PCB
The D register has one job:
└─ Feed data to ALU Y input
Always active, always available!
Signal Characteristics
Electrical Properties:
─────────────────────
Output Drive Strength (74HC574, VCC = 4.5V):
├─ Rated output current: about ±6mA (bus-driver outputs)
├─ Output voltage HIGH: ≥4.4V at 20µA, about 3.8V minimum at 6mA
├─ Output voltage LOW: ≤0.1V at 20µA, about 0.33V maximum at 6mA
└─ Strong enough to drive multiple inputs
Loading:
├─ Each D_Out line drives:
│ ├─ 1× 74HC157 input (the zy mux for that bit)
│ ├─ (the 74HC86 ny gate is fed by the mux output, not by D_Out)
│ └─ Total: about 1µA leakage; the load is almost entirely input capacitance
└─ Well within 74HC574 capability
Propagation Time:
├─ Register output: 25ns
├─ To Y_PROC: +15ns (through zy/ny)
└─ Total: ~40ns D register to ALU core
PCB Routing Strategy
Recommended Layout (Top View):
┌──────────────────────────┐
│ D Register │
│ U3 U4 │
│ [7:0] [15:8] │
└─────┬──────┬─────────────┘
│ │
│ │ 16 parallel traces
│ │ Width: 0.4mm (15 mil)
│ │ Spacing: 0.4mm
│ │
┌─────▼──────▼─────────────┐
│ Y Input Conditioning │
│ U9-U16 │
│ (ALU preprocessing) │
└──────────────────────────┘
Trace Length: ~50mm typical
Propagation Delay: ~0.3ns (about 6ps/mm on FR-4; negligible)
Bus Routing:
├─ Group all 16 traces together
├─ Keep parallel to maintain timing
├─ Avoid crossing other buses
├─ Run on top layer (short path)
└─ No vias needed (stay on one layer)
Connection 2: A Register to ALU (X Input via Multiplexer)
This is more complex because we have a choice!
The A/M Selection Problem
The Dilemma:
────────────
ALU X input can be:
├─ A Register value (for calculations with A)
└─ Memory[A] value (for calculations with RAM data)
Example showing why we need both:
Operation 1: A = A + 1
├─ Need A register value
└─ X input = A
Operation 2: D = M + 1
├─ Need Memory value (RAM[A])
└─ X input = M (which is RAM[A])
Solution: MULTIPLEXER!
The Multiplexer Network
Complete A/M Selection Circuit:
From A Register: From RAM:
A_Out[15:0] M[15:0]
│ │
│ │
├─────────────┬──────────────────────┤
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ 74HC157 │ │ 74HC157 │ ... │ 74HC157 │
│ (4 bits) │ │ (4 bits) │ │ (4 bits) │
│ │ │ │ │ │
│ A[3:0] ─►│1A │ │ │
│ M[3:0] ─►│1B │ │ │
│ │ │ │ │
│ A_OR_M ──►│SELECT │◄──────┴───────────┘
│ │ │ (shared control)
│ GND ────►│/EN │
│ │ │
│ 1Y│ │
└───────┬───┘ │
│ │
▼ ▼
X_IN[3:0] X_IN[15:4]
│ │
└────────┬───────┘
│
▼
X Input to ALU
(X_IN[15:0])
│
▼
┌────────────────────┐
│ Zero Select (zx) │
│ 74HC157 × 4 │
└────────┬───────────┘
│
┌────────▼───────────┐
│ Negate (nx) │
│ 74HC86 × 4 │
└────────┬───────────┘
│
X_PROC[15:0]
(to ALU core)
Control Signal: A_OR_M
This comes from the instruction decoder:
Instruction Bit 12 (a-bit):
──────────────────────────
From Hack instruction format:
┌──┬──┬──┬─┬──────┬───┬───┐
│15│14│13│a│cccccc│ddd│jjj│
└──┴──┴──┴─┴──────┴───┴───┘
▲
This bit!
Bit 12 = a-bit = A_OR_M control
a = 0 → Use A register
a = 1 → Use Memory[A]
Examples:
─────────
D = A → a = 0 (use A)
D = M → a = 1 (use M)
D = A + 1 → a = 0 (use A)
D = M + 1 → a = 1 (use M)
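Here is a minimal Python sketch of how the a-bit steers the X input. It assumes the Hack C-instruction layout shown above, with bit 12 as the a-bit. The function names and the use of plain integers for 16-bit buses are illustrative assumptions and are not part of the hardware description.

```python
A_BIT = 12  # position of the a-bit in a Hack C-instruction


def a_or_m(instruction: int) -> int:
    """Extract the A_OR_M control signal (bit 12) from a 16-bit instruction."""
    return (instruction >> A_BIT) & 1


def select_x(instruction: int, a_out: int, m: int) -> int:
    """Model the X-input choice: a=0 passes A_Out, a=1 passes M (RAM[A])."""
    return (m if a_or_m(instruction) else a_out) & 0xFFFF


# Two C-instructions that differ only in the a-bit (comp/dest/jump bits left at 0)
use_a = 0b1110_0000_0000_0000
use_m = use_a | (1 << A_BIT)
```

Suppose A_Out = 0x0100 and M = 0xABCD. Then `select_x(use_a, 0x0100, 0xABCD)` returns 0x0100 and `select_x(use_m, 0x0100, 0xABCD)` returns 0xABCD. This matches the "use A" and "use M" rows above.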
Detailed Connection Schematic
Let me show ONE 4-bit slice in detail:
A Register outputs: RAM outputs:
A_Out[0] ────────┐ M[0] ────────┐
A_Out[1] ────┐ │ M[1] ────┐ │
A_Out[2] ──┐ │ │ M[2] ──┐ │ │
A_Out[3] ┐ │ │ │ M[3] ┐ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │
┌──▼─▼─▼───▼──┐ ┌──▼─▼─▼───▼──┐
│ │ │ │
│ Pins 2,5, │ │ Pins 3,6, │
│ 11,14 │ │ 10,13 │
│ (A inputs) │ │ (B inputs) │
│ │ │ │
│ U_MUX │◄──────┤ A_OR_M │
│ 74HC157 │ │ (Pin 1) │
│ │ │ │
GND ─►│ Pin 15 (/EN│ │ │
│ │ │ │
│ Pins 4,7, │ │ │
│ 9,12 │ │ │
│ (Y outputs)│ │ │
└─────┬───────┘ └─────────────┘
│
├─ X_IN[0]
├─ X_IN[1]
├─ X_IN[2]
└─ X_IN[3]
│
▼
To zx/nx processing
(X input conditioning)
Pin Connections for 74HC157 (X input mux):
──────────────────────────────────────────
Pin 1: SELECT (A_OR_M from instruction)
Pin 2: 1A (A_Out[0])
Pin 3: 1B (M[0])
Pin 4: 1Y (X_IN[0])
Pin 5: 2A (A_Out[1])
Pin 6: 2B (M[1])
Pin 7: 2Y (X_IN[1])
Pin 8: GND
Pin 9: 3Y (X_IN[2])
Pin 10: 3B (M[2])
Pin 11: 3A (A_Out[2])
Pin 12: 4Y (X_IN[3])
Pin 13: 4B (M[3])
Pin 14: 4A (A_Out[3])
Pin 15: /EN (tied to GND - always enabled)
Pin 16: VCC
Repeat this 4 times for full 16 bits!
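The following behavioral Python sketch shows four 4-bit 74HC157 slices combining into one 16-bit multiplexer that shares a single SELECT line. It models logic levels only, with no timing. The helper names `hc157` and `x_input_mux` are hypothetical.

```python
def hc157(select: int, a: int, b: int, enable_n: int = 0) -> int:
    """Behavioral 74HC157: four 2:1 muxes on one 4-bit nibble.

    select=0 passes the A inputs (pins 2, 5, 11, 14),
    select=1 passes the B inputs (pins 3, 6, 10, 13).
    With /EN HIGH (enable_n=1) all Y outputs are forced LOW.
    """
    if enable_n:
        return 0
    return (b if select else a) & 0xF


def x_input_mux(a_or_m: int, a_out: int, m: int) -> int:
    """Build the 16-bit X_IN bus from four 74HC157 slices sharing A_OR_M."""
    x_in = 0
    for slice_index in range(4):  # one chip per nibble: bits [3:0], [7:4], [11:8], [15:12]
        shift = 4 * slice_index
        nibble = hc157(a_or_m, (a_out >> shift) & 0xF, (m >> shift) & 0xF)
        x_in |= nibble << shift
    return x_in
```

Because every slice receives the same `a_or_m` value, all 16 bits switch sources together. This is why the SELECT pins of the four chips are tied to one control trace.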
Timing Through the Multiplexer
Signal Propagation Path:
Step 1: Sources stable
──────────────────────
A_Out[15:0] valid at t=0
M[15:0] valid at t=0
A_OR_M control set at t=0
Step 2: Multiplexer selection
─────────────────────────────
74HC157 propagation: ~12ns
If A_OR_M = 0:
├─ A inputs selected
├─ A_Out → Y outputs
└─ X_IN = A_Out (after 12ns)
If A_OR_M = 1:
├─ B inputs selected
├─ M → Y outputs
└─ X_IN = M (after 12ns)
Step 3: To X conditioning
─────────────────────────
X_IN → zx/nx processing
Additional delay: ~15ns
Step 4: To ALU core
───────────────────
X_PROC arrives at ALU
Total delay: 27ns (12 + 15)
Timing Budget:
──────────────
A_Out stable: t = 0ns
Through MUX: t = 12ns
Through zx/nx: t = 27ns
Ready for ALU: t = 27ns ✓
This fits well within clock cycle!
Why Not Just Two Separate Inputs?
Bad Idea: Two separate X inputs to ALU
Hypothetical Design:
────────────────────
ALU with:
├─ X_A input (from A register)
├─ X_M input (from Memory)
└─ Select inside ALU
Problems:
─────────
✗ Need to route TWO 16-bit buses to ALU
✗ 32 wires instead of 16
✗ More complex ALU internal routing
✗ Harder PCB layout
✗ More crosstalk between buses
✗ Larger board area needed
Good Design: Mux BEFORE ALU
───────────────────────────
✓ Only ONE 16-bit bus to ALU
✓ 16 wires total
✓ Simpler ALU (no internal mux)
✓ Easier PCB routing
✓ Cleaner signal integrity
✓ Standard CPU design pattern
This is why CPUs typically select ALU operands
with multiplexers in front of the ALU!
Connection 3: ALU Output Back to Registers
This is the feedback path - how results get saved!
The Result Distribution Network
ALU produces one result, but it can go to THREE places:
ALU_Out[15:0]
│
│ (16-bit result bus)
│
┌────────────┼────────────┐
│ │ │
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ A │ │ D │ │ RAM │
│Register│ │Register│ │ [A] │
│ │ │ │ │ │
│D inputs│ │D inputs│ │D input │
└────┬───┘ └────┬───┘ └────┬───┘
│ │ │
LOAD_A LOAD_D WRITE_M
(control) (control) (control)
Each destination has its own load signal, set by the instruction's destination bits,
so zero, one, or several destinations can capture the result on the same clock edge.
Physical Bus Connection
ALU Output Stage:
─────────────────
From ALU (U29-U32 output XOR gates):
U29.Q0 (pin 3) ──► ALU_Out[0] ───┐
U29.Q1 (pin 6) ──► ALU_Out[1] ───┤
U29.Q2 (pin 8) ──► ALU_Out[2] ───┤
U29.Q3 (pin 11) ──► ALU_Out[3] ───┤
U30.Q0 (pin 3) ──► ALU_Out[4] ───┤
U30.Q1 (pin 6) ──► ALU_Out[5] ───┤
U30.Q2 (pin 8) ──► ALU_Out[6] ───┤
U30.Q3 (pin 11) ──► ALU_Out[7] ───┤
U31.Q0 (pin 3) ──► ALU_Out[8] ───┤
U31.Q1 (pin 6) ──► ALU_Out[9] ───┤
U31.Q2 (pin 8) ──► ALU_Out[10] ───┤
U31.Q3 (pin 11) ──► ALU_Out[11] ───┤
U32.Q0 (pin 3) ──► ALU_Out[12] ───┤
U32.Q1 (pin 6) ──► ALU_Out[13] ───┤
U32.Q2 (pin 8) ──► ALU_Out[14] ───┤
U32.Q3 (pin 11) ──► ALU_Out[15] ───┤
│
16 wires │
(Result Bus) │
│
┌───────────────────────────────┘
│
├──► To A Register (U1.D0-7, U2.D0-7)
│ (loaded when LOAD_A pulses)
│
├──► To D Register (U3.D0-7, U4.D0-7)
│ (loaded when LOAD_D pulses)
│
└──► To RAM input (via 74HC245 transceivers)
(stored when WRITE_M pulses)
The Control Signals (Destination Select)
Instruction Decode → Destination Control:
From instruction bits [5:3] (destination bits):
Bit 5 (d1): destA → LOAD_A
Bit 4 (d2): destD → LOAD_D
Bit 3 (d3): destM → WRITE_M
Destination Decoder Logic:
─────────────────────────
Instruction[5] (destA) ──┐
├─ AND ─► LOAD_A
C_INST (is C-inst?) ────┘
(ALU results load only on C-instructions; an A-instruction loads A
from the instruction word through a separate path)
Instruction[4] (destD) ──┐
├─ AND ─► LOAD_D
C_INST ─────────────────┘
Instruction[3] (destM) ──┐
├─ AND ─► WRITE_M
C_INST ─────────────────┘
Using 74HC08 (AND gates):
┌─────────┐
Inst[5]─►│1 3 │─► LOAD_A
C_INST──►│2 74HC08│
└─────────┘
┌─────────┐
Inst[4]─►│4 6 │─► LOAD_D
C_INST──►│5 74HC08│
└─────────┘
┌─────────┐
Inst[3]─►│9 8 │─► WRITE_M
C_INST──►│10 74HC08│
└─────────┘
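The three AND gates above can be sketched in Python as bit extraction followed by an AND with C_INST, which is bit 15. The function name `destination_controls` is hypothetical. The sketch assumes the Hack bit positions given above: d1 = bit 5, d2 = bit 4, d3 = bit 3.

```python
def destination_controls(instruction: int) -> dict:
    """Decode LOAD_A, LOAD_D and WRITE_M the way the 74HC08 gates do.

    C_INST is bit 15. Each destination bit is ANDed with it, so an
    A-instruction whose low bits happen to look like dest bits never
    triggers an ALU write-back.
    """
    c_inst = (instruction >> 15) & 1
    return {
        "LOAD_A": ((instruction >> 5) & 1) & c_inst,   # d1
        "LOAD_D": ((instruction >> 4) & 1) & c_inst,   # d2
        "WRITE_M": ((instruction >> 3) & 1) & c_inst,  # d3
    }
```

Consider a C-instruction with destination bits 110 (AD), such as 0xE030. It raises LOAD_A and LOAD_D. The A-instruction 0x0030 has the same low bits but leaves all three signals at 0, because C_INST is 0.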
Multiple Destination Capability
Hack can write to multiple destinations simultaneously!
Example: store A + 1 in both A and D
Assembly: AD=A+1
Instruction bits:
├─ destA = 1 (bit 5)
├─ destD = 1 (bit 4)
└─ destM = 0 (bit 3)
Result:
─────
Both LOAD_A and LOAD_D pulse simultaneously!
ALU_Out = A + 1
│
├──────┬───────┐
▼ ▼ │
┌────────┐ ┌────────┐ │
│ A │ │ D │ │
└────────┘ └────────┘ │
│ │ │
LOAD_A LOAD_D (WRITE_M=0)
║ ║
╚══════════╝
Both pulse!
After one cycle:
├─ A ← ALU_Out
├─ D ← ALU_Out
└─ Both registers get same value!
This is efficient! One computation → two saves
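Here is a small Python sketch of the write-back step: one ALU result is committed to every destination whose control signal is set. `CpuState` and `clock_edge` are hypothetical names. The sketch assumes, as an illustration, that a memory write uses the A value held during the cycle, before any simultaneous load of A.

```python
from dataclasses import dataclass, field


@dataclass
class CpuState:
    """Hypothetical container for the state touched by one write-back."""
    a: int = 0
    d: int = 0
    ram: dict = field(default_factory=dict)


def clock_edge(state: CpuState, alu_out: int, load_a: int, load_d: int, write_m: int) -> None:
    """Commit one ALU result to every enabled destination on the same edge."""
    result = alu_out & 0xFFFF
    address = state.a & 0x7FFF  # assumption: RAM uses the A value held during this cycle
    if write_m:
        state.ram[address] = result
    if load_a:
        state.a = result
    if load_d:
        state.d = result
```

For AD=A+1 starting from A = 5, calling `clock_edge(s, s.a + 1, load_a=1, load_d=1, write_m=0)` leaves both `s.a` and `s.d` equal to 6, and RAM is untouched.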
Timing Diagram: Complete Feedback Path
Clock Cycle Breakdown:
Phase 1: Register Output (0-25ns)
──────────────────────────────────
t=0ns: Clock rises
t=5ns: Register outputs start changing
t=25ns: A_Out, D_Out stable
┌────────┐
│A or D │
│Register│
└───┬────┘
│ 25ns
▼
Phase 2: Through Mux (25-40ns)
───────────────────────────────
t=25ns: A_Out or M arrives at mux
t=27ns: A_OR_M control stable
t=37ns: Mux output (X_IN) stable
┌────────┐
│ MUX │
│ 74HC157│
└───┬────┘
│ 12ns
▼
Phase 3: ALU Processing (40-100ns)
───────────────────────────────────
t=40ns: X_IN and D_Out enter zx/nx and zy/ny
t=50ns: X_PROC, Y_PROC ready
t=65ns: ADD/AND computed
t=80ns: Function selected (f)
t=100ns: Output negated (no)
ALU_Out stable ✓
┌────────┐
│ ALU │
│ (34 ICs)│
└───┬────┘
│ 60ns
▼
Phase 4: Back to Register (100-110ns)
──────────────────────────────────────
t=100ns: ALU_Out drives result bus
t=105ns: Data stable at register inputs
t=110ns: Clock edge!
├─ LOAD_A/D pulses
├─ Data captured
└─ New value stored
┌────────┐
│Register│
│ Input │
└───┬────┘
│ 10ns setup
▼
CLOCK EDGE
Total Loop Time: 110ns
Maximum Frequency: ~9 MHz
Critical Path:
Register → Mux → ALU → Register
25ns + 12ns + 60ns + 10ns + 3ns = 110ns
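The loop budget above is a simple sum, and the maximum clock is its reciprocal. This Python sketch reproduces that arithmetic. The text does not say what the final 3ns covers, so labeling it "margin" here is an assumption.

```python
# Critical-path stages from the loop timing above, in nanoseconds.
# The source does not explain the final 3ns; "margin" is an assumed label.
LOOP_STAGES = [
    ("Register CLK-to-Q (74HC574)", 25),
    ("A/M mux (74HC157)", 12),
    ("ALU incl. input conditioning", 60),
    ("Register setup", 10),
    ("Margin (assumed)", 3),
]


def loop_budget(stages):
    """Return (total delay in ns, maximum clock frequency in MHz)."""
    total_ns = sum(delay for _, delay in stages)
    return total_ns, 1000.0 / total_ns  # 1 / (total_ns * 1e-9) Hz, expressed in MHz


total_ns, fmax_mhz = loop_budget(LOOP_STAGES)
```

This gives 110ns and about 9.09 MHz. Editing one stage, for example a slower RAM in the M path, shows immediately how the ceiling moves.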
Connection 4: Memory in the Loop
RAM Connection to ALU X Input
Memory Access Path:
A Register holds address
│
▼
┌──────────┐
│ RAM │
│ AS6C4008 │
│ [A14:0] │
└────┬─────┘
│ D[15:0]
│ (read data)
│
├───────────────────┐
│ │
▼ ▼
To X_IN mux To Data Bus
(for ALU) (for registers)
│
▼
ALU computation
│
▼
Result back to
RAM or registers
The Complete M (Memory) Path
Detailed Memory Interface:
A Register:
A_Out[14:0] ────────────────┬───► RAM Address
│
│
RAM Chip (AS6C4008 × 2): │
│
┌──────────────────┐ │
│ Address[14:0] ◄──┴──────┘
│ │
│ /OE ◄─── READ_M │
│ /WE ◄─── NOT WRITE_M│
│ /CE ◄─── GND │
│ │
│ D[15:0] ◄──────┤
└──────┬──────────┘
│
│ M[15:0] (read data)
│
├──────────────────────────┬────────────┐
│ │ │
▼ ▼ ▼
To X_IN MUX To D_In To A_In
(ALU X input) (D reg) (A reg)
│ │ │
Select via Load via Load via
A_OR_M=1 LOAD_D LOAD_A
Memory Read Cycle
Operation: D = M
Timeline:
─────────
t=0ns: A register holds address
A_Out = 0x0100 (example)
t=10ns: RAM receives address
Address decode begins
t=60ns: RAM access complete
M[0x0100] data valid
M[15:0] = 0xABCD (example)
t=65ns: Data reaches X_IN mux
A_OR_M = 1 (select M)
X_IN = M = 0xABCD
t=80ns: Through zx/nx processing
X_PROC = 0xABCD
t=90ns: ALU passes X through unchanged
(zy=1, ny=1 → Y = 0xFFFF; f=0 → X AND 0xFFFF = X)
ALU_Out = 0xABCD
t=110ns: Clock edge
LOAD_D pulses
D ← 0xABCD ✓
Timing Diagram:
───────────────
CLK: ───╗ ╔════╗ ╔════
╚═════╝ ╚═════╝
A_Out: ──── 0x0100 ───────────── (stable)
M[15:0]: ─────┌─── 0xABCD ───────── (after access)
└─ 60ns access time
X_IN: ─────┌─── 0xABCD ───────── (mux selects M)
└─ 65ns
ALU_Out: ─────────┌── 0xABCD ────── (ALU passes through)
└─ 90ns
LOAD_D: ─────────────╗ ╔═══════ (pulse at clock)
╚════╝
D_Out: ──── old ────┴─── 0xABCD ── (updated)
Memory Write Cycle
Operation: M = D
Timeline:
─────────
t=0ns: A register holds address
D register holds data
A_Out = 0x0100
D_Out = 0x1234
t=10ns: RAM receives address
Address decode
t=50ns: Control signals stable
WRITE_M = 1 (active)
/WE goes LOW
t=60ns: ALU passes D through; ALU_Out drives the data bus
Data = 0x1234
t=110ns: Clock edge
WRITE_M returns to 0 and /WE returns HIGH
RAM latches the data at the end of the write pulse
RAM[0x0100] ← 0x1234 ✓
Timing Diagram:
───────────────
CLK: ───╗ ╔════╗ ╔════
╚═════╝ ╚═════╝
A_Out: ──── 0x0100 ───────────── (address)
D_Out: ──── 0x1234 ───────────── (data to write)
WRITE_M: ───────╗ ╔═══════ (write enable)
╚═════════╝
/WE: ════════╗ ╔═══════ (active LOW)
╚═════════╝
Data_Bus: ──── 0x1234 ─────────── (ALU_Out = D)
RAM[0x0100]: ──── old ───┴─── 0x1234 ── (written!)
The Tri-State Bus System
Why Tri-State Matters
Problem: Multiple Drivers
─────────────────────────
Without tri-state:
A_Out ──► ║
║
D_Out ──► ║──► CONFLICT!
║ (bus fight)
M_Out ──► ║
All trying to drive bus simultaneously!
With tri-state:
A_Out ──[/OE]──► ─┐
│
D_Out ──[/OE]──► ─┤──► Bus (only one active)
│
M_Out ──[/OE]──► ─┘
Only one driver active at a time
Others are "floating" (hi-Z)
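The tri-state rule can be sketched in Python as a bus-resolution function. Drivers with /OE LOW drive the bus, drivers with /OE HIGH are hi-Z (modeled as absent), and two active drivers are reported as contention. `resolve_bus` is a hypothetical helper, not part of the design.

```python
from typing import Optional, Sequence, Tuple


def resolve_bus(drivers: Sequence[Tuple[str, int, int]]) -> Optional[int]:
    """Resolve a shared 16-bit tri-state bus.

    Each driver is (name, oe_n, value). A driver with /OE LOW (oe_n=0)
    drives the bus; with /OE HIGH it is hi-Z. Returns the bus value,
    None if nobody drives (floating), and raises on a bus fight.
    """
    active = [(name, value) for name, oe_n, value in drivers if oe_n == 0]
    if len(active) > 1:
        names = ", ".join(name for name, _ in active)
        raise ValueError(f"bus contention between {names}")
    if not active:
        return None  # floating: every output is hi-Z
    return active[0][1] & 0xFFFF
```

Raising an error on contention mirrors the physical "bus fight". In hardware the result is not an exception but an undefined level and excess current.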
Bus Arbitration
Data Bus Control (16-bit shared bus):
┌──────────────────────────────┐
│ Main Data Bus │
│ (16 wires) │
└─┬───────┬────────┬───────┬──┘
│ │ │ │
▼ ▼ ▼ ▼
┌────┐ ┌────┐ ┌────┐ ┌────┐
│ A │ │ D │ │RAM │ │ALU │
│Reg │ │Reg │ │I/F │ │Out │
└─┬──┘ └─┬──┘ └─┬──┘ └─┬──┘
│ │ │ │
/OE_A /OE_D /OE_M /OE_ALU
Control Logic:
──────────────
Only ONE /OE can be active (LOW) at a time
Example: Loading D from A
├─ /OE_A = 0 (A drives bus)
├─ /OE_D = 1 (D listening, not driving)
├─ /OE_M = 1 (RAM disconnected)
├─ /OE_ALU = 1 (ALU disconnected)
└─ LOAD_D pulses → D captures A's value
74HC245 Bus Transceiver (for RAM)
RAM needs bidirectional access:
74HC245 (Bus Transceiver, 8 bits per chip; two are used for 16 bits)
┌──────────────┐
│ │
CPU Bus ◄┼──► A side │
[15:0] │ │
│ DIR ◄───┤─── READ/WRITE
│ │ (direction control)
│ │
RAM Bus ◄┼──► B side │
[15:0] │ │
│ /OE ◄────┤─── ENABLE
│ │ (3-state control)
└──────────────┘
DIR = 0: B → A (read from RAM)
DIR = 1: A → B (write to RAM)
/OE = 0: Transceiver active
/OE = 1: All outputs hi-Z
Pin Connections (74HC245):
──────────────────────────
Pin 1: DIR (direction control)
├─ 0 = B→A (RAM to CPU)
└─ 1 = A→B (CPU to RAM)
Pin 19: /OE (output enable)
├─ 0 = Active
└─ 1 = Hi-Z (disabled)
Pins 2-9: A side (CPU data bus)
Pins 11-18: B side (RAM data bus)
Control Logic:
──────────────
DIR = WRITE_M (1 when writing, 0 when reading)
/OE = 0 (always enabled when RAM selected)
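This Python sketch models the transceiver's direction logic. It treats two 8-bit 74HC245s in parallel as one 16-bit device and applies the control logic above: DIR = WRITE_M, with /OE tied LOW. The names `hc245_pair` and `ram_interface` are hypothetical, and `None` stands for "this side is not driven".

```python
from typing import Optional, Tuple

Drive = Tuple[Optional[int], Optional[int]]  # (onto CPU/A side, onto RAM/B side)


def hc245_pair(dir_pin: int, oe_n: int, cpu_bus: int, ram_bus: int) -> Drive:
    """Two 74HC245s (8 bits each) acting as one 16-bit transceiver.

    DIR=1: A->B, the CPU bus drives the RAM side (write).
    DIR=0: B->A, the RAM side drives the CPU bus (read).
    /OE=1: both sides hi-Z, reported here as None.
    """
    if oe_n:
        return (None, None)
    if dir_pin:
        return (None, cpu_bus & 0xFFFF)
    return (ram_bus & 0xFFFF, None)


def ram_interface(write_m: int, cpu_bus: int, ram_bus: int) -> Drive:
    """Control logic from the text: DIR = WRITE_M, /OE tied LOW."""
    return hc245_pair(dir_pin=write_m, oe_n=0, cpu_bus=cpu_bus, ram_bus=ram_bus)
```

During a write (WRITE_M = 1), the ALU result on the CPU side is passed to the RAM pins. During a read, the RAM word is passed back onto the CPU bus.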
Complete Operation Examples
Let me trace THREE complete operations showing all connections:
Example 1: D = D + 1 (Simple Feedback)
Instruction: D = D + 1
Control Signals:
────────────────
ALU_CTRL = 110111 (Y + 1: zx=1, nx=1, zy=0, ny=1, f=1, no=1)
A_OR_M = 0 (don't care, not using X)
LOAD_D = 1 (save to D)
LOAD_A = 0 (don't save to A)
WRITE_M = 0 (don't write memory)
Phase 1: D Register Output (0-25ns)
────────────────────────────────────
D_Out = 0x0005 (current value)
┌──────────┐
│ D Reg │
│ U3, U4 │
│ 74HC574 │
└────┬─────┘
│
│ Q outputs: 0x0005
│
▼
Phase 2: To ALU Y Input (25-40ns)
──────────────────────────────────
D_Out → Y input processing
D_Out[15:0] = 0x0005
│
▼
┌──────────┐
│ U9,11, │ Zero select (zy=0, pass through)
│ 13,15 │
│ 74HC157 │
└────┬─────┘
│
Y_ZERO = 0x0005
│
▼
┌──────────┐
│ U10,12, │ Negate (ny=1, increment needs this)
│ 14,16 │
│ 74HC86 │
└────┬─────┘
│
Y_PROC = 0xFFFA (= !0x0005)
│
▼
Phase 3: ALU Computation (40-100ns)
────────────────────────────────────
ALU_CTRL = 110111
zx=1, nx=1, zy=0, ny=1, f=1, no=1
Step by step:
1. zx=1 → X = 0x0000 (the A/M input is ignored)
2. nx=1 → X_PROC = !0x0000 = 0xFFFF
3. zy=0 → Y = D = 0x0005
4. ny=1 → Y_PROC = !0x0005 = 0xFFFA
5. f=1 → ADD: 0xFFFF + 0xFFFA = 0xFFF9 (carry out discarded)
6. no=1 → ALU_Out = !0xFFF9 = 0x0006 ✓
Why this gives Y + 1: in two's complement !Y = -Y - 1 and 0xFFFF = -1,
so the sum is -Y - 2, and !(-Y - 2) = Y + 1.
┌──────────────┐
│ ALU Core │
│ U17-U20 │ Adders compute: 0xFFFF + 0xFFFA
│ 74HC283 │ SUM = 0xFFF9
└──────┬───────┘
│
▼
┌──────────────┐
│ Output Stage │
│ U29-U32 │ no=1 → invert
│ 74HC86 │ !0xFFF9 = 0x0006
└──────┬───────┘
│
ALU_Out = 0x0006
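The six control bits traced above can be modeled as a short Python function: zero, negate, add or AND, then optionally negate the result. In this sketch X is the A/M input and Y is D, as in this article's design. The function name `hack_alu` is illustrative.

```python
MASK = 0xFFFF  # 16-bit datapath


def hack_alu(x: int, y: int, zx: int, nx: int, zy: int, ny: int, f: int, no: int) -> int:
    """Bit-level model of the ALU stages: zx/nx, zy/ny, f (ADD or AND), no."""
    x &= MASK
    y &= MASK
    if zx:
        x = 0                 # zero select on X
    if nx:
        x = ~x & MASK         # bitwise NOT, kept to 16 bits
    if zy:
        y = 0                 # zero select on Y
    if ny:
        y = ~y & MASK
    out = (x + y) & MASK if f else (x & y)  # f=1: ADD (carry discarded), f=0: AND
    if no:
        out = ~out & MASK
    return out
```

`hack_alu(0x1234, 0x0005, 1, 1, 0, 1, 1, 1)` returns 0x0006 whatever the X value is, because zx=1 discards X. The add code `0, 0, 0, 0, 1, 0` returns 0x0008 for X = 3 and Y = 5.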
Phase 4: Back to D Register (100-110ns)
────────────────────────────────────────
ALU_Out drives data bus
D register inputs receive 0x0006
t=110ns: Clock edge!
LOAD_D pulses
┌──────────┐
│ D Reg │
│ U3, U4 │
│ D inputs │ ◄─── 0x0006
└────┬─────┘
▲
│
CLK pulses
│
D_Out = 0x0006 ✓
Result Check:
─────────────
Before: D = 0x0005
After: D = 0x0006 ✓
Incremented successfully!
Example 2: D = A + D (Two Inputs)
Instruction: D = A + D
Initial Values:
───────────────
A = 0x0003
D = 0x0005
Control Signals:
────────────────
ALU_CTRL = 000010 (add X and Y)
zx=0, nx=0, zy=0, ny=0, f=1, no=0
A_OR_M = 0 (use A, not M)
LOAD_D = 1
LOAD_A = 0
WRITE_M = 0
Phase 1: Both Registers Output (0-25ns)
────────────────────────────────────────
┌──────────┐ ┌──────────┐
│ A Reg │ │ D Reg │
│ U1, U2 │ │ U3, U4 │
└────┬─────┘ └────┬─────┘
│ │
A_Out = 0x0003 D_Out = 0x0005
│ │
▼ ▼
Phase 2: A → X Input MUX (25-40ns)
───────────────────────────────────
A_Out and M both available to mux
A_OR_M = 0 → select A
┌──────────────────────┐
│ X Input MUX │
│ 4× 74HC157 │
│ │
│ A[15:0] ──┐ │
│ ├─ MUX ──► │ X_IN = 0x0003
│ M[15:0] ──┘ │
│ │
│ A_OR_M = 0 (select A)│
└──────────┬───────────┘
│
▼
X_IN = 0x0003
Phase 3: Input Conditioning (40-50ns)
──────────────────────────────────────
X path: zx=0, nx=0 → pass through
Y path: zy=0, ny=0 → pass through
X_PROC = 0x0003
Y_PROC = 0x0005
Phase 4: ALU Core (50-100ns)
─────────────────────────────
ADD operation (f=1):
X_PROC = 0x0003
+ Y_PROC = 0x0005
─────────────────
SUM = 0x0008
┌───────────────────┐
│ Adder Chain │
│ U17-U20 │
│ 74HC283 × 4 │
│ │
│ 0x0003 + 0x0005 │
│ = 0x0008 │
└─────────┬─────────┘
│
SUM = 0x0008
│
▼
┌───────────────────┐
│ Function Select │
│ U25-U28 │
│ 74HC157 × 4 │
│ │
│ f=1 → select ADD │
└─────────┬─────────┘
│
F_OUT = 0x0008
│
▼
┌───────────────────┐
│ Output Negation │
│ U29-U32 │
│ 74HC86 × 4 │
│ │
│ no=0 → pass thru │
└─────────┬─────────┘
│
ALU_Out = 0x0008
Phase 5: Save to D (100-110ns)
───────────────────────────────
ALU_Out → Data Bus → D_In
t=110ns: LOAD_D pulses
┌──────────┐
│ D Reg │
│ │
│ D ← 0x0008│
└──────────┘
Result Check:
─────────────
Before: A = 0x0003, D = 0x0005
After: A = 0x0003, D = 0x0008 ✓
Sum computed and saved!
Connection Summary:
───────────────────
A_Out → MUX → X_PROC → ALU ──┐
├─► ALU_Out → D_In
D_Out ──────► Y_PROC → ALU ──┘
Both inputs used simultaneously!
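The adder chain in this example can be sketched as four 4-bit adders. Each chip's carry-out feeds the next chip's carry-in, so the 16-bit add ripples through U17-U20. The behavioral models `hc283` and `adder_chain` are hypothetical, and they assume the lowest chip's carry-in is tied LOW.

```python
from typing import Tuple


def hc283(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Behavioral 74HC283: 4-bit binary adder. Returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def adder_chain(x: int, y: int) -> Tuple[int, int]:
    """Four 74HC283s (U17-U20), carry-out of each chip feeding the next."""
    carry = 0  # carry-in of the lowest chip tied LOW (assumption)
    total = 0
    for shift in (0, 4, 8, 12):
        nibble, carry = hc283(x >> shift, y >> shift, carry)
        total |= nibble << shift
    return total, carry  # 16-bit sum and final carry (ignored by the ALU)
```

`adder_chain(0x0003, 0x0005)` returns `(0x0008, 0)`. `adder_chain(0xFFFF, 0xFFFA)` returns `(0xFFF9, 1)`; that is the discarded carry seen in the D + 1 example. A carry that ripples through all four chips sets the worst-case adder delay in the timing budget.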
Example 3: M = D + M (Memory Involved)
Instruction: M = D + M
Initial Values:
───────────────
A = 0x0100 (points to RAM address 256)
D = 0x0007
RAM[256] = 0x0003
Control Signals:
────────────────
ALU_CTRL = 000010 (add)
A_OR_M = 1 (use M, not A)
LOAD_D = 0
LOAD_A = 0
WRITE_M = 1 (save to memory!)
Phase 1: Memory Read (0-60ns)
──────────────────────────────
A register outputs address:
┌──────────┐
│ A Reg │
│ │
│ A = 0x0100│
└────┬─────┘
│
A_Out = 0x0100
│
▼
┌──────────┐
│ RAM │
│ AS6C4008 │
│ │
│ Addr ←───┤ 0x0100
│ │
│ /OE = 0 │ (read enabled)
│ /WE = 1 │ (not writing yet)
└────┬─────┘
│
(60ns access time)
│
M[256] = 0x0003
│
▼
Phase 2: M → X Input MUX (60-75ns)
───────────────────────────────────
A_Out and M both available
A_OR_M = 1 → select M
┌──────────────────────┐
│ X Input MUX │
│ │
│ A = 0x0100 ──┐ │
│ ├─MUX ─►│ X_IN = 0x0003
│ M = 0x0003 ──┘ │
│ │
│ A_OR_M = 1 (select M)│
└──────────┬───────────┘
│
X_IN = 0x0003
(memory value!)
Phase 3: Both Inputs to ALU (75-100ns)
───────────────────────────────────────
X path: M value (via mux)
Y path: D value (direct)
X_PROC = 0x0003 (from M)
+ Y_PROC = 0x0007 (from D)
─────────────────
ALU_Out = 0x000A
Phase 4: Write to Memory (100-110ns)
─────────────────────────────────────
ALU_Out goes to data bus
A still holds address (0x0100)
t=110ns: Clock edge
WRITE_M = 1 has held RAM /WE LOW
RAM latches the data as /WE returns HIGH
┌──────────┐
│ RAM │
│ │
│ Addr = 0x0100 (from A)
│ Data = 0x000A (from ALU)
│ │
│ /WE ─────┘ LOW (write!)
│ │
│ RAM[256] ← 0x000A │
└──────────┘
Result Check:
─────────────
Before:
├─ A = 0x0100 (address)
├─ D = 0x0007 (data)
└─ RAM[256] = 0x0003
After:
├─ A = 0x0100 (unchanged)
├─ D = 0x0007 (unchanged)
└─ RAM[256] = 0x000A ✓
(0x0003 + 0x0007 = 0x000A)
Connection Summary:
───────────────────
RAM[A] → M → MUX → X_PROC → ALU ──┐
├─► ALU_Out
D_Out ─────────► Y_PROC → ALU ────┘ │
│
A_Out ──────────────────────────────┐ │
▼ │
RAM Address │
│ │
RAM ◄─────────┘
(write back)
Read from memory, compute, write back!
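The read-modify-write cycle can be sketched in Python with a dictionary standing in for RAM. The function name and the dictionary are hypothetical. Treating unwritten words as 0 is a simplifying assumption; real SRAM powers up with undefined contents.

```python
def m_equals_d_plus_m(a: int, d: int, ram: dict) -> int:
    """Hypothetical model of one M=D+M cycle; returns the ALU result."""
    address = a & 0x7FFF          # A_Out selects the RAM word (15-bit address)
    m = ram.get(address, 0)       # read phase: /OE LOW, /WE HIGH (unwritten words read as 0 here)
    alu_out = (d + m) & 0xFFFF    # X = M (A_OR_M = 1), Y = D, f = 1 (ADD), 16-bit wraparound
    ram[address] = alu_out        # write phase: WRITE_M = 1 drives /WE LOW
    return alu_out
```

With A = 0x0100, D = 0x0007 and RAM[0x0100] = 0x0003, the call leaves RAM[0x0100] = 0x000A, matching the result check above. Read and write use the same address because A is not reloaded.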
PCB Routing Strategies
Layer Assignment Philosophy
TOP LAYER (F.Cu): Signal Routing
─────────────────────────────────
├─ Register outputs to ALU inputs
├─ ALU output to register inputs
├─ Control signals (LOAD_A, LOAD_D, etc)
├─ Short, direct point-to-point connections
└─ Horizontal preference
BOTTOM LAYER (B.Cu): Power + Vertical Signals
──────────────────────────────────────────────
├─ Ground plane (large pour)
├─ +5V plane (where needed)
├─ Vertical signal crossings
├─ Return paths for all signals
└─ Memory data bus (needs more routing space)
Critical Trace Groups
Group 1: A_Out to X MUX (16 traces)
────────────────────────────────────
From: A Register (U1, U2)
To: X Input MUX (74HC157 × 4)
Routing:
├─ Length: 40-60mm typical
├─ Width: 0.4mm (15 mil)
├─ Spacing: 0.3mm (12 mil)
├─ Layer: Top (F.Cu)
├─ Keep parallel
└─ No vias
Termination: None needed (short traces)
Group 2: D_Out to Y Processing (16 traces)
───────────────────────────────────────────
From: D Register (U3, U4)
To: Y Input Processing (74HC157 × 4)
Routing:
├─ Length: 40-60mm typical
├─ Width: 0.4mm
├─ Spacing: 0.3mm
├─ Layer: Top (F.Cu)
├─ Straight run
└─ Shortest path possible
Group 3: ALU_Out to Registers (16 traces)
──────────────────────────────────────────
From: ALU Output (U29-U32)
To: A_In, D_In, M_In
Routing:
├─ Length: 50-80mm
├─ Width: 0.4mm
├─ Spacing: 0.3mm
├─ Layer: Top and Bottom (may need vias)
├─ Fan-out to three destinations
└─ Star topology from ALU
Group 4: Control Signals (6 traces)
────────────────────────────────────
Signals: LOAD_A, LOAD_D, WRITE_M, A_OR_M, etc.
Routing:
├─ Length: Variable (star topology)
├─ Width: 0.4mm
├─ Layer: Top (F.Cu)
├─ Keep away from data buses
├─ No crosstalk with data
└─ Buffer if fanout > 4
Example Board Layout
Physical arrangement (10cm × 15cm board):
0mm ┌────────────────────────────────────┐
│ [Connectors: J1-J6] │
├────────────────────────────────────┤
20mm │ [A Register: U1-U2] │
│ ║ 16 traces │
30mm │ ║ │
│ ▼ │
40mm │ [X Input MUX: 74HC157 × 4] │
│ │
│ [D Register: U3-U4] │
│ ║ 16 traces │
60mm │ ▼ │
│ [Y Input Processing] │
├────────────────────────────────────┤
80mm │ [ALU CORE] │
│ Adders, AND gates, MUXes │
│ (largest section - 20+ ICs) │
120mm ├────────────────────────────────────┤
│ [Output Stage] │
│ ║ │
│ ║ 16 traces (result bus) │
│ ║ │
│ ╚═══╦═══╗ │
│ ║ ║ │
140mm │ To A To D To RAM │
150mm └────────────────────────────────────┘
Trace Routing:
──────────────
A→MUX: Vertical, left side
D→Y: Vertical, right side
ALU→Regs: Fan-out from center
Control: Horizontal, top layer
Power: Bottom layer (flood)
Signal Integrity Considerations
Capacitive Loading
Each output drives multiple inputs:
Example: D_Out[0] fanout
────────────────────────
D Register U3.Q0 (pin 19)
│
├─► Y Input MUX (U9 pin 2)
│ Load: 1pF
│
├─► (Future expansion points)
│
Total capacitance: ~5pF
Output drive capability:
74HC574 can drive 50pF easily
No buffer needed!
Critical case: ALU_Out fanout
──────────────────────────────
ALU_Out[0] from U29 (pin 3)
│
├─► A Register (U1 pin 2) 1pF
├─► D Register (U3 pin 2) 1pF
├─► RAM interface (transceiver) 2pF
├─► Test point 1pF
│
Total: ~5pF + trace capacitance
Trace capacitance:
50mm trace @ 100pF/m = 5pF
Total: 10pF - still okay!
Rise time calculation:
R_source × C_load = 100Ω × 10pF = 1ns
Fast enough for MHz operation ✓
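The loading arithmetic above can be reproduced in Python. The 100Ω source resistance and the 100pF/m trace figure are the text's assumptions, not measured values. The ln(9) factor converts the RC time constant into a 10%-90% rise time for a first-order RC edge.

```python
import math


def trace_capacitance(length_m: float, pf_per_m: float = 100.0) -> float:
    """Trace capacitance in farads, using the 100pF/m rule of thumb from the text."""
    return length_m * pf_per_m * 1e-12


def rc_rise_time(r_source_ohms: float, c_load_f: float) -> tuple:
    """Return (RC time constant, 10%-90% rise time) in seconds.

    A first-order RC edge reaches 10% at about 0.105*RC and 90% at about
    2.303*RC, so the 10%-90% rise time is ln(9)*RC, roughly 2.2*RC.
    """
    tau = r_source_ohms * c_load_f
    return tau, math.log(9) * tau


c_total = 5e-12 + trace_capacitance(0.050)  # ~5pF of inputs + 50mm of trace
tau, t_rise = rc_rise_time(100.0, c_total)
```

This gives a 1ns time constant and a rise time of about 2.2ns. That is still tiny compared with a cycle of roughly 100ns.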
Crosstalk Prevention
Problem: Parallel traces coupling
──────────────────────────────────
When traces run parallel:
Signal A ────────────────
│ (coupled capacitance)
Signal B ────────────────
Fast edge on A induces noise on B!
Solution: Proper spacing
────────────────────────
Signal A ────────────────
(0.3mm gap)
Signal B ────────────────
Crosstalk < 5% with 0.3mm spacing
(at 1MHz, 0.4mm traces)
Additional techniques:
──────────────────────
1. Ground trace between critical signals
Data0 ──── GND ──── Data1 ────
2. Different layers for different buses
Top: A_Out bus
Bottom: D_Out bus (no parallel run)
3. Perpendicular crossing when needed
────────── (A_Out, horizontal)
│
│ (D_Out, vertical)
│
Ground Bounce (Power Integrity)
Problem: Simultaneous Switching
────────────────────────────────
When all 16 ALU outputs switch:
├─ 16 × 5mA = 80mA current spike
├─ Through ground inductance
├─ Creates voltage bounce
└─ Can cause false triggering!
Ground bounce voltage:
V = L × (di/dt)
= 10nH × (80mA / 2ns)
= 0.4V bounce!
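The ground-bounce estimate is V = L × di/dt, with di equal to the total current of all outputs switching together. The 10nH ground inductance, 5mA per output and 2ns edge in this Python sketch are the text's example values, not measurements.

```python
def ground_bounce_v(inductance_h: float, outputs: int, amps_per_output: float, edge_s: float) -> float:
    """Estimate ground bounce as V = L * di/dt for outputs switching together."""
    di = outputs * amps_per_output  # total simultaneous switching current
    return inductance_h * di / edge_s


all_16 = ground_bounce_v(10e-9, 16, 5e-3, 2e-9)  # 10nH, 16 x 5mA, 2ns edge
half = ground_bounce_v(10e-9, 8, 5e-3, 2e-9)     # only 8 outputs switching
```

The model is linear, so halving the number of simultaneously switching outputs halves the bounce (0.4V becomes 0.2V). Lowering the effective inductance with close decoupling has the same kind of effect.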
Solution: Decoupling Capacitors
────────────────────────────────
Place 0.1µF cap at EVERY IC:
VCC ─────┬───── IC
│
┌┴┐ 0.1µF
│ │ (very close!)
└┬┘
│
GND ─────┴───── IC
Capacitor provides local current:
├─ IC switches → draws current from cap
├─ Cap is close (low inductance)
├─ Ground bounce reduced to ~50mV
└─ Safe for operation! ✓
Additional bulk caps:
├─ 10µF at power entry
├─ 100µF for entire board
└─ Stabilizes power supply
Performance Analysis
Maximum Operating Frequency
Critical Path Delay Budget:
───────────────────────────
Register output: 25ns
├─ 74HC574 CLK-to-Q
Through input mux: 12ns
├─ 74HC157 propagation
Input conditioning: 15ns
├─ Zero select (74HC157)
├─ Negate (74HC86)
ALU core: 60ns
├─ Preprocessing (already counted)
├─ Adder carry chain: 40ns
├─ Function select: 10ns
├─ Output negate: 10ns
Register setup: 12ns
├─ 74HC574 setup time
Clock skew allowance: 6ns
├─ Clock distribution variation
TOTAL: 130ns
Maximum frequency: 1 / 130ns ≈ 7.7 MHz
Conservative design: 150ns cycle (~6.7 MHz), leaving ~20ns of margin
Optimistic operation: 125ns cycle (8 MHz), only if real parts beat the budgeted delays
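Candidate clock periods can be checked against the budget by computing slack, that is, period minus critical path. This Python sketch does that and also converts a period to MIPS, assuming one cycle per instruction as described for this design. The stage names are copied from the budget; the function names are illustrative.

```python
# Critical-path budget from the table above, in nanoseconds
BUDGET_NS = {
    "register CLK-to-Q": 25,
    "input mux": 12,
    "input conditioning": 15,
    "ALU core": 60,
    "register setup": 12,
    "clock skew": 6,
}
CRITICAL_PATH_NS = sum(BUDGET_NS.values())  # 130


def slack_ns(period_ns: float) -> float:
    """Positive slack means the cycle is long enough for the critical path."""
    return period_ns - CRITICAL_PATH_NS


def mips(period_ns: float, cycles_per_instruction: float = 1.0) -> float:
    """Millions of instructions per second for a given clock period."""
    return 1000.0 / period_ns / cycles_per_instruction
```

A 100ns period gives -30ns of slack, 125ns gives -5ns and 150ns gives +20ns. Only the longest of these periods meets the worst-case budget.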
Instruction Throughput
Instructions Per Second:
────────────────────────
At 8 MHz clock:
├─ 8,000,000 cycles/second
├─ Typical instruction: 1 cycle
└─ 8 MIPS (Million Instructions Per Second)
Every instruction takes one cycle in this design:
├─ Memory access: 1 cycle
├─ Jump: 1 cycle (PC load)
├─ Any ALU operation: 1 cycle
└─ No multi-cycle instructions at all
Effective throughput: 1 instruction per cycle, so ~8 MIPS at 8 MHz
Comparison:
───────────
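Architecture Frequency MIPS (approx.)
────────────────────────────────────────
Hack (our design) 8 MHz ~8
Original 8080 2 MHz ~0.3
MOS 6502 1 MHz ~0.4
Z80 4 MHz ~0.6
68000 8 MHz ~1.4
These figures are not directly comparable: a Hack instruction does far less
work than a typical 8080, Z80 or 68000 instruction (no multiply, no
addressing modes, one operation per instruction), so a task needs many more
Hack instructions. The simple architecture does buy one instruction per cycle.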
Summary: The Connection Philosophy
Key Design Principles
1. Direct Paths Where Possible
D → ALU Y input: Direct
└─ No multiplexing needed
└─ Simplest, fastest path
2. Multiplex Only When Necessary
A vs M → ALU X input: Multiplexed
└─ Need to choose between two sources
└─ One 74HC157 mux per 4 bits
3. Shared Result Bus
ALU → Registers: Common bus
└─ Each destination loads only when its control signal is active
└─ Several destinations can load the same result at once (e.g. AD=A+1)
4. Synchronous Operation
Everything clocked together
└─ Predictable timing
└─ All registers update on the same edge, avoiding races between them
5. Feedback Loop
Register → ALU → Register
└─ Computational loop
└─ Enables iterative calculations
The Elegant Integration
The register-ALU connections show how simple components create complex computation:
Just three types of connections:
├─ Register outputs (sources)
├─ ALU inputs (via mux when needed)
└─ ALU output (feedback to registers)
These three connections enable:
├─ Arithmetic (+, -, increment, decrement)
├─ Logic (AND, OR, NOT)
├─ Memory access (load, store)
├─ Program flow (jumps based on ALU flags)
└─ Complete computation! ✓
The connections are the nervous system of the computer: they carry data and enable the computational feedback loop that makes programming possible!