2025-11-21 ~ 27 min read


How Registers Connect to the ALU: A Complete Deep Dive

Let me explain the intricate dance between registers and the ALU: this is where the magic of computation actually happens!


The Big Picture: Data Flow Architecture

Overview of Connections

The Complete Data Path:

    ┌─────────┐       ┌─────────┐       ┌─────────┐
    │    A    │       │    D    │       │   PC    │
    │ Register│       │ Register│       │ Counter │
    └────┬────┘       └────┬────┘       └────┬────┘
         │                 │                  │
         │ A_Out[15:0]     │ D_Out[15:0]      │ PC_Out[14:0]
         │                 │                  │
         │                 │                  ▼
         │                 │            ┌──────────┐
         │                 │            │   ROM    │
          │                 │            │ (Instr.) │
         │                 │            └────┬─────┘
         │                 │                 │
         │                 │                 │ Inst[15:0]
         │                 │                 │
         │                 │                 ▼
         │                 │           ┌──────────┐
         │                 │           │ Control  │
         │                 │           │   Unit   │
         │                 │           └────┬─────┘
         │                 │                │
         │                 │         Control Signals
         │                 │                │
         ▼                 ▼                ▼
    ┌────────────────────────────────────────────┐
    │                                            │
    │              ALU (16-bit)                  │
    │                                            │
    │  X Input ◄── [A or M]   Y Input ◄── [D]    │
    │                                            │
    └─────────────────────┬──────────────────────┘
                          │
                          │ ALU_Out[15:0]
                          │ (Result)
                          ▼
             Data Bus [15:0] ◄────► RAM
                          │
                ┌─────────┼─────────┐
                │         │         │
                ▼         ▼         ▼
              To A      To D      To M
           (LOAD_A)   (LOAD_D)  (WRITE_M)

The key insight: Registers feed the ALU, and ALU results feed back to registers. This creates a computational feedback loop.


Connection 1: D Register to ALU (Y Input)

The Direct Connection

This is the simplest connection:

Physical Connection:
────────────────────

D Register (U3, U4: 74HC574s)
    │
    │ Q pins output D_Out[15:0]
    │
    ├─ U3.Q0 (pin 19) ──► D_Out[0]  ───┐
    ├─ U3.Q1 (pin 18) ──► D_Out[1]  ───┤
    ├─ U3.Q2 (pin 17) ──► D_Out[2]  ───┤
    ├─ U3.Q3 (pin 16) ──► D_Out[3]  ───┤
    ├─ U3.Q4 (pin 15) ──► D_Out[4]  ───┤
    ├─ U3.Q5 (pin 14) ──► D_Out[5]  ───┤
    ├─ U3.Q6 (pin 13) ──► D_Out[6]  ───┤
    ├─ U3.Q7 (pin 12) ──► D_Out[7]  ───┤
    │                                  │
    ├─ U4.Q0 (pin 19) ──► D_Out[8]  ───┤
    ├─ U4.Q1 (pin 18) ──► D_Out[9]  ───┤
    ├─ U4.Q2 (pin 17) ──► D_Out[10] ───┤
    ├─ U4.Q3 (pin 16) ──► D_Out[11] ───┤
    ├─ U4.Q4 (pin 15) ──► D_Out[12] ───┤
    ├─ U4.Q5 (pin 14) ──► D_Out[13] ───┤
    ├─ U4.Q6 (pin 13) ──► D_Out[14] ───┤
    └─ U4.Q7 (pin 12) ──► D_Out[15] ───┤
                                       │
                             16 wires  │
                                       ▼
                              ALU Y Input Processing
                              (U9-U16: the Y path)
                                       │
                              ┌────────▼───────────┐
                              │ Zero Select (zy)   │
                              │ 74HC157 × 4        │
                              └────────┬───────────┘
                                       │
                              ┌────────▼───────────┐
                              │ Negate (ny)        │
                              │ 74HC86 × 4         │
                              └────────┬───────────┘
                                       │
                                   Y_PROC[15:0]
                                   (to ALU core)

Why This Is Simple

Advantages:
──────────
✓ Direct connection (no multiplexing)
✓ D is ALWAYS available to ALU
✓ No switching delays
✓ Clean signal path
✓ Easy to route on PCB

The D register has one job:
└─ Feed data to ALU Y input

Always active, always available!
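To make the zy/ny stage concrete, here is a minimal Python sketch of one input-conditioning stage. It is a software model, not part of the board: `condition_input` and `MASK16` are illustrative names, the 74HC157 zero-select is modeled as "force to 0", and the 74HC86 negate is modeled as XOR with all ones. The same function describes the zx/nx stage on the X side.

```python
MASK16 = 0xFFFF  # every bus in this datapath is 16 bits wide


def condition_input(value: int, zero: bool, negate: bool) -> int:
    """Model one ALU input-conditioning stage (zy/ny here, zx/nx on the X side).

    zero:   the 74HC157 stage selects 0 instead of the incoming value.
    negate: the 74HC86 stage XORs every bit with 1 (bitwise NOT).
    """
    v = 0 if zero else value & MASK16
    if negate:
        v ^= MASK16
    return v


d_out = 0x0005  # D register output feeding the Y path
print(hex(condition_input(d_out, zero=False, negate=False)))  # 0x5
print(hex(condition_input(d_out, zero=False, negate=True)))   # 0xfffa
print(hex(condition_input(d_out, zero=True, negate=True)))    # 0xffff
```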

Signal Characteristics

Electrical Properties:
─────────────────────

Output Drive Strength (74HC574, VCC = 4.5V):
├─ Rated output current: ±6mA (bus-driver outputs)
├─ Output voltage HIGH: ≥3.98V at 6mA (≥4.4V at 20µA)
├─ Output voltage LOW: ≤0.26V at 6mA (≤0.1V at 20µA)
└─ Strong enough to drive multiple inputs

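Loading:
├─ Each D_Out line drives:
│   └─ 1× 74HC157 input (zy selection)
│      (the 74HC86 ny stage is fed by the mux output, not by D_Out)
├─ CMOS inputs draw at most ~1µA, so the DC load is negligible
└─ What matters is capacitance (~3.5pF per input): well within 74HC574 capability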

Propagation Time:
├─ Register output: 25ns
├─ To Y_PROC: +15ns (through zy/ny)
└─ Total: ~40ns D register to ALU core

PCB Routing Strategy

Recommended Layout (Top View):

    ┌──────────────────────────┐
    │    D Register            │
    │    U3      U4            │
    │   [7:0]   [15:8]         │
    └─────┬──────┬─────────────┘
          │      │
          │      │  16 parallel traces
          │      │  Width: 0.4mm (15 mil)
          │      │  Spacing: 0.4mm
          │      │
    ┌─────▼──────▼─────────────┐
    │    Y Input Conditioning  │
    │    U9-U16                │
    │    (ALU preprocessing)   │
    └──────────────────────────┘

Trace Length: ~50mm typical
Propagation Delay: ~0.3ns (negligible)

Bus Routing:
├─ Group all 16 traces together
├─ Keep parallel to maintain timing
├─ Avoid crossing other buses
├─ Run on top layer (short path)
└─ No vias needed (stay on one layer)
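A quick check of the trace-delay figure: a signal on a PCB trace travels at roughly c/√ε_eff. This sketch assumes ε_eff ≈ 3.5 for an outer-layer FR-4 trace; that value is an assumption and depends on the actual stackup. `trace_delay_ns` is an illustrative helper.

```python
import math

C_M_PER_S = 299_792_458  # speed of light in vacuum, m/s


def trace_delay_ns(length_mm: float, eps_eff: float) -> float:
    """Propagation delay of a PCB trace: length / (c / sqrt(eps_eff))."""
    velocity = C_M_PER_S / math.sqrt(eps_eff)      # m/s on the board
    return (length_mm / 1000.0) / velocity * 1e9   # seconds -> ns


# ~50mm D_Out trace, assuming eps_eff ≈ 3.5 (outer-layer FR-4, an assumption)
print(f"{trace_delay_ns(50, 3.5):.2f} ns")  # 0.31 ns
```

At roughly 0.3ns for 50mm, the trace is far faster than any single 74HC gate in this path, so treating it as negligible is safe.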

Connection 2: A Register to ALU (X Input via Multiplexer)

This is more complex because we have a choice!

The A/M Selection Problem

The Dilemma:
────────────

ALU X input can be:
├─ A Register value (for calculations with A)
└─ Memory[A] value (for calculations with RAM data)

Example showing why we need both:

Operation 1: A = A + 1
├─ Need A register value
└─ X input = A

Operation 2: D = M + 1
├─ Need Memory value (RAM[A])
└─ X input = M (which is RAM[A])

Solution: MULTIPLEXER!
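In software terms, the multiplexer is a single conditional. A minimal sketch; `select_x_input` and the dictionary standing in for RAM are illustrative, not part of the design:

```python
def select_x_input(a_out: int, m: int, a_or_m: int) -> int:
    """2-to-1 selection done by the 74HC157s in front of the ALU's X input.

    a_or_m = 0 -> A register value, a_or_m = 1 -> RAM[A] value.
    """
    if a_or_m not in (0, 1):
        raise ValueError("A_OR_M is a single control bit")
    return (m if a_or_m else a_out) & 0xFFFF


a_reg = 0x0100           # A register (also the RAM address)
ram = {0x0100: 0x0003}   # hypothetical RAM contents for the example
m_val = ram[a_reg]       # M is always RAM[A]

print(hex(select_x_input(a_reg, m_val, 0)))  # 0x100  (operations like A = A + 1)
print(hex(select_x_input(a_reg, m_val, 1)))  # 0x3    (operations like D = M + 1)
```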

The Multiplexer Network

Complete A/M Selection Circuit:

From A Register:                     From RAM:
A_Out[15:0]                          M[15:0]
    │                                    │
    │                                    │
    ├─────────────┬──────────────────────┤
    │             │                      │
    ▼             ▼                      ▼
┌───────────┐ ┌───────────┐       ┌───────────┐
│  74HC157  │ │  74HC157  │  ...  │  74HC157  │
│  (4 bits) │ │  (4 bits) │       │  (4 bits) │
│           │ │           │       │           │
│  A[3:0] ─►│1A          │       │           │
│  M[3:0] ─►│1B          │       │           │
│           │            │       │           │
│ A_OR_M ──►│SELECT      │◄──────┴───────────┘
│           │            │    (shared control)
│  GND ────►│/EN         │
│           │            │
│         1Y│            │
└───────┬───┘            │
        │                │
        ▼                ▼
   X_IN[3:0]        X_IN[15:4]
        │                │
        └────────┬───────┘


         X Input to ALU
         (X_IN[15:0])


    ┌────────────────────┐
    │ Zero Select (zx)   │
    │ 74HC157 × 4        │
    └────────┬───────────┘

    ┌────────▼───────────┐
    │ Negate (nx)        │
    │ 74HC86 × 4         │
    └────────┬───────────┘

        X_PROC[15:0]
        (to ALU core)

Control Signal: A_OR_M

This comes from the instruction decoder:

Instruction Bit 12 (a-bit):
──────────────────────────

From Hack instruction format:
┌──┬──┬──┬─┬──────┬───┬───┐
│15│14│13│a│cccccc│ddd│jjj│
└──┴──┴──┴─┴──────┴───┴───┘

          ▲
       This bit!

Bit 12 = a-bit = A_OR_M control

a = 0 → Use A register
a = 1 → Use Memory[A]

Examples:
─────────
D = A     → a = 0 (use A)
D = M     → a = 1 (use M)
D = A + 1 → a = 0 (use A)
D = M + 1 → a = 1 (use M)
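Pulling the a-bit out of a 16-bit instruction word is a shift and a mask. The encodings below are the standard Hack ones for the four examples (dest = D, no jump); `a_bit` and `is_c_instruction` are illustrative helper names.

```python
def a_bit(instruction: int) -> int:
    """Return bit 12 of a Hack C-instruction: the a-bit that drives A_OR_M."""
    return (instruction >> 12) & 1


def is_c_instruction(instruction: int) -> bool:
    """C-instructions have bit 15 set; A-instructions have it clear."""
    return (instruction >> 15) & 1 == 1


# Standard Hack encodings: 111 a cccccc ddd jjj
examples = {
    "D=A":   0b1110110000010000,
    "D=M":   0b1111110000010000,
    "D=A+1": 0b1110110111010000,
    "D=M+1": 0b1111110111010000,
}
for asm, word in examples.items():
    assert is_c_instruction(word)
    print(f"{asm:6} a={a_bit(word)}")  # a=0 selects A, a=1 selects M
```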

Detailed Connection Schematic

Let me show ONE 4-bit slice in detail:

                ┌─────────────────────────┐
A_Out[0] ──────►│ 2  1A              1Y 4 │──► X_IN[0]
M[0]     ──────►│ 3  1B                   │
A_Out[1] ──────►│ 5  2A              2Y 7 │──► X_IN[1]
M[1]     ──────►│ 6  2B                   │
A_Out[2] ──────►│ 11 3A              3Y 9 │──► X_IN[2]
M[2]     ──────►│ 10 3B                   │
A_Out[3] ──────►│ 14 4A             4Y 12 │──► X_IN[3]
M[3]     ──────►│ 13 4B                   │
                │                         │
A_OR_M   ──────►│ 1  SELECT    U_MUX      │
GND      ──────►│ 15 /EN       74HC157    │
                └─────────────────────────┘

X_IN[3:0] continues to zx/nx processing
(X input conditioning).

Pin Connections for 74HC157 (X input mux):
──────────────────────────────────────────
Pin 1:  SELECT (A_OR_M from instruction)
Pin 2:  1A (A_Out[0])
Pin 3:  1B (M[0])
Pin 4:  1Y (X_IN[0])
Pin 5:  2A (A_Out[1])
Pin 6:  2B (M[1])
Pin 7:  2Y (X_IN[1])
Pin 8:  GND
Pin 9:  3Y (X_IN[2])
Pin 10: 3B (M[2])
Pin 11: 3A (A_Out[2])
Pin 12: 4Y (X_IN[3])
Pin 13: 4B (M[3])
Pin 14: 4A (A_Out[3])
Pin 15: /EN (tied to GND - always enabled)
Pin 16: VCC

Repeat this 4 times for full 16 bits!
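Because the four chips differ only in which bits they carry, the wiring can be generated rather than drawn by hand. This sketch lists the signal connections from the pin table above; the chip names `U_MUX0` to `U_MUX3` are hypothetical reference designators, and the power pins (8 and 16) are left out.

```python
# (A-input pin, B-input pin, Y-output pin) for the four sections of a 74HC157
SECTIONS = [(2, 3, 4), (5, 6, 7), (11, 10, 9), (14, 13, 12)]
SELECT_PIN, ENABLE_PIN = 1, 15


def x_mux_netlist(chips):
    """Return (chip, pin, net) tuples for a 16-bit A/M mux built from 74HC157s."""
    nets = []
    for chip_index, chip in enumerate(chips):
        nets.append((chip, SELECT_PIN, "A_OR_M"))   # shared select line
        nets.append((chip, ENABLE_PIN, "GND"))      # /EN tied low: always enabled
        for section, (pin_a, pin_b, pin_y) in enumerate(SECTIONS):
            bit = 4 * chip_index + section
            nets.append((chip, pin_a, f"A_Out[{bit}]"))
            nets.append((chip, pin_b, f"M[{bit}]"))
            nets.append((chip, pin_y, f"X_IN[{bit}]"))
    return nets


# Hypothetical designators for the four mux chips
for row in x_mux_netlist(["U_MUX0", "U_MUX1", "U_MUX2", "U_MUX3"])[:8]:
    print(row)
```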

Timing Through the Multiplexer

Signal Propagation Path:

Step 1: Sources stable
──────────────────────
A_Out[15:0] valid at t=0
M[15:0] valid at t=0
A_OR_M control set at t=0

Step 2: Multiplexer selection
─────────────────────────────
74HC157 propagation: ~12ns

If A_OR_M = 0:
├─ A inputs selected
├─ A_Out → Y outputs
└─ X_IN = A_Out (after 12ns)

If A_OR_M = 1:
├─ B inputs selected
├─ M → Y outputs
└─ X_IN = M (after 12ns)

Step 3: To X conditioning
─────────────────────────
X_IN → zx/nx processing
Additional delay: ~15ns

Step 4: To ALU core
───────────────────
X_PROC arrives at ALU
Total delay: 27ns (12 + 15)

Timing Budget:
──────────────
A_Out stable:     t = 0ns
Through MUX:      t = 12ns
Through zx/nx:    t = 27ns
Ready for ALU:    t = 27ns ✓

This fits well within the clock cycle!

Why Not Just Two Separate Inputs?

Bad Idea: Two separate X inputs to ALU

Hypothetical Design:
────────────────────
ALU with:
├─ X_A input (from A register)
├─ X_M input (from Memory)
└─ Select inside ALU

Problems:
─────────
✗ Need to route TWO 16-bit buses to ALU
✗ 32 wires instead of 16
✗ More complex ALU internal routing
✗ Harder PCB layout
✗ More crosstalk between buses
✗ Larger board area needed

Good Design: Mux BEFORE ALU
───────────────────────────
✓ Only ONE 16-bit bus to ALU
✓ 16 wires total
✓ Simpler ALU (no internal mux)
✓ Easier PCB routing
✓ Cleaner signal integrity
✓ Standard CPU design pattern

This is why operand multiplexers in front of
the ALU are a standard feature of CPU datapaths!

Connection 3: ALU Output Back to Registers

This is the feedback path - how results get saved!

The Result Distribution Network

ALU produces one result, but it can go to THREE places:

                ALU_Out[15:0]
                      │
                      │ (16-bit result bus)
                      │
         ┌────────────┼────────────┐
         │            │            │
         ▼            ▼            ▼
    ┌────────┐  ┌────────┐  ┌────────┐
    │   A    │  │   D    │  │  RAM   │
    │Register│  │Register│  │ [A]    │
    │        │  │        │  │        │
    │D inputs│  │D inputs│  │D input │
    └────┬───┘  └────┬───┘  └────┬───┘
         │           │           │
    LOAD_A      LOAD_D      WRITE_M
    (control)   (control)   (control)

Any combination of destinations can be active in the same cycle
(none, one, or several, as selected by the instruction's d-bits)

Physical Bus Connection

ALU Output Stage:
─────────────────

From ALU (U29-U32 output XOR gates):

U29.1Y (pin 3)  ──► ALU_Out[0]  ───┐
U29.2Y (pin 6)  ──► ALU_Out[1]  ───┤
U29.3Y (pin 8)  ──► ALU_Out[2]  ───┤
U29.4Y (pin 11) ──► ALU_Out[3]  ───┤
U30.1Y (pin 3)  ──► ALU_Out[4]  ───┤
U30.2Y (pin 6)  ──► ALU_Out[5]  ───┤
U30.3Y (pin 8)  ──► ALU_Out[6]  ───┤
U30.4Y (pin 11) ──► ALU_Out[7]  ───┤
U31.1Y (pin 3)  ──► ALU_Out[8]  ───┤
U31.2Y (pin 6)  ──► ALU_Out[9]  ───┤
U31.3Y (pin 8)  ──► ALU_Out[10] ───┤
U31.4Y (pin 11) ──► ALU_Out[11] ───┤
U32.1Y (pin 3)  ──► ALU_Out[12] ───┤
U32.2Y (pin 6)  ──► ALU_Out[13] ───┤
U32.3Y (pin 8)  ──► ALU_Out[14] ───┤
U32.4Y (pin 11) ──► ALU_Out[15] ───┤
                                   │
                       16 wires    │
                    (Result Bus)   │
                                   │
    ┌──────────────────────────────┘
    │
    ├──► To A Register (U1.D0-7, U2.D0-7)
    │    (loaded when LOAD_A pulses)
    │
    ├──► To D Register (U3.D0-7, U4.D0-7)
    │    (loaded when LOAD_D pulses)
    │
    └──► To RAM input (via 74HC245 transceivers)
         (stored when WRITE_M pulses)

The Control Signals (Destination Select)

Instruction Decode → Destination Control:

From instruction bits [5:3] (destination bits):

Bit 5 (d1): destA → LOAD_A
Bit 4 (d2): destD → LOAD_D
Bit 3 (d3): destM → WRITE_M

Destination Decoder Logic:
─────────────────────────

Instruction[5] (destA) ──┐
                         ├─ AND ─► LOAD_A
C_INST (is C-inst?) ─────┘
                              (only load on C-instruction)

Instruction[4] (destD) ──┐
                         ├─ AND ─► LOAD_D
C_INST ──────────────────┘

Instruction[3] (destM) ──┐
                         ├─ AND ─► WRITE_M
C_INST ──────────────────┘


Using one 74HC08 (three of its four AND gates):

         ┌─────────┐
Inst[5]─►│1      3 │─► LOAD_A
C_INST──►│2  74HC08│
         └─────────┘

         ┌─────────┐
Inst[4]─►│4      6 │─► LOAD_D
C_INST──►│5  74HC08│
         └─────────┘

         ┌─────────┐
Inst[3]─►│9      8 │─► WRITE_M
C_INST──►│10 74HC08│
         └─────────┘
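The three AND gates reduce to bit tests gated by the instruction type. A sketch under the assumption that C_INST is simply bit 15 of the instruction (as in the Hack format shown earlier); `decode_destinations` is an illustrative name. In the full Hack CPU, A is also loaded by A-instructions through a separate path, which this sketch does not model.

```python
def decode_destinations(instruction: int) -> dict:
    """Gate the d-bits (bits 5, 4, 3) with C_INST, like the 74HC08 AND gates."""
    c_inst = (instruction >> 15) & 1   # 1 for C-instructions, 0 for A-instructions
    return {
        "LOAD_A":  ((instruction >> 5) & 1) & c_inst,
        "LOAD_D":  ((instruction >> 4) & 1) & c_inst,
        "WRITE_M": ((instruction >> 3) & 1) & c_inst,
    }


print(decode_destinations(0b1110110111010000))  # D=A+1: only LOAD_D set
print(decode_destinations(0b1110110111110000))  # AD=A+1: LOAD_A and LOAD_D set
print(decode_destinations(0b0000000000111000))  # A-instruction @56: all gated off
```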

Multiple Destination Capability

Hack can write to multiple destinations simultaneously!

Example: A = A + 1 and D = A + 1 in one instruction

Assembly: AD=A+1

Instruction bits:
├─ destA = 1 (bit 5)
├─ destD = 1 (bit 4)
└─ destM = 0 (bit 3)

Result:
─────
Both LOAD_A and LOAD_D pulse simultaneously!

         ALU_Out = A + 1
              │
              ├─────────┬───────┐
              ▼         ▼       │
         ┌────────┐ ┌────────┐  │
         │   A    │ │   D    │  │
         └────────┘ └────────┘  │
              │         │       │
          LOAD_A    LOAD_D   (WRITE_M=0)
              ║         ║
              ╚═════════╝
             Both pulse!

After one cycle:
├─ A ← ALU_Out
├─ D ← ALU_Out
└─ Both registers get same value!

This is efficient! One computation → two saves
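A small model of the clock edge shows why multiple destinations cost nothing extra: every enabled register samples the same ALU_Out. This is a sketch with `clock_edge` as an illustrative name; it assumes, as in Hack, that a write to M uses the address A held during the cycle, before any A update takes effect.

```python
def clock_edge(state: dict, alu_out: int, load_a: int, load_d: int, write_m: int) -> dict:
    """Return the machine state after one clock edge.

    Every enabled destination captures the same ALU_Out value. The RAM write
    uses the address A held *during* the cycle (its old value).
    """
    ram = dict(state["ram"])
    if write_m:
        ram[state["A"]] = alu_out
    return {
        "A": alu_out if load_a else state["A"],
        "D": alu_out if load_d else state["D"],
        "ram": ram,
    }


# AD=A+1: one ALU result, two destinations
before = {"A": 0x0010, "D": 0x0000, "ram": {}}
alu_out = (before["A"] + 1) & 0xFFFF
after = clock_edge(before, alu_out, load_a=1, load_d=1, write_m=0)
print(hex(after["A"]), hex(after["D"]))  # 0x11 0x11
```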

Timing Diagram: Complete Feedback Path

Clock Cycle Breakdown:

Phase 1: Register Output (0-25ns)
──────────────────────────────────
t=0ns:   Clock rises
t=5ns:   Register output valid
t=25ns:  A_Out, D_Out stable

         ┌────────┐
         │A or D  │
         │Register│
         └───┬────┘
             │ 25ns



Phase 2: Through Mux (25-40ns)
───────────────────────────────
t=25ns:  A_Out or M arrives at mux
t=27ns:  A_OR_M control stable
t=37ns:  Mux output (X_IN) stable

         ┌────────┐
         │  MUX   │
         │ 74HC157│
         └───┬────┘
             │ 12ns



Phase 3: ALU Processing (40-100ns)
───────────────────────────────────
t=40ns:  X_PROC, Y_PROC ready
t=50ns:  Through zx/nx/zy/ny
t=65ns:  ADD/AND computed
t=80ns:  Function selected (f)
t=100ns: Output negated (no)
         ALU_Out stable ✓

         ┌────────┐
         │  ALU   │
         │(34 ICs)│
         └───┬────┘
             │ 60ns



Phase 4: Back to Register (100-110ns)
──────────────────────────────────────
t=100ns: ALU_Out drives result bus
t=105ns: Data stable at register inputs
t=110ns: Clock edge!
         ├─ LOAD_A/D pulses
         ├─ Data captured
         └─ New value stored

         ┌────────┐
         │Register│
         │ Input  │
         └───┬────┘
             │ 10ns setup

         CLOCK EDGE


Total Loop Time: 110ns
Maximum Frequency: ~9 MHz

Critical Path:
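Register → Mux → ALU → Register
(clock-to-output + mux + ALU + setup, plus 3ns of idle margin:
the mux output settles at 37ns but the ALU phase starts at 40ns)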
25ns + 12ns + 60ns + 10ns + 3ns = 110ns
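The same budget as a few lines of Python, so individual stages can be swapped out (for example, to see what a faster ALU would buy). `clock_limit` is an illustrative helper; the delays are the typical figures from the breakdown above, not worst-case datasheet values.

```python
def clock_limit(stage_delays_ns: dict) -> tuple:
    """Sum the stage delays along a path; return (period in ns, f_max in MHz)."""
    period_ns = sum(stage_delays_ns.values())
    return period_ns, 1000.0 / period_ns   # 1/ns is GHz; x1000 gives MHz


critical_path = {
    "register clock-to-output": 25,
    "A/M mux (74HC157)": 12,
    "idle margin before ALU phase": 3,
    "ALU": 60,
    "register setup": 10,
}
period, f_max = clock_limit(critical_path)
print(f"{period} ns -> {f_max:.2f} MHz")  # 110 ns -> 9.09 MHz
```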

Connection 4: Memory in the Loop

RAM Connection to ALU X Input

Memory Access Path:

A Register holds address
         │
         ▼
    ┌──────────┐
    │   RAM    │
    │ AS6C4008 │
    │ [A14:0]  │
    └────┬─────┘
         │ D[15:0]
         │ (read data)
         │
         ├───────────────────┐
         │                   │
         ▼                   ▼
    To X_IN mux         To Data Bus
    (for ALU)          (for registers)
         │
         ▼
   ALU computation
         │
         ▼
    Result back to
    RAM or registers

The Complete M (Memory) Path

Detailed Memory Interface:

A Register:
A_Out[14:0] ──────────────────┐
                              │ (RAM address)
RAM Chip (AS6C4008 × 2):      │
                              │
  ┌─────────────────────┐     │
  │ Address[14:0] ◄─────┼─────┘
  │                     │
  │ /OE ◄─── READ_M     │
  │ /WE ◄─── NOT WRITE_M│
  │ /CE ◄─── GND        │
  │                     │
  │ D[15:0] (data I/O)  │
  └──────┬──────────────┘
         │
         │ M[15:0] (read data)
         │
         ├──────────────────────────┬────────────┐
         │                          │            │
         ▼                          ▼            ▼
    To X_IN MUX               To D_In      To A_In
    (ALU X input)             (D reg)      (A reg)
         │                         │            │
    Select via               Load via      Load via
    A_OR_M=1                 LOAD_D        LOAD_A

Memory Read Cycle

Operation: D = M

Timeline:
─────────

t=0ns: A register holds address
       A_Out = 0x0100 (example)

t=10ns: RAM receives address
        Address decode begins

t=60ns: RAM access complete
        M[0x0100] data valid
        M[15:0] = 0xABCD (example)

t=65ns: Data reaches X_IN mux
        A_OR_M = 1 (select M)
        X_IN = M = 0xABCD

t=80ns: Through zx/nx processing
        X_PROC = 0xABCD

t=90ns: ALU computes (might just pass through)
        ALU_Out = 0xABCD

t=110ns: Clock edge
         LOAD_D pulses
         D ← 0xABCD ✓


Timing Diagram:
───────────────

CLK:     ───╗     ╔════╗     ╔════
            ╚═════╝    ╚═════╝

A_Out:   ──── 0x0100 ─────────────  (stable)

M[15:0]: ─────┌─── 0xABCD ─────────  (after access)
              └─ 60ns access time

X_IN:    ─────┌─── 0xABCD ─────────  (mux selects M)
              └─ 65ns

ALU_Out: ─────────┌── 0xABCD ──────  (ALU passes through)
                  └─ 90ns

LOAD_D:  ─────────────╗    ╔═══════  (pulse at clock)
                      ╚════╝

D_Out:   ──── old ────┴─── 0xABCD ──  (updated)

Memory Write Cycle

Operation: M = D

Timeline:
─────────

t=0ns: A register holds address
       D register holds data
       A_Out = 0x0100
       D_Out = 0x1234

t=10ns: RAM receives address
        Address decode

t=50ns: Control signals stable
        WRITE_M = 1 (active)
        /WE goes LOW

t=60ns: D's value reaches the data bus (the ALU passes D
        through with comp = D, as in Hack)
        Data = 0x1234

t=110ns: Clock edge
         /WE returns HIGH, ending the write pulse
         RAM latches the data on that rising edge
         RAM[0x0100] ← 0x1234 ✓


Timing Diagram:
───────────────

CLK:     ───╗     ╔════╗     ╔════
            ╚═════╝    ╚═════╝

A_Out:   ──── 0x0100 ─────────────  (address)

D_Out:   ──── 0x1234 ─────────────  (data to write)

WRITE_M:         ╔═════════╗         (write enable, active HIGH)
         ════════╝         ╚═══════

/WE:     ════════╗         ╔═══════  (active LOW, inverted WRITE_M)
                 ╚═════════╝

Data_Bus: ──── 0x1234 ───────────   (data to write)

RAM[0x0100]: ── old ──┴─── 0x1234 ── (written!)

The Tri-State Bus System

Why Tri-State Matters

Problem: Multiple Drivers
─────────────────────────

Without tri-state:
    A_Out ──► ║
              ║
    D_Out ──► ║──► CONFLICT!
              ║    (bus fight)
    M_Out ──► ║

All trying to drive bus simultaneously!


With tri-state:
    A_Out ──[/OE]──► ─┐
                      │
    D_Out ──[/OE]──► ─┤──► Bus (only one active)
                      │
    M_Out ──[/OE]──► ─┘

Only one driver active at a time
Others are "floating" (hi-Z)

Bus Arbitration

Data Bus Control (16-bit shared bus):

    ┌──────────────────────────────┐
    │      Main Data Bus           │
    │      (16 wires)              │
    └─┬───────┬───────┬───────┬────┘
      │       │       │       │
      ▼       ▼       ▼       ▼
   ┌────┐  ┌────┐  ┌────┐  ┌────┐
   │ A  │  │ D  │  │RAM │  │ALU │
   │Reg │  │Reg │  │I/F │  │Out │
   └─┬──┘  └─┬──┘  └─┬──┘  └─┬──┘
     │       │       │       │
    /OE_A  /OE_D  /OE_M   /OE_ALU

Control Logic:
──────────────
Only ONE /OE can be active (LOW) at a time

Example: Loading D from A
├─ /OE_A = 0 (A drives bus)
├─ /OE_D = 1 (D listening, not driving)
├─ /OE_M = 1 (RAM disconnected)
├─ /OE_ALU = 1 (ALU disconnected)
└─ LOAD_D pulses → D captures A's value
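The arbitration rule (at most one active-LOW /OE at a time) can be checked mechanically. A sketch with an illustrative `bus_driver` helper that reports the driver, reports hi-Z when nothing drives, and raises on contention:

```python
def bus_driver(oe_n: dict) -> str:
    """Return which device drives the shared bus.

    oe_n maps device name -> its active-LOW /OE level (0 = driving).
    Returns "hi-Z" when no device drives the bus; raises on contention.
    """
    drivers = [name for name, level in oe_n.items() if level == 0]
    if len(drivers) > 1:
        raise RuntimeError(f"bus contention: {drivers} all driving")
    return drivers[0] if drivers else "hi-Z"


# "Loading D from A": only A's outputs are enabled
print(bus_driver({"A": 0, "D": 1, "RAM": 1, "ALU": 1}))  # A
```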

74HC245 Bus Transceiver (for RAM)

RAM needs bidirectional access (the 74HC245 is 8 bits wide, so the 16-bit bus uses two):

         74HC245 (Bus Transceiver)
         ┌──────────────┐
         │              │
CPU Bus ◄┼──► A side    │
[15:0]   │              │
         │      DIR ◄───┤─── READ/WRITE
         │              │       (direction control)
         │              │
RAM Bus ◄┼──► B side    │
[15:0]   │              │
         │     /OE ◄────┤─── ENABLE
         │              │       (3-state control)
         └──────────────┘

DIR = 0: B → A (read from RAM)
DIR = 1: A → B (write to RAM)

/OE = 0: Transceiver active
/OE = 1: All outputs hi-Z


Pin Connections (74HC245):
──────────────────────────
Pin 1:  DIR (direction control)
        ├─ 0 = B→A (RAM to CPU)
        └─ 1 = A→B (CPU to RAM)

Pin 19: /OE (output enable)
        ├─ 0 = Active
        └─ 1 = Hi-Z (disabled)

Pins 2-9:   A side (CPU data bus)
Pins 11-18: B side (RAM data bus)

Control Logic:
──────────────
DIR = WRITE_M (1 when writing, 0 when reading)
/OE = 0 (always enabled when RAM selected)
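A behavioral sketch of the transceiver, assuming one 8-bit 74HC245 per byte lane and the DIR = WRITE_M, /OE = 0 mapping above. `hc245` and `ram_transceiver_controls` are illustrative names; `None` stands for a side the chip leaves in hi-Z.

```python
def hc245(dir_pin: int, oe_n: int, a_in=None, b_in=None):
    """Model the data path through one 8-bit 74HC245.

    Returns (a_side, b_side): the value the transceiver drives onto each
    side, or None where it leaves that side in hi-Z.
    """
    if oe_n:              # /OE HIGH: both sides hi-Z
        return None, None
    if dir_pin:           # DIR HIGH: A -> B (CPU to RAM, write)
        return None, a_in
    return b_in, None     # DIR LOW: B -> A (RAM to CPU, read)


def ram_transceiver_controls(write_m: int) -> dict:
    """Control mapping from the text: DIR follows WRITE_M, /OE held at 0."""
    return {"dir_pin": write_m, "oe_n": 0}


# Write cycle: a CPU-side byte appears on the RAM side
print(hc245(**ram_transceiver_controls(1), a_in=0x34))  # (None, 52)
# Read cycle: a RAM byte comes back to the CPU side
print(hc245(**ram_transceiver_controls(0), b_in=0xCD))  # (205, None)
```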

Complete Operation Examples

Let me trace THREE complete operations showing all connections:

Example 1: D = D + 1 (Simple Feedback)

Instruction: D = D + 1

Control Signals:
────────────────
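ALU_CTRL = 110111 (zx nx zy ny f no: compute Y + 1)
A_OR_M = 0 (don't care: zx=1 forces X to 0)
LOAD_D = 1 (save to D)
LOAD_A = 0 (don't save to A)
WRITE_M = 0 (don't write memory)

Note: this board feeds D into the ALU's Y input, the reverse of the
standard Hack ALU (x = D, y = A/M). Hack's own comp code for D+1 is
011111; with the inputs swapped, the zx/nx and zy/ny pairs trade
places, giving 110111 here.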


Phase 1: D Register Output (0-25ns)
────────────────────────────────────
D_Out = 0x0005 (current value)

    ┌──────────┐
    │ D Reg    │
    │ U3, U4   │
    │ 74HC574  │
    └────┬─────┘

         │ Q outputs: 0x0005




Phase 2: To ALU Y Input (25-40ns)
──────────────────────────────────
D_Out → Y input processing

    D_Out[15:0] = 0x0005
         │
         ▼
    ┌──────────┐
    │ U9,11,   │  Zero select (zy=0, pass through)
    │ 13,15    │
    │ 74HC157  │
    └────┬─────┘
         │
    Y_ZERO = 0x0005
         │
         ▼
    ┌──────────┐
    │ U10,12,  │  Negate (ny=1, part of the Y + 1 trick)
    │ 14,16    │
    │ 74HC86   │
    └────┬─────┘
         │
    Y_PROC = 0xFFFA (= !0x0005)




Phase 3: ALU Computation (40-100ns)
────────────────────────────────────
X input doesn't matter: zx=1 forces X to 0.
Y_PROC = 0xFFFA (D = 0x0005 after the ny inversion)

Control bits for Y + 1 (D is on the Y input):
zx=1, nx=1, zy=0, ny=1, f=1, no=1

Step by step:
1. zx=1 → X = 0
2. nx=1 → X = !0 = 0xFFFF
3. zy=0 → Y = D = 0x0005
4. ny=1 → Y = !0x0005 = 0xFFFA
5. f=1  → ADD: 0xFFFF + 0xFFFA = 0xFFF9 (carry out dropped)
6. no=1 → !0xFFF9 = 0x0006 ✓

Why it works: in two's complement !v = -v - 1, so
!(!0 + !D) = !(-1 + (-D - 1)) = !(-D - 2) = D + 1

    ┌──────────────┐
    │ ALU Core     │
    │ U17-U20      │  Adders compute 0xFFFF + 0xFFFA
    │ 74HC283      │  Sum: 0xFFF9
    └──────┬───────┘
           │
           ▼
    ┌──────────────┐
    │ Output Stage │
    │ U29-U32      │  no=1 → negate
    │ 74HC86       │  !0xFFF9 = 0x0006
    └──────┬───────┘
           │
           ▼
    ALU_Out = 0x0006


Phase 4: Back to D Register (100-110ns)
────────────────────────────────────────
ALU_Out drives data bus
D register inputs receive 0x0006

t=110ns: Clock edge!
         LOAD_D pulses

    ┌──────────┐
    │ D Reg    │
    │ U3, U4   │
    │ D inputs │ ◄─── 0x0006
    └────┬─────┘


      CLK pulses

    D_Out = 0x0006 ✓


Result Check:
─────────────
Before: D = 0x0005
After:  D = 0x0006 ✓
Incremented successfully!
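The control-bit arithmetic is easy to get wrong by hand, so it helps to check it against a software model of the ALU. This is a sketch of Hack-style ALU behavior (`alu` and `ctrl` are illustrative names), fed with this board's wiring where X is A/M and Y is D:

```python
MASK16 = 0xFFFF


def alu(x: int, y: int, zx: int, nx: int, zy: int, ny: int, f: int, no: int) -> int:
    """16-bit Hack-style ALU: condition X and Y, add or AND, optionally negate."""
    x &= MASK16
    y &= MASK16
    if zx:
        x = 0
    if nx:
        x ^= MASK16
    if zy:
        y = 0
    if ny:
        y ^= MASK16
    out = (x + y) & MASK16 if f else x & y  # f=1: ADD (carry out dropped), f=0: AND
    if no:
        out ^= MASK16
    return out


def ctrl(bits: str) -> dict:
    """Map a control string in 'zx nx zy ny f no' order to keyword arguments."""
    names = ("zx", "nx", "zy", "ny", "f", "no")
    return {name: int(bit) for name, bit in zip(names, bits)}


# Example 1, this board's wiring (Y = D = 5): Y + 1 is 110111
print(hex(alu(0x1234, 0x0005, **ctrl("110111"))))  # 0x6
# Example 2: X + Y is 000010
print(hex(alu(0x0003, 0x0005, **ctrl("000010"))))  # 0x8
# Hack's D+1 code 011111 increments X instead, so here it computes A + 1
print(hex(alu(0x0003, 0x0005, **ctrl("011111"))))  # 0x4
```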

Example 2: D = A + D (Two Inputs)

Instruction: D = A + D

Initial Values:
───────────────
A = 0x0003
D = 0x0005

Control Signals:
────────────────
ALU_CTRL = 000010 (add X and Y)
zx=0, nx=0, zy=0, ny=0, f=1, no=0
A_OR_M = 0 (use A, not M)
LOAD_D = 1
LOAD_A = 0
WRITE_M = 0


Phase 1: Both Registers Output (0-25ns)
────────────────────────────────────────

    ┌──────────┐        ┌──────────┐
    │ A Reg    │        │ D Reg    │
    │ U1, U2   │        │ U3, U4   │
    └────┬─────┘        └────┬─────┘
         │                   │
    A_Out = 0x0003      D_Out = 0x0005
         │                   │
         ▼                   ▼


Phase 2: A → X Input MUX (25-40ns)
───────────────────────────────────

A_Out and M both available to mux
A_OR_M = 0 → select A

    ┌──────────────────────┐
    │    X Input MUX       │
    │    4× 74HC157        │
    │                      │
    │ A[15:0] ──┐          │
    │           ├─ MUX ──► │ X_IN = 0x0003
    │ M[15:0] ──┘          │
    │                      │
    │ A_OR_M = 0 (select A)│
    └──────────┬───────────┘


          X_IN = 0x0003


Phase 3: Input Conditioning (40-50ns)
──────────────────────────────────────

X path: zx=0, nx=0 → pass through
Y path: zy=0, ny=0 → pass through

    X_PROC = 0x0003
    Y_PROC = 0x0005


Phase 4: ALU Core (50-100ns)
─────────────────────────────

ADD operation (f=1):

    X_PROC = 0x0003
  + Y_PROC = 0x0005
  ─────────────────
    SUM    = 0x0008

    ┌───────────────────┐
    │ Adder Chain       │
    │ U17-U20           │
    │ 74HC283 × 4       │
    │                   │
    │ 0x0003 + 0x0005   │
    │    = 0x0008       │
    └─────────┬─────────┘

         SUM = 0x0008


    ┌───────────────────┐
    │ Function Select   │
    │ U25-U28           │
    │ 74HC157 × 4       │
    │                   │
    │ f=1 → select ADD  │
    └─────────┬─────────┘

        F_OUT = 0x0008


    ┌───────────────────┐
    │ Output Negation   │
    │ U29-U32           │
    │ 74HC86 × 4        │
    │                   │
    │ no=0 → pass thru  │
    └─────────┬─────────┘

        ALU_Out = 0x0008


Phase 5: Save to D (100-110ns)
───────────────────────────────

ALU_Out → Data Bus → D_In

t=110ns: LOAD_D pulses

    ┌──────────┐
    │ D Reg    │
    │          │
    │D ← 0x0008│
    └──────────┘


Result Check:
─────────────
Before: A = 0x0003, D = 0x0005
After:  A = 0x0003, D = 0x0008 ✓
Sum computed and saved!


Connection Summary:
───────────────────
A_Out → MUX → X_PROC → ALU ──┐
                             ├─► ALU_Out → D_In
D_Out ──────► Y_PROC → ALU ──┘

Both inputs used simultaneously!

Example 3: M = D + M (Memory Involved)

Instruction: M = D + M

Initial Values:
───────────────
A = 0x0100 (points to RAM address 256)
D = 0x0007
RAM[256] = 0x0003

Control Signals:
────────────────
ALU_CTRL = 000010 (add)
A_OR_M = 1 (use M, not A)
LOAD_D = 0
LOAD_A = 0
WRITE_M = 1 (save to memory!)


Phase 1: Memory Read (0-60ns)
──────────────────────────────

A register outputs address:

    ┌──────────┐
    │ A Reg    │
    │          │
    │A = 0x0100│
    └────┬─────┘

    A_Out = 0x0100


    ┌──────────┐
    │   RAM    │
    │ AS6C4008 │
    │          │
    │ Addr ←───┤ 0x0100
    │          │
    │ /OE = 0  │ (read enabled)
    │ /WE = 1  │ (not writing yet)
    └────┬─────┘

    (60ns access time)

    M[256] = 0x0003




Phase 2: M → X Input MUX (60-75ns)
───────────────────────────────────

A_Out and M both available
A_OR_M = 1 → select M

    ┌──────────────────────┐
    │    X Input MUX       │
    │                      │
    │ A = 0x0100 ──┐       │
    │              ├─MUX ─►│ X_IN = 0x0003
    │ M = 0x0003 ──┘       │
    │                      │
    │ A_OR_M = 1 (select M)│
    └──────────┬───────────┘

          X_IN = 0x0003
          (memory value!)


Phase 3: Both Inputs to ALU (75-100ns)
───────────────────────────────────────

X path: M value (via mux)
Y path: D value (direct)

    X_PROC = 0x0003 (from M)
  + Y_PROC = 0x0007 (from D)
  ─────────────────
    ALU_Out = 0x000A


Phase 4: Write to Memory (100-150ns)
─────────────────────────────────────

ALU_Out goes to data bus
A still holds address (0x0100)

t=110ns: Clock edge
         WRITE_M pulses HIGH
         RAM /WE (inverted WRITE_M) goes LOW

    ┌───────────────────────────┐
    │   RAM                     │
    │                           │
    │ Addr = 0x0100 (from A)    │
    │ Data = 0x000A (from ALU)  │
    │ /WE  = LOW (write!)       │
    │                           │
    │ RAM[256] ← 0x000A         │
    └───────────────────────────┘


Result Check:
─────────────
Before:
├─ A = 0x0100 (address)
├─ D = 0x0007 (data)
└─ RAM[256] = 0x0003

After:
├─ A = 0x0100 (unchanged)
├─ D = 0x0007 (unchanged)
└─ RAM[256] = 0x000A ✓
    (0x0003 + 0x0007 = 0x000A)


Connection Summary:
───────────────────
RAM[A] → M → MUX → X_PROC → ALU ──┐
                                  ├─► ALU_Out
D_Out ─────────► Y_PROC → ALU ────┘         │
                                            │
A_Out ──────────────────────────────┐       │
                                    ▼       │
                              RAM Address   │
                                    │       │
                              RAM ◄─────────┘
                              (write back)

Read from memory, compute, write back!
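Putting the pieces together, this sketch executes a Hack C-instruction against a tiny model of this board: A/M mux, swapped X/Y inputs, ALU, and write-back. It is a software model under stated assumptions, not the author's hardware: `step`, `alu`, and the dictionary-based state are illustrative, and only C-instructions are handled.

```python
MASK16 = 0xFFFF


def alu(x, y, zx, nx, zy, ny, f, no):
    """16-bit Hack-style ALU: condition inputs, add or AND, optionally negate."""
    x = 0 if zx else x & MASK16
    y = 0 if zy else y & MASK16
    x ^= MASK16 if nx else 0
    y ^= MASK16 if ny else 0
    out = (x + y) & MASK16 if f else x & y
    return out ^ MASK16 if no else out


def step(state, instruction):
    """Execute one Hack C-instruction on a model of this board (X = A/M, Y = D)."""
    if not (instruction >> 15) & 1:
        raise ValueError("only C-instructions are modeled here")
    a_bit = (instruction >> 12) & 1
    # c1..c6 in Hack order (zx nx zy ny f no); Hack assumes x = D, y = A/M
    zx, nx, zy, ny, f, no = ((instruction >> s) & 1 for s in range(11, 5, -1))
    d_a, d_d, d_m = ((instruction >> s) & 1 for s in (5, 4, 3))

    m = state["ram"].get(state["A"], 0)   # M is always RAM[A]
    x_in = m if a_bit else state["A"]     # 74HC157 A/M mux
    y_in = state["D"]
    # D sits on Y here, so Hack's (zx, nx) drive the Y stage and (zy, ny) the X stage
    alu_out = alu(x_in, y_in, zy, ny, zx, nx, f, no)

    ram = dict(state["ram"])
    if d_m:
        ram[state["A"]] = alu_out         # address = A before this edge
    return {
        "A": alu_out if d_a else state["A"],
        "D": alu_out if d_d else state["D"],
        "ram": ram,
    }


# Example 3: M = D + M with A = 0x0100, D = 0x0007, RAM[256] = 0x0003
s = {"A": 0x0100, "D": 0x0007, "ram": {0x0100: 0x0003}}
s = step(s, 0b1111000010001000)  # M=D+M
print(hex(s["ram"][0x0100]))     # 0xa
```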

PCB Routing Strategies

Layer Assignment Philosophy

TOP LAYER (F.Cu): Signal Routing
─────────────────────────────────
├─ Register outputs to ALU inputs
├─ ALU output to register inputs  
├─ Control signals (LOAD_A, LOAD_D, etc)
├─ Short, direct point-to-point connections
└─ Horizontal preference

BOTTOM LAYER (B.Cu): Power + Vertical Signals
──────────────────────────────────────────────
├─ Ground plane (large pour)
├─ +5V plane (where needed)
├─ Vertical signal crossings
├─ Return paths for all signals
└─ Memory data bus (needs more routing space)

Critical Trace Groups

Group 1: A_Out to X MUX (16 traces)
────────────────────────────────────
From: A Register (U1, U2)
To:   X Input MUX (74HC157 × 4)

Routing:
├─ Length: 40-60mm typical
├─ Width: 0.4mm (15 mil)
├─ Spacing: 0.3mm (12 mil)
├─ Layer: Top (F.Cu)
├─ Keep parallel
└─ No vias

Termination: None needed (short traces)


Group 2: D_Out to Y Processing (16 traces)
───────────────────────────────────────────
From: D Register (U3, U4)
To:   Y Input Processing (74HC157 × 4)

Routing:
├─ Length: 40-60mm typical
├─ Width: 0.4mm
├─ Spacing: 0.3mm
├─ Layer: Top (F.Cu)
├─ Straight run
└─ Shortest path possible


Group 3: ALU_Out to Registers (16 traces)
──────────────────────────────────────────
From: ALU Output (U29-U32)
To:   A_In, D_In, M_In

Routing:
├─ Length: 50-80mm
├─ Width: 0.4mm
├─ Spacing: 0.3mm
├─ Layer: Top and Bottom (may need vias)
├─ Fan-out to three destinations
└─ Star topology from ALU


Group 4: Control Signals (6 traces)
────────────────────────────────────
Signals: LOAD_A, LOAD_D, WRITE_M, A_OR_M, etc.

Routing:
├─ Length: Variable (star topology)
├─ Width: 0.4mm
├─ Layer: Top (F.Cu)
├─ Keep away from data buses
├─ No crosstalk with data
└─ Buffer if fanout > 4

Example Board Layout

Physical arrangement (10cm × 15cm board):

  0mm ┌────────────────────────────────────┐
      │ [Connectors: J1-J6]                │
      ├────────────────────────────────────┤
 20mm │ [A Register: U1-U2]                │
      │          ║ 16 traces               │
 30mm │          ║                         │
      │          ▼                         │
 40mm │ [X Input MUX: 74HC157 × 4]         │
      │                                    │
      │ [D Register: U3-U4]                │
      │          ║ 16 traces               │
 60mm │          ▼                         │
      │ [Y Input Processing]               │
      ├────────────────────────────────────┤
 80mm │        [ALU CORE]                  │
      │   Adders, AND gates, MUXes         │
      │   (largest section: 20+ ICs)       │
120mm ├────────────────────────────────────┤
      │   [Output Stage]                   │
      │          ║                         │
      │          ║ 16 traces (result bus)  │
      │          ║                         │
      │          ╠════════╦═════════╗      │
      │          ▼        ▼         ▼      │
140mm │        To A     To D     To RAM    │
150mm └────────────────────────────────────┘

Trace Routing:
──────────────
A→MUX:    Vertical, left side
D→Y:      Vertical, right side
ALU→Regs: Fan-out from center
Control:  Horizontal, top layer
Power:    Bottom layer (flood)

Signal Integrity Considerations

Capacitive Loading

Each output drives multiple inputs:

Example: D_Out[0] fanout
────────────────────────
D Register U3.Q0 (pin 19)
        │
        ├─► Y Input MUX (U9 pin 2)
        │   Load: 1pF
        │
        └─► (Future expansion points)

Total capacitance: ~5pF
(1pF input + ~4pF for a 40mm trace at 100pF/m)

Output drive capability:
74HC574 timing is specified at 50pF loads,
so ~5pF is easy to drive. No buffer needed!


Critical case: ALU_Out fanout
──────────────────────────────
ALU_Out[0] from U29 (pin 3)
        │
        ├─► A Register (U1 pin 2)        1pF
        ├─► D Register (U3 pin 2)        1pF
        ├─► RAM interface (transceiver)  2pF
        └─► Test point                   1pF

Pin loads: ~5pF, plus trace capacitance

Trace capacitance:
50mm trace @ 100pF/m = 5pF

Total: ~10pF - still okay!

Rise time estimate (single RC pole):
τ = R_source × C_load = 100Ω × 10pF = 1ns
10-90% rise time ≈ 2.2τ ≈ 2.2ns
Fast enough for MHz operation ✓
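Both loading estimates follow the same recipe: sum the pin loads, add trace capacitance, then treat the driver as an RC source. This is a minimal sketch using the figures quoted above (1pF per pin, 100pF/m trace, 100Ω source). For a real board, substitute the input capacitances from the datasheets. The single-pole RC model is an approximation, and 2.2τ is the 10-90% rise time of that model.

```python
TRACE_PF_PER_M = 100.0  # trace capacitance figure used in the text


def net_capacitance_pf(pin_loads_pf, trace_mm, pf_per_m=TRACE_PF_PER_M):
    """Total load on one net: input-pin capacitances plus trace capacitance (pF)."""
    return sum(pin_loads_pf) + pf_per_m * trace_mm / 1000.0


def rc_edge_ns(r_source_ohm, c_load_pf):
    """Single-pole RC model: time constant tau and 10-90% rise time (2.2 * tau), in ns."""
    tau_ns = r_source_ohm * c_load_pf / 1000.0  # ohm * pF = 1e-12 s = 1e-3 ns
    return tau_ns, 2.2 * tau_ns


# D_Out[0]: one mux input on a 40 mm trace (low end of the Group 2 length range)
d_out = net_capacitance_pf([1], trace_mm=40)
# ALU_Out[0]: A reg, D reg, RAM transceiver, test point on a 50 mm trace
alu_out = net_capacitance_pf([1, 1, 2, 1], trace_mm=50)
tau, rise = rc_edge_ns(100, alu_out)

print(f"D_Out[0]:   {d_out:.1f} pF (74HC spec load 50 pF)")
print(f"ALU_Out[0]: {alu_out:.1f} pF, tau {tau:.1f} ns, 10-90% rise {rise:.1f} ns")
```

This prints 5.0pF for D_Out[0] and, for ALU_Out[0], 10.0pF with τ = 1.0ns and a 2.2ns rise time.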

Crosstalk Prevention

Problem: Parallel traces coupling
──────────────────────────────────

When traces run parallel:

    Signal A ────────────────
         │ (coupled capacitance)
    Signal B ────────────────

Fast edge on A induces noise on B!

Solution: Proper spacing
────────────────────────

    Signal A ────────────────
                 (0.3mm gap)
    Signal B ────────────────

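Crosstalk < 5% with 0.3mm spacing
(0.4mm traces; the exact figure depends on edge
rate, parallel run length and height above the
ground plane, not on the clock frequency)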
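A crosstalk percentage only matters relative to the input noise margin. Here is a rough sketch, assuming a 5V supply, an aggressor that swings the full 0 to VCC, a victim output sitting near 0V, and a 74HC-style input that still reads LOW up to about 30% of VCC. The function name is hypothetical.

```python
def crosstalk_margin(coupling_fraction, vcc=5.0, vil_fraction=0.3):
    """Return (induced noise, LOW-level noise margin) in volts.

    Assumptions: full-swing aggressor, victim output near 0 V,
    and an input LOW threshold of about 30% of VCC (74HC-style).
    """
    noise_v = coupling_fraction * vcc
    margin_v = vil_fraction * vcc
    return noise_v, margin_v


noise, margin = crosstalk_margin(0.05)
verdict = "OK" if noise < margin else "RISK"
print(f"noise {noise:.2f} V vs margin {margin:.2f} V -> {verdict}")
```

At 5% coupling the induced 0.25V is a small fraction of the roughly 1.5V margin, which is why 0.3mm spacing is adequate here.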


Additional techniques:
──────────────────────
1. Ground trace between critical signals
   Data0 ──── GND ──── Data1 ────

2. Different layers for different buses
   Top: A_Out bus
   Bottom: D_Out bus (no parallel run)

3. Perpendicular crossing when needed
   (on different layers, at 90°)
         │ D_Out (vertical, bottom)
   ──────┼────── A_Out (horizontal, top)
         │

Ground Bounce (Power Integrity)

Problem: Simultaneous Switching
────────────────────────────────

When all 16 ALU outputs switch:
├─ 16 × 5mA = 80mA current spike
├─ Through ground inductance
├─ Creates voltage bounce
└─ Can cause false triggering!

Ground bounce voltage:
V = L × (di/dt)
  = 10nH × (80mA / 2ns)
  = 0.4V bounce!
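The ground-bounce estimate is one line of arithmetic, sketched here with the text's numbers. The assumption is that all outputs switch together, so di/dt is approximated as the total current divided by the edge time.

```python
def ground_bounce_v(n_outputs, i_per_output_a, edge_time_s, l_ground_h):
    """V = L * di/dt, approximating di/dt as total switching current / edge time."""
    i_total_a = n_outputs * i_per_output_a
    return l_ground_h * i_total_a / edge_time_s


v_all = ground_bounce_v(16, 5e-3, 2e-9, 10e-9)   # all 16 ALU outputs
v_half = ground_bounce_v(8, 5e-3, 2e-9, 10e-9)   # only 8 switching
print(f"16 outputs: {v_all:.2f} V, 8 outputs: {v_half:.2f} V")
```

The bounce scales linearly with the number of outputs switching at once and inversely with edge time, so faster logic makes it worse.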

Solution: Decoupling Capacitors
────────────────────────────────

Place 0.1µF cap at EVERY IC:

    VCC ─────┬───── IC
             │
            ┌┴┐ 0.1µF
            │ │ (very close!)
            └┬┘
             │
    GND ─────┴───── IC

Capacitor provides local current:
├─ IC switches → draws current from cap
├─ Cap is close (low inductance)
├─ Ground bounce reduced to ~50mV
└─ Safe for operation! ✓
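It is easy to check that a 0.1µF capacitor can supply the switching spike from the previous calculation. Here is a sketch assuming a constant 80mA drawn for 2ns, using ΔV = I·Δt / C. It models only the capacitor's own voltage sag. The residual bounce is set mostly by the inductance of the capacitor's connections, which is why it must sit very close to the IC.

```python
def cap_droop_v(current_a, duration_s, cap_f):
    """Voltage sag of a capacitor supplying a constant current: dV = I * dt / C."""
    return current_a * duration_s / cap_f


droop = cap_droop_v(80e-3, 2e-9, 0.1e-6)
charge_pc = 80e-3 * 2e-9 * 1e12  # charge delivered, in picocoulombs
print(f"charge {charge_pc:.0f} pC, droop {droop * 1e3:.1f} mV")
```

The capacitor sags by only about 1.6mV while delivering 160pC, so the local charge is plentiful.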

Additional bulk caps:
├─ 10µF at power entry
├─ 100µF for entire board
└─ Stabilizes power supply

Performance Analysis

Maximum Operating Frequency

Critical Path Delay Budget:
───────────────────────────

Register output:        25ns
├─ 74HC574 CLK-to-Q

Through input mux:      12ns
├─ 74HC157 propagation

Input conditioning:     15ns
├─ Zero select (74HC157)
├─ Negate (74HC86)

ALU core:              60ns
├─ Preprocessing (already counted)
├─ Adder carry chain: 40ns
├─ Function select: 10ns
├─ Output negate: 10ns

Register setup:         12ns
├─ 74HC574 setup time

Clock skew allowance:   6ns
├─ Clock distribution variation

TOTAL:                 130ns
Maximum frequency:     7.7 MHz

Conservative design:   ≥130ns cycle (≤7.7 MHz)
Typical operation:     125ns cycle (8 MHz), 5ns
                       shorter than the worst-case
                       path, so it relies on parts
                       beating their worst-case delays
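Because the budget is a plain sum, it is easy to recheck and to test candidate clock periods against. This is a minimal sketch using the worst-case delays listed above, where the 60ns ALU core already includes the 40ns carry chain and the two 10ns stages. The candidate periods are only examples.

```python
# Worst-case delays from the budget above, in ns
CRITICAL_PATH_NS = {
    "74HC574 CLK-to-Q": 25,
    "74HC157 input mux": 12,
    "input conditioning": 15,
    "ALU core": 60,
    "74HC574 setup": 12,
    "clock skew": 6,
}


def max_frequency_mhz(delays_ns):
    """Highest clock rate the summed critical path allows (1 / total delay)."""
    return 1000.0 / sum(delays_ns.values())  # 1000 / ns = MHz


def slack_ns(period_ns, delays_ns):
    """Positive slack: the period covers the worst-case path; negative: a violation."""
    return period_ns - sum(delays_ns.values())


print(sum(CRITICAL_PATH_NS.values()), "ns")                # 130 ns
print(f"{max_frequency_mhz(CRITICAL_PATH_NS):.1f} MHz")    # 7.7 MHz
for period in (100, 125, 130, 150):
    s = slack_ns(period, CRITICAL_PATH_NS)
    print(f"{period} ns cycle: slack {s:+d} ns,", "OK" if s >= 0 else "violates worst case")
```

Any period with negative slack depends on the chips being faster than their worst-case ratings.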

Instruction Throughput

Instructions Per Second:
────────────────────────

At 8 MHz clock:
├─ 8,000,000 cycles/second
├─ Typical instruction: 1 cycle
└─ 8 MIPS (Million Instructions Per Second)

Multi-cycle instructions are rare:
├─ Memory access: 1 cycle
├─ Jump: 1 cycle (PC load)
├─ ALU computation: 1 cycle
└─ Almost everything: 1 cycle!
   (Complex operations such as multiplication take
    many instructions, but each one is still 1 cycle)

Effective throughput: ~7-8 MIPS
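MIPS is simply the clock rate divided by the average cycles per instruction (CPI). This sketch shows the Hack numbers along with a generic CPU that needs at least 2 cycles per instruction at 1 MHz. The second case is an illustration, not a measurement of any specific chip.

```python
def mips(clock_hz, cycles_per_instruction):
    """Millions of instructions per second = clock / average CPI / 1e6."""
    return clock_hz / cycles_per_instruction / 1e6


print(mips(8e6, 1.0))    # Hack at 8 MHz, 1 cycle per instruction → 8.0
print(mips(7.7e6, 1.0))  # at the 7.7 MHz worst-case limit → 7.7
print(mips(1e6, 2.0))    # 1 MHz CPU needing ≥2 cycles/instruction → 0.5 at best
```

CPI is the lever here. A multi-cycle CPU needs a proportionally faster clock to match a single-cycle design.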


Comparison:
───────────
Architecture        Frequency    MIPS
────────────────────────────────────────
Hack (our design)   8 MHz        ~8
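Intel 8080          2 MHz        ≤0.5
MOS 6502            1 MHz        ≤0.5
Z80                 4 MHz        ≤1
68000               8 MHz        ~1

(The older CPUs need several clock cycles per
instruction: at least 2 on the 6502 and at least 4
on the 8080, Z80 and 68000.)

Our Hack is competitive on raw instruction rate!
(One cycle per instruction, although each Hack
instruction does less work than a typical 8080
or 68000 instruction)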

Summary: The Connection Philosophy

Key Design Principles

1. Direct Paths Where Possible

D → ALU Y input: Direct
├─ No source mux (only zero/negate conditioning)
└─ Simplest, fastest path

2. Multiplex Only When Necessary

A vs M → ALU X input: Multiplexed
├─ Need to choose between two sources
└─ One 74HC157 mux per 4 bits

3. Shared Result Bus

ALU → Registers: Common bus
├─ Result broadcast to the A, D and M inputs
└─ Load/write enables pick which ones latch
   (Hack can write several at once, e.g. AMD=D+1)

4. Synchronous Operation

Everything clocked together
├─ Predictable timing
└─ No race conditions

5. Feedback Loop

Register → ALU → Register
├─ Computational loop
└─ Enables iterative calculations

The Elegant Integration

The register-ALU connections show how simple components create complex computation:

Just three types of connections:
├─ Register outputs (sources)
├─ ALU inputs (via mux when needed)
└─ ALU output (feedback to registers)

These three connections enable:
├─ Arithmetic (+, -, increment, decrement)
├─ Logic (AND, OR, NOT)
├─ Memory access (load, store)
├─ Program flow (jumps based on ALU flags)
└─ Complete computation! ✓

The connections are the nervous system of the computer - they carry data and enable the computational feedback loop that makes programming possible!


Would you like me to explain:

  1. The memory interface connections in the same detail?
  2. How the control unit generates all these signals?
  3. The complete fetch-decode-execute cycle with all connections active?
  4. How to optimize this design for higher speeds?


Hi, I'm Yinhuan Yuan. I'm a software engineer based in Toronto. You can read more about me on yuan.fyi.