How Registers Connect to the ALU: A Complete Deep Dive

Authors
  • Yinhuan Yuan

Let me explain the intricate dance between registers and the ALU - this is where the magic of computation actually happens!


The Big Picture: Data Flow Architecture

Overview of Connections

The Complete Data Path:

    ┌─────────┐       ┌─────────┐       ┌─────────┐
    │    A    │       │    D    │       │   PC    │
    │Register │       │Register │       │ Counter │
    └────┬────┘       └────┬────┘       └────┬────┘
         │                 │                  │
    A_Out[15:0]       D_Out[15:0]       PC_Out[14:0]
         │                 │                  │
         │                 │                  ▼
         │                 │            ┌──────────┐
         │                 │            │   ROM    │
         │                 │            │(Instruct)│
         │                 │            └────┬─────┘
         │                 │                 │
         │                 │                 │ Inst[15:0]
         │                 │                 │
         │                 │                 ▼
         │                 │           ┌──────────┐
         │                 │           │ Control  │
         │                 │           │   Unit   │
         │                 │           └────┬─────┘
         │                 │                │
         │                 │         Control Signals
         │                 │                │
         ▼                 ▼                ▼
    ┌────────────────────────────────────────────┐
    │                ALU (16-bit)                │
    │                                            │
    │   X Input              Y Input             │
    │   [A or M]             [D]                 │
    └──────────────┬─────────────────────────────┘
                   │
             ALU_Out[15:0]
                (Result)
                   │
                   ▼
              Data Bus [15:0] ◄────► RAM
         ┌─────────┼─────────┐
         │         │         │
         ▼         ▼         ▼
      To A      To D      To M
    (via LOAD_A)(LOAD_D) (WRITE_M)

The key insight: Registers feed the ALU, and ALU results feed back to registers. This creates a computational feedback loop.


Connection 1: D Register to ALU (Y Input)

The Direct Connection

This is the simplest connection:

Physical Connection:
────────────────────

D Register (U3, U4 - 74HC574s)
Q pins output D_Out[15:0]
    ├─ U3.Q0 (pin 19) ──► D_Out[0]  ───┐
    ├─ U3.Q1 (pin 18) ──► D_Out[1]  ───┤
    ├─ U3.Q2 (pin 17) ──► D_Out[2]  ───┤
    ├─ U3.Q3 (pin 16) ──► D_Out[3]  ───┤
    ├─ U3.Q4 (pin 15) ──► D_Out[4]  ───┤
    ├─ U3.Q5 (pin 14) ──► D_Out[5]  ───┤
    ├─ U3.Q6 (pin 13) ──► D_Out[6]  ───┤
    ├─ U3.Q7 (pin 12) ──► D_Out[7]  ───┤
    │                                  │
    ├─ U4.Q0 (pin 19) ──► D_Out[8]  ───┤
    ├─ U4.Q1 (pin 18) ──► D_Out[9]  ───┤
    ├─ U4.Q2 (pin 17) ──► D_Out[10] ───┤
    ├─ U4.Q3 (pin 16) ──► D_Out[11] ───┤
    ├─ U4.Q4 (pin 15) ──► D_Out[12] ───┤
    ├─ U4.Q5 (pin 14) ──► D_Out[13] ───┤
    ├─ U4.Q6 (pin 13) ──► D_Out[14] ───┤
    └─ U4.Q7 (pin 12) ──► D_Out[15] ───┤
                            16 wires    │
                                        ▼
                              ALU Y Input Processing
                              (U9-U16 - the Y path)
                              ┌────────────────────┐
                              │  Zero Select (zy)  │
                              │    74HC157 × 4     │
                              └─────────┬──────────┘
                                        │
                              ┌─────────▼──────────┐
                              │    Negate (ny)     │
                              │    74HC86 × 4      │
                              └─────────┬──────────┘
                                        │
                                   Y_PROC[15:0]
                                   (to ALU core)

Why This is Simple

Advantages:
──────────
Direct connection (no multiplexing)
D is ALWAYS available to ALU
No switching delays
Clean signal path
Easy to route on PCB

The D register has one job:
└─ Feed data to ALU Y input

Always active, always available!

Signal Characteristics

Electrical Properties:
─────────────────────

Output Drive Strength (74HC574):
├─ Source current: 5.2mA (typical)
├─ Sink current: 5.2mA (typical)
├─ Output voltage HIGH: >4.4V at 4mA
├─ Output voltage LOW: <0.4V at 4mA
└─ Strong enough to drive multiple inputs

Loading:
├─ Each D_Out line drives:
│   ├─ 1× 74HC157 input (zy selection)
│   ├─ The 74HC86 (ny negation) is driven by the 74HC157 output, not by D_Out
│   └─ Static CMOS input current is about 1µA; the real load is a few pF of capacitance
└─ Well within 74HC574 capability

Propagation Time:
├─ Register output: 25ns
├─ To Y_PROC: +15ns (through zy/ny)
└─ Total: ~40ns D register to ALU core

PCB Routing Strategy

Recommended Layout (Top View):

    ┌──────────────────────────┐
    │       D Register         │
    │     U3        U4         │
    │   [7:0]     [15:8]       │
    └─────┬──────┬─────────────┘
          │      │
          │      │  16 parallel traces
          │      │  Width: 0.4mm (15 mil)
          │      │  Spacing: 0.4mm
          │      │
    ┌─────▼──────▼─────────────┐
    │   Y Input Conditioning   │
    │         U9-U16           │
    │   (ALU preprocessing)    │
    └──────────────────────────┘

Trace Length: ~50mm typical
Propagation Delay: ~1ns (negligible)

Bus Routing:
├─ Group all 16 traces together
├─ Keep parallel to maintain timing
├─ Avoid crossing other buses
├─ Run on top layer (short path)
└─ No vias needed (stay on one layer)

Connection 2: A Register to ALU (X Input via Multiplexer)

This is more complex because we have a choice!

The A/M Selection Problem

The Dilemma:
────────────

ALU X input can be:
├─ A Register value (for calculations with A)
└─ Memory[A] value (for calculations with RAM data)

Example showing why we need both:

Operation 1: A = A + 1
├─ Need A register value
└─ X input = A

Operation 2: D = M + 1
├─ Need Memory value (RAM[A])
└─ X input = M (which is RAM[A])

Solution: MULTIPLEXER!

The Multiplexer Network

Complete A/M Selection Circuit:

From A Register:                     From RAM:
A_Out[15:0]                          M[15:0]
    │                                    │
    ├─────────────┬──────────────────────┤
    │             │                      │
    ▼             ▼                      ▼
┌───────────┐ ┌───────────┐       ┌───────────┐
│  74HC157  │ │  74HC157  │  ...  │  74HC157  │
│  (4 bits) │ │  (4 bits) │       │  (4 bits) │
└───────────┘ └───────────┘       └───────────┘

Wiring of the first chip (A_OR_M is shared by all four chips):

  A[3:0] ──►│1A-4A         │
  M[3:0] ──►│1B-4B         │
  A_OR_M ──►│SELECT   1Y-4Y│──► X_IN[3:0]
  GND ─────►│/EN           │

The other three chips produce X_IN[15:4] the same way.

    X_IN[3:0]        X_IN[15:4]
        │                │
        └────────┬───────┘
                 ▼
          X Input to ALU
          (X_IN[15:0])
                 │
        ┌────────▼───────────┐
        │  Zero Select (zx)  │
        │    74HC157 × 4     │
        └────────┬───────────┘
                 │
        ┌────────▼───────────┐
        │    Negate (nx)     │
        │    74HC86 × 4      │
        └────────┬───────────┘
                 │
            X_PROC[15:0]
            (to ALU core)

Control Signal: A_OR_M

This comes from the instruction decoder:

Instruction Bit 12 (a-bit):
──────────────────────────

From Hack instruction format:
┌──┬──┬──┬─┬──────┬───┬───┐
│15│14│13│a│cccccc│ddd│jjj│
└──┴──┴──┴─┴──────┴───┴───┘
          ▲
       This bit!

Bit 12 = a-bit = A_OR_M control

a = 0 → Use A register
a = 1 → Use Memory[A]

Examples:
─────────
D = A     → a = 0 (use A)
D = M     → a = 1 (use M)
D = A + 1 → a = 0 (use A)
D = M + 1 → a = 1 (use M)
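
To make the a-bit selection concrete, here is a minimal Python sketch of the X input mux. The function names `a_bit` and `select_x` are hypothetical, chosen for illustration; the two instruction words are the standard Hack encodings of D=A and D=M.

```python
def a_bit(instruction: int) -> int:
    """Extract bit 12 (the a-bit) from a 16-bit Hack instruction word."""
    return (instruction >> 12) & 1


def select_x(a_out: int, m: int, a_or_m: int) -> int:
    """Model the 74HC157 bank: SELECT=0 passes the A inputs, SELECT=1 the B inputs."""
    return (m if a_or_m else a_out) & 0xFFFF


# Standard Hack encodings: D=A has a=0, D=M has a=1.
D_EQ_A = 0b1110110000010000
D_EQ_M = 0b1111110000010000

if __name__ == "__main__":
    a_out, m = 0x0100, 0xABCD
    for name, inst in (("D=A", D_EQ_A), ("D=M", D_EQ_M)):
        x_in = select_x(a_out, m, a_bit(inst))
        print(f"{name}: a={a_bit(inst)} X_IN=0x{x_in:04X}")
```

The only difference between the two instruction words is bit 12, which is exactly what the mux SELECT pin sees.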

Detailed Connection Schematic

Let me show ONE 4-bit slice in detail:

A Register outputs:        RAM outputs:
A_Out[3:0]                 M[3:0]
     │                       │
     ▼                       ▼
┌─────────────────────────────────────┐
│ Pins 2,5,11,14     Pins 3,6,10,13   │
│ (A inputs)         (B inputs)       │
│                                     │
│           U_MUX 74HC157             │◄── A_OR_M (Pin 1)
│                                     │◄── GND (Pin 15, /EN)
│           Pins 4,7,9,12             │
│           (Y outputs)               │
└──────────────────┬──────────────────┘
                   │
                   ├─ X_IN[0]
                   ├─ X_IN[1]
                   ├─ X_IN[2]
                   └─ X_IN[3]
                   │
           To zx/nx processing
           (X input conditioning)

Pin Connections for 74HC157 (X input mux):
──────────────────────────────────────────
Pin 1:  SELECT (A_OR_M from instruction)
Pin 2:  1A (A_Out[0])
Pin 3:  1B (M[0])
Pin 4:  1Y (X_IN[0])
Pin 5:  2A (A_Out[1])
Pin 6:  2B (M[1])
Pin 7:  2Y (X_IN[1])
Pin 8:  GND
Pin 9:  3Y (X_IN[2])
Pin 10: 3B (M[2])
Pin 11: 3A (A_Out[2])
Pin 12: 4Y (X_IN[3])
Pin 13: 4B (M[3])
Pin 14: 4A (A_Out[3])
Pin 15: /EN (tied to GND - always enabled)
Pin 16: VCC

Repeat this 4 times for full 16 bits!

Timing Through the Multiplexer

Signal Propagation Path:

Step 1: Sources stable
──────────────────────
A_Out[15:0] valid at t=0
M[15:0] valid at t=0
A_OR_M control set at t=0

Step 2: Multiplexer selection
─────────────────────────────
74HC157 propagation: ~12ns

If A_OR_M = 0:
├─ A inputs selected
├─ A_Out → Y outputs
└─ X_IN = A_Out (after 12ns)

If A_OR_M = 1:
├─ B inputs selected
├─ M → Y outputs
└─ X_IN = M (after 12ns)

Step 3: To X conditioning
─────────────────────────
X_IN → zx/nx processing
Additional delay: ~15ns

Step 4: To ALU core
───────────────────
X_PROC arrives at ALU
Total delay: 27ns (12 + 15)

Timing Budget:
──────────────
A_Out stable:     t = 0ns
Through MUX:      t = 12ns
Through zx/nx:    t = 27ns
Ready for ALU:    t = 27ns ✓

This fits well within clock cycle!

Why Not Just Two Separate Inputs?

Bad Idea: Two separate X inputs to ALU

Hypothetical Design:
────────────────────
ALU with:
├─ X_A input (from A register)
├─ X_M input (from Memory)
└─ Select inside ALU

Problems:
─────────
Need to route TWO 16-bit buses to ALU
32 wires instead of 16
More complex ALU internal routing
Harder PCB layout
More crosstalk between buses
Larger board area needed

Good Design: Mux BEFORE ALU
───────────────────────────
Only ONE 16-bit bus to ALU
16 wires total
Simpler ALU (no internal mux)
Easier PCB routing
Cleaner signal integrity
Standard CPU design pattern

This is why multiplexing operands ahead of
the ALU is such a common CPU design pattern!

Connection 3: ALU Output Back to Registers

This is the feedback path - how results get saved!

The Result Distribution Network

ALU produces one result, but it can go to THREE places:

                 ALU_Out[15:0]
               (16-bit result bus)
                      │
         ┌────────────┼────────────┐
         │            │            │
         ▼            ▼            ▼
    ┌────────┐  ┌────────┐  ┌────────┐
    │   A    │  │   D    │  │  RAM   │
    │Register│  │Register│  │  [A]   │
    │        │  │        │  │        │
    │D inputs│  │D inputs│  │D input │
    └────┬───┘  └────┬───┘  └────┬───┘
         │           │           │
    LOAD_A      LOAD_D      WRITE_M
    (control)   (control)   (control)

Each destination has its own enable, so any combination can capture the result in the same cycle.
(Controlled by instruction decode)

Physical Bus Connection

ALU Output Stage:
─────────────────

From ALU (U29-U32 output XOR gates):

U29.Q0 (pin 3)  ──► ALU_Out[0]  ───┐
U29.Q1 (pin 6)  ──► ALU_Out[1]  ───┤
U29.Q2 (pin 8)  ──► ALU_Out[2]  ───┤
U29.Q3 (pin 11) ──► ALU_Out[3]  ───┤
U30.Q0 (pin 3)  ──► ALU_Out[4]  ───┤
U30.Q1 (pin 6)  ──► ALU_Out[5]  ───┤
U30.Q2 (pin 8)  ──► ALU_Out[6]  ───┤
U30.Q3 (pin 11) ──► ALU_Out[7]  ───┤
U31.Q0 (pin 3)  ──► ALU_Out[8]  ───┤
U31.Q1 (pin 6)  ──► ALU_Out[9]  ───┤
U31.Q2 (pin 8)  ──► ALU_Out[10] ───┤
U31.Q3 (pin 11) ──► ALU_Out[11] ───┤
U32.Q0 (pin 3)  ──► ALU_Out[12] ───┤
U32.Q1 (pin 6)  ──► ALU_Out[13] ───┤
U32.Q2 (pin 8)  ──► ALU_Out[14] ───┤
U32.Q3 (pin 11) ──► ALU_Out[15] ───┤
                        16 wires   │
                     (Result Bus)  │
    ┌──────────────────────────────┘
    │
    ├──► To A Register (U1.D0-7, U2.D0-7)
    │    (loaded when LOAD_A pulses)
    ├──► To D Register (U3.D0-7, U4.D0-7)
    │    (loaded when LOAD_D pulses)
    └──► To RAM input (via 74HC245 transceivers)
         (stored when WRITE_M pulses)

The Control Signals (Destination Select)

Instruction Decode → Destination Control:

From instruction bits [5:3] (destination bits):

Bit 5 (d1): destA → LOAD_A
Bit 4 (d2): destD → LOAD_D
Bit 3 (d3): destM → WRITE_M

Destination Decoder Logic:
─────────────────────────

Instruction[5] (destA) ──┐
                         ├─ AND ─► LOAD_A
C_INST (is C-inst?) ────┘
                              (only load on C-instruction)

Instruction[4] (destD) ──┐
                         ├─ AND ─► LOAD_D
C_INST ─────────────────┘

Instruction[3] (destM) ──┐
                         ├─ AND ─► WRITE_M
C_INST ─────────────────┘


Using 74HC08 (AND gates):

         ┌─────────┐
Inst[5]─►│1      3 │─► LOAD_A
C_INST──►│2  74HC08│
         └─────────┘

         ┌─────────┐
Inst[4]─►│4      6 │─► LOAD_D
C_INST──►│5  74HC08│
         └─────────┘

         ┌─────────┐
Inst[3]─►│9      8 │─► WRITE_M
C_INST──►│10 74HC08│
         └─────────┘
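
The three AND gates can be sketched in a few lines of Python. This is an illustrative model only: `decode_dest` is a hypothetical name, and it assumes C_INST is simply bit 15 of the instruction word, as in the standard Hack format. Loading A from an A-instruction is outside this sketch.

```python
def decode_dest(instruction: int) -> dict:
    """Gate each destination bit with C_INST, mirroring the 74HC08 AND gates."""
    c_inst = (instruction >> 15) & 1  # assumption: bit 15 = 1 marks a C-instruction
    return {
        "LOAD_A": ((instruction >> 5) & 1) & c_inst,   # d1 (destA)
        "LOAD_D": ((instruction >> 4) & 1) & c_inst,   # d2 (destD)
        "WRITE_M": ((instruction >> 3) & 1) & c_inst,  # d3 (destM)
    }


# AD=A+1 in standard Hack encoding: 111 0 110111 110 000
AD_EQ_A_PLUS_1 = 0b1110110111110000
# An A-instruction (@56) whose low bits happen to overlap the dest field
AT_56 = 0b0000000000111000

if __name__ == "__main__":
    print(decode_dest(AD_EQ_A_PLUS_1))
    print(decode_dest(AT_56))
```

The second call shows why the C_INST gating matters: without it, the address bits of an A-instruction would be misread as destination bits.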

Multiple Destination Capability

Hack can write to multiple destinations simultaneously!

Example: store A + 1 in both A and D with one instruction

Assembly: AD=A+1

Instruction bits:
├─ destA = 1 (bit 5)
├─ destD = 1 (bit 4)
└─ destM = 0 (bit 3)

Result:
─────
Both LOAD_A and LOAD_D pulse simultaneously!

         ALU_Out = A + 1
               │
          ┌────┴─────┐
          ▼          ▼
     ┌────────┐ ┌────────┐
     │   A    │ │   D    │
     └────────┘ └────────┘
          │          │
      LOAD_A     LOAD_D    (WRITE_M=0)
          ║          ║
          ╚══════════╝
          Both pulse!

After one cycle:
├─ A ← ALU_Out
├─ D ← ALU_Out
└─ Both registers get same value!

This is efficient! One computation → two saves

Timing Diagram: Complete Feedback Path

Clock Cycle Breakdown:

Phase 1: Register Output (0-25ns)
──────────────────────────────────
t=0ns:   Clock rises
t=5ns:   Register output valid
t=25ns:  A_Out, D_Out stable

         ┌────────┐
         │ A or D │
         │Register│
         └───┬────┘
             │ 25ns


Phase 2: Through Mux (25-40ns)
───────────────────────────────
t=25ns:  A_OUT or M arrives at mux
t=27ns:  A_OR_M control stable
t=37ns:  Mux output (X_IN) stable

         ┌────────┐
         │  MUX   │
         │ 74HC157│
         └───┬────┘
             │ 12ns


Phase 3: ALU Processing (40-100ns)
───────────────────────────────────
t=40ns:  X_PROC, Y_PROC ready
t=50ns:  Through zx/nx/zy/ny
t=65ns:  ADD/AND computed
t=80ns:  Function selected (f)
t=100ns: Output negated (no)
         ALU_Out stable ✓

         ┌────────┐
         │  ALU   │
         │(34 ICs)│
         └───┬────┘
             │ 60ns


Phase 4: Back to Register (100-110ns)
──────────────────────────────────────
t=100ns: ALU_Out drives result bus
t=105ns: Data stable at register inputs
t=110ns: Clock edge!
         ├─ LOAD_A/D pulses
         ├─ Data captured
         └─ New value stored

         ┌────────┐
         │Register│
         │ Input  │
         └───┬────┘
             │ 10ns setup
         CLOCK EDGE


Total Loop Time: 110ns
Maximum Frequency: ~9 MHz

Critical Path:
Register → Mux → ALU → Register
25ns + 12ns + 60ns + 10ns + 3ns = 110ns

Connection 4: Memory in the Loop

RAM Connection to ALU X Input

Memory Access Path:

A Register holds address
         │
         ▼
    ┌──────────┐
    │   RAM    │
    │ AS6C4008 │
    │ [A14:0]  │
    └────┬─────┘
         │
      D[15:0]
     (read data)
         │
         ├───────────────────┐
         │                   │
         ▼                   ▼
    To X_IN mux         To Data Bus
    (for ALU)          (for registers)
         │
         ▼
   ALU computation
         │
         ▼
    Result back to
    RAM or registers

The Complete M (Memory) Path

Detailed Memory Interface:

A Register:
A_Out[14:0] ────────────────────► RAM Address

RAM Chip (AS6C4008 × 2):
  ┌─────────────────┐
  │                 │
  │  Address[14:0]  │◄─── A_Out[14:0]
  │                 │
  │  /OE            │◄─── READ_M
  │  /WE            │◄─── WRITE_M (/WE is active LOW)
  │  /CE            │◄─── GND
  │                 │
  │  D[15:0]        │
  └──────┬──────────┘
         │
    M[15:0] (read data)
         │
         ├──────────────────────────┬────────────┐
         │                          │            │
         ▼                          ▼            ▼
    To X_IN MUX               To D_In      To A_In
    (ALU X input)             (D reg)      (A reg)
         │                         │            │
    Select via               Load via      Load via
    A_OR_M=1                 LOAD_D        LOAD_A

Memory Read Cycle

Operation: D = M

Timeline:
─────────

t=0ns: A register holds address
       A_Out = 0x0100 (example)

t=10ns: RAM receives address
        Address decode begins

t=60ns: RAM access complete
        M[0x0100] data valid
        M[15:0] = 0xABCD (example)

t=65ns: Data reaches X_IN mux
        A_OR_M = 1 (select M)
        X_IN = M = 0xABCD

t=80ns: Through zx/nx processing
        X_PROC = 0xABCD

t=90ns: ALU computes (might just pass through)
        ALU_Out = 0xABCD

t=110ns: Clock edge
         LOAD_D pulses
         D ← 0xABCD

Timing Diagram:
───────────────

CLK:     ───╗     ╔════╗     ╔════
            ╚═════╝    ╚═════╝

A_Out:   ──── 0x0100 ─────────────  (stable)

M[15:0]: ─────┌─── 0xABCD ─────────  (after access)
              └─ 60ns access time

X_IN:    ─────┌─── 0xABCD ─────────  (mux selects M)
              └─ 65ns

ALU_Out: ─────────┌── 0xABCD ──────  (ALU passes through)
                  └─ 90ns

LOAD_D:  ─────────────╗    ╔═══════  (pulse at clock)
                      ╚════╝

D_Out:   ──── old ────┴─── 0xABCD ──  (updated)

Memory Write Cycle

Operation: M = D

Timeline:
─────────

t=0ns: A register holds address
       D register holds data
       A_Out = 0x0100
       D_Out = 0x1234

t=10ns: RAM receives address
        Address decode

t=50ns: Control signals stable
        WRITE_M = 1 (active)
        /WE goes LOW

t=60ns: D value (passed through the ALU) drives data bus
        Data = 0x1234

t=110ns: Clock edge
         /WE write pulse ends (WRITE_M was active)
         RAM captures data
         RAM[0x0100] ← 0x1234

Timing Diagram:
───────────────

CLK:     ───╗     ╔════╗     ╔════
            ╚═════╝    ╚═════╝

A_Out:   ──── 0x0100 ─────────────  (address)

D_Out:   ──── 0x1234 ─────────────  (data to write)

WRITE_M: ───────╗         ╔═══════  (write enable)
                ╚═════════╝

/WE:     ════════╗         ╔═══════  (active LOW)
                ╚═════════╝

Data_Bus: ──── 0x1234 ───────────   (D drives bus)

RAM[100]: ──── old ───┴─── 0x1234 ── (written!)

The Tri-State Bus System

Why Tri-State Matters

Problem: Multiple Drivers
─────────────────────────

Without tri-state:
    A_Out ──► ║
    D_Out ──► ║──► CONFLICT!
                  (bus fight)
    M_Out ──► ║

All trying to drive bus simultaneously!


With tri-state:
    A_Out ──[/OE]──► ─┐
    D_Out ──[/OE]──► ─┤──► Bus (only one active)
    M_Out ──[/OE]──► ─┘

Only one driver active at a time
Others are "floating" (hi-Z)

Bus Arbitration

Data Bus Control (16-bit shared bus):

    ┌──────────────────────────────┐
    │        Main Data Bus         │
    │          (16 wires)          │
    └─┬───────┬────────┬───────┬───┘
      │       │        │       │
      ▼       ▼        ▼       ▼
   ┌────┐  ┌────┐  ┌────┐  ┌────┐
   │ A  │  │ D  │  │RAM │  │ALU │
   │Reg │  │Reg │  │I/F │  │Out │
   └─┬──┘  └─┬──┘  └─┬──┘  └─┬──┘
     │       │       │       │
    /OE_A  /OE_D  /OE_M   /OE_ALU

Control Logic:
──────────────
Only ONE /OE can be active (LOW) at a time

Example: Loading D from A
├─ /OE_A = 0 (A drives bus)
├─ /OE_D = 1 (D listening, not driving)
├─ /OE_M = 1 (RAM disconnected)
├─ /OE_ALU = 1 (ALU disconnected)
└─ LOAD_D pulses → D captures A's value
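
The one-driver-at-a-time rule is easy to check in software. The sketch below is a hypothetical helper (`resolve_bus` is not part of the design) that takes each driver's active-LOW /OE and its output value, and reports a bus fight if more than one driver is enabled.

```python
from typing import Dict, Optional, Tuple


def resolve_bus(drivers: Dict[str, Tuple[int, int]]) -> Optional[int]:
    """drivers maps name -> (oe_n, value); oe_n == 0 means the driver is enabled.

    Returns the 16-bit bus value, None if every driver is hi-Z,
    and raises ValueError when two or more drivers are enabled.
    """
    active = [name for name, (oe_n, _) in drivers.items() if oe_n == 0]
    if len(active) > 1:
        raise ValueError(f"bus contention between {active}")
    if not active:
        return None  # floating bus
    return drivers[active[0]][1] & 0xFFFF


if __name__ == "__main__":
    # Loading D from A: only /OE_A is LOW
    bus = resolve_bus({"A": (0, 0x0003), "D": (1, 0x0005),
                       "M": (1, 0x0000), "ALU": (1, 0x0000)})
    print(f"bus = 0x{bus:04X}")
```

A check like this is handy in a simulator or testbench, where a contention bug is much cheaper to find than on the physical board.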

74HC245 Bus Transceiver (for RAM)

RAM needs bidirectional access:

         74HC245 (Bus Transceiver)
         ┌──────────────┐
         │              │
CPU Bus ◄┼──► A side    │
[15:0]   │              │
         │          DIR │◄─── READ/WRITE
         │              │     (direction control)
RAM Bus ◄┼──► B side    │
[15:0]   │              │
         │          /OE │◄─── ENABLE
         │              │     (3-state control)
         └──────────────┘

DIR = 0: B → A (read from RAM)
DIR = 1: A → B (write to RAM)

/OE = 0: Transceiver active
/OE = 1: All outputs hi-Z


Pin Connections (74HC245):
──────────────────────────
Pin 1:  DIR (direction control)
        ├─ 0 = B → A (RAM to CPU)
        └─ 1 = A → B (CPU to RAM)

Pin 19: /OE (output enable)
        ├─ 0 = Active
        └─ 1 = Hi-Z (disabled)

Pins 2-9:   A side (CPU data bus)
Pins 11-18: B side (RAM data bus)

Control Logic:
──────────────
DIR = WRITE_M (1 when writing, 0 when reading)
/OE = 0 (always enabled when RAM selected)

Complete Operation Examples

Let me trace THREE complete operations showing all connections:

Example 1: D = D + 1 (Simple Feedback)

Instruction: D = D + 1

Control Signals:
────────────────
ALU_CTRL = 110111 (increment Y input)
zx=1, nx=1, zy=0, ny=1, f=1, no=1
A_OR_M = 0 (don't care: zx=1 zeroes X)
LOAD_D = 1 (save to D)
LOAD_A = 0 (don't save to A)
WRITE_M = 0 (don't write memory)

Note: the standard Hack encoding for D+1 is 011111, which assumes D
feeds the ALU x input. In this walkthrough D is wired to the Y input,
so the equivalent Y+1 pattern (110111) is traced instead.


Phase 1: D Register Output (0-25ns)
────────────────────────────────────
D_Out = 0x0005 (current value)

    ┌──────────┐
    │  D Reg   │
    │  U3, U4  │
    │ 74HC574  │
    └────┬─────┘
         │
    Q outputs: 0x0005


Phase 2: To ALU Y Input (25-40ns)
──────────────────────────────────
D_Out → Y input processing

    D_Out[15:0] = 0x0005
         │
    ┌────▼─────┐
    │ U9-U16   │  Zero select (zy=0, pass through)
    │ 74HC157  │
    └────┬─────┘
         │
    Y_ZERO = 0x0005
         │
    ┌────▼─────┐
    │ U10,12,  │  Negate (ny=1, increment needs this)
    │ 14,16    │
    │ 74HC86   │
    └────┬─────┘
         │
    Y_PROC = 0xFFFA (!0x0005)


Phase 3: ALU Computation (40-100ns)
────────────────────────────────────
X input doesn't matter (zx=1 → X = 0)

Step by step:
1. zx=1 → X = 0
2. nx=1 → X = !0 = 0xFFFF
3. zy=0 → use Y (D value)
4. ny=1 → Y = !D = !0x0005 = 0xFFFA
5. f=1 → ADD: 0xFFFF + 0xFFFA = 0xFFF9 (carry out of bit 15 is discarded)
6. no=1 → !0xFFF9 = 0x0006

    ┌──────────────┐
    │ ALU Core     │  Adders compute: 0xFFFF + 0xFFFA
    │ U17-U20      │  Result: 0xFFF9
    │ 74HC283      │
    └──────┬───────┘
           │
    ┌──────▼───────┐
    │ Output Stage │  no=1 → negate
    │ U29-U32      │  !0xFFF9 = 0x0006
    │ 74HC86       │
    └──────┬───────┘
           │
    ALU_Out = 0x0006


Phase 4: Back to D Register (100-110ns)
────────────────────────────────────────
ALU_Out drives data bus
D register inputs receive 0x0006

t=110ns: Clock edge!
         LOAD_D pulses

    ┌──────────┐
    │  D Reg   │
    │  U3, U4  │
    │ D inputs │ ◄─── 0x0006
    └────┬─────┘
         │
      CLK pulses
         │
    D_Out = 0x0006

Result Check:
─────────────
Before: D = 0x0005
After:  D = 0x0006
Incremented successfully!
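
The increment trick (negate, add, negate again) is easier to trust once you can run it. Below is a small Python model of the six-control-bit ALU. It is a behavioral sketch, not a description of the 74HC wiring, and `hack_alu` is a name chosen here for illustration. It follows this article's convention of A/M on X and D on Y.

```python
MASK = 0xFFFF  # 16-bit data path


def hack_alu(x: int, y: int, zx: int, nx: int, zy: int, ny: int, f: int, no: int) -> int:
    """Behavioral model of the six-control-bit ALU (16-bit, wraps on overflow)."""
    x &= MASK
    y &= MASK
    if zx:
        x = 0              # zero the X input
    if nx:
        x = ~x & MASK      # bitwise negate X
    if zy:
        y = 0              # zero the Y input
    if ny:
        y = ~y & MASK      # bitwise negate Y
    out = (x + y) & MASK if f else (x & y)  # f=1: add, f=0: AND
    if no:
        out = ~out & MASK  # bitwise negate the result
    return out


if __name__ == "__main__":
    # Y+1 with D = 0x0005 on the Y input (X is zeroed, so its value is irrelevant)
    print(hex(hack_alu(0x1234, 0x0005, zx=1, nx=1, zy=0, ny=1, f=1, no=1)))
    # X+Y with A = 0x0003 on X and D = 0x0005 on Y
    print(hex(hack_alu(0x0003, 0x0005, zx=0, nx=0, zy=0, ny=0, f=1, no=0)))
```

Why it works: !Y equals -Y-1 in two's complement, so (-1) + (-Y-1) = -Y-2, and negating that bitwise gives Y+1.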

Example 2: D = A + D (Two Inputs)

Instruction: D = A + D

Initial Values:
───────────────
A = 0x0003
D = 0x0005

Control Signals:
────────────────
ALU_CTRL = 000010 (add X and Y)
zx=0, nx=0, zy=0, ny=0, f=1, no=0
A_OR_M = 0 (use A, not M)
LOAD_D = 1
LOAD_A = 0
WRITE_M = 0


Phase 1: Both Registers Output (0-25ns)
────────────────────────────────────────

    ┌──────────┐        ┌──────────┐
    │  A Reg   │        │  D Reg   │
    │  U1, U2  │        │  U3, U4  │
    └────┬─────┘        └────┬─────┘
         │                   │
    A_Out = 0x0003      D_Out = 0x0005
         │                   │
         ▼                   ▼


Phase 2: A → X Input MUX (25-40ns)
───────────────────────────────────

A_Out and M both available to mux
A_OR_M = 0 → select A

    ┌──────────────────────┐
    │ X Input MUX          │
    │ 4× 74HC157           │
    │                      │
    │ A[15:0] ──┐          │
    │           ├─ MUX ──► │ X_IN = 0x0003
    │ M[15:0] ──┘          │
    │                      │
    │ A_OR_M = 0 (select A)│
    └──────────┬───────────┘
               │
          X_IN = 0x0003


Phase 3: Input Conditioning (40-50ns)
──────────────────────────────────────

X path: zx=0, nx=0 → pass through
Y path: zy=0, ny=0 → pass through

    X_PROC = 0x0003
    Y_PROC = 0x0005


Phase 4: ALU Core (50-100ns)
─────────────────────────────

ADD operation (f=1):

    X_PROC = 0x0003
  + Y_PROC = 0x0005
  ─────────────────
    SUM    = 0x0008

    ┌───────────────────┐
    │ Adder Chain       │
    │ U17-U20           │
    │ 74HC283 × 4       │
    │                   │
    │ 0x0003 + 0x0005   │
    │ = 0x0008          │
    └─────────┬─────────┘
              │
         SUM = 0x0008
              │
    ┌─────────▼─────────┐
    │ Function Select   │
    │ U25-U28           │
    │ 74HC157 × 4       │
    │                   │
    │ f=1 → select ADD  │
    └─────────┬─────────┘
              │
        F_OUT = 0x0008
              │
    ┌─────────▼─────────┐
    │ Output Negation   │
    │ U29-U32           │
    │ 74HC86 × 4        │
    │                   │
    │ no=0 → pass thru  │
    └─────────┬─────────┘
              │
        ALU_Out = 0x0008


Phase 5: Save to D (100-110ns)
───────────────────────────────

ALU_Out → Data Bus → D_In

t=110ns: LOAD_D pulses

    ┌──────────┐
    │  D Reg   │
    │          │
    │D ← 0x0008│
    └──────────┘


Result Check:
─────────────
Before: A = 0x0003, D = 0x0005
After:  A = 0x0003, D = 0x0008
Sum computed and saved!


Connection Summary:
───────────────────
A_Out → MUX → X_PROC → ALU ──┐
                             ├─► ALU_Out → D_In
D_Out ──────► Y_PROC → ALU ──┘

Both inputs used simultaneously!

Example 3: M = D + M (Memory Involved)

Instruction: M = D + M

Initial Values:
───────────────
A = 0x0100 (points to RAM address 256)
D = 0x0007
RAM[256] = 0x0003

Control Signals:
────────────────
ALU_CTRL = 000010 (add)
A_OR_M = 1 (use M, not A)
LOAD_D = 0
LOAD_A = 0
WRITE_M = 1 (save to memory!)


Phase 1: Memory Read (0-60ns)
──────────────────────────────

A register outputs address:

    ┌──────────┐
    │  A Reg   │
    │          │
    │A = 0x0100│
    └────┬─────┘
         │
    A_Out = 0x0100
         │
    ┌────▼─────┐
    │   RAM    │
    │ AS6C4008 │
    │          │
    │ Addr ◄───┤ 0x0100
    │          │
    │ /OE = 0  │  (read enabled)
    │ /WE = 1  │  (not writing yet)
    └────┬─────┘
         │
    (60ns access time)
         │
    M[256] = 0x0003


Phase 2: M → X Input MUX (60-75ns)
───────────────────────────────────

A_Out and M both available
A_OR_M = 1 → select M

    ┌──────────────────────┐
    │ X Input MUX          │
    │                      │
    │ A = 0x0100 ──┐       │
    │              ├─MUX ─►│ X_IN = 0x0003
    │ M = 0x0003 ──┘       │
    │                      │
    │ A_OR_M = 1 (select M)│
    └──────────┬───────────┘
               │
          X_IN = 0x0003
          (memory value!)


Phase 3: Both Inputs to ALU (75-100ns)
───────────────────────────────────────

X path: M value (via mux)
Y path: D value (direct)

    X_PROC = 0x0003 (from M)
  + Y_PROC = 0x0007 (from D)
  ─────────────────
    ALU_Out = 0x000A


Phase 4: Write to Memory (100-150ns)
─────────────────────────────────────

ALU_Out goes to data bus
A still holds address (0x0100)

t=110ns: Clock edge
         WRITE_M active
         RAM /WE goes LOW

    ┌─────────────────────────┐
    │ RAM                     │
    │                         │
    │ Addr = 0x0100 (from A)  │
    │ Data = 0x000A (from ALU)│
    │                         │
    │ /WE = LOW (write!)      │
    │                         │
    │ RAM[256] ← 0x000A       │
    └─────────────────────────┘


Result Check:
─────────────
Before:
├─ A = 0x0100 (address)
├─ D = 0x0007 (data)
└─ RAM[256] = 0x0003

After:
├─ A = 0x0100 (unchanged)
├─ D = 0x0007 (unchanged)
└─ RAM[256] = 0x000A 
    (0x0003 + 0x0007 = 0x000A)


Connection Summary:
───────────────────
RAM[A] → M → MUX → X_PROC → ALU ──┐
                                  ├─► ALU_Out ──┐
D_Out ─────────► Y_PROC → ALU ────┘             │
                                                │
A_Out ──────────────► RAM Address               │
                                                │
                      RAM ◄─────────────────────┘
                      (write back)

Read from memory, compute, write back!
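
Here is a hedged Python sketch of that read, compute, write-back loop for an add instruction. The `execute_add` helper and the `state` dictionary layout are assumptions made for illustration; only the control signal names come from the walkthrough above.

```python
MASK = 0xFFFF


def execute_add(state: dict, a_or_m: int, load_a: int, load_d: int, write_m: int) -> int:
    """One cycle of an X+Y instruction: mux, add, then write enabled destinations."""
    addr = state["A"] & 0x7FFF            # A_Out[14:0] addresses RAM
    m = state["RAM"].get(addr, 0)         # memory read at RAM[A]
    x_in = m if a_or_m else state["A"]    # X input mux (A_OR_M)
    alu_out = (x_in + state["D"]) & MASK  # ALU_CTRL = 000010 (add)
    if write_m:
        state["RAM"][addr] = alu_out      # address is the A value from before this edge
    if load_a:
        state["A"] = alu_out
    if load_d:
        state["D"] = alu_out
    return alu_out


if __name__ == "__main__":
    # Example 3: M = D + M
    state = {"A": 0x0100, "D": 0x0007, "RAM": {0x0100: 0x0003}}
    execute_add(state, a_or_m=1, load_a=0, load_d=0, write_m=1)
    print(f"RAM[256] = 0x{state['RAM'][0x0100]:04X}")
```

Note that the RAM write uses the address A held before the clock edge, which is why `addr` is captured before any register is updated.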

PCB Routing Strategies

Layer Assignment Philosophy

TOP LAYER (F.Cu): Signal Routing
─────────────────────────────────
├─ Register outputs to ALU inputs
├─ ALU output to register inputs  
├─ Control signals (LOAD_A, LOAD_D, etc)
├─ Short, direct point-to-point connections
└─ Horizontal preference

BOTTOM LAYER (B.Cu): Power + Vertical Signals
──────────────────────────────────────────────
├─ Ground plane (large pour)
├─ +5V plane (where needed)
├─ Vertical signal crossings
├─ Return paths for all signals
└─ Memory data bus (needs more routing space)

Critical Trace Groups

Group 1: A_Out to X MUX (16 traces)
────────────────────────────────────
From: A Register (U1, U2)
To:   X Input MUX (74HC157 × 4)

Routing:
├─ Length: 40-60mm typical
├─ Width: 0.4mm (15 mil)
├─ Spacing: 0.3mm (12 mil)
├─ Layer: Top (F.Cu)
├─ Keep parallel
└─ No vias

Termination: None needed (short traces)


Group 2: D_Out to Y Processing (16 traces)
───────────────────────────────────────────
From: D Register (U3, U4)
To:   Y Input Processing (74HC157 × 4)

Routing:
├─ Length: 40-60mm typical
├─ Width: 0.4mm
├─ Spacing: 0.3mm
├─ Layer: Top (F.Cu)
├─ Straight run
└─ Shortest path possible


Group 3: ALU_Out to Registers (16 traces)
──────────────────────────────────────────
From: ALU Output (U29-U32)
To:   A_In, D_In, M_In

Routing:
├─ Length: 50-80mm
├─ Width: 0.4mm
├─ Spacing: 0.3mm
├─ Layer: Top and Bottom (may need vias)
├─ Fan-out to three destinations
└─ Star topology from ALU


Group 4: Control Signals (6 traces)
────────────────────────────────────
Signals: LOAD_A, LOAD_D, WRITE_M, A_OR_M, etc.

Routing:
├─ Length: Variable (star topology)
├─ Width: 0.4mm
├─ Layer: Top (F.Cu)
├─ Keep away from data buses
├─ No crosstalk with data
└─ Buffer if fanout > 4

Example Board Layout

Physical arrangement (10cm × 15cm board):

  0mm ┌────────────────────────────────────┐
      │ [Connectors: J1-J6]                │
      ├────────────────────────────────────┤
 20mm │ [A Register: U1-U2]                │
      │          ║ 16 traces               │
 30mm │          ║                         │
      │          ▼                         │
 40mm │ [X Input MUX: 74HC157 × 4]         │
      │                                    │
      │ [D Register: U3-U4]                │
      │          ║ 16 traces               │
 60mm │          ▼                         │
      │ [Y Input Processing]               │
      ├────────────────────────────────────┤
 80mm │        [ALU CORE]                  │
      │ Adders, AND gates, MUXes           │
      │ (largest section - 20+ ICs)        │
120mm ├────────────────────────────────────┤
      │ [Output Stage]                     │
      │          ║                         │
      │          ║ 16 traces (result bus)  │
      │          ╚═══╦═══╗                 │
      │              ║   ║                 │
140mm │         To A  To D  To RAM         │
150mm └────────────────────────────────────┘

Trace Routing:
──────────────
A→MUX:    Vertical, left side
D→Y:      Vertical, right side
ALU→Regs: Fan-out from center
Control:  Horizontal, top layer
Power:    Bottom layer (flood)

Signal Integrity Considerations

Capacitive Loading

Each output drives multiple inputs:

Example: D_Out[0] fanout
────────────────────────
D Register U3.Q0 (pin 19)
        │
        ├─► Y Input MUX (U9 pin 2)
        │   Load: 1pF
        ├─► (Future expansion points)
        │
Total capacitance: ~5pF

Output drive capability:
74HC574 can drive 50pF easily
No buffer needed!


Critical case: ALU_Out fanout
──────────────────────────────
ALU_Out[0] from U29 (pin 3)
        ├─► A Register (U1 pin 2)      1pF
        ├─► D Register (U3 pin 2)      1pF
        ├─► RAM interface (transceiver) 2pF
        ├─► Test point                 1pF
Total: ~5pF + trace capacitance

Trace capacitance:
50mm trace @ 100pF/m = 5pF

Total: 10pF - still okay!

Rise time calculation:
R_source × C_load = 100Ω × 10pF = 1ns
Fast enough for MHz operation ✓
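
The same arithmetic as a tiny Python sketch, in case you want to try other trace lengths or loads. The function names are hypothetical. One caveat worth knowing: R × C is the time constant, and the 10%-90% rise time of a simple RC stage is roughly 2.2 times that.

```python
def trace_capacitance_pf(length_mm: float, pf_per_m: float) -> float:
    """Capacitance of a trace, given its length and a per-metre figure."""
    return (length_mm / 1000.0) * pf_per_m


def rc_time_constant_ns(r_source_ohm: float, c_load_pf: float) -> float:
    """R x C expressed in nanoseconds (ohm x pF = 1e-12 s)."""
    return r_source_ohm * c_load_pf * 1e-3


if __name__ == "__main__":
    c_trace = trace_capacitance_pf(50, 100)  # 50mm at 100pF/m
    c_total = 5.0 + c_trace                  # input loads plus trace
    tau = rc_time_constant_ns(100, c_total)  # 100 ohm source resistance
    print(f"C_total = {c_total:.1f} pF, tau = {tau:.2f} ns, "
          f"10-90% rise ~ {2.2 * tau:.2f} ns")
```

Even with the 2.2 factor, the edge is a couple of nanoseconds, which is small next to a cycle time above 100ns.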

Crosstalk Prevention

Problem: Parallel traces coupling
──────────────────────────────────

When traces run parallel:

    Signal A ────────────────
          (coupled capacitance)
    Signal B ────────────────

Fast edge on A induces noise on B!

Solution: Proper spacing
────────────────────────

    Signal A ────────────────
                 (0.3mm gap)
    Signal B ────────────────

Crosstalk < 5% with 0.3mm spacing
(at 1MHz, 0.4mm traces)


Additional techniques:
──────────────────────
1. Ground trace between critical signals
   Data0 ──── GND ──── Data1 ────

2. Different layers for different buses
   Top: A_Out bus
   Bottom: D_Out bus (no parallel run)

3. Perpendicular crossing when needed
   ────────── (A_Out, horizontal)
         (D_Out, vertical)

Ground Bounce (Power Integrity)

Problem: Simultaneous Switching
────────────────────────────────

When all 16 ALU outputs switch:
├─ 16 × 5mA = 80mA current spike
├─ Through ground inductance
├─ Creates voltage bounce
└─ Can cause false triggering!

Ground bounce voltage:
V = L × (di/dt)
  = 10nH × (80mA / 2ns)
  = 0.4V bounce!
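
A short Python sketch of the same V = L × (di/dt) estimate, useful for seeing how the bounce scales with edge rate or the number of switching outputs. `ground_bounce_v` is an illustrative name, and the inputs are the figures quoted above.

```python
def ground_bounce_v(inductance_nh: float, delta_i_ma: float, delta_t_ns: float) -> float:
    """V = L * di/dt with L in nH, di in mA and dt in ns; returns volts."""
    l_h = inductance_nh * 1e-9
    di_a = delta_i_ma * 1e-3
    dt_s = delta_t_ns * 1e-9
    return l_h * di_a / dt_s


if __name__ == "__main__":
    outputs, ma_each = 16, 5
    v = ground_bounce_v(10, outputs * ma_each, 2)
    print(f"{outputs} outputs switching: {v:.2f} V of bounce")
```

Halving the ground inductance or doubling the edge time halves the bounce, which is the motivation for short, wide ground returns.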

Solution: Decoupling Capacitors
────────────────────────────────

Place 0.1µF cap at EVERY IC:

    VCC ─────┬───── IC
            ┌┴┐
            │ │ 0.1µF (very close!)
            └┬┘
    GND ─────┴───── IC

Capacitor provides local current:
├─ IC switches → draws current from cap
├─ Cap is close (low inductance)
├─ Ground bounce reduced to ~50mV
└─ Safe for operation!

Additional bulk caps:
├─ 10µF at power entry
├─ 100µF for entire board
└─ Stabilizes power supply

Performance Analysis

Maximum Operating Frequency

Critical Path Delay Budget:
───────────────────────────

Register output:        25ns
├─ 74HC574 CLK-to-Q

Through input mux:      12ns
├─ 74HC157 propagation

Input conditioning:     15ns
├─ Zero select (74HC157)
├─ Negate (74HC86)

ALU core:              60ns
├─ Preprocessing (already counted)
├─ Adder carry chain: 40ns
├─ Function select: 10ns
├─ Output negate: 10ns

Register setup:         12ns
├─ 74HC574 setup time

Clock skew allowance:   6ns
├─ Clock distribution variation

TOTAL:                 130ns
Maximum frequency:     7.7 MHz

Conservative design:   ≥130ns cycle (≤7.7 MHz, meets the full budget)
Typical operation:     125ns cycle (8 MHz, relies on parts beating the budgeted delays)
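
To experiment with the budget, the stage delays can be dropped into a small Python sketch. The dictionary keys and the `meets_timing` helper are made up for illustration; the numbers are the ones listed in the budget above.

```python
STAGE_DELAYS_NS = {
    "register_clk_to_q": 25,   # 74HC574 CLK-to-Q
    "input_mux": 12,           # 74HC157 propagation
    "input_conditioning": 15,  # zero select + negate
    "alu_core": 60,            # adder carry chain + function select + output negate
    "register_setup": 12,      # 74HC574 setup time
    "clock_skew": 6,           # clock distribution allowance
}


def critical_path_ns(stages: dict) -> int:
    """Sum of every stage delay along the register-to-register loop."""
    return sum(stages.values())


def max_frequency_mhz(stages: dict) -> float:
    """Highest clock rate whose period still covers the critical path."""
    return 1000.0 / critical_path_ns(stages)


def meets_timing(cycle_ns: float, stages: dict) -> bool:
    """True when the clock period is at least as long as the critical path."""
    return cycle_ns >= critical_path_ns(stages)


if __name__ == "__main__":
    print(critical_path_ns(STAGE_DELAYS_NS), "ns")
    print(f"{max_frequency_mhz(STAGE_DELAYS_NS):.2f} MHz")
    for cycle in (100, 125, 150):
        print(cycle, "ns cycle meets timing:", meets_timing(cycle, STAGE_DELAYS_NS))
```

Changing one entry, for example swapping in a faster adder, immediately shows how much clock headroom that stage buys.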

Instruction Throughput

Instructions Per Second:
────────────────────────

At 8 MHz clock:
├─ 8,000,000 cycles/second
├─ Typical instruction: 1 cycle
└─ 8 MIPS (Million Instructions Per Second)

Nearly every instruction type completes in a single cycle:
├─ Memory access: 1 cycle
├─ Jump: 1 cycle (PC load)
├─ Complex calculations: 1 cycle
└─ Almost everything: 1 cycle!

Effective throughput: ~7-8 MIPS


Comparison:
───────────
Architecture        Frequency    MIPS
────────────────────────────────────────
Hack (our design)   8 MHz        ~8
Original 8080       2 MHz        ~0.5
MOS 6502           1 MHz        ~1
Z80                4 MHz        ~1
68000              8 MHz        ~1

Our Hack is competitive!
(Simple architecture = high efficiency)

Summary: The Connection Philosophy

Key Design Principles

1. Direct Paths Where Possible

D → ALU Y input: Direct
└─ No multiplexing needed
└─ Simplest, fastest path

2. Multiplex Only When Necessary

A vs M → ALU X input: Multiplexed
└─ Need to choose between two sources
└─ One 74HC157 mux per 4 bits

3. Shared Result Bus

ALU → Registers: Common bus
└─ Destinations selected by control signals (LOAD_A, LOAD_D, WRITE_M)
└─ One or more destinations can capture the result in the same cycle

4. Synchronous Operation

Everything clocked together
└─ Predictable timing
└─ No race conditions

5. Feedback Loop

Register → ALU → Register
└─ Computational loop
└─ Enables iterative calculations

The Elegant Integration

The register-ALU connections show how simple components create complex computation:

Just three types of connections:
├─ Register outputs (sources)
├─ ALU inputs (via mux when needed)
└─ ALU output (feedback to registers)

These three connections enable:
├─ Arithmetic (+, -, increment, decrement)
├─ Logic (AND, OR, NOT)
├─ Memory access (load, store)
├─ Program flow (jumps based on ALU flags)
└─ Complete computation!

The connections are the nervous system of the computer - they carry data and enable the computational feedback loop that makes programming possible!

