
The Hack Registers Explained: A Deep Dive into Memory and Control

Author: Yinhuan Yuan


Let me explain registers from the ground up, building a complete understanding of how the Hack computer stores and manages data.


What is a Register? (The Foundation)

The Basic Concept

Think of a register as a sticky note for the computer:

  • It remembers a number
  • It holds that number until you tell it to change
  • It can give you back that number instantly
  • It's extremely fast (nanoseconds)
Analogy: Your Calculator's "Memory" Button

When you press [MS] (memory store) on a calculator:
├─ Calculator STORES the current number
├─ Number STAYS there (even if you clear the screen)
├─ Press [MR] to RECALL that number
└─ Press [MS] again to REPLACE it

A register does exactly this - but in hardware!

Why Do We Need Registers?

The Problem:

Imagine trying to compute: (5 + 3) × 2

Without registers:
1. Add 5 + 3 = 8
2. Uhh... where did the 8 go?
3. Can't multiply it - we lost it!

With registers:
1. Add 5 + 3 = 8
2. STORE 8 in register
3. Load 8 from register
4. Multiply 8 × 2 = 16

Registers are the working memory of the CPU. Without them, you can only do one operation and then lose the result!

The Speed Hierarchy

Speed vs Capacity tradeoff:

Level        │ Speed                                  │ Capacity
─────────────┼────────────────────────────────────────┼──────────────────────
Registers    │ ████████████ Fastest (< 1ns)           │ Tiny (a few numbers)
Cache        │ ██████████   Very fast (1-10ns)        │ Small (KB-MB)
RAM          │ ████         Fast (50-100ns)           │ Medium (GB)
SSD          │ ██           Slow (microseconds)       │ Large (TB)
Hard Drive   │ █            Very slow (milliseconds)  │ Huge (TB)

Registers are at the TOP of the pyramid!

Why registers are fast:

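  • Physically close to the ALU (very short wires)
  • Wired directly to the ALU inputs (no shared bus to wait for)
  • No address decoding needed (each register has its own control line)
  • Built from static flip-flops, so no refresh is needed and they are always ready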

The Three Registers in Hack Computer

The Hack computer has exactly three registers:

Overview

┌────────────────────────────────────────────────────┐
│ A Register (16-bit)                                │
│  ├─ Address register                               │
│  ├─ Also used for data                             │
│  └─ Dual purpose!                                  │
│                                                    │
│ D Register (16-bit)                                │
│  ├─ Data register                                  │
│  ├─ Only stores data                               │
│  └─ ALU input                                      │
│                                                    │
│ PC - Program Counter (15-bit)                      │
│  ├─ Points to next instruction                     │
│  ├─ Auto-increments                                │
│  └─ Controls program flow                          │
└────────────────────────────────────────────────────┘

Why Only Three Registers?

Design philosophy of Hack computer:

  • Simplicity over speed
  • Easier to understand with fewer registers
  • Matches the Nand2Tetris educational goals
  • Still Turing-complete (can compute anything!)

Comparison:

Architecture    Registers    Philosophy
────────────────────────────────────────────────
Hack            3            Educational simplicity
8080            7            Early microprocessor
x86-64          16           Performance
ARM64 (AArch64) 31           RISC efficiency
GPU             1000s        Massive parallelism

Now let's dive deep into each register!


The A Register (Address/Data Register)

The Dual Purpose Design

The A register is special - it has two jobs:

Job 1: Address Register
├─ Holds memory addresses
├─ Points to RAM locations
├─ Set by A-instructions (@value) in Hack assembly
└─ Example: @100 means "A = 100"

Job 2: Data Register
├─ Holds regular data values
├─ Can be used in ALU operations
├─ Alternative to D register
└─ Example: A = A + 1

Why dual purpose?

  • Saves a register!
  • Many instructions need an address anyway
  • Flexible programming model

Real-World Use Examples

Example 1: Loading a constant
──────────────────────────────
Assembly:  @17
           D=A

What happens:
1. A Register ← 17 (constant loaded)
2. D Register ← A (copy to D)

Result: D = 17


Example 2: Accessing memory
───────────────────────────
Assembly:  @100
           M=D

What happens:
1. A Register ← 100 (address)
2. RAM[100] ← D (store D at address 100)

Result: RAM[100] now contains value from D


Example 3: Jump to address
──────────────────────────
Assembly:  @LOOP
           0;JMP

What happens:
1. A Register ← address of LOOP label
2. PC ← A (program counter jumps to A)

Result: Program continues from LOOP
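The @value form used in all three examples is an A-instruction. In the Hack machine language it is a 16-bit word whose top bit is 0 and whose low 15 bits hold the value, which is why @ can only load constants from 0 to 32767. A minimal Python sketch of that encoding (the helper name encode_a_instruction is hypothetical, not part of any Nand2Tetris tool):

```python
def encode_a_instruction(value: int) -> str:
    """Encode '@value' as a 16-bit Hack A-instruction (bit 15 = 0)."""
    if not 0 <= value <= 0x7FFF:
        raise ValueError("@value must fit in 15 bits (0..32767)")
    # Bit 15 is the opcode (0 = A-instruction); bits 14..0 carry the value.
    return format(value, "016b")


print(encode_a_instruction(17))   # @17  -> 0000000000010001
print(encode_a_instruction(100))  # @100 -> 0000000001100100
```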

Hardware Implementation: 74HC574

The A register uses the 74HC574 integrated circuit.

What is a 74HC574?

74HC574: Octal D-Type Flip-Flop with 3-State Output

"Octal" = 8 bits
"D-Type Flip-Flop" = basic memory element
"3-State Output" = can be enabled/disabled

One IC handles 8 bits, so we need two 74HC574s for 16 bits:

  • U1: Handles bits [7:0] (low byte)
  • U2: Handles bits [15:8] (high byte)

Inside the 74HC574

Let me explain what's inside this chip:

Simplified view of ONE bit (the chip contains 8 of these):

          ┌────────┐
Data In ─►│ D    Q ├──► Data Out
          │        │
    CLK ─►│>       │
          └────────┘
          D Flip-Flop
         (The memory!)

When CLK goes LOW → HIGH (rising edge):
├─ Whatever is on D gets CAPTURED
├─ Stored in the flip-flop
├─ Appears on Q
└─ STAYS there until the next rising edge

The data is "latched" - it remembers!

The D Flip-Flop: How Bits Remember

This is fundamental - let me explain carefully:

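The Master-Slave D Flip-Flop

  Data In (D)
       │
┌──────▼─────┐        ┌────────────┐
│   MASTER   │───────►│   SLAVE    │──► Q (Output)
│   LATCH    │        │   LATCH    │
└──────▲─────┘        └──────▲─────┘
       │                     │
   open while            open while
   CLK is LOW            CLK is HIGH

Operation (positive-edge-triggered, like the 74HC574):
1. CLK LOW: master is open and follows D; slave holds Q
2. Rising edge: master closes, freezing D; slave opens
3. CLK HIGH: slave passes the frozen value to Q; changes on D are ignored
4. Q stays there UNTIL the next rising edge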

This is how computers REMEMBER!
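To see why the output can only change at the rising edge, here is a small behavioral sketch in Python of a positive-edge-triggered master-slave flip-flop, sampled one CLK level at a time. The class name MasterSlaveDFF is made up for this illustration, and the model ignores real gate delays:

```python
class MasterSlaveDFF:
    """Positive-edge-triggered D flip-flop built from two level-sensitive latches."""

    def __init__(self) -> None:
        self.master = 0  # master latch: transparent while CLK is LOW
        self.q = 0       # slave latch output: transparent while CLK is HIGH

    def step(self, d: int, clk: int) -> int:
        if clk == 0:
            self.master = d       # master follows D; slave keeps Q
        else:
            self.q = self.master  # master is frozen; slave copies it to Q
        return self.q


ff = MasterSlaveDFF()
# (d, clk) samples: D changes while CLK is HIGH are ignored until the next rising edge.
trace = [ff.step(d, clk) for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]]
print(trace)  # [0, 1, 1, 1, 0]
```

Notice the third sample: D drops to 0 while CLK is HIGH, yet Q stays 1 until the next LOW → HIGH transition.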

The transistor-level truth:

A flip-flop is made of cross-coupled gates that form a feedback loop:

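Simplified SR Latch (Set-Reset), built from two cross-coupled NOR gates:

          ┌─────┐
R ───────►│ NOR ├───┬───► Q
      ┌──►│     │   │
      │   └─────┘   │
      │      ┌──────┘  (Q feeds back)
      │      │
      │   ┌──▼──┐
S ────┼──►│ NOR ├───┬───► !Q
      │   └─────┘   │
      └─────────────┘  (!Q feeds back)

When S=1, R=0: Q=1 (Set)
When S=0, R=1: Q=0 (Reset)
When both 0, Q REMEMBERS!
(S=1 and R=1 together is forbidden: it forces both Q and !Q to 0)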

The feedback loop creates the memory effect!

Why this works:

  • The output feeds back to the input
  • Creates a bistable state (stable at 0 or 1)
  • Requires continuous power to maintain
  • This is volatile memory (loses data when power off)
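The feedback loop can be simulated by re-evaluating both NOR gates until the outputs stop changing. A minimal sketch, assuming Q = NOR(R, !Q) and !Q = NOR(S, Q) as in the diagram, and that the latch starts in a valid state (!Q is the complement of Q); sr_latch is a hypothetical helper name:

```python
def nor(a: int, b: int) -> int:
    return int(not (a or b))


def sr_latch(s: int, r: int, q: int) -> tuple:
    """Settle a cross-coupled NOR latch; q is the value it was holding before."""
    nq = 1 - q  # assume the previous state was valid
    for _ in range(4):  # a few passes are enough for the loop to settle
        new_q = nor(r, nq)       # Q  = NOR(R, !Q)
        new_nq = nor(s, new_q)   # !Q = NOR(S, Q)
        if (new_q, new_nq) == (q, nq):
            break                # stable: the feedback loop holds this state
        q, nq = new_q, new_nq
    return q, nq


q, _ = sr_latch(s=1, r=0, q=0)  # set
q, _ = sr_latch(s=0, r=0, q=q)  # both inputs 0: hold
print(q)  # 1
q, _ = sr_latch(s=0, r=1, q=q)  # reset
print(q)  # 0
```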

74HC574 Pinout and Connections

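          74HC574 (DIP-20 package)
           ┌─────┴─────┐
     /OE ──┤1        20├── VCC (+5V)
      D0 ──┤2        19├── Q0  ◄── Data outputs
      D1 ──┤3        18├── Q1
      D2 ──┤4        17├── Q2
      D3 ──┤5        16├── Q3
      D4 ──┤6        15├── Q4
      D5 ──┤7        14├── Q5
      D6 ──┤8        13├── Q6
      D7 ──┤9        12├── Q7
     GND ──┤10       11├── CLK  ◄── Clock input
           └───────────┘

Data inputs D0-D7 sit on pins 2-9 and data outputs Q0-Q7 on pins 19-12, on opposite sides of the chip.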

Pin Functions:
─────────────
D0-D7:  Data inputs (what you want to store)
Q0-Q7:  Data outputs (what's currently stored)
CLK:    Clock - captures data on the LOW→HIGH (rising) edge
/OE:    Output Enable (active LOW)
        └─ When LOW: outputs enabled
        └─ When HIGH: outputs go hi-Z (disconnected)
VCC:    Power (+5V)
GND:    Ground (0V)

How the A Register Stores Data

Let's trace through a complete storage operation:

SCENARIO: Store 0xABCD in A register

Initial state:
├─ A register contains: 0x0000
├─ Data bus has: 0xABCD
└─ LOAD_A signal is about to pulse

Step 1: Setup (before clock edge)
──────────────────────────────────
Data Bus[15:8] = 0xAB ──► U2.D7-D0
Data Bus[7:0]  = 0xCD ──► U1.D7-D0

          0xAB                    0xCD
           │                       │
    ┌──────▼──────┐         ┌──────▼──────┐
    │   U2 (Hi)   │         │   U1 (Lo)   │
    │   74HC574   │         │   74HC574   │
    │ D7..D0:     │         │ D7..D0:     │
    │  10101011   │         │  11001101   │
    │  (waiting)  │         │  (waiting)  │
    └─────────────┘         └─────────────┘

LOAD_A signal: LOW (about to rise)


Step 2: Clock edge (rising edge)
──────────────────────────────────
                        ╔═══════  ◄── Rising edge HERE!
LOAD_A signal: ═════════╝

The moment LOAD_A rises:
├─ U1 captures 0xCD into its flip-flops
├─ U2 captures 0xAB into its flip-flops
├─ Data is now STORED inside the ICs
└─ The new value appears on the outputs one clock-to-Q delay later (~20ns)


Step 3: After clock (data retained)
────────────────────────────────────
    ┌─────────────┐         ┌─────────────┐
    │   U2 (Hi)   │         │   U1 (Lo)   │
    │   74HC574   │         │   74HC574   │
    │ Stored: AB  │         │ Stored: CD  │
    │ Q7..Q0:     │         │ Q7..Q0:     │
    │  10101011   │         │  11001101   │
    └──────┬──────┘         └──────┬──────┘
           │                       │
           └───────────┬───────────┘
                       │
             A_Out[15:0] = 0xABCD
Data bus can now change - doesn't matter!
The A register REMEMBERS 0xABCD


Step 4: Later - reading the value
──────────────────────────────────
No clock needed!

Output is ALWAYS available:
├─ A_Out[15:0] continuously shows 0xABCD
├─ Can be read by ALU any time
├─ Can be used as address any time
├─ Stays there until next LOAD_A pulse
└─ Even if data bus changes!

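The stored value is continuously driven on the outputs, so reading needs no strobe. (This is not the same as a "transparent" latch such as the 74HC573, whose outputs follow its inputs while its enable is HIGH; the 74HC574 only changes on a clock edge.)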
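The storage walkthrough can be modeled behaviorally. This sketch assumes, as in the walkthrough, that LOAD_A drives the CLK pins of both chips and that each 74HC574 captures on the rising (LOW→HIGH) edge of CLK; the class names Octal574 and Register16 are invented for illustration, and timing is ignored:

```python
class Octal574:
    """Behavioral model of one 74HC574: 8 bits captured on the CLK rising edge."""

    def __init__(self) -> None:
        self.q = 0
        self._last_clk = 0

    def tick(self, d: int, clk: int) -> None:
        if self._last_clk == 0 and clk == 1:  # rising edge only
            self.q = d & 0xFF
        self._last_clk = clk


class Register16:
    """Two 74HC574s side by side: U1 holds bits [7:0], U2 holds bits [15:8]."""

    def __init__(self) -> None:
        self.u1, self.u2 = Octal574(), Octal574()

    def tick(self, data_bus: int, load: int) -> None:
        # LOAD is wired to both CLK pins, so both bytes are captured together.
        self.u1.tick(data_bus & 0xFF, load)
        self.u2.tick((data_bus >> 8) & 0xFF, load)

    @property
    def out(self) -> int:
        return (self.u2.q << 8) | self.u1.q


a = Register16()
a.tick(0xABCD, 0)  # Step 1: data on the bus, LOAD_A still LOW
a.tick(0xABCD, 1)  # Step 2: rising edge -> captured
a.tick(0x1234, 1)  # Step 3: bus changes, LOAD_A stays HIGH -> no edge, ignored
print(hex(a.out))  # 0xabcd
```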

Timing Diagram (Critical Understanding)

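CLOCK:    ──────╗      ╔──────╗      ╔──────╗      ╔───
                ╚══════╝      ╚══════╝      ╚══════╝
                       ▲             ▲             ▲
                    edge 1        edge 2        edge 3
            (rising edges: this is where data is captured)

Data_In:  ────── 0x1234 ──────┼── 0x5678 ───┼── 0xABCD ───

LOAD_A:   ──────── HIGH (enabled) ──────────╗
                                            ╚════ LOW (disabled)

A_Out:    ─── 0x0000 ──┼── 0x1234 ───┼── 0x5678 ──────────────
                       ▲             ▲
                 captured here captured here   (edge 3 ignored: LOAD_A is LOW)

The 74HC574 has no enable pin of its own, so LOAD_A is assumed to gate the clock before it reaches the chips' CLK inputs: data is captured only on rising edges where LOAD_A is HIGH.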


Critical Timing Parameters (74HC574):
─────────────────────────────────────
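Setup time:    12ns     (data must be stable BEFORE the clock edge)
Hold time:     3ns      (data must stay stable AFTER the clock edge)
Clock-to-Q:    ~20-25ns (propagation delay from CLK to the new value on Q)

These are approximate 5V figures; check the datasheet for your exact part, supply voltage and temperature.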

If timing violated → DATA CORRUPTION!

Why We Use 74HC574 (Design Choice)

Advantages:

Simple interface (just clock it)
Outputs always driven (readable at any time while /OE is LOW)
3-state outputs (can disconnect from bus)
Edge-triggered (clean data capture)
Fast (roughly 20-25ns from clock edge to output)
Low power (CMOS)
Cheap (~$0.50 per IC)
Available everywhere

Alternatives considered:

Option 1: 74HC374 (similar but different pinout)
├─ Functionally identical
├─ Different pin arrangement
└─ Either works fine

Option 2: 74HC273 (no 3-state)
├─ Cannot disconnect from bus
├─ Less flexible
└─ Not chosen

Option 3: 74HC377 (with enable)
├─ Has separate enable pin
├─ More control, more complex
└─ Overkill for Hack

Option 4: SRAM (6264 or similar)
├─ Overkill (we only need 16 bits!)
├─ Requires addressing logic
├─ Slower
└─ Not practical for registers

The Output Enable Feature

The 74HC574 has a special feature: 3-state output

What is 3-state (tri-state)?

Normal output:
├─ HIGH (1, ~5V)
└─ LOW (0, ~0V)

3-state output adds:
└─ Hi-Z (high impedance, "floating")

Pin 1 (/OE - Output Enable, active LOW):
─────────────────────────────────────────
/OE = 0 → Outputs ENABLED (normal operation)
/OE = 1 → Outputs DISABLED (hi-Z)


Why this matters:

With /OE feature:
┌─────────┐         ┌─────────┐
│  Reg A  │────┬────│  Reg D  │
│ /OE=1   │    │    │ /OE=0   │
└─────────┘    │    └─────────┘
   (off)       │       (on)
           Shared Bus
        Only D drives it!

Without /OE (collision!):
┌─────────┐         ┌─────────┐
│  Reg A  │────┬────│  Reg D  │
│ outputs │    │    │ outputs │
└─────────┘    │    └─────────┘
           Shared Bus
             FIGHT!
   (both trying to drive the bus)
        (can damage chips!)

In the Hack computer:

  • For A and D registers, /OE is tied to GND
  • This means outputs are always enabled
  • This is fine because each register has dedicated output wires
  • No bus sharing for these registers

When you WOULD use /OE:

  • Multiple devices on shared bus
  • Memory banks (only one enabled at a time)
  • I/O port multiplexing
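A shared bus with 3-state drivers follows a simple rule: exactly one enabled driver sets the value, no enabled driver leaves the bus floating, and two enabled drivers with different values fight. A small Python sketch of that rule (resolve_bus is a hypothetical helper; /OE is active LOW as on the 74HC574):

```python
from typing import List, Optional, Tuple


def resolve_bus(drivers: List[Tuple[int, int]]) -> Optional[int]:
    """Return the bus value from (oe_n, value) pairs, with /OE active LOW.

    Returns None when every output is hi-Z (the bus floats) and raises
    when enabled drivers disagree (bus contention).
    """
    active = [value for oe_n, value in drivers if oe_n == 0]
    if not active:
        return None  # nobody drives the bus
    if len(set(active)) > 1:
        raise RuntimeError("bus contention: outputs are fighting")
    return active[0]


# Reg A disabled (/OE=1), Reg D enabled (/OE=0): only D drives the bus.
print(hex(resolve_bus([(1, 0x1111), (0, 0x2222)])))  # 0x2222
```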

The D Register (Data Register)

Purpose and Design

The D register is the pure data storage register:

Job: Store intermediate calculation results
├─ NOT used for addressing
├─ Only used for data
├─ Primary ALU input
└─ Simplest register

Why separate from A?
├─ A is busy being an address
├─ Need somewhere to save ALU results
├─ Allows more complex calculations
└─ Standard CPU architecture

Hardware Implementation

Exactly the same as A register!

  • 2× 74HC574 chips (U3, U4)
  • Same connections
  • Same operation
  • Different control signal (LOAD_D instead of LOAD_A)
          D Register Schematic

Data_In[7:0] ─────────┐
                 ┌────▼─────┐
LOAD_D ─────────►│ CLK      │
                 │ U3       │
                 │ 74HC574  ├──► D_Out[7:0]
GND ────────────►│ /OE      │
                 └──────────┘

Data_In[15:8] ────────┐
                 ┌────▼─────┐
LOAD_D ─────────►│ CLK      │
                 │ U4       │
                 │ 74HC574  ├──► D_Out[15:8]
GND ────────────►│ /OE      │
                 └──────────┘

VCC and GND supply pins omitted for clarity

Usage Patterns

Typical D Register Operations:

Pattern 1: Store ALU result
────────────────────────────
D = D + 1

How it works:
1. D_Out → ALU x input
2. ALU computes D + 1
3. ALU_Out → Data_In
4. LOAD_D pulse
5. Result stored in D

Timeline:
    t=0ns:   D_Out = 5
    t=100ns: ALU_Out = 6
    t=110ns: LOAD_D rising edge
    t=130ns: D_Out = 6 (after the ~20ns clock-to-Q delay)

Pattern 2: Save calculation
───────────────────────────
D = A + D

1. D_Out → ALU x input
2. A_Out → ALU y input
3. ALU computes A + D
4. Result → Data_In
5. LOAD_D pulse
6. Sum saved in D


Pattern 3: Load from memory
───────────────────────────
D = M

1. A has address
2. RAM[A] → Data_In
3. LOAD_D pulse
4. D now contains RAM value

Why Not Just Use A for Everything?

Good question! Here's why we need D:

Scenario: Compute (RAM[5] + RAM[6]) and store in RAM[7]

With A and D (actual design):
────────────────────────────
@5          // A = 5
D = M       // D = RAM[5] (saved!)
@6          // A = 6
D = D + M   // D = RAM[5] + RAM[6]
@7          // A = 7
M = D       // RAM[7] = result ✓

Total: 6 instructions


Without D register (hypothetical):
──────────────────────────────────
@5          // A = 5
A = M       // A = RAM[5]... but wait!
@6          // A = 6, LOST previous value!
// Can't do it! Need somewhere to save RAM[5]

IMPOSSIBLE with A alone: the only register must hold the address, so there is nowhere to keep RAM[5] while reading RAM[6].

The D register provides "scratch space" for calculations.
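The argument can be checked by running the six-instruction program. This is a deliberately tiny Python interpreter (the run helper is hypothetical) that understands @n and dest=comp for just the comp values used in this article; it has no jumps, labels, or error handling:

```python
def run(program: list, ram: dict) -> dict:
    """Run straight-line Hack assembly using only the A and D registers."""
    a_reg = d_reg = 0
    comps = {
        "A": lambda a, d, m: a,
        "D": lambda a, d, m: d,
        "M": lambda a, d, m: m,
        "D+M": lambda a, d, m: d + m,
        "D+A": lambda a, d, m: d + a,
    }
    for line in program:
        line = line.split("//")[0].replace(" ", "")  # drop comments and spaces
        if not line:
            continue
        if line.startswith("@"):
            a_reg = int(line[1:])  # A-instruction: load a constant into A
            continue
        dest, comp = line.split("=")
        m = ram.get(a_reg, 0)  # M always means RAM[A]
        value = comps[comp](a_reg, d_reg, m) & 0xFFFF
        if "M" in dest:
            ram[a_reg] = value  # the write uses the address currently in A
        if "D" in dest:
            d_reg = value
        if "A" in dest:
            a_reg = value
    return ram


program = ["@5", "D = M", "@6", "D = D + M", "@7", "M = D"]
ram = run(program, {5: 20, 6: 22})
print(ram[7])  # 42
```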


The PC Register (Program Counter)

This is the most complex and most interesting register!

What is a Program Counter?

The Program Counter (PC) is a special register that:
├─ Points to the NEXT instruction to execute
├─ Automatically increments after each instruction
├─ Can JUMP to different addresses
└─ Controls program flow

Analogy: Reading a Book
────────────────────────
PC is like your finger pointing at:
├─ Current line you're reading
├─ Moves down one line after reading (auto-increment)
├─ Can jump to a different page (jump instruction)
└─ Bookmarks your place

Why PC is Special

Unlike A and D, the PC must:

  1. Auto-increment (add 1 automatically)
  2. Load new values (for jumps)
  3. Reset to zero (at startup)

This needs different hardware than simple storage!

Hardware Implementation: 74HC161

The PC uses the 74HC161 integrated circuit.

What is a 74HC161?

74HC161: 4-bit Synchronous Binary Counter

"4-bit" = counts 0-15 in one IC
"Synchronous" = all bits change together (not ripple)
"Binary Counter" = increments by 1 each clock
"Parallel Load" = can load any value directly

We need FOUR of these for 15 bits:

  • U6: Bits [3:0]
  • U7: Bits [7:4]
  • U8: Bits [11:8]
  • U9: Bits [14:12] + one unused bit

Why 15 bits, not 16?

  • Hack addressing is 0-32767 (32K)
  • 2^15 = 32768 addresses
  • The ROM has only 32K words, so a 16th address bit would never be used
  • U9's fourth output (QD) is simply left unconnected

Inside the 74HC161

Simplified 4-bit counter logic:

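     A  B  C  D   (parallel data inputs)
     │  │  │  │
┌────▼──▼──▼──▼────────┐
│ Next-value select:   │ ◄── /LOAD, ENP, ENT decide:
│ load / +1 / hold     │     load A-D, count, or hold
└────┬──┬──┬──┬────────┘
┌────▼──▼──▼──▼────────┐
│ 4 D flip-flops       │ ◄── Storage (like the 74HC574),
│                      │     all clocked together
└────┬──┬──┬──┬────────┘
     │  │  │  │   (also fed back into the +1 logic)
     ▼  ▼  ▼  ▼
    QA QB QC QD   (outputs)

RCO ◄── Ripple Carry Out: HIGH when QA-QD = 1111 and ENT is HIGH (to next stage)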

Control Signals:
├─ CLK: When to count
├─ ENP, ENT: Enable counting (both must be HIGH)
├─ /LOAD: Load parallel data (active LOW)
└─ /CLR: Clear to zero (active LOW)

The Increment Logic

How does it add 1? Here's the actual logic:

4-bit Binary Counter Logic:

Current         Next (increment by 1)
Q3 Q2 Q1 Q0  →  Q3 Q2 Q1 Q0
────────────────────────────────────
0  0  0  0   →  0  0  0  1    (0 → 1)
0  0  0  1   →  0  0  1  0    (1 → 2)
0  0  1  0   →  0  0  1  1    (2 → 3)
0  0  1  1   →  0  1  0  0    (3 → 4)
...
1  1  1  1   →  0  0  0  0    (15 → 0, overflow!)
                              └─ RCO (Ripple Carry Out) = 1

Logic equations:
────────────────
Q0_next = !Q0                    (toggle every time)
Q1_next = Q1 XOR Q0              (toggle when Q0=1)
Q2_next = Q2 XOR (Q1 AND Q0)     (toggle when Q1,Q0=1)
Q3_next = Q3 XOR (Q2 AND Q1 AND Q0)

RCO = ENT AND Q3 AND Q2 AND Q1 AND Q0    (all bits high, with counting enabled)

This is implemented with:
├─ XOR gates (for toggle)
├─ AND gates (for carry propagation)
└─ D flip-flops (for storage)
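The toggle equations can be checked exhaustively against ordinary +1 arithmetic, since a 4-bit counter has only 16 states. A short Python sketch (next_state and bits are illustrative names):

```python
def next_state(q3: int, q2: int, q1: int, q0: int) -> tuple:
    """Apply the toggle equations of a 4-bit synchronous binary counter."""
    return (
        q3 ^ (q2 & q1 & q0),  # Q3 toggles when Q2, Q1 and Q0 are all 1
        q2 ^ (q1 & q0),       # Q2 toggles when Q1 and Q0 are both 1
        q1 ^ q0,              # Q1 toggles when Q0 is 1
        1 - q0,               # Q0 toggles on every count
    )


def bits(n: int) -> tuple:
    return ((n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1)


# Compare the equations with "+1 mod 16" for every possible state.
for n in range(16):
    assert next_state(*bits(n)) == bits((n + 1) % 16), n

rco = [n for n in range(16) if all(bits(n))]  # RCO = Q3·Q2·Q1·Q0 (ENT HIGH)
print(rco)  # [15]
```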

74HC161 Pinout and Connections

          74HC161 (DIP-16 package)
           ┌─────┴─────┐
    /CLR ──┤1        16├── VCC (+5V)
     CLK ──┤2        15├── RCO (Ripple Carry Out)
       A ──┤3        14├── QA  ◄── Counter outputs
       B ──┤4        13├── QB
       C ──┤5        12├── QC
       D ──┤6        11├── QD
     ENP ──┤7        10├── ENT
     GND ──┤8         9├── /LOAD
           └───────────┘

Pin Functions:
──────────────
A, B, C, D:  Parallel load inputs (data to load)
QA-QD:       Counter outputs (current count)
CLK:         Clock input (count on rising edge)
ENP, ENT:    Enable inputs (both must be HIGH to count)
/LOAD:       Load enable (active LOW - loads A,B,C,D)
/CLR:        Clear (active LOW - resets to 0000)
RCO:         Ripple Carry Out (goes HIGH when count = 1111 and ENT is HIGH)

Chaining Multiple 74HC161s for 15-bit Counter

This is the critical part - connecting four counters:

Stage  Bits      Parallel inputs      Outputs              ENP, ENT
                 (pins 3-6)           (pins 14-11)         (pins 7, 10)
─────────────────────────────────────────────────────────────────────────
U6     [3:0]     A[3:0]               PC[3:0]              PC_INC
U7     [7:4]     A[7:4]               PC[7:4]              U6.RCO (pin 15)
U8     [11:8]    A[11:8]              PC[11:8]             U7.RCO
U9     [14:12]   A[14:12]             PC[14:12]            U8.RCO
                 (pin 6 unused,       (pin 11 unused)      (U9.RCO unused)
                  tie to GND)

PC_INC ──► U6 ──RCO──► U7 ──RCO──► U8 ──RCO──► U9 ──RCO──► (unused)
Carry chain: each stage is enabled only when the previous one overflows

Control signals (shared by all):
────────────────────────────────
CLK   → All counters clock together
/LOAD → All counters load together (for jumps)
/CLR  → All counters clear together (reset)

ENP, ENT on U6       ← Controls counting (PC_INC signal)
ENP, ENT on U7,U8,U9 ← Connected to previous RCO

The Carry Chain Explained

How the 15-bit counter works:

Stage 1 (U6, bits 0-3):
────────────────────────
Counts: 0, 1, 2, 3, ..., 14, 15, 0, 1, 2...
RCO is HIGH when count = 15 (1111)

Stage 2 (U7, bits 4-7):
────────────────────────
Only increments when U6.RCO = HIGH
This happens every 16 counts
So U7 counts: 0, 0, 0, ..., 1, 1, 1, ...
(Changes every 16 clock cycles)

Stage 3 (U8, bits 8-11):
────────────────────────
Only increments when U7.RCO = HIGH
This happens every 256 counts
Counts the "pages" of memory

Stage 4 (U9, bits 12-14):
─────────────────────────
Only increments when U8.RCO = HIGH
This happens every 4096 counts
Provides the highest address bits

Maximum count:
──────────────
Binary: 111111111111111 (15 ones)
Decimal: 32767
Hex: 0x7FFF

Then wraps to 0 (overflow)

Visual representation of counting:

Count  │ U9[14:12] │ U8[11:8] │ U7[7:4] │ U6[3:0] │
       │ (4096s)   │ (256s)   │ (16s)   │ (1s)    │
───────┼───────────┼──────────┼─────────┼─────────┤
0      │ 000       │ 0000     │ 0000    │ 0000    │
1      │ 000       │ 0000     │ 0000    │ 0001    │
...    │ ...       │ ....     │ ....    │ ....    │
15     │ 000       │ 0000     │ 0000    │ 1111    │◄─ U6 about to overflow
16     │ 000       │ 0000     │ 0001    │ 0000    │◄─ U7 increments!
...    │ ...       │ ....     │ ....    │ ....    │
255    │ 000       │ 0000     │ 1111    │ 1111    │
256    │ 000       │ 0001     │ 0000    │ 0000    │◄─ U8 increments!
...    │ ...       │ ....     │ ....    │ ....    │
32767  │ 111       │ 1111     │ 1111    │ 1111    │◄─ Maximum
32768  │ 000       │ 0000     │ 0000    │ 0000    │◄─ Wraps around to 0!
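The carry chain can be simulated to confirm the table. This behavioral sketch models each 74HC161 as a 4-bit count whose RCO is HIGH only at 1111 with ENT HIGH; it covers counting only (no load or clear), and the names Counter161, clock_chain and pc_value are made up for the illustration:

```python
class Counter161:
    """Behavioral sketch of one 74HC161 stage (counting only)."""

    def __init__(self) -> None:
        self.q = 0  # QD..QA as a 4-bit integer

    def rco(self, ent: int) -> int:
        # RCO is HIGH only when the count is 1111 and ENT is HIGH.
        return int(self.q == 0xF and ent == 1)


def clock_chain(stages: list, pc_inc: int) -> None:
    """Apply one rising clock edge to every stage at the same time."""
    # Each stage's enable comes from the outputs *before* the edge,
    # because all four chips share the clock (synchronous counting).
    enables = []
    enable = pc_inc
    for stage in stages:
        enables.append(enable)
        enable = stage.rco(enable)  # this RCO drives the next ENP/ENT
    for stage, en in zip(stages, enables):
        if en:
            stage.q = (stage.q + 1) & 0xF


def pc_value(stages: list) -> int:
    u6, u7, u8, u9 = stages
    # Only three of U9's outputs are used (bits 14..12); QD is unconnected.
    return u6.q | (u7.q << 4) | (u8.q << 8) | ((u9.q & 0x7) << 12)


stages = [Counter161() for _ in range(4)]  # U6, U7, U8, U9
for _ in range(32767):
    clock_chain(stages, pc_inc=1)
print(pc_value(stages))  # 32767 (0x7FFF, the maximum)
clock_chain(stages, pc_inc=1)
print(pc_value(stages))  # 0 (wrapped around)
```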

PC Control Logic

The PC has three modes of operation:

Mode 1: INCREMENT (normal operation)
─────────────────────────────────────
Condition: No jump instruction
Action: PC = PC + 1
Control: PC_INC = HIGH, /LOAD = HIGH

Mode 2: JUMP (load new address)
────────────────────────────────
Condition: Jump instruction taken
Action: PC = A (load from A register)
Control: PC_INC = LOW, /LOAD = LOW

Mode 3: RESET (startup)
───────────────────────
Condition: System reset
Action: PC = 0
Control: /CLR = LOW
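The three modes amount to a priority order at each rising clock edge: clear beats load, and load beats counting, which is how the 74HC161 resolves its control pins. A minimal Python sketch of the next PC value (pc_next is a hypothetical helper; folding the asynchronous /CLR into a clocked function is a simplification):

```python
def pc_next(pc: int, a: int, clr_n: int, load_n: int, pc_inc: int) -> int:
    """Next 15-bit PC value, using the 74HC161's priority order.

    clr_n and load_n are active LOW, like the /CLR and /LOAD pins.
    """
    if clr_n == 0:
        return 0                  # Mode 3: RESET (wins over everything)
    if load_n == 0:
        return a & 0x7FFF         # Mode 2: JUMP, PC = A[14:0]
    if pc_inc:
        return (pc + 1) & 0x7FFF  # Mode 1: INCREMENT, wraps after 0x7FFF
    return pc                     # counting disabled: hold


print(hex(pc_next(0x0100, 0x0200, clr_n=1, load_n=1, pc_inc=1)))  # 0x101
print(hex(pc_next(0x0100, 0x0200, clr_n=1, load_n=0, pc_inc=0)))  # 0x200
print(hex(pc_next(0x1234, 0x0200, clr_n=0, load_n=1, pc_inc=1)))  # 0x0
```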

The Control Circuit

We need logic to decide: increment or load?

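Using 74HC00 (NAND gates):

Inputs:
├─ JUMP (from control unit, active HIGH)
├─ /RESET (from reset button, active LOW)
└─ CLK (system clock, goes straight to the counters)

Outputs:
├─ PC_INC (enable counting: ENP, ENT on U6)
├─ PC_LOAD (drives the /LOAD pins, active LOW)
└─ PC_CLR (drives the /CLR pins, active LOW)

Logic:
──────
PC_LOAD = NOT JUMP              (LOW when jumping → load A into the PC)
PC_INC  = NOT JUMP AND /RESET   (count only when not jumping and not in reset)
PC_CLR  = /RESET                (the reset button pulls it LOW; no gate needed)

On the 74HC161 a LOW /LOAD already takes priority over counting, and a LOW /CLR overrides everything, so gating PC_INC mostly makes the intent explicit.


Circuit with 74HC00 (three of its four NAND gates):
───────────────────────────────────────────────────
       ┌────┐
JUMP──►│1   │
       │   3├──┬──────────────────► PC_LOAD (= NOT JUMP)
JUMP──►│2   │  │
       └────┘  │   ┌────┐         ┌────┐
               └──►│4   │     ┌──►│9   │
                   │   6├─────┤   │   8├──► PC_INC
/RESET────────────►│5   │     └──►│10  │
                   └────┘         └────┘

Gate 1 (inputs tied together) = NOT JUMP
Gate 2 = NAND(NOT JUMP, /RESET)
Gate 3 (inputs tied together) inverts gate 2, giving the AND

/RESET ──────────────────────────► PC_CLR (to /CLR on all four 74HC161s)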
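A 2-input NAND is HIGH unless both inputs are HIGH, so "count only when not jumping and not resetting" needs one NAND used as an inverter plus a NAND followed by another inversion. A sketch of that logic using only NAND gates, assuming JUMP is active HIGH and the reset line is active LOW (as in the reset example below); pc_control is a hypothetical name:

```python
def nand(a: int, b: int) -> int:
    return int(not (a and b))


def pc_control(jump: int, reset_n: int) -> dict:
    """PC control signals built from 2-input NAND gates only.

    jump:    1 when the control unit decides to jump (active HIGH)
    reset_n: the reset line, 0 while the button is pressed (active LOW)
    """
    not_jump = nand(jump, jump)        # NAND with tied inputs = NOT
    both_ok = nand(not_jump, reset_n)  # LOW only when not jumping and not resetting
    pc_inc = nand(both_ok, both_ok)    # invert again: NOT JUMP AND reset_n
    return {"PC_LOAD": not_jump, "PC_INC": pc_inc, "PC_CLR": reset_n}


for jump in (0, 1):
    for reset_n in (0, 1):
        print(f"JUMP={jump} /RESET={reset_n} -> {pc_control(jump, reset_n)}")
```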

PC Operation Examples

Let me trace through complete operations:

Example 1: Normal Increment (Sequential Execution)

Initial state:
├─ PC = 0x0100 (256)
├─ Next instruction is at address 0x0101
└─ No jump condition

Step-by-step:
─────────────

t=0ns: Fetch instruction from ROM[0x0100]
├─ PC_Out = 0x0100
├─ ROM receives address
└─ Instruction retrieved

t=50ns: Execute instruction (say, D=D+1)
├─ ALU does computation
├─ Result stored in D
└─ No jump (JUMP=0)

t=100ns: Clock rises (increment PC)
├─ JUMP=0, so PC_INC=1
├─ ENP=1, ENT=1 on U6
├─ Counter increments
└─ PC becomes 0x0101

t=110ns: New PC value stable
├─ PC = 0x0101
├─ Points to next instruction
└─ Ready for next cycle

Timeline:
────────
CLK:    ────╗     ╔════╗     ╔════
            ╚═════╝    ╚═════╝

PC_INC: ═══════════════════════════  (HIGH, counting enabled)

PC_Out: ──── 0x0100 ────── 0x0101 ──
                    Incremented here

Example 2: Jump Instruction (Non-Sequential)

Scenario: Execute "0;JMP" (unconditional jump)

Initial state:
├─ PC = 0x0100
├─ A = 0x0200 (jump target)
└─ Instruction says: always jump

Step-by-step:
─────────────

t=0ns: Fetch JMP instruction
├─ ROM[0x0100] contains jump instruction
├─ Control unit decodes it
└─ Determines: JUMP=1

t=50ns: Control signals asserted
├─ JUMP=1 (jump taken)
├─ PC_INC=0 (don't increment)
├─ /LOAD=0 (load mode)
└─ PC_CLR=1 (not clearing)

t=100ns: Clock rises
├─ /LOAD is LOW → parallel load enabled
├─ A[14:0] → counter inputs
├─ All flip-flops load: PC ← A
└─ PC becomes 0x0200

t=110ns: Jump complete
├─ PC = 0x0200
├─ Now pointing at jump target
└─ Next instruction fetched from 0x0200

Timeline:
────────
CLK:    ────╗     ╔════╗     ╔════
            ╚═════╝    ╚═════╝
              Jump loads here

JUMP:   ════════════════════════  (HIGH during jump)

/LOAD:  ───────╗    ╔══════════  (LOW to load)
               ╚════╝

A[14:0]: ───── 0x0200 ──────────  (jump target)

PC_Out: ──── 0x0100 ────── 0x0200
                 Jumped! (not incremented)

Example 3: Conditional Jump Not Taken

Scenario: "D;JGT" (jump if D > 0), but D=0

Initial state:
├─ PC = 0x0100
├─ A = 0x0200
├─ D = 0
└─ Instruction: jump if D > 0

Step-by-step:
─────────────

t=0ns: Check condition
├─ ALU outputs D = 0 and sets its flags
├─ zr = 1 (output is zero), ng = 0 (not negative)
├─ JGT needs zr = 0 AND ng = 0 → condition FALSE
└─ Jump NOT taken

t=50ns: Control signals
├─ JUMP=0 (don't jump)
├─ PC_INC=1 (increment instead)
├─ /LOAD=1 (don't load)
└─ Normal increment mode

t=100ns: Clock rises
├─ Counters increment
├─ PC = PC + 1
└─ PC becomes 0x0101

Result:
├─ PC = 0x0101 (incremented)
├─ Jump was ignored
└─ Execution continues sequentially

This shows conditional control flow!
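The jump decision uses only the three jump bits of the instruction and the ALU's zr and ng flags. A sketch following the Nand2Tetris Hack specification, where j1 means "jump if out < 0", j2 "if out = 0" and j3 "if out > 0" (jump_taken is an illustrative name):

```python
def jump_taken(jump_bits: str, out: int) -> bool:
    """Decide a Hack jump from the 3 jump bits (j1 j2 j3) and the ALU output."""
    out &= 0xFFFF
    ng = out >= 0x8000  # ALU 'ng' flag: sign bit set
    zr = out == 0       # ALU 'zr' flag: output is zero
    j1, j2, j3 = (bit == "1" for bit in jump_bits)
    return (j1 and ng) or (j2 and zr) or (j3 and not ng and not zr)


JGT, JMP = "001", "111"
print(jump_taken(JGT, 0))  # False: D = 0, Example 3's jump is not taken
print(jump_taken(JGT, 5))  # True:  D = 5 would jump
print(jump_taken(JMP, 0))  # True:  0;JMP always jumps
```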

Example 4: System Reset

Scenario: Reset button pressed

Current state:
├─ PC = 0x1234 (arbitrary value)
├─ Some program is running
└─ Need to restart from beginning

Step-by-step:
─────────────

t=0ns: Reset button pressed
├─ RESET signal goes LOW
├─ PC_CLR=0 (active)
└─ Immediately affects counters

t=5ns: Counters clear
├─ /CLR pin goes LOW on all 74HC161s
├─ Asynchronous clear (doesn't wait for clock!)
├─ All flip-flops reset to 0
└─ PC becomes 0x0000

t=10ns: Reset released
├─ RESET signal goes HIGH
├─ PC_CLR=1 (inactive)
└─ System ready

t=100ns: First instruction fetch
├─ PC = 0x0000
├─ Fetches ROM[0]
└─ Program starts from beginning

Timeline:
────────
RESET:  ────╗         ╔════════  
            ╚═════════╝          
             ◄──10ns──►          

PC_Out: ──── 0x1234 ─── 0x0000 ──
                  Cleared instantly
                  (no clock needed!)

This is why /CLR is "asynchronous"

Why 74HC161 Instead of 74HC574?

Could we build PC with 74HC574 + adder?

Hypothetical PC with 74HC574:

PC_Out → [+1 Adder] → [74HC574] → PC_Out
                         CLK

Problems:
├─ Need separate 16-bit adder (4× 74HC283)
├─ Need multiplexer to select increment vs load
├─ More ICs (8+ instead of 4)
├─ More complex control logic
└─ Slower (more gate delays)

74HC161 advantages:
├─ Built-in counter (no external adder)
├─ Built-in load (no external mux)
├─ Fewer ICs (4 instead of 8+)
├─ Simpler control (just 3 signals)
└─ Faster (optimized internally)

The 74HC161 is PERFECT for PC!

Register Timing and Synchronization

The Clock Signal (Master Timekeeper)

All registers operate from the same clock:

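        555 Timer (Clock Generator)
                     │
                     ▼
             [Buffer: 74HC04]
                     │
         ┌───────────┼───────────┐
         │           │           │
         ▼           ▼           ▼
     gated by    gated by     PC_CLK
      LOAD_A      LOAD_D
         │           │           │
         ▼           ▼           ▼
    A Register  D Register  PC Counter

All synchronized to same clock! The ALU is combinational, so it takes no clock input; its output simply settles between edges.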

Why synchronization matters:

Bad (no clock):
───────────────
Data changes randomly → chaos!
Register captures wrong data
ALU sees partial results
Program jumps to random addresses

Good (with clock):
──────────────────
Everything changes at same instant
Data stable before capture
Predictable behavior
Deterministic operation

Critical Timing Paths

Path 1: ALU → Register → ALU (feedback loop)
────────────────────────────────────────────

Longest path through system:

  t=0ns:   Register outputs → ALU inputs
  t=10ns:  ALU preprocessing complete
  t=50ns:  ALU carry chain settles
  t=100ns: ALU output stable
  t=112ns: Register setup time (12ns) met
           → earliest safe CLOCK EDGE

Minimum clock period: ~112ns
Maximum frequency: ~9 MHz (1 / 112ns ≈ 8.9 MHz)

This budget starts from register outputs that are already valid; adding the register's ~20ns clock-to-Q delay gives ~132ns, or about 7.6 MHz.
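The arithmetic behind the maximum frequency is 1 / (sum of the delays along the path). A small Python helper (hypothetical name max_clock_mhz) makes it easy to redo the budget, for example once the register's clock-to-Q delay is included:

```python
def max_clock_mhz(*delays_ns: float) -> float:
    """Maximum clock frequency for a path whose delays (in ns) add up serially."""
    period_ns = sum(delays_ns)
    return 1000.0 / period_ns  # 1 / period, with 1/ns converted to MHz


# ALU path: 100ns until the ALU output is stable, plus the 74HC574's 12ns setup.
print(round(max_clock_mhz(100, 12), 1))      # 8.9
# Same path, also counting ~20ns clock-to-Q at the start.
print(round(max_clock_mhz(20, 100, 12), 1))  # 7.6
```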


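Path 2: Memory → Register
──────────────────────────
ROM[PC] → Instruction Register → Control signals

  t=0ns:   PC outputs address
  t=50ns:  ROM access time
  t=80ns:  Data valid
  t=100ns: Setup time met
  t=110ns: CLOCK EDGE

Memory is often slower than logic. In this budget the ROM path fits, but a slower ROM (many parallel EEPROMs are rated at 150ns or more) would make this the path that limits clock speed.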


Path 3: Register → Memory
──────────────────────────
A Register → RAM address → Data read

  t=0ns:  A outputs address
  t=10ns: Address decode
  t=60ns: RAM access time
  t=70ns: Data valid at output

Completes in one cycle as long as the clock period covers these ~70ns plus the setup time of whatever captures the data.

Setup and Hold Times (Critical!)

Every register has timing requirements:

         ├──── Setup Time ────┼── Hold Time ──┤
Data: ═══╪════════════════════╪═══════════════╪═══
         │   must be stable   │  must not     │
         │   before the edge  │  change yet   │
                              ╔═════════════════
CLK:  ════════════════════════╝
                      Rising clock edge

74HC574 Timing (typical):
─────────────────────────
Setup time:    12ns
Hold time:     3ns
Clock-to-Q:    20ns

74HC161 Timing (typical):
─────────────────────────
Setup time:    15ns
Hold time:     5ns
Clock-to-Q:    25ns

If violated → data corruption!

Real-world example of timing violation:

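Bad Design:
───────────
Data:    ──── old value ─────┼── new value ──────
                             │◄─ 5ns ─►│
                                       ╔═════════
CLK:     ══════════════════════════════╝
                                  Rising edge

Result: Data changed only 5ns before the edge, less than the 12ns setup time. The flip-flop may capture the old value, the new value, or go metastable: effectively a random 1 or 0.


Good Design:
────────────
Data:    ─── old ────┼──────── new value ────────┼── next ──
                     │◄──── 20ns ─────►│◄─ 5ns ─►│
                                       ╔═════════
CLK:     ══════════════════════════════╝
                                  Rising edge

Result: Clean capture! (20ns ≥ 12ns setup, 5ns ≥ 3ns hold)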
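Both diagrams illustrate one rule: no data transition may fall inside the window from (edge - setup) to (edge + hold). A minimal Python check, with the typical 74HC574 figures quoted above as defaults (capture_is_safe is a hypothetical helper):

```python
def capture_is_safe(data_change_ns: list, edge_ns: float,
                    setup_ns: float = 12.0, hold_ns: float = 3.0) -> bool:
    """True if no data transition falls inside the setup/hold window of an edge."""
    window_start = edge_ns - setup_ns
    window_end = edge_ns + hold_ns
    return all(not (window_start < t < window_end) for t in data_change_ns)


print(capture_is_safe([95.0], edge_ns=100.0))         # False: changed 5ns before
print(capture_is_safe([80.0, 105.0], edge_ns=100.0))  # True: 20ns before, 5ns after
```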

The Complete Register System

How They Work Together

Instruction Execution Cycle:

Step 1: FETCH
─────────────
PC → ROM address
ROM → Instruction
Instruction → Control Unit

Step 2: DECODE
──────────────
Control Unit → Signals
├─ ALU control
├─ Register control
└─ Memory control

Step 3: EXECUTE
───────────────
A, D → ALU
ALU → Result
Result → Register (if needed)

Step 4: UPDATE PC
─────────────────
If jump: PC ← A
Else: PC ← PC + 1

All synchronized by clock!

Example: Complete Instruction

Let's trace D = M + 1 through the hardware:

Assembly: @100     // First, load address
          D=M+1    // Then, this instruction

Focus on: D=M+1

Initial State:
├─ PC = 0x0050 (pointing to this instruction)
├─ A = 0x0064 (100 in hex)
├─ D = 0x0000 (current D value)
└─ RAM[100] = 0x0042 (66 decimal)


Clock Cycle 1: Fetch & Decode
──────────────────────────────
t=0ns: PC outputs 0x0050
       ROM address = 0x0050

t=50ns: ROM outputs instruction
        Instruction = 1111110111010000
        (binary for "D=M+1")

t=60ns: Control unit decodes
        ├─ ALU_CTRL = 110111 (y + 1, i.e. M+1; 011111 would be D+1)
        ├─ A_OR_M = 1 (use M, not A)
        ├─ LOAD_D = 1 (will store to D)
        └─ JUMP = 0 (no jump)


Clock Cycle 2: Execute
──────────────────────
t=0ns: A register outputs address
       A_Out = 0x0064 (100)
       RAM address = 100

t=50ns: RAM outputs data
        RAM[100] = 0x0042 (66)
        M data → ALU Y input

t=60ns: ALU computes M+1
        Y = 0x0042
        Control = increment Y
        Result = 0x0043 (67)

t=100ns: ALU output stable
         ALU_Out = 0x0043

t=110ns: Clock edge!
         LOAD_D pulses
         D ← 0x0043

t=130ns: D register updated (after ~20ns clock-to-Q)
         D_Out = 0x0043

Clock Cycle 3: Update PC
────────────────────────
t=0ns: JUMP=0, so increment

t=110ns: Clock edge
         PC increments
         PC = 0x0051

Result:
├─ D = 0x0043 (67 = 66 + 1)
├─ PC = 0x0051 (next instruction)
└─ Took 3 clock cycles total
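The instruction word in this trace can be taken apart by hand. Following the Nand2Tetris Hack C-instruction layout (111 a cccccc ddd jjj), this sketch splits it into fields and runs the comp bits through a model of the Hack ALU (control bits zx nx zy ny f no), with D as the x input and M as the y input; the function names are illustrative:

```python
MASK = 0xFFFF


def decode_c_instruction(word: str) -> dict:
    """Split a 16-bit Hack C-instruction (111 a cccccc ddd jjj) into fields."""
    if len(word) != 16 or not word.startswith("111"):
        raise ValueError("not a C-instruction")
    return {
        "a": word[3],         # 0: ALU y input is A, 1: y input is M
        "comp": word[4:10],   # zx nx zy ny f no
        "dest": word[10:13],  # d1 d2 d3 -> A D M
        "jump": word[13:16],  # j1 j2 j3 -> <0 =0 >0
    }


def hack_alu(x: int, y: int, ctrl: str) -> int:
    """Hack ALU on 16-bit values; ctrl is the 6 comp bits zx nx zy ny f no."""
    zx, nx, zy, ny, f, no = (bit == "1" for bit in ctrl)
    if zx:
        x = 0
    if nx:
        x = ~x & MASK
    if zy:
        y = 0
    if ny:
        y = ~y & MASK
    out = (x + y) & MASK if f else x & y
    if no:
        out = ~out & MASK
    return out


fields = decode_c_instruction("1111110111010000")
print(fields)  # {'a': '1', 'comp': '110111', 'dest': '010', 'jump': '000'}
print(hex(hack_alu(x=0x0000, y=0x0042, ctrl=fields["comp"])))  # 0x43
```

With comp = 110111 the ALU produces y + 1, so M = 0x42 becomes 0x43; feeding the same operands through 011111 (D+1) would give 1 instead.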

Register Interaction Patterns

Pattern 1: Register to Register
────────────────────────────────
D = A

Flow:
A_Out → Data_Bus → D_In
LOAD_D pulse
Done in 1 cycle!


Pattern 2: ALU Feedback
───────────────────────
D = D + 1

Flow:
D_Out → ALU x input
ALU computes D + 1
ALU_Out → D_In
LOAD_D pulse
D updated in 1 cycle


Pattern 3: Memory Access
────────────────────────
D = M

Flow:
A_Out → RAM address
RAM → Data
Data → D_In
LOAD_D pulse
Need 2-3 cycles (memory slower)


Pattern 4: Complex Calculation
───────────────────────────────
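D = (A + D) & M

Flow (not a single Hack instruction; it takes two):
Instruction 1: D=D+A
  Step 1: Compute A + D
  Step 2: Load the sum into D
Instruction 2: D=D&M   (M is RAM[A], so A must still hold the address)
  Step 3: Compute D & M
  Step 4: Load the result into D

Multi-instruction operation!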

Design Decisions Explained

Why These Specific ICs?

74HC574 for A and D:

Pros:
Simple edge-triggered operation
Separate input/output pins
3-state capability (future expansion)
Very common and cheap
Well understood

Cons:
No built-in increment (but don't need it)
Requires external adder for arithmetic

Verdict: Perfect for simple storage registers
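The storage behavior described here (capture on the rising edge, hold otherwise, outputs switchable to high impedance) can be sketched as a behavioral model. This assumes a 16-bit register built from two 74HC574s whose clock pins share one LOAD line and whose /OE pins are grounded. The class names `HC574` and `Reg16` are hypothetical, and propagation delays and electrical effects are not modeled.

```python
class HC574:
    """Behavioral model of one 74HC574: eight D flip-flops, 3-state outputs."""

    def __init__(self):
        self._q = 0
        self._clk = 0

    def clock(self, level, d):
        # Capture D only on a LOW->HIGH transition of the clock pin.
        if level and not self._clk:
            self._q = d & 0xFF
        self._clk = level

    def output(self, oe_n):
        # /OE HIGH puts the outputs in high impedance, modeled as None.
        return None if oe_n else self._q


class Reg16:
    """Two 74HC574s whose clock pins share one LOAD line (low and high byte)."""

    def __init__(self):
        self.lo, self.hi = HC574(), HC574()

    def pulse_load(self, data):
        for level in (1, 0):  # one LOW->HIGH->LOW pulse on LOAD
            self.lo.clock(level, data)
            self.hi.clock(level, data >> 8)

    def read(self):
        # /OE tied LOW, so both bytes always drive their outputs.
        return (self.hi.output(0) << 8) | self.lo.output(0)


a = Reg16()
a.pulse_load(0xAAAA)
print(hex(a.read()))  # 0xaaaa
```

Changing the D inputs without another rising edge leaves the stored value untouched, which is exactly the "remember until told to change" behavior.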

74HC161 for PC:

Pros:
Built-in counter (auto-increment)
Synchronous count and load (all bits change on one clock edge)
Parallel load capability (for jumps)
Clear function (for reset)
Cascadable via RCO and ENT (four ICs form 16 bits)

Cons:
No 3-state outputs (awkward as a general bus register)
More complex than 74HC574

Verdict: Perfect for program counter
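A behavioral sketch shows how four 4-bit counters become one 16-bit PC. It assumes the usual synchronous cascade: every stage shares the clock and /LOAD, ENP on every stage is driven by PC_INC, the first stage's ENT is tied HIGH, and each RCO feeds the next stage's ENT. The class names `HC161` and `PC16` are hypothetical, and the model ignores propagation delay.

```python
class HC161:
    """Behavioral model of one 74HC161 4-bit synchronous binary counter."""

    def __init__(self):
        self.q = 0

    def rco(self, ent):
        # Ripple carry out: HIGH when ENT is HIGH and the count is 15.
        return int(bool(ent) and self.q == 0xF)

    def clear(self):
        # /CLR LOW clears immediately, without waiting for a clock edge.
        self.q = 0

    def next_state(self, load_n, enp, ent, d):
        if not load_n:          # /LOAD LOW: load D on the clock edge
            return d & 0xF
        if enp and ent:         # count only when both enables are HIGH
            return (self.q + 1) & 0xF
        return self.q           # otherwise hold


class PC16:
    """Four HC161 stages cascaded RCO -> ENT into a 16-bit program counter."""

    def __init__(self):
        self.stages = [HC161() for _ in range(4)]  # stage 0 holds bits 0-3

    @property
    def value(self):
        return sum(s.q << (4 * i) for i, s in enumerate(self.stages))

    def clear(self):
        for s in self.stages:
            s.clear()

    def clock_edge(self, inc=1, load_n=1, d=0):
        # All stages see the same edge, so compute every next state from the
        # current outputs first, then update them together.
        ent = 1  # first stage's ENT tied HIGH (assumption)
        nxt = []
        for i, s in enumerate(self.stages):
            nxt.append(s.next_state(load_n, inc, ent, d >> (4 * i)))
            ent = s.rco(ent)  # this stage's RCO drives the next ENT
        for s, q in zip(self.stages, nxt):
            s.q = q


pc = PC16()
pc.clock_edge(load_n=0, d=0x000F)  # jump to 0x000F
pc.clock_edge()                    # one increment
print(hex(pc.value))               # 0x10
```

Computing all next states before updating any of them mirrors why the cascade is glitch-free at the outputs: each stage decides whether to count from the carries present just before the shared edge.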

Alternative Designs Considered

Option 1: Use SRAM for all registers
─────────────────────────────────────
Pros: One chip could hold all registers
Cons: Slower, needs addressing logic, and a single-port SRAM reads only one register at a time (the ALU needs D and A/M together)
Verdict: Overkill for 3 registers


Option 2: Use shift registers
─────────────────────────────
Pros: Serial data loading
Cons: Too slow (16 clocks to load)
Verdict: Wrong tool for the job


Option 3: Use up/down counters (74HC193)
────────────────────────────────────────
Pros: Up/down counting
Cons: Don't need down counting
Verdict: Unnecessary complexity


Option 4: Use 74HC377 (with enable)
───────────────────────────────────
Pros: Clock-enable input (/E), so registers could share one free-running clock
Cons: No 3-state outputs, unlike the 74HC574's /OE
Verdict: No advantage for this design


Final Choice: 74HC574 + 74HC161
────────────────────────────────
├─ Minimum IC count
├─ Standard, available parts
├─ Well-documented behavior
├─ Easy to understand
└─ Perfect match for requirements

Power Consumption

Per IC power consumption (typical):

74HC574 (idle):    ~10µA
74HC574 (active):  ~5mA at 1MHz

74HC161 (idle):    ~10µA
74HC161 (active):  ~6mA at 1MHz

Register module total:
├─ 4× 74HC574 = 20mA
├─ 4× 74HC161 = 24mA
└─ Total: ~44mA at 1MHz

Very low power!
Can run from batteries
Much better than old 74LS series
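The module total is simple arithmetic over the per-IC figures above. This sketch reuses those 1 MHz numbers and assumes that dynamic CMOS current scales roughly linearly with clock frequency; the names are hypothetical, and real figures depend on supply voltage and output loading, so treat the result as an estimate only.

```python
# Per-IC active current at 1 MHz, from the figures above (mA).
ACTIVE_MA_AT_1MHZ = {"74HC574": 5.0, "74HC161": 6.0}
COUNTS = {"74HC574": 4, "74HC161": 4}  # 2 each for A and D, 4 for the PC


def module_current_ma(freq_mhz):
    # Dynamic CMOS current is roughly proportional to switching frequency,
    # so scale the 1 MHz figures linearly. Quiescent current (~10 µA per IC)
    # is orders of magnitude smaller and is ignored here.
    return sum(n * ACTIVE_MA_AT_1MHZ[part] * freq_mhz for part, n in COUNTS.items())


print(module_current_ma(1.0))  # 44.0
```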

Testing Registers

Testing A and D Registers

TEST 1: Basic Storage
─────────────────────
1. Apply data: 0xAAAA to Data_In
2. Pulse LOAD_A
3. Check: A_Out should be 0xAAAA
4. Change Data_In to 0x5555
5. Don't pulse LOAD_A
6. Check: A_Out still 0xAAAA (retained)

Pass criteria: Data retained until next load


TEST 2: All Bits
────────────────
Test patterns:
├─ 0x0000 (all zeros)
├─ 0xFFFF (all ones)
├─ 0xAAAA (alternating: 1010...)
├─ 0x5555 (alternating: 0101...)
├─ Walking 1s: 0x0001, 0x0002, 0x0004...
└─ Walking 0s: 0xFFFE, 0xFFFD, 0xFFFB...

Verifies: All flip-flops work


TEST 3: Timing
──────────────
1. Set Data_In = 0x1234
2. Pulse LOAD_A (20ns wide)
3. Measure Clock-to-Q delay
4. Compare with the datasheet's maximum clock-to-Q delay at your supply voltage

Verifies: IC speed within spec


TEST 4: Multiple Registers
───────────────────────────
1. Load A = 0xABCD
2. Load D = 0x1234
3. Verify both retained independently
4. No crosstalk between registers

Verifies: Isolation between registers
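The TEST 2 pattern list and the pass/fail check are easy to automate. In this sketch, `write` and `read` are hypothetical hooks standing in for whatever drives the board (a microcontroller, a logic analyzer script, or a software model); the example uses a simulated register with a stuck bit to show how a failure is reported.

```python
def storage_patterns(width=16):
    """Yield the TEST 2 patterns: all 0s, all 1s, alternating, walking 1s and 0s."""
    mask = (1 << width) - 1
    yield 0
    yield mask
    yield 0xAAAA & mask
    yield 0x5555 & mask
    for bit in range(width):
        yield 1 << bit           # walking 1
    for bit in range(width):
        yield mask ^ (1 << bit)  # walking 0


def failing_bits(written, read_back, width=16):
    """Bit positions where the value read back differs from the value written."""
    diff = written ^ read_back
    return [bit for bit in range(width) if (diff >> bit) & 1]


def run_storage_test(write, read):
    """write(value) loads the register; read() returns its 16-bit output."""
    failures = {}
    for pattern in storage_patterns():
        write(pattern)
        bad = failing_bits(pattern, read())
        if bad:
            failures[pattern] = bad
    return failures


# Example: a simulated register whose bit 3 is stuck at 0.
state = {"q": 0}


def write_stuck(value):
    state["q"] = value & ~0x0008


def read_reg():
    return state["q"]


fails = run_storage_test(write_stuck, read_reg)
print(sorted({b for bits in fails.values() for b in bits}))  # [3]
```

The walking patterns are what make a single bad flip-flop stand out: every pattern that fails points at the same bit position.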

Testing Program Counter

TEST 1: Increment
─────────────────
1. Clear PC (PC = 0)
2. Set PC_INC = HIGH
3. Apply 10 clock pulses
4. Check: PC should be 10

Pass: Counts correctly


TEST 2: Parallel Load
─────────────────────
1. Set A = 0x1234
2. Assert /LOAD (LOW)
3. Clock pulse
4. Check: PC = 0x1234

Pass: Load function works


TEST 3: Carry Chain
───────────────────
1. Set PC = 0x000F (15)
2. Increment once
3. Check: PC = 0x0010 (16)
   (Bits 0-3 clear and bit 4 sets: the carry crosses from the first 74HC161 to the second)

Pass: Carry propagates correctly


TEST 4: Maximum Count
─────────────────────
1. Set PC = 0xFFFF (65535)
2. Increment once
3. Check: PC = 0x0000 (wraps; four 74HC161s form a 16-bit counter)

Pass: Overflow handling correct


TEST 5: Reset
─────────────
1. Set PC = random value
2. Assert /CLR
3. Check: PC = 0x0000 immediately
   (no clock needed)

Pass: Asynchronous clear works
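Expected results for the PC tests can come from a word-level reference model, which a bench test then compares against the real counter. This is a sketch assuming the 16-bit, four-74HC161 PC and the chip's priority order (clear overrides load, load overrides counting); `next_pc` and `VECTORS` are hypothetical names.

```python
MASK16 = 0xFFFF


def next_pc(pc, *, clr=False, load=False, inc=True, d=0):
    """Expected PC after one step: clear overrides load, load overrides count."""
    if clr:
        return 0
    if load:
        return d & MASK16
    if inc:
        return (pc + 1) & MASK16
    return pc


# (test name, starting PC, control inputs, expected PC)
VECTORS = [
    ("parallel load", 0x0000, {"load": True, "d": 0x1234}, 0x1234),
    ("carry chain", 0x000F, {}, 0x0010),
    ("maximum count", 0xFFFF, {}, 0x0000),
    ("reset", 0x5A5A, {"clr": True}, 0x0000),
]

pc = 0
for _ in range(10):  # TEST 1: ten increments from a cleared PC
    pc = next_pc(pc)
print(f"{'increment':14s} {'PASS' if pc == 10 else 'FAIL'}")

for name, start, ctrl, expected in VECTORS:
    got = next_pc(start, **ctrl)
    print(f"{name:14s} {'PASS' if got == expected else 'FAIL'}  {got:#06x}")
```

Note that with a 16-bit counter, 0x7FFF increments to 0x8000; the count only wraps to zero from 0xFFFF.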

Common Problems and Solutions

Problem 1: Register always outputs 0
────────────────────────────────────
Likely causes:
├─ No power to IC
├─ /OE pin not grounded (hi-Z mode)
├─ Clock not reaching IC
└─ Damaged IC

Tests:
├─ Check VCC at pin 16/20
├─ Check GND at pin 8/10
├─ Verify /OE = 0V
└─ Replace IC if needed


Problem 2: Register doesn't update
───────────────────────────────────
Likely causes:
├─ No clock signal
├─ Clock pulse too short
├─ Setup time violated
└─ Hold time violated

Tests:
├─ Probe CLK pin with scope
├─ Verify pulse width > 15ns
├─ Check data stable before clock
└─ Slow down clock if needed


Problem 3: Random values stored
────────────────────────────────
Likely causes:
├─ Missing decoupling cap
├─ Setup/hold time violation
├─ Noise on data bus
└─ Clock edge too fast

Solutions:
├─ Add 0.1µF cap next to IC
├─ Add buffer if loading heavy
├─ Slow down clock transitions
└─ Check PCB routing


Problem 4: PC counts wrong
──────────────────────────
Likely causes:
├─ Carry chain broken
├─ ENP/ENT not connected
├─ RCO not reaching next stage
└─ One IC not counting

Tests:
├─ Check each counter individually
├─ Verify RCO connections
├─ Probe each stage with scope
└─ Replace bad counter IC
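A captured count sequence can point straight at the faulty stage: compare each sample with a correct 16-bit count and see which 4-bit nibble (and so which 74HC161) disagrees. This sketch assumes the samples were taken on consecutive increments with no jumps; `faulty_stages` is a hypothetical helper.

```python
def faulty_stages(start, observed):
    """Map each 74HC161 stage (0 = bits 0-3) to the sample indices where its
    nibble differed from a correct 16-bit count beginning at `start`."""
    bad = {}
    for step, value in enumerate(observed):
        expected = (start + step) & 0xFFFF
        diff = expected ^ value
        for stage in range(4):
            if (diff >> (4 * stage)) & 0xF:
                bad.setdefault(stage, []).append(step)
    return bad


# Example: the second counter never advances, as when the first stage's
# RCO does not reach the second stage's ENT.
capture = [0x000E, 0x000F, 0x0000, 0x0001]
print(faulty_stages(0x000E, capture))  # {1: [2, 3]}
```

If the first mismatch always appears right after a nibble rolls over from F to 0, suspect the RCO connection into that stage rather than the IC itself.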


Problem 5: PC doesn't jump
──────────────────────────
Likely causes:
├─ /LOAD not reaching ICs
├─ Parallel load data wrong
├─ /LOAD not LOW at the clock edge (load is synchronous)
└─ Control logic error

Tests:
├─ Probe /LOAD signal
├─ Check A register value
├─ Verify /LOAD meets setup time before the clock edge
└─ Check control circuit

Understanding Through Analogy

The Register as a Mailbox

A and D Registers = Post Office Boxes
──────────────────────────────────────
├─ Box #A and Box #D
├─ You can PUT mail in (write)
├─ You can GET mail out (read)
├─ Mail STAYS until replaced
└─ Need key to change (clock signal)

Program Counter = Street Address
─────────────────────────────────
├─ Walking down street (incrementing)
├─ Each building is an instruction (in ROM)
├─ Sometimes skip buildings (jump)
├─ Automated walk (counter)
└─ Can teleport (parallel load)


The Clock = Postal Worker's Route
──────────────────────────────────
├─ Comes at same time each day
├─ Everything synchronized to schedule
├─ Reliable, predictable timing
└─ System works because of routine

The Complete Picture

         ┌──────────────────────────┐
         │    Instruction Memory    │
         │          (ROM)           │
         └──────────┬───────────────┘
                    │ Instruction
                    ▼
         ┌──────────────────────────┐
         │       Control Unit       │
         │  (Decodes instruction)   │
         └─────┬────────────────────┘
    ┌──────────┼──────────┬─────────┐
    │          │          │         │
    ▼          ▼          ▼         ▼
┌────────┐ ┌────────┐ ┌───────┐ ┌──────┐
│   A    │ │   D    │ │  PC   │ │ ALU  │
│  Reg   │ │  Reg   │ │Counter│ │      │
└────┬───┘ └────┬───┘ └───┬───┘ └──┬───┘
     │          │         │        │
     └──────────┴─────────┴────────┘
           Data flows between all

Summary: The Elegant Register Design

Key Insights:

  1. Registers are memory - but fast, local, and simple

  2. Two types needed:

    • Storage (A, D) - just remember
    • Counter (PC) - remember AND increment
  3. Perfect IC match:

    • 74HC574: optimal for storage
    • 74HC161: optimal for counting
  4. Synchronization critical:

    • All on same clock
    • Predictable timing
    • No race conditions
  5. Simple yet complete:

    • Only 3 registers
    • Enough for full computer
    • Turing-complete (given enough memory)!

The register design shows that simplicity works. You don't need dozens of registers like modern CPUs. Three carefully designed registers, with the right hardware support, are enough to build a working computer!

