
2025-11-21 ~ 31 min read

The Hack Registers Explained: A Deep Dive into Memory and Control

Let me explain registers from the ground up, building a complete understanding of how the Hack computer stores and manages data.


What is a Register? (The Foundation)

The Basic Concept

Think of a register as a sticky note for the computer:

  • It remembers a number
  • It holds that number until you tell it to change
  • It can give you back that number instantly
  • It’s extremely fast (nanoseconds)

Analogy: Your Calculator's "Memory" Button

When you press [MS] (Memory Store) on a calculator:
├─ Calculator STORES current number
├─ Number STAYS there (even if you clear the screen)
├─ Press [MR] to RECALL that number
└─ Press [MS] again to REPLACE it

A register does exactly this - but in hardware!

Why Do We Need Registers?

The Problem:

Imagine trying to compute: (5 + 3) × 2

Without registers:
1. Add 5 + 3 = 8
2. Uhh... where did the 8 go?
3. Can't multiply it - we lost it!

With registers:
1. Add 5 + 3 = 8
2. STORE 8 in register
3. Load 8 from register
4. Multiply 8 × 2 = 16 ✓

Registers are the working memory of the CPU. Without them, you can only do one operation and then lose the result!

The Speed Hierarchy

Speed vs capacity tradeoff (typical figures for a modern PC):

Registers    │ ████████████ Fastest (< 1ns)    │ Tiny (a few numbers)
             │                                  │
Cache        │ ██████████ Very Fast (1-10ns)   │ Small (KB-MB)
             │                                  │
RAM          │ ████ Fast (50-100ns)            │ Medium (GB)
             │                                  │
SSD          │ ██ Slow (microseconds)          │ Large (TB)
             │                                  │
Hard Drive   │ █ Very Slow (milliseconds)      │ Huge (TB)

Registers are at the TOP of the pyramid!

Why registers are fast:

  • Right next to the ALU (no distance to travel)
  • Direct electrical connection
  • No addressing needed
  • Always powered and ready

The Three Registers in Hack Computer

The Hack computer has exactly three registers:

Overview

┌────────────────────────────────────────────────────┐
│                                                    │
│  A Register (16-bit)                               │
│  ├─ Address register                               │
│  ├─ Also used for data                             │
│  └─ Dual purpose!                                  │
│                                                    │
│  D Register (16-bit)                               │
│  ├─ Data register                                  │
│  ├─ Only stores data                               │
│  └─ ALU input                                      │
│                                                    │
│  PC - Program Counter (15-bit)                     │
│  ├─ Points to next instruction                     │
│  ├─ Auto-increments                                │
│  └─ Controls program flow                          │
│                                                    │
└────────────────────────────────────────────────────┘

Why Only Three Registers?

Design philosophy of Hack computer:

  • Simplicity over speed
  • Easier to understand with fewer registers
  • Matches the Nand2Tetris educational goals
  • Still Turing-complete in principle (with enough memory it can compute anything a bigger CPU can, only more slowly)

Comparison:

Architecture    Registers    Philosophy
────────────────────────────────────────────────
Hack            3            Educational simplicity
8080            7            Early microprocessor
x86-64          16           Performance
ARM64           31           RISC efficiency
GPU             1000s        Massive parallelism

Now let’s dive deep into each register!


The A Register (Address/Data Register)

The Dual Purpose Design

The A register is special - it has two jobs:

Job 1: Address Register
├─ Holds memory addresses
├─ Points to RAM locations
├─ Used for: @commands in Hack assembly
└─ Example: @100 means "A = 100"

Job 2: Data Register
├─ Holds regular data values
├─ Can be used in ALU operations
├─ Alternative to D register
└─ Example: A = A + 1

Why dual purpose?

  • Saves a register!
  • Many instructions need an address anyway
  • Flexible programming model

Real-World Use Examples

Example 1: Loading a constant
──────────────────────────────
Assembly:  @17
           D=A

What happens:
1. A Register ← 17 (constant loaded)
2. D Register ← A (copy to D)

Result: D = 17


Example 2: Accessing memory
───────────────────────────
Assembly:  @100
           M=D

What happens:
1. A Register ← 100 (address)
2. RAM[100] ← D (store D at address 100)

Result: RAM[100] now contains value from D


Example 3: Jump to address
──────────────────────────
Assembly:  @LOOP
           0;JMP

What happens:
1. A Register ← address of LOOP label
2. PC ← A (program counter jumps to A)

Result: Program continues from LOOP

Hardware Implementation: 74HC574

The A register uses the 74HC574 integrated circuit.

What is a 74HC574?

74HC574: Octal D-Type Flip-Flop with 3-State Output

"Octal" = 8 bits
"D-Type Flip-Flop" = basic memory element
"3-State Output" = can be enabled/disabled

One IC handles 8 bits, so we need two 74HC574s for 16 bits:

  • U1: Handles bits [7:0] (low byte)
  • U2: Handles bits [15:8] (high byte)

Inside the 74HC574

Let me explain what’s inside this chip:

Simplified view of ONE bit (multiply by 8):

         Data In
            │
            ▼
       ┌────────┐
       │   D    │
  CLK─►│   Q  ──┼──► Data Out
       │        │
       └────────┘
     D Flip-Flop
     (The memory!)

When CLK goes LOW → HIGH (rising edge):
├─ Whatever is on D gets CAPTURED
├─ Stored in the flip-flop
├─ Appears on Q
└─ STAYS there until the next rising edge

The data is "latched" - it remembers!

The D Flip-Flop: How Bits Remember

This is fundamental - let me explain carefully:

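The Master-Slave D Flip-Flop

        Data In (D)
             │
             ▼
      ┌──────────┐        ┌──────────┐
      │  MASTER  │        │  SLAVE   │
      │  LATCH   │───────►│  LATCH   │──► Q (Output)
      └──────────┘        └──────────┘
            ▲                   ▲
            │                   │
       inverted CLK            CLK
     (open while LOW)    (open while HIGH)

Operation (positive-edge-triggered, as in the 74HC574):
1. CLK LOW: master is open and follows D; slave holds the old Q
2. Rising edge: master closes, freezing the value D had at that instant
3. CLK HIGH: slave opens and passes the frozen value to Q; changes on D can no longer reach Q
4. Q stays there UNTIL the next rising edge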

This is how computers REMEMBER!
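To see why two latches in series react only to the rising edge, here is a minimal behavioral sketch in Python. It is a discrete-time model (one call per sampled instant, no propagation delays), and the class names `DLatch` and `MasterSlaveDFF` are illustrative, not taken from any library.

```python
class DLatch:
    """Level-sensitive D latch: follows D while enabled, holds while disabled."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, enable: int) -> int:
        if enable:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """Positive-edge-triggered D flip-flop built from two D latches.

    Master is open while CLK is LOW, slave while CLK is HIGH, so Q can
    only change as CLK goes LOW -> HIGH.
    """

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def step(self, d: int, clk: int) -> int:
        # Master first: when CLK is HIGH it is closed and keeps the last
        # value it saw while CLK was LOW.
        m = self.master.update(d, enable=1 - clk)
        # The slave copies the master only while CLK is HIGH.
        return self.slave.update(m, enable=clk)


ff = MasterSlaveDFF()
samples = [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0), (0, 0), (0, 1)]  # (D, CLK)
trace = [ff.step(d, clk) for d, clk in samples]
print(trace)  # [0, 1, 1, 1, 1, 1, 0]
```

The brief 1 on D in the fifth sample never reaches Q: by the next rising edge D is back at 0, and that is the value captured.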

The transistor-level truth:

A flip-flop is made of cross-coupled gates that form a feedback loop:

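Simplified SR Latch (Set-Reset), two cross-coupled NOR gates:

    R ──────►┌─────┐
             │ NOR ├──────┬──► Q
        ┌───►└─────┘      │
        │                 │
        │    ┌────────────┘
        │    │
        │    └───►┌─────┐
        │         │ NOR ├──┬──► !Q
    S ──┼────────►└─────┘  │
        │                  │
        └──────────────────┘

Q  = NOR(R, !Q)
!Q = NOR(S, Q)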

When S=1, Q=1 (Set)
When R=1, Q=0 (Reset)
When both 0, Q REMEMBERS!

The feedback loop creates the memory effect!

Why this works:

  • The output feeds back to the input
  • Creates a bistable state (stable at 0 or 1)
  • Requires continuous power to maintain
  • This is volatile memory (loses data when power off)
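A quick way to convince yourself that the feedback loop really stores a bit is to simulate the two NOR gates and let them settle. This is a minimal sketch that ignores gate delays; `sr_latch` is a hypothetical helper, and the forbidden input S = R = 1 is not exercised.

```python
from __future__ import annotations


def nor(a: int, b: int) -> int:
    return 0 if (a or b) else 1


def sr_latch(s: int, r: int, q: int, q_bar: int) -> tuple[int, int]:
    """Settle two cross-coupled NOR gates: Q = NOR(R, !Q), !Q = NOR(S, Q).

    Starts from the current outputs (q, q_bar) and re-evaluates both gates
    until nothing changes. S = R = 1 is the forbidden input and is not handled.
    """
    for _ in range(4):  # valid inputs settle within a few passes
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar


state = (0, 1)  # start in the reset state: Q = 0, !Q = 1
history = []
for s, r in [(1, 0), (0, 0), (0, 1), (0, 0)]:  # set, hold, reset, hold
    state = sr_latch(s, r, *state)
    history.append(state[0])
print(history)  # [1, 1, 0, 0]
```

The "hold" rows are the interesting ones: with S = R = 0 the inputs say nothing, yet Q keeps whatever the previous Set or Reset left there.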

74HC574 Pinout and Connections

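         74HC574 (DIP-20 package)
         ┌──────┴──────┐
    /OE  │1          20│ VCC (+5V)
    D0   │2          19│ Q0   ◄── Data outputs
    D1   │3          18│ Q1
    D2   │4          17│ Q2
    D3   │5          16│ Q3
    D4   │6          15│ Q4
    D5   │7          14│ Q5
    D6   │8          13│ Q6
    D7   │9          12│ Q7
    GND  │10         11│ CLK  ◄── Clock input
         └─────────────┘

All data inputs (pins 2-9) are on one side and all outputs (pins 19-12)
on the other; this "flow-through" layout is the main difference from
the 74HC374.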

Pin Functions:
─────────────
D0-D7:  Data inputs (what you want to store)
Q0-Q7:  Data outputs (what's currently stored)
CLK:    Clock - captures data on the LOW→HIGH (rising) edge
/OE:    Output Enable (active LOW)
        └─ When LOW: outputs enabled
        └─ When HIGH: outputs go hi-Z (disconnected)
VCC:    Power (+5V)
GND:    Ground (0V)

How the A Register Stores Data

Let’s trace through a complete storage operation:

SCENARIO: Store 0xABCD in A register

Initial state:
├─ A register contains: 0x0000
├─ Data bus has: 0xABCD
└─ LOAD_A signal is about to pulse

Step 1: Setup (before clock edge)
──────────────────────────────────
Data Bus[15:8] = 0xAB ──► U2.D7-D0
Data Bus[7:0]  = 0xCD ──► U1.D7-D0

      0xAB                    0xCD
        │                       │
     ┌──▼──────────┐         ┌──▼──────────┐
     │   U2 (Hi)   │         │   U1 (Lo)   │
     │   74HC574   │         │   74HC574   │
     │             │         │             │
     │ D7..D0:     │         │ D7..D0:     │
     │ 10101011    │         │ 11001101    │
     │ (waiting)   │         │ (waiting)   │
     └─────────────┘         └─────────────┘

LOAD_A signal: ══════════  (LOW, about to rise)


Step 2: Clock edge (rising edge)
──────────────────────────────────
LOAD_A signal:          ╔═══════
               ═════════╝  ◄── Rising edge HERE!

The moment LOAD_A rises:
├─ U1 captures 0xCD into its flip-flops
├─ U2 captures 0xAB into its flip-flops
├─ Data is now STORED inside the ICs
└─ Q outputs show the new value one clock-to-Q delay later (~20ns)


Step 3: After clock (data retained)
────────────────────────────────────
     ┌─────────────┐         ┌─────────────┐
     │   U2 (Hi)   │         │   U1 (Lo)   │
     │   74HC574   │         │   74HC574   │
     │             │         │             │
     │ Stored: AB  │         │ Stored: CD  │
     │ Q7..Q0      │         │ Q7..Q0      │
     │ 10101011    │         │ 11001101    │
     └──────┬──────┘         └──────┬──────┘
            │                       │
            └───────────┬───────────┘
                        ▼
             A_Out[15:0] = 0xABCD ✓

Data bus can now change - doesn't matter!
The A register REMEMBERS 0xABCD


Step 4: Later - reading the value
──────────────────────────────────
No clock needed!

Output is ALWAYS available:
├─ A_Out[15:0] continuously shows 0xABCD
├─ Can be read by ALU any time
├─ Can be used as address any time
├─ Stays there until next LOAD_A pulse
└─ Even if data bus changes!

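The outputs are continuously driven, so they can be read at any time. (This is not what "transparent" means: a transparent latch such as the 74HC573 passes its inputs straight through while enabled, whereas the 74HC574 only changes on a clock edge.)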
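The storage trace above can be summarized in a small behavioral model: two 8-bit edge-triggered chips that share one clock line, with outputs that are always readable. This sketch assumes /OE is tied LOW and ignores setup, hold and clock-to-Q delays; `Octal574` and `Register16` are illustrative names, not a real library.

```python
class Octal574:
    """Behavioral 74HC574: eight flip-flops that load on a rising CLK edge.

    Assumes /OE is tied LOW (outputs always driven); delays are ignored.
    """

    def __init__(self) -> None:
        self.q = 0
        self._last_clk = 0

    def tick(self, d: int, clk: int) -> int:
        if clk == 1 and self._last_clk == 0:  # rising edge only
            self.q = d & 0xFF
        self._last_clk = clk
        return self.q


class Register16:
    """A 16-bit register: U1 holds bits [7:0], U2 holds bits [15:8]; both share LOAD."""

    def __init__(self) -> None:
        self.u1_lo = Octal574()
        self.u2_hi = Octal574()

    def tick(self, data_bus: int, load: int) -> int:
        self.u1_lo.tick(data_bus & 0xFF, load)
        self.u2_hi.tick((data_bus >> 8) & 0xFF, load)
        return self.out

    @property
    def out(self) -> int:
        # Readable at any time, no clock needed.
        return (self.u2_hi.q << 8) | self.u1_lo.q


a_reg = Register16()
a_reg.tick(0xABCD, load=0)             # Step 1: data on the bus, LOAD_A still LOW
captured = a_reg.tick(0xABCD, load=1)  # Step 2: rising edge, value captured
a_reg.tick(0x1234, load=1)             # bus changes, but there is no new edge
a_reg.tick(0x1234, load=0)
print(hex(captured), hex(a_reg.out))   # 0xabcd 0xabcd
```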

Timing Diagram (Critical Understanding)

CLOCK:   ───╗     ╔═══╗     ╔═══╗     ╔═══
            ╚═════╝   ╚═════╝   ╚═════╝
                  ▲         ▲         ▲
            Rising edges (data captured here)

Data_In: ── 0x1234 ────╳ 0x5678 ─╳ 0xABCD ──

LOAD_A:      ╔═══════════════════╗
         ════╝                   ╚═══════
         (HIGH = loading enabled at the 1st and 2nd edges, LOW by the 3rd)

A_Out:   ─ 0x0000 ──╳ 0x1234 ─╳ 0x5678 ────
                    ▲         ▲
         A_Out changes one clock-to-Q delay after each enabled edge.
         At the 3rd edge LOAD_A is LOW, so 0xABCD is ignored.


Critical Timing Parameters (74HC574, approximate, 5V supply):
─────────────────────────────────────────────────────────────
Setup time:    12ns  (data must be stable BEFORE the clock edge)
Hold time:     3ns   (data must stay stable AFTER the clock edge)
Clock-to-Q:    ~20-25ns (propagation delay from clock edge to new output)

If timing is violated → DATA CORRUPTION!

Why We Use 74HC574 (Design Choice)

Advantages:

✓ Simple interface (just clock it)
✓ Continuously driven outputs (always readable)
✓ 3-state outputs (can disconnect from bus)
✓ Edge-triggered (clean data capture)
✓ Fast (clock-to-Q of a few tens of ns)
✓ Low power (CMOS)
✓ Cheap (~$0.50 per IC)
✓ Available everywhere

Alternatives considered:

Option 1: 74HC374 (similar but different pinout)
├─ Functionally identical
├─ Different pin arrangement
└─ Either works fine

Option 2: 74HC273 (no 3-state)
├─ Cannot disconnect from bus
├─ Less flexible
└─ Not chosen

Option 3: 74HC377 (with enable)
├─ Has separate enable pin
├─ More control, more complex
└─ Overkill for Hack

Option 4: SRAM (6264 or similar)
├─ Overkill (we only need 16 bits!)
├─ Requires addressing logic
├─ Slower
└─ Not practical for registers

The Output Enable Feature

The 74HC574 has a special feature: 3-state output

What is 3-state (tri-state)?

Normal output:
├─ HIGH (1, ~5V)
└─ LOW (0, ~0V)

3-state output adds:
└─ Hi-Z (high impedance, "floating")

Pin 1 (/OE - Output Enable, active LOW):
─────────────────────────────────────────
/OE = 0 → Outputs ENABLED (normal operation)
/OE = 1 → Outputs DISABLED (hi-Z)


Why this matters:

With /OE feature:
┌─────────┐         ┌─────────┐
│  Reg A  │─────┬───│  Reg D  │
│ /OE=1   │     │   │ /OE=0   │
└─────────┘     │   └─────────┘
    (off)       │      (on)

           Shared Bus
           Only D drives it!

Without /OE (collision!):
┌─────────┐         ┌─────────┐
│  Reg A  │─────┬───│  Reg D  │
│ outputs │     │   │ outputs │
└─────────┘     │   └─────────┘
    ║           │       ║
    ╚═══════════╪═══════╝
              FIGHT!
         (both trying to drive bus)
         (can damage chips!)

In the Hack computer:

  • For A and D registers, /OE is tied to GND
  • This means outputs are always enabled
  • This is fine because each register has dedicated output wires
  • No bus sharing for these registers

When you WOULD use /OE:

  • Multiple devices on shared bus
  • Memory banks (only one enabled at a time)
  • I/O port multiplexing

The D Register (Data Register)

Purpose and Design

The D register is the pure data storage register:

Job: Store intermediate calculation results
├─ NOT used for addressing
├─ Only used for data
├─ Primary ALU input
└─ Simplest register

Why separate from A?
├─ A is busy being an address
├─ Need somewhere to save ALU results
├─ Allows more complex calculations
└─ Standard CPU architecture

Hardware Implementation

Exactly the same as A register!

  • 2× 74HC574 chips (U3, U4)
  • Same connections
  • Same operation
  • Different control signal (LOAD_D instead of LOAD_A)

         D Register Schematic

Data_In[7:0] ────────┐
                     │
                ┌────▼─────┐
LOAD_D ────────►│ U3 (CLK) │
                │ 74HC574  │
GND ───────────►│ /OE      │
                │          │──► D_Out[7:0]
                └──────────┘

Data_In[15:8] ───────┐
                     │
                ┌────▼─────┐
LOAD_D ────────►│ U4 (CLK) │
                │ 74HC574  │
GND ───────────►│ /OE      │
                │          │──► D_Out[15:8]
                └──────────┘

Power & Ground omitted for clarity

Usage Patterns

Typical D Register Operations:

Pattern 1: Store ALU result
────────────────────────────
D = D + 1

How it works:
1. D_Out → ALU input
2. ALU computes D + 1
3. ALU_Out → Data_In
4. LOAD_D pulse
5. Result stored in D

Timeline (illustrative):
    t=0ns:   D_Out = 5
    t=100ns: ALU_Out = 6 (stable)
    t=110ns: LOAD_D rises (clock edge)
    t≈130ns: D_Out = 6 ✓ (after the ~20ns clock-to-Q delay)


Pattern 2: Save calculation
───────────────────────────
D = A + D

1. A_Out → ALU X input
2. D_Out → ALU Y input
3. ALU computes A + D
4. Result → Data_In
5. LOAD_D pulse
6. Sum saved in D


Pattern 3: Load from memory
───────────────────────────
D = M

1. A has address
2. RAM[A] → Data_In
3. LOAD_D pulse
4. D now contains RAM value

Why Not Just Use A for Everything?

Good question! Here’s why we need D:

Scenario: Compute (RAM[5] + RAM[6]) and store in RAM[7]

With A and D (actual design):
────────────────────────────
@5          // A = 5
D = M       // D = RAM[5] (saved!)
@6          // A = 6
D = D + M   // D = RAM[5] + RAM[6]
@7          // A = 7
M = D       // RAM[7] = result ✓

Total: 6 instructions


Without D register (hypothetical):
──────────────────────────────────
@5          // A = 5
A = M       // A = RAM[5]... but wait!
@6          // A = 6, LOST previous value!
// Can't do it! Need somewhere to save RAM[5]

IMPOSSIBLE without D register!

The D register provides “scratch space” for calculations.
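To check the six-instruction program, here is a tiny interpreter for a subset of Hack assembly. It is a sketch under stated limits: it handles only `@number` A-instructions and `dest=comp` C-instructions using the few `comp` mnemonics in the `COMP` table (no labels, no jumps), and `run_hack` is a hypothetical name.

```python
from __future__ import annotations

# Only the comp mnemonics needed here; x = D, y = A or M as in the Hack ALU.
COMP = {
    "A": lambda a, d, m: a,
    "D": lambda a, d, m: d,
    "M": lambda a, d, m: m,
    "D+A": lambda a, d, m: d + a,
    "D+M": lambda a, d, m: d + m,
    "D+1": lambda a, d, m: d + 1,
    "M+1": lambda a, d, m: m + 1,
}


def run_hack(program: list[str], ram: dict[int, int]) -> dict[int, int]:
    """Run straight-line Hack code (no labels, no jumps) and return RAM."""
    a = d = 0
    for line in program:
        line = line.split("//")[0].replace(" ", "")  # drop comments and spaces
        if not line:
            continue
        if line.startswith("@"):                      # A-instruction
            a = int(line[1:])
            continue
        dest, comp = line.split("=")                  # C-instruction: dest=comp
        value = COMP[comp](a, d, ram.get(a, 0)) & 0xFFFF  # 16-bit wrap-around
        if "M" in dest:
            ram[a] = value  # uses A as it was before this instruction
        if "D" in dest:
            d = value
        if "A" in dest:
            a = value
    return ram


program = [
    "@5",
    "D = M       // D = RAM[5]",
    "@6",
    "D = D + M   // D = RAM[5] + RAM[6]",
    "@7",
    "M = D       // RAM[7] = result",
]
ram = run_hack(program, {5: 30, 6: 12})
print(ram[7])  # 42
```

Note how `D` carries RAM[5] across the `@6` instruction; that is exactly the job A cannot do, because loading 6 into A overwrites it.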


The PC Register (Program Counter)

This is the most complex and most interesting register!

What is a Program Counter?

The Program Counter (PC) is a special register that:
├─ Points to the NEXT instruction to execute
├─ Automatically increments after each instruction
├─ Can JUMP to different addresses
└─ Controls program flow

Analogy: Reading a Book
────────────────────────
PC is like your finger pointing at:
├─ Current line you're reading
├─ Moves down one line after reading (auto-increment)
├─ Can jump to a different page (jump instruction)
└─ Bookmarks your place

Why PC is Special

Unlike A and D, the PC must:

  1. Auto-increment (add 1 automatically)
  2. Load new values (for jumps)
  3. Reset to zero (at startup)

This needs different hardware than simple storage!

Hardware Implementation: 74HC161

The PC uses the 74HC161 integrated circuit.

What is a 74HC161?

74HC161: 4-bit Synchronous Binary Counter (presettable, with asynchronous clear)

"4-bit" = counts 0-15 in one IC
"Synchronous" = all bits change together (not ripple)
"Binary Counter" = increments by 1 each clock
"Parallel Load" = can load any value directly

We need FOUR of these for 15 bits:

  • U6: Bits [3:0]
  • U7: Bits [7:4]
  • U8: Bits [11:8]
  • U9: Bits [14:12] + one unused bit

Why 15 bits, not 16?

  • Hack instruction memory (ROM) holds 32K words: addresses 0-32767
  • 2^15 = 32768 addresses
  • Don’t need 16th bit
  • Saves hardware!

Inside the 74HC161

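Simplified 4-bit counter logic:

Data Inputs (A, B, C, D)
    │  │  │  │
    ▼  ▼  ▼  ▼
┌──────────────────┐
│  Next-state mux  │ ◄── /LOAD, ENP, ENT choose: load data, Q+1, or hold
└────┬──┬──┬──┬────┘
     ▼  ▼  ▼  ▼
┌──────────────────┐
│  4 D Flip-Flops  │ ◄── Storage, updated on the CLK rising edge
│  (Like 74HC574)  │     (/CLR resets them directly, no clock needed)
└────┬──┬──┬──┬────┘
     │  │  │  │
     ├──┼──┼──┼──────► QA QB QC QD (Outputs)
     ▼  ▼  ▼  ▼
┌──────────────────┐
│ Increment logic  │ ──► Q+1, fed back into the mux
│      (+1)        │ ──► RCO (Ripple Carry Out, to next stage)
└──────────────────┘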

Control Signals:
├─ CLK: When to count
├─ ENP, ENT: Enable counting (both must be HIGH)
├─ /LOAD: Load parallel data (active LOW)
└─ /CLR: Clear to zero (active LOW)

The Increment Logic

How does it add 1? Here’s the actual logic:

4-bit Binary Counter Logic:

Current → Next (increment by 1):

Q3 Q2 Q1 Q0  →  Q3 Q2 Q1 Q0
─────────────────────────────
0  0  0  0   →  0  0  0  1   (0→1)
0  0  0  1   →  0  0  1  0   (1→2)
0  0  1  0   →  0  0  1  1   (2→3)
0  0  1  1   →  0  1  0  0   (3→4)
...
1  1  1  1   →  0  0  0  0   (15→0, overflow!)

RCO (Ripple Carry Out) is HIGH while the count is 1111, just before
the wrap, so the next stage counts on that same clock edge.

Logic equations:
────────────────
Q0_next = !Q0                    (toggle every time)
Q1_next = Q1 XOR Q0              (toggle when Q0=1)
Q2_next = Q2 XOR (Q1 AND Q0)     (toggle when Q1,Q0=1)
Q3_next = Q3 XOR (Q2 AND Q1 AND Q0)

RCO = ENT AND Q3 AND Q2 AND Q1 AND Q0    (all bits high, and ENT enabled)

This is implemented with:
├─ XOR gates (for toggle)
├─ AND gates (for carry propagation)
└─ D flip-flops (for storage)
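Because a 4-bit counter has only 16 states, the toggle equations can be checked exhaustively. This sketch assumes counting is enabled (ENP = ENT = 1) and includes the ENT term in RCO, as on the real chip; `next_state` and `rco` are illustrative helper names.

```python
from __future__ import annotations


def next_state(q3: int, q2: int, q1: int, q0: int) -> tuple[int, int, int, int]:
    """Next count from the toggle equations (counting enabled)."""
    n0 = 1 - q0               # Q0 toggles every clock
    n1 = q1 ^ q0              # Q1 toggles when Q0 = 1
    n2 = q2 ^ (q1 & q0)       # Q2 toggles when Q1 = Q0 = 1
    n3 = q3 ^ (q2 & q1 & q0)  # Q3 toggles when Q2 = Q1 = Q0 = 1
    return n3, n2, n1, n0


def rco(q3: int, q2: int, q1: int, q0: int, ent: int = 1) -> int:
    """Ripple Carry Out: HIGH only when ENT is HIGH and the count is 1111."""
    return ent & q3 & q2 & q1 & q0


# Exhaustive check of all 16 states against ordinary arithmetic.
for value in range(16):
    bits = [(value >> i) & 1 for i in (3, 2, 1, 0)]  # [Q3, Q2, Q1, Q0]
    n3, n2, n1, n0 = next_state(*bits)
    assert (n3 << 3 | n2 << 2 | n1 << 1 | n0) == (value + 1) % 16
    assert rco(*bits) == (1 if value == 15 else 0)
print("toggle equations match (value + 1) mod 16 for all 16 states")
```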

74HC161 Pinout and Connections

         74HC161 (DIP-16 package)
         ┌──────┴──────┐
   /CLR  │1          16│ VCC (+5V)
    CLK  │2          15│ RCO (Ripple Carry Out)
     A   │3          14│ QA  ◄── Counter outputs
     B   │4          13│ QB
     C   │5          12│ QC
     D   │6          11│ QD
    ENP  │7          10│ ENT
    GND  │8           9│ /LOAD
         └─────────────┘

Pin Functions:
──────────────
A, B, C, D:  Parallel load inputs (data to load)
QA-QD:       Counter outputs (current count)
CLK:         Clock input (count on rising edge)
ENP, ENT:    Enable inputs (both must be HIGH to count)
/LOAD:       Load enable (active LOW - loads A,B,C,D on the next rising clock edge)
/CLR:        Clear (active LOW - resets to 0000 immediately, without waiting for the clock)
RCO:         Ripple Carry Out (HIGH when count = 1111 and ENT is HIGH)

Chaining Multiple 74HC161s for 15-bit Counter

This is the critical part - connecting four counters:

  A[3:0]          A[7:4]          A[11:8]         A[14:12]
     │               │               │               │
     ▼               ▼               ▼               ▼
 ┌────────┐      ┌────────┐      ┌────────┐      ┌────────┐
 │   U6   │ RCO  │   U7   │ RCO  │   U8   │ RCO  │   U9   │
 │74HC161 ├─────►│74HC161 ├─────►│74HC161 ├─────►│74HC161 ├──► RCO (unused)
 └───┬────┘      └───┬────┘      └───┬────┘      └───┬────┘
     ▼               ▼               ▼               ▼
  PC[3:0]         PC[7:4]         PC[11:8]        PC[14:12]

On each chip: load inputs A-D on pins 3-6, outputs QA-QD on pins 14-11,
RCO on pin 15, ENP/ENT on pins 7/10, CLK on pin 2.

Carry chain: each RCO drives ENP and ENT of the next stage, so a stage
counts only on an edge where PC_INC is HIGH and every lower stage is at 1111.
U9's fourth load input (D) is tied to GND; its QD output is unused.

Control signals (shared by all):
────────────────────────────────
CLK    → All counters clock together
/LOAD  → All counters load together (for jumps)
/CLR   → All counters clear together (reset)

ENP, ENT on U6 → Controls counting (PC_INC signal)
ENP, ENT on U7,U8,U9 → Connected to previous RCO

The Carry Chain Explained

How the 15-bit counter works:

Stage 1 (U6, bits 0-3):
────────────────────────
Counts: 0, 1, 2, 3, ..., 14, 15, 0, 1, 2...
RCO is HIGH when count = 15 (1111)

Stage 2 (U7, bits 4-7):
────────────────────────
Only increments when U6.RCO = HIGH
This happens every 16 counts
So U7 counts: 0, 0, 0, ..., 1, 1, 1, ...
(Changes every 16 clock cycles)

Stage 3 (U8, bits 8-11):
────────────────────────
Only increments when U7.RCO = HIGH
This happens every 256 counts
Counts the "pages" of memory

Stage 4 (U9, bits 12-14):
─────────────────────────
Only increments when U8.RCO = HIGH
This happens every 4096 counts
Provides the highest address bits

Maximum count:
──────────────
Binary: 111111111111111 (15 ones)
Decimal: 32767
Hex: 0x7FFF

Then wraps to 0 (overflow)

Visual representation of counting:

Count    │ U9[14:12] │ U8[11:8] │ U7[7:4] │ U6[3:0] │
         │  (4096s)  │  (256s)  │  (16s)  │  (1s)   │
─────────┼───────────┼──────────┼─────────┼─────────┤
0        │    000    │   0000   │  0000   │  0000   │
1        │    000    │   0000   │  0000   │  0001   │
...      │    ...    │   ....   │  ....   │  ....   │
15       │    000    │   0000   │  0000   │  1111   │◄─U6 about to overflow
16       │    000    │   0000   │  0001   │  0000   │◄─U7 increments!
...      │    ...    │   ....   │  ....   │  ....   │
255      │    000    │   0000   │  1111   │  1111   │
256      │    000    │   0001   │  0000   │  0000   │◄─U8 increments!
...      │    ...    │   ....   │  ....   │  ....   │
32767    │    111    │   1111   │  1111   │  1111   │◄─Maximum
32768    │    000    │   0000   │  0000   │  0000   │◄─Wraps around!
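Here is a behavioral sketch of the four-chip chain, handy for checking the carry behaviour in the table above. Assumptions: timing is ignored, /LOAD takes priority over counting (as on the 74HC161), each RCO feeds both ENP and ENT of the next stage, and U9's unused fourth bit is masked off. `Counter161` and `ProgramCounter15` are illustrative names.

```python
class Counter161:
    """Behavioral 74HC161 (timing ignored).

    On a rising CLK edge: /LOAD LOW loads the data inputs; otherwise the
    count advances only if ENP and ENT are both HIGH. /CLR is asynchronous,
    so clear() takes effect immediately.
    """

    def __init__(self) -> None:
        self.q = 0

    def rco(self, ent: int) -> int:
        return int(ent == 1 and self.q == 0b1111)

    def clock(self, data: int, n_load: int, enp: int, ent: int) -> None:
        if n_load == 0:
            self.q = data & 0xF
        elif enp and ent:
            self.q = (self.q + 1) & 0xF

    def clear(self) -> None:
        self.q = 0


class ProgramCounter15:
    """U6..U9 chained: each stage's RCO drives ENP and ENT of the next."""

    def __init__(self) -> None:
        self.stages = [Counter161() for _ in range(4)]  # U6, U7, U8, U9

    @property
    def value(self) -> int:
        total = 0
        for i, stage in enumerate(self.stages):
            total |= stage.q << (4 * i)
        return total & 0x7FFF  # U9's fourth bit (QD) is not used

    def clock(self, a: int, n_load: int, pc_inc: int) -> None:
        a &= 0x7FFF  # only A[14:0] is wired; U9's fourth input is tied to GND
        # All four chips see the same edge, so compute every enable from the
        # outputs as they are *before* the edge, then update all stages.
        enables, carry = [], pc_inc
        for stage in self.stages:
            enables.append(carry)
            carry = stage.rco(ent=carry)
        for i, stage in enumerate(self.stages):
            stage.clock((a >> (4 * i)) & 0xF, n_load, enp=enables[i], ent=enables[i])

    def reset(self) -> None:
        for stage in self.stages:
            stage.clear()


pc = ProgramCounter15()
pc.clock(255, n_load=0, pc_inc=0)     # jump: load 255
pc.clock(0, n_load=1, pc_inc=1)       # increment: U6 and U7 wrap, U8 counts
print(pc.value)                       # 256
pc.clock(0x7FFF, n_load=0, pc_inc=0)
pc.clock(0, n_load=1, pc_inc=1)       # 32767 + 1 wraps to 0
print(pc.value)                       # 0
```

Computing all enables before updating any stage is the software equivalent of "synchronous": no stage sees another stage's new value on the same edge.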

PC Control Logic

The PC has three modes of operation:

Mode 1: INCREMENT (normal operation)
─────────────────────────────────────
Condition: No jump instruction
Action: PC = PC + 1
Control: PC_INC = HIGH, /LOAD = HIGH

Mode 2: JUMP (load new address)
────────────────────────────────
Condition: Jump instruction taken
Action: PC = A (load from A register)
Control: PC_INC = LOW, /LOAD = LOW

Mode 3: RESET (startup)
───────────────────────
Condition: System reset
Action: PC = 0
Control: /CLR = LOW

The Control Circuit

We need logic to decide: increment or load?

Using 74HC00 (NAND gates):

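Inputs:
├─ JUMP (from control unit; HIGH = take the jump)
├─ RESET (from reset button; active LOW, goes LOW while pressed)
└─ CLK (system clock; wired straight to every 74HC161, not through this logic)

Outputs:
├─ PC_INC (drives ENP/ENT of U6; HIGH = count)
├─ PC_LOAD (drives /LOAD; LOW = load A into PC)
└─ PC_CLR (drives /CLR; LOW = clear PC to 0)

Logic:
──────
PC_LOAD = NOT JUMP             (LOW when jumping → load new address)
PC_INC  = (NOT JUMP) AND RESET (count when not jumping and RESET is HIGH, i.e. idle)
PC_CLR  = RESET                (clear while the button is held)


Circuit with 74HC00 (three of its four NAND gates):
────────────────────
       ┌────┐
JUMP──►│1  3├──┬──► PC_LOAD (NAND used as NOT)
JUMP──►│2   │  │
       └────┘  │
               │   ┌────┐
               └──►│4  6├──► X
RESET─────────────►│5   │
                   └────┘
       ┌────┐
X─────►│9  8├──► PC_INC (NAND used as NOT: HIGH only when
X─────►│10  │            JUMP is LOW and RESET is HIGH)
       └────┘

PC_CLR needs no gate: RESET connects directly to /CLR.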
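The gate network can be checked with a truth table. This sketch assumes JUMP is active HIGH and RESET is active LOW (it goes LOW while the button is pressed, as in the reset example below), and models only the logic levels, not gate delays; `nand` and `pc_control` are hypothetical helpers.

```python
def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1


def pc_control(jump: int, reset: int) -> dict:
    """Three 74HC00 gates. JUMP is active HIGH; RESET is active LOW (0 = pressed)."""
    pc_load = nand(jump, jump)  # gate 1 as NOT: LOW (load) when jumping
    x = nand(pc_load, reset)    # gate 2
    pc_inc = nand(x, x)         # gate 3 as NOT: (NOT JUMP) AND RESET
    pc_clr = reset              # straight wire to /CLR
    return {"PC_LOAD": pc_load, "PC_INC": pc_inc, "PC_CLR": pc_clr}


print("JUMP RESET | PC_LOAD PC_INC PC_CLR")
for jump in (0, 1):
    for reset in (1, 0):
        out = pc_control(jump, reset)
        print(f"  {jump}    {reset}   |    {out['PC_LOAD']}       {out['PC_INC']}      {out['PC_CLR']}")
```

Reading the rows: JUMP=0, RESET=1 gives PC_INC=1 (count); JUMP=1 gives PC_LOAD=0 (load A); RESET=0 gives PC_CLR=0, which clears the counters whatever the other outputs say, because /CLR is asynchronous and has priority on the 74HC161.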

PC Operation Examples

Let me trace through complete operations:

Example 1: Normal Increment (Sequential Execution)

Initial state:
├─ PC = 0x0100 (256)
├─ Next instruction is at address 0x0101
└─ No jump condition

Step-by-step:
─────────────

t=0ns: Fetch instruction from ROM[0x0100]
├─ PC_Out = 0x0100
├─ ROM receives address
└─ Instruction retrieved

t=50ns: Execute instruction (say, D=D+1)
├─ ALU does computation
├─ Result stored in D
└─ No jump (JUMP=0)

t=100ns: Clock rises (increment PC)
├─ JUMP=0, so PC_INC=1
├─ ENP=1, ENT=1 on U6
├─ Counter increments
└─ PC becomes 0x0101

t=110ns: New PC value stable
├─ PC = 0x0101 ✓
├─ Points to next instruction
└─ Ready for next cycle

Timeline:
────────
CLK:    ────╗     ╔════╗     ╔════
            ╚═════╝    ╚═════╝
                  ▲ rising edge

PC_INC: ═══════════════════════════  (HIGH, counting enabled)

PC_Out: ──── 0x0100 ────── 0x0101 ──
                    ▲
                    Incremented here

Example 2: Jump Instruction (Non-Sequential)

Scenario: Execute "0;JMP" (unconditional jump)

Initial state:
├─ PC = 0x0100
├─ A = 0x0200 (jump target)
└─ Instruction says: always jump

Step-by-step:
─────────────

t=0ns: Fetch JMP instruction
├─ ROM[0x0100] contains jump instruction
├─ Control unit decodes it
└─ Determines: JUMP=1

t=50ns: Control signals asserted
├─ JUMP=1 (jump taken)
├─ PC_INC=0 (don't increment)
├─ /LOAD=0 (load mode)
└─ PC_CLR=1 (not clearing)

t=100ns: Clock rises
├─ /LOAD is LOW → parallel load enabled
├─ A[14:0] → Counter inputs
├─ All flip-flops load: PC ← A
└─ PC becomes 0x0200

t=110ns: Jump complete
├─ PC = 0x0200 ✓
├─ Now pointing at jump target
└─ Next instruction fetched from 0x0200

Timeline:
────────
CLK:    ────╗     ╔════╗     ╔════
            ╚═════╝    ╚═════╝
                  ▲
                  Jump loads here (rising edge)

JUMP:   ════════════════════════  (HIGH during jump)

/LOAD:  ───────╗    ╔══════════  (LOW to load)
               ╚════╝

A[14:0]: ───── 0x0200 ──────────  (jump target)

PC_Out: ──── 0x0100 ────── 0x0200
                    ▲
                    Jumped! (not incremented)

Example 3: Conditional Jump Not Taken

Scenario: "D;JGT" (jump if D > 0), but D=0

Initial state:
├─ PC = 0x0100
├─ A = 0x0200
├─ D = 0
└─ Instruction: jump if D > 0

Step-by-step:
─────────────

t=0ns: Check condition
├─ ALU computes: is D > 0?
├─ ZR flag = 1 (zero)
├─ Condition FALSE
└─ Jump NOT taken

t=50ns: Control signals
├─ JUMP=0 (don't jump)
├─ PC_INC=1 (increment instead)
├─ /LOAD=1 (don't load)
└─ Normal increment mode

t=100ns: Clock rises
├─ Counters increment
├─ PC = PC + 1
└─ PC becomes 0x0101

Result:
├─ PC = 0x0101 (incremented)
├─ Jump was ignored
└─ Execution continues sequentially

This shows conditional control flow!
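The decision the control unit makes in Examples 2 and 3 can be written as a small function of the instruction's three jump bits and the ALU output. This follows the standard Hack jump encoding (j1 = jump if out < 0, j2 = if out = 0, j3 = if out > 0, with out read as 16-bit two's complement); `jump_taken` and `next_pc` are illustrative names, and PC values are masked to the 15 bits used in this design.

```python
JUMP_BITS = {  # j1 j2 j3 from the Hack specification
    "null": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
    "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111",
}


def jump_taken(j_bits: str, alu_out: int) -> bool:
    """j1: jump if out < 0, j2: if out == 0, j3: if out > 0 (16-bit two's complement)."""
    word = alu_out & 0xFFFF
    ng = (word >> 15) & 1  # ALU 'ng' flag: the sign bit
    zr = int(word == 0)    # ALU 'zr' flag
    pos = int(not ng and not zr)
    j1, j2, j3 = (int(b) for b in j_bits)
    return bool((j1 and ng) or (j2 and zr) or (j3 and pos))


def next_pc(pc: int, a: int, j_bits: str, alu_out: int) -> int:
    """PC <- A if the jump is taken, else PC + 1 (15-bit PC as in this design)."""
    if jump_taken(j_bits, alu_out):
        return a & 0x7FFF
    return (pc + 1) & 0x7FFF


# Example 2: 0;JMP with A = 0x0200
print(hex(next_pc(0x0100, 0x0200, JUMP_BITS["JMP"], 0)))  # 0x200
# Example 3: D;JGT with D = 0, so the jump is not taken
print(hex(next_pc(0x0100, 0x0200, JUMP_BITS["JGT"], 0)))  # 0x101
```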

Example 4: System Reset

Scenario: Reset button pressed

All current state:
├─ PC = 0x1234 (random value)
├─ Various program running
└─ Need to restart from beginning

Step-by-step:
─────────────

t=0ns: Reset button pressed
├─ RESET signal goes LOW
├─ PC_CLR=0 (active)
└─ Immediately affects counters

Within a few tens of ns: Counters clear
├─ /CLR pin goes LOW on all 74HC161s
├─ Asynchronous clear (doesn't wait for clock!)
├─ All flip-flops reset to 0
└─ PC becomes 0x0000

Later (a human button press lasts milliseconds): Reset released
├─ RESET signal goes HIGH
├─ PC_CLR=1 (inactive)
└─ System ready

After release: First instruction fetch
├─ PC = 0x0000
├─ Fetches ROM[0]
└─ Program starts from beginning

Timeline:
────────
RESET:  ────╗              ╔════════
            ╚══════════════╝
             ◄─ held (ms) ─►

PC_Out: ──── 0x1234 ─── 0x0000 ─────────
                   ▲
                   Cleared almost at once
                   (no clock needed!)

This is why /CLR is "asynchronous"

Why 74HC161 Instead of 74HC574?

Could we build PC with 74HC574 + adder?

Hypothetical PC with 74HC574:

PC_Out → [+1 Adder] → [2:1 Mux: PC+1 or A] → [2× 74HC574] → PC_Out
                                                   ▲
                                                  CLK

Problems:
├─ Need separate 16-bit adder (4× 74HC283)
├─ Need multiplexer to select increment vs load
├─ More ICs (8+ instead of 4)
├─ More complex control logic
└─ Slower (more gate delays)

74HC161 advantages:
├─ Built-in counter (no external adder)
├─ Built-in load (no external mux)
├─ Fewer ICs (4 instead of 8+)
├─ Simpler control (just 3 signals)
└─ Faster (optimized internally)

The 74HC161 is PERFECT for PC!

Register Timing and Synchronization

The Clock Signal (Master Timekeeper)

All registers operate from the same clock:

         555 Timer (Clock Generator)
                    │
                    ▼
             [Buffer: 74HC04]
                    │
        ┌───────────┼───────────┐
        │           │           │
        ▼           ▼           ▼
     LOAD_A      LOAD_D      PC_CLK
        │           │           │
        ▼           ▼           ▼
    A Register  D Register  PC Counter

(The ALU is combinational: it has no clock input and simply
recomputes whenever its inputs change.)

All synchronized to same clock!

Why synchronization matters:

Bad (no clock):
───────────────
Data changes randomly → chaos!
Register captures wrong data
ALU sees partial results
Program jumps to random addresses

Good (with clock):
──────────────────
Everything changes at same instant
Data stable before capture
Predictable behavior
Deterministic operation

Critical Timing Paths

Path 1: ALU → Register → ALU (feedback loop)
────────────────────────────────────────────

Longest path through system:

  t=0ns:   Clock edge; registers start updating
  t=20ns:  Register outputs valid (clock-to-Q) → ALU inputs
  t=30ns:  ALU preprocessing complete
  t=70ns:  ALU carry chain settles
  t=120ns: ALU output stable
  t=132ns: Register setup time (12ns) met
  t=132ns: Earliest safe NEXT CLOCK EDGE

Minimum clock period: ≈132ns
Maximum frequency: ≈7.6 MHz
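The timing budget for the ALU feedback loop is a sum of three delays: the register's clock-to-Q delay (about 20ns in the 74HC574 figures quoted earlier), the ALU settling time (100ns from its inputs in this walk-through), and the setup time of the register that captures the result (12ns). A minimal sketch of that arithmetic, ignoring clock skew; `clock_limits` is a hypothetical helper.

```python
from __future__ import annotations


def clock_limits(t_clk_to_q_ns: float, t_logic_ns: float,
                 t_setup_ns: float) -> tuple[float, float]:
    """Minimum period and maximum frequency for one register-to-register path.

    T_min = t_clk_to_q + t_logic + t_setup   (clock skew ignored)
    """
    period_ns = t_clk_to_q_ns + t_logic_ns + t_setup_ns
    f_max_mhz = 1000.0 / period_ns  # 1/ns is GHz; x1000 gives MHz
    return period_ns, f_max_mhz


period, f_max = clock_limits(t_clk_to_q_ns=20, t_logic_ns=100, t_setup_ns=12)
print(f"T_min = {period:.0f} ns, f_max = {f_max:.1f} MHz")  # T_min = 132 ns, f_max = 7.6 MHz
```

Leaving out clock-to-Q and setup, as a quick estimate sometimes does, overstates the safe frequency.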


Path 2: Memory → Register
──────────────────────────
ROM[PC] → Instruction Register → Control signals

  t=0ns:  PC outputs address
  t=50ns: ROM access time
  t=80ns: Data valid
  t=100ns: Setup time met
  t=110ns: CLOCK EDGE

Memory is slower than ALU!
Usually limits clock speed


Path 3: Register → Memory
──────────────────────────
A Register → RAM address → Data read

  t=0ns:  A outputs address
  t=10ns: Address decode
  t=60ns: RAM access time
  t=70ns: Data valid at output

Fits in one cycle as long as the clock period is longer than this ~70ns path

Setup and Hold Times (Critical!)

Every register has timing requirements:

                ┌── Setup ──┬── Hold ──┐
                │ must be   │ must not │
                │ stable    │ change   │
Data:  ════╳════╧═══════════╪══════════╧════╳════
                            │
CLK:                        ╔══════════════
       ═════════════════════╝
                            ▲
                     Rising clock edge

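74HC574 Timing (approximate, 5V supply):
─────────────────────────
Setup time:    12ns
Hold time:     3ns
Clock-to-Q:    20ns

74HC161 Timing (approximate, 5V supply):
─────────────────────────
Setup time:    15ns
Hold time:     5ns
Clock-to-Q:    25ns

Exact figures vary by manufacturer, supply voltage and temperature;
check the datasheet for the part you actually use.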

If violated → data corruption!
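A setup/hold check is just a window test around the clock edge: no data transition may fall between (edge − setup) and (edge + hold). This sketch uses the approximate 74HC574 figures above (12ns setup, 3ns hold) as defaults; `capture_is_safe` is a hypothetical helper.

```python
def capture_is_safe(data_changes_ns: list, edge_ns: float,
                    t_setup_ns: float = 12.0, t_hold_ns: float = 3.0) -> bool:
    """True if no data transition falls inside the setup/hold window.

    The window runs from (edge - setup) to (edge + hold); a change exactly on
    a window boundary is treated as just meeting the requirement.
    """
    start = edge_ns - t_setup_ns
    end = edge_ns + t_hold_ns
    return all(not (start < t < end) for t in data_changes_ns)


# "Bad design": data changes 5 ns before a rising edge at t = 100 ns
print(capture_is_safe([95.0], edge_ns=100.0))         # False
# "Good design": stable from 20 ns before until 5 ns after the edge
print(capture_is_safe([80.0, 105.0], edge_ns=100.0))  # True
```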

Real-world example of timing violation:

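Bad Design (data changes only 5ns before the edge; setup needs 12ns):
───────────
Data:   ═════════════════╳═══════════
                         │◄5ns►│
CLK:                           ╔═══
        ═══════════════════════╝
                               ▲ rising edge

Result: Setup violated! The flip-flop may store the old value, the new
value, or go briefly metastable: effectively a random 1 or 0.


Good Design (data stable 20ns before and 5ns after the edge):
────────────
Data:   ══╳════════════════════════╳═══
          │◄──── 20ns ────►│◄ 5ns ►│
CLK:                       ╔═══════════
        ═══════════════════╝
                           ▲ rising edge

Result: Setup (12ns) and hold (3ns) both met: clean capture! ✓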

The Complete Register System

How They Work Together

Instruction Execution Cycle:

Step 1: FETCH
─────────────
PC → ROM address
ROM → Instruction
Instruction → Control Unit

Step 2: DECODE
──────────────
Control Unit → Signals
├─ ALU control
├─ Register control
└─ Memory control

Step 3: EXECUTE
───────────────
A, D → ALU
ALU → Result
Result → Register (if needed)

Step 4: UPDATE PC
─────────────────
If jump: PC ← A
Else: PC ← PC + 1

All synchronized by clock!

Example: Complete Instruction

Let’s trace D = M + 1 through the hardware:

Assembly: @100     // First, load address
          D=M+1    // Then, this instruction

Focus on: D=M+1

Initial State:
├─ PC = 0x0050 (pointing to this instruction)
├─ A = 0x0064 (100 in hex)
├─ D = 0x0000 (current D value)
└─ RAM[100] = 0x0042 (66 decimal)


Clock Cycle 1: Fetch & Decode
──────────────────────────────
t=0ns: PC outputs 0x0050
       ROM address = 0x0050

t=50ns: ROM outputs instruction
        Instruction = 1111110111010000
        (binary for "D=M+1")

t=60ns: Control unit decodes
        ├─ ALU_CTRL = 110111 (zx nx zy ny f no: y + 1, i.e. M + 1)
        ├─ A_OR_M = 1 (use M, not A)
        ├─ LOAD_D = 1 (will store to D)
        └─ JUMP = 0 (no jump)


Clock Cycle 2: Execute
──────────────────────
t=0ns: A register outputs address
       A_Out = 0x0064 (100)
       RAM address = 100

t=50ns: RAM outputs data
        RAM[100] = 0x0042 (66)
        M data → ALU Y input

t=60ns: ALU computes M+1
        Y = 0x0042
        Control = increment Y
        Result = 0x0043 (67)

t=100ns: ALU output stable
         ALU_Out = 0x0043

t=110ns: Clock edge!
         LOAD_D pulses
         D ← 0x0043

t=120ns: D register updated
         D_Out = 0x0043 ✓


Clock Cycle 3: Update PC
────────────────────────
t=0ns: JUMP=0, so increment

t=110ns: Clock edge
         PC increments
         PC = 0x0051

Result:
├─ D = 0x0043 (67 = 66 + 1) ✓
├─ PC = 0x0051 (next instruction) ✓
└─ Took 3 clock cycles total
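The control unit's decode step is plain bit slicing. Assuming the standard Hack C-instruction layout (`111a cccccc ddd jjj`), the hypothetical helper below splits the instruction word from the trace into its fields: a=1 selects M, comp=110111 is the ALU's "Y+1" setting, dest=010 selects D, and jump=000 means no jump.

```python
# Hypothetical helper: split a 16-bit Hack C-instruction into its fields,
# assuming the standard layout 111a cccccc ddd jjj.

def decode_c_instruction(word: int) -> dict[str, int]:
    if word >> 13 != 0b111:
        raise ValueError("not a C-instruction")
    return {
        "a": (word >> 12) & 1,            # 1 = use M, 0 = use A
        "comp": (word >> 6) & 0b111111,   # zx nx zy ny f no
        "dest": (word >> 3) & 0b111,      # A D M (MSB first)
        "jump": word & 0b111,             # lt eq gt (MSB first)
    }


f = decode_c_instruction(0b1111110111010000)   # D=M+1
print(f"a={f['a']} comp={f['comp']:06b} dest={f['dest']:03b} jump={f['jump']:03b}")
# a=1 comp=110111 dest=010 jump=000
```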

Register Interaction Patterns

Pattern 1: Register to Register
────────────────────────────────
D = A

Flow:
A_Out → ALU Y input (comp 110000 passes Y through)
ALU_Out → D_In
LOAD_D pulse
Done in 1 cycle!


Pattern 2: ALU Feedback
───────────────────────
D = D + 1

Flow:
D_Out → ALU X input (D always drives X)
ALU computes D + 1
ALU_Out → D_In
LOAD_D pulse
D updated in 1 cycle


Pattern 3: Memory Access
────────────────────────
D = M

Flow:
A_Out → RAM address
RAM → M data → ALU Y input (comp 110000 passes Y through)
ALU_Out → D_In
LOAD_D pulse
Done in 1 cycle, as long as the clock period covers the RAM access time


Pattern 4: Complex Calculation
───────────────────────────────
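D = (A + D) & M

No single Hack instruction can compute this: the ALU's Y input is
either A or M, never both. It takes two instructions:

Instruction 1: D=D+A   (ALU computes D + A, LOAD_D on the same edge)
Instruction 2: D=D&M   (A is unchanged, so M is still RAM[A])

Two instructions, one cycle each: a multi-instruction operation!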
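Every pattern comes down to which six control bits the ALU receives. The sketch below is a software model of the standard Hack ALU (x = D, y = A or M), used only to check those bits. `hack_alu` is a hypothetical name, and the register values are arbitrary examples.

```python
# Standard Hack ALU (x = D, y = A or M), written out to check the control
# bits behind each pattern. hack_alu is a hypothetical helper name.
MASK = 0xFFFF


def hack_alu(x: int, y: int, bits: str) -> int:
    """bits is 'zx nx zy ny f no' as a 6-character string, e.g. '110000'."""
    zx, nx, zy, ny, f, no = (b == "1" for b in bits)
    if zx:
        x = 0
    if nx:
        x = ~x & MASK
    if zy:
        y = 0
    if ny:
        y = ~y & MASK
    out = (x + y) & MASK if f else x & y
    return ~out & MASK if no else out


a, d, m = 100, 7, 0b1111_0000             # example A, D and RAM[A] values
print(hack_alu(d, a, "110000"))           # Pattern 1, D=A   -> 100
print(hack_alu(d, a, "011111"))           # Pattern 2, D=D+1 -> 8
print(hack_alu(d, m, "110000"))           # Pattern 3, D=M   -> 240
d = hack_alu(d, a, "000010")              # Pattern 4, step 1: D=D+A -> 107
d = hack_alu(d, m, "000000")              # Pattern 4, step 2: D=D&M
print(d)                                  # -> 96, same as (100 + 7) & 240
```

Note that `110000` (pass Y through) serves both D=A and D=M; only the separate `a` bit decides whether Y comes from A or from RAM.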

Design Decisions Explained

Why These Specific ICs?

74HC574 for A and D:

Pros:
✓ Simple edge-triggered operation
✓ Separate input/output pins
✓ 3-state capability (future expansion)
✓ Very common and cheap
✓ Well understood

Cons:
✗ No built-in increment (but don't need it)
✗ Requires external adder for arithmetic

Verdict: Perfect for simple storage registers

74HC161 for PC:

Pros:
✓ Built-in counter (auto-increment)
✓ Synchronous operation (no glitches)
✓ Parallel load capability (for jumps)
✓ Clear function (for reset)
✓ Carry chain support (RCO/ENT, to cascade four chips into 16 bits)

Cons:
✗ Cannot use for general storage
✗ More complex than 74HC574

Verdict: Perfect for program counter

Alternative Designs Considered

Option 1: Use SRAM for all registers
─────────────────────────────────────
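Pros: Fewer chips for the storage itself
Cons: Slower; needs address-decode logic; a single-port SRAM reads one
      register at a time, but the ALU needs D and A (or M) at once
Verdict: Overkill for 3 registers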


Option 2: Use shift registers
─────────────────────────────
Pros: Serial data loading
Cons: Too slow (16 clocks to load)
Verdict: Wrong tool for the job


Option 3: Use up/down counters (74HC193)
────────────────────────────────────────
Pros: Up/down counting
Cons: Don't need down counting
Verdict: Unnecessary complexity


Option 4: Use 74HC377 (with enable)
───────────────────────────────────
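Pros: Clock-enable input (/E), so it can run from the shared system clock
Cons: No 3-state outputs; the 74HC574 gets the same load control by
      clocking it with LOAD_A / LOAD_D
Verdict: No advantage for this design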


Final Choice: 74HC574 + 74HC161
────────────────────────────────
├─ Minimum IC count
├─ Standard, available parts
├─ Well-documented behavior
├─ Easy to understand
└─ Perfect match for requirements

Power Consumption

Per IC power consumption (typical):

74HC574 (idle):    ~10µA
74HC574 (active):  ~5mA at 1MHz

74HC161 (idle):    ~10µA
74HC161 (active):  ~6mA at 1MHz

Register module total:
├─ 4× 74HC574 (2 each for 16-bit A and D) = 20mA
├─ 4× 74HC161 (4 bits each, 16-bit PC) = 24mA
└─ Total: ~44mA at 1MHz

Very low power!
Can run from batteries
Much better than old 74LS series
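The totals above can be rescaled for other clock speeds. This rough sketch assumes the per-IC "active" figures quoted above (at 1MHz) and the usual CMOS rule of thumb that dynamic current grows roughly linearly with clock frequency; idle current is ignored because it is about a thousand times smaller. `module_current_ma` is a hypothetical name.

```python
# Rough current estimate for the register module at other clock speeds.
# Assumptions: the per-IC "active" figures above (quoted at 1 MHz), and the
# usual CMOS rule of thumb that dynamic current scales about linearly with
# clock frequency. Idle (quiescent) current, ~10 µA per chip, is ignored.

ACTIVE_MA_AT_1MHZ = {"74HC574": 5.0, "74HC161": 6.0}
CHIP_COUNT = {"74HC574": 4, "74HC161": 4}  # 2 per 16-bit A/D; 4 for the PC


def module_current_ma(clock_mhz: float) -> float:
    per_mhz = sum(CHIP_COUNT[ic] * ACTIVE_MA_AT_1MHZ[ic] for ic in CHIP_COUNT)
    return per_mhz * clock_mhz


for f_mhz in (0.5, 1.0, 2.0):
    print(f"{f_mhz} MHz: ~{module_current_ma(f_mhz):.0f} mA")
# 0.5 MHz: ~22 mA
# 1.0 MHz: ~44 mA
# 2.0 MHz: ~88 mA
```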

Testing Registers

Testing A and D Registers

TEST 1: Basic Storage
─────────────────────
1. Apply data: 0xAAAA to Data_In
2. Pulse LOAD_A
3. Check: A_Out should be 0xAAAA
4. Change Data_In to 0x5555
5. Don't pulse LOAD_A
6. Check: A_Out still 0xAAAA (retained)

Pass criteria: Data retained until next load


TEST 2: All Bits
────────────────
Test patterns:
├─ 0x0000 (all zeros)
├─ 0xFFFF (all ones)
├─ 0xAAAA (alternating: 1010...)
├─ 0x5555 (alternating: 0101...)
├─ Walking 1s: 0x0001, 0x0002, 0x0004...
└─ Walking 0s: 0xFFFE, 0xFFFD, 0xFFFB...

Verifies: All flip-flops work


TEST 3: Timing
──────────────
1. Set Data_In = 0x1234
2. Pulse LOAD_A (20ns wide)
3. Measure Clock-to-Q delay
4. Should be within the datasheet's clock-to-Q limit (tPD) at your VCC

Verifies: IC speed within spec


TEST 4: Multiple Registers
───────────────────────────
1. Load A = 0xABCD
2. Load D = 0x1234
3. Verify both retained independently
4. No crosstalk between registers

Verifies: Isolation between registers
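Tests 1, 2 and 4 translate directly into a bench script. The sketch below runs them against a software model of a 16-bit register. `Register16` and `write()` are hypothetical stand-ins; on real hardware they would drive Data_In and pulse LOAD_A or LOAD_D, then read the Q outputs.

```python
# Bench-style version of TESTS 1, 2 and 4, run against a software model.
# Register16 and write() are hypothetical; on real hardware they would
# drive Data_In and pulse LOAD_A / LOAD_D instead.

class Register16:
    """Models a 16-bit edge-triggered register (two 74HC574s)."""

    def __init__(self) -> None:
        self.data_in = 0   # value currently on the input bus
        self.out = 0       # Q outputs

    def pulse_load(self) -> None:
        self.out = self.data_in & 0xFFFF   # capture on the LOAD edge


def write(reg: Register16, value: int) -> None:
    reg.data_in = value
    reg.pulse_load()


def test_patterns() -> list[int]:
    fixed = [0x0000, 0xFFFF, 0xAAAA, 0x5555]
    walking_ones = [1 << i for i in range(16)]               # 0x0001, 0x0002, ...
    walking_zeros = [~(1 << i) & 0xFFFF for i in range(16)]  # 0xFFFE, 0xFFFD, ...
    return fixed + walking_ones + walking_zeros


a, d = Register16(), Register16()

# TEST 1: no LOAD pulse, so a new Data_In value must not reach the output.
write(a, 0xAAAA)
a.data_in = 0x5555
assert a.out == 0xAAAA

# TEST 2: every flip-flop must hold both a 0 and a 1.
for p in test_patterns():
    write(a, p)
    assert a.out == p, f"pattern {p:#06x} failed"

# TEST 4: loading D must not disturb A.
write(a, 0xABCD)
write(d, 0x1234)
assert (a.out, d.out) == (0xABCD, 0x1234)
print(len(test_patterns()), "patterns passed")   # 36 patterns passed
```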

Testing Program Counter

TEST 1: Increment
─────────────────
1. Clear PC (PC = 0)
2. Set PC_INC = HIGH (ENP/ENT enabled), keep /LOAD HIGH
3. Apply 10 clock pulses
4. Check: PC should be 10 (0x000A)

Pass: Counts correctly


TEST 2: Parallel Load
─────────────────────
1. Set A = 0x1234
2. Assert /LOAD (LOW)
3. Clock pulse
4. Check: PC = 0x1234

Pass: Load function works


TEST 3: Carry Chain
───────────────────
1. Set PC = 0x000F (15)
2. Increment once
3. Check: PC = 0x0010 (16)
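   (Bits 0-3 roll over to 0 and bit 4 sets: the first 74HC161's
   RCO must enable the second. Repeat from 0x00FF and 0x0FFF to
   check the other two carries.)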

Pass: Carry propagates correctly


TEST 4: Maximum Count
─────────────────────
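1. Set PC = 0xFFFF (65535, all four counters at 1111)
2. Increment once
3. Check: PC = 0x0000 (wraps)
   (From 0x7FFF the 16-bit counter gives 0x8000; that only looks
   like 0x0000 to a ROM using just the low 15 address bits)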

Pass: Overflow handling correct


TEST 5: Reset
─────────────
1. Set PC = random value
2. Assert /CLR
3. Check: PC = 0x0000 immediately
   (no clock needed)

Pass: Asynchronous clear works
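The five PC tests can also be rehearsed on a behavioural model before touching hardware. This sketch assumes the 74HC161 function table: shared clock, synchronous active-low /LOAD, asynchronous active-low /CLR, and each stage's RCO driving the next stage's ENT. With four chips the counter is 16 bits wide, so it wraps at 0xFFFF. `HC161` and `ProgramCounter` are hypothetical names.

```python
# Behavioural sketch of a 16-bit PC built from four cascaded 74HC161s,
# running TESTS 1-5. Assumptions (from the 74HC161 function table): shared
# clock, synchronous active-low /LOAD, asynchronous active-low /CLR, and
# each stage's RCO driving the next stage's ENT. Class names are hypothetical.

class HC161:
    """One 4-bit counter stage."""

    def __init__(self) -> None:
        self.q = 0

    def rco(self, ent: bool) -> bool:
        return ent and self.q == 0xF           # ripple carry out

    def clock(self, load_n: bool, enp: bool, ent: bool, d: int) -> None:
        if not load_n:
            self.q = d & 0xF                   # parallel load (jump)
        elif enp and ent:
            self.q = (self.q + 1) & 0xF        # count


class ProgramCounter:
    def __init__(self) -> None:
        self.stages = [HC161() for _ in range(4)]   # stages[0] = bits 0-3

    @property
    def value(self) -> int:
        return sum(s.q << (4 * i) for i, s in enumerate(self.stages))

    def clear(self) -> None:
        for s in self.stages:                  # /CLR low: no clock needed
            s.q = 0

    def clock(self, inc: bool = True, load_n: bool = True, d: int = 0) -> None:
        # Every ENT is computed from the outputs *before* the edge,
        # then all four stages are clocked together.
        ents, ent = [], True                   # first stage: ENT tied HIGH
        for s in self.stages:
            ents.append(ent)
            ent = s.rco(ent)
        for i, s in enumerate(self.stages):
            s.clock(load_n, inc, ents[i], (d >> (4 * i)) & 0xF)


pc = ProgramCounter()
pc.clear()
for _ in range(10):
    pc.clock()
assert pc.value == 10                          # TEST 1: increment
pc.clock(load_n=False, d=0x1234)
assert pc.value == 0x1234                      # TEST 2: parallel load
pc.clock(load_n=False, d=0x000F)
pc.clock()
assert pc.value == 0x0010                      # TEST 3: carry chain
pc.clock(load_n=False, d=0xFFFF)
pc.clock()
assert pc.value == 0x0000                      # TEST 4: 16-bit wrap
pc.clock(load_n=False, d=0x1234)
pc.clear()
assert pc.value == 0x0000                      # TEST 5: reset
print("all PC tests passed")
```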

Common Problems and Solutions

Problem 1: Register always outputs 0
────────────────────────────────────
Likely causes:
├─ No power to IC
├─ /OE pin not grounded (hi-Z mode)
├─ Clock not reaching IC
└─ Damaged IC

Tests:
├─ Check VCC at pin 16/20
├─ Check GND at pin 8/10
├─ Verify /OE = 0V
└─ Replace IC if needed


Problem 2: Register doesn't update
───────────────────────────────────
Likely causes:
├─ No clock signal
├─ Clock pulse too short
├─ Setup time violated
└─ Timing issue

Tests:
├─ Probe CLK pin with scope
├─ Verify pulse width > 15ns
├─ Check data stable before clock
└─ Slow down clock if needed


Problem 3: Random values stored
────────────────────────────────
Likely causes:
├─ Missing decoupling cap
├─ Setup/hold time violation
├─ Noise on data bus
└─ Ringing on clock edges (long wires)

Solutions:
├─ Add 0.1µF cap next to IC
├─ Add buffer if loading heavy
├─ Damp clock ringing (short wires, small series resistor)
└─ Check PCB routing


Problem 4: PC counts wrong
──────────────────────────
Likely causes:
├─ Carry chain broken
├─ ENP/ENT not connected
├─ RCO not reaching next stage
└─ One IC not counting

Tests:
├─ Check each counter individually
├─ Verify RCO connections
├─ Probe each stage with scope
└─ Replace bad counter IC


Problem 5: PC doesn't jump
──────────────────────────
Likely causes:
├─ /LOAD not reaching ICs
├─ Parallel load data wrong
├─ /LOAD pulse too short
└─ Control logic error

Tests:
├─ Probe /LOAD signal
├─ Check A register value
├─ Verify timing of /LOAD
└─ Check control circuit

Understanding Through Analogy

The Register as a Mailbox

A and D Registers = Post Office Boxes
──────────────────────────────────────
├─ Box #A and Box #D
├─ You can PUT mail in (write)
├─ You can GET mail out (read)
├─ Mail STAYS until replaced
└─ Need key to change (clock signal)

Program Counter = Street Address
─────────────────────────────────
├─ Walking down street (incrementing)
├─ Each building is instruction (ROM)
├─ Sometimes skip buildings (jump)
├─ Automated walk (counter)
└─ Can teleport (parallel load)


The Clock = Postal Worker's Route
──────────────────────────────────
├─ Comes at same time each day
├─ Everything synchronized to schedule
├─ Reliable, predictable timing
└─ System works because of routine

The Complete Picture

         ┌───────────────────────────┐
         │   Instruction Memory      │
         │        (ROM)              │
         └──────────┬────────────────┘
                    │
                    │ Instruction
                    ▼
         ┌──────────────────────────┐
         │   Control Unit           │
         │  (Decodes instruction)   │
         └─────┬────────────────────┘
               │
    ┌──────────┼──────────┬─────────┐
    │          │          │         │
    ▼          ▼          ▼         ▼
┌────────┐ ┌────────┐ ┌───────┐ ┌──────┐
│   A    │ │   D    │ │  PC   │ │ ALU  │
│  Reg   │ │  Reg   │ │Counter│ │      │
└────┬───┘ └────┬───┘ └───┬───┘ └──┬───┘
     │          │         │        │
     └──────────┴─────────┴────────┘
           Data flows between all

Summary: The Elegant Register Design

Key Insights:

  1. Registers are memory - but fast, local, and simple

  2. Two types needed:

    • Storage (A, D) - just remember
    • Counter (PC) - remember AND increment

  3. Perfect IC match:

    • 74HC574: optimal for storage
    • 74HC161: optimal for counting

  4. Synchronization critical:

    • All on same clock
    • Predictable timing
    • No race conditions

  5. Simple yet complete:

    • Only 3 registers
    • Enough for full computer
    • Turing-complete!

The register design shows that simplicity works. You don’t need dozens of registers like modern CPUs. Three carefully designed registers, with the right hardware support, are enough to build a working computer!



Hi, I'm Yinhuan Yuan. I'm a software engineer based in Toronto. You can read more about me on yuan.fyi.