The Hack ALU Explained: A Deep Dive into the Design

Let me explain the ALU from the ground up, building your understanding step by step.

What is an ALU? (The Big Picture)

Think of an ALU (Arithmetic Logic Unit) as the calculator brain of a computer. Just like you can take two numbers and add them, subtract them, or compare them, the ALU does the same thing - but with binary numbers (0s and 1s) and at incredible speed.

The Basic Idea

You give the ALU:
├─ Two inputs (X and Y) - the numbers to work with
├─ Control signals - instructions for what to do
└─ You get back:
   ├─ A result (the answer)
   └─ Status flags (information about the answer)

Why Do We Need It?

Every computation in a computer goes through the ALU. When you:

Add two numbers in a spreadsheet → ALU
Compare if a password matches → ALU
Calculate graphics in a game → ALU (millions of times per second)

The Hack ALU specifically does 18 different operations using just 6 control signals. This is elegant design!

The Hack ALU Specification

The 6 Control Signals

The Hack ALU uses 6 bits to control what operation it performs:

zx (zero the x)    - Should we ignore X and use 0 instead?
nx (negate the x)  - Should we flip all bits in X?
zy (zero the y)    - Should we ignore Y and use 0 instead?
ny (negate the y)  - Should we flip all bits in Y?
f  (function)      - Should we ADD or AND?
no (negate output) - Should we flip all bits in the result?

These 6 signals combine to create 18 useful operations!

The 18 Operations

Here's what the Hack ALU can compute:

Constants:
  0        → Output zero (useful more than you'd think!)
  1        → Output one
  -1       → Output negative one (all bits set: 0xFFFF)

Single Input Operations:
  X        → Just pass X through
  Y        → Just pass Y through
  !X       → Bitwise NOT of X (flip all bits)
  !Y       → Bitwise NOT of Y
  -X       → Arithmetic negation of X
  -Y       → Arithmetic negation of Y

Increment/Decrement:
  X+1      → Increment X
  Y+1      → Increment Y
  X-1      → Decrement X
  Y-1      → Decrement Y

Two Input Operations:
  X+Y      → Add X and Y
  X-Y      → Subtract Y from X
  Y-X      → Subtract X from Y
  X&Y      → Bitwise AND
  X|Y      → Bitwise OR

The 2 Status Flags

zr (zero)     → Is the result exactly zero?
ng (negative) → Is the result negative? (MSB = 1)

These flags are critical for making decisions. For example:

"If the result is zero, jump to this address" → Used for loops
"If the result is negative, do something else" → Used for comparisons

The Design Strategy: Divide and Conquer

The genius of the Hack ALU is that it breaks the problem into 4 stages:

Stage 1: Preprocessing X
   ↓
Stage 2: Preprocessing Y
   ↓
Stage 3: Choose ADD or AND
   ↓
Stage 4: Postprocessing (and flags)

Let me explain each stage in detail.

Stage 1 & 2: Input Preprocessing (The "Conditioning")

The Problem

We need to manipulate inputs before doing the main operation. For example:

To compute -X, we need to NOT X and then add 1
To compute 1, we need to set X=0, Y=0, and add 1
To compute X+1, we need to add X + 1

The Solution: Two Steps Per Input

For each input (X and Y), we have two preprocessing steps:

Step 1A: Zero Selection (Using 74HC157 Multiplexer)

Question: "Should we use the actual input, or force it to zero?"

Hardware: 74HC157 (Quad 2:1 Multiplexer)

Think of a multiplexer like a railroad switch:

           Input Signal ──┐
                         │
                    ┌────▼────┐
    zx=0 (use X) ──►│ Switch  │
    zx=1 (use 0) ──►│         │──► Output
                    └─────────┘
           0000 ─────┘

When zx=0: Output = X (pass through)
When zx=1: Output = 0 (forced to zero)

Why a 74HC157?

It's a 4-bit multiplexer (one IC handles 4 bits)
We need 4 of them for 16 bits
Very fast switching (~10ns)
Common, cheap, readily available

Example:

X input = 0b1010 (decimal 10)
zx = 0 → Output: 0b1010 (X passes through)
zx = 1 → Output: 0b0000 (forced to zero)

Step 1B: Negation (Using 74HC86 XOR Gate)

Question: "Should we flip all the bits?"

Hardware: 74HC86 (Quad XOR Gate)

XOR is perfect for conditional inversion:

Data Bit ──┐
          │
      ┌───▼───┐
nx ──►│  XOR  │──► Output
      └───────┘

Truth table:
Data  nx  Output
  0    0    0     (no change)
  1    0    1     (no change)
  0    1    1     (inverted!)
  1    1    0     (inverted!)

When nx=0: Output = Input (pass through)
When nx=1: Output = NOT Input (all bits flipped)

Why XOR?

XOR with 0 = unchanged
XOR with 1 = inverted
Perfect for conditional bit flipping!

Example:

Input after zeroing = 0b1010
nx = 0 → Output: 0b1010 (unchanged)
nx = 1 → Output: 0b0101 (inverted)

Chaining the Two Steps

X[3:0] ──► [74HC157] ──► [74HC86] ──► X_PROC[3:0]
             Zero?           Negate?      (processed X)
             (controlled     (controlled
              by zx)          by nx)

This happens in parallel for all 16 bits:

4× 74HC157 for zeroing (one IC per 4 bits)
4× 74HC86 for negation (one IC per 4 bits)

Why This Design?

This two-stage approach gives us 4 possible preprocessing states:

zx  nx  Result
─────────────────────────
0   0   X         (original value)
0   1   !X        (bitwise NOT)
1   0   0         (zero)
1   1   -1        (all bits set)

The same preprocessing happens for Y (using zy and ny).

Stage 3: The Core Function (ADD or AND)

Now we have preprocessed X and Y. Time for the main operation!

The Two Core Operations

The ALU can do exactly two operations at this stage:

Addition (arithmetic)
AND (logical bitwise)

Why Just These Two?

These are the primitives! With clever preprocessing and postprocessing:

ADD gives us: addition, subtraction, increment, decrement, negation
AND gives us: bitwise AND, bitwise OR (via De Morgan's law)

Addition Circuit (Using 74HC283 Adders)

The 74HC283: A 4-bit Adder

How it works:

    A3 A2 A1 A0    (4-bit number A)
  + B3 B2 B1 B0    (4-bit number B)
  ─────────────
C4  S3 S2 S1 S0    (4-bit sum + carry out)

Example:

A = 0b0101 (5)
B = 0b0011 (3)
────────────
S = 0b1000 (8)
C = 0      (no carry)

Building a 16-bit Adder (The Carry Chain)

The Critical Design Decision:

To add 16-bit numbers, we need to chain four 4-bit adders:

     X[3:0]   Y[3:0]      X[7:4]   Y[7:4]      X[11:8]  Y[11:8]     X[15:12] Y[15:12]
        │       │             │       │              │       │            │       │
     ┌──▼───────▼──┐       ┌──▼───────▼──┐       ┌──▼───────▼──┐      ┌──▼───────▼──┐
     │  74HC283    │       │  74HC283    │       │  74HC283    │      │  74HC283    │
     │  Slice 0    │       │  Slice 1    │       │  Slice 2    │      │  Slice 3    │
     │             │       │             │       │             │      │             │
GND─►│C0        C4 │──────►│C0        C4 │──────►│C0        C4 │─────►│C0        C4 │──► (overflow)
     │             │       │             │       │             │      │             │
     └──┬──────────┘       └──┬──────────┘       └──┬──────────┘      └──┬──────────┘
        │                     │                     │                    │
    SUM[3:0]              SUM[7:4]              SUM[11:8]            SUM[15:12]

The Carry Chain is the Critical Path:

The carry must ripple through all four adders:

Slice 0 computes bits [3:0] and generates carry C4
Slice 1 waits for C4, then computes bits [7:4]
Slice 2 waits for the carry from Slice 1
Slice 3 waits for the carry from Slice 2

This is why carry chain routing is critical on the PCB!

Timing:

Each 74HC283: ~20ns delay
Total 16-bit addition: ~80ns (4 × 20ns)
If traces are long or have vias, add more delay

Example of 16-bit Addition:

  X = 0x1234 (4660 decimal)
+ Y = 0x5678 (22136 decimal)
─────────────────────────────
Sum = 0x68AC (26796 decimal)

Slice 0: 0x4 + 0x8 = 0xC, carry=0
Slice 1: 0x3 + 0x7 = 0xA, carry=0
Slice 2: 0x2 + 0x6 = 0x8, carry=0
Slice 3: 0x1 + 0x5 = 0x6, carry=0

AND Circuit (Using 74HC08 AND Gates)

Much simpler than addition!

The 74HC08: Quad 2-input AND gate

For each bit:
X[i] ──┐
       ├─── AND ──► Output[i]
Y[i] ──┘

Truth table:
X  Y  Output
0  0    0
0  1    0
1  0    0
1  1    1

All 16 bits operate independently (no carry!)

We need:

4× 74HC08 (each handles 4 bits)
No chaining needed
Very fast: ~10ns per IC

Example:

X = 0b11110000
Y = 0b10101010
─────────────
& = 0b10100000

Function Selection (Using 74HC157 Multiplexer Again)

Now we have both results:

SUM[15:0] from the adders
AND[15:0] from the AND gates

We need to choose which one to output:

For each bit position:

    SUM[i] ──┐
             │
        ┌────▼────┐
   f ──►│  74HC157│──► F_OUT[i]
        │   MUX   │
    AND[i] ──┘
        └─────────┘

When f=0: Output = AND[i] (logical operation)
When f=1: Output = SUM[i] (arithmetic operation)

We need:

4× 74HC157 (one per 4 bits)
All share the same f control signal

Why this works:

Both ADD and AND circuits run simultaneously
We just select which result to use
No time wasted waiting

Stage 4: Output Processing and Flags

Output Negation (Using 74HC86 XOR Again)

Same technique as input negation:

For each bit:

F_OUT[i] ──┐
          │
      ┌───▼───┐
no ──►│  XOR  │──► ALU_OUT[i]
      └───────┘

When no=0: Output = F_OUT (unchanged)
When no=1: Output = NOT F_OUT (inverted)

Why negate the output?

This is the magic that gives us subtraction and OR!

Subtraction using negation:

X - Y = X + (-Y)
     = X + (NOT Y + 1)

We do this by:
1. Set ny=1 (negate Y)
2. Set f=1 (addition)
3. The "phantom +1" comes from the carry chain!

Wait, where does the +1 come from? Two's complement math!

In two's complement:

To negate a number: NOT it, then add 1
BUT, if we're adding, the adder naturally adds the carry-in!

Actually, for proper subtraction in Hack ALU, they use a clever trick:

X - Y is computed as:
X + (!Y + 1)

Steps:
1. zy=0, ny=1 → Y becomes !Y
2. f=1 → ADD
3. The +1 comes from... wait, how?

Actually, look at operation 15 (X-Y):
Control: 010011
zx=0, nx=1, zy=0, ny=0, f=1, no=1

Let me decode this:
- X stays as X (zx=0, nx=1... wait, nx=1 means NOT X!)
- Actually X-Y encoding is: zx=0, nx=1, zy=0, ny=0, f=1, no=1
  
Let me recalculate:
X preprocessing: zx=0, nx=1 → !X
Y preprocessing: zy=0, ny=0 → Y  
Function: f=1 → ADD → !X + Y
Output: no=1 → !(!X + Y) = X + !Y = X - Y - 1... hmm

Actually, I need to think about this more carefully. Let me look at the actual operation encoding:

Let me reconsider the operation encoding more carefully:

For X-Y (operation 15):
Encoding: 010011
zx=0, nx=1, zy=0, ny=0, f=1, no=1

Step by step:
1. X → (zx=0, nx=1) → !X
2. Y → (zy=0, ny=0) → Y
3. ADD: !X + Y
4. Negate output (no=1): !(!X + Y)

Using De Morgan: !(!X + Y) = !!X & !Y = X & !Y... 

Hmm, that doesn't give us X-Y either.

Let me check the actual Hack specification...

Actually, let me explain this correctly based on the actual Hack design:

The Truth About Hack ALU Operations

The operations use binary arithmetic properties cleverly:

For X-Y:

Encoding: 010011
zx=0: Use X
nx=1: X becomes !X (NOT X)
zy=0: Use Y  
ny=0: Y stays Y
f=1:  ADD → !X + Y
no=1: Negate → !(!X + Y)

Math:
!(!X + Y) = X & !Y  (De Morgan's Law)... no, that's wrong

Actually, in two's complement:
!(!X + Y) means bitwise NOT of (!X + Y)
= X - Y - 1... still not right

Let me look at this differently. The actual encoding for X-Y is:

Let me be precise with the actual Hack ALU encoding:

Operation: X - Y
Control bits: 010011

Breaking down:
- zx=0, nx=1 → produces !X (NOT X)
- zy=0, ny=0 → produces Y
- f=1 → ADD: !X + Y
- no=1 → NOT output: !(!X + Y)

The key insight:
In two's complement, -Y = !Y + 1
So: X - Y = X + (-Y) = X + (!Y + 1)

But we computed: !(!X + Y)

Using Boolean algebra:
!(!X + Y) = !!X & !Y (De Morgan)
          = X & !Y

That's still not subtraction...

Actually, I think I'm confusing bitwise NOT with arithmetic negation.

Let me reconsider the actual hardware behavior:

You know what, let me explain this more carefully by looking at actual examples:

How X-Y Actually Works

Let's trace through X=5, Y=3 for operation X-Y:

Binary:
X = 0000000000000101 (5)
Y = 0000000000000011 (3)

Control: 010011 (zx=0,nx=1,zy=0,ny=0,f=1,no=1)

Step 1: Preprocess X
zx=0, nx=1 → !X
!X = 1111111111111010

Step 2: Preprocess Y  
zy=0, ny=0 → Y
Y = 0000000000000011

Step 3: Add
!X + Y = 1111111111111010 + 0000000000000011
       = 1111111111111101

Step 4: Negate output (no=1)
!(!X + Y) = !(1111111111111101)
          = 0000000000000010
          = 2 in decimal

And indeed, 5 - 3 = 2! ✓

Why This Works (The Mathematical Proof)

In two's complement (16-bit):

!X = (2^16 - 1) - X     (bitwise NOT)
!X + Y = (2^16 - 1) - X + Y

!(!X + Y) = !((2^16 - 1) - X + Y)
          = (2^16 - 1) - ((2^16 - 1) - X + Y)
          = (2^16 - 1) - (2^16 - 1) + X - Y
          = X - Y

Mathematical magic! ✓

So the output negation (no signal) is what makes subtraction possible!

Zero Flag Generation

Question: "Is the output exactly zero?"

Circuit Design:

All 16 output bits → NOR gates → ZR flag

We use two stages:
1. U33: 8-input NOR for bits [7:0]
2. U34: 8-input NOR for bits [15:8]  
3. U35: 2-input NOR to combine

Logic:
ZR = NOR(all 16 bits)
   = 1 if all bits are 0
   = 0 if any bit is 1

Why this works:

NOR truth:
All inputs 0 → Output 1
Any input 1  → Output 0

So:
ALU_OUT = 0x0000 → ZR = 1 ✓
ALU_OUT = anything else → ZR = 0 ✓

Hardware choice:

74HC4078 (8-input NOR) - perfect for this!
2× 74HC4078 for 16 bits total
1× 74HC02 (2-input NOR) to combine

Negative Flag

Much simpler:

In two's complement:
- Negative numbers have MSB = 1
- Positive numbers have MSB = 0

So:
NG = ALU_OUT[15]    (just the most significant bit!)

No gates needed - direct wire connection!

Putting It All Together: Operation Examples

Let me trace through some complete examples:

Example 1: Computing X+Y

Inputs:
X = 0x0005 (binary: 0000000000000101)
Y = 0x0003 (binary: 0000000000000011)

Control for X+Y: 000010
zx=0, nx=0, zy=0, ny=0, f=1, no=0

┌─────────────────────────────────┐
│ Stage 1: Preprocess X           │
├─────────────────────────────────┤
│ zx=0 → Use X (not zero)         │
│ nx=0 → Don't negate             │
│ Result: X_PROC = 0x0005         │
└─────────────────────────────────┘

┌─────────────────────────────────┐
│ Stage 2: Preprocess Y           │
├─────────────────────────────────┤
│ zy=0 → Use Y (not zero)         │
│ ny=0 → Don't negate             │
│ Result: Y_PROC = 0x0003         │
└─────────────────────────────────┘

┌─────────────────────────────────┐
│ Stage 3: Core Function          │
├─────────────────────────────────┤
│ Adder: 0x0005 + 0x0003 = 0x0008│
│ AND:   0x0005 & 0x0003 = 0x0001│
│                                 │
│ f=1 → Select ADD                │
│ Result: F_OUT = 0x0008          │
└─────────────────────────────────┘

┌─────────────────────────────────┐
│ Stage 4: Postprocess & Flags    │
├─────────────────────────────────┤
│ no=0 → Don't negate output      │
│ Result: ALU_OUT = 0x0008        │
│                                 │
│ Flags:                          │
│ ZR = 0 (result not zero)        │
│ NG = 0 (MSB is 0, positive)     │
└─────────────────────────────────┘

Final Answer: 5 + 3 = 8 ✓

Example 2: Computing -X (Negation)

Input:
X = 0x0005 (5)
Y = (don't care)

Control for -X: 001111
zx=0, nx=0, zy=1, ny=1, f=1, no=1

┌─────────────────────────────────┐
│ Stage 1: Preprocess X           │
├─────────────────────────────────┤
│ zx=0 → Use X                    │
│ nx=0 → Don't negate             │
│ Result: X_PROC = 0x0005         │
└─────────────────────────────────┘

┌─────────────────────────────────┐
│ Stage 2: Preprocess Y           │
├─────────────────────────────────┤
│ zy=1 → Force to zero            │
│ Y = 0x0000                      │
│ ny=1 → Negate it                │
│ !0x0000 = 0xFFFF                │
│ Result: Y_PROC = 0xFFFF         │
└─────────────────────────────────┘

┌─────────────────────────────────┐
│ Stage 3: Core Function          │
├─────────────────────────────────┤
│ f=1 → ADD                       │
│ 0x0005 + 0xFFFF = 0x0004        │
│ (with carry overflow)           │
│ Result: F_OUT = 0x0004          │
└─────────────────────────────────┘

┌─────────────────────────────────┐
│ Stage 4: Postprocess            │
├─────────────────────────────────┤
│ no=1 → Negate output            │
│ !0x0004 = 0xFFFB                │
│ Result: ALU_OUT = 0xFFFB        │
│                                 │
│ In two's complement:            │
│ 0xFFFB = -5 ✓                   │
└─────────────────────────────────┘

Final Answer: -5 (correct negation!)

Example 3: Computing X|Y (OR)

Inputs:
X = 0b1100 (0xC)
Y = 0b1010 (0xA)

Control for X|Y: 010101
zx=0, nx=1, zy=0, ny=1, f=0, no=1

┌─────────────────────────────────┐
│ Stage 1: Preprocess X           │
├─────────────────────────────────┤
│ zx=0 → Use X                    │
│ nx=1 → Negate it                │
│ !0xC = 0x3                      │
│ Result: X_PROC = 0x0003         │
└─────────────────────────────────┘

┌─────────────────────────────────┐
│ Stage 2: Preprocess Y           │
├─────────────────────────────────┤
│ zy=0 → Use Y                    │
│ ny=1 → Negate it                │
│ !0xA = 0x5                      │
│ Result: Y_PROC = 0x0005         │
└─────────────────────────────────┘

┌─────────────────────────────────┐
│ Stage 3: Core Function          │
├─────────────────────────────────┤
│ f=0 → AND                       │
│ 0x0003 & 0x0005 = 0x0001        │
│ Result: F_OUT = 0x0001          │
└─────────────────────────────────┘

┌─────────────────────────────────┐
│ Stage 4: Postprocess            │
├─────────────────────────────────┤
│ no=1 → Negate output            │
│ !0x0001 = 0xFFFE                │
│                                 │
│ Wait, let's check with 4-bit:   │
│ !0x1 = 0xE (1110 in binary)     │
│                                 │
│ Original:                       │
│ X = 1100                        │
│ Y = 1010                        │
│ X|Y should = 1110 = 0xE ✓       │
└─────────────────────────────────┘

This works because of De Morgan's Law:
!((!X) & (!Y)) = X | Y

Design Decisions Explained

Why This Architecture?

1. Simplicity

Only 2 core operations (ADD and AND)
Everything else is preprocessing/postprocessing
Easy to understand and debug

2. Speed

All stages are pipelined
Both ADD and AND compute simultaneously
Just select which result to use

3. Flexibility

18 operations from just 6 control bits
Can add more operations without hardware changes
Just change the control signals!

4. Hardware Efficiency

Uses common, cheap ICs
No custom silicon needed
Can build with through-hole components

Why 74HC Logic?

The 74HC family was chosen for good reasons:

Speed:

Propagation delay: ~10ns per gate
Fast enough for MHz-range computers
Much faster than older 74LS series

Power:

Low power consumption
CMOS technology (microamps when idle)
Can run on batteries

Availability:

Industry standard since 1980s
Available from multiple manufacturers
Easy to source, even in 2024

Voltage:

Runs on 2V-6V (we use 5V)
TTL-compatible
Good noise margins

Component Count Tradeoffs

Our Design:
├─ Input conditioning: 16 ICs
├─ Core function: 12 ICs
├─ Output & flags: 6 ICs
└─ Total: 34 ICs

Alternative designs:
├─ Use PLD/CPLD: 1 IC (but needs programming)
├─ Use MSI ALU chip: 4 ICs (less educational)
├─ Use FPGA: 1 IC (overkill, expensive)

We chose discrete logic because:

Educational value (see every gate)
Easy to debug with logic probe
No programming needed
Matches Nand2Tetris philosophy

Critical Design Aspects

The Carry Chain (Most Important!)

Why it matters:
├─ Longest propagation path through ALU
├─ Determines maximum clock speed
├─ Must be routed with care
└─ Any delay here affects everything

PCB routing requirements:
├─ Straight line (no bends if possible)
├─ Equal length traces (±1mm)
├─ No vias (stay on one layer)
├─ Wide traces (0.5mm minimum)
└─ Keep away from noisy signals

Physical layout:
  U17 ──20mm──► U18 ──20mm──► U19 ──20mm──► U20
  
Perfect alignment = optimal timing

Power Distribution

Why it matters:
├─ 34 ICs switching simultaneously
├─ Current spikes during transitions
├─ Can cause voltage drops
└─ Affects timing and reliability

Solution:
├─ Decoupling cap at EVERY IC (0.1µF)
├─ Bulk caps (100µF) at power entry
├─ Wide power traces (0.8mm)
├─ Ground plane on bottom layer
└─ Power plane zones

Each IC draws ~5mA average
Peak current: 34 × 20mA = 680mA
Use 2A supply for safety margin

Signal Integrity

Control signals fan-out:
├─ zx → 4 ICs (U1, U3, U5, U7)
├─ nx → 4 ICs (U2, U4, U6, U8)
├─ zy → 4 ICs (U9, U11, U13, U15)
├─ ny → 4 ICs (U10, U12, U14, U16)
├─ f  → 4 ICs (U25, U26, U27, U28)
└─ no → 4 ICs (U29, U30, U31, U32)

Solution: Star topology
       Control Signal Input
              │
         ┌────┼────┐
         │    │    │
        IC1  IC2  IC3  IC4

Benefits:
├─ Equal trace lengths
├─ No reflections
├─ Clean signal at each IC
└─ Easy to debug

Understanding Through Analogy

Think of the ALU as a factory assembly line:

Stage 1: Raw Material Prep (X input)
├─ Inspector 1: "Do we need this material?" (zx)
├─ Machine 1: "Should we flip it?" (nx)
└─ Output: Preprocessed X

Stage 2: Raw Material Prep (Y input)  
├─ Inspector 2: "Do we need this material?" (zy)
├─ Machine 2: "Should we flip it?" (ny)
└─ Output: Preprocessed Y

Stage 3: Main Assembly (ADD or AND)
├─ Station A: Addition line (always running)
├─ Station B: AND line (always running)
├─ Selector: "Which result do we want?" (f)
└─ Output: Function result

Stage 4: Quality Control & Shipping
├─ Inverter: "Should we flip the final product?" (no)
├─ Inspector: "Is it zero?" (ZR flag)
├─ Inspector: "Is it negative?" (NG flag)
└─ Output: Final result + status

Every "product" (computation) goes through all stages
Total time: ~100-150ns (incredibly fast!)

Common Misconceptions Clarified

"Why not just use one giant multiplexer?"

Bad idea because:
├─ Would need 2^6 = 64 input paths
├─ Each path = full 16-bit datapath
├─ Enormous IC count (100+ ICs)
├─ Slow (many mux stages)
└─ Inflexible (can't add operations)

Our design:
├─ Only 34 ICs
├─ Fast (pipelined stages)
├─ Flexible (change control signals)
└─ Understandable!

"Why compute both ADD and AND simultaneously?"

Seems wasteful, but:
├─ Only ~25ns "wasted" computation
├─ Saves a whole clock cycle
├─ Faster overall (no waiting)
├─ Both operations needed frequently
└─ ICs are cheap, time is expensive

Alternative (compute sequentially):
├─ Save 8 ICs (U21-U24 AND gates)
├─ Lose 25ns per operation
├─ More complex control logic
└─ Not worth the tradeoff

"Why not use an ALU chip like 74181?"

The 74181 is a classic 4-bit ALU chip
To build 16-bit:
├─ Need 4× 74181 (16-bit)
├─ Need 74182 carry lookahead
├─ More expensive than discrete logic
├─ Less educational value
└─ Different operation encoding

Hack ALU advantages:
├─ Custom operations for Hack computer
├─ Simpler control encoding (6 bits)
├─ Matches Nand2Tetris course
├─ Build from NAND gates (conceptually)
└─ Learn how ALUs work inside

Testing and Debugging Strategy

The Bottom-Up Approach

Level 1: Individual ICs
├─ Test power (VCC, GND)
├─ Test each gate separately
└─ Verify truth tables

Level 2: Functional Blocks  
├─ Test X input conditioning (8 ICs)
├─ Test Y input conditioning (8 ICs)
├─ Test adder chain (4 ICs)
├─ Test AND gates (4 ICs)
└─ Test output stage (6 ICs)

Level 3: Integrated Operations
├─ Test all 18 operations
├─ Verify flags (ZR, NG)
└─ Check timing

Level 4: Stress Testing
├─ Rapid input changes
├─ Boundary conditions (0, -1, MAX)
├─ Random test vectors
└─ Temperature variations

What Can Go Wrong (and How to Fix It)

Problem: Wrong addition results
├─ Likely cause: Broken carry chain
├─ Test: Check each carry connection
├─ Fix: Re-solder U17-U20 connections

Problem: All outputs zero
├─ Likely cause: No power or grounded output
├─ Test: Check VCC at each IC
├─ Fix: Check power distribution

Problem: Intermittent errors
├─ Likely cause: Missing decoupling caps
├─ Test: Check with oscilloscope
├─ Fix: Add 0.1µF caps at ALL ICs

Problem: Wrong flag values
├─ Likely cause: Flag logic wiring
├─ Test: Manually set ALU_OUT, check flags
├─ Fix: Verify U33, U34, U35 connections

Problem: Slow operation
├─ Likely cause: Long carry chain traces
├─ Test: Measure propagation delay
├─ Fix: Re-route PCB with shorter traces

The Big Picture: Why This Design Matters

Educational Value

This ALU teaches you:

Binary arithmetic - how computers actually calculate
Boolean logic - how AND, OR, NOT combine
Hardware design - making tradeoffs between speed, cost, complexity
Digital timing - why trace length matters
System integration - connecting many parts into one whole

Real-World Relevance

Modern CPUs use the same principles:

Preprocessing inputs
Multiple parallel execution units
Result selection
Status flags for conditional operations
Pipelined architecture

Your Hack ALU is a working, real computer component!

Expandability

Want to improve it?

Easy upgrades:
├─ Add more operations (change control encoding)
├─ Add carry lookahead (faster addition)
├─ Add multiplier circuit
├─ Add barrel shifter (for shifts/rotates)
└─ Interface to microcontroller

Advanced version could:
├─ Use 74HC283 with 74HC182 carry lookahead
├─ Add 74HC181 ALU slices
├─ Create 32-bit or 64-bit version
└─ Add floating-point support

Summary: The Elegant Design

The Hack ALU is beautiful because it's:

Simple: Only 2 core operations Clever: Preprocessing/postprocessing creates 18 operations Fast: Parallel computation with result selection Educational: Every gate visible and understandable Practical: Buildable with common, cheap components Functional: Real working computer component

The key insight: You don't need special hardware for every operation. With clever preprocessing (zero/negate) and postprocessing (negate output), just ADD and AND give you everything you need!

This is elegant engineering - maximum functionality from minimum complexity.