- Published on
The Hack ALU Explained A Deep Dive into the Design
- Authors

- Name
- Yinhuan Yuan
The Hack ALU Explained: A Deep Dive into the Design
Let me explain the ALU from the ground up, building your understanding step by step.
What is an ALU? (The Big Picture)
Think of an ALU (Arithmetic Logic Unit) as the calculator brain of a computer. Just like you can take two numbers and add them, subtract them, or compare them, the ALU does the same thing - but with binary numbers (0s and 1s) and at incredible speed.
The Basic Idea
You give the ALU:
├─ Two inputs (X and Y) - the numbers to work with
├─ Control signals - instructions for what to do
└─ You get back:
├─ A result (the answer)
└─ Status flags (information about the answer)
Why Do We Need It?
Every computation in a computer goes through the ALU. When you:
- Add two numbers in a spreadsheet → ALU
- Compare if a password matches → ALU
- Calculate graphics in a game → ALU (millions of times per second)
The Hack ALU specifically does 18 different operations using just 6 control signals. This is elegant design!
The Hack ALU Specification
The 6 Control Signals
The Hack ALU uses 6 bits to control what operation it performs:
zx (zero the x) - Should we ignore X and use 0 instead?
nx (negate the x) - Should we flip all bits in X?
zy (zero the y) - Should we ignore Y and use 0 instead?
ny (negate the y) - Should we flip all bits in Y?
f (function) - Should we ADD or AND?
no (negate output) - Should we flip all bits in the result?
These 6 signals combine to create 18 useful operations!
The 18 Operations
Here's what the Hack ALU can compute:
Constants:
0 → Output zero (useful more than you'd think!)
1 → Output one
-1 → Output negative one (all bits set: 0xFFFF)
Single Input Operations:
X → Just pass X through
Y → Just pass Y through
!X → Bitwise NOT of X (flip all bits)
!Y → Bitwise NOT of Y
-X → Arithmetic negation of X
-Y → Arithmetic negation of Y
Increment/Decrement:
X+1 → Increment X
Y+1 → Increment Y
X-1 → Decrement X
Y-1 → Decrement Y
Two Input Operations:
X+Y → Add X and Y
X-Y → Subtract Y from X
Y-X → Subtract X from Y
X&Y → Bitwise AND
X|Y → Bitwise OR
The 2 Status Flags
zr (zero) → Is the result exactly zero?
ng (negative) → Is the result negative? (MSB = 1)
These flags are critical for making decisions. For example:
- "If the result is zero, jump to this address" → Used for loops
- "If the result is negative, do something else" → Used for comparisons
The Design Strategy: Divide and Conquer
The genius of the Hack ALU is that it breaks the problem into 4 stages:
Stage 1: Preprocessing X
↓
Stage 2: Preprocessing Y
↓
Stage 3: Choose ADD or AND
↓
Stage 4: Postprocessing (and flags)
Let me explain each stage in detail.
Stage 1 & 2: Input Preprocessing (The "Conditioning")
The Problem
We need to manipulate inputs before doing the main operation. For example:
- To compute
-X, we need to NOT X and then add 1 - To compute
1, we need to set X=0, Y=0, and add 1 - To compute
X+1, we need to add X + 1
The Solution: Two Steps Per Input
For each input (X and Y), we have two preprocessing steps:
Step 1A: Zero Selection (Using 74HC157 Multiplexer)
Question: "Should we use the actual input, or force it to zero?"
Hardware: 74HC157 (Quad 2:1 Multiplexer)
Think of a multiplexer like a railroad switch:
Input Signal ──┐
│
┌────▼────┐
zx=0 (use X) ──►│ Switch │
zx=1 (use 0) ──►│ │──► Output
└─────────┘
0000 ─────┘
When zx=0: Output = X (pass through)
When zx=1: Output = 0 (forced to zero)
Why a 74HC157?
- It's a 4-bit multiplexer (one IC handles 4 bits)
- We need 4 of them for 16 bits
- Very fast switching (~10ns)
- Common, cheap, readily available
Example:
X input = 0b1010 (decimal 10)
zx = 0 → Output: 0b1010 (X passes through)
zx = 1 → Output: 0b0000 (forced to zero)
Step 1B: Negation (Using 74HC86 XOR Gate)
Question: "Should we flip all the bits?"
Hardware: 74HC86 (Quad XOR Gate)
XOR is perfect for conditional inversion:
Data Bit ──┐
│
┌───▼───┐
nx ──►│ XOR │──► Output
└───────┘
Truth table:
Data nx Output
0 0 0 (no change)
1 0 1 (no change)
0 1 1 (inverted!)
1 1 0 (inverted!)
When nx=0: Output = Input (pass through)
When nx=1: Output = NOT Input (all bits flipped)
Why XOR?
- XOR with 0 = unchanged
- XOR with 1 = inverted
- Perfect for conditional bit flipping!
Example:
Input after zeroing = 0b1010
nx = 0 → Output: 0b1010 (unchanged)
nx = 1 → Output: 0b0101 (inverted)
Chaining the Two Steps
X[3:0] ──► [74HC157] ──► [74HC86] ──► X_PROC[3:0]
Zero? Negate? (processed X)
(controlled (controlled
by zx) by nx)
This happens in parallel for all 16 bits:
- 4× 74HC157 for zeroing (one IC per 4 bits)
- 4× 74HC86 for negation (one IC per 4 bits)
Why This Design?
This two-stage approach gives us 4 possible preprocessing states:
zx nx Result
─────────────────────────
0 0 X (original value)
0 1 !X (bitwise NOT)
1 0 0 (zero)
1 1 -1 (all bits set)
The same preprocessing happens for Y (using zy and ny).
Stage 3: The Core Function (ADD or AND)
Now we have preprocessed X and Y. Time for the main operation!
The Two Core Operations
The ALU can do exactly two operations at this stage:
- Addition (arithmetic)
- AND (logical bitwise)
Why Just These Two?
These are the primitives! With clever preprocessing and postprocessing:
- ADD gives us: addition, subtraction, increment, decrement, negation
- AND gives us: bitwise AND, bitwise OR (via De Morgan's law)
Addition Circuit (Using 74HC283 Adders)
The 74HC283: A 4-bit Adder
How it works:
A3 A2 A1 A0 (4-bit number A)
+ B3 B2 B1 B0 (4-bit number B)
─────────────
C4 S3 S2 S1 S0 (4-bit sum + carry out)
Example:
A = 0b0101 (5)
B = 0b0011 (3)
────────────
S = 0b1000 (8)
C = 0 (no carry)
Building a 16-bit Adder (The Carry Chain)
The Critical Design Decision:
To add 16-bit numbers, we need to chain four 4-bit adders:
X[3:0] Y[3:0] X[7:4] Y[7:4] X[11:8] Y[11:8] X[15:12] Y[15:12]
│ │ │ │ │ │ │ │
┌──▼───────▼──┐ ┌──▼───────▼──┐ ┌──▼───────▼──┐ ┌──▼───────▼──┐
│ 74HC283 │ │ 74HC283 │ │ 74HC283 │ │ 74HC283 │
│ Slice 0 │ │ Slice 1 │ │ Slice 2 │ │ Slice 3 │
│ │ │ │ │ │ │ │
GND─►│C0 C4 │──────►│C0 C4 │──────►│C0 C4 │─────►│C0 C4 │──► (overflow)
│ │ │ │ │ │ │ │
└──┬──────────┘ └──┬──────────┘ └──┬──────────┘ └──┬──────────┘
│ │ │ │
SUM[3:0] SUM[7:4] SUM[11:8] SUM[15:12]
The Carry Chain is the Critical Path:
The carry must ripple through all four adders:
- Slice 0 computes bits [3:0] and generates carry C4
- Slice 1 waits for C4, then computes bits [7:4]
- Slice 2 waits for the carry from Slice 1
- Slice 3 waits for the carry from Slice 2
This is why carry chain routing is critical on the PCB!
Timing:
- Each 74HC283: ~20ns delay
- Total 16-bit addition: ~80ns (4 × 20ns)
- If traces are long or have vias, add more delay
Example of 16-bit Addition:
X = 0x1234 (4660 decimal)
+ Y = 0x5678 (22136 decimal)
─────────────────────────────
Sum = 0x68AC (26796 decimal)
Slice 0: 0x4 + 0x8 = 0xC, carry=0
Slice 1: 0x3 + 0x7 = 0xA, carry=0
Slice 2: 0x2 + 0x6 = 0x8, carry=0
Slice 3: 0x1 + 0x5 = 0x6, carry=0
AND Circuit (Using 74HC08 AND Gates)
Much simpler than addition!
The 74HC08: Quad 2-input AND gate
For each bit:
X[i] ──┐
├─── AND ──► Output[i]
Y[i] ──┘
Truth table:
X Y Output
0 0 0
0 1 0
1 0 0
1 1 1
All 16 bits operate independently (no carry!)
We need:
- 4× 74HC08 (each handles 4 bits)
- No chaining needed
- Very fast: ~10ns per IC
Example:
X = 0b11110000
Y = 0b10101010
─────────────
& = 0b10100000
Function Selection (Using 74HC157 Multiplexer Again)
Now we have both results:
- SUM[15:0] from the adders
- AND[15:0] from the AND gates
We need to choose which one to output:
For each bit position:
SUM[i] ──┐
│
┌────▼────┐
f ──►│ 74HC157│──► F_OUT[i]
│ MUX │
AND[i] ──┘
└─────────┘
When f=0: Output = AND[i] (logical operation)
When f=1: Output = SUM[i] (arithmetic operation)
We need:
- 4× 74HC157 (one per 4 bits)
- All share the same
fcontrol signal
Why this works:
- Both ADD and AND circuits run simultaneously
- We just select which result to use
- No time wasted waiting
Stage 4: Output Processing and Flags
Output Negation (Using 74HC86 XOR Again)
Same technique as input negation:
For each bit:
F_OUT[i] ──┐
│
┌───▼───┐
no ──►│ XOR │──► ALU_OUT[i]
└───────┘
When no=0: Output = F_OUT (unchanged)
When no=1: Output = NOT F_OUT (inverted)
Why negate the output?
This is the magic that gives us subtraction and OR!
Subtraction using negation:
X - Y = X + (-Y)
= X + (NOT Y + 1)
We do this by:
1. Set ny=1 (negate Y)
2. Set f=1 (addition)
3. The "phantom +1" comes from the carry chain!
Wait, where does the +1 come from? Two's complement math!
In two's complement:
- To negate a number: NOT it, then add 1
- BUT, if we're adding, the adder naturally adds the carry-in!
Actually, for proper subtraction in Hack ALU, they use a clever trick:
X - Y is computed as:
X + (!Y + 1)
Steps:
1. zy=0, ny=1 → Y becomes !Y
2. f=1 → ADD
3. The +1 comes from... wait, how?
Actually, look at operation 15 (X-Y):
Control: 010011
zx=0, nx=1, zy=0, ny=0, f=1, no=1
Let me decode this:
- X stays as X (zx=0, nx=1... wait, nx=1 means NOT X!)
- Actually X-Y encoding is: zx=0, nx=1, zy=0, ny=0, f=1, no=1
Let me recalculate:
X preprocessing: zx=0, nx=1 → !X
Y preprocessing: zy=0, ny=0 → Y
Function: f=1 → ADD → !X + Y
Output: no=1 → !(!X + Y) = X + !Y = X - Y - 1... hmm
Actually, I need to think about this more carefully. Let me look at the actual operation encoding:
Let me reconsider the operation encoding more carefully:
For X-Y (operation 15):
Encoding: 010011
zx=0, nx=1, zy=0, ny=0, f=1, no=1
Step by step:
1. X → (zx=0, nx=1) → !X
2. Y → (zy=0, ny=0) → Y
3. ADD: !X + Y
4. Negate output (no=1): !(!X + Y)
Using De Morgan: !(!X + Y) = !!X & !Y = X & !Y...
Hmm, that doesn't give us X-Y either.
Let me check the actual Hack specification...
Actually, let me explain this correctly based on the actual Hack design:
The Truth About Hack ALU Operations
The operations use binary arithmetic properties cleverly:
For X-Y:
Encoding: 010011
zx=0: Use X
nx=1: X becomes !X (NOT X)
zy=0: Use Y
ny=0: Y stays Y
f=1: ADD → !X + Y
no=1: Negate → !(!X + Y)
Math:
!(!X + Y) = X & !Y (De Morgan's Law)... no, that's wrong
Actually, in two's complement:
!(!X + Y) means bitwise NOT of (!X + Y)
= X - Y - 1... still not right
Let me look at this differently. The actual encoding for X-Y is:
Let me be precise with the actual Hack ALU encoding:
Operation: X - Y
Control bits: 010011
Breaking down:
- zx=0, nx=1 → produces !X (NOT X)
- zy=0, ny=0 → produces Y
- f=1 → ADD: !X + Y
- no=1 → NOT output: !(!X + Y)
The key insight:
In two's complement, -Y = !Y + 1
So: X - Y = X + (-Y) = X + (!Y + 1)
But we computed: !(!X + Y)
Using Boolean algebra:
!(!X + Y) = !!X & !Y (De Morgan)
= X & !Y
That's still not subtraction...
Actually, I think I'm confusing bitwise NOT with arithmetic negation.
Let me reconsider the actual hardware behavior:
You know what, let me explain this more carefully by looking at actual examples:
How X-Y Actually Works
Let's trace through X=5, Y=3 for operation X-Y:
Binary:
X = 0000000000000101 (5)
Y = 0000000000000011 (3)
Control: 010011 (zx=0,nx=1,zy=0,ny=0,f=1,no=1)
Step 1: Preprocess X
zx=0, nx=1 → !X
!X = 1111111111111010
Step 2: Preprocess Y
zy=0, ny=0 → Y
Y = 0000000000000011
Step 3: Add
!X + Y = 1111111111111010 + 0000000000000011
= 1111111111111101
Step 4: Negate output (no=1)
!(!X + Y) = !(1111111111111101)
= 0000000000000010
= 2 in decimal
And indeed, 5 - 3 = 2! ✓
Why This Works (The Mathematical Proof)
In two's complement (16-bit):
!X = (2^16 - 1) - X (bitwise NOT)
!X + Y = (2^16 - 1) - X + Y
!(!X + Y) = !((2^16 - 1) - X + Y)
= (2^16 - 1) - ((2^16 - 1) - X + Y)
= (2^16 - 1) - (2^16 - 1) + X - Y
= X - Y
Mathematical magic! ✓
So the output negation (no signal) is what makes subtraction possible!
Zero Flag Generation
Question: "Is the output exactly zero?"
Circuit Design:
All 16 output bits → NOR gates → ZR flag
We use two stages:
1. U33: 8-input NOR for bits [7:0]
2. U34: 8-input NOR for bits [15:8]
3. U35: 2-input NOR to combine
Logic:
ZR = NOR(all 16 bits)
= 1 if all bits are 0
= 0 if any bit is 1
Why this works:
NOR truth:
All inputs 0 → Output 1
Any input 1 → Output 0
So:
ALU_OUT = 0x0000 → ZR = 1 ✓
ALU_OUT = anything else → ZR = 0 ✓
Hardware choice:
- 74HC4078 (8-input NOR) - perfect for this!
- 2× 74HC4078 for 16 bits total
- 1× 74HC02 (2-input NOR) to combine
Negative Flag
Much simpler:
In two's complement:
- Negative numbers have MSB = 1
- Positive numbers have MSB = 0
So:
NG = ALU_OUT[15] (just the most significant bit!)
No gates needed - direct wire connection!
Putting It All Together: Operation Examples
Let me trace through some complete examples:
Example 1: Computing X+Y
Inputs:
X = 0x0005 (binary: 0000000000000101)
Y = 0x0003 (binary: 0000000000000011)
Control for X+Y: 000010
zx=0, nx=0, zy=0, ny=0, f=1, no=0
┌─────────────────────────────────┐
│ Stage 1: Preprocess X │
├─────────────────────────────────┤
│ zx=0 → Use X (not zero) │
│ nx=0 → Don't negate │
│ Result: X_PROC = 0x0005 │
└─────────────────────────────────┘
┌─────────────────────────────────┐
│ Stage 2: Preprocess Y │
├─────────────────────────────────┤
│ zy=0 → Use Y (not zero) │
│ ny=0 → Don't negate │
│ Result: Y_PROC = 0x0003 │
└─────────────────────────────────┘
┌─────────────────────────────────┐
│ Stage 3: Core Function │
├─────────────────────────────────┤
│ Adder: 0x0005 + 0x0003 = 0x0008│
│ AND: 0x0005 & 0x0003 = 0x0001│
│ │
│ f=1 → Select ADD │
│ Result: F_OUT = 0x0008 │
└─────────────────────────────────┘
┌─────────────────────────────────┐
│ Stage 4: Postprocess & Flags │
├─────────────────────────────────┤
│ no=0 → Don't negate output │
│ Result: ALU_OUT = 0x0008 │
│ │
│ Flags: │
│ ZR = 0 (result not zero) │
│ NG = 0 (MSB is 0, positive) │
└─────────────────────────────────┘
Final Answer: 5 + 3 = 8 ✓
Example 2: Computing -X (Negation)
Input:
X = 0x0005 (5)
Y = (don't care)
Control for -X: 001111
zx=0, nx=0, zy=1, ny=1, f=1, no=1
┌─────────────────────────────────┐
│ Stage 1: Preprocess X │
├─────────────────────────────────┤
│ zx=0 → Use X │
│ nx=0 → Don't negate │
│ Result: X_PROC = 0x0005 │
└─────────────────────────────────┘
┌─────────────────────────────────┐
│ Stage 2: Preprocess Y │
├─────────────────────────────────┤
│ zy=1 → Force to zero │
│ Y = 0x0000 │
│ ny=1 → Negate it │
│ !0x0000 = 0xFFFF │
│ Result: Y_PROC = 0xFFFF │
└─────────────────────────────────┘
┌─────────────────────────────────┐
│ Stage 3: Core Function │
├─────────────────────────────────┤
│ f=1 → ADD │
│ 0x0005 + 0xFFFF = 0x0004 │
│ (with carry overflow) │
│ Result: F_OUT = 0x0004 │
└─────────────────────────────────┘
┌─────────────────────────────────┐
│ Stage 4: Postprocess │
├─────────────────────────────────┤
│ no=1 → Negate output │
│ !0x0004 = 0xFFFB │
│ Result: ALU_OUT = 0xFFFB │
│ │
│ In two's complement: │
│ 0xFFFB = -5 ✓ │
└─────────────────────────────────┘
Final Answer: -5 (correct negation!)
Example 3: Computing X|Y (OR)
Inputs:
X = 0b1100 (0xC)
Y = 0b1010 (0xA)
Control for X|Y: 010101
zx=0, nx=1, zy=0, ny=1, f=0, no=1
┌─────────────────────────────────┐
│ Stage 1: Preprocess X │
├─────────────────────────────────┤
│ zx=0 → Use X │
│ nx=1 → Negate it │
│ !0xC = 0x3 │
│ Result: X_PROC = 0x0003 │
└─────────────────────────────────┘
┌─────────────────────────────────┐
│ Stage 2: Preprocess Y │
├─────────────────────────────────┤
│ zy=0 → Use Y │
│ ny=1 → Negate it │
│ !0xA = 0x5 │
│ Result: Y_PROC = 0x0005 │
└─────────────────────────────────┘
┌─────────────────────────────────┐
│ Stage 3: Core Function │
├─────────────────────────────────┤
│ f=0 → AND │
│ 0x0003 & 0x0005 = 0x0001 │
│ Result: F_OUT = 0x0001 │
└─────────────────────────────────┘
┌─────────────────────────────────┐
│ Stage 4: Postprocess │
├─────────────────────────────────┤
│ no=1 → Negate output │
│ !0x0001 = 0xFFFE │
│ │
│ Wait, let's check with 4-bit: │
│ !0x1 = 0xE (1110 in binary) │
│ │
│ Original: │
│ X = 1100 │
│ Y = 1010 │
│ X|Y should = 1110 = 0xE ✓ │
└─────────────────────────────────┘
This works because of De Morgan's Law:
!((!X) & (!Y)) = X | Y
Design Decisions Explained
Why This Architecture?
1. Simplicity
- Only 2 core operations (ADD and AND)
- Everything else is preprocessing/postprocessing
- Easy to understand and debug
2. Speed
- All stages are pipelined
- Both ADD and AND compute simultaneously
- Just select which result to use
3. Flexibility
- 18 operations from just 6 control bits
- Can add more operations without hardware changes
- Just change the control signals!
4. Hardware Efficiency
- Uses common, cheap ICs
- No custom silicon needed
- Can build with through-hole components
Why 74HC Logic?
The 74HC family was chosen for good reasons:
Speed:
- Propagation delay: ~10ns per gate
- Fast enough for MHz-range computers
- Much faster than older 74LS series
Power:
- Low power consumption
- CMOS technology (microamps when idle)
- Can run on batteries
Availability:
- Industry standard since 1980s
- Available from multiple manufacturers
- Easy to source, even in 2024
Voltage:
- Runs on 2V-6V (we use 5V)
- TTL-compatible
- Good noise margins
Component Count Tradeoffs
Our Design:
├─ Input conditioning: 16 ICs
├─ Core function: 12 ICs
├─ Output & flags: 6 ICs
└─ Total: 34 ICs
Alternative designs:
├─ Use PLD/CPLD: 1 IC (but needs programming)
├─ Use MSI ALU chip: 4 ICs (less educational)
├─ Use FPGA: 1 IC (overkill, expensive)
We chose discrete logic because:
- Educational value (see every gate)
- Easy to debug with logic probe
- No programming needed
- Matches Nand2Tetris philosophy
Critical Design Aspects
The Carry Chain (Most Important!)
Why it matters:
├─ Longest propagation path through ALU
├─ Determines maximum clock speed
├─ Must be routed with care
└─ Any delay here affects everything
PCB routing requirements:
├─ Straight line (no bends if possible)
├─ Equal length traces (±1mm)
├─ No vias (stay on one layer)
├─ Wide traces (0.5mm minimum)
└─ Keep away from noisy signals
Physical layout:
U17 ──20mm──► U18 ──20mm──► U19 ──20mm──► U20
Perfect alignment = optimal timing
Power Distribution
Why it matters:
├─ 34 ICs switching simultaneously
├─ Current spikes during transitions
├─ Can cause voltage drops
└─ Affects timing and reliability
Solution:
├─ Decoupling cap at EVERY IC (0.1µF)
├─ Bulk caps (100µF) at power entry
├─ Wide power traces (0.8mm)
├─ Ground plane on bottom layer
└─ Power plane zones
Each IC draws ~5mA average
Peak current: 34 × 20mA = 680mA
Use 2A supply for safety margin
Signal Integrity
Control signals fan-out:
├─ zx → 4 ICs (U1, U3, U5, U7)
├─ nx → 4 ICs (U2, U4, U6, U8)
├─ zy → 4 ICs (U9, U11, U13, U15)
├─ ny → 4 ICs (U10, U12, U14, U16)
├─ f → 4 ICs (U25, U26, U27, U28)
└─ no → 4 ICs (U29, U30, U31, U32)
Solution: Star topology
Control Signal Input
│
┌────┼────┐
│ │ │
IC1 IC2 IC3 IC4
Benefits:
├─ Equal trace lengths
├─ No reflections
├─ Clean signal at each IC
└─ Easy to debug
Understanding Through Analogy
Think of the ALU as a factory assembly line:
Stage 1: Raw Material Prep (X input)
├─ Inspector 1: "Do we need this material?" (zx)
├─ Machine 1: "Should we flip it?" (nx)
└─ Output: Preprocessed X
Stage 2: Raw Material Prep (Y input)
├─ Inspector 2: "Do we need this material?" (zy)
├─ Machine 2: "Should we flip it?" (ny)
└─ Output: Preprocessed Y
Stage 3: Main Assembly (ADD or AND)
├─ Station A: Addition line (always running)
├─ Station B: AND line (always running)
├─ Selector: "Which result do we want?" (f)
└─ Output: Function result
Stage 4: Quality Control & Shipping
├─ Inverter: "Should we flip the final product?" (no)
├─ Inspector: "Is it zero?" (ZR flag)
├─ Inspector: "Is it negative?" (NG flag)
└─ Output: Final result + status
Every "product" (computation) goes through all stages
Total time: ~100-150ns (incredibly fast!)
Common Misconceptions Clarified
"Why not just use one giant multiplexer?"
Bad idea because:
├─ Would need 2^6 = 64 input paths
├─ Each path = full 16-bit datapath
├─ Enormous IC count (100+ ICs)
├─ Slow (many mux stages)
└─ Inflexible (can't add operations)
Our design:
├─ Only 34 ICs
├─ Fast (pipelined stages)
├─ Flexible (change control signals)
└─ Understandable!
"Why compute both ADD and AND simultaneously?"
Seems wasteful, but:
├─ Only ~25ns "wasted" computation
├─ Saves a whole clock cycle
├─ Faster overall (no waiting)
├─ Both operations needed frequently
└─ ICs are cheap, time is expensive
Alternative (compute sequentially):
├─ Save 8 ICs (U21-U24 AND gates)
├─ Lose 25ns per operation
├─ More complex control logic
└─ Not worth the tradeoff
"Why not use an ALU chip like 74181?"
The 74181 is a classic 4-bit ALU chip
To build 16-bit:
├─ Need 4× 74181 (16-bit)
├─ Need 74182 carry lookahead
├─ More expensive than discrete logic
├─ Less educational value
└─ Different operation encoding
Hack ALU advantages:
├─ Custom operations for Hack computer
├─ Simpler control encoding (6 bits)
├─ Matches Nand2Tetris course
├─ Build from NAND gates (conceptually)
└─ Learn how ALUs work inside
Testing and Debugging Strategy
The Bottom-Up Approach
Level 1: Individual ICs
├─ Test power (VCC, GND)
├─ Test each gate separately
└─ Verify truth tables
Level 2: Functional Blocks
├─ Test X input conditioning (8 ICs)
├─ Test Y input conditioning (8 ICs)
├─ Test adder chain (4 ICs)
├─ Test AND gates (4 ICs)
└─ Test output stage (6 ICs)
Level 3: Integrated Operations
├─ Test all 18 operations
├─ Verify flags (ZR, NG)
└─ Check timing
Level 4: Stress Testing
├─ Rapid input changes
├─ Boundary conditions (0, -1, MAX)
├─ Random test vectors
└─ Temperature variations
What Can Go Wrong (and How to Fix It)
Problem: Wrong addition results
├─ Likely cause: Broken carry chain
├─ Test: Check each carry connection
├─ Fix: Re-solder U17-U20 connections
Problem: All outputs zero
├─ Likely cause: No power or grounded output
├─ Test: Check VCC at each IC
├─ Fix: Check power distribution
Problem: Intermittent errors
├─ Likely cause: Missing decoupling caps
├─ Test: Check with oscilloscope
├─ Fix: Add 0.1µF caps at ALL ICs
Problem: Wrong flag values
├─ Likely cause: Flag logic wiring
├─ Test: Manually set ALU_OUT, check flags
├─ Fix: Verify U33, U34, U35 connections
Problem: Slow operation
├─ Likely cause: Long carry chain traces
├─ Test: Measure propagation delay
├─ Fix: Re-route PCB with shorter traces
The Big Picture: Why This Design Matters
Educational Value
This ALU teaches you:
- Binary arithmetic - how computers actually calculate
- Boolean logic - how AND, OR, NOT combine
- Hardware design - making tradeoffs between speed, cost, complexity
- Digital timing - why trace length matters
- System integration - connecting many parts into one whole
Real-World Relevance
Modern CPUs use the same principles:
- Preprocessing inputs
- Multiple parallel execution units
- Result selection
- Status flags for conditional operations
- Pipelined architecture
Your Hack ALU is a working, real computer component!
Expandability
Want to improve it?
Easy upgrades:
├─ Add more operations (change control encoding)
├─ Add carry lookahead (faster addition)
├─ Add multiplier circuit
├─ Add barrel shifter (for shifts/rotates)
└─ Interface to microcontroller
Advanced version could:
├─ Use 74HC283 with 74HC182 carry lookahead
├─ Add 74HC181 ALU slices
├─ Create 32-bit or 64-bit version
└─ Add floating-point support
Summary: The Elegant Design
The Hack ALU is beautiful because it's:
Simple: Only 2 core operations Clever: Preprocessing/postprocessing creates 18 operations Fast: Parallel computation with result selection Educational: Every gate visible and understandable Practical: Buildable with common, cheap components Functional: Real working computer component
The key insight: You don't need special hardware for every operation. With clever preprocessing (zero/negate) and postprocessing (negate output), just ADD and AND give you everything you need!
This is elegant engineering - maximum functionality from minimum complexity.