
2025-12-06 ~ 19 min read

The TD4 4-Bit CPU: A Comprehensive Analysis of Minimalist TTL/CMOS Architecture and Digital Logic Implementation



1. Introduction: Contextualizing Minimalist CPU Design

The TD4 4-bit Central Processing Unit (CPU) stands as a landmark in digital logic education and minimalist computer architecture. Conceived by Iku Watanabe, the TD4 is renowned as one of the world’s simplest functional CPUs, designed explicitly for hobbyists and to give beginners a direct, intuitive understanding of assembly language and computer composition.1 Its entire design philosophy is rooted in architectural minimalism, achieving Turing completeness with a minimal number of standard integrated circuits (ICs), typically fewer than 20 common logic chips.1

This educational framework is built upon the ubiquitous 74HC series of high-speed CMOS logic, the pin-compatible CMOS successors to the original 74-series transistor-transistor logic (TTL) parts, such as the 74HC14 Schmitt trigger, 74HC161 counter, and 74HC283 adder.5 The transparency afforded by constructing the CPU entirely from discrete logic allows students and engineers to visually observe the signal flow and state changes across the system, often aided by numerous LEDs illuminating the bus lines, registers, and program counter.6

1.1. Core Architectural Constraints

The design of the TD4 is severely constrained, which is necessary to maintain its simplicity. The data path is fixed at a 4-bit word size, defining the capacity of its registers and the range of its arithmetic operations.2 Crucially, the program memory is extremely limited, featuring only 16 bytes of Read-Only Memory (ROM).6 This ROM is typically implemented using 16 banks of DIP switches, allowing for manual programming without the need for specialized external programmers.1

This design utilizes a Harvard architecture model, characterized by separate spaces for program instructions (the switch-based ROM) and data storage (the internal registers A and B).8 This separation simplifies the control logic by avoiding the need for a unified memory bus capable of handling both instruction fetching and data access simultaneously, a complexity inherent in typical Von Neumann designs. The architecture contains no general-purpose Random Access Memory (RAM), forcing data manipulation to rely strictly on the two main registers and external I/O ports.4

The severe optimization for physical component reduction represents a key design trade-off. The aggressive effort to maintain a minimal chip count, essential for low cost and simplicity 8, directly constrains the resulting Instruction Set Architecture (ISA). For example, the Command Decoder logic, responsible for interpreting the opcode, was optimized to use a minimum number of logic chips (often cited as two decoder chips).10 This constraint necessitates sacrificing instruction orthogonality, resulting in an intentionally incomplete ISA where only 12 of the 16 possible 4-bit opcodes are mapped to functional instructions.10 This restriction highlights a fundamental engineering decision where the simplicity of the hardware implementation dictates the complexity and completeness of the machine language interface.

2. Datapath Components and Digital Logic Implementation

The TD4 datapath is a direct realization of classical CPU components using off-the-shelf 74HC series ICs. The implementation choices illustrate how fundamental computer functions can be achieved using basic logic blocks.

2.1. The Register File and Specialized Storage

The TD4 contains several vital 4-bit storage elements: two general-purpose registers (A and B), the Program Counter (PC), and the Output Register. A key component is the 74HC161 4-bit synchronous binary counter, which serves as the foundation for all 4-bit registers.6

Registers A, B, and the Output Register are primarily utilized in their parallel-load capacity. Data is latched into these registers on the clock edge, provided that the corresponding control signal, such as not_LOADx, is asserted low.9 The Program Counter (PC) is unique among these, leveraging both its parallel-load functionality for jump instructions and its synchronous counting capability for fetching sequential instructions. When a jump occurs, the target address is loaded into the PC register, overriding the sequential counting mechanism.9
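The dual role of the counter chip described above can be sketched in Python. This is a behavioral model only, assuming an active-low load input and a count-enable input; the class name `Counter4` and its parameters are illustrative and not taken from the TD4 schematic.

```python
class Counter4:
    """Behavioral sketch of a 4-bit loadable counter (74HC161-style)."""

    def __init__(self):
        self.value = 0

    def clock(self, not_load=1, data=0, count_enable=False):
        """Apply one clock edge. A low load input wins over counting."""
        if not_load == 0:              # active-low load: latch parallel data
            self.value = data & 0xF
        elif count_enable:             # otherwise count, wrapping at 16
            self.value = (self.value + 1) & 0xF
        return self.value              # neither asserted: hold the old value


reg_a = Counter4()                     # plain register: counting never enabled
reg_a.clock(not_load=0, data=0b1010)   # latch 1010
reg_a.clock()                          # hold

pc = Counter4()                        # program counter: counts by default
pc.clock(count_enable=True)            # 0000 -> 0001
pc.clock(not_load=0, data=0b1100, count_enable=True)  # jump to 1100
pc.clock(count_enable=True)            # 1100 -> 1101
```

The same model covers both uses: a data register simply never enables counting, while the PC counts unless a jump pulls the load line low.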

The system’s status information is limited to the single Carry Flag (C Flag), which is realized using a single D flip-flop (an element found within a 74HC74 dual D flip-flop package).6 This flag captures the carry-out result of the Arithmetic Logic Unit (ALU), serving as the sole mechanism for conditional control flow.

2.2. The Arithmetic Logic Unit (ALU)

The TD4’s arithmetic capability is provided entirely by a single dedicated chip: the 74HC283 4-bit binary full adder.6 This chip is a combinational (unclocked) device, meaning its output (the 4-bit sum and the carry-out signal) follows its inputs after only a propagation delay, independent of the clock, which simplifies the Control Unit design.12

The 74HC283 ALU is a two-input device:

  1. Input 1: Always receives the 4-bit Immediate value (Im) derived from the current instruction’s lower four bits.9
  2. Input 2: Receives data selected by the Data Selector, which determines the second operand.9

A significant consequence of this implementation is the inherent limitation of the ALU. By using only a full adder, the TD4 is fundamentally restricted to addition operations. It lacks native subtraction, comparison, or bitwise logical operations (AND, OR, XOR), forcing complex decision-making and data manipulation to be simulated through addition and reliance on the resulting C Flag state.12
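A short Python sketch shows how decrement and comparison can be simulated with addition and the carry-out alone. The helper names (`add4`, `at_least_10`) are hypothetical, and the adder is modeled with its carry-in tied low, which is an assumption about the wiring.

```python
def add4(a, b):
    """4-bit addition: return (sum, carry_out), carry-in assumed tied low."""
    total = (a & 0xF) + (b & 0xF)
    return total & 0xF, total >> 4


# Adding 15 (the 4-bit two's complement of 1) acts as "subtract 1".
# The carry-out is 1 whenever the starting value was >= 1.
dec_from_5 = add4(5, 15)   # 5 + 15 = 20 -> sum 4, carry 1
dec_from_0 = add4(0, 15)   # 0 + 15 = 15 -> sum 15, carry 0


def at_least_10(a):
    """Test a >= 10 without a comparator: add 16 - 10 = 6, inspect the carry."""
    _, carry = add4(a, 6)
    return carry == 1
```

On the real machine such a test is destructive, because the sum is written back to the register being tested, so a program would typically copy the value first.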

2.3. Program ROM and Address Decoding

The TD4’s memory structure is inherently simplified. The program memory consists of 16 banks of 8-position DIP switches, providing 16 full 8-bit instruction slots.6 This arrangement constitutes the entire program memory available to the CPU.

The Program Counter’s 4-bit output determines which instruction is currently active. This 4-bit address signal is fed into an Address Decoder, typically implemented with a 4-to-16 decoder such as the 74HC154, or with a pair of 74HC138 3-to-8 decoders.6 The decoder converts the 4-bit address into one of 16 active-low select signals; the selected line enables the corresponding row of DIP switches, placing that row’s 8-bit instruction onto the instruction bus.9
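The row-select behavior can be sketched in Python as a one-of-16 active-low decoder feeding a list that stands in for the switch banks. The function names and the sample ROM contents are illustrative assumptions.

```python
def decode_address(pc):
    """Return 16 active-low select lines; exactly one of them is 0."""
    return [0 if row == (pc & 0xF) else 1 for row in range(16)]


def read_rom(rom, pc):
    """Return the 8-bit word of the single row whose select line is low."""
    selects = decode_address(pc)
    return rom[selects.index(0)]


rom = [0x00] * 16          # 16 rows of 8 switches, all off
rom[3] = 0b1011_0101       # OUT 5 set on the switches at address 0011
```

Because the lookup is purely combinational, changing the PC value changes the selected word with no clocked read step in between.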

2.4. Data Selection and Multiplexing

The datapath requires mechanisms to route data from various sources (registers, input port, constant zero) to the ALU’s second input. This is achieved using Data Selectors, specifically implemented using dual 4-to-1 multiplexers, such as the 74HC153.6 The multiplexers receive control signals (SEL_A and SEL_B) from the Control Unit to choose the appropriate source operand.9

This datapath design mandates that all data manipulation, including simple data transfers, is executed via the single 74HC283 adder.9 For instance, a move operation such as MOV A, Im (Move Immediate to Register A) is not implemented as a direct register write, but rather as an arithmetic operation: the Control Unit sets the Data Selector to choose a hard-wired zero signal (ground) as the second operand. The ALU then computes 0 + Im, and the result is clocked into Register A.9 This architectural design choice unifies the execution pathway for both arithmetic and data transfer instructions, significantly simplifying the Control Unit’s signaling requirements and contributing to the minimal chip count.
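The "move is an add with zero" idea can be sketched in a few lines of Python. The selector encoding used here, with (SEL_B, SEL_A) = 11 choosing the grounded input, is an assumption for illustration, as are the function names.

```python
def select_operand(sel_b, sel_a, reg_a, reg_b, in_port):
    """4-to-1 data selector; the (SEL_B, SEL_A) encoding is an assumption."""
    sources = (reg_a, reg_b, in_port, 0)   # index 3 is the hard-wired zero
    return sources[(sel_b << 1) | sel_a]


def adder(operand, im):
    """The single 4-bit adder: return (sum, carry_out)."""
    total = (operand & 0xF) + (im & 0xF)
    return total & 0xF, total >> 4


# MOV A, 7: the selector picks zero, so the adder computes 0 + 7.
mov_result, mov_carry = adder(select_operand(1, 1, reg_a=9, reg_b=4, in_port=0), 7)

# ADD A, 3 with A = 9: the selector picks A, so the adder computes 9 + 3.
add_result, add_carry = adder(select_operand(0, 0, reg_a=9, reg_b=4, in_port=0), 3)
```

Both instructions use the identical path; only the selector inputs differ.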

3. Synchronization, Clocking, and Timing

The TD4 operates as a fully synchronous digital circuit, relying on a system clock for sequencing operations and preventing race conditions.13

3.1. Clock Generation and Frequency

The system clock signal, usually a square wave, is typically generated by a circuit based on the 74HC14 Hex Schmitt trigger inverter.6 In some variants, an external component like a 555 timer is used.8 The clock frequency is intentionally set low, generally ranging from 1 Hz to 1000 Hz.8 This low clock speed is essential for the TD4’s educational function, allowing the user to observe the execution of instructions step-by-step through the visual feedback provided by the extensive LED display network.6

For debugging and manual execution, a push-button switch enables a step-by-step mode. The signal from this button is typically passed through the 74HC14 inverter to ensure the mechanical switch action results in a clean, debounced clock signal edge, which is necessary for reliable clocking of the synchronous registers.12

3.2. Micro-timing and the Two-Phase Cycle

TD4 instructions are generally designed to execute in a single clock cycle.8 However, the internal synchronous operation often requires a two-phase timing structure to coordinate the asynchronous ALU calculations with the synchronous register loads.8

  1. Phase 1 (Rising Clock Edge): During this phase, the instruction is fetched, decoded, and the necessary data is routed to the ALU inputs. The asynchronous 74HC283 ALU executes the operation (Add/Move) and generates a stable output result and Carry Flag state.8
  2. Phase 2 (Falling Clock Edge): Once the ALU output has stabilized, the synchronous components—specifically the 74HC161 registers and the PC—are triggered. Based on the control signals asserted during Phase 1, the new data is latched into the destination register, or the Program Counter is incremented/loaded.8
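The two phases can be mimicked in Python by separating the computation of the next values from the moment they are stored. This sketch models only ADD A, Im; the dictionary-based state and the function names are illustrative assumptions.

```python
def compute(state, im):
    """Phase 1: combinational outputs for ADD A, Im; nothing is stored yet."""
    total = state["a"] + im
    return {"a": total & 0xF, "carry": total >> 4, "pc": (state["pc"] + 1) & 0xF}


def latch(state, pending):
    """Phase 2: the clock edge copies the settled values into the registers."""
    state.update(pending)


state = {"a": 0b1110, "carry": 0, "pc": 0}
pending = compute(state, 0b0011)   # adder output settles; registers untouched
snapshot = dict(state)             # still the old values at this point
latch(state, pending)              # now A = 0001, carry = 1, PC = 0001
```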

The design is highly efficient in its fetch stage due to the choice of program memory. Because the ROM is implemented using combinatorial logic (DIP switches and a decoder), instruction retrieval is extremely fast; the instruction bits become available on the bus almost instantaneously after the Program Counter output stabilizes.9 This avoids the multi-cycle latency and complex timing control typically required to interface with slower, dynamic memory (RAM), simplifying the control logic’s sequencing requirements during the Fetch stage.14

4. Instruction Set Architecture (ISA) and Programmatic Constraints

The ISA is the programmable interface of the TD4 CPU, defining its operations, registers, and data types.15 It reflects the fundamental constraints imposed by the component-minimalist design.

4.1. Instruction Encoding and Addressing Modes Review

All instructions are encoded as 8 bits. The upper 4 bits (Bits 7-4) constitute the Opcode, defining the operation, while the lower 4 bits (Bits 3-0) serve as the Immediate value (Im).9
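As a minimal Python sketch of this encoding, the two fields can be separated with a shift and a mask. The function names are hypothetical.

```python
def split_instruction(word):
    """Split an 8-bit instruction into (opcode, immediate) nibbles."""
    opcode = (word >> 4) & 0xF   # bits 7-4
    im = word & 0xF              # bits 3-0
    return opcode, im


def join_instruction(opcode, im):
    """Pack an opcode nibble and an immediate nibble into one byte."""
    return ((opcode & 0xF) << 4) | (im & 0xF)
```

For example, `split_instruction(0b0101_0001)` yields opcode 0101 and immediate 0001, which is ADD B, 1.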

The TD4 supports several basic addressing modes 16:

  • Immediate Addressing: The operand value is contained directly within the instruction itself (e.g., ADD A, Im).17
  • Register Addressing (Register Direct): The operand is located in one of the CPU’s registers (A or B) specified by the instruction (e.g., MOV A, B).17
  • Implicit Addressing: The operand is known automatically, such as the Input Port or the Program Counter being incremented.17

4.2. Opcode Dictionary and Semantic Analysis

The standard TD4 implementation utilizes 12 of the 16 available opcodes.10 These instructions cover fundamental operations required for Turing completeness: data movement, arithmetic, I/O, and control flow.9

Table 1: TD4 Standard Instruction Set and Binary Mapping

| Instruction No. | Opcode (Bits 7-4) | Im (Bits 3-0) | Mnemonic | Operation Meaning |
|---|---|---|---|---|
| 0 | 0000 | Im | ADD A, Im | A + Im → A |
| 1 | 0001 | 0000 | MOV A, B | B → A |
| 2 | 0010 | 0000 | IN A | Input Port → A |
| 3 | 0011 | Im | MOV A, Im | Im → A |
| 4 | 0100 | 0000 | MOV B, A | A → B |
| 5 | 0101 | Im | ADD B, Im | B + Im → B |
| 6 | 0110 | 0000 | IN B | Input Port → B |
| 7 | 0111 | Im | MOV B, Im | Im → B |
| 9 | 1001 | 0000 | OUT B | B → Output Port |
| 11 | 1011 | Im | OUT Im | Im → Output Port |
| 14 | 1110 | Im | JNC Im | Im → PC if Carry Flag = 0 |
| 15 | 1111 | Im | JMP Im | Im → PC (Unconditional) |
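The mapping in Table 1 is small enough to express as a lookup table. The following Python sketch is a toy assembler built on it; the mnemonic-template keys and the `assemble` helper are illustrative conventions, not an existing TD4 tool.

```python
# Opcode nibble per mnemonic template, transcribed from Table 1.
OPCODES = {
    "ADD A, Im": 0b0000, "MOV A, B": 0b0001, "IN A": 0b0010, "MOV A, Im": 0b0011,
    "MOV B, A": 0b0100, "ADD B, Im": 0b0101, "IN B": 0b0110, "MOV B, Im": 0b0111,
    "OUT B": 0b1001, "OUT Im": 0b1011, "JNC Im": 0b1110, "JMP Im": 0b1111,
}


def assemble(template, im=0):
    """Return the 8-bit switch pattern for one instruction."""
    opcode = OPCODES[template]
    if not template.endswith("Im"):
        im = 0                    # non-immediate forms keep the low nibble 0000
    return (opcode << 4) | (im & 0xF)


program = [assemble("OUT B"), assemble("ADD B, Im", 1), assemble("JNC Im", 0)]
```

Each resulting byte corresponds to one row of eight DIP switches, most significant bit first.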

4.3. The Architectural Leak: Immediate Value Coupling

A crucial detail stemming from the minimalist datapath design is that the 4-bit immediate value is permanently wired to one input of the 74HC283 adder.9 This pervasive connection means that conceptually pure move or I/O instructions (e.g., OUT B or MOV A, B), which should ignore the immediate value, technically still execute an addition operation involving the Im value.9

Consequently, the burden falls upon the programmer to ensure that for instructions that do not require an immediate operand, the corresponding DIP switches for the lower four bits are explicitly set to 0000.9 Because every result passes through the adder, a non-zero immediate is added to the selected operand before the result reaches its destination: MOV A, B encoded with Im = 0011 would load B + 3 rather than B. This phenomenon serves as a powerful illustration of how hardware simplification directly influences programmatic discipline and exposes the physical implementation details to the assembly language level.
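The effect of a stray immediate is easy to see in a small Python sketch of the adder output. The function name and the sample values are assumptions chosen for illustration.

```python
def adder_output(operand, im):
    """Value on the 4-bit adder output for a given operand and immediate."""
    return (operand + im) & 0xF


reg_b = 0b0110

clean = adder_output(reg_b, 0b0000)   # MOV A, B with switches 0001 0000
leaky = adder_output(reg_b, 0b0011)   # same opcode with switches 0001 0011
```

With the low nibble cleared the adder passes B through unchanged (0110); with 0011 left on the switches the adder output becomes 1001.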

5. Control Unit and State Sequencing

The Control Unit (CU) is the managerial component of the CPU, orchestrating the fetch-decode-execute cycle by translating the 8-bit instruction into timed control signals.18 In the TD4, this unit is highly streamlined to achieve component minimization.

5.1. Command Decoder Logic

The Command Decoder, which forms the heart of the Control Unit, is implemented using discrete combinatorial logic chips (IC8 and IC10).9 These typically include basic gate arrays such as the 74HC10 Triple 3-input NAND gate and the 74HC32 Quad 2-input OR gate.6 This hardwired logic is responsible for realizing the ISA’s truth table: mapping the 4-bit opcode input, together with the Carry Flag, onto the handful of simultaneous control outputs required by the datapath components.20

5.2. Control Signal Generation

The decoder’s outputs are categorized by their function:

  • Register Loading: Generating active-low signals like not_LOAD_A and not_LOAD_B. During execution, the Control Unit ensures that only the intended destination register receives a low signal, enabling the data transfer from the ALU bus.9
  • Data Selection: Generating SEL_A and SEL_B to manage the 74HC153 multiplexers, directing the correct source data (A, B, Input, or Zero) to the ALU input.9

A significant characteristic of the TD4 design is the avoidance of a complex sequential state machine, common in microcoded or multi-cycle architectures.22 Since all instructions execute in a single clock cycle 8, the Control Unit can generate its control signals almost entirely through combinatorial logic.9 This reliance on instantaneous logic signal generation simplifies the control implementation dramatically, replacing complex sequencing registers with simple logic gates. The feasibility of this approach is contingent upon the low clock frequency and rapid stabilization of the 74HC logic outputs.
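To make the purely combinational nature of the decoder concrete, here is a Python sketch using one set of Boolean equations that reproduces Table 1. These equations, the selector encoding ((SEL_B, SEL_A): 00 = A, 01 = B, 10 = Input, 11 = Zero), and the signal name `not_LOAD_OUT` are assumptions for illustration; the actual gate-level wiring should be taken from the schematic.

```python
def decode(opcode, carry):
    """Map a 4-bit opcode and the carry flag to control signals (no state)."""
    op3, op2, op1, op0 = [(opcode >> i) & 1 for i in (3, 2, 1, 0)]
    return {
        "SEL_A": op0 | op3,
        "SEL_B": op1,
        "not_LOAD_A": op2 | op3,                    # low for opcodes 00xx
        "not_LOAD_B": (1 - op2) | op3,              # low for opcodes 01xx
        "not_LOAD_OUT": 1 - ((1 - op2) & op3),      # low for opcodes 10xx
        "not_LOAD_PC": 1 - (op2 & op3 & (op0 | (1 - carry))),  # JMP, or JNC with C = 0
    }


LOADS = ("not_LOAD_A", "not_LOAD_B", "not_LOAD_OUT", "not_LOAD_PC")
DEFINED = (0, 1, 2, 3, 4, 5, 6, 7, 9, 11, 14, 15)

# With the carry clear, each defined opcode pulls exactly one load line low.
one_hot = all(
    sum(decode(op, 0)[name] == 0 for name in LOADS) == 1 for op in DEFINED
)
```

The function has no memory of previous calls, which mirrors the point above: the control signals depend only on the current opcode and flag.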

5.3. Control Flow Implementation (Jumps and Branching)

Control flow instructions—JMP (unconditional jump) and JNC (jump if no carry)—are managed by overriding the Program Counter’s default increment function and forcing a parallel load of a new address.9

  • JMP Im (Opcode 1111): The Opcode dictates that the Control Unit unconditionally asserts the not_LOAD_PC signal. This action clocks the 4-bit immediate value (which specifies the target address) directly into the PC register, effecting an immediate jump.9
  • JNC Im (Opcode 1110): This conditional branch relies on incorporating the Carry Flag state into the Control Unit’s logic. The hardware implements a logical AND operation: the PC is loaded only if the Opcode is 1110 and the Carry Flag is currently clear (logic 0).9 This conditional gating of the PC load line allows the machine to alter its execution path based on the result of the preceding arithmetic operation.
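The two jump rules reduce to a small next-address function. This Python sketch uses a hypothetical helper name and models only the PC update, not the rest of the datapath.

```python
def next_pc(pc, opcode, im, carry):
    """Return the PC value after one instruction."""
    jmp = opcode == 0b1111                        # unconditional
    jnc_taken = opcode == 0b1110 and carry == 0   # taken only when C is clear
    if jmp or jnc_taken:
        return im & 0xF                           # parallel load of the target
    return (pc + 1) & 0xF                         # default: count up, wrap at 16
```

For instance, a JNC 0 at address 0010 returns 0000 when the carry is clear and 0011 when it is set.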

6. Practical Programming and Operational Analysis

The practical application of the TD4 ISA demands a keen awareness of the underlying hardware mechanisms and architectural limitations.

6.1. Input/Output (I/O) Mechanism Detail

The TD4 utilizes simple, direct I/O ports. Input data is provided by a set of micro switches, which are buffered and routed to the Data Selector.6 Instructions like IN A (0010 0000) and IN B (0110 0000) activate the appropriate select signals to pass the input port data through the ALU (added to zero) and latch it into registers A or B.9 The Output Port is controlled by instructions OUT B (1001 0000) or OUT Im (1011 Im). The output state, visible via a set of LEDs (often driven by an inverting buffer such as the 74HC540) 6, displays the data from either register B or the immediate value of the instruction.9

6.2. Assembly Programming Examples: The Counting Loop

The small register size (4 bits) and the reliance on the C-Flag for conditional branching mandate careful programming techniques. A classic example is creating a loop that counts from 0000 to 1111 (0 to 15 decimal) and then repeats, which demonstrates the interaction between the ALU, the Carry Flag, and the conditional jump instruction.12

Table 2: TD4 Counting Loop Program Example (Partial Trace)

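| Address | Opcode | Mnemonic | Initial Register B | Final Register B | Carry Flag (C) | Next PC |
|---|---|---|---|---|---|---|
| 0000 | 1001 0000 | OUT B | 0000 | 0000 | 0 | 0001 |
| 0001 | 0101 0001 | ADD B, 1 | 0000 | 0001 | 0 | 0010 |
| 0010 | 1110 0000 | JNC 0 | 0001 | 0001 | 0 | 0000 (Jump taken) |
| … | … | (Loop repeats until 1111 is reached) | … | … | … | … |
| 0001 | 0101 0001 | ADD B, 1 | 1111 | 0000 | 1 | 0010 |
| 0010 | 1110 0000 | JNC 0 | 0000 | 0000 | 1 | 0011 (Jump not taken) |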

When register B reaches 1111₂ and the instruction ADD B, 1 executes, the result is 0000₂ with the Carry Flag set to 1, indicating overflow. This transition causes the JNC 0 instruction to fail its condition (C is not 0), allowing sequential execution to proceed to address 0011₂. This demonstrates how the C-Flag serves as the primary method for handling conditional decisions within the TD4 architecture.9
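The trace can be reproduced with a compact Python simulator. This is a sketch under stated assumptions: the carry flag is re-latched from the adder on every instruction, the four reserved opcodes are not modeled faithfully, and a JMP 3 at address 0011 is added here only so the program has somewhere to park after the loop exits.

```python
def run(rom, max_steps, in_port=0):
    """Execute up to max_steps instructions; return (outputs, final_pc)."""
    a = b = pc = carry = 0
    outputs = []
    for _ in range(max_steps):
        opcode, im = rom[pc] >> 4, rom[pc] & 0xF
        # Second adder operand, chosen per Table 1.
        if opcode in (0b0000, 0b0100):
            operand = a
        elif opcode in (0b0001, 0b0101, 0b1001):
            operand = b
        elif opcode in (0b0010, 0b0110):
            operand = in_port
        else:
            operand = 0                      # hard-wired zero
        total = operand + im
        result = total & 0xF
        # JNC looks at the carry left by the previous instruction.
        jump = opcode == 0b1111 or (opcode == 0b1110 and carry == 0)
        # Latch: the top two opcode bits pick the destination.
        if opcode >> 2 == 0b00:
            a = result
        elif opcode >> 2 == 0b01:
            b = result
        elif opcode >> 2 == 0b10:
            outputs.append(result)           # output port LEDs
        pc = result if jump else (pc + 1) & 0xF
        carry = total >> 4
    return outputs, pc


rom = [0x90, 0x51, 0xE0, 0xF3] + [0x00] * 12   # OUT B; ADD B, 1; JNC 0; JMP 3
outputs, final_pc = run(rom, max_steps=60)
```

Under these assumptions the output port shows 0 through 15 once, after which execution falls through to address 0011 and stays there.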

The structural absence of dedicated comparison or bitwise logic operations in the ISA compels the programmer to treat the ALU and the Carry Flag as a pseudo-logic unit. All decision-making must be synthesized by performing arithmetic operations designed specifically to generate a desired C-Flag state. Furthermore, explicit state initialization is still required. The wrap to 0000 in the trace above is a side effect of stepping by 1; with a larger step the register wraps to a non-zero remainder (1111 + 0011 leaves 0010), so a loop that must restart from a known value has to reload it explicitly (e.g., via MOV B, 0).12

7. Conclusion: Synthesis and Future Directions

The TD4 4-bit CPU represents a successful and elegantly minimalist approach to computer architecture. Its construction from standard 74HC TTL/CMOS logic chips provides an unparalleled view of the classical CPU components—the ALU, registers, and Control Unit—and their interaction during the synchronous fetch-decode-execute cycle.18 The system’s low clock speed and visual feedback via LEDs reinforce its primary role as a pedagogical tool.8

The analysis confirms that the TD4’s architectural limitations—specifically the 4-bit word size, the 16-byte ROM constraint, and the reliance on a single ALU for all datapath operations—are not deficiencies but necessary consequences of the design goal: achieving maximum simplicity and transparency. These limitations inherently restrict its practicality but maximize its educational value. The requirement for programmers to zero out the immediate field for non-immediate instructions, and the necessity of managing flow control entirely through the Carry Flag, are direct consequences of the highly optimized hardware implementation.

The open-source nature of the TD4 fosters architectural experimentation and extension.2 Common modifications proposed or undertaken by enthusiasts include mapping the four reserved opcodes, integrating standard RAM, expanding the address space beyond 16 instructions, or replacing the mechanical DIP switch ROM with modern programmable devices like EEPROMs or microcontrollers (such as an Arduino) for simplified programming.4 These potential enhancements validate the robust and modular foundation provided by the TD4’s standard digital logic implementation.

Works cited

  1. Minimax 4bit CPU - TD4 architecture CPU/SBC from Denjhang’s Retro Hardware on Tindie, accessed December 5, 2025, https://www.tindie.com/products/denjhang/minimax-4bit-cpu-td4-architecture-cpusbc/
  2. CPU DIY Kit TD4 CPU Make a simple 4-bit CPU By Yourself Open Source Software and Hardware Including PCB and All Components - AliExpress, accessed December 5, 2025, https://www.aliexpress.com/item/1005005101922345.html
  3. 4Bit TD4 CPU Self-made Introduction 74 Series Chip Logic Circuit Design CPU Operating Principle Learning - AliExpress, accessed December 5, 2025, https://www.aliexpress.com/item/1005005034385162.html
  4. TD4 CPU - Hackaday.io, accessed December 5, 2025, https://hackaday.io/project/26215-td4-cpu
  5. TD4 CPU Assembly Kit 74 Series Chips Logic Circuit Design STEM CPU Design | eBay, accessed December 5, 2025, https://www.ebay.com/itm/396591429756
  6. TD4 DELUXE THE SIMPLE TTL CPU Make your own CPU and learn how computers work! - Budgetronics, accessed December 5, 2025, https://www.budgetronics.eu/data/mediablocks/TD4%20building%20kit%20manual.pdf
  7. TD4 4-bit DIY CPU – Part 5 - Kevin’s Blog - WordPress.com, accessed December 5, 2025, https://emalliab.wordpress.com/2025/10/29/td4-4-bit-diy-cpu-part-5/
  8. The simplest 4-bit RISC CPU - Hackaday.io, accessed December 5, 2025, https://hackaday.io/project/191290-the-simplest-4-bit-risc-cpu
  9. Guide to the TD4 4-bit DIY CPU | Hey There Buddo! - Philip Zucker, accessed December 5, 2025, https://www.philipzucker.com/td4-4bit-cpu/
  10. Project | TD4 CPU - Hackaday.io, accessed December 5, 2025, https://hackaday.io/project/26215/logs?sort=oldest
  11. 4Bit TD4 CPU Self-made 74 Series Chip CPU from innoelement on Tindie, accessed December 5, 2025, https://www.tindie.com/products/johnson/4bit-td4-cpu-self-made-74-series-chip-cpu-2/
  12. DIY 4-bit CPU. Have you ever made a processor? I did… | by teardownit 🛠️ ✍️ | Medium, accessed December 5, 2025, https://teardownit.medium.com/diy-4-bit-cpu-917e7bff228a
  13. Clock signal - Wikipedia, accessed December 5, 2025, https://en.wikipedia.org/wiki/Clock_signal
  14. Instruction cycle - Wikipedia, accessed December 5, 2025, https://en.wikipedia.org/wiki/Instruction_cycle
  15. Instruction set architecture - Wikipedia, accessed December 5, 2025, https://en.wikipedia.org/wiki/Instruction_set_architecture
  16. Addressing mode - Wikipedia, accessed December 5, 2025, https://en.wikipedia.org/wiki/Addressing_mode
  17. Addressing Modes - GeeksforGeeks, accessed December 5, 2025, https://www.geeksforgeeks.org/computer-organization-architecture/addressing-modes-1/
  18. Central processing unit - Wikipedia, accessed December 5, 2025, https://en.wikipedia.org/wiki/Central_processing_unit
  19. OS 00: Into the CPU - Registers, ALU, and Control Unit Explained - DEV Community, accessed December 5, 2025, https://dev.to/faiyaz032/os-00-into-the-cpu-registers-alu-and-control-unit-explained-3o2d
  20. Boolean Algebra Truth Tables for Logic Gate Functions - Electronics Tutorials, accessed December 5, 2025, https://www.electronics-tutorials.ws/boolean/bool_7.html
  21. Lab 4 - The Datapath, and ALU Control Units, accessed December 5, 2025, http://www.cs.ucr.edu/~windhs/lab4/lab4.html
  22. Using State Machines in Programmable Logic - Texas Instruments, accessed December 5, 2025, https://www.ti.com/lit/pdf/scla076
  23. asfdrwe/simpleTD4: implementation of 4bit CPU TD4 written with verilog - GitHub, accessed December 5, 2025, https://github.com/asfdrwe/simpleTD4
  24. Introduction to the Fetch-Execute Cycle | Baeldung on Computer Science, accessed December 5, 2025, https://www.baeldung.com/cs/fetch-execute-cycle


Hi, I'm Yinhuan Yuan. I'm a software engineer based in Toronto. You can read more about me on yuan.fyi.