Building one's own computer from scratch is a challenge that eventually crosses the mind of everyone interested in digital electronics. This challenge can take two very different forms.
First way is building a computer around an existing CPU. People generally use old and simple 8-bit processors, available in DIP packages and running with a slow clock, such as Zilog Z80 or Mostek 6502. Those projects put the emphasis on I/O peripherals and software.
The other way is building a microprocessor from scratch, at gate level. As opposed to using an existing CPU, the biggest part of the project is theoretical as it requires a lot of design and choice. The final implementation can be done with ready to use logic gates and circuits (like the 7400 and the 4000 series), with discrete components (transistors or even relays !) or with programmable logic (FPGA and CPLD).
I was thinking about building my own computer for a long time and the 7400 Logic Competition became a perfect occasion for such a project. Using an existing CPU didn't interest me because it mainly consists in reading the documentation of the chosen CPU and interfacing it with peripherals. Moreover, such a project wouldn't put the 7400 series at the center but use it only as glue logic (address decoding etc.). So I decided to build my own CPU from scratch with 7400 logic circuits.
There is already a huge number of home-brew CPUs on the Internet, but there were always several things that bothered me about them.
Here are the objectives for my design to make it withstand from the other home-brew CPUs.
I found one existing processor that may have aproached those objectives at some point: PISC uses only 22 logic chips and can run very powerful instructions on 16 bit data words. The problem is that is uses both 74181 ALUs and 74172 register files, that are not longer produced and cannot be replaced with other 74 chips.
A CPU can in theory perform any work as long as it is Turing complete and has enough memory. At first I thought about One Instruction or No Instruction Set Processors. They would have been very easy to build with just a few circuits. But doing anything useful on those architectures is a challenge by itself. Thus, they totally fail at objectives 3 and 4.
One instruction is not enough, but my CPU has to be minimalistic somehow, otherwise it won't be able to fit in 20 chips. What operations and what elements are required to perform some useful work ? By chance, I thought about BrainFuck. It is an esoteric programming language that has only 8 instructions but can perform some real tasks. I quickly found a program that draws a Mandelbrot fractal in ASCII and a decent implementation of the Game of Life. Most people don't write BrainFuck code directly, but use a macro language. There even exists a compiler from C to BrainFuck.
The eight instructions of BrainFuck:
Character | Meaning |
> | increment the data pointer |
< | decrement the data pointer |
+ | increment the byte at the data pointer |
- | decrement the byte at the data pointer |
. | output the byte at the data pointer as an ASCII encoded character |
, | accept one byte of input, storing its value in the byte at the data pointer |
[ | if the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it forward to the command after the matching ] command |
] | if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching [ command |
Here is an example program that prints "ABCDE":
++++++++[>++++++++<-]>+.+.+.+.+.
To execute natively those commands, we will at least need a register to hold the address of the current data cell (let's call it P) and at least one working register that will be able to increase or decrease its contents (let's call it A). All instructions are pretty straightforward to execute, except [ and ]. Indeed, a CPU doesn't have a notion of block or matching parenthesis, it is a high level notion (think of { and } in C or Java). Therefore we will need a compiler that will translate [ and ] into labels and jump statements.
Here is how BrainFuck can be translated into a low-level RISC assembly language that can be run directly by a CPU:
BrainFuck Instruction | Assembly Code |
> | INC P |
< | DEC P |
+ | LD A place into A the contents of the memory cell pointed by P INC A ST A place the contents of A into the memory cell pointed by P |
- | LD A DEC A ST A |
. | LD A OUT A (depends on how output is interfaced) |
, | IN A ST A |
[ | loop: JZ endloop jumps to label if A is 0 |
] | JMP loop endloop: |
Here is our program translated to RISC opcodes (actually I implemented JMP as CLR + JZ, with CLR an instruction that sets a register to 0):
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
loop_0:
LD A; JZ end_loop_0
INC P
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
LD A; INC A; ST A
DEC P
LD A; DEC A; ST A
CLR A; JZ loop_0
end_loop_0:
INC P
LD A; INC A; ST A
LD A; OUT A
LD A; INC A; ST A
LD A; OUT A
LD A; INC A; ST A
LD A; OUT A
LD A; INC A; ST A
LD A; OUT A
LD A; INC A; ST A
LD A; OUT A
LD A; INC A; ST A
LD A; OUT A
That's it, we only need to handle those 10 basic operations to be able to run BrainFuck programs.
Data words must be at least 8 bits long to be able to run real Brainfuck programs. Game of Life requires around 10kB of memory space to run. Mandelbrot compiles into around 10k instructions. That's why the memory address bus and the program address bus must be around 16 bits wide to run useful code.
It's an important design choice to decide whether the code and the data are stored in the same memory or not. Harvard is very straightforward and uses two separate buses. It's simpler to implement, it requires less control logic and it's faster, the CPU can fetch and execute instructions at the same time. I tried both designs for my project (bfm2 VHDL file implements Von Neumann architecture and bfm3 is my final Harvard design), and only with a Harvard architecture I was able to fit the execution of BrainFuck programs into 20 logic chips.
I didn't really understand the physical difference between the two before building my own CPU, but now I do. CISC CPUs use microsequencer, a counter connected to a control ROM (or microcode) that divide the execution of a single instruction into several clock cycles. A CISC architecture can implement more complex instructions (like memory to memory operations), but they're slower and require more control logic. On the other side, RISC CPUs execute as simple instructions as possible, as fast as possible, often in 1 or 2 clock cycles.
My first attempt of the BrainFuck CPU was using a control ROM with up to 16 clock cycles per instruction, but to fit into 20 chips, I had to use the same adder to increase the PC register and doing operations on A and P registers. It worked, but was too slow (16 cylces per instruction) and I wasn't confident about implementing it in hardware. Finally I came up with a RISC architecture that runs one instruction per clock cycle. It is much simpler, robust and faster than the CISC version.
It really doesn't matter for the speed of operation in our case, thus I decided to store memory addresses in Big Endian format, so that the hex dumps of machine code could be read more easily.
Here is the final architecture I came out with for my project.
Each instruction takes one clock cycle to execute and the program and data buses are separate, we don't need an IR register nor a microsequencer. Each opcode is the actual value of the control signals, except for the JZ instruction, where the opcode carries the destination address and all the control signals are driven into their inactive state.
Instr. | Bit 0 | Bits 1 to 15 |
JZ | 1 | jump address |
other | 0 | control signal values |
Number | Name | Role | When |
0 | JZ | drives all other control signals to inactive, loads the jump address into PC if A is 0 | control signals applied immediately, jump address loaded on the next rising edge of CLK |
1 | RAM_/WR | writes the value on the databus to RAM at address P | next rising edge of CLK |
2 | RAM_/OE | outputs RAM value pointed by P on the databus | immediately |
3 | P_/CTEN | enable increase/decrease of P | next rising edge of CLK |
4 | P_/LOAD | load the value on the databus to P | next rising edge of CLK |
5 | D/U | controls if A or P are increased or decreased by /CTEN signals | immediately |
6 | A_/CTEN | enable increase/decrease of P | next rising edge of CLK |
7 | A_/LOAD | load the value on the databus to P | next rising edge of CLK |
8 | A_/OE | output A value to the databus | immediately |
9 | OUT | output peripheral control | - |
10 | IN | input peripheral control | - |
I wrote a complete software suite in Python to build and test programs for my CPU.
compiler/bfc.py
translates BrainFuck code into CPU opcodes.compiler/bfas.py
translates opcodes into machine code (in VHDL std_logic_vector
format).compiler/bfas_x86.py
translates opcodes into x86 assembly for testing purposes.ucode/ucodec.py
is a microcode compiler I wrote for the first CISC version of the CPU, it also helped me to write instructions for the RISC version. It translates a specific language I designed into control ROM. See bfm2.ucode
and bfm3.ucode
comments for more details.examples/
contains various BrainFuck programs such as the Game of Life and Mandelbrot fractal in ASCII.For design decisions I had to take while building my CPU I used VHDL to test different possibilities. I also wanted to be sure my CPU would work properly once implemented in real hardware, so I simulated all the 74 circuits I used first.
vhdl/src/buff.vhd
74241 like buffer.vhdl/src/latch.vhd
74373 like latch.vhdl/src/dff.vhd
74232 like flip-flop.vhdl/src/counter.vhd
generic counter.vhdl/src/counter_74191.vhd
.vhdl/src/counter_74163.vhd
.vhdl/src/counter_74590.vhd
.vhdl/src/adder.vhd
7480 full adder.vhdl/src/ram.vhd
static RAM or ROM with contents preloaded from a file.vhdl/test/*
testbenches for all those components.vhdl/test/bfm.vhd
first unsuccessful CPU.vhdl/test/bfm2.vhd
CISC Von Neumann version, works but slow and uses many chips.vhdl/test/bfm3.vhd
final RISC Harvard design.All VHDL code is compiled using open source GHDL simulator. It takes about 10 minutes for GHDL to draw the first line of the Mandelbrot!
My prototype is a straightforward implementation of my final RISC BrainFuck CPU design.
I planned to use 2 32kB SRAM chips, backed with a lithium battery, as program memory. I failed to build the battery switching circuit myself (it was corrupting some bytes) and didn't have time to order one, so I used a big old 27C1024 UV EPROM memory as a temporary solution.
Circuit diagram and PCB were both made using Kicad.
My CPU runs at 2 MHz, executing one instruction per clock-cycle. With SRAM in lieu of 27C1024 EPROM (100ns access time), I'll be able to go up to 10 MHz.
The Mandelbrot fractal takes about 2 minutes to be drawn.
I used a 16550
UART chip so that my CPU can send an receive bytes of data through a serial cable. It is a pretty complex chip, which is not part of the 7400 series but also it is not part of my CPU, just a peripheral.
Name | Quantity | Function |
W24257A | 1 | RAM: 32k x 8 bit |
M27C1024 | 1 | ROM: 64k x 16 bit |
74HC163 | 4 | PC register: 16 bit, up counter, synchronous preload |
74HC241 | 2 | control signals buffer (selects between jump and other instructions) |
74HC191 | 4 | P register: 16 bit, up/down counter |
74HC191 | 2 | A register: 8 bit, up/down counter |
74HC241 | 1 | A register to databus buffer |
16C550 | 1 | UART chip to communicate with the outside world |
74HC241 | 1 | UART status to databus buffer |
Total 74XX logic chips: 14.
Total cost of components : around 10$ (the UART is the most expensive chip).
Everything is available from at least two manufacturers at Farnell, Digikey and Mouser. (Except for the 27C1024 which will be replaced by cheap SRAM in the final design.)
I didn't have a programmer to flash the 27C1024 chip, so I made one myself. It's a quick and dirty Arduino shield using 4 74LS373 latches and 2 74LS244 buffers (74 series again!) to obtain 16 bits of bidirectional IO for data and 16 bits of output for address.
Source code for the programmer is inclided.
Unfortunately, I didn't have the time to finish the final PCB with CMS components.
27C1024
ROM by two 8 bit SRAM chips backed by a battery.It is possible to design a relatively powerful processor with a very small amount of logic circuitry, provided that the architecture and the instructions are carefully built for a concrete application.
74 logic circuits are still very useful today, new families with better parameters come out, old circuits are removed and new ones added.
Thanks to this project, I learned a lot about CPU design, logic circuits and logic simulation. I hope it will inspire other projects.
It is 4 AM GMT as I write this article, my sources aren't as clean and documented as I'd like them to be. I will try to correct this in a few days.
All work related to this project is placed into public domain.
31st october 2012, Alexis Bezverkhyy alexisSTOPSENDINGMESHIT@grapsus.net