Open 7400 Logic Competition: The BrainFuck Machine

Intro

Building one's own computer from scratch is a challenge that eventually crosses the mind of everyone interested in digital electronics. This challenge can take two very different forms.

First way is building a computer around an existing CPU. People generally use old and simple 8-bit processors, available in DIP packages and running with a slow clock, such as Zilog Z80 or Mostek 6502. Those projects put the emphasis on I/O peripherals and software.

The other way is building a microprocessor from scratch, at gate level. As opposed to using an existing CPU, the biggest part of the project is theoretical as it requires a lot of design and choice. The final implementation can be done with ready to use logic gates and circuits (like the 7400 and the 4000 series), with discrete components (transistors or even relays !) or with programmable logic (FPGA and CPLD).

I was thinking about building my own computer for a long time and the 7400 Logic Competition became a perfect occasion for such a project. Using an existing CPU didn't interest me because it mainly consists in reading the documentation of the chosen CPU and interfacing it with peripherals. Moreover, such a project wouldn't put the 7400 series at the center but use it only as glue logic (address decoding etc.). So I decided to build my own CPU from scratch with 7400 logic circuits.

State of Art

There is already a huge number of home-brew CPUs on the Internet, but there were always several things that bothered me about them.

They use outdated, pre Cold War, discontinued 74 circuits. The classic ones are 74181 4-bit ALU and 74172 multiport register file. Those two chips were only available in L or LS logic (not even CMOS), they are slow, have a huge power consumption (like 50 mA) and you have to hunt on eBay for them at 10$ a piece. A project that uses outdated chips cannot be replicated by other people. Even if they're perfect building blocks for a CPU, they're gone and somehow avoid the real design from scratch.
Home-brew CPUs often try to imitate existing ones from the industry. Thus they implement opcodes just because they can be found in the Z80 or the 8080 and not because they serve any particular purpose.
Because of this imitation, home-brew CPUs become too complex. It's not uncommon to find over 300 chips in them. This makes them hard to understand for other people and they loose their educational purpose.

Objectives

Here are the objectives for my design to make it withstand from the other home-brew CPUs.

Use only modern, available and multi-sourced chips. So that everyone can reproduce it.
Use as few circuits as possible. Let's say no more than 20. In that way, it will have an educational value.
Be able to perform some real work, like draw a fractal or play a text-based game.
Do it at reasonable speed. If a 2$ micro-controller can emulate my CPU faster than the real hardware, it isn't worth it.

I found one existing processor that may have aproached those objectives at some point: PISC uses only 22 logic chips and can run very powerful instructions on 16 bit data words. The problem is that is uses both 74181 ALUs and 74172 register files, that are not longer produced and cannot be replaced with other 74 chips.

Design

Instruction set

A CPU can in theory perform any work as long as it is Turing complete and has enough memory. At first I thought about One Instruction or No Instruction Set Processors. They would have been very easy to build with just a few circuits. But doing anything useful on those architectures is a challenge by itself. Thus, they totally fail at objectives 3 and 4.

One instruction is not enough, but my CPU has to be minimalistic somehow, otherwise it won't be able to fit in 20 chips. What operations and what elements are required to perform some useful work ? By chance, I thought about BrainFuck. It is an esoteric programming language that has only 8 instructions but can perform some real tasks. I quickly found a program that draws a Mandelbrot fractal in ASCII and a decent implementation of the Game of Life. Most people don't write BrainFuck code directly, but use a macro language. There even exists a compiler from C to BrainFuck.

mandelbrot game of life

The eight instructions of BrainFuck:

Character	Meaning
>	increment the data pointer
<	decrement the data pointer
+	increment the byte at the data pointer
-	decrement the byte at the data pointer
.	output the byte at the data pointer as an ASCII encoded character
,	accept one byte of input, storing its value in the byte at the data pointer
[	if the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it forward to the command after the matching ] command
]	if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching [ command

Here is an example program that prints "ABCDE":

++++++++[>++++++++<-]>+.+.+.+.+.

To execute natively those commands, we will at least need a register to hold the address of the current data cell (let's call it P) and at least one working register that will be able to increase or decrease its contents (let's call it A). All instructions are pretty straightforward to execute, except [ and ]. Indeed, a CPU doesn't have a notion of block or matching parenthesis, it is a high level notion (think of { and } in C or Java). Therefore we will need a compiler that will translate [ and ] into labels and jump statements.

Here is how BrainFuck can be translated into a low-level RISC assembly language that can be run directly by a CPU:

BrainFuck Instruction	Assembly Code
>	INC P
<	DEC P
+	LD A place into A the contents of the memory cell pointed by P INC A ST A place the contents of A into the memory cell pointed by P
-	LD A DEC A ST A
.	LD A OUT A (depends on how output is interfaced)
,	IN A ST A
[	loop: JZ endloop jumps to label if A is 0
]	JMP loop endloop:

Here is our program translated to RISC opcodes (actually I implemented JMP as CLR + JZ, with CLR an instruction that sets a register to 0):

    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
loop_0:
    LD A; JZ end_loop_0
    INC P
    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
    LD A; INC A; ST A
    DEC P
    LD A; DEC A; ST A
    CLR A; JZ loop_0
end_loop_0:
    INC P
    LD A; INC A; ST A
    LD A; OUT A
    LD A; INC A; ST A
    LD A; OUT A
    LD A; INC A; ST A
    LD A; OUT A
    LD A; INC A; ST A
    LD A; OUT A
    LD A; INC A; ST A
    LD A; OUT A
    LD A; INC A; ST A
    LD A; OUT A

That's it, we only need to handle those 10 basic operations to be able to run BrainFuck programs.

How many bits ?

Data words must be at least 8 bits long to be able to run real Brainfuck programs. Game of Life requires around 10kB of memory space to run. Mandelbrot compiles into around 10k instructions. That's why the memory address bus and the program address bus must be around 16 bits wide to run useful code.

Harvard or Von Neumann ?

It's an important design choice to decide whether the code and the data are stored in the same memory or not. Harvard is very straightforward and uses two separate buses. It's simpler to implement, it requires less control logic and it's faster, the CPU can fetch and execute instructions at the same time. I tried both designs for my project (bfm2 VHDL file implements Von Neumann architecture and bfm3 is my final Harvard design), and only with a Harvard architecture I was able to fit the execution of BrainFuck programs into 20 logic chips.

CISC or RISC ?

I didn't really understand the physical difference between the two before building my own CPU, but now I do. CISC CPUs use microsequencer, a counter connected to a control ROM (or microcode) that divide the execution of a single instruction into several clock cycles. A CISC architecture can implement more complex instructions (like memory to memory operations), but they're slower and require more control logic. On the other side, RISC CPUs execute as simple instructions as possible, as fast as possible, often in 1 or 2 clock cycles.

My first attempt of the BrainFuck CPU was using a control ROM with up to 16 clock cycles per instruction, but to fit into 20 chips, I had to use the same adder to increase the PC register and doing operations on A and P registers. It worked, but was too slow (16 cylces per instruction) and I wasn't confident about implementing it in hardware. Finally I came up with a RISC architecture that runs one instruction per clock cycle. It is much simpler, robust and faster than the CISC version.

Big or Little Endian ?

It really doesn't matter for the speed of operation in our case, thus I decided to store memory addresses in Big Endian format, so that the hex dumps of machine code could be read more easily.

Architecture

Here is the final architecture I came out with for my project.

architecture

Instruction format

Each instruction takes one clock cycle to execute and the program and data buses are separate, we don't need an IR register nor a microsequencer. Each opcode is the actual value of the control signals, except for the JZ instruction, where the opcode carries the destination address and all the control signals are driven into their inactive state.

Instr.	Bit 0	Bits 1 to 15
JZ	1	jump address
other	0	control signal values

Control signals

Number	Name	Role	When
0	JZ	drives all other control signals to inactive, loads the jump address into PC if A is 0	control signals applied immediately, jump address loaded on the next rising edge of CLK
1	RAM_/WR	writes the value on the databus to RAM at address P	next rising edge of CLK
2	RAM_/OE	outputs RAM value pointed by P on the databus	immediately
3	P_/CTEN	enable increase/decrease of P	next rising edge of CLK
4	P_/LOAD	load the value on the databus to P	next rising edge of CLK
5	D/U	controls if A or P are increased or decreased by /CTEN signals	immediately
6	A_/CTEN	enable increase/decrease of P	next rising edge of CLK
7	A_/LOAD	load the value on the databus to P	next rising edge of CLK
8	A_/OE	output A value to the databus	immediately
9	OUT	output peripheral control	-
10	IN	input peripheral control	-

Software

I wrote a complete software suite in Python to build and test programs for my CPU.

compiler/bfc.py translates BrainFuck code into CPU opcodes.
compiler/bfas.py translates opcodes into machine code (in VHDL std_logic_vector format).
compiler/bfas_x86.py translates opcodes into x86 assembly for testing purposes.
ucode/ucodec.py is a microcode compiler I wrote for the first CISC version of the CPU, it also helped me to write instructions for the RISC version. It translates a specific language I designed into control ROM. See bfm2.ucode and bfm3.ucode comments for more details.
examples/ contains various BrainFuck programs such as the Game of Life and Mandelbrot fractal in ASCII.

Simulation

For design decisions I had to take while building my CPU I used VHDL to test different possibilities. I also wanted to be sure my CPU would work properly once implemented in real hardware, so I simulated all the 74 circuits I used first.

vhdl/src/buff.vhd 74241 like buffer.
vhdl/src/latch.vhd 74373 like latch.
vhdl/src/dff.vhd 74232 like flip-flop.
vhdl/src/counter.vhd generic counter.
vhdl/src/counter_74191.vhd.
vhdl/src/counter_74163.vhd.
vhdl/src/counter_74590.vhd.
vhdl/src/adder.vhd 7480 full adder.
vhdl/src/ram.vhd static RAM or ROM with contents preloaded from a file.
vhdl/test/* testbenches for all those components.
vhdl/test/bfm.vhd first unsuccessful CPU.
vhdl/test/bfm2.vhd CISC Von Neumann version, works but slow and uses many chips.
vhdl/test/bfm3.vhd final RISC Harvard design.

simulation wave_big.png

All VHDL code is compiled using open source GHDL simulator. It takes about 10 minutes for GHDL to draw the first line of the Mandelbrot!

Build

My prototype is a straightforward implementation of my final RISC BrainFuck CPU design.

I planned to use 2 32kB SRAM chips, backed with a lithium battery, as program memory. I failed to build the battery switching circuit myself (it was corrupting some bytes) and didn't have time to order one, so I used a big old 27C1024 UV EPROM memory as a temporary solution.

Circuit diagram and PCB were both made using Kicad.

Clock speed

My CPU runs at 2 MHz, executing one instruction per clock-cycle. With SRAM in lieu of 27C1024 EPROM (100ns access time), I'll be able to go up to 10 MHz.

The Mandelbrot fractal takes about 2 minutes to be drawn.

IO

I used a 16550 UART chip so that my CPU can send an receive bytes of data through a serial cable. It is a pretty complex chip, which is not part of the 7400 series but also it is not part of my CPU, just a peripheral.

Chips

Name	Quantity	Function
W24257A	1	RAM: 32k x 8 bit
M27C1024	1	ROM: 64k x 16 bit
74HC163	4	PC register: 16 bit, up counter, synchronous preload
74HC241	2	control signals buffer (selects between jump and other instructions)
74HC191	4	P register: 16 bit, up/down counter
74HC191	2	A register: 8 bit, up/down counter
74HC241	1	A register to databus buffer
16C550	1	UART chip to communicate with the outside world
74HC241	1	UART status to databus buffer

Total 74XX logic chips: 14.

Total cost of components : around 10$ (the UART is the most expensive chip).

Everything is available from at least two manufacturers at Farnell, Digikey and Mouser. (Except for the 27C1024 which will be replaced by cheap SRAM in the final design.)

EPROM programming

I didn't have a programmer to flash the 27C1024 chip, so I made one myself. It's a quick and dirty Arduino shield using 4 74LS373 latches and 2 74LS244 buffers (74 series again!) to obtain 16 bits of bidirectional IO for data and 16 bits of output for address.

programmer

Source code for the programmer is inclided.

Circuit Diagram

circuit

sch.pdf

PCB

pcb

Unfortunately, I didn't have the time to finish the final PCB with CMS components.

To do

Finish the PCB.
Replace 27C1024 ROM by two 8 bit SRAM chips backed by a battery.
Add circuitry to program the program memory in place.
Try to replace the UART chip with 7400 logic chips.

Conclusion

It is possible to design a relatively powerful processor with a very small amount of logic circuitry, provided that the architecture and the instructions are carefully built for a concrete application.

74 logic circuits are still very useful today, new families with better parameters come out, old circuits are removed and new ones added.

Thanks to this project, I learned a lot about CPU design, logic circuits and logic simulation. I hope it will inspire other projects.

Files

It is 4 AM GMT as I write this article, my sources aren't as clean and documented as I'd like them to be. I will try to correct this in a few days.

bfm.tar.gz

License

All work related to this project is placed into public domain.

31st october 2012, Alexis Bezverkhyy alexisSTOPSENDINGMESHIT@grapsus.net