Team Size -- 3
Individual Role -- Pipeline Designer/Developer
Software & Languages -- System Verilog
Ever since the transistor was invented, the power of digital computers has grown exponentially, roughly tracking Moore's Law. But beyond cramming more and more switches onto smaller surface areas, what other techniques are used to make a CPU go faster? That is, how is a CPU designed at the layout and wiring level so that it executes and completes the most instructions in the fewest clock cycles? One key technique is pipelining, where multiple instructions are processed at the same time, each in a different phase of execution. This is why building a pipelined CPU is a capstone project for computer engineers such as myself.
To understand how a pipelined CPU is designed, I'll first explain how a simple CPU functions. A CPU is essentially a fancy Finite State Machine, or FSM: it has various states and transitions between those states. While the CPU is in a certain state, various logic gates and operations do that state's work before the CPU is allowed to transition to the next state. Most of these states are short and simple, like fetching the next instruction or performing arithmetic. Others are more complicated, like decoding what the instruction wants to do. The goal is to transition between states within the FSM as fast as possible, ideally once per clock cycle.
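The state-machine view above can be sketched in a few lines of SystemVerilog. This is a minimal illustration with made-up state names, not our team's actual control module: one instruction occupies the whole machine, advancing one state per clock cycle.

```systemverilog
// Minimal sketch of a multicycle CPU control FSM (illustrative state
// names). The machine walks one instruction through fetch, decode,
// execute, and write-back before starting the next one.
module cpu_fsm (
    input  logic clk,
    input  logic rst
);
    typedef enum logic [1:0] {FETCH, DECODE, EXECUTE, WRITEBACK} state_t;
    state_t state, next_state;

    // State register: one transition per clock cycle.
    always_ff @(posedge clk) begin
        if (rst) state <= FETCH;
        else     state <= next_state;
    end

    // Next-state logic: each state hands off to the following phase.
    always_comb begin
        unique case (state)
            FETCH:     next_state = DECODE;
            DECODE:    next_state = EXECUTE;
            EXECUTE:   next_state = WRITEBACK;
            WRITEBACK: next_state = FETCH;   // begin the next instruction
        endcase
    end
endmodule
```

The key limitation this sketch makes visible: while one instruction is in EXECUTE, the fetch and decode hardware sit idle, which is exactly what pipelining fixes.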
Pipelining removes that one-instruction-at-a-time limitation by letting the CPU work on multiple instructions simultaneously, and it can speed up the CPU by a significant margin if done correctly. The pipeline is divided into several "stages," where the CPU processes each instruction according to the stage it occupies and what that stage requires. Our team designed a 5-stage pipeline with Fetch, Get Ops, Execution, Memory, and Write Back stages. Between each pair of stages sits a set of registers (data holders) that passes the relevant information forward from one stage to the next. These registers allow multiple instructions to exist within the pipeline at the same time, one per stage, so several instructions are processed simultaneously. Below is a wire diagram overview of our team's initial pipelined CPU design, split into 3 images.
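One of those inter-stage register banks might look like the sketch below. The signal names and widths are hypothetical stand-ins, not taken from our actual design; the point is that each stage boundary latches its inputs every clock edge, which is what lets up to five instructions be in flight at once.

```systemverilog
// Sketch of one inter-stage pipeline register bank, here between the
// Fetch and Get Ops stages (hypothetical signal names and widths).
// Every stage boundary in the 5-stage pipeline has a bank like this.
module fetch_getops_reg (
    input  logic        clk,
    input  logic        rst,
    input  logic [31:0] instr_in,   // instruction fetched this cycle
    input  logic [31:0] pc_in,      // program counter of that instruction
    output logic [31:0] instr_out,  // handed to the Get Ops stage
    output logic [31:0] pc_out
);
    always_ff @(posedge clk) begin
        if (rst) begin
            instr_out <= '0;  // clear to a harmless value on reset
            pc_out    <= '0;
        end else begin
            // Latch this cycle's fetch results; next cycle the Get Ops
            // stage consumes them while Fetch works on a new instruction.
            instr_out <= instr_in;
            pc_out    <= pc_in;
        end
    end
endmodule
```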
The team was also challenged to make the CPU faster in order to hit a certain clock cycle benchmark. I was chosen to create a second level of cache memory to make reading and writing take less time. Caches are small, fast memories that hold copies of data taken from normal memory (RAM). Where interfacing with RAM can take dozens if not hundreds of clock cycles, interfacing with a cache is designed to take single-digit cycles, making caches significantly faster at the expense of capacity. The CPU was originally designed with a single level of cache, but adding a second, larger level of cache memory lets the benchmark program spend less time querying RAM, since the required information is often already stored in the second cache level. Implementing this second cache mirrored how the first cache was added to the CPU, so it was as simple as taking the existing cache design, expanding it, and slotting it into the space between the level 1 cache and the RAM.
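The core of any cache level is the hit check: split the address into a tag and an index, and compare the stored tag at that index. The sketch below shows that check for a direct-mapped L2 cache; the sizes, names, and direct-mapped organization are illustrative assumptions, not a description of our actual cache.

```systemverilog
// Sketch of the hit check in a direct-mapped L2 cache (all parameters
// and names are illustrative). The L2 sits between the L1 cache and
// RAM: on an L1 miss, the address is looked up here before going to RAM.
module l2_cache_hit #(
    parameter int INDEX_BITS = 10,                 // 1024 lines: larger than L1
    parameter int TAG_BITS   = 32 - INDEX_BITS - 2 // word-aligned 32-bit addresses
) (
    input  logic [31:0] addr,
    output logic        hit
);
    // Tag and valid storage, one entry per cache line.
    logic [TAG_BITS-1:0] tags  [2**INDEX_BITS];
    logic                valid [2**INDEX_BITS];

    logic [INDEX_BITS-1:0] index;
    logic [TAG_BITS-1:0]   tag;

    always_comb begin
        index = addr[INDEX_BITS+1:2];   // pick the line for this address
        tag   = addr[31:INDEX_BITS+2];  // upper bits identify the block
        hit   = valid[index] && (tags[index] == tag);
    end
endmodule
```

Expanding the L1 design into an L2 is mostly a matter of growing INDEX_BITS (more lines, more capacity) and wiring its miss path to RAM instead of to another cache.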
Pipelined CPU Design Parts 1-3