MCLR5 Quad-issue Superscalar RISC-V initial results

I'm working on a quad-issue superscalar RISC-V processor at the moment, and I thought I would share an exciting milestone.

This is a simulation snippet of four RISC-V ADDI r1,r1,0x1 instructions in a row. The core fetches and executes all four instructions simultaneously and writes the final result back to r1, all in one clock cycle. Neat!

The MCLR5 is a quad-issue superscalar RISC-V processor core with single-cycle instruction timing. There are four combinational RISC-V ALU cores which process four consecutive instructions and can update up to four registers per clock cycle. The core should come close to an aggregate IPC of four.
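
To make the single-cycle behaviour concrete, here is a minimal Python sketch of four dependent ADDI instructions retiring in one modelled clock cycle. It is a behavioral illustration only, not the MCLR5 RTL. The names `addi` and `issue_group`, and the assumption that each ALU sees the results of the ALUs ahead of it within the same cycle, are mine.

```python
# Behavioral sketch only; this is not the MCLR5 RTL.
# Assumption: within one issue group, each ALU sees the results
# produced by the earlier ALUs in that same group.

def addi(regs, rd, rs1, imm):
    """Return a copy of regs with rd = rs1 + imm (32-bit wrap, x0 stays 0)."""
    out = list(regs)
    if rd != 0:
        out[rd] = (regs[rs1] + imm) & 0xFFFFFFFF
    return out

def issue_group(regs, group):
    """Apply up to four ADDIs as one modelled clock cycle."""
    assert len(group) <= 4
    for rd, rs1, imm in group:
        regs = addi(regs, rd, rs1, imm)
    return regs

regs = [0] * 32
program = [(1, 1, 1)] * 4      # four ADDI r1,r1,0x1 in a row
cycles = 0

regs = issue_group(regs, program)
cycles += 1

print(regs[1], cycles)         # 4 1
```

In this model r1 advances by four per cycle, which matches the behaviour described for the simulation snippet.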

 

Quadissue1

MCL65 running Apple II+ Programs

I uploaded some videos of the system running a few applications and games. My hope was to test the MCL65 on a variety of programs that could demonstrate both the instruction accuracy and the cycle accuracy of the core.

MicroCore Labs YouTube Videos

The MCL65 is an ultra-small-footprint, microsequencer-based, 100% instruction-set-compatible, cycle-exact NMOS 6502 core that can be implemented in any FPGA or ASIC technology. It can use as little as 252 LUTs (0.77%) of a Xilinx Spartan-7 FPGA. It has also been ported to a Xilinx Spartan-3 device, where it uses about 10% of the part.

The MCL65 is instruction-set compatible with the original NMOS version of the 6502, the processor used in computers and game machines such as the Commodore VIC-20, Apple II, Atari 2600, and Commodore 64, as well as many others.

Key Features:

100% compatible with the NMOS 6502 instruction set
Cycle-exact with the original processor
All signals from the original DIP-packaged CPU are supported, such as SO, SYNC, INT_n, and NMI_n
Bus timing is identical to the original 6502; all over-fetches, read/write sequences, and addressing-mode wrapping/errors are supported
BCD (Binary Coded Decimal) addition and subtraction are supported
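
As an illustration of what decimal-mode addition involves, here is a minimal Python sketch of a packed-BCD byte add with carry. It assumes both operands are valid BCD and does not model the NMOS 6502 flag behaviour or invalid-digit cases. The helper name `bcd_add` is hypothetical and is not taken from the MCL65 microcode.

```python
# Sketch of packed-BCD addition on one byte (two decimal digits).
# Assumption: both inputs are valid BCD; flag quirks are not modelled.

def bcd_add(a, b, carry_in=0):
    """Return (result_byte, carry_out) for a + b + carry_in in packed BCD."""
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    if lo > 9:
        lo += 6                      # decimal-adjust the low digit
    hi = (a >> 4) + (b >> 4) + (1 if lo > 0x0F else 0)
    if hi > 9:
        hi += 6                      # decimal-adjust the high digit
    carry_out = 1 if hi > 0x0F else 0
    return ((hi & 0x0F) << 4) | (lo & 0x0F), carry_out

result, carry = bcd_add(0x19, 0x28)
print(hex(result), carry)            # 0x47 0
```

Here 19 + 28 gives 47 with no carry out, while 99 + 01 would wrap to 00 with the carry set.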

The MCL65 6502 core is an embedded processor core implemented with a high-performance 32-bit microsequencer that can use as little as 252 Xilinx LUTs and two block RAMs in a Spartan-7 FPGA. The core is 100% compatible with the original processor and is designed to be cycle-exact, which allows it to be used in applications where firmware cycle timing is critical.

The core was tested on a Commodore VIC-20, an Apple II Plus, and an Atari 2600.

Here are a few pictures I took of the system in action in the Apple II Plus.

[Images: 20171007_104020, 20171007_103834, 20171007_103849, 20171007_110949, 20171007_111720, 20171007_111729.jpg, Utilization]

MCL65 works in Apple II+

I received the Apple II+ in the mail today, but it did not come with any diskettes. I used a terrific tool, ADTPro, to transfer disk images from my PC over to the Apple using the cassette port. It is slow but works great! I was able to transfer over DOS 3.3 and a few games such as Castle Wolfenstein, Zaxxon, and Lode Runner. They all appear to work fine with the MCL65, and I will take some pictures and video in a day or so.

MCL65 Working!

The MCL65 is currently running inside of a Commodore VIC-20 computer!  I have no game cartridges at the moment, so I am just running the classic a=a+1 BASIC counting program.

I am using a Digilent Arty S7 board which has a Xilinx Spartan-7 XC7S50. The core utilizes about 0.77% of the device!
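
The 0.77% figure can be checked against the 252-LUT count quoted for the core elsewhere on this page. The short calculation below assumes the XC7S50 provides 32,600 LUTs, a number taken from the Spartan-7 family data rather than from this post.

```python
# Assumption: 32,600 LUTs in an XC7S50 (Spartan-7 family data, not this post).
XC7S50_LUTS = 32_600
MCL65_LUTS = 252          # LUT count quoted for the MCL65 core

utilization = 100 * MCL65_LUTS / XC7S50_LUTS
print(f"{utilization:.2f}%")   # 0.77%
```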

The MCL65 is designed to be cycle-exact to the original MOS 6502 microprocessor, so it should be able to run timing-dependent computers like the Apple II. (I believe the disk controller requires specific instruction cycle timing.) Hopefully I can get one of these machines soon to give it a try.

I also hope to test the core on an Atari 2600 and a Commodore 64.

Pictures and videos will be coming soon!

 

World’s fastest IBM PCjr

I added 128KB of memory inside the FPGA, disabled the MCL86 cycle compatibility with the original 4.77 MHz 8088 processor, and got some interesting results:

img_4510

If these speed test results are to be believed, then this IBM PCjr is many times faster than the original IBM PC XT and, for some tests, even faster than the 6 MHz IBM PC AT.

img_4509

img_4513

I am using DOS 2.1 and PCJRMEM.COM /C to allow these test programs to run from the upper/faster 128KB of memory.

It is interesting that Norton Utilities SI.EXE now thinks the processor is a NEC V20.  I think this may have something to do with the prefetch queue and the speed at which it fills when running programs from the upper/fast memory.

The lower 128KB of physical DRAM is accessed in the normal fashion, taking four to six 4.77 MHz clock cycles. The upper 128KB is located inside the FPGA and is accessed in a number of 100 MHz clock cycles, so it is many times faster than the 4.77 MHz local bus.

The MCL86 clock cycle compatibility mode is turned off once the PCjr exits its POST. This means that once the microsequencer finishes processing an instruction, it immediately fetches the next one. With cycle compatibility turned on, the microsequencer will pause for the same number of 4.77 MHz clock cycles that the original processor takes for that instruction.
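
The effect of the compatibility mode on a single instruction can be sketched roughly as follows. This is a simplified timing model, not the MCL86 implementation: the function name is hypothetical, the cycle counts in the example are made up, and bus accesses to the motherboard are ignored.

```python
# Simplified timing model; not the MCL86 implementation.
CORE_HZ = 100_000_000    # microsequencer clock quoted in the post
BUS_HZ = 4_770_000       # original 8088 clock quoted in the post

def instruction_time_ns(core_cycles, original_cycles, cycle_compatible):
    """Time one instruction takes, with or without cycle compatibility."""
    core_ns = core_cycles * 1e9 / CORE_HZ
    if not cycle_compatible:
        return core_ns                       # fetch the next one immediately
    original_ns = original_cycles * 1e9 / BUS_HZ
    return max(core_ns, original_ns)         # pause to match the original

# Made-up instruction: 12 core cycles, 4 original 8088 cycles.
fast = instruction_time_ns(12, 4, cycle_compatible=False)
exact = instruction_time_ns(12, 4, cycle_compatible=True)
print(f"{fast:.0f} ns without compatibility, {exact:.0f} ns with it")
```

With these illustrative numbers the instruction finishes several times sooner when the pause is removed, which is the kind of gap the speed tests above are measuring.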

Is this the world’s fastest IBM PCjr?  🙂

Please visit us at: www.MicroCoreLabs.com for more information.

 

IBM PCjr running MCL86 with “Minimum Mode” BIU.

I just finished porting the MCL86 microsequencer-based 8088 core with a minimum-mode BIU (Bus Interface Unit) to a Xilinx Kintex-7 FPGA for use in an IBM PCjr!  Here are some videos of the PCjr in action:

IBM PCjr Music running on the MCL86 microsequencer based 8088 FPGA core

IBM PCjr Minuet running on the MCL86 microsequencer based 8088 FPGA core

 

The Xilinx Kintex-7 IOs are not 5V tolerant, so I added a Lattice ispMACH 4256ZE to translate between the Kintex and the PCjr’s motherboard.

The MCL86 core combined with the minimum-mode 8088 BIU consumes 1.5% of the Kintex-7 70T FPGA! Four block RAMs are used to hold the microcode.

img_4500

 

It is interesting to note that when I disabled the cycle accuracy of the MCL86 core, the PCjr would no longer pass its POST. It would always beep twice and then enter a HALT state. It appears that there is a test that depends on a particular completion time. Perhaps a timer test?

What I did to bypass this was to hold off disabling cycle accuracy until the first NMI is received, which should happen either at the end of the POST when the user makes the first key-press, or when the PCjr receives the keyboard “I am OK” information during its POST.

The PCjr runs noticeably faster when cycle accuracy is disabled. The disk drive seek is faster, and the warning bell that sounds when you use the keyboard while the CPU is busy has a higher pitch.

Here is the Norton Utilities SI.EXE program running when cycle accuracy is enabled.
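
Read together with the POST failure described above, the workaround amounts to a one-way latch: cycle accuracy stays on from reset and is switched off for good once the first NMI arrives. The Python sketch below is my interpretation of that behaviour, with a hypothetical class name, and is not the actual FPGA logic.

```python
# Interpretation of the workaround as a one-way latch; not the actual FPGA logic.

class CycleAccuracyGate:
    """Keeps cycle accuracy enabled until the first NMI is seen."""

    def __init__(self):
        self.cycle_accurate = True       # enabled at reset so POST can pass

    def clock(self, nmi):
        """Sample the NMI input once per clock; return the current mode."""
        if nmi:
            self.cycle_accurate = False  # latch off; never re-enabled
        return self.cycle_accurate

gate = CycleAccuracyGate()
trace = [gate.clock(nmi) for nmi in (False, False, True, False)]
print(trace)                             # [True, True, False, False]
```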

img_4484

 

Below is a close-up of the FPGA setup. From the left, there is a DIP clip attached to the 8259 Interrupt Controller so I can probe the 8088 data bus. Next to this is the 8088 adapter, which is wired to the Lattice ispMACH 4256ZE breakout board that translates between the 5V motherboard and the 3.3V Xilinx Kintex FPGA. The Lattice board is wired to a board that contains the Kintex-7 FPGA.

 

img_4496

My next project will be to integrate some of the PCjr’s memory into the FPGA, where it can be accessed at the processor’s core speed, to see how much performance I can squeeze out of a PCjr! The MCL86 core is a 16-bit processor and runs at 100 MHz, so in theory it should be able to access an on-FPGA RAM more than 10X faster than memory on the motherboard!

Please visit us at: www.MicroCoreLabs.com for more information.
