[LMH]FPGA / microcode

Eric Blossom eb@comsec.com
Fri Mar 5 10:21:01 2004


On Fri, Mar 05, 2004 at 12:04:12AM -0800, Jaap Weel wrote:
> On 4 Mar 2004, at 18:57, Dan Moniz wrote:
> >On Thursday, March 04, 2004 7:01 PM +0000 Robert Swindells 
> ><rjs@fdy2.demon.co.uk> wrote:
> >>I was only half joking about translating it to VHDL.
> >>[...]
> >>The whole CPU would fit into a $15 FPGA using current technology, add
> >>a couple of 16bit SDRAMs and some boot flash and you have a 10x faster
> >>microExplorer.
> >
> >[...]
> >Of course, there's the longer term problem of getting a monitor and 
> >input devices to talk to it, which would be painful, and disks, file 
> >systems, a running install, oh my!
> >Probably, now that I think about it, an FPGA-based Explorer processor 
> >on a PCI card, with some interface glue would be the way to go.


I've considered the FPGA idea myself a few times.  It would be fun ;-)

One thing to remember is that by today's standards, the Explorers
have dinky address spaces.  24MW if I recall correctly.  I think that
a blind re-implementation of the architecture would be a waste of
time.  Why spend all that effort and end up with something with a
dinky address space?

Another thing to consider is the basic "back of the envelope
calculation".  For the sake of argument, assume that you're using a
Spartan 2E or Spartan 3 (the Spartan 3 has bunches of embedded
multipliers; you could really speed up your bignum multiplies with
them ;-)).  Using the lower speed grades (much cheaper!) you can
probably run them at about 125 MHz (hands waving wildly...).  That
gives you a 125 MHz microinstruction cycle.  Of course, you're free to
build your memory subsystem as wide as you like, so make it 40 bits,
48 bits, or 64 bits wide...
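To put the width argument in numbers, here's a quick sketch.  The
125 MHz clock and the one-full-width-access-per-microcycle assumption
are the hand-waved figures from above, not measured specs:

```python
CLOCK_HZ = 125e6  # assumed low-speed-grade FPGA clock (hand-waved)

def peak_bandwidth_mb_s(width_bits, clock_hz=CLOCK_HZ):
    """Peak memory bandwidth, assuming one full-width access per
    microinstruction cycle."""
    return clock_hz * (width_bits / 8) / 1e6

for width in (40, 48, 64):
    print(f"{width}-bit wide memory: {peak_bandwidth_mb_s(width):.0f} MB/s peak")
```

So even the modest 40-bit case gives you 625 MB/s of peak bandwidth
to play with, and going to 64 bits buys you 1 GB/s.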

Now, on the other hand, you're competing against 3 GHz Pentiums or
AMD64 Opterons running at 2.6 GHz or so.  All of these have an
incredible amount of internal parallelism.  You can easily get 4 or 5
operations going per cycle (this includes address generator
updates, ALU ops, floating point adds, mults, loads, etc).  Without
working hard, this gives you an easy 12G ops/second.  It's going
to be really hard to get your 125 MHz FPGA implementation to even
begin to keep up with these monsters.
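The comparison above, spelled out (again, the 125 MHz clock and the
4 ops/cycle figure are the hand-waved assumptions from this thread,
not measurements):

```python
# Hand-waved throughput comparison, FPGA microengine vs. superscalar CPU.
fpga_ops_s = 125e6 * 1   # 125 MHz, one microinstruction per cycle
cpu_ops_s = 3e9 * 4      # 3 GHz superscalar, ~4 ops sustained per cycle

print(f"FPGA: {fpga_ops_s / 1e9:.3f} Gops/s")
print(f"CPU:  {cpu_ops_s / 1e9:.1f} Gops/s")
print(f"CPU advantage: ~{cpu_ops_s / fpga_ops_s:.0f}x")
```

Roughly two orders of magnitude; you'd need a lot of architectural
cleverness (wide memory, tag checks in parallel, etc.) to claw much
of that back.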

Perhaps the way to go would be to define
yet-another-lisp-virtual-machine, or figure out what Symbolics used (or
just use Common Lisp with lots of declarations!) and target the AMD64
architecture.  It's got twice as many registers as the x86, lots of
address space, the chips keep getting faster and cheaper all the time,
and you can order one today and have it delivered tomorrow.

If people really like the idea of running near the metal, consider
using something like the L4 microkernel or EROS as the bottom layer.

      http://os.inf.tu-dresden.de/L4/
      http://www.eros-os.org/

EROS implements a persistent single-level store, which would be ideal
for lisp.  This means everything is addressed the same way, and all
of memory is persistent.  There is no "file system", just a directory
of objects.  EROS is capability-based as well, which is very cool but
not directly relevant to this discussion.

> I don't know if this is applicable, but hey, you never know. When they 
> made OpenGenera (you know, the virtual Symbolics-on-Alpha), they didn't 
> give it any intrinsic knowledge about DEC's hardware. All it knew how 
> to do was to talk on the network. It would then find its filesystem 
> through NFS, display its interface through X11, and in various other 
> ways leverage the fact that there was always a copy of OSF/1 to take 
> care of bit-diddling IO. It's not very purist, but you could do a lot 
> with a network interface, while a $200 Walmart FORTRAN Machine can take 
> care of I/O.

This seems like a reasonable approach to me.  Otherwise you're always
going to be "behind the power curve" when it comes to supporting new
peripheral hardware.  The host OS already handles all of that; let's
just use it!

Note that this is how the microExplorers work too.  NFS and RPC to
access the file system and display/mouse/keyboard respectively.

The other approach would be to take a decent compiled CL
implementation (e.g., SBCL) and start moving it closer to the metal.
Let's not forget that compilers are a lot smarter now than they were
15 years ago.  I personally think that experimenting with a machine
with persistent memory would be fun.

FYI, there's a port of SBCL to the AMD64 underway.

Eric