Thanks for the memory

Wednesday 9 May

One of the things I want to explore with the system is memory management. The 8-bit machines have a 64k limit on memory address ranges, and having 32MB of SDRAM attached means we have a lot of playground to play in…

The basics:
The obvious first step is to allow access to a range of memory. This will be possible using a memory-descriptor that the 8-bit can fill in, and subsequent memory accesses within the range of that descriptor will be fulfilled by the SDRAM instead of the on-board SRAM.

Memory will be made available on a page-by-page basis (ie: any contiguous available set of pages in the 8-bit memory space will be mappable from SDRAM). There are memory ranges which the external bus can't override, but in general it will be possible to source any normally-accessible RAM address from the SDRAM. The memory descriptor will have:

  • a four-byte source-address range (expressed in bytes),
  • a one-byte destination address range, expressed in pages,
  • a one-byte size, expressed in pages.
This means that by modifying just one register in the source address, everything can shift by up to 255 bytes, and it takes changing a maximum of 4 registers to remap the pages to any location in SDRAM. All well and good: this is a simple linear memory mapping.
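
As a rough sketch of that linear mapping (the function and parameter names here are made up for illustration, not part of any real firmware), translating a host address into an SDRAM address could look like this:

```python
PAGE_SIZE = 256  # bytes per 6502 page


def linear_translate(host_addr, sdram_base, host_page, num_pages):
    """Return the SDRAM byte address backing host_addr.

    Returns None when host_addr falls outside the mapped aperture, in which
    case the access would be served by on-board RAM as usual.
    All names are hypothetical; the fields mirror the descriptor above.
    """
    start = host_page * PAGE_SIZE
    end = start + num_pages * PAGE_SIZE
    if not (start <= host_addr < end):
        return None
    return sdram_base + (host_addr - start)
```

With 8 pages mapped at host page $40, an access to $4123 lands at `sdram_base + 0x123`, and bumping `sdram_base` by one shifts every byte seen through the aperture by one.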

Graphics uses:

Now that we have a basic aperture-of-memory available, we can try some more things - we can add a 'stride' to the memory descriptor, which defines the length of a horizontal line in the backing memory. Consider setting up a memory-descriptor structure like

$+0000 [4] : Base address in SDRAM for start of memory aperture
$+0004 [1]: Page address in host-memory for start of memory aperture
$+0005 [1]: Number of pages to map
$+0006 [1]: 'stride', or pages per horizontal line
$+0007 [2]: width in bytes of each virtual line

Now, whenever an address is requested within the aperture of {destination+size} pages, the actual byte returned is discovered by using:

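offset = address - page * 256               // 'page' is the host page address at $+0004
X = offset % width                          // with the % operator being the standard 'C' modulus operator
Y = offset / width                          // integer division
byte = base + Y * stride * 256 + X          // 'base' is the SDRAM byte address at $+0000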

The stride and width allow us to specify a longer virtual horizontal length than the physical linear map would allow, and yet we can map this into a linear space in host RAM. Again, moving everything left or right by a byte is a matter of changing the base address by 1, and moving everything up or down by a line is a matter of changing the base address by 'stride * 256' bytes.
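
Here is a minimal sketch of the strided lookup, assuming the SDRAM base is a byte address and the host start is a page number (as in the descriptor layout). The function name and argument order are my own invention for illustration:

```python
def strided_translate(host_addr, sdram_base, host_page, num_pages,
                      stride_pages, width):
    """Map a host address to an SDRAM byte address using stride and width.

    The host sees `width` bytes per line packed back to back, while each
    line in SDRAM starts `stride_pages * 256` bytes after the previous one.
    Returns None outside the aperture. All names are hypothetical.
    """
    offset = host_addr - host_page * 256
    if not (0 <= offset < num_pages * 256):
        return None
    y, x = divmod(offset, width)  # line number and byte within the line
    return sdram_base + y * stride_pages * 256 + x
```

For example, with a 40-byte line width and a stride of one page, host offset 40 (the first byte of the second line) comes from `sdram_base + 256`. Adding 1 to `sdram_base` scrolls the whole view right by a byte, and adding `stride_pages * 256` scrolls it down by a line.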

This is similar to how Antic can have longer lines in memory and then update display-list LMS values rather than copy data to effect a visual change.


One other point is that there are 32 Megabytes of SDRAM available. That's an enormous amount of space, and even when you start taking chunks of it for each peripheral and giving them dedicated i/o space, you're still left with at least 31 Megabytes of space, or 496x the entire address space of a 6502. Clearly there ought to be services available for the use of this memory.

One thing we can exploit is that the memory is dissociated from the host computer's bus. Sure, the host has read/write access to a section at a time, and it's probably worth staying away from that while it's mapped into host space, but the rest doesn't have to be static.

Consider allocating a 2k memory-mapping into host space that extends for 512k of SDRAM, and just update the memory-descriptor to "scroll" through that mapping by bumping the memory descriptor by 512 bytes every VBLANK - that's just a few register changes.

Why do I choose those values? Well, this thread at AtariAge uses some clever coding to produce simply stunning 8-bit pulse-code-modulated audio from an Atari XL at 44.1kHz. The audio itself will take a lot of storage, obviously, and at 44.1kHz and 50Hz VBLANK, 882 bytes are consumed every 1/50th of a second (it's similar for 60Hz NTSC: 735 bytes per VBLANK there). That means it ought to be possible to store the PCM audio in SDRAM, and pretend to have a linear buffer in host-ram that extends for the size of the audio in SDRAM, with judicious pointer/memory-descriptor manipulation on every VBLANK.
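
A quick back-of-the-envelope check of those rates, assuming 8-bit mono samples (one byte per sample); the helper names are just for illustration:

```python
SAMPLE_RATE = 44_100  # samples per second; one byte each for 8-bit mono


def bytes_per_vblank(vblank_hz):
    """Bytes of PCM data consumed between two vertical blanks."""
    return SAMPLE_RATE // vblank_hz


def seconds_buffered(buffer_bytes):
    """How long a buffer of the given size lasts at the sample rate."""
    return buffer_bytes / SAMPLE_RATE


pal = bytes_per_vblank(50)               # 882 bytes per frame
ntsc = bytes_per_vblank(60)              # 735 bytes per frame
whole = seconds_buffered(512 * 1024)     # a little under 12 seconds
```

So a 512k region of SDRAM holds a little under 12 seconds of audio before it has to be refilled.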

All well and good.

Now let's say, when we get to the point 256k into the 512k section, we ask for two things to happen

1. Memory Blit: we copy the last 2k of the 512k to the start of the 512k section. This gives us a buffer that means we can switch pointers and nothing has appeared to happen, but we're now in the first half of the memory buffer, not at the tail end.

2. Memory Fill: we request that the SD card copy from file /path/to/somefile.ext on the SD card to {memory-descriptor base + 2k}, with a length of {254k}, and an offset of {x}, where x increases by 254k every time we call the copy op.
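
The two steps above can be modelled in a few lines. This is only a sketch under my own assumptions: the 512k region is a `bytearray`, the file is a `bytes` object, and the first refill starts at the file offset just past what the initial load put in the buffer. None of these names correspond to a real API:

```python
K = 1024
BUF = 512 * K    # size of the SDRAM region backing the audio
WIN = 2 * K      # size of the window mapped into host space
HALF = 256 * K   # the point at which the blit and fill are requested


def blit_tail_to_head(buf):
    """Step 1: copy the last 2k of the region over its first 2k."""
    buf[0:WIN] = buf[BUF - WIN:BUF]


def fill_from_file(buf, data, file_offset):
    """Step 2: refill {base + 2k} with up to 254k from the file.

    Returns the file offset to use for the next fill.
    """
    chunk = data[file_offset:file_offset + (HALF - WIN)]
    buf[WIN:WIN + len(chunk)] = chunk
    return file_offset + len(chunk)


def wrap_pointer(pos):
    """Move a play position in the last 2k to the same offset in the first 2k."""
    return pos - (BUF - WIN) if pos >= BUF - WIN else pos
```

After the blit, a play position anywhere in the final 2k can be swapped for `wrap_pointer(pos)` without the data under it changing, and the bytes that follow it (from 2k onwards) are the freshly loaded next part of the file.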

The blit allows us to switch pointers at any point in the last 2k, to point to the same offset in that 2k but now at the start of the buffer, making for a seamless transition. The read (which will happen in the background, the host computer being blissfully unaware apart from a status byte) will mean the next section of the song is available once it finishes.

Keep ping-ponging between the two halves of the memory-descriptor, reading into one half while we play audio from the other, and we can play literally gigabytes of PCM audio on an 8-bit computer.


I'm envisioning the 'blit' operation as taking two memory descriptors and copying the contents of the first into the second. In the basic form, it's just a memory copy, but blits could also have operations associated, allowing masks and effectively getting software sprites. Given the memory-freedom, handling horizontal shifts of less than 1 pixel could be done by having multiple pre-shifted sprites and large enough backing defined in the memory-descriptors. In this case the memory descriptor starts to look like:

$+0000 [4] : Base address in SDRAM for start of memory aperture
$+0004 [1]: Page address in host-memory for start of memory aperture
$+0005 [1]: Number of pages to map
$+0006 [1]: 'stride', or pages per horizontal line
$+0007 [2]: X position in bytes within the defined descriptor range
$+0009 [2]: Y position in lines within the defined descriptor range
$+000B [2]: W (width) in bytes within the defined descriptor range
$+000D [2]: H (height) in lines within the defined descriptor range

... which fits nicely into 16 bytes with a byte spare for future expansion. Dedicate a page to descriptors, and you have room for 16 of them, any two of which can be handed to a blit using the software TRAP-style call {routine, descriptor-1-id, descriptor-2-id, operation} from the host to the XMOS chip. Obviously one of those descriptors will be the screen memory if you're blitting to screen-RAM, thus reducing the total down to 15, however there's no reason those can't be re-used, of course...
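
To sanity-check the layout, here is a sketch that packs the descriptor with Python's `struct` module. The little-endian byte order is my assumption (it matches the 6502's convention), and the function names are hypothetical:

```python
import struct

# 4-byte SDRAM base, three 1-byte fields, four 2-byte fields, 1 spare pad byte.
# Little-endian is an assumption, chosen to match the 6502.
DESCRIPTOR = struct.Struct("<IBBBHHHHx")

SLOTS_PER_PAGE = 256 // DESCRIPTOR.size  # 16 descriptors in one 256-byte page


def pack_descriptor(sdram_base, host_page, num_pages, stride, x, y, w, h):
    """Return the 16-byte image of one memory descriptor."""
    return DESCRIPTOR.pack(sdram_base, host_page, num_pages, stride, x, y, w, h)


def descriptor_offset(descriptor_id):
    """Byte offset of a descriptor within the dedicated descriptor page."""
    if not 0 <= descriptor_id < SLOTS_PER_PAGE:
        raise ValueError("descriptor id out of range")
    return descriptor_id * DESCRIPTOR.size
```

The fields add up to 15 bytes, the pad byte brings each descriptor to 16, and 256 / 16 gives 16 slots in the page, with ids 0 to 15.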

I'm sure there'll be more services (JPEG decode, MP3 decode, … ?) given that we have a CPU attached to the SDRAM, and can exploit that independently of the 8-bit host, but these are the fundamental ones that come to mind that will allow 8-bits to explore a lot more memory than they would otherwise be restricted to.