The Atari XE host interface
As mentioned in the overview, the host interface revolves around an XMOS chip that monitors the parallel bus, and decides what to do when the bus address/data lines change. The PCB looks like:
The parallel bus actually has some pretty tight timing requirements, mainly because of how late the 8-bit host puts the address onto the lines. With that in mind, I decided the best option was to integrate some local memory onto the host interface, and allow cards to "upload" their device-drivers to this memory.
It seemed to me, though, that having a PBI interface to memory was an opportunity not to be missed: as long as the memory timings can be maintained, any source of memory can be bolted on, and given the signals available, memory paging can be very flexible. I chose to add a 32 MByte SDRAM chip as the memory interface. Peripheral cards can talk to 64k of "their own" on-board RAM without any intervention from the 8-bit, and a register-based memory-mapping facility in $D1xx lets the running application map address ranges of arbitrary size onto the external SDRAM via the EXTSEL/MPD lines.
I also thought it would be kind of cool if there was a "backup" facility for the memory, which could then be exploited by applications that knew of it, so there's also an SD-card slot on the host interface, providing for fast (>3 MBytes/sec) transfer from SD-card to the SDRAM. Writing is slower, but still manages around 1.5 MBytes/sec. The XL is limited to a lot less than that over its bus, of course, but since this i/o is all asynchronous to the CPU, and the data can be mapped simply into the host machine's memory map, lots of new use-cases become possible.
Given that we have a fast, interactive CPU constantly monitoring the 8-bit bus as a dedicated task, it opens up a lot of possibilities for how memory is managed within the combined system…
Memory - device to host
From the perspective of the devices, the SDRAM appears to be a 64k buffer area (address 0..$FFFF) into which it can transfer data at any point to any address. Typically it might be used as a circular buffer with head/tail pointers but the flexibility is there to do anything the firmware wants.
Obviously this won't work from the perspective of the 8-bit, so there are registers in $D1xx for each device, where the device identifier 'n' ranges from 0 to 7:
- DEVRAM_MAP0[n] contains 1 byte, which is the page number for a segment of memory that will be mapped into the 8-bit's address space using /EXTSEL (e.g. $40 would map from $4000).
- DEVRAM_SIZE0[n] contains 7 bits (the top bit disables mapping), which is the extent of the mapping, again in pages, for a maximum size of 127 pages or just under 32 KBytes. Each page needs to be located within the RAM area of the machine, but has no other restrictions. The default value for this register is $FF which, as mentioned above, turns off mapping.
- DEVRAM_MAP1[n] and DEVRAM_SIZE1[n] are identical in every way to their DEVRAM_MAP0 and DEVRAM_SIZE0 counterparts, giving each device a second, independent mapping.
Mapping will only work when the device is selected by the OS, so different devices can reserve the same space within the 8-bit memory range without fear of affecting each other. Since this memory is local to the 8-bit bus, it behaves in every way just like normal RAM in the computer.
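As a rough illustration of the register semantics above, here is a minimal Python model of the decision "does this 8-bit address fall inside one of the selected device's windows?". The function names are hypothetical, and treating a zero page count as "nothing mapped" is an assumption, not something the firmware is known to do. The restriction that pages must sit in the machine's RAM area is not modelled.

```python
PAGE = 256           # one 6502 page
MAP_DISABLED = 0x80  # top bit of DEVRAM_SIZEx[n] set => window is switched off


def window_range(map_page, size_reg):
    """Return (start, end_exclusive) in 8-bit address space, or None if off."""
    if size_reg & MAP_DISABLED:
        return None
    pages = size_reg & 0x7F          # 7-bit page count, at most 127 pages
    if pages == 0:
        return None                  # assumption: zero length maps nothing
    start = map_page * PAGE
    return (start, start + pages * PAGE)


def is_mapped(addr, device_selected, windows):
    """windows: [(DEVRAM_MAP0, DEVRAM_SIZE0), (DEVRAM_MAP1, DEVRAM_SIZE1)]."""
    if not device_selected:          # mapping only applies to the selected device
        return False
    for map_page, size_reg in windows:
        rng = window_range(map_page, size_reg)
        if rng is not None and rng[0] <= addr < rng[1]:
            return True
    return False
```

With MAP0 = $40 and SIZE0 = $10, the window covers $4000-$4FFF; leaving SIZE1 at its default of $FF keeps the second window off.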
Typically, the mapped space might be used as a buffer area: a device receives some data asynchronously, pushes that data up into its mapped space (which the 8-bit is blissfully unaware of at this point), and signals an IRQ. The 8-bit's IRQ handler will then automatically enable the device and pull data from the mapped space into the 8-bit to deal with as is appropriate.
In addition to the SDRAM, and independent of device selection, each device has 16 registers in the $D2xx space, which can be updated from either the device or the 8-bit host, and read at any time (without selecting the device) by the host.
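The circular-buffer usage mentioned earlier can be sketched as follows. This is only a model in Python: the class name, the small buffer size and the "keep one slot free" convention are all assumptions made for illustration, with the device acting as producer and the host's IRQ handler as consumer.

```python
class RingBuffer:
    """Head/tail circular buffer; one slot is kept free to tell full from empty."""

    def __init__(self, size=256):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0   # next write position (device side)
        self.tail = 0   # next read position (host side)

    def used(self):
        return (self.head - self.tail) % self.size

    def push(self, data):
        """Device side: store as many bytes as fit, return how many were stored."""
        free = self.size - 1 - self.used()
        n = min(free, len(data))
        for b in data[:n]:
            self.buf[self.head] = b
            self.head = (self.head + 1) % self.size
        return n

    def pull(self):
        """Host side (e.g. from the IRQ handler): drain everything available."""
        out = bytearray()
        while self.tail != self.head:
            out.append(self.buf[self.tail])
            self.tail = (self.tail + 1) % self.size
        return bytes(out)
```

Because only the producer moves the head and only the consumer moves the tail, neither side needs to lock the other out, which suits a device and a host that run asynchronously.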
Memory - host-based
From the perspective of the host itself, there is a set of eight 24-bit SDRAM_MAP registers, each of which specifies a page within SDRAM starting from 1MB (everything below 1MB is reserved for i/o, which might seem over the top, but might end up being useful with later firmware, and we have the space). Given the enormous space available ($1EFFF pages), this requires a 3-byte page register for the base address. This seems unwieldy, so the scheme I came up with is to use a 2-byte register for the page base address for the currently-running application, giving a range of 16MB of RAM, which … would seem to be sufficient…
The remaining 15MB of RAM is available to an OS or "system"-like process (maybe a ramdisk, BIOS use, or something). There's nothing actively *preventing* people from using this area (you just set the highest byte in the 3-byte register to 1) but as a matter of protocol, sticking to the lower 16MB is suggested.
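The page arithmetic behind those figures can be checked with a few lines of Python. The variable names are just for illustration, and the sizes are taken as binary megabytes with 256-byte pages.

```python
PAGE = 256
MIB = 1024 * 1024

total_pages = 32 * MIB // PAGE                # $20000 pages in the 32MB SDRAM
reserved_pages = 1 * MIB // PAGE              # $1000 pages below 1MB, kept for i/o
usable_pages = total_pages - reserved_pages   # $1F000 pages, so the last is $1EFFF

app_pages = 1 << 16                           # reachable with a 2-byte page register
app_bytes = app_pages * PAGE                  # 16MB for the running application
system_bytes = (usable_pages - app_pages) * PAGE   # the remaining 15MB

print(hex(usable_pages - 1), app_bytes // MIB, system_bytes // MIB)
```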
Each of the 8 SDRAM_MAP registers has a corresponding 1-byte SDRAM_SIZE[n] register, which again specifies the page-count of the corresponding mapped area. This gives a maximum of 8 mapped segments each of 64k, although obviously, for any given address, only one (the first one that matches the range specified) will really be mapped. This implies a priority order for the segments, which may be useful. If your page count extends into ROM area, it will not be mapped.
These 8 mapped areas do not require any external device to be mapped; it's purely internal memory for the attached 8-bit. The act of setting the registers will mean the memory is mapped on the next clock.
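The "first match wins" priority can be sketched like this. Several things here are assumptions rather than facts about the firmware: each segment is given a hypothetical host-side start page (the text does not say where that is stored), the SDRAM page is taken to be an offset from the 1MB mark, and the ROM-area rule is left out.

```python
PAGE = 256
IO_RESERVED_PAGES = 0x1000   # the first 1MB of SDRAM is reserved for i/o


def translate(addr, segments):
    """Map an 8-bit address to an SDRAM byte address, or None if unmapped.

    segments: up to 8 tuples (host_page, sdram_page, page_count), listed in
    priority order; host_page is a hypothetical field added for this sketch.
    """
    for host_page, sdram_page, page_count in segments:
        start = host_page * PAGE
        if page_count and start <= addr < start + page_count * PAGE:
            # First matching segment wins; later overlapping ones are ignored.
            return (IO_RESERVED_PAGES + sdram_page) * PAGE + (addr - start)
    return None
```

If two segments overlap, say $4000-$4FFF and $4800-$57FF, an access to $4800 resolves through the first one, while $5000 falls through to the second.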
Memory - backup
The attached SD card interface can be used at any time (via a register interface in $D3xx) to read from, or write to, any page in the SDRAM. Basically, just set the starting address (4 bytes) into SDC_BASE, the SDRAM base address to read from or write to into the (four-byte) SDC_LOW, and the operation (read or write) into SDC_OP, then begin the operation with a write to SDC_EXEC.
The operation will first put a '1' into SDC_BUSY, then begin to asynchronously transfer data between the SD card and the SDRAM. Once the operation is complete, SDC_BUSY will read as 0. It is up to the host computer to busy-wait or just poll periodically to see if the transfer is complete. Once the transfer has started, changing the registers will have no effect until the transfer is complete (a copy is taken for internal use).
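To make the set-registers, execute, poll sequence concrete, here is a small Python model. It is a sketch under stated assumptions: the register names come from the text, but the operation encodings, the one-page (256-byte) transfer length, ignoring SDC_EXEC while busy, and the tick() method that stands in for the asynchronous hardware are all invented for illustration.

```python
PAGE = 256
OP_READ, OP_WRITE = 0, 1     # hypothetical encodings for SDC_OP


class SdcModel:
    def __init__(self, card, sdram):
        self.card, self.sdram = card, sdram   # bytearrays standing in for storage
        self.SDC_BASE = 0     # start address on the SD card
        self.SDC_LOW = 0      # SDRAM address to read into or write from
        self.SDC_OP = OP_READ
        self.SDC_BUSY = 0
        self._latched = None
        self._done = 0

    def write_exec(self):
        """A write to SDC_EXEC: latch a copy of the registers and go busy."""
        if self.SDC_BUSY:
            return            # assumption: ignored while a transfer is running
        self._latched = (self.SDC_BASE, self.SDC_LOW, self.SDC_OP)
        self._done = 0
        self.SDC_BUSY = 1

    def tick(self, chunk=64):
        """Advance the transfer a little; stands in for the async hardware."""
        if not self.SDC_BUSY:
            return
        card_addr, ram_addr, op = self._latched
        n = min(chunk, PAGE - self._done)
        c, r = card_addr + self._done, ram_addr + self._done
        if op == OP_READ:     # SD card -> SDRAM
            self.sdram[r:r + n] = self.card[c:c + n]
        else:                 # SDRAM -> SD card
            self.card[c:c + n] = self.sdram[r:r + n]
        self._done += n
        if self._done == PAGE:
            self.SDC_BUSY = 0
```

A host would set SDC_BASE, SDC_LOW and SDC_OP, call write_exec(), then carry on with other work and check SDC_BUSY now and then; because the registers are latched, changing them mid-transfer does not disturb the copy in progress.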
Boot procedure
At power-on, the host interface will probe for an attached enclosure, and if found will send a series of queries to the enclosure over its xlink. This will allow the establishment of the state:
- Which device-slots have cards in them, and hence the assigned ids for each available device.
- Uploading of the 6502 device-driver code from the devices to local SDRAM, including any interrupt handlers for example.
- Establishment of the memory-mapping requirements for the devices
The host interface will then present that information to the 8-bit via the appropriate registers, and the OS will go through its standard routine for booting each device. This means (at a minimum) the device will need to supply an INIT vector routine.
Since the xlink is so fast (100 Mbit/sec), I am assuming that the XMOS CPUs can have this information to hand by the time the 8-bit gets around to querying it. I may have to revise this, if not.
Another option (for the case of the 1088XEL when in the mini-ITX enclosure), would be that the master power control could be set from the device-side XMOS chip, only turning on the computer motherboard when the expansion system is ready to run.
Further thinking
The host-side XMOS chip is mainly concerned with monitoring and transferring data, but the design of the chip allows us to do a lot more. There are up to 16 concurrent processes available on the chip, all of which have guaranteed latency timings and performance. That makes for some interesting possibilities.
Ultimately I'd like to open "slots" on the XMOS chip, and provide a standard API for the 8-bit host to call into these slots and take advantage of the program there. For now, it'll just be whatever is in the current firmware. Ideas might include:
- Putting display memory into an area mapped into SDRAM and having a 'blitter' operation made available, with the host computer just setting source and destination x,y,w,h and the XMOS chip doing the rest. At 100 MHz per core, the XMOS chip ought to be able to do a fair number of full-screen blits to local SDRAM in a VBI call, freeing up the host 8-bit to do other things.
- Implementing a floating point co-processor in 'C' on the XMOS chip and letting the host 8-bit just call the routine to do much faster floating point arithmetic.
Et cetera, et cetera, et cetera…
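For the blitter idea, the operation itself is just a rectangle copy within a linear frame buffer. The Python sketch below shows the logic only; the function name, the parameter order and the bytes-per-row "stride" parameter are assumptions, and nothing here reflects an actual firmware API.

```python
def blit(mem, stride, sx, sy, dx, dy, w, h):
    """Copy a w-by-h rectangle from (sx, sy) to (dx, dy) inside mem.

    mem is a bytearray holding the display memory, stride is bytes per row.
    Rows are copied bottom-up when moving downwards so that vertically
    overlapping source and destination rectangles are handled correctly.
    """
    rows = range(h - 1, -1, -1) if dy > sy else range(h)
    for row in rows:
        s = (sy + row) * stride + sx
        d = (dy + row) * stride + dx
        mem[d:d + w] = mem[s:s + w]   # the slice is copied out before writing
```

From the host's point of view this would boil down to writing six small values and letting the XMOS side run the loop.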