Yet Another Z180 (YAZ180 v2)

Testing on the YAZ180 v1 , shown below, is now complete. I don’t want to use it for further driver and platform development, because the PLCC socket for the 256kB Flash is becoming worn-out.

It will continue to operate as an augmented Nascom Basic machine, with an integrated Intel HEX loader (HexLoadr) supporting direct loaded assembler or C applications.

img_0626

YAZ180 v1 at full configuration.

The new PCB for the YAZ180 v2 has been ordered.

These are some screenshots of the new PCB.

 

Update

Pi Day, March 14 2017.

After dwelling on the fact that the V2 PCB was really just a clean up the V1 PCB, with no additional features, I decided not to build the beautiful new PCBs that arrived today.

But rather, to create a new PCB with additional features.

 

New Features

When I originally designed the YAZ180 the breakout for the 82C55 was simply an interim design, to enable me to test the board. I was thinking of making an Arduino style pin-out, or something along those lines. But this is something much better.

Recently, after reading Paul’s page on interfacing an IDE drive to an 8051 microprocessor with the 82C55, I decided that adding IDE to the YAZ180 was a must-have feature.

So there is a new connector on the YAZ180 to break out the 82C55 pins, in IDE 44-pin 2mm format. I have not followed the design provided by Paul exactly. I’d note that his design and the earlier design by Peter Fraasse were specialist designs, which don’t support the generalised usage of the 82C55 chip, beyond the IDE functionality.

By the above statement I mean that in Mode 1 and Mode 2 for Port A and Port B, the PC2, PC4, and PC6 pins of the 82C55 device are designated registered strobe input pins /STB in input mode, or peripheral acknowledge /ACK in output mode. If an inverting output buffer is connected on these lines, then the registered input and output mode capability is lost. This would restrict the functionality of the 82C55 to simply Mode 0, being the mode that is used to create the IDE functionality.

As I’ve connected the three IDE address selection pins to PC2, PC4, and PC6, and these pins are not passed through an inverting buffer in the design, it is possible to use the 82C55 in any of its modes, and therefore to use the IDE 2.5″ 44-pin form factor to connect the YAZ180 82C55 ports to extension PCBs of any type or design.

As a connected IDE drive or other extension board may need to interrupt the CPU, I have connected the IDE INTRQ pin to the remaining inverting buffer to provide an input to the CPU on /INT0. As the /INT0 (or actually the INTRQ) input terminates on the IDE header, either a IDE drive through INTRQ, or either of the two 82C55 INTR pins, PC3 or PC0, can originate the interrupt.

I have reconfigured the Am9511A-1 to use the /NMI interrupt, as previously the /INT0 was configured.

The new YAZ180 v2 PCB has been ordered. YAZ180_v2_Schematic.

Happy Pi Day.

Update – RetroChallenge Day 1

I’ve decided to enter the RetroChallenge 2017/04 and my challenge is to read and write to an IDE drive using the newly configured IDE interface on the YAZ180v2. But before I can write the code for the IDE interface, there’s a bit of building and testing that needs to be done.

The new PCBs arrived a few days ago, and they look great. But Arduino Day and the first day of the RetroChallenge 2017/04, 1st April, seemed like a good day to lay them out.

P1080754

New PCBs. 2oz Copper, 2mm thick. Opulent.

I was hoping to lay build several boards at once, but somehow I forgot that there was only one RAM and one FT245 device in my component stocks. That means that I had to satisfy myself with just one board for now.

Note the suitably Retro PowerMac (circa 2001) driving the layout guide screen.

P1080755

Adhoc Workspace

This is the board just before cooking. Respect to anyone who notices the substantial noob layout mistake. Anyway, after a small smokey explosion, everything was rectified.

P1080758

Two YAZ180 versions, side by side.

This is the finished build of the YAZ180 v2. Looks very tasty. Retro goodness.

P1080761.JPG

Fully populated YAZ180 v2 PCB.

I’m still working on fixing an issue with my code, which I noticed when experimenting with the Am9511A APU, and inserting an Interrupt Jump Table. Basically, I’m getting jumps to odd or  random locations, which is detected buy filling unused locations with 0x76, the HALT, OpCode. The most common address where the HALT is executed at is 0x00C3.

Previously, I’d been filling unused locations with 0xFF, the RST 38H OpCode shared with the INT0 location 0x0038, which was causing the APU to be triggered inappropriately. This issue has me snookered. I can’t move on, in the software sense, until it is resolved .

 

This slideshow requires JavaScript.

Update – RetroChallenge Day 8

Well this week was one of the most frustrating weeks ever, in terms of time spent vs. results obtained.

There are two major projects in hand. 1. Getting the YAZ180 v2 running, and 2. resolving the software issue plaguing my initialisation code.

Hardware issues

Bringing up a new piece of hardware is never easy. Initially, nothing can be trusted to work, and everything needs to be checked against the design, and then even the design checked for correctness. Bringing up the YAZ180 v1 was very time consuming, because I had to develop the PLD design during the process, as well as checking that all the hardware was sorking as it should. I thought that bringing up the YAZ180 v2 would be easy. Just solder it together and win. But it has not been so simple.

Essentially, after a week of working on this every evening, I don’t know why it is not working correctly. All the standard things, volts, clock, stuck address and data lines, etc are all working correctly. But it still doesn’t work. And, it may not be just one thing that is wrong, but if anything is not perfect it just won’t work.

After a few days of testing, I found that I’d programmed the PLD devices with an old version of the CUPL code. Nearly right, but not exactly right. Once I’d isolated that issue, by ensuring the new GAL devices worked perfectly in the V1 board, I thought it would be enough. But no. There’s still something wrong.

My current thought is that somehow, either electro-static damage or heat damage, the RAM is unreliable. But, I’m not sure enough of this to unsolder the RAM device and replace it. I’ll be spending this weekend on resolving this problem.

Software issues

Because of the effort I’ve been putting into resolving the hardware issues, I’ve not been able to solve the software issue apparent in the YAZ180 initialisation and serial code. I’ve documented the issue on Github.

My lesson learned is NOT to fill unused memory with 0xFF bytes. This causes RST 38H jumps to the INT0 location when the PC is incorrectly loaded, and can be very distracting. Best to fill unused memory with either 0x76 HALT bytes, to see where things became broken, or with 0xC9 RET bytes to just float over the underlying issue.

I’ll need to fix this properly, but it has consumed several weeks of effort, and I’m not much closer to resolution.

Update – RetroChallenge Day 10

The weekend was unkind, but today some new eyes (literally) have brought successes.

Hardware Issues

After doing quite a bit of further testing, I’m fairly sure that I’ve damaged the RAM and will need to replace it. So, I’ve ordered a hot-air solder gun. Should have had one for a long time. Finally, I’ve got a round-‘tuit. I’ll have to order some replacement components too, which will result in being able to make additional boards as well.

Software Issues

Finally, I’ve resolved my issue. What we had here was a classic “failure to understand”. Somewhat embarrassed to leave this here for Internet eternity.

  • Z80 vectors are supported by a JUMP table.
  • Z180 vectors are supported by an ADDRESS table.

Insert JP instructions into an address table and you will have a very very bad day.

Or in my case, quite a few of them.

This issue cost so much time. But at least on the up side, I’ve written robust Z80 and Z180 vector tables, improved my ASCI code, and cleaned up initialisation code, in trying to track this down.

Also finally, I now understand. Which is the entire point, anyway.

Update – RetroChallenge Day 17

Following up on the success of last weekend, I was hoping to have a lot of achievement to write about today. Unfortunately, it has been a grind this week too.

I have been distracted back into the original project that unearthed my previous software problem, and led me along the path to getting a much better understanding of the Z180 CPU, and then solving the issue. The original project was building an interrupt driven driver for the Am9511A-1 Arithmetic Processing Unit.

I’ve spent pretty much the past week on this code, and digging through it with a fine tooth comb. I’m now of a belief that my Am9511A driver code is correct, but my hardware is not correct and may never be correct.

The issue lies with the requirement for the Am9511A to have the Address lines and Chip Select signal remain valid for 30ns following raising of the Write signal. Unfortunately, the Z180 only maintains valid address lines for 5ns following Write. This means that writing to the AM9511A APU is very much a hit and miss affair, with miss being the most likely outcome. I’m still thinking about ways to bodge this to work. But, I think that it may just be too hard to get the old APU to work with a modern CPU. More on this later.

This week I’ll be working on the PaulMon IDE code, and migrating it from 8051 to Z80 nomenclature, and trying to get it to compile.

Update – RetroChallenge Day 21

Well the last couple of days have been exciting, as I found a way to make the Am9511A APU work. A hint from a fellow competitor (on working with the MC6809 CPU) inspired me to look further for information on options to fix the hardware interface.

The Z180 E CLOCK

The Z180 has an almost undocumented feature, called the E Clock. Yes, it is documented in datasheet that it exists, but there’s no real background that I can find as to why it exists, except that is for a Secondary Bus Interface. This pin and signal doesn’t exist on the Z80, for example. Anyway, since it has the same name as a signal on the MC6809, I thought it might be worth looking at it. It turns out that the E Clock provides a shortened version of the WR and RD cycles. Which is exactly what we need.

One caveat however, when running at doubled PHI rates (i.e. 1:1 PHI – CLK) the shortening of the E Clock signal is not sufficient to drive the APU successfully. At 18.432MHz, the PHI/2 timing is 27ns. Therefore, the minimum of 30ns between release of WR and CS is not always held. This means that we’ll need to keep the PHI at half CLK whilst using writing data into the APU. In practice this means that using the APU requires we cut the CPU clock by 50% or PHI/2 being 57ns, to ensure the trailing 30ns is provided.

Anyway. Good news. With the revised timing, the Am9511A-1 is working.

Am9511A FDIV

Am9511A APU Floating Divide in 115us

The E Clock is not an inverted signal, so to generate the active low APU_WR signal we have to first invert it, then OR it with the WR signal. For the purposes of testing, I’ve got a little breadboard with a GAL on the side, but later I’ll build a new PCB and add in a SN74LVC1G97 little logic device to provide the APU_WR signal.

Am9511A FDIV PUPI

Am9511A APU FDIV PUPI command interval 128us

Am9511A FDIV PHI6 Cycles

Am9511A APU FDIV in 179 Phi/6 Clock cycles

So now we see the Am9511A APU FDIV floating point divide takes about 101us to 115us when running at 1.536MHz, or from the datasheet 154 to 184 clock cycles. In 101us, the Z180 CPU at 36.864MHz produces 3,723 cycles. To produce a floating point divide using the Lawrence Livermore Library requires about 13,080 cycles, according to the AM9511A Floating Point Processor Manual by Steven Cheng. Therefore, we are still substantially faster than antique software on a modern Z180!

Update – RetroChallenge Wrap Up

Well the month of RC2017/04 didn’t go quite as planned. My original intention was to have the YAZ180v2 working very quickly, then get straight down to porting Paul’s IDE code from 8051 to Z180 to get the new IDE interface working. But, there were several speed bumps along the way.

Gaining an education

Since I just started on this whole Z80 processor and assembly language programming thing a few months ago, I don’t have a long history of coding to fall back on. I had written some code for the Z80 in the RC2014 hardware, which I then tried to use on the Z180 in the YAZ180. But, there is a subtle difference in “generation” between the way the interrupt vectors work across the two machines. Obvious, once you know about it but a real “time killer” if you don’t.

Firstly, filling unused space in your assembly program with 0xFF is a very dangerous thing to do in Z80 assembler, particularly if you don’t understand that 0xFF is the op code for RST38, which is a single byte jump to the same location as the Interrupt 0 in IM1 mode. It would make more sense to fill the unused space with 0x76, which is the HALT instruction, to trap an unexpected program counter value.

Look before you leap

Secondly, the interrupt vectors on the Z80 were designed to contain code, and the PC is just loaded with the address of the vector, and execution begins from there. So for an INT0 (or RST38) execution begins from 0x0038. But, the interrupt vectors on the Z180 are designed to hold an address. The difference being that an interrupt will load the PC with the contents of the two bytes at the vector, and then begin execution from there. I think this difference is a sign of the generational difference between the two implementations. One of the clearest differences I can find, anyway.

Timing is everything

One of the goals for the YAZ180 is to bring some old chips back to life, in a modern platform. Along with the TIL311, GAL16v8, and 82C55 devices, the Am9511A holds pride of place as the very first arithmetic processing unit ever made. I’ve invested far too much time in getting the Am9511A to work, but it is important to me that my project can make it work.

I believed that I had devices that were specified to run at 3MHz but which in fact didn’t. That may be incorrect. More likely was that I wasn’t driving them properly, because my timing was out. I will need to go back and test them all again.

Here the issue is that the Am9511A requires extended validity of data and chip select signal, following the validity of the write signal. At least 30ns is required. This is not provided by the Z180 in its normal timing, although in the configuration I have it, coincidentally because I’d buffered the data bus, it is nearly right. Only the chip select line was being incorrectly handled.

I was nearly giving up but then a tweet from a fellow RC2017/04 competitor gave me the inspiration to look further. It turns out that the Z180 has a secondary I/O timing signal called the E Clock. This signal is not present on the Z80, and as I didn’t understand its purpose I’d left it unconnected in the YAZ180.

Whilst the Zilog datasheets on the Z180 completely gloss over the purpose of the E Clock signal, by simply not mentioning it, the original Hitachi 64180 datasheets do mention it. The original purpose of the E Clock signal was to provide timing for “a large selection of 6800 type peripheral devices” including the “Hitachi 6300 CMOS series (6221 PIA, 6350 ACIA, etc) as well as the 6500 family devices”.

In summary, the E Clock provides a signal that is half a T cycle shorter than the write signal. It means that gating the write signal with the E Clock would allow me to release the APU write signal sufficiently early to maintain the extended chip select timing required. Basically, the APU won’t operate with a Z180 T Cycle any less than 60ns, or 16.6MHz. So in my implementation, the PHI clock will need to run at half speed or 9.234MHz, whilst the Z180 is using the Am9511A. Unless I cook up another plan.

Zapped

And the final note from this month is that I believe my very poor ESD protocol has led to the destruction of the SRAM on my YAZ180v2. Therefore I had to desolder it (and it looked so nicely done) to remove it, and order some new components.

Ordering new components is always a bit of a hurdle for me. I’ve collected quite a few things that I don’t use, so I tend to ration myself on purchases vs. progress. Finally, at the end of the month I ordered more components to build further YAZ180 boards, and some spare SRAM to enable me to repair the one I have made already.

It continues to amaze me just how much difference there is in the cost to build an Arduino AVR board (basically just a chip at the most essential level), vs something like the YAZ180. The YAZ180v2 bill of materials, excluding specials like the GAL16V8D, TIL311, and Am9511A devices that I have to find of eBay, comes to over $150 Australian!!! We need to export more coal, to get the AUD dollar back up there!

And that’s it for my RC2017/04 month. Soon as the parts arrive I’ll be completing the YAZ180v2, and then testing the IDE interface. I hope that will be done before the end of 2017/05.

Update – Post RetroChallenge

Well good news. The only issue was a bad solder joint on the new SRAM chip. Now the YAZ180v2 is running, and I can get onto translating the IDE code from 8051 to Z80.

IMG_0698

YAZ180 with IDE drive attached.

I’ve sourced code from both PJRC in 8051 mnemonics for an 8255 PIO and from Retroleum in Z80 mnemonics for an 8 bit interface. Between the two of them, together with the examples from the OSDev Wiki, it should be easy to make a fairly robust implementation. And, on May 18th, the driver code was finally working.

Next activity is to integrate this into the z88dk, and then using the FAT-FS code from Elm ChaN, get the disks properly working.

Update – August 2017

Over the past few months progress has been made on various fronts with the YAZ180v2. Firstly, the IDE interface is fully working, and has been integrated into z88dk. Also, the issues with the Am9511A-1 APU have been resolved, and a working driver has been integrated into z88dk. While I still have to revise the C interface for these two pieces of code, because I’m still learning this, the development work is now done.

I am particuarly happy about getting the Am9511A APU working, as this was causing me the most technical difficulty, and stretched my understanding the most. I’m also happy that the capability in the Am9511A seems to be realised through performance improvements in arithmetic computation.

Over the coming months, I hope to resolve the remaining untested components in the YAZ180. These include the parallel programming interface, to allow the YAZ180 to be “cold loaded”, to protect against bricking the system, if a user doesn’t have a EEPROM or Flash writer. I’m still in two minds abou this feature, as the cost of the FT245 device and USB socket is about the same as a stand alone EEPROM writer, and the parallel programming interface consumes space that could be otherwise used for an SPI or USB interface, for example.

I also need to test the I2C interfaces, and debug the driver that I wrote back in May (still unused) to complete the feature set.

With this done, I’ve now done a new minor revision of the hardware, to clean up the issues that have been noted over the past months.

Open Issues

  • APU – Gate E clock with WR to produce shortened APU WR
  • NMI – remove this from the APU, and terminate high. Not CP/M compatible.
  • INTO – reconnect it to the APU.
  • 5V – Power inductor spacing.

Update – October 2017

Well, I fixed the issues and then convinced myself that there needed to be Yet Another feature added to the YAZ180, before I signed it off. So now, the v2.1 board has access to the Internet through an ESP-01S AT interface.

 

This slideshow requires JavaScript.

I sell on Tindie

Building up this new board will be November and December’s activity, together with building a complete YABIOS (Yet Another BIOS) to make use of all of the features packed into the YAZ180 board. I’ll be picking the best bits, IMHO, from the Cambridge Z88, ZX Spectrum, and CP/M 2.2 – ZSystem to build a banking capable YABIOS, supporting 1MByte of address space.

Update – August 2018

I seriously need to write another blog on the YAZ180. But I guess the Github commits just speak for themselves. There are only three things left on the to-do list. 1. finish the firmware loading program. 2. rewrite the I2C interface. And 3. implement FreeRTOS as a hypervisor allowing multiple 60kB applications to run simultaneously.

Here’s a picture of an application, which was one of the early drivers for building the YAZ180, a mass-storage platform for my HP48 calculator.

IMG_1572

YAZ180 communicating with HP48 using Kermit on ASCI1 (TTY).

Characterising Am9511A-1 APU

I’ve built two hardware solutions supporting the AM9511A APU. Partly for historical enjoyment, and partly because the AM9511A is actually still faster at floating point and 32-bit long arithmetic than “modern” Z80 devices.

I’ve built a Z180 based board, the YAZ180, which runs the AM9511A at 2.304MHz. And I’ve built an APU Module for the RC2014 platform, which runs the AM9511A at 2.4576MHz. Both of these solutions are supported by the z88dk am9511 maths library and platform drivers for Z80, Z180, and 8085 CPUs.

To compare the performance, for example, from the Am9511/Am9512 Floating Point Processor Manual we have some comparison tables. On average the Am9511A APU (at 1.966MHz) produces a hardware floating point divide in 165.9 cycles (of a 2MHz 8080 processor). Converted to my YAZ180 Am9511A implementation (at 2.304MHz), we have the equivalent in 141.5 cycles of the 2MHz 8080. Converted to best case modern Z180 terms (overclocked to 36.864MHz) this is 2,609 CPU cycles to return a hardware floating point divide.

To produce an equivalent software floating point divide, using the equivalent vintage LLL floating point library, requires 13,080 cycles.

This means that floating point on the 40 year old AM9511A-1 APU is still 5.0 times faster than an overclocked Z180 running antique 8080 code. Sweet!

Testing

I’m integrating an Am9511A-1 APU device into my YAZ180 build. The basic device is capable of operating at 3MHz. But, I’ve found that driving one sample at 3.072MHz doesn’t work. But, it works fine at 1.536MHz, and at 2.304MHz.

This one example has 83.33ns delay between RD or WR and PAUSE signal being operated. This means that it should comfortably operate with the minimum of one wait state when the Z8S180 is running with a 18.432MHz bus.

And, I’ve got to say that these devices run hot… OHS issue hot. There is a reason they are provided in a ceramic package. They sink 70mA at 12V plus another 70mA at 5V, and all that energy has to go somewhere.

Originally, I had created the timing for the Am9511A-1 is generated by dividing the Z8S180 18.432MHz system clock by 6. The divide FCPU by 6 for FAPU CLK 3.072MHz is done with a SN74LS92N device. And for test purposes, the Z8S180 is also half rate clocked at 9.216 MHz, producing 1.576MHz for the APU.

img_0620

YAZ180 Test Rig – Am9511A-1

Initial Testing

The test is initially pretty simple, really just a proof of life. Will they properly push and pop data at 3.072MHz?

If not, then I’ll need to redesign the YAZ180 to operate the Am9511A-1 at a lower APU clock. But, if my initial samples are just not the up to specification, then they can be secondary devices.

Ok, now what kind of devices have I got to hand, and lets see what the results are…

Front Serial ID Country & Rear ID 3.072MHz 1.576MHz
239KH2Z Malaysia 9237FP Fail Pass
301MBZP Malaysia 9252CP Fail Pass
3368YF8 Malaysia 9335CP Fail Pass
348W76S Malaysia 9347DP Fail Pass
921BDIV Philippines 8917WM Fail Pass

Kind of boring really. Pretty clear that AMD oversold the capability of its Am9511A-1 to run at 3MHz. Or, I’m not feeding them with the correct timing.

I’ll need to redesign the YAZ180v2 to provide a divided by 8 clock at 2.304MHz, and cross my fingers that they work at that clock rate.

Update September 2017

Following a substantial investment in testing of the solution in April and August, I’ve finally got a working system that can reliably use the Am9511A-1 to produce computational results.

The main issue was described in the Retro Challenge period. The Z180 CPU I/O timing is slightly different to the Z80, in that it doesn’t provide any interval between the WR/RD deselection and the CS deselection. For most peripheral devices this isn’t a problem, but for some it causes them not to work. Hitachi recognised this problem, and provided a shortened I/O signal called the “E Clock”, which can be gated together with the WR/RD signals to gain the timing desired.

The Am9511A doesn’t require the RD signal to rise before the CS signal. Its TRCS time is 0ns. However it does require the WR signal to rise 60ns (or for Am9511A-1 30ns) before the CS signal. This can be achieved by gating the WR signal together with the E Clock signal. The Z180 E Clock is 1/2 a Phi cycle shorter than the WR signal during I/O instructions.  Therefore provided that the Z180 Phi is slow enough, then the required timing can be held. At 18,432MHz, the Z180 Phi signal is 54ns long, and therefore 1/2 Phi is 27ns, which is too short to work consistently. However, the Z180 can halve its Phi speed easily, and simply by changing the CMR register the Phi clock can be reduced to 9.216MHz, which in turn gives us (theoretically) 54ns TWCS for the Am9511A-1.

In practice gating the WR signal through the E Clock provides 41ns for TWCS, which is enough to work reliably with Am9511A-1 devices.

However, as the Am9511A-1 timing is derived from the Z180 clock (divided by 8), running Phi at 9.216MHz means that the APU is in turn under clocked to 1.152MHz. We can avoid this waste of capability, by managing the Z180 clock precisely during the command interrupt generated by the Am9511A-1. On entry to the interrrupt, we reduce the Phi to allow commands (TWCS) to work, and on exit from the interrupt we return the Phi to its original setting of 18.432MHz (and the APU to 2.304MHz). This also conveniently avoids having to modify the Z180 ASCI code to manage serial communications in the light of the reduced system clock.

I have also tested the Am9511A-1 using this revised WR to CS timing, with my original clocking of divide by 6 from Z180 system clock, or 3.072MHz, and it doesn’t work. This means that we actually do need to stay within the specification, and not overclock the APU.

z88dk Driver

The z88dk supports the YAZ180 platform, so I have added a driver for the Am9511A into the device folders. The driver is based on an interrupt driven model. As noted in the Am9511A Processor Manual, there are 4 different models that can be adopted.

The Am9511A Pause signal is connected to the Z180 Wait signal, so if the CPU generates a request that requires time to fulfil, then the APU will cause the CPU to Wait by signalling Pause. Clearly this means that the CPU cannot do anything while it is in Wait mode. This model of working is called Demand Wait, and it provides the fastest APU response, but doesn’t support using the CPU effectively.

The APU status register can be read without the APU causing a Pause response. Therefore the CPU can poll the APU by reading the status register. In this situation the CPU can continue profitably, but response to APU requirements (for new operands, or commands) is limited to the polling rate. As different APU commands can be completed in different times, the optimal polling time can be difficult to calculate.

The APU End signal can be connected to a CPU interrupt, and in the case of the YAZ180 it is connected to the NMI, which is then triggered at the end of each command, when the APU is ready to receive a new command. If the APU commands are buffered in advance, then using this interrupt mechanism the APU can complete a long sentence of commands “autonomously” and in parallel to the CPU activities, interrupting only when it needs to load a new command. This is the mechanism used in the z88dk driver.

Some ideas for creating an optimal interrupt driven driver for the Am9511A were inspired by reading “An Efficient Software Driver for Am9511 Arithmetic Processor Implementation”, B. Furht and P. Lee, 1984.

The final driver option is to connect the APU Service Request signals to the CPU DMA interface. In the YAZ180 these hardware signals are connected, and if desired a DMA enabled driver could be written to take advantage of this interface. However as only 4 bytes of information can be transferred in each load or unload command, at face value there seems to be little advantage in building a DMA software interface.

The C interface for the Am9511A driver supports direct access to the Am9511A, like the asm driver, and hopefully it can be integrated into the z88dk math library options, to support transparent usage of the APU where it exists.

Currently, some simple C interface code is now available on z88dk, which allows us to do performance testing.

Performance Testing

A simple test of seeking prime numbers exercises the floating point divide and 32 bit fixed point divide. This “brute force” method is not optimised for the APU as 4x 32 bit numbers must be loaded into the APU, but only two divides and a subtraction are done, for each calculation cycle. During the operand loading process, the CPU must also be slowed down to half rate, but since this is during an non-maskable interrupt there is little effect on the overall system.

So, some comparisons. First using Nascom Basic, we have a baseline, for seeking 1000 prime numbers.

20 PRINT "LIMIT";
30 INPUT L
40 FOR N = 3 TO L
50    FOR D = 2 TO (N-1)
60      IF N/D=INT(N/D) THEN GOTO 100
70    NEXT D
80    PRINT N;
90    GOTO 110
100   PRINT ".";
110 NEXT N
120 END

200 REM 124.7 Seconds (hand timed) - Z180 36.864MHz - Nascom Basic

Adding the Am9511A to the calculation, by doing the inner loop in assembly and calling the APU for the divide and subtract duties, does speed things up.

20  PRINT "LIMIT";
30  INPUT L
40  FOR N = 3 TO L
50    IF USR(N) = 0 THEN GOTO 100
80    PRINT N;
90    GOTO 110
100   PRINT ".";
110 NEXT N
120 END

200 REM 90.4 Seconds (hand timed) - Z180 36.864MHz - Nascom Basic

This is interesting, as the Am9511A is running at 2.304MHz (straight out of 1977), yet it is STILL faster than software on a 36.864MHz Z180.

Now, what happens when using the z88dk, to do the calculation in C?

void main(void)
{
  static signed long l, n, d;
  printf("Limit: ");
  scanf("%ld", &l);
  for (n = 3; n != l; ++n)
  {
    for (d = 2; d != (n-1); ++d)
      if ((float)n/(float)d == n/d) break;
    if (d == (n-1))
    printf("%ld", n);
      else
    printf(".");
  }
}

// 67.8 Seconds (hand timed) - Z180 36.864MHz - z88dk sdcc_iy
// 62.7 Seconds (hand timed) - Z180 36.864MHz - z88dk newlib

// zcc +yaz180 -subtype=basic_dcio -vn -SO3 -lm -clib=sdcc_iy --max-allocs-per-node200000 primesC.c -o primesC -create-app
// zcc +yaz180 -subtype=basic_dcio -vn -SO3 -lm -clib=new primesC.c -o primesC -create-app

Using the assembly math routines in z88dk is now the fastest solution. But, we’d expect C to be faster than Basic. What happens if we do the divides and subtractions in the Am9511A. That’s why we’re here.

void main(void)
{
  static signed long l, n, d, r;

  apu_reset( (void *) 0x2021 ); //INITIALISE THE APU with the NMI VECTOR ADDRESS

  printf("Limit: ");
  scanf("%ld", &l);

  for (n = 3; n != l; ++n)
  {
    for (d = 2; d != (n-1); ++d)
    {
      apu_cmd_ld( &n,   APU_OP_ENT32);
      apu_cmd_ld( NULL,     APU_OP_FLTD);
      apu_cmd_ld( &d,   APU_OP_ENT32);
      apu_cmd_ld( NULL,     APU_OP_FLTD);
      apu_cmd_ld( NULL,     APU_OP_FDIV);
      apu_cmd_ld( &n,   APU_OP_ENT32);
      apu_cmd_ld( &d,   APU_OP_ENT32);
      apu_cmd_ld( NULL,     APU_OP_DDIV);
      apu_cmd_ld( NULL,     APU_OP_FLTD);
      apu_cmd_ld( NULL,     APU_OP_FSUB);
      apu_cmd_ld( &r,   APU_OP_REM32);

      apu_isr();        // calls the ISR to trigger the process
      apu_chk_idle();   // blocks until the APU is idle

      if (r == 0) break;
    }
    if (d == (n-1))
      printf("%ld", n);
    else
      printf(".");
  }
}

// 94.9 Seconds (hand timed) - Z180 36.864MHz - z88dk

// zcc +yaz180 -subtype=basic_dcio -vn -m -SO3 -clib=sdcc_iy --max-allocs-per-node200000 @primesC_APU.lst -o primesC_APU
// appmake +glue -b primesC_APU --ihex --clean

Well the end result is (surprisingly) similar to using the Am9511A together with Basic. The issue is that it takes a long time to load and unload the operands to the APU, and to process (cast) them, and then we only do a relatively simple operation when they’re there. Relatively speaking for this test, we’re bound by the I/O rate of the APU, which is quite slow, and this does not demonstrate actual the compute rate of the APU.

APU_OP_ENT32
APU_OP_FLTD -  56-342 Cycles
APU_OP_ENT32
APU_OP_FLTD -  56-342 Cycles
APU_OP_FDIV - 154-184 Cycles
APU_OP_ENT32
APU_OP_ENT32
APU_OP_DDIV - 196-210 Cycles
APU_OP_FLTD -  56-342 Cycles
APU_OP_FSUB -  70-370 Cycles
APU_OP_REM32

However, this is still a good result. We’re comparing a modern overclocked z180 at 36.864MHz, with an ancient device, released in 1977, running at 2.304MHz. And, we have achieved comparable results.

Update July 2018

Over the past year I’ve done quite a lot with the Am9511A. One interesting application was using it to calculate a Mandelbrot set, and generate a text image.

From the point of view of drivers, I re-wrote the interrupt based driver to use a 3 byte operand address in the ring buffer, to support access to the full memory space of the Z180.

I also uncovered a hardware bug, which was caused because I was resetting the APU clock divider with the system reset. This meant that the APU was not getting 5 clock cycles with Reset held high, and would therefor not reset reliably. Removing the reset signal from the divider chip solved this issue.

Update January 2022

Over the past few years the Am9511A has proven itself to be a worthwhile companion in all of my Z80 and 8085 CPU systems. To date I’ve never had a functioning device fail, despite their very high operating temperature, and their performance advantage over Z80 software floating point libraries is proven.

Intel 8231A APU + Intel 8085 CPU

Intel 8231A APU Module with Intel 8085AH-2 CPU Module

Perhaps the culmination of the path was building an Intel 8085 CPU Module for the RC2014 platform, and then adding an Intel 8231A equipped APU Module to the solution, together with the 8085 drivers for the z88dk am9511 maths library which makes, in my opinion, perhaps the perfect 1970’s computing machine.

4x APU Modules with Am9511A

4x Am9511A APU Modules with Z80 CPU.

What is even more perfect is that the APU Module can be scaled for efficient parallel processing, something that would never have been contemplated back in the 1970’s.

Freewrite – First Thoughts

Having taken delivery of a beautifully sculpted object, the Freewrite, I believe it was worth the wait, and the money, in every way. I’ve built both hardware and software, and I know that it is unbelievably difficult to get a piece of hardware (including aluminium, plastic, PCB, components, assembly, testing, and the list of items goes on) constructed and shipped globally. Rightly, getting this construction process completed has been the focus of the Astrohaus team over the past year.

freewrite crop

During the many months of waiting for delivery of my Freewrite, I have purchased Cherry mechanical keyboards from both WASD and Das Keyboard (costing US$150 and more), and I use them during my day to day work and pleasure. There is nothing even remotely close to the pleasure of having Cherry MX switches under your fingertips (I prefer Clear and Brown varieties).

These mechanical keyboards can work with a Chromebook (costing US$300 or more) to produce a functionality similar to the Freewrite. But, then there are two pieces to be transported, and the whole package is not a single writing tool. Spending the time to juggle the pieces of the drafting package leads to further distractions, and the result is not substantially cheaper than the Freewrite.

The Astrohaus software game has been mediocre, as this article notes. Hopefully, getting the hardware out the door will allow them to focus on the many minor software shortcomings. As it is just software, it should be easy for them to provide OTA upgrades of the Freewrite to remedy the situation.