This covers creation of software support for the 8085 CPU within the framework of the z88dk and also with MS BASIC 4.7. Specifically, the 8085 undocumented instructions will be covered, and some usage possibilities provided.
Future work is to build a re-entrant IEEE floating point library specifically using the stack relative instructions found in the 8085 undocumented instructions.
8085 Microsoft BASIC 4.7
The Microsoft BASIC 4.7 source code is available from the NASCOM machine. Although the NASCOM machine was a Z80 machine there were only minor changes to the original Microsoft BASIC 8080 code. Therefore it is an ideal source to use to build a 8085 based system.
At this repository the 8085 RC2014 Microsoft BASIC is being developed. Currently fully working with the RC2014 ACIA Serial Module (from the RC2014 Classic ][). Some initial performance testing has been done, and there is little difference (< 1%) vs. the Z80 at the same frequency.
A version of Microsoft BASIC 4.7 for the 8085 CPU Module together with the Am9511A APU Module has been built, as well. This version adds the full performance of a hardware APU to the 8085 CPU providing the “complete performance package” for Microsoft BASIC.
Support for the 8085 processor is available from the z88dk. The sccz80 C Compiler, combined with the classic library and z88dk-z80asm assembler provide the necessary components.
Support for the 8085 CPU Module for the RC2014 has been provided using the underlying MS Basic as a program loader and debugging tool. This is reached through the rc2014 target basic85 subtype. This uses the standard RST serial interfaces (provided by MS Basic) and the HLOAD keyword to upload code compiled for $9000 origin (by default). Compiled programs can use any memory from $8400 through to $FFFF.
Also a rc2014 target ROM subtype acia85 has been provided to allow on-the-metal embedded applications to be written. The full 32kB of ROM and 32kB RAM is then available, with the option to toggle out the ROM if needed for CP/M or similar systems.
Within z88dk the mbf32 floating point math package has been optimised to support 8085 undocumented instructions.
The z88dk sccz80 C compiler is used for 8080, 8085 and Gameboy Z80 CPUs. This compiler is supported by the z88dk classic library. Over a few weeks, I reworked all of the sccz80 compiler support primitives (called
l_ functions) to make them reentrant, and to optimise them for the respective CPU.
I’ve also reworked all of the z88dk string functions to support callee for the 8085 CPU. The callee calling mechanism is substantially faster than the standard calling convention. Also I’ve changed the loop mechanism for 8080 / 8085 / GBZ80 to use a faster mechanism. This consumes 5 bytes more for each function used, but reduces the loop overhead from 24 cycles per iteration to 14 cycles per iteration. Quite a substantial saving for extensively used functions like
memset(), for example.
8085 Undocumented Instructions
Over the years since launch several very useful undocumented instructions designed into the 8085 have been found. These instructions are particularly useful for building stack relative code, such as required for high level languages or reentrant functions. However, perhaps because of corporate politics, these useful instructions were never announced, and thus were never widely implemented.
The z88dk-z80asm assembler provides synthetic instructions to simplify code for the different variants (it has also recently become a macro assembler) to simplify programming. These instructions are usually a useful sequence of normal instructions that can be issued with no side effects (eg. setting flags) that may streamline combined 8085 / z80 programming.
Discussion on the Instructions
Some things to think about (and then do).
- Use the Underflow Indicator (K or UI) flag with 16 bit decrement and
JP NKinstructions to manage loops, like
LDIRemulation, more cleanly. 16 bit decrement overflow flag
Kis set on -1, not on 0, so pre-decrement loop counter.
- Use the
LD DE,SP+ninstruction with
LD HL,(DE)to grab from and
LD (DE),HLto store parameters on the stack. Can use this with a math library to make it reentrant, for example, and also relieves pressure on the small number of registers.
- Use the
LD DE,SP+ninstruction with
LD SP,HLto quickly set up the stack frame. For example
LD SP,HLto establish 256-n stack frame.
RL DEtogether with
EX DE,HLto rotate 32 bit fields.
RL DEtogether with
ADD HL,HLto shift 32 bit fields.
ADD DE,DEto offset into tables and structures.
SUB HL,BCfor 16 bit subtraction.
EX (SP),HLprovides another “16-bit register”, if
SP+2is the location of the return, and
SP+4is the location of first variable.
- Learn how signed arithmetic can be improved using the
Since we know that the 8085 undocumented opcodes are available in every 8085 device they can be relied upon for any 8085 system. The challenge will be to take existing 8080 programs, such as Microsoft Basic and CP/M, and implement improvements using these 8085 specific instructions.
In reworking the z88dk sccz80
l_ primitives to make them reentrant and to optimise them for the 8085 CPU, I have found the
LD DE,SP+n instruction very important. Using this instruction it is possible to use the stack as effectively as static variable storage locations. The alternative available on the 8080 (and Z80)
LD HL,N , ADD HL,SP takes 21 cycles, and clears the Carry flag. With the few registers available on the 8080 losing the Carry flag to provide state causes further cycle expense, spared with the 8085 alternative.
To load a single stack byte using
LD DE,SP+n , LD A,(DE) is only 4 cycles slower than loading a static byte using
LD A,(**). Also, loading a stack word using
LD DE,SP+n , LD HL,(DE) is only 4 cycles slower than loading a static word using
LD HL,(**). Given that variables can be used in-situ from the stack or pushed onto the stack from registers rather than requiring the overhead of the value being previously loaded into the static location, this small overhead translates into about 3 stack accesses for free compared to static variables.
One small design oversight in the Program Status Word of the 8085 is however quite annoying. The flags register contains a single bit that always reads as 0. A
$FFFF pushed to
AF is read back as
$FF7F. This means that unlike in the Z80, it is not possible to use a
POP AF , PUSH AF pair as a temporary stack store, which invalidates
AF as one of the only 3 additional 16-bit registers as an option, making things even tighter when juggling the stack. I’d call it annoying
RL DE and
SUB HL,BC instructions are very useful to build 16-bit multiply and divide routines effectively. They have contributed to useful optimisations of these primitives. The saving in bytes over equivalent 8080 implementations has allowed for partial loop unrolling, which also speeds up the routines by reducing loop overhead. Initially, I was concerned that the
SUB HL,BC function didn’t include the Carry flag. But in hindsight it is not possible to effectively carry into the registers, and using the 8 bit
SUB A,C , SBC A,B instructions via the
A register is the way to manage long arithmetic.
Recently the LD DE,SP+n and LD HL,(DE) or LD A,(DE) instructions were used to replace the sccz80 z80 stack access routine LD HL,n, and ADD HL,SP followed by CALL l_gint or CALL l_gchar. Also the stack store routine CALL l_pint was replaced by LD (DE),HL. These small changes to the optimisation process have substantially improved the 8085 benchmarks, in both code size and performance, and they are now often better than similar z80 benchmarks.
The next challenge was to build a CP/M-IDE version for the 8085 CPU. The ingredients are ACIA serial drivers adapted for 8085, IDE and diskio drivers for 8085, and the ChaN FatFs library compiled for 8085, plus a 8085 adapted BIOS.
When looking at the IDE drivers written previously for Z80 it was obvious that I’d gone out of my way to use Z80 instructions, which were actually slower than using 8080 instructions. So, I took the opportunity to rewrite an integrated solution for both Z80 and 8080/8085, for future maintenance.
The new CP/M-IDE 8085 code is very similar to the existing ACIA and SIO serial Z80 code, by design. I’ve tried to minimise the differences where ever possible. The remaining differences are mainly in the BIOS code, and relate to initialisation of the 8085 interrupts and the different CRT code used between Z80 and 8085 systems.
Am9511A (Intel 8231) APU Support
To use the APU math library with CP/M, the library just needs to be linked with
zcc +cpm -clib=8085 -v -O2 n-body.c -o nbody --math-am9511_8085 -lndos -create-app
The Whetstone Benchmark for RC2014 results (7.3728MHz, hand timed) are:
- 8085+MBF32 -> 78.2 Seconds -> 12.8 kWhetstone.
- 8085+AM9511 -> 30.4 Seconds. -> 32.9 kWhetstone.
And for the n-body Benchmark the RC2014 results (7.3728MHz, hand timed) are:
- 8085+MBF32 -> 252.3 Seconds.
- 8085+AM9511 -> 69.3 Seconds.
So the the 8085+APU system is 2.5x to 3.6x faster than the best 8085 software maths library.
And what is very interesting is that these numbers also align very closely with the Z80+APU results.