8085 Software

This covers creation of software support for the 8085 CPU within the framework of the z88dk and also with MS BASIC 4.7. Specifically, the 8085 undocumented instructions will be covered, and some usage possibilities provided.

8085 Microsoft BASIC 4.7

The Microsoft BASIC 4.7 source code is available from the NASCOM machine. Although the NASCOM machine was a Z80 machine there were only minor changes to the original Microsoft BASIC 8080 code. Therefore it is an ideal source to use to build a 8085 based system.

At this repository the 8085 RC2014 Microsoft BASIC is being developed. Currently fully working with the RC2014 ACIA Serial Module (from the RC2014 Classic ][). Some initial performance testing has been done, and there is little difference (< 1%) vs. the Z80 at the same frequency.

A version of Microsoft BASIC 4.7 for the 8085 CPU Module together with the Am9511A APU Module has been built, as well. This version adds the full performance of a hardware APU to the 8085 CPU providing the “complete performance package” for Microsoft BASIC.

Z88DK Support

Support for the 8085 processor is available from the z88dk. The sccz80 C Compiler, combined with the classic library and z88dk-z80asm assembler provide the necessary components.

Support for the 8085 CPU Module for the RC2014 has been provided using the underlying MS Basic as a program loader and debugging tool. This is reached through the rc2014 target basic85 subtype. This uses the standard RST serial interfaces (provided by MS Basic) and the HLOAD keyword to upload code compiled for $9000 origin (by default). Compiled programs can use any memory from $8400 through to $FFFF.

Also a rc2014 target ROM subtype acia85 has been provided to allow on-the-metal embedded applications to be written. The full 32kB of ROM and 32kB RAM is then available, with the option to toggle out the ROM if needed for CP/M or similar systems.

Within z88dk the mbf32 floating point math package has been optimised to support 8085 undocumented instructions.

The z88dk sccz80 C compiler is used for 8080, 8085 and Gameboy Z80 CPUs. This compiler is supported by the z88dk classic library. Over a few weeks, I reworked all of the sccz80 compiler support primitives (called l_ functions) to make them reentrant, and to optimise them for the respective CPU.

I’ve also reworked all of the z88dk string functions to support callee for the 8085 CPU. The callee calling mechanism is substantially faster than the standard calling convention. Also I’ve changed the loop mechanism for 8080 / 8085 / GBZ80 to use a faster mechanism. This consumes 5 bytes more for each function used, but reduces the loop overhead from 24 cycles per iteration to 14 cycles per iteration. Quite a substantial saving for extensively used functions like memcpy() and memset(), for example.

8085 Undocumented Instructions

Over the years since launch several very useful undocumented instructions designed into the 8085 have been found. These instructions are particularly useful for building stack relative code, such as required for high level languages or reentrant functions. However, perhaps because of corporate politics, these useful instructions were never announced, and thus were never widely implemented.

There is a reference to these instructions and their use in Intel mnemonics, but I prefer to use Zilog mnemonics. So I’ve modified the CLR table to support the 8085.

The z88dk-z80asm assembler provides synthetic instructions to simplify code for the different variants (it has also recently become a macro assembler) to simplify programming. These instructions are usually a useful sequence of normal instructions that can be issued with no side effects (eg. setting flags) that may streamline combined 8085 / z80 programming.

Discussion on the Instructions

Some things to think about (and then do).

  • Use the Underflow Indicator (K or UI) flag with 16 bit decrement and JP KJP NK instructions to manage loops, like LDIR emulation, more cleanly. 16 bit decrement overflow flag K is set on -1, not on 0, so pre-decrement loop counter.
  • Use the LD DE,SP+n instruction with LD HL,(DE) to grab from and LD (DE),HL to store parameters on the stack. Can use this with a math library to make it reentrant, for example, and also relieves pressure on the small number of registers.
  • Use the LD DE,SP+n instruction with LD SP,HL to quickly set up the stack frame. For example LD HL,SP+n, DEC H, LD SP,HL to establish 256-n stack frame.
  • Use RL DE together with EX DE,HL to rotate 32 bit fields.
  • Use RL DE together with ADD HL,HL to shift 32 bit fields.
  • Use RL DE as ADD DE,DE to offset into tables and structures.
  • Use SUB HL,BC for 16 bit subtraction.
  • Remember EX (SP),HL provides another “16-bit register”, if SP+2 is the location of the return, and SP+4 is the location of first variable.
  • Learn how signed arithmetic can be improved using the K flag.

Since we know that the 8085 undocumented opcodes are available in every 8085 device they can be relied upon for any 8085 system. The challenge will be to take existing 8080 programs, such as Microsoft Basic and CP/M, and implement improvements using these 8085 specific instructions.

In reworking the z88dk sccz80 l_ primitives to make them reentrant and to optimise them for the 8085 CPU, I have found the LD DE,SP+n instruction very important. Using this instruction it is possible to use the stack as effectively as static variable storage locations. The alternative available on the 8080 (and Z80) LD HL,N , ADD HL,SP takes 21 cycles, and clears the Carry flag. With the few registers available on the 8080 losing the Carry flag to provide state causes further cycle expense, spared with the 8085 alternative.

To load a single stack byte using LD DE,SP+n , LD A,(DE) is only 4 cycles slower than loading a static byte using LD A,(**). Also, loading a stack word using LD DE,SP+n , LD HL,(DE) is only 4 cycles slower than loading a static word using LD HL,(**). Given that variables can be used in-situ from the stack or pushed onto the stack from registers rather than requiring the overhead of the value being previously loaded into the static location, this small overhead translates into about 3 stack accesses for free compared to static variables.

One small design oversight in the Program Status Word of the 8085 is however quite annoying. The flags register contains a single bit that always reads as 0. A $FFFF pushed to AF is read back as $FF7F. This means that unlike in the Z80, it is not possible to use a POP AF , PUSH AF pair as a temporary stack store, which invalidates AF as one of the only 3 additional 16-bit registers as an option, making things even tighter when juggling the stack. I’d call it annoying AF.

The RL DE and SUB HL,BC instructions are very useful to build 16-bit multiply and divide routines effectively. They have contributed to useful optimisations of these primitives. The saving in bytes over equivalent 8080 implementations has allowed for partial loop unrolling, which also speeds up the routines by reducing loop overhead. Initially, I was concerned that the SUB HL,BC function didn’t include the Carry flag. But in hindsight it is not possible to effectively carry into the registers, and using the 8 bit SUB A,C , SBC A,B instructions via the A register is the way to manage long arithmetic.

Recently the LD DE,SP+n and LD HL,(DE) or LD A,(DE) instructions were used to replace the sccz80 z80 stack access routine LD HL,n, and ADD HL,SP followed by CALL l_gint or CALL l_gchar. Also the stack store routine CALL l_pint was replaced by LD (DE),HL. These small changes to the optimisation process have substantially improved the 8085 benchmarks, in both code size and performance, and they are now often better than similar z80 benchmarks.

CP/M-IDE 8085

The next challenge was to build a CP/M-IDE version for the 8085 CPU. The ingredients are ACIA serial drivers adapted for 8085, IDE and diskio drivers for 8085, and the ChaN FatFs library compiled for 8085, plus a 8085 adapted BIOS.

Modules required for CP/M-IDE 8085

When looking at the IDE drivers written previously for Z80 it was obvious that I’d gone out of my way to use Z80 instructions, which were actually slower than using 8080 instructions. So, I took the opportunity to rewrite an integrated solution for both Z80 and 8080/8085, for future maintenance.

The new CP/M-IDE 8085 code is very similar to the existing ACIA and SIO serial Z80 code, by design. I’ve tried to minimise the differences where ever possible. The remaining differences are mainly in the BIOS code, and relate to initialisation of the 8085 interrupts and the different CRT code used between Z80 and 8085 systems.

CP/M-IDE Modules installed in RC2014 Backplane 8

Am9511A (Intel 8231) APU Support

I’ve just added the Am9511 (Intel 8231) APU math library for the 8085 CP/M and other 8085 targets within z88dk. So now the 8085+Am9511 support is pretty much rounded out and complete.

To use the APU math library with CP/M, the library just needs to be linked with --math-am9511_8085.
For example:
zcc +cpm -clib=8085 -v -O2 n-body.c -o nbody --math-am9511_8085 -lndos -create-app

Just working through some maths benchmarks on my CP/M-IDE System now.

The Whetstone Benchmark for RC2014 results (7.3728MHz, hand timed) are:

  • 8085+MBF32 -> 78.2 Seconds -> 12.8 kWhetstone.
  • 8085+AM9511 -> 30.4 Seconds. -> 32.9 kWhetstone.

And for the n-body Benchmark the RC2014 results (7.3728MHz, hand timed) are:

  • 8085+MBF32 -> 252.3 Seconds.
  • 8085+AM9511 -> 69.3 Seconds.

So the the 8085+APU system is 2.5x to 3.6x faster than the best 8085 software maths library.
And what is very interesting is that these numbers also align very closely with the Z80+APU results.

One thought on “8085 Software

  1. Pingback: 8085 CPU on the Z80 Bus | feilipu

Leave a comment