8085 Software

This page covers the creation of software support for the 8085 CPU, both within the z88dk framework and for Microsoft BASIC 4.7. In particular, it covers the 8085 extended instructions and some possible ways to use them.

8085 Microsoft BASIC 4.7

The Microsoft BASIC 4.7 source code is available from the NASCOM machine. Although the NASCOM machine had a Z80 CPU, only minor changes were made to the original Microsoft BASIC 8080 code. Therefore it is an ideal basis for building an 8085-based system.

The 8085 RC2014 Microsoft BASIC is being developed at this repository. It is currently fully working with the RC2014 ACIA Serial Module (from the RC2014 Classic ][). Some initial performance testing has been done, and there is little difference (< 1%) compared with the Z80 at the same clock frequency.

The significant characteristics include:

  • The serial interface is configured for 115200 baud with 8n2 setting and RTS hardware handshake.
  • ACIA 6850 interrupt-driven serial I/O supporting the hardware double buffer, together with a large 255-byte receive buffer, to allow efficient pasting of BASIC into the editor. The receive RTS handshake signals full before the buffer is completely filled, to allow run-on from the sender.
  • Interrupt-driven serial transmission, with a 63-byte buffer, to ensure the CPU is not held waiting during transmission.
  • A RAM redirection jump table, starting at 0x8000, enables the important RST instructions and the 8085 interrupt vectors to be reconfigured by the user.
  • Both an Intel HEX HLOAD statement and a software RESET statement have been added. These allow you to easily upload 8085 assembly or compiled C programs, and then run them as described. The HLOAD statement automatically adjusts the upper RAM limit for BASIC and enters the program origin into the USRLOC location.
  • Added MEEK and MOKE statements, which allow bulk memory to be examined in 16-byte blocks and support continuous editing (assembly language entry) of memory. Addresses and values can be entered as signed decimal integers, or alternatively as hexadecimal numbers using the & keyword.
  • The WIDTH command has been extended to support setting the comma column screen width using WIDTH I,J where I is the screen width and J is the comma column screen width.
  • Instruction and code flow tuning results in faster execution. Both the 8085 and Z80 versions of the code have been optimised by approximately 10% over the originally released 8080 code.
  • Support for the Am9511A APU Module provides 3x to 5x faster execution of assembly or C floating point programs (with respect to equivalent C programs using software floating point libraries).
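The early "full" signal on the receive side can be illustrated with a small Python model. It is a sketch, not the actual ACIA driver: only the 255-byte buffer size comes from the text, while the 16-byte high-water and low-water margins and the 8-byte sender run-on are hypothetical values chosen for illustration.

```python
from collections import deque

RX_BUFFER_SIZE = 255                  # receive buffer size quoted in the text
RX_FULL_MARK = RX_BUFFER_SIZE - 16    # hypothetical: RTS shows "full" 16 bytes early
RX_EMPTY_MARK = 16                    # hypothetical: RTS re-asserted at this level


class RxBuffer:
    """Model of an interrupt-fed receive buffer with RTS flow control."""

    def __init__(self):
        self.data = deque()
        self.rts_asserted = True      # asserted RTS tells the sender to keep sending
        self.dropped = 0              # bytes lost because the buffer was truly full

    def on_receive_interrupt(self, byte):
        if len(self.data) >= RX_BUFFER_SIZE:
            self.dropped += 1
            return
        self.data.append(byte)
        if len(self.data) >= RX_FULL_MARK:
            self.rts_asserted = False  # signal "full" before the buffer is exhausted

    def read_byte(self):
        byte = self.data.popleft()
        if not self.rts_asserted and len(self.data) <= RX_EMPTY_MARK:
            self.rts_asserted = True   # enough room again: let the sender resume
        return byte


def paste(buffer, data, run_on=8):
    """Feed bytes like a sender that overruns a dropped RTS by up to `run_on` bytes.

    Returns the number of bytes transmitted before the sender stops.
    """
    sent = overrun = 0
    for byte in data:
        if not buffer.rts_asserted:
            if overrun == run_on:
                break
            overrun += 1
        buffer.on_receive_interrupt(byte)
        sent += 1
    return sent
```

Because RTS is dropped with room to spare, the bytes the sender transmits after RTS goes inactive still fit in the buffer, so nothing is lost while pasting.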

A version of Microsoft BASIC 4.7 for the 8085 CPU Module together with the Am9511A APU Module has been built, as well. This version adds the full performance of a hardware APU to the 8085 CPU providing the “complete performance package” for Microsoft BASIC.

Z88DK Support

Support for the 8085 processor is available from the z88dk. The sccz80 C compiler, combined with the classic library and the z88dk-z80asm assembler, provides the necessary components.

Support for the 8085 CPU Module for the RC2014 has been provided using the underlying MS BASIC as a program loader and debugging tool (as described above). This is accessed through the rc2014 target using the basic85 subtype. It uses the standard RST serial interfaces (provided by MS BASIC) and the HLOAD keyword to upload code compiled for a $9000 origin (by default). However, compiled programs can use any memory from $8400 through to $FFFF.
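HLOAD reads programs in Intel HEX format. As an illustration of that record format only (in practice the toolchain writes the HEX file), here is a Python sketch that turns a binary image with a $9000 origin into data records followed by an end-of-file record. The helper names are hypothetical and are not part of z88dk or MS BASIC.

```python
def hex_record(address, data, record_type=0x00):
    """Build one Intel HEX record: ':' LL AAAA TT <data> CC."""
    if len(data) > 255:
        raise ValueError("a record holds at most 255 data bytes")
    fields = [len(data), (address >> 8) & 0xFF, address & 0xFF, record_type, *data]
    checksum = (-sum(fields)) & 0xFF          # two's complement of the byte sum
    return ":" + "".join(f"{b:02X}" for b in fields) + f"{checksum:02X}"


def to_intel_hex(origin, image, record_len=16):
    """Split a binary image into data records, then append the EOF record."""
    lines = [
        hex_record(origin + offset, image[offset:offset + record_len])
        for offset in range(0, len(image), record_len)
    ]
    lines.append(hex_record(0x0000, b"", record_type=0x01))
    return lines


def record_is_valid(line):
    """All bytes of a record, checksum included, must sum to 0 modulo 256."""
    raw = bytes.fromhex(line[1:])
    return line.startswith(":") and len(raw) == raw[0] + 5 and sum(raw) & 0xFF == 0
```

For example, `to_intel_hex(0x9000, program_bytes)` yields lines starting at address $9000. The zero-sum checksum is what lets a loader reject a corrupted line.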

An rc2014 target ROM using the acia85 subtype has also been provided, to allow bare-metal embedded applications to be written. The full 32 kB of ROM and 32 kB of RAM are then available, with the option to toggle out the ROM if 64 kB of RAM is needed for CP/M or similar systems.

Within z88dk the mbf32 floating point math package has been optimised to support the 8085 extended instructions.

The z88dk sccz80 C compiler is used for 8080, 8085 and Gameboy Z80 CPUs. This compiler is supported by the z88dk classic library. Over a few weeks, I reworked all of the sccz80 compiler support primitives (called l_ functions) to make them reentrant, and to optimise them for the respective CPU.

I’ve also reworked all of the z88dk string functions to support the callee calling convention for the 8085 CPU. The callee calling mechanism is substantially faster than the standard calling convention. I’ve also changed the loop mechanism for the 8080 / 8085 / GBZ80 to a faster one. This consumes 5 more bytes for each function used, but reduces the loop overhead from 24 cycles per iteration to 14 cycles per iteration. That is quite a substantial saving for extensively used functions like memcpy() and memset(), for example.
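The faster loop mechanism isn't spelled out here. One common way to get a 14-cycle loop on the 8085, assumed in this sketch rather than taken from the z88dk source, is to replace the 16-bit DEC BC ; LD A,B ; OR C ; JP NZ test (6 + 4 + 4 + 10 = 24 T-states) with nested 8-bit DEC C ; JP NZ and DEC B ; JP NZ counters (4 + 10 = 14 T-states per pass), at the cost of a few bytes of set-up code. This Python model checks that the split counters run the loop body exactly n times and tallies the loop-control T-states (DEC r 4, JP cc 10 taken / 7 not taken).

```python
def split_counter(n):
    """Split a count n (1..65535) into (B, C) for nested 8-bit loops.

    C holds the low byte and runs first; B holds the high byte, plus one when
    the low byte is non-zero. A value of 0 in either register means 256.
    """
    if not 1 <= n <= 0xFFFF:
        raise ValueError("n must be 1..65535; a zero count is handled before the loop")
    low, high = n & 0xFF, n >> 8
    return (high + (1 if low else 0)) & 0xFF, low


def nested_loop(n):
    """Model: inner: <body> ; DEC C ; JP NZ,inner ; DEC B ; JP NZ,inner.

    Returns (body passes, loop-control T-states).
    """
    b, c = split_counter(n)
    passes = cycles = 0
    while True:
        passes += 1                         # loop body
        c = (c - 1) & 0xFF                  # DEC C: 4 T-states
        cycles += 4 + (10 if c else 7)      # JP NZ: 10 if taken, 7 if not
        if c:
            continue
        b = (b - 1) & 0xFF                  # DEC B: 4 T-states
        cycles += 4 + (10 if b else 7)
        if not b:
            return passes, cycles


def single_loop_cycles(n):
    """Loop control for <body> ; DEC BC ; LD A,B ; OR C ; JP NZ (24 T-states per pass)."""
    return n * (6 + 4 + 4 + 10) - 3         # the final JP NZ falls through in 7
```

For 4096 passes the nested version spends 57517 T-states on loop control against 98301 for the 16-bit test. The zero-length case has to be filtered out before the loop, which is plausibly part of why such a mechanism costs a few extra bytes per function.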

8085 Extended Instructions

In the years since its launch, 10 very useful extended instructions designed into the 8085 have been found and documented. These instructions are particularly useful for building stack-relative code, such as that required for high level languages or reentrant functions. However, perhaps because of corporate politics, these useful instructions were never announced, and thus were never widely used. They remained effectively undocumented until they were formalised in the Tundra CA80C85B CPU datasheet.

There is a reference to these instructions and their use in Intel mnemonics, but I prefer to use Zilog mnemonics. So I’ve modified the CLR table to support the 8085.
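For reference, the ten extended instructions can be tabulated as a Python lookup, pairing the widely documented opcodes and Intel mnemonics (JNK and JK are also written JNX5 and JX5) with Zilog-style names. The Zilog-style spellings for SRA HL and RST V, which this page doesn't otherwise use, are my assumption and may not match z88dk-z80asm exactly. decode_extended() is a hypothetical helper that decodes a single instruction, a starting point for disassembling code that uses these opcodes.

```python
# opcode -> (Intel mnemonic, Zilog-style template, operand bytes, effect)
EXTENDED_8085 = {
    0x08: ("DSUB", "SUB HL,BC", 0, "HL = HL - BC"),
    0x10: ("ARHL", "SRA HL", 0, "arithmetic shift right HL"),
    0x18: ("RDEL", "RL DE", 0, "rotate DE left through Carry"),
    0x28: ("LDHI", "LD DE,HL+{}", 1, "DE = HL + unsigned byte"),
    0x38: ("LDSI", "LD DE,SP+{}", 1, "DE = SP + unsigned byte"),
    0xCB: ("RSTV", "RST V", 0, "call $0040 if V (overflow) is set"),
    0xD9: ("SHLX", "LD (DE),HL", 0, "store HL at address DE"),
    0xDD: ("JNK", "JP NK,{}", 2, "jump if K is clear"),
    0xED: ("LHLX", "LD HL,(DE)", 0, "load HL from address DE"),
    0xFD: ("JK", "JP K,{}", 2, "jump if K is set"),
}


def decode_extended(code, pc=0):
    """Decode one extended instruction at code[pc].

    Returns (Zilog-style text, length in bytes), or None for any other opcode.
    """
    entry = EXTENDED_8085.get(code[pc])
    if entry is None:
        return None
    _, template, operand_bytes, _ = entry
    operand = int.from_bytes(code[pc + 1:pc + 1 + operand_bytes], "little")
    if operand_bytes == 1:
        text = template.format(f"${operand:02X}")
    elif operand_bytes == 2:
        text = template.format(f"${operand:04X}")
    else:
        text = template
    return text, 1 + operand_bytes
```

For example, `decode_extended(bytes([0x38, 0x04]))` gives `("LD DE,SP+$04", 2)`.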

The z88dk-z80asm assembler (which has also recently become a macro assembler) provides synthetic instructions to simplify programming for the different CPU variants. These are usually useful sequences of normal instructions that can be issued with no side effects (e.g. on the flags), and they may streamline combined 8085 / Z80 programming.

Discussion on the Extended Instructions

Some things to think about (and then do).

  • Use the Underflow Indicator (K or UI) flag with 16-bit decrement and the JP K / JP NK instructions to manage loops, like LDIR emulation, more cleanly. The K flag is set when a 16-bit decrement reaches -1, not 0, so pre-decrement the loop counter.
  • Use the LD DE,SP+n instruction with LD HL,(DE) to load parameters from, and LD (DE),HL to store parameters to, the stack. This can be used with a math library to make it reentrant, for example, and it also relieves pressure on the small number of registers.
  • Use the LD DE,SP+n instruction with LD SP,HL to quickly set up a stack frame. For example, LD DE,SP+n , EX DE,HL , DEC H , LD SP,HL establishes a stack frame of 256-n bytes.
  • Use RL DE together with EX DE,HL to rotate 32 bit fields.
  • Use RL DE together with ADD HL,HL to shift 32 bit fields.
  • Use RL DE as ADD DE,DE (with the Carry flag clear) to offset into tables and structures.
  • Use SUB HL,BC for 16 bit subtraction, and for comparative operations.
  • Remember that EX (SP),HL provides another “16-bit register”. Push a value to create the register, so that the return address moves to SP+2 and the first variable moves to SP+4.
  • Use the K flag together with JP K,nnnn to improve signed integer comparison operations.
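Several of the ideas above can be checked with a small Python model of the register operations. It is a sketch under stated assumptions: the K flag is modelled only for the 16-bit decrement case described above (set when the result reaches -1, i.e. $FFFF), and the stack frame sequence uses LD DE,SP+n followed by EX DE,HL, since the 8085 extended instruction loads SP+n only into DE. The function names are hypothetical.

```python
MASK16 = 0xFFFF


def dec16(rr):
    """DEC rr (16-bit). Returns (result, K); K is set when the result is -1 ($FFFF)."""
    result = (rr - 1) & MASK16
    return result, result == MASK16


def counted_loop(n):
    """BC = n - 1 ; loop: <body> ; DEC BC ; JP NK,loop. Returns passes of the body."""
    if not 1 <= n <= 0x10000:
        raise ValueError("n must be 1..65536")
    bc, passes = n - 1, 0          # pre-decrement the counter
    while True:
        passes += 1                # loop body
        bc, k = dec16(bc)
        if k:                      # only the step from 0 to -1 sets K
            return passes


def shl32(value):
    """Shift DE:HL (32 bits) left one place: ADD HL,HL then RL DE."""
    de, hl = value >> 16, value & MASK16
    total = hl + hl                        # ADD HL,HL: bit 15 of HL goes into Carry
    hl, carry = total & MASK16, total >> 16
    de = ((de << 1) | carry) & MASK16      # RL DE: Carry enters bit 0
    return (de << 16) | hl


def stack_frame_sp(sp, n):
    """LD DE,SP+n ; EX DE,HL ; DEC H ; LD SP,HL. Returns the new SP."""
    de = (sp + n) & MASK16                 # LD DE,SP+n (n is an unsigned byte)
    hl = de                                # EX DE,HL
    hl = (hl - 0x100) & MASK16             # DEC H lowers HL by 256
    return hl                              # LD SP,HL
```

In shl32 the Carry entering RL DE is the bit shifted out of HL, which is what links the two halves. Used on its own, RL DE doubles DE like ADD DE,DE only when Carry is clear beforehand.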

Since the 8085 extended opcodes are available in every 8085 device, they can be relied upon in any 8085 system. The challenge will be to take existing 8080 programs, such as Microsoft BASIC and CP/M, and improve them using these 8085-specific instructions.

In reworking the z88dk sccz80 l_ primitives to make them reentrant and to optimise them for the 8085 CPU, I have found the LD DE,SP+n instruction very important. Using this instruction, the stack can effectively be used for static variable storage. The alternative available on the 8080 (and Z80), LD HL,n , ADD HL,SP, takes 20 cycles on the 8085 (21 on the Z80) and overwrites the Carry flag. With so few registers available on the 8080, losing the state held in the Carry flag costs further cycles, which the 8085 alternative avoids.

To load a single stack byte using LD DE,SP+n , LD A,(DE) is only 4 cycles slower than loading a static byte using LD A,(**). Also, loading a stack word using LD DE,SP+n , LD HL,(DE) is only 4 cycles slower than loading a static word using LD HL,(**). Given that variables can be used in-situ from the stack or pushed onto the stack from registers rather than requiring the overhead of the value being previously loaded into the static location, this small overhead translates into about 3 stack accesses for free compared to static variables.
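The cycle arithmetic in the two paragraphs above can be reproduced from 8085 T-state counts. The counts below are the commonly quoted 8085 figures (LDSI and LHLX as documented for the extended instructions) and should be checked against the data sheet for a particular part; note that ADD HL,SP is 11 T-states on the Z80, not 10. Reading the "about 3 stack accesses for free" estimate as the cost of one avoided static store divided by the 4-cycle penalty is my interpretation.

```python
# 8085 T-state counts for the instructions discussed (verify against your part's data sheet)
CYCLES = {
    "LD DE,SP+n": 10,   # LDSI
    "LD A,(DE)": 7,     # LDAX D
    "LD HL,(DE)": 10,   # LHLX
    "LD A,(nn)": 13,    # LDA
    "LD HL,(nn)": 16,   # LHLD
    "LD (nn),A": 13,    # STA
    "LD (nn),HL": 16,   # SHLD
    "LD HL,nn": 10,     # LXI H
    "ADD HL,SP": 10,    # DAD SP
}


def cost(*instructions):
    """Total T-states for a straight-line instruction sequence."""
    return sum(CYCLES[i] for i in instructions)


# Extra cost of a stack access over a static access
byte_penalty = cost("LD DE,SP+n", "LD A,(DE)") - cost("LD A,(nn)")
word_penalty = cost("LD DE,SP+n", "LD HL,(DE)") - cost("LD HL,(nn)")

# Address calculation: LD DE,SP+n versus the 8080-compatible LD HL,nn ; ADD HL,SP
address_saving = cost("LD HL,nn", "ADD HL,SP") - cost("LD DE,SP+n")

# Stack accesses "paid for" by not first storing the value to a static location
free_byte_accesses = CYCLES["LD (nn),A"] / byte_penalty
free_word_accesses = CYCLES["LD (nn),HL"] / word_penalty
```

Both penalties come out at 4 T-states, and one avoided static store covers roughly 3 (byte) to 4 (word) stack accesses.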

One small design oversight in the Program Status Word of the 8085 is, however, quite annoying. The flags register contains a single bit that always reads as 0, so a $FFFF pushed to AF is read back as $FFF7. This means that, unlike on the Z80, a POP AF , PUSH AF pair can’t be used as a temporary stack store. That rules out AF as one of the only 3 additional 16-bit registers, making things even tighter when juggling the stack. I’d call it annoying AF.

The RL DE and SUB HL,BC instructions are very useful for building effective 16-bit multiply and divide routines, and they have contributed to useful optimisations of these primitives. The bytes saved over equivalent 8080 implementations have allowed for partial loop unrolling, which also speeds up the routines by reducing loop overhead. Initially, I was concerned that SUB HL,BC doesn’t take the Carry flag as an input. But in hindsight it is not possible to effectively carry into the register pairs, and using the 8-bit SUB A,C , SBC A,B instructions via the A register is the way to manage long arithmetic.
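To show how RL DE and SUB HL,BC fit together, here is a Python model of a classic shift-and-subtract (restoring) 16/16 unsigned division written in terms of those register operations. It is a sketch of the general technique, not the z88dk primitive itself, and it ignores cycle counts and loop unrolling.

```python
MASK16 = 0xFFFF


def rl_de(de, carry):
    """RL DE: rotate DE left through Carry. Returns (DE, Carry)."""
    return ((de << 1) | carry) & MASK16, de >> 15


def add_hl(hl, rr):
    """ADD HL,rr: 16-bit add. Returns (HL, Carry)."""
    total = hl + rr
    return total & MASK16, total >> 16


def sub_hl_bc(hl, bc):
    """SUB HL,BC: subtract with no borrow in. Returns (HL, Zero, Carry/borrow)."""
    result = (hl - bc) & MASK16
    return result, result == 0, bc > hl


def divu16(dividend, divisor):
    """Unsigned 16/16 restoring division. Quotient ends in DE, remainder in HL."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    de, hl, bc = dividend & MASK16, 0, divisor & MASK16
    for _ in range(16):
        de, bit = rl_de(de, 0)         # next dividend bit leaves DE via Carry
        hl, _ = add_hl(hl, hl)         # shift the partial remainder left
        hl |= bit                      # (HL < 2**15 before each shift, so no carry out)
        hl, _, borrow = sub_hl_bc(hl, bc)  # trial subtraction
        if borrow:
            hl, _ = add_hl(hl, bc)     # divisor didn't fit: restore HL
        else:
            de |= 1                    # divisor fitted: set the quotient bit
    return de, hl
```

As the dividend bits rotate out of DE, the quotient bits rotate in behind them, so DE ends up holding the quotient without needing another register pair.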

Recently the LD DE,SP+n and LD HL,(DE) or LD A,(DE) instructions were used to replace the sccz80 Z80-style stack access sequence of LD HL,n , ADD HL,SP followed by CALL l_gint or CALL l_gchar. Also, the stack store routine CALL l_pint was replaced by LD (DE),HL. These small changes to the optimisation process have substantially improved the 8085 benchmarks, in both code size and performance, and they are now often on a par with similar Z80 benchmarks.

More recently I’ve been wondering about the sub hl,bc instruction. Why was it so important to have a subtraction instruction when two’s complement numbers can easily be subtracted using the addition instruction? Why did sub hl,bc win its place in the limited opcode space?

Well, it turns out (I believe, anyway) that the 16-bit subtraction instruction sub hl,bc is really intended for efficient comparison operations. In C it is quite common to end loops with equality ( == ) or inequality ( != ) operators, and these can be implemented quickly with the sub hl,bc instruction. Subtracting the two numbers sets the Zero flag only when they are equal, which directly signals either equality or inequality, depending on what you want to test.
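In register terms, both tests reduce to a single subtraction followed by a test of the Zero flag. This Python model shows the idea; the function names mirror the C operators and are hypothetical, not the names of the z88dk library primitives. The same subtraction also yields an unsigned less-than through the borrow, shown for comparison.

```python
MASK16 = 0xFFFF


def sub_hl_bc(hl, bc):
    """SUB HL,BC: returns (HL - BC mod 2**16, Zero flag, Carry/borrow flag)."""
    result = (hl - bc) & MASK16
    return result, result == 0, bc > hl


def c_eq(a, b):
    """C `a == b` for 16-bit ints: one SUB HL,BC, then test Zero."""
    _, zero, _ = sub_hl_bc(a & MASK16, b & MASK16)
    return 1 if zero else 0


def c_ne(a, b):
    """C `a != b`: the same subtraction, testing the opposite sense of Zero."""
    _, zero, _ = sub_hl_bc(a & MASK16, b & MASK16)
    return 0 if zero else 1


def c_ult(a, b):
    """Unsigned `a < b`: the same subtraction, testing the borrow (Carry)."""
    _, _, borrow = sub_hl_bc(a & MASK16, b & MASK16)
    return 1 if borrow else 0
```

Because equality only depends on whether the 16-bit difference is zero, the result is the same for signed and unsigned operands, so one routine serves both.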

How useful could it be to have a special case for certain comparison operators? Well, a quick look at the ChaN FatFs code shows more than 500 equality operations and 300 inequality operations. Similarly, the z88dk regex library function includes many equality and inequality operations.

So I recently added special cases of equality and inequality comparisons using sub hl,bc to the z88dk 8085 library support. The improved 8085 equality operation looks like this, compared to the original 8080 compatible version here. As a result, the regex library function test is now faster on the 8085 than it is on the Z80!

Also, all signed integer comparisons can be optimised through use of the K flag and sub hl,bc, which brings them to the same performance as the Z80.

CP/M-IDE 8085

The next challenge was to build a CP/M-IDE version for the 8085 CPU. The ingredients are ACIA serial drivers adapted for the 8085, IDE (or Compact Flash) diskio drivers for the 8085, and the ChaN FatFs library compiled for the 8085, plus an 8085-adapted BIOS.

Modules required for CP/M-IDE 8085

When looking at the IDE drivers written previously for the Z80, it was obvious that I’d gone out of my way to use Z80 instructions, which were actually slower than the equivalent 8080 instructions. So I took the opportunity to write an integrated solution for both the Z80 and the 8080/8085, to ease future maintenance.

The new CP/M-IDE 8085 code is, by design, very similar to the existing ACIA and SIO serial Z80 code. I’ve tried to minimise the differences wherever possible. The remaining differences are mainly in the BIOS code, and relate to the initialisation of the 8085 interrupts and the different CRT code used by Z80 and 8085 systems.

CP/M-IDE Modules installed in RC2014 Backplane 8

Am9511A (Intel 8231) APU Support

I’ve just added the Am9511 (Intel 8231) APU math library for the 8085 CP/M and other 8085 targets within z88dk. So now the 8085+Am9511 support is pretty much rounded out and complete.

To use the APU math library with CP/M, the library just needs to be linked with --math-am9511.
For example:
zcc +cpm -clib=8085 -v -O2 n-body.c -o nbody --math-am9511 -lndos -create-app

Just working through some maths benchmarks on my CP/M-IDE System now.

The Whetstone Benchmark for RC2014 results (7.3728MHz, hand timed) are:

  • 8085+MBF32 -> 78.2 Seconds -> 12.8 kWhetstone.
  • 8085+AM9511 -> 30.4 Seconds -> 32.9 kWhetstone.

And for the n-body Benchmark the RC2014 results (7.3728MHz, hand timed) are:

  • 8085+MBF32 -> 252.3 Seconds.
  • 8085+AM9511 -> 69.3 Seconds.

So the 8085+APU system is 2.5x to 3.6x faster than the best 8085 software maths library. And what is also interesting is that these numbers align very closely with the Z80+APU results.
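The speed-ups quoted above follow directly from the hand-timed results. This small Python calculation reproduces them, assuming (as the quoted figures imply) that the Whetstone run executes one million Whetstone instructions, so that kWhetstone = 1000 / seconds.

```python
WHETSTONE_KILO_INSTRUCTIONS = 1000  # assumption inferred from the quoted kWhetstone figures

results = {
    "whetstone": {"8085+MBF32": 78.2, "8085+AM9511": 30.4},   # seconds
    "n-body": {"8085+MBF32": 252.3, "8085+AM9511": 69.3},     # seconds
}

# Software maths time divided by APU time, per benchmark
speedup = {
    name: times["8085+MBF32"] / times["8085+AM9511"]
    for name, times in results.items()
}

# Whetstone throughput for each configuration
kwhetstone = {
    config: WHETSTONE_KILO_INSTRUCTIONS / seconds
    for config, seconds in results["whetstone"].items()
}

for name, ratio in speedup.items():
    print(f"{name}: APU is {ratio:.2f}x faster")
for config, rate in kwhetstone.items():
    print(f"{config}: {rate:.1f} kWhetstone")
```

The Whetstone ratio works out at about 2.57x and the n-body ratio at about 3.64x.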

3 thoughts on “8085 Software”

  1. Pingback: 8085 CPU on the Z80 Bus | feilipu

  2. Just came across your site while looking for an 8085 IDE that supports the undocumented commands. I have a system that was designed in the mid 80s where I have not been able to decompile the machine code correctly because the designer used undocumented commands. Can your 8085 software decompile machine code (with undocumented commands)? Any comments or recommendations would be greatly appreciated. TIA
