MS BASIC Machine Monitor for RC2014

There has been quite a bit of interest in assembly or machine monitors for the RC2014 lately, so I’ve taken the opportunity to write one in MS (NASCOM) BASIC which will work for RC2014 Mini / Micro / Classic ][ machines, but equally well for any machine running a version of MS Basic 4.7 or later.

RC2014 Cylon

The functional goal was to provide the same user tools as the NASBUG monitor, but to be maintainable and easily customisable by the user. As the monitor is written in BASIC it works equally well for both Z80 based RC2014 machines and 8085 based RC2014 machines using my 8085 CPU Module (for example).

The code is published in the RC2014 BASIC Programs Monitor repository.

A key issue for me was that it needed to fit completely into the BASIC program memory found below the default z88dk assembly loading address of 0x9000 for the RC2014 target.

Using BASIC for the monitor let me focus on the functionality, as BASIC provides the command line support via INPUT, string tokenisation via MID$(), ASC(), and VAL(), and the mathematical functions.

Program Development

The trickiest piece of code is producing a signed integer address from the 4 digit hexadecimal ASCII string. Unsurprisingly, I didn’t write this very nice code segment. I found it provided as an example in the back of the NASCOM BASIC User Manual (Appendix I). Kudos to the NASCOM team there for their 40 years of foresight.

However, Fred W. has provided a better, simpler solution for getting an integer from a hexadecimal string, and that solution has now been implemented.

With that key function done, the MID$() function was used to tokenise the command string into strings of 4 (or fewer) digits to convert from hexadecimal. Depending on the function required, either a signed integer was produced to serve as an address or, where a length was needed, an unsigned integer was provided.
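The conversion described above can be sketched as follows. This is a minimal illustration in Python rather than the monitor's BASIC, and the function names (hex_to_signed16, hex_to_unsigned16, tokenise) are hypothetical. It assumes addresses are wanted as signed 16 bit integers, so anything at or above 0x8000 becomes negative.

```python
def hex_to_unsigned16(text):
    """Convert up to 4 hex digits to an unsigned integer (0..65535)."""
    value = int(text, 16)
    if value > 0xFFFF:
        raise ValueError("more than 16 bits")
    return value


def hex_to_signed16(text):
    """Convert up to 4 hex digits to a signed integer (-32768..32767)."""
    value = hex_to_unsigned16(text)
    # Values at or above 0x8000 wrap to negative numbers, which is the
    # form a signed 16 bit address takes.
    return value - 0x10000 if value >= 0x8000 else value


def tokenise(command):
    """Split a command such as 'C 9000 9001 00FF' into its letter and hex fields."""
    parts = command.split()
    return parts[0].upper(), parts[1:]


letter, fields = tokenise("T 9000 9010")
print(letter, [hex_to_signed16(f) for f in fields])
```

The signed form is used for addresses and the unsigned form for lengths, matching the split described above.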

Finally an IF THEN tree was made to process the command string, found using the very convenient ASC() function. Since the commands are sparse, I thought that a simple decision tree would be best.

As noted, a key goal for me was that it needed to fit completely into the BASIC program memory found below the default assembly loading address 0x9000. The program currently fits with over 100 Bytes free. The stripped version still has over 1200 Bytes free, so there is quite a lot more code that could be added if needed.

Program Usage

The BASIC monitor is useful to casually poke around in the RAM or ROM of a running MS BASIC RC2014 machine to get a feeling of how variables and strings are stored, for example.

It can be used to enter assembled binary code into the RAM, and then run it. The assembly code can use the facilities (serial I/O, RST table, and MS Basic functions) as needed. Some further information on Assembly can be found here.

Further, it can be used together with the Zen Editor/Assembler to write and then assemble larger programs for the RC2014. When the monitor is used together with Zen, the monitor can be loaded (copied and pasted) into the BASIC Command Line. Then, using the BASIC HLOAD command, Zen can be loaded (cat) into the RAM at 0x9000 and launched either from the monitor or directly using ?USR(0). Once editing and assembly is finished, Zen can be exited with Q. The newly assembled program can then be run from its origin by entering the monitor with RUN, and then using E xxxx yy, where yy is a signed integer input parameter.

Using this combination of Zen and the BASIC monitor it is possible to develop, examine, and modify complex Z80 and 8085 assembly code in a very comfortable environment.

Commands for a BASIC Monitor (Syntax borrowed from NASBUG)

A – hexadecimal arithmetic

A xxxx yyyy – Responds with: SSSS DDDD JR JJ. SSSS is the sum of xxxx and yyyy. Values in Hexadecimal. DDDD is the difference of xxxx and yyyy, yyyy-xxxx. Values in Hexadecimal. JJ is the displacement required in a Jump Relative instruction which starts at xxxx, to cause a jump to yyyy. Value in Decimal.
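As a worked illustration of the three results, here is a minimal Python sketch. The function name hex_arithmetic is hypothetical, and wrapping the sum and difference to 16 bits is an assumption about the output format. The displacement uses the Z80 rule that a JR offset is counted from the address of the instruction following the two byte JR.

```python
def hex_arithmetic(x, y):
    """Return (sum, difference, JR displacement, reachable) for addresses x and y."""
    total = (x + y) & 0xFFFF   # assumed to wrap to 16 bits
    diff = (y - x) & 0xFFFF    # yyyy - xxxx, assumed to wrap to 16 bits
    # JR occupies two bytes, so the offset is relative to x + 2.
    disp = y - (x + 2)
    reachable = -128 <= disp <= 127
    return total, diff, disp, reachable


total, diff, disp, ok = hex_arithmetic(0x9000, 0x9010)
print(f"{total:04X} {diff:04X} JR {disp}")
```

A displacement outside -128 to 127 cannot be encoded in a JR instruction, which is why the sketch also reports whether the target is reachable.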

C – copy

C xxxx yyyy zzzz – Copy a block of length zzzz from xxxx to yyyy. One byte is copied at a time, starting with the first byte, so if there is an overlap in the two areas data may be destroyed. This command is useful for filling a block with a single value. Make yyyy one greater than xxxx and put the required value into address xxxx using M. Set zzzz to the number of bytes required. Values in Hexadecimal.

E – execute

E xxxx yy – Execute program at xxxx, supplying integer input parameter yy. The USRLOC location depends on the specific MS BASIC ROM implemented. The value can be easily adjusted in the monitor source code. Values in Hexadecimal.

I – intelligent copy

I xxxx yyyy zzzz – Like the Copy command but copies to ensure it will not cause data corruption in an overlapping section. Values in Hexadecimal.
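The difference between the C and I commands can be sketched in Python, with a bytearray standing in for memory. The function names are hypothetical, and the monitor itself does this in BASIC. The forward copy shows the fill side effect, while the overlap-aware version copies backwards when the destination lies inside the source block.

```python
def copy_forward(mem, src, dst, length):
    """Copy one byte at a time, starting with the first byte (like C)."""
    for i in range(length):
        mem[dst + i] = mem[src + i]


def copy_intelligent(mem, src, dst, length):
    """Copy so that an overlapping destination is not corrupted (like I)."""
    if src < dst < src + length:
        # Destination overlaps the tail of the source: copy last byte first.
        for i in range(length - 1, -1, -1):
            mem[dst + i] = mem[src + i]
    else:
        copy_forward(mem, src, dst, length)


filled = bytearray(b"ABCDEF--")
copy_forward(filled, 0, 1, 5)      # destination one greater than source
print(filled)                      # the first byte has been smeared forward

moved = bytearray(b"ABCDEF--")
copy_intelligent(moved, 0, 1, 5)
print(moved)                       # the block has moved up by one byte intact
```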

M – modify store

M xxxx – Modify memory starting at address xxxx. The address is displayed followed by the current data. The data value may then be changed. Continuous entry of new data values is supported. Values in Hexadecimal.

Q – quit

Q – Quit to BASIC Immediate Mode (Command Line).

T – tabulate

T xxxx yy – Tabulate (display) a block of memory starting at xxxx and continuing to yy-1. Values in Hexadecimal.

Further Development

As the monitor is written in BASIC it is easy to maintain for all hardware or CPU types wherever MS BASIC is found, and to further develop to support new functions. As there are about 1200 Bytes free below the default origin (in the stripped version), there is space for the user to add their own preferred functions or modifications to existing functions.

8085 CPU based ISSF Target Turner

My club uses pneumatic systems to turn the ISSF Targets, which are controlled by a timing system. One of the members asked me to help build a phone interface for the systems.

The systems are used for many courses of fire, and there are quite a few options to manage. On the front panel there is a RESET, which is tied to the CPU RESET, and a FACE button which returns the targets to face the shooter for scoring.

Target Turner Front Panel.

It turns out that the retired systems are based on an 8085 CPU, in the classic minimum configuration with an 8155 providing 256 Bytes of RAM, and input and output ports. There is a 2732 UV PROM holding the program.

CPU Board for the Target Turner.

So, how do we get these devices online? My thoughts are to add a serial port so that the system can be controlled remotely, then to use an additional WiFi enabled device which can present a web interface to the Range Officer to control proceedings.

Existing ROM

First step is to see what is going on under the hood here. So using the TL-866 the binary code on the ROM was read, and then using z88dk-dis the existing code could be interpreted.

It was interesting to see a very simple method of operation in the existing ROM. The system can only change course of fire if it is RESET, when it reads the position of the switches, and then halts awaiting an interrupt to trigger the course of fire. When the string is finished it will return to repeat the same course of fire.

Timing was based on a delay circuit providing 500ms of delay per unit. Perhaps it is not 100% accurate, but good enough for the application.

I believe that I found a bug that has been latent in the device for the last 40 years. It seems that an address byte was reversed, which would cause a jump into empty addresses. Not sure why no one realised that previously.

Building Serial Interface

I’m planning to build a simple serial interface which will read a character, and then change the course of fire based on that character. Initialising the course of fire can then be done by the web interface, by triggering an interrupt, or by using the wired front panel interface.

After asking the experts I learned that the SID/SOD pins on the 8085 can be used as a bit-bang serial port. In fact that is the standard way of building a serial port for early systems. The code for building serial transmission is included in the early application notes.

The serial code works perfectly at 9600 baud on this 3MHz system. Since only one character will be received and a few transmitted on boot, there are no performance issues to consider.
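To give a feel for the timing budget of a bit-bang port, here is a minimal Python sketch. It assumes the 3MHz figure is the CPU clock and that the framing is 8N1 (one start bit, eight data bits sent least significant bit first, one stop bit). Neither assumption is confirmed above, and the function names are hypothetical.

```python
CPU_HZ = 3_000_000   # assumed CPU clock
BAUD = 9600


def cycles_per_bit(cpu_hz=CPU_HZ, baud=BAUD):
    """Clock cycles available to send or sample each serial bit."""
    return cpu_hz / baud


def frame_8n1(byte):
    """Return the line levels for one byte: start bit, 8 data bits LSB first, stop bit."""
    bits = [0]                                     # start bit (line low)
    bits += [(byte >> i) & 1 for i in range(8)]    # data, least significant bit first
    bits.append(1)                                 # stop bit (line high)
    return bits


print(cycles_per_bit())
print(frame_8n1(ord("A")))
```

Under these assumptions each bit must be produced by a delay loop of a little over 300 clock cycles, which leaves comfortable room for the loop overhead.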

I’ve written the upgrade code to replicate the front panel selection process, and to allow the system to behave exactly as before when no serial input is available. When a serial command is available, which is triggered by activity on the RST6.5 line, then the system will set a different course of fire than is shown on the front panel. The string can be triggered either by the front panel, or by the interrupt related to the serial interface.

ESP-32 Web Interface

Following a bit of a search the Adafruit HUZZAH32 Breakout presented itself as the best solution to web enable the Target Turner. It can be powered by 5V, and the RX is protected against 5V input by a diode.

The physical interface is going to be a FTDI Basic style connector. Using this connector will allow me to test the 8085 first, and then build the web interface and test it separately from the Target Turner. The last step will be to integrate the two devices into a system.

Using the simple serial character interface, it should be possible to present an active web page to the Range Officer.

There are many tutorials on how to build active web pages using the ESP-32 and WebSockets.

More when this is progressed further.

Three Rings for the Z80

Over the past few years I’ve implemented a number of interfaces for Z80 peripherals based on the principle of the interrupt driven ring buffer. Each implementation of a ring exhibits its own peculiarities, based on the specific hardware. But essentially I have but one ring to bring them all and in the darkness bind them.

This is some background on how these interfaces work, why they’re probably fairly optimal at what they do, and things to consider if extending these to other platforms and devices.

The ring buffer is a mechanism which allows a producer and a consumer of information to do so with a timing to suit their needs, and to do it without coordinating their timing.

Wikipedia defines a circular buffer, or ring buffer, as a data structure that uses a single fixed-size buffer as if it were connected end-to-end. The most useful property of the ring buffer is that it does not need to have its elements relocated as they are added or consumed. It is best suited to be a FIFO buffer.

Background

Over the past few years, I’ve used the ring buffer mechanism written by Dean Camera in many AVR projects. These include interrupt driven USART interfaces, a digital audio delay loop, and a packet assembly and play-out buffer for a digital walkie-talkie.

More recently, I’ve been working with Z80 platforms and I’ve taken that experience into building interrupt driven ring buffer mechanisms for peripherals on the Z80 bus. These include three rings for three different USART implementations, and a fourth ring for an Am9511A APU.

But firstly, how does the ring buffer work? For the details, the Wikipedia entry on circular buffers is the best bet. But quickly, the information (usually a byte, but not necessarily) is pushed into the buffer by the producer, and it is removed by the consumer.

The producer maintains a pointer to where it is inserting the data. The consumer maintains a pointer to where it is removing the data. Both producer and consumer have access to a count of how many items there are in the buffer and, critically, the act of counting entries present in the buffer and adding or removing data must be synchronised or atomic.
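The arrangement of two pointers and a shared count can be sketched as below. This is a generic Python illustration with hypothetical names, not the Z80 or AVR code discussed in this article. On real hardware the body of put and get would need to run with interrupts held off so that the data move and the count update stay together.

```python
class RingBuffer:
    """Fixed-size FIFO with an insert pointer, a remove pointer, and a count."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.size = size
        self.head = 0    # where the producer inserts
        self.tail = 0    # where the consumer removes
        self.count = 0   # number of items currently held

    def put(self, byte):
        if self.count == self.size:
            return False             # full: the caller decides what to do
        self.data[self.head] = byte
        self.head = (self.head + 1) % self.size
        self.count += 1
        return True

    def get(self):
        if self.count == 0:
            return None              # empty
        byte = self.data[self.tail]
        self.tail = (self.tail + 1) % self.size
        self.count -= 1
        return byte


rb = RingBuffer(4)
for value in (1, 2, 3):
    rb.put(value)
print(rb.get(), rb.get(), rb.count)
```

Nothing is ever moved inside the buffer; only the two pointers and the count change.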

8 Bit Optimisation

The AVR example code is written in C and is not optimised for the Z80 platform. By using some platform specific design decisions it is possible to substantially optimise the operation of a general ring buffer, which is important as the Z80 is fairly slow.

The first optimisation is to assume that the buffer is exactly one page or 256 bytes. The advantage we have there is that addressing in Z80 is 16 bits and if we’re only using the lowest 8 bits of addressing to address 256 bytes, then we simply need to align the buffer onto a single 256 byte page and then increment through the lowest byte of the buffer address to manage the pointer access.

If 256 bytes is too many to allocate to the buffer, then if we use a power of 2 buffer size, and then align the buffer within the memory so that it falls on the boundary of the buffer size, the calculation for the pointers becomes simple masking (rather than a decision and jump). Simple masking ensures that no jumps are taken, which means that the code flow or delay is constant no matter which place in the buffer is being written or read.
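The masking trick can be shown with a short Python sketch. The buffer size, base addresses, and function names are illustrative assumptions. The point is that for a power of 2 size aligned on its own boundary, the wrap is a single AND with no comparison or jump.

```python
BUFFER_SIZE = 16          # must be a power of 2
MASK = BUFFER_SIZE - 1


def next_index_branch(index):
    """Advance with a comparison and a conditional reset."""
    index += 1
    if index == BUFFER_SIZE:
        index = 0
    return index


def next_index_mask(index):
    """Advance with a mask only, so the work is the same at every position."""
    return (index + 1) & MASK


def next_address(addr, base, size):
    """Advance a pointer inside a buffer whose base is aligned to its size."""
    return base | ((addr + 1) & (size - 1))


print(hex(next_address(0x81FF, 0x8100, 256)))   # page-aligned 256 byte buffer wraps
print(hex(next_address(0x802F, 0x8020, 16)))    # 16 byte buffer on a 16 byte boundary
```

For the full 256 byte page the mask is 0xFF, so incrementing only the low byte of the pointer does the wrap for free.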

Note that although the number of bytes allocated to the buffer is 256, the buffer cannot be filled completely. A completely full 256 byte buffer cannot be discriminated from a zero fullness buffer. This does not apply where the buffer is smaller than the full page.
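The reason a full page cannot be told apart from an empty one is that the count is held in a single byte. A minimal Python sketch of that wrap-around, with a hypothetical helper name:

```python
def count_after(puts):
    """Value of an 8 bit fullness counter after the given number of insertions."""
    count = 0
    for _ in range(puts):
        count = (count + 1) & 0xFF   # an 8 bit register wraps at 256
    return count


print(count_after(255))   # largest fullness that can be represented
print(count_after(256))   # wraps and looks like an empty buffer
```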

With these two optimisations in place, we can now look at three implementations of USART interfaces for the Z80 platform. These are the MC6850 ACIA, the Zilog SIO/2, and the Z180 ASCI interface. There is also the Am9511A interface, which is a little special as it has multiple independent ring buffers, and has multi-byte insertion.

Implementations

To start the discussion, let us look at the ACIA implementation for the RC2014 CP/M-IDE bios. I have chosen this file because all of the functions are contained in one file, which provides an easier overview. The functions are identical to those found in the z88dk RC2014 ACIA device directory.

Using the ALIGN key word of the z88dk, the ring buffer itself is placed on a page boundary, in the case of the receive buffer of 256 bytes, and on the buffer size boundary, in the case of the transmit buffer of 2^n bytes.

Note that where the buffer is smaller than a full page, all of the bytes in the buffer could be used because the buffer counter won’t overflow, but I haven’t made that additional optimisation in my code. So no matter how many bytes are allocated to a buffer, one byte always remains unused.

Once the buffer is located, the process of producing and consuming data is left to either put or get functions which write to, or read from, the buffer as and when they choose to. There is no compulsion for the main program flow to write or read at a particular time, and therefore the flow of code is never delayed. This is optimum from the point of view of minimising delay and maximising compute time. Additional functions such as flush, peek, and poll are also provided to simplify program flow, and init to set up the peripheral and initialise the buffers on first use.

With the buffer available then the interrupt function can do its work. Once an interrupt from the peripheral is signalled, the interrupt code checks to see whether a byte has been received. If not then the interrupt (in the case of the ACIA and ASCI) must have been triggered by the transmit hardware becoming available.

If in fact a byte has been received by the peripheral then the interrupt code recovers the byte, and checks there is room in the buffer to store it. If not, then the byte is simply abandoned. If there is space, then the byte is stored, and the buffer count is incremented. It is critical that these two items happen atomically, which in the case of an interrupt is the natural situation.

If the transmission hardware has signalled that it is free, then the buffer is checked for an available byte to transmit. If none is found then the transmit interrupt is disabled. Otherwise the byte is retrieved from the buffer and written to the transmit hardware while the buffer count is decremented.

If the transmit buffer count reaches zero when the current byte is transmitted, then the interrupt must disable further transmit interrupts to prevent the interrupt being called unnecessarily (i.e. with the buffer fullness being empty).
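The interrupt flow described in the last few paragraphs can be modelled roughly in Python. This is a simulation under stated assumptions, not the actual driver: FakeUart is a hypothetical stand-in for the hardware, plain lists stand in for the ring buffers, RX_LIMIT reflects the one unused byte in a 256 byte buffer, and handling a single event per call is a simplification.

```python
RX_LIMIT = 255   # a 256 byte buffer with one byte always left unused


class FakeUart:
    """Hypothetical stand-in for the serial hardware."""

    def __init__(self):
        self.rx_byte = None          # a received byte waiting to be read, if any
        self.tx_int_enabled = False  # whether transmit interrupts are enabled
        self.sent = []               # bytes written to the transmit register


def interrupt(uart, rx_buf, tx_buf):
    """Service one interrupt: receive first, otherwise assume transmit is free."""
    if uart.rx_byte is not None:
        byte, uart.rx_byte = uart.rx_byte, None   # recover the byte
        if len(rx_buf) < RX_LIMIT:
            rx_buf.append(byte)                   # store it and raise the count
        return                                    # no room: the byte is abandoned
    if not tx_buf:
        uart.tx_int_enabled = False               # nothing to send
        return
    uart.sent.append(tx_buf.pop(0))               # send the oldest byte
    if not tx_buf:
        uart.tx_int_enabled = False               # buffer now empty


uart, rx_buf, tx_buf = FakeUart(), [], [0x42]
uart.rx_byte, uart.tx_int_enabled = 0x41, True
interrupt(uart, rx_buf, tx_buf)   # receive event
interrupt(uart, rx_buf, tx_buf)   # transmit event
print(rx_buf, uart.sent, uart.tx_int_enabled)
```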

Multi-byte Receive

Both the SIO and ASCI have multi-byte hardware FIFO buffers available. This is to prevent over-run of the hardware should the CPU be unable to service the receive interrupt in sufficient time. This could happen if the CPU is left with its general interrupt disabled for some time.

In this situation, the SIO receive interrupt and the ASCI interrupt have the capability to check for additional bytes before continuing.

Transmit cut-through

One additional feature worth discussing is the presence of a transmit cut-through, which minimises delay when writing the “first byte”. Because the Z80 processor is relatively slow compared to a serial interface, it is common for the transmit interface to be idle when the first byte of a sequence of bytes is written. In this situation writing the byte into the transmit buffer, and then signalling a pseudo interrupt (by calling the interrupt routine) would be very costly. In the case of the first byte it is much more effective simply to cut-through and write directly to the hardware.
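The cut-through decision can be sketched in Python as follows. FakeUart, put_char, and TX_LIMIT are hypothetical names, and returning False when the buffer is full is an assumption, since the behaviour of the real put function in that case is not described here.

```python
TX_LIMIT = 15   # assumed: a 16 byte transmit buffer with one byte unused


class FakeUart:
    """Hypothetical stand-in for the transmit side of the hardware."""

    def __init__(self):
        self.tx_idle = True
        self.tx_int_enabled = False
        self.sent = []

    def write(self, byte):
        self.sent.append(byte)
        self.tx_idle = False    # busy until the byte has been shifted out


def put_char(uart, tx_buf, byte):
    """Queue a byte for transmission, bypassing the buffer when possible."""
    if not tx_buf and uart.tx_idle:
        uart.write(byte)            # cut-through: straight to the hardware
        return True
    if len(tx_buf) >= TX_LIMIT:
        return False                # assumed behaviour when the buffer is full
    tx_buf.append(byte)
    uart.tx_int_enabled = True      # let the interrupt drain the buffer
    return True


uart, tx_buf = FakeUart(), []
put_char(uart, tx_buf, ord("H"))    # first byte goes directly to the hardware
put_char(uart, tx_buf, ord("i"))    # second byte waits in the buffer
print(uart.sent, tx_buf, uart.tx_int_enabled)
```

The buffer must be empty as well as the hardware idle before cutting through, otherwise the new byte would overtake bytes already queued.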

Atomicity

For the ring buffer to function effectively, the atomicity of specific operations must be guaranteed. During an interrupt in Z80 further interrupts are typically not permitted, so within the interrupt we have a degree of atomicity. The only exception to this rule is the Z80 Non Maskable Interrupt (NMI), but since this interrupt is not compatible with CP/M it has never been used widely and is therefore not a real issue.

For the buffer get function the only concern is that the retrieval of a byte is atomically linked to the number of bytes in the buffer.

For the put function it is similar. However, as the transmit interrupt needs to be enabled by the put function, atomicity is required to ensure that this process is not interrupted.

Interrupt Mode

Across the three implementations there are three different Z80 interrupt modes in play. The Motorola ACIA is not a Zilog Z80 peripheral, so it can only signal a normal interrupt, and can therefore (without some dirty tricks) only work in Interrupt Mode 1. For the RC2014 implementation it is attached to INT or RST38 and therefore when an interrupt is triggered it is up to the interrupt routine to determine why an interrupt has been raised. This leads to a fairly long and slow interrupt code.

The Z180 ASCI has two ports and is attached to the Z180 internal interrupt structure, which works much like the Z80 Interrupt Mode 2, although it is actually independent of the Z80 interrupt mode. Each Z180 internal interrupt is separately triggered, however it still cannot discern between a receive and a transmit event. So the interrupt handling is essentially similar to that of the ACIA.

The Zilog SIO/2 is capable of being attached to the Z80 in Interrupt Mode 2. This means that the SIO can be configured to place a specific vector for each interrupt cause on the Z80 data bus during the interrupt acknowledge cycle. The interrupts for transmit empty, received byte, transmit error, and receive error are all signalled separately via an IM2 Interrupt Vector Table. This leads to concise and fast interrupts, specific to the cause at hand. The SIO/2 is the most efficient of all the interfaces described here.
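How an Interrupt Mode 2 vector selects a handler can be illustrated with a small Python model. The Z80 combines the I register (high byte) with the vector byte supplied by the peripheral (low byte) to form a pointer into a table, and then reads a 16 bit little-endian handler address from that location. The table position, vector value, and handler address below are made-up examples.

```python
def im2_handler_address(i_reg, vector, memory):
    """Look up the handler address the Z80 would jump to in Interrupt Mode 2."""
    pointer = ((i_reg & 0xFF) << 8) | (vector & 0xFF)
    low = memory[pointer]
    high = memory[(pointer + 1) & 0xFFFF]
    return low | (high << 8)          # table entries are little-endian


memory = bytearray(0x10000)
# Example only: vector table at 0x8000, entry at offset 0x04 points to 0x1234.
memory[0x8004] = 0x34
memory[0x8005] = 0x12

print(hex(im2_handler_address(0x80, 0x04, memory)))
```

Because each cause has its own table entry, the handler does not need to poll status bits to work out why it was called.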

Multi-byte buffers

For interest, the Am9511A interface uses two buffers, one for the one byte commands, and one for the two byte operand pointers. The command buffer is loaded with actions that the APU needs to perform, including some special (non hardware) commands to support loading and unloading operands from the APU FILO.

A second Am9511A interface also uses two buffers, one for one byte commands, and one for either two or four byte operands. This mechanism is not as nice as storing pointers as in the above driver, but is required for situations where the Z180 is operating with paged memory.

I’ve since revised the above solution again to use three byte operand (far) pointers, as that makes for a much simplified user experience. The operands don’t have to be unloaded by the user. They simply appear auto-magically…