ReGIS Graphics for Arduino & RC2014

ReGIS graphics can be used to build high quality vector graphical output for embedded (Serial) devices, for example Arduino devices or retro-computers such as the RC2014 running CP/M, with the graphics output being displayed on a Windows 10 desktop. This method uses WSL Ubuntu with XTerm to support graphical output for Serial connected embedded devices, and a Windows based X Server to host the display.

Background

From the Wikipedia, ReGIS, short for Remote Graphic Instruction Set, is a vector graphics markup language developed by Digital Equipment Corporation (DEC) for later models of their famous VT series of computer terminals. ReGIS supports vector graphics consisting of lines, circular arcs, and similar shapes. VT series terminals supporting ReGIS generally allow graphics and text to be mixed on-screen, which makes construction of graphs and charts relatively easy.

ReGIS is useful to display information that can be more easily interpreted as a diagram or a graph, such as plotting planet motion. It can also be used for vector graphics such as the GLX Gears program.

Today, where DEC VT series terminals are impractical to obtain, the easiest method to display ReGIS vector graphics is to use the XTerm terminal emulator. The XTerm fully supports ReGIS, but this capability is not enabled in general release versions. We’ll need to compile our own version with ReGIS enabled.

XTerm was developed for the X Windows System running on versions of Unix or Linux. So to use it on a Linux desktop machine running (e.g. Ubuntu) is quite straightforward. But if we want to use it on a Windows 10 desktop we will need to jump through some additional hoops and that is the point of this story.

The X window system (X) is a client / server windowing system developed in the mid 1980’s to support remote windowing. Unlike most earlier display protocols, X was specifically designed to be used over network connections rather than on an integral or attached display device. X features network transparency, which means an X client program running on a computer somewhere on a network can display its user interface (window) on an X Server running on some other computer on the network. The X Server is typically the provider of graphics resources and keyboard/mouse events to X clients, meaning that the X Server is usually running on the computer in front of a human user, while the X client applications run anywhere on the network and communicate with the user’s computer to request the rendering of graphics content and receive events from input devices including keyboards and mice.

To get XTerm to connect to a Serial interface, provided by an Arduino device or by a retro-computer such as a RC2014 CP/M machine, it is necessary to use a dumb terminal emulator to pipe the Serial bit-stream originating on the Serial-USB (FTDI) interface connection to XTerm without tampering with the non-displayable ESC code sequences that ReGIS uses for signalling. The best option is to use picocom for this role. Most other terminals “cook” the Serial to remove those ascii control characters that can’t be displayed, rather than passing them transparently.

We now have a complete solution to display ReGIS graphics originated on an Arduino or RC2014 on Unix / Linux machines. XTerm, configured with ReGIS enabled, together with picocom to pipe the Serial bit-stream is all that is required to display vector graphics on a Linux desktop.

To use Windows 10 as the desktop we will need to host a Unix / Linux machine running XTerm and picocom somewhere connected to the embedded device, and provide an X Server on the Windows 10 machine to display the window generated by XTerm. This can be done with a completely separate Linux machine, such as a Raspberry Pi or similar, but it is more convenient to do it using a Virtual Machine inside the Windows 10 platform provided by Windows Subsystem for Linux.

Microsoft provides two versions of the WSL. The first version WSL1 uses specific services and drivers to give Linux applications access to the Windows 10 kernel and file system. WSL1 supports access to Serial interfaces and the host file system. The use of the specific services and drivers between the Windows 10 kernel and Linux causes a performance penalty for most Linux applications, so WSL2 was introduced with a lightweight virtual machine, based on Hyper-V, to optimise Linux application performance. However WSL2 does not allow access to Serial interfaces or direct access to the Windows 10 file system. For most applications WSL2 is a better choice, but as we need to access the Serial interfaces directly we need to use WSL1.

With WSL1 installed, we can create an Ubuntu 22.04 LTS based Linux system to install and host our XTerm and picocom applications. And, optionally, we can build z88dk following these instructions, and install z88dk-libraries supporting ReGIS providing a C interface for RC2014 CP/M applications.

From our Windows 10 desktop machine we need to provide an X Server to enable XTerm to display its window on our desktop. There are several options available to do this. Xming (not free) and VcXsrv (open source) are two that are commonly recommended.

Another option to provide a Windows 10 X Server is MobaXterm. MobaXterm is proprietary, but is available for free for home users. It provides both an X Server (based on X.org) and secure shell (SSH) capability. SSH is particularly required where we need to connect to XTerm and picocom (the X client) running on separate Raspberry Pi hardware, but it is also useful for Windows 10 WSL. However, we can access WSL Ubuntu directly through its own terminal to initialize the XTerminal session, so for WSL Ubuntu the SSH capability is just a very convenient “nice-to-have”.

Implementation

Lets go through the steps required to get this all working…

Windows Subsystem for Linux & Ubuntu

The Windows Subsystem for Linux (WSL) allows Linux / Unix distributions to use the Windows host computer. This instruction is based on Ubuntu 22.04 LTS, but other distributions will be similar.

There is a longstanding bug in connecting USB to Serial interfaces to the WSL2 system, so it is best to use the WSL1 as a platform for development. Install WSL1 following the standard Microsoft instructions.

See Windows Subsystem for Linux enabled.

Once WSL is installed then go to the Microsoft store and install Ubuntu 22.04 LTS. There are options for both Windows 10 and Windows 11, so choose the suitable alternative.

Ubuntu 22.04 LTS for WSL installed from the Windows Store.

Confirm that you have both WSL1 and Ubuntu 22.04 LTS installed using. > wsl -l -v

And then set the default distribution to Ubuntu-22.04. > wsl -s Ubuntu-22.04

Setting the WSL version and Linux distribution.

Then launch the Ubuntu 22.04 LTS subsystem using the Windows Start menu command, to configure the new distribution. Update and upgrade the packages using apt, and make any other adjustments you need. Set up directories that you will use to build XTerm and (optionally) build z88dk. Building z88dk will ensure that you have the prerequisites to prepare .COM files for later uploading to your RC2014 CP/M.

At this stage DO NOT install XTerm or picocom from the Ubuntu repositories.

Install MobaXterm for Windows 10

From the MobaTek webside download the installer version of MobaXterm for home use. Note if you are a corporate or business, please pay for a great product.

Launch the installer. It will automatically recognise your installed WSL and Ubuntu 22.04 LTS instances and will automatically configure SSH (local) terminal access for them. It will also allow you to connect directly to your Serial attached Arduino or RC2014 retro-computer if you configure a serial connection.

Use MobaXterm to connect to your Ubuntu 22.04 installation. The advantage of doing this is that the DISPLAY environment variable is automatically set, so that later XTerm can find the Windows 10 screen with no further setting needed. However, the direct Ubuntu 22.04 terminal can alternatively be used later for other purposes with no problem.

Preparing XTerm to support ReGIS

XTerm is the only known software solution supporting ReGIS. But it doesn’t support ReGIS in the default build. You’ll need to enable ReGIS support yourself by building and installing a customised version.

From your WSL Ubuntu command line use the below recipe. If you have not built any prior code on your WSL Ubuntu installation you may need to add additional packages, as signalled by the configuration step. Use apt to install them as needed.

> sudo apt install -y build-essential libxaw7-dev libncurses-dev libxft-dev
> wget https://invisible-island.net/datafiles/release/xterm.tar.gz
> tar xf xterm.tar.gz
> cd xterm-373
> ./configure --enable-regis-graphics
> make
> sudo make install

Install picocom for Ubuntu 18.04 LTS

From Release Version 3.0 picocom implemented advanced terminal control system calls, which are unsupported by the Windows 10 WSL1 implementation. This means that the Ubuntu 22.04 LTS supported release for picocom can’t be used with WSL1.

To work around this problem the Ubuntu 18.04 LTS Release Version 2.2 of picocom for amd64 can be installed. From your Ubuntu terminal, download the picocom 2.2-2 package and install it using dpkg.

Once it is installed, it is also useful to install the lrzsz package which provides XMODEM capability for picocom and other terminals.

It is also useful to hold the picocom package to ensure that it is not automatically upgraded to a later (unusable for our purpose) version. That is done with apt-mark.

> wget http://launchpadlibrarian.net/324629316/picocom_2.2-2_amd64.deb
> sudo dpkg -i picocom_2.2-2_amd64.deb
> sudo apt install lrzsz
> sudo apt-mark hold picocom

Using ReGIS

To initialise XTerm for VT340 emulation, connecting to Windows 10 Serial device COMx, from the MobaXterm terminal use this command string. This command assumes we’re using 115200 baud 8n2 with RTS flow control, and we’re going to use XMODEM protocol to send binaries to our RC2014 or other device. Check the picocom manual to adjust the serial interface configuration to your own needs.

> xterm +u8 -geometry 132x50 -ti 340 -tn 340 -e picocom -b 115200 -p 2 -f h /dev/ttySx --send-cmd "sx -vv"

Alternatively to use XTerm for VT125 emulation use this command string.

> xterm +u8 -geometry 132x50 -ti 125 -tn 125 -e picocom -b 115200 -p 2 -f h /dev/ttySx --send-cmd "sx -vv"

At this point a new XTerm window should appear on the Windows 10 desktop, supported by the MobaXterm X Server.

Assuming that you have a program which can generate ReGIS graphics already installed on your RC2014 (or Arduino), such as the demo program provided here, then it is time to start the program from the CP/M command prompt. Otherwise, it makes sense to use XTerm (with picocom) to access your RC2014 and upload the provided ReGIS demo program, or other any other program, using XMODEM protocol and the XM.COM or XMODEM.COM program on your CP/M installation, and then launch the program from the CP/M command prompt.

The resulting XTerm window (example image shown is planet-motion) should look similar to the image below.

Windows 10 with XTerm window displaying planet-motion.

And that is then all that needs to be done to enjoy high resolution vector graphics generated by a Serial connected embedded device such as an RC2014 running CP/M or an Arduino device.

The ReGIS library for z88dk has been developed to support ReGIS on the RC2014 and other Z80 CP/M machines. Therefore programs can easily generate the required Serial codes using the C functions provided.

The ReGIS library for Arduino is hardware agnostic, and supports all Arduino architectures. It can be installed from the Arduino Library Manager.

Yet Another Z180 Project (YAZ180)

I’m thinking about a new project, something a little unusual but still with a rich history of information upon which to base the build. On Tindie, I found the RC2014 project which is a build of a Z80 platform but based on some modern components. That got me thinking. My next project must be a Z80 based project.

Why the Z80? Well, it was at one stage the most used CPU in the world, which leads to the great depth of information and experience available for designs, hardware, and software. Technically, it is advanced enough to avoid the need for a large number of ancillary chips, multiple power supplies, and multi-phase clocking that the 80088080, and other older chips needed, but still it is complex enough that in doing a project I’ll feel like I’m actually building a computer.

This year marks 40 years (yes 40 years, since launch in July 1976) of the Z80, and still as a design and platform it looks like it will continue to be relevant into the future. So rather than building yet-another-ARM project. It looks like I will be marking the 40th anniversary of the Z80 with a new build.

Zilog_Z80

An early Z80, manufactured in June 1976.

What kind of project? The RC2014 project is an interesting starting point. It is quite simple, being a compact and robust implementation of Grant Searle’s 6 Chip Z80 Computer, but provides more resources than most of the 1980’s Z80 projects offer. After looking at some projects others have done, I think that I should aim to build something a little like a “Back to the Future” Z80 project, a DeLorean (which also appeared as a prototype in 1976) with a fusion reactor.

RC2014 Cylon

RC2014 Cylon

As an outcome I’d like the solution to be able to interact with modern interfaces such as I2C, Ethernet, SPI, and USB, with access to a large physical memory space, with great performance, and yet retain the ability to be a single-step-able experimental platform with LED bus indicators. Single stepping is something that you can’t do with an Arduino, and it is a real differentiation.

Whilst it would be attractive to design in some old-school interface devices, like the Am9511A APU or a Super I/O device, in light of today’s environment the wouldn’t contribute very much to the outcome. So the focus will be squarely on modern I/O.

Also, there is a temptation to build a CP/M system with full disk management. But again there are plenty of CP/M systems around. I’d like to build something more  of an embedded platform, that doesn’t require mass storage to run applications. It would be good if most of the basic interfaces could fit on one board, to avoid the need to build address and data bus extension. I think this will simplify the design significantly.

Design process

This is going to be an iterative process. The first step will be to build the RC2014 project, and test that I can program it.

It will be important to learn a little about Z80 assembler. Later, I may modify the RC2014 project platform.

Then, I’ll lay out a through hole prototype with minimal functionality, to test some performance ideas. If they work on through hole, then they’ll work with SMD. Using through hole also allows me to quickly fix logic or wiring errors that would take a new spin with SMD.

Finally, I’ll build a SMD device that miniaturizes the solution, and makes it more robust.

Processor selection

The Z80 has been built continuously for 40 years, and in that time many manufacturers have produced silicon and several clones have been created. The Z80 range been continuously improved through the Hitachi 64180, to the Zilog Z180, and the Zilog Aclaim eZ80 devices. Each increment has integrated more accessory components, and improved the instruction throughput, as well as increased the clock rate.

Looking at the options available, the original Z80 requires logic to get started. So rather than building serial ports and timers, it looks like the Z180 might be the right place to start. So why not go all the way to the eZ80? Well the eZ80 is not dissimilar to an AVR ATmega device, with all of the system components integrated into the one device. Using an eZ80 CPU wouldn’t be like building a computer at all. It would be much more like building an Arduino, and I’ve been there already.

Out of the Z180 options, I would select the Z8S180 (at 33MHz) because it integrates sufficient material to get started (Timers, Interrupts, MMU, & USARTs), and leaves me the option to add complexity as I get going.

Memory selection

A little research on the processor and available memory provides me with some cornerstones for the design. I will use Flash memory for the  program storage, and static RAM for the system memory. Historically, UV erasable PROM and dynamic RAM would have been used. One advantage of static RAM is that the solution is fully single-step-able. Meaning, I’ll be able to watch the address bus and data bus process each instruction as the Program Counter marches along.

The Z180 can address up to 1MB of physical address space, and it makes sense to provide the full physical memory possible. The price of 1MB of SRAM or of 256kB of Flash is almost nothing these days. As the Z80 logical address space is only 64kB, the Z180 has an inbuilt MMU to manage its physical memory to logical memory mapping.

Memory mapping

NOTE 2017 October: The details in the below discussion are no longer accurate. The memory design has moved on substantially, but this discussion remains for information only. Happily, even completely changing the memory mapping can be done with a software reconfiguration of the Memory / IO logic device.

To keep things simple (in hardware), we can use the MMU available in the Z180 to map the physical memory locations on 20 address lines, into logical addresses that suit us. The MMU can map 4kB pages of physical memory into two relocatable logical locations in the Z80 logical address space. These are called the Banked and the Common 1 locations. The Common 0 memory location begins at physical address 0x00000, and continues to the beginning of Banked memory, which then continues to the Common 1 memory address space.

Therefore the hardware (or physical mapping) will show that the 256kB Flash is located from 0x00000 to 0x3FFFF, and the 1MB RAM from 0x00000 to 0xFFFFF but the lower quarter of the RAM address space mostly masked by Flash. For programming we’ll have to move this around.

When the Z80 starts up it always begins from physical and logical address 0x0000. Therefore it is typical to put the program storage in the lower address range, and the RAM in the upper range. Given the use of the MMU available in the Z180 we initially can map the SRAM into the upper 32kB of logical address space, using the Common 1 bank and CBAR setting, leaving the first 32kB of Flash in the lower address range.

One difficulty is that there are only 3 memory spaces available, so if we want to have a C stack, global buffers and queues, and global data, then we need to put some SRAM in the Common 0 address space. Let’s uncover 8kB of 1MB SRAM and place this from 0x2000 to 0x3fff to provide this C stack and global variable data frame.

At some stage I assume I’ll want to use all of the additional Flash and SRAM available, so I’ll have to integrate programming for MMU bank switching, and RAM heap/stack switching when the need arises. At least initially there will be a statically programmable range of between 8kB Flash with 56kB SRAM to 56kB Flash with 8kB SRAM available within the logical address space, depending on the MMU initialisation settings.

To program the Z180 with a soft programmer, the physical memory addresses will need to be juggled around (glue logic) to disable the Flash and SRAM from appearing in the first 8kB Bytes of the address space. The USB soft programmer will provide program codes in the form of a remote boot-loader to load the contents of programs into the Flash, by moving the entire physical flash through a 4kB or 8kB Banked page. SRAM located in the lower 32kB address range will be loaded with a program to enable buffering and page writing of the desired programs.

Physical Address Range Run Mode Programming Mode
$00000 – $01FFF Flash (8,192B of 256kB) USB (8,192B)
$02000 – $03FFF SRAM (8,192B of 1MB) SRAM (516,096B of 1MB)
$04000 – $3FFFF Flash (245,760B of 256kB)
$40000 – $7FFFF  SRAM (768kB of 1MB)
$80000 – $BFFFF Flash (256kB of 256kB)
$C0000 – $FFFFF SRAM (256kB of 1MB)
Logical Address Range Run Mode Programming Mode
$0000 – $1FFF Flash (8,192B, Common 0) USB (8,192B)
$2000 – $3FFF SRAM (8,192B, Common 0) SRAM (24,576B, Common 0)
$4000 – $5FFF Flash (8,192B, Common 0)
$6000 – $7FFF Flash (8,192B, Banked)
$8000 – $FFFF SRAM (32,768B, Common 1) Flash (32,768B, Banked)

The programming mode (address mapping) will be entered by either a button press, or by signalling from the USB – USART interface lines. The process is to invert the Address 19 line, shifting the physical address location of Flash, and mute Address 0-12 to prevent memory being read from these locations, which allows the USB – USART FIFO to provide program opcodes. Mute by disabling the CE on both Flash and SRAM when A13 through A19 are 0. Then configure the MMU to bring the Flash into Banked logical addresses, using the SRAM to buffer 4kB page writes to Flash memory.

Using the ATMEL WinCUPL tool, it is pretty straightforward to convert the above memory mapping and below logic mapping in CUPL language description to JEDEC format that can be handled by a MiniPro TL866 EEPROM & PLD Programming tool, and programmed into an ATF16V8C  or Lattice GAL16V8D device.

MEMORY_PLD_A

CUPL Memory / IO Definitions

MEMORY_PLD_B

CUPL Memory / IO Decoding

This memory and IO mapping needs to be augmented by a secondary logic mapping for managing programming, single step, and other functions, which will be programmed into a second ATF16V8C or GAL16V8D device. The logic mapping will allow automatic programming initiation via the FT245R USB interface.

LOGIC_PLD_A

CUPL Logic Definitions

LOGIC_PLD_B

CUPL Logic Decoding

Using EEPLD devices will save significant PCB real estate, and will allow me to compensate for minor logic errors after the fact.

screenshot-from-2017-01-28-20-06-10

Schematic for Memory GAL

yaz180-logic-schematic

Schematic for Logic GAL

FCPU selection

The best Flash memory we can get easily is 55ns timing. This is bettered by SRAM, with 45ns access timing. By converting this timing to a bus frequency we can achieve 20MHz or slightly better, but allowing for some buffering or address logic delay it would be better to keep the system bus under the equivalent of 20MHz.

Using this system bus speed of approximately 20MHz then poses a question; which is the right speed? Some references point out that the Z180 is very poor at holding the correct USART rate when the CPU clock is not a magic multiple of the USART rate. This is the same issue that the AVR ATmega device faces when its USART is not driven by a magic frequency clock. Therefore, lets us set  the CPU crystal oscillator to 18.432MHz, being most appropriate magic frequency for the following design.

The Z180 system clock (or PHI) can be double, equivalent, or half the rate of the crystal oscillator base rate. Starting with 18.432MHZ oscillator clock, depending on the CPU Control Register (CCR), the system bus clock (PHI) can be halved and is operated at 9.216MHz. This is slow enough to allow most peripherals to interact with the CPU. Internally, the system bus clock (PHI) can be doubled to operate the CPU at 36.864MHz, by setting both the Clock Multiplier Register (CMR) and the CCR. This rate is slightly overclocking the Z180, but that’s what we live for. We don’t build slow computers.

As we are using 45ns SRAM and 55ns Flash, we will have to insert either 1 or 2 memory wait states when operating at 36.864MHz. This is unavoidable, because of the access speed of both memory devices. Faster SRAM may be available, but faster Flash is quite hard to obtain.

I/O Mapping

I have been thinking about the whole idea of system modularity. Actually, I don’t think the traditional method of building a backplane is such a good idea for what I want to achieve. Extending the address bus a long distance means that I’ll be investing in design and timing issues, that I’m not really sure I know how to solve. So, let’s focus on a smaller design, with on just one board for the time being.

As a computer always needs to be extended and interact with the real world, I think it would be good to add modern user interfaces to the solution. Use Address 15-13 to provide I/O selection options on the CPU Board, using the remaining output pins from the Memory ATF16V8C or GAL16V8D. This will allow flexibility to latch data into the Hex Display, or trigger breakpoints using #M1 and #Wait to allow Single Step execution from any code point.

I/O Address Range Chip Select (A15,A14,A13) Device
$0000 – $1FFF DO NOT USE ($0, b000) Internal I/O
A7-A0
Internal INT
$0000 – $00FF Registers
$2000 – $3FFF BREAK ($1, b001) Break Point Toggle Single Step Mode
$4000 – $5FFF #DIO_CS ($2, b010) PIO 82C55
A1-A0
$4000 – $4003 Registers
$6000 – $7FFF EXPANSION ($3, b011) Hold for Expansion
$8000 – $9FFF #I2C_CS2 ($4, b100) PCA9665
A1-A0
#INT2
$8000 – $8003 Registers
$A000 – $BFFF #I2C_CS1 ($5, b101) PCA9665
A1-A0
#INT1
$A000 – $A003 Registers
$C000 – $DFFF #APU_CS ($6, b110) APU_CS – Am9511A-1
A0 & #WAIT
#INT0
$C000 – $C001 Registers
$E000 – $FFFF EXPANSION ($7, b111) Hold for Expansion

Included I/O features & BOM

CPU

SRAM

Flash

Single Step – The Z-80’s #M1 pin is useful for building logic to single-instruction step the machine. You do this using the memory ready signal on #M1 to clear a 7474 flip-flop, which is clocked by a Single Step signal, to produce a #WAIT signal for the CPU. That stops the machine on the opcode fetch cycle with the address showing on the address bus and the opcode byte showing on the data bus. To move the machine ahead, you clock the flip-flop which releases the #WAIT signal, until the next #M1 clears the 7474  again.

Reset

Memory & Addressing Logic Glue

  • Programmable Logic PLD – Digikey ATF16V8B

USB – Flash programming interface

  • USB-Parallel Bus FTDI245RQ
    Note that  the Write strobe is confusing, but assume ACTIVE LOW.

USB – USART interface

  • USB-Serial FTDI232RQ

Hex 7 Segment Display – 5x Address Digits – 2x Data Digits

  • Decoder DM9368 – Digikey DM9368N-ND
  • LED Display VDMY10C0 & VDMG10C0 – Digikey VDMY10C0TR-ND
  • OR
  • LED Display with Decoder TIL311A – eBay only.

General Digital Input / Output – Being able to read and write simple digital levels is an important thing. So let’s include the 82c55 PIO device.

  • Intel CMOS 82C55 Programmable IO CP82C55A – Digikey CS82C55AZ-ND

I2C interface – This is the most important interface, which provides many extension options, and a plethora of Grove System sensors. Unfortunately the 5V nature of the system precludes using the newest really fast, deep buffer, devices with multiple bus I2C 1MHz bus interfaces, so provide 2x devices to support different applications (eg. video output and sensor acquisition on separate buses). Use the #INT1 & #INT2 interrupts.

Arithmetic Processor Unit – This lovely old chip is now also 40 years old, and was the world’s first APU (or FPU). Potentially, it is too slow to contribute, but still we’ll build it in. Need to provide a 3MHz clock to drive it (FAPU = FCPU/6), and connect its #PAUSE to #WAIT. #END connected to #INT0. RESET is ACTIVE HIGH.

AM9511A-1DC

Am9551A-1 3MHz APU

Bus interface

  • Address & Control – Octal Buffer Driver sn74abt541b – Digikey 296-14668-1-ND
  • Data – Octal Bus Transceiver sn74abth245 – Digikey 296-4140-1-ND

Power Supplies

Excluded I/O features & BOM

Ethernet – Wiznet W5300 Direct address mode requires 3FF of address space. Configure the #INT0 interrupt and implement the DMAC1 I/O to move packets quickly. Exclude this from the initial build, as it is quite a complicated 100 pin device, and it needs 3.3V supply.

USB & SPI interface – provide mass storage capability using either USB or SPI interface devices. The CH37x series is not very well documented, or readily available. There are other options for I2C-SPI bridging which can provide an SPI interface if needed.

  • CH376S – Incorporating the USB functions as described in CH375, and CH372

ADC interface – Better, higher resolution, faster chips are available with I2C interfaces.

  • 8 bit ADC with internal reference MAX158 – Digikey MAX158ACPI+-ND

Video Interface – Can be done using an external I2C device.

  • LCD Video – via I2C with a FTDI EVE

Super I/O – Floppy Disk & IDE Controller – Too hard and doesn’t bring much value to the table. Floppy drives are hard to find, and there are already 2x USART ports available on the Z8S180.

Software & References

Z80 Info

z88dk Development Kit

Small Device C Compiler

SASM Softools – Z180 (MMU aware) C compiler. Commercial Licence.

Programming the Z80 by Rodnay Zaks

The Undocumented Z80

Logic Families

FreeRTOS 9.0.0 for eZ80

Hardware Design notes

Well quite a few iterations on my aspirations and thoughts have resulted in a YAZ180_v1 schematic, that I think is now basically frozen. Some of my design decisions follow.

After deciding not to use the TIL311 LED display devices because they are not easily obtainable, I found the alternative solution using the DM9368 display driver chip and LED display devices was very consuming of space on the board. Since the board had to be no greater than a maximum of 10cm by 16cm to fit within the constraints of my Eagle Hobby licence, I decided to buy some TIL311 devices in advance, and then with the comfort of stock in hand I could use them for the design. Also, they add significantly to the retro-chique.

I would have loved to add the Wiznet W5300 device to the design to provide high speed Ethernet capability, but with a 100 pin VLSI it was just too complicated for the initial design. Next time.

I added a 82c55 interface chip. Knowing that it is very slow, requiring 2 wait states to drive it, is one thing. But the advantage of having multiple latch-able input and output ports, off one fairly compact integrated device made up for that problem. As it is a 1970s device it is also retro-chique accretive.

Designing using CUPL and the Atmel or Lattice PLDs is in my opinion much simpler than tracing schematics and thinking through 7400 series NOR, NAND gates to get the desired addressing logic. Writing the logic in “c” like syntax is much less taxing on the brain.

After deciding to implement no bus drivers, and then building a solution adding drivers and termination to every address and data line, I have rationalised down to bus drivers on data, lower address, and control lines. These are the few lines that appear on all memory and I/O devices, and are taken to every point on the board. The upper address lines only appear on the PLD, SRAM, and Flash, which are very low load modern devices, so there is no point to buffer them. Using ABT logic limits the differential delay on the address lines to 2 ns, and since the read and write select lines are also buffered, there is no issue timing issue created.

And here is the first schematic, in pdf YAZ180_V1. Errors and omissions did occur.

Hardware Layout notes

With the first major layout session behind me, I have the following notes.

Address&DataRouted

Address & Data to SRAM and FLASH

Change the SN74ABT244 for the SN74ABT541. The only difference is the pin ordering, with the 244 being optimal for counter-flowing signals, and the 541 being optimal for unidirectional signals (physically). This will help layout.

Getting 20 address lines and 8 data lines to appear on the SMD SRAM memory is challenging. I will probably reassign the address lines to suit the Z8S180 pin-out, rather than spending hours untangling the lines. It won’t be too easy to do this with the Flash device, but as it is in a PLCC through-hole socket it comes inherently with “vias” making reaching it half as hard as SMD.

Input pin and I/O pins on Z8S180 include weak latch circuits to prevent excessive current draw by the receiver, if the pin is not externally driven. External pull-up or pull-down resistors should not be less than 15 kOhms to ensure that the resistors can control the state of the latch when power is supplied.

Address&Data

All Address and Data lines routed.

afewmoreairwires

A few more air-wires remain, that will need thought.

Following about 10 hours of juggling, just two wires remain together with a plan to resolve them tomorrow.

twoairwires

Just two air-wires remain.

Finished, except for the detailed checking, which usually results in some changes. But, it is done.

The screen grabs below show with the Vcc Layer in Grey, and GND Layer in Green. I have added some basic traces on those layers to close off the routing, which are misleading. The layers are actually completely filled, normally.

This slideshow requires JavaScript.

Bill of Materials

Well, building this retro-computing machine is not quite as cheap as an ATMEL micro-controller board. But then again, it is not quite in the same league, with 1MB of RAM, and 256kB of Flash storage, together with an APU and 9 digits of LED display.

I have ordered all of the components, except the Am9511A-1 APU and the TIL311 LED display devices, from Digikey. The BOM is detailed in this link. To add to BOM, this there are the Am9511A-1 and the TIL311 devices which can be found on eBay or other auction sites.

CoL3S3MUkAAksZ5

And, the PCB looks great!

Bringing up the Prototype

The build and bringing up of the board will continue, once I get the new SMD oven sorted out.

Well, the board has been soldered, and everything looks good. The SMD oven wasn’t as useful as I thought. It turns out that I can solder the fine pitch of the SRAM and FTDI devices fairly well with a normal soldering iron.

My PLD programmer won’t work with ATMEL devices, so I’ve been waiting so long for some Lattice GAL devices to be shipped to me. Over 8 weeks in transit. Finally they’ve landed. So I soldered the rest of the board together.

I’ll have to debug my CUPL programming of the GALs to get everything working, but at least there’s no “magic smoke”.

The TIL311 LEDs are very hot, because they draw so much current. LEDs really have come a long way in efficiency over the past 30 years.

Update 9 Jan 2017. Using this Github Repo, I’ve successfully sent characters to the ASCI. It lives!

Update Friday, January 13th 2017. Initial RAM testing passed!

And now based on this testing, I’ve been able to get the ASCI0 (and ASCI1) working, so that I have NASCOM BASIC Ver 4.7 (1978) running on my 2016 designed computer!

This is a Mandelbrot program running in NASCOM Basic v4.7.

yaz180-glow

Glowing, in a dark  room.

Three major items remain to complete, before a PCB revision.
  • Completing the “in circuit programming” tools, to programme the Flash memory without removing it.
  • Writing drivers for the Intel 82C55 PIO device.
  • Writing drivers for the AMD Am9511A APU device.
Following the PCB revision, I’ll work on the I2C platform, and its drivers.

Next up, programming Flash memory using the FTDI FT245 USB-Parallel port. This was working, but some bad soldering made the on-board FT245 inoperable. Testing was completed using an external device.

Testing for the 82C55 PIO device is now done. It can do a lot more than this, but the proof of life is completed. Time to move on.

Update February 20, 2017. Despite initial misgivings, that the Am9511A APU was wired incorrectly, the proof of life for the APU is now done.

For a long time, I could see that the APU was trying, but just wasn’t making it. Two issues were involved.

Firstly, all of the datasheets assume 74 logic is being used to create the chip select from 3 to 8 address decoding, or similar solutions. What they don’t point out is that the Am9511A  APU chip select MUST be generated from the address lines, because the Z80 (Z180) IORQ line has incorrect timing to drive the Am9511A. The Am9511A needs an 30ns of chip select and address validity after WR is raised, which the Z80 IORQ timing doesn’t provide. Having infinite logic in a GAL available, I had (incorrectly) included the IORQ line in the APU_CS logic. Whilst I’d recognised the issue, it took three lines in an obscure 1981 PhD dissertation to focus the spotlight of clarity. Thank you Mr Haining.

Secondly, the Am9511A samples I have on hand won’t work at 3.072MHz. I guess they were cutting very close to the limit in the day.

img_0626

YAZ180 v1 full configuration.

I’ve finished an initial revision of the PCB to fix all of the known errata. I’ll sit on it a bit until I can double check that everything is good.

Errata

The power supply section was done very simply in the V1 build. I underestimated the requirement for 5V power, thinking that 1A should be sufficient. Before I built the V1 board I realised my error and replaced the 5V 1A regulator with a AP1506 that can supply 3A. I also changed the inductor to a lower inductance and higher current version.

I’ve now revised the power supply  section to use LM2596 3A device for the 5V supply, as this device has a larger input voltage range (up to 40V) , and I have inserted thermal vias to improve the heat spreading into the back side of the board.

The 12V supply only needs to provide about 90mA for the APU. I’ve reduced the capacity of the regulator down to 500mA, and have saved some space by using a LM2674 in SOIC-8 format. Whilst the Am9511A gets uncomfortably warm, the power supply doesn’t.

Also, I simply forgot to provide a 3.3V supply, that is required for the I2C interface devices. So this has been repaired with a 1117 linear regulator, supplied by the 5V regulator. It is the same device I used in the Goldilocks Analogue as I have some parts in hand.

I’ve swapped the USB connectors to male versions. Can’t understand why I built YAZ180 V1 with female connectors. Just silly.

I’ve swapped the location of the RESET and SINGLE_STEP buttons. It makes more sense to have the buttons doing the same function closer together.

The Z8S180 requires that its DCD0 be held low, before the USART0 can transmit. This pin is now tied to GND.

The Logic GAL device, in Registered Mode, requires to have its OE held low. This pin is now tied to GND.

To generate a RESET signal, the reset request input to the Logic GAL can’t be on Pin1 in Registered Mode, so the RESET_REQ signal was moved to Pin14. Pin1 in Registered Mode is connected exclusively to the CLK for the GAL flip-flops. In order to clock the flip-flop implemented in the YAZ180 programming mode logic, the RESET line was connected to Pin1.

The Am9511A-1 although specified at 3MHz, doesn’t pass tests at 3.072MHz. So it will have to be de-rated to 1:8 from the Z8S180 system clock. This takes the clock speed down to 2.304MHz. The clock divider chip will be replaced by a sn74ls93 from Digikey 296-3749-5-ND.

For the purposes of enjoyment, I’ve connected the Z8S180 DMA engine DREQ1 and TEND1 lines to the Am9511A SVREQ and SVACK lines. The SVREQ requires a NOT to generate the inverted Z8S180 signal for DREQ1. Although there are very few bytes loaded into the Am9511A, this should allow them to be written by the DMA engine.

Both AMD Am9511A and Intel 82C55 need the address and chip select lines to remain valid for a period after the RD or WR signal has gone high. Therefore we need to remove the existence of a valid IORQ signal from the calculation of their select line, and just rely purely on their address decoding (and absence of MREQ). The CUPL coding has been adjusted

The Am9511A supports 1 I/O Wait State, as RD & WR to Pause is 100ns. The 82C55 requires 2 I/O Wait states, as RD is 200ns minimum.

Update February 24th.

The new PCB has been ordered.

I’ll start a new post for the YAZ180 V2 when the PCB is back from manufacturing.

Arduino FreeRTOS

Arduino FreeRTOS Logo

For a long time I have been using the AVR port of FreeRTOS as the platform for my Arduino hardware habit. I’ve written (acquired, stolen, and corrupted) a plethora of different drivers and solutions for the various projects I’ve built over the last years. But, sometimes it would be nice to just try out a new piece of hardware in a solid multi-tasking environment without having to dive into the datasheets and write code. Also, when time is of the essence rewriting someone’s existing driver is just asking for stress and failure.

So recently, with an important hack-a-thon coming up, I thought it would be nice to build a robust FreeRTOS implementation that can just shim into the Arduino IDE and allow me to use the best parts of both environments, seamlessly.

Arduino IDE Core is just AVR

One of the good things about the Arduino core environment is that it is just the normal AVR environment with a simple Java IDE added. That means that all of the AVR command line tools used to build Arduino sketches will also just work my AVR port of FreeRTOS.

Some key aspects of the AVR FreeRTOS port have been adjusted to create the seamless integration with the Arduino IDE. These optimizations are not necessarily the best use of FreeRTOS, but they make the integration much easier.

FreeRTOS needs to have an interrupt timer to trigger the scheduler to check which task should be using the CPU, and to fairly distribute processing time among equivalent priority tasks. In the case of the Arduino environment all of the normal timers are configured in advance, and therefore are not available for use as the system_tick timer. However, all AVR ATmega devices have a watchdog timer which is driven by an independent 128kHz internal oscillator. Arduino doesn’t configure the watchdog timer, and conveniently the watchdog configuration is identical across the entire ATmega range. That means that the entire range of classic AVR based Arduino boards can be supported within FreeRTOS with one system_tick configuration.

The Arduino environment has only two entry point functions available for the user, setup() and loop(). These functions are written into an .ino file and are linked together with and into a main() function present in the Arduino libraries. The presence of a fixed main() function within the Arduino libraries makes it really easy to shim FreeRTOS into the environment.

The main() function in the main.c file contains a initVariant() weak attribute stub function prior to the internal Arduino initialisation setup() function. By implementing an initVariant() function execution can be diverted into the FreeRTOS environment, after calling the normal setup() initialisation, by simply continuing to start the FreeRTOS scheduler.

int main(void) // Normal Arduino main.cpp. Normal execution order.
{
init();
initVariant(); // Our initVariant() diverts execution from here.
setup(); // The Arduino setup() function.

for (;;)
{
loop(); // The Arduino loop() function.
if (serialEventRun) serialEventRun();
}
return 0;
}

Firstly, this initVariant() function is located in the variantHooks.cpp file in the FreeRTOS library. It replaces the weak attribute function definition in the Arduino core.

void initVariant(void)
{
setup(); // The Arduino setup() function.
vTaskStartScheduler(); // Initialise and run the FreeRTOS scheduler. Execution should never return to here.
}

Secondly, the FreeRTOS idle task is used to run the loop() function whenever there is no unblocked FreeRTOS task available to run. In the trivial case, where there are no configured FreeRTOS tasks, the loop() function will be run exactly as normal, with the exception that a short scheduler interrupt will occur every 15 milli-seconds (configurable). This function is located in the variantHooks.cpp file in the library.

void vApplicationIdleHook( void )
{
loop(); // The Arduino loop() function.
if (serialEventRun) serialEventRun();
}

Putting these small changes into the Arduino IDE, together with a single directory containing the necessary FreeRTOS v10.x.x files configured for AVR, is all that needs to be done to slide the FreeRTOS shim under the Arduino environment.

I have published the relevant files on Github where the commits can be browsed and the repository downloaded. The simpler solution is to install FreeRTOS using the Arduino Library Manager, or download the ZIP files from Github and install manually as a library in your Arduino IDE.

Getting Started with FreeRTOS

Ok, with these simple additions to the Arduino IDE via a normal Arduino library, we can get started.

Firstly in the Arduino IDE Library manager, from Version 1.6.8, look for the FreeRTOS library under the Type: “Contributed” and the Topic: “Timing”.

Arduino Library Manager

Arduino Library Manager

Ensure that the most recent FreeRTOS library is installed. As of writing that is v10.3.0-4.

FreeRTOS v8.2.3-6 Installed

Example of FreeRTOS v8.2.3-6 Installed

Then under the Sketch->Include Library menu, ensure that the FreeRTOS library is included in your sketch. A new empty sketch will look like this.

ArduinoIDE_FreeRTOS

Compile and upload this empty sketch. This will show you how much of your flash is consumed by the FreeRTOS scheduler. As a guide the following information was compiled using Arduino v1.8.5 on Windows 10.

// Device: loop() -> FreeRTOS | Additional Program Storage
// Uno: 444 -> 6592 | 20%
// Goldilocks: 502 -> 6684 | 5%
// Leonardo: 3462 -> 9586 | 23%
// Yun: 3456 -> 9580 | 23%
// Mega: 662 -> 6898 | 2%

Now test and upload the Blink sketch, with an underlying Real-Time Operating System. That’s all there is to having FreeRTOS running in your sketches. So simple.

Next Steps

Blink_AnalogRead.ino is a good way to take the next step as it combines two basic Arduino examples, Blink and AnalogRead into one sketch with in two separate tasks. Both tasks perform their duties, managed by the FreeRTOS scheduler.

#include

// define two tasks for Blink and AnalogRead
void TaskBlink( void *pvParameters );
void TaskAnalogRead( void *pvParameters );

// the setup function runs once when you press reset or power the board
void setup()
{
// Now set up two tasks to run independently.
xTaskCreate(
TaskBlink
, "Blink"; // A name just for humans
, 128      // This stack size can be checked and adjusted by reading the Stack Highwater
, NULL
, 2        // Priority, with 3 (configMAX_PRIORITIES - 1) being the highest, and 0 being the lowest.
, NULL );

xTaskCreate(
TaskAnalogRead
, "AnalogRead";
, 128      // Stack size
, NULL
, 1        // Priority, with 3 (configMAX_PRIORITIES - 1) being the highest, and 0 being the lowest.
, NULL );

// Now the task scheduler, which takes over control of scheduling individual tasks, is automatically started.
}

void loop()
{
// Empty. Things are done in Tasks. Never Block or delay.
}

/*--------------------------------------------------*/
/*---------------------- Tasks ---------------------*/
/*--------------------------------------------------*/

void TaskBlink(void *pvParameters) // This is a task.
{
(void) pvParameters;

// initialize digital pin 13 as an output.
pinMode(13, OUTPUT);

for (;;) // A Task shall never return or exit.
{
digitalWrite(13, HIGH); // turn the LED on (HIGH is the voltage level)
vTaskDelay( 1000 / portTICK_PERIOD_MS ); // wait for one second
digitalWrite(13, LOW); // turn the LED off by making the voltage LOW
vTaskDelay( 1000 / portTICK_PERIOD_MS ); // wait for one second
}
}

void TaskAnalogRead(void *pvParameters) // This is a task.
{
(void) pvParameters;

// initialize serial communication at 9600 bits per second:
Serial.begin(9600);

for (;;)
{
// read the input on analog pin 0:
int sensorValue = analogRead(A0);
// print out the value you read:
Serial.println(sensorValue);
vTaskDelay(1); // one tick delay (15ms) in between reads for stability
}
}

Next there are a number of examples in the FreeRTOS Quick Start Guide.

One last important thing you can do is to reduce device power consumption by not using the default loop() function for anything more than putting the MCU to sleep. This code below can be used for simply putting the MCU into a sleep mode of your choice, while no tasks are unblocked. Remember that the loop() function shouldn’t ever disable interrupts and block processing.

#include // include the Arduino (AVR) sleep functions.

loop() // Remember that loop() is simply the FreeRTOS idle task. Something to do, when there's nothing else to do.
{
// Digital Input Disable on Analogue Pins
// When this bit is written logic one, the digital input buffer on the corresponding ADC pin is disabled.
// The corresponding PIN Register bit will always read as zero when this bit is set. When an
// analogue signal is applied to the ADC7..0 pin and the digital input from this pin is not needed, this
// bit should be written logic one to reduce power consumption in the digital input buffer.

#if defined(__AVR_ATmega640__) || defined(__AVR_ATmega1280__) || defined(__AVR_ATmega1281__) || defined(__AVR_ATmega2560__) || defined(__AVR_ATmega2561__) // Mega with 2560
DIDR0 = 0xFF;
DIDR2 = 0xFF;
#elif defined(__AVR_ATmega644P__) || defined(__AVR_ATmega644PA__) || defined(__AVR_ATmega1284P__) || defined(__AVR_ATmega1284PA__) // Goldilocks with 1284p
DIDR0 = 0xFF;

#elif defined(__AVR_ATmega328P__) || defined(__AVR_ATmega168__) || defined(__AVR_ATmega8__) // assume we're using an Arduino with 328p
DIDR0 = 0x3F;

#elif defined(__AVR_ATmega32U4__) || defined(__AVR_ATmega16U4__) // assume we're using an Arduino Leonardo with 32u4
DIDR0 = 0xF3;
DIDR2 = 0x3F;
#endif

// Analogue Comparator Disable
// When the ACD bit is written logic one, the power to the Analogue Comparator is switched off.
// This bit can be set at any time to turn off the Analogue Comparator.
// This will reduce power consumption in Active and Idle mode.
// When changing the ACD bit, the Analogue Comparator Interrupt must be disabled by clearing the ACIE bit in ACSR.
// Otherwise an interrupt can occur when the ACD bit is changed.
ACSR &= ~_BV(ACIE);
ACSR |= _BV(ACD);

// There are several macros provided in the header file to actually put
// the device into sleep mode.
// SLEEP_MODE_IDLE (0)
// SLEEP_MODE_ADC (_BV(SM0))
// SLEEP_MODE_PWR_DOWN (_BV(SM1))
// SLEEP_MODE_PWR_SAVE (_BV(SM0) | _BV(SM1))
// SLEEP_MODE_STANDBY (_BV(SM1) | _BV(SM2))
// SLEEP_MODE_EXT_STANDBY (_BV(SM0) | _BV(SM1) | _BV(SM2))

set_sleep_mode( SLEEP_MODE_IDLE );

portENTER_CRITICAL();
sleep_enable();

// Only if there is support to disable the brown-out detection.
#if defined(BODS) && defined(BODSE)
sleep_bod_disable();
#endif

portEXIT_CRITICAL();
sleep_cpu(); // good night.

// Ugh. I've been woken up. Better disable sleep mode.
sleep_reset(); // sleep_reset is faster than sleep_disable() because it clears all sleep_mode() bits.
}

o that’s all there is to it. There’s nothing more to do except to read the FreeRTOS Quick Start Guide.
Further reading with manicbug, and by searching on this site too.

General Usage

FreeRTOS has a multitude of configuration options, which can be specified from within the FreeRTOSConfig.h file. To keep commonality with all of the Arduino hardware options, some sensible defaults have been selected.

The AVR Watchdog Timer is used with to generate 15ms time slices, but Tasks that finish before their allocated time will hand execution back to the Scheduler. This does not affect the use of any of the normal Timer functions in Arduino.

Time slices can be selected from 15ms up to 500ms. Slower time slicing can allow the Arduino MCU to sleep for longer, without the complexity of a Tickless idle.

Watchdog period options:

  • WDTO_15MS
  • WDTO_30MS
  • WDTO_60MS
  • WDTO_120MS
  • WDTO_250MS
  • WDTO_500MS

Note that Timer resolution is affected by integer math division and the time slice selected. Trying to accurately measure 100ms, using a 60ms time slice for example, won’t work.

Stack for the loop() function has been set at 192 bytes. This can be configured by adjusting the configIDLE_STACK_SIZE parameter. It should not be less than the configMINIMAL_STACK_SIZE. If you have stack overflow issues, just increase it. Users should prefer to allocate larger structures, arrays, or buffers using pvPortMalloc(), rather than defining them locally on the stack. Or, just declare them as global variables.

Memory for the heap is allocated by the normal malloc() function, wrapped by pvPortMalloc(). This option has been selected because it is automatically adjusted to use the capabilities of each device. Other heap allocation schemes are supported by FreeRTOS, and they can used with additional configuration.

Never use the Arduino delay() function in your Tasks, as it burns CPU cycles and doesn’t place the Task into Blocked State, allowing the Scheduler to place other lower priority tasks into Running State. Use vTaskDelay() or vTaskDelayUntil() instead. Also never delay or Block the Arduino loop() as this is being run by the FreeRTOS Idle Task, which must never be blocked.

Errors

  • Stack Overflow: If any stack (for the loop() or) for any Task overflows, there will be a slow LED blink, with 4 second cycle.
  • Heap Overflow: If any Task tries to allocate memory and that allocation fails, there will be a fast LED blink, with 100 millisecond cycle.

Compatibility

  • ATmega328 @ 16MHz : Arduino UNO, Arduino Duemilanove, Arduino Diecimila, etc.
  • ATmega328 @ 16MHz : Adafruit Pro Trinket 5V, Adafruit Metro 328, Adafruit Metro Mini
  • ATmega328 @ 16MHz : Seeed Studio Stalker
  • ATmega328 @ 16MHz : Freetronics Eleven, Freetronics 2010
  • ATmega328 @ 12MHz : Adafruit Pro Trinket 3V
  • ATmega32u4 @ 16MHz : Arduino Leonardo, Arduino Micro, Arduino Yun, Teensy 2.0
  • ATmega32u4 @ 8MHz : Adafruit Flora, Bluefruit Micro
  • ATmega1284p @ 20MHz : Freetronics Goldilocks V1
  • ATmega1284p @ 24.576MHz : Seeed Studio Goldilocks V2, Seeed Studio Goldilocks Analogue
  • ATmega2560 @ 16MHz : Arduino Mega, Arduino ADK
  • ATmega2560 @ 16MHz : Freetronics EtherMega
  • ATmega2560 @ 16MHz : Seeed Studio ADK
  • ATmegaXXXX @ XXMHz : Anything with an ATmega MCU, really.

Files and Configuration

  • Arduino_FreeRTOS.h : Must always be #include first. It references other configuration files, and sets defaults where necessary.
  • FreeRTOSConfig.h : Contains a multitude of API and environment configurations.
  • FreeRTOSVariant.h : Contains the AVR specific configurations for this port of FreeRTOS.
  • heap_3.c : Contains the heap allocation scheme based on malloc(). Other schemes are available and can be substituted (heap_1.c, heap_2.c, heap_4.c, and heap_5.c) to get a smaller binary file, but they depend on user configuration for specific MCU choice.

Goldilocks Analogue – Wrap Up

This is the final item on the Goldilocks Analogue as a design and production exercise.

Thank you for pledging on the Kickstarter Project page. Closed on November 19th 2015, with 124% funding. Now that the Kickstarter Pledges have been shipped, the Goldilocks Analogue is available on Tindie.

I sell on Tindie

I’ve been updating this post with the pre-production and production experience over the past few months.

The pre-production design materials, correcting the errata noted on Prototype Version 4, have been finalised and sent to Seeed Studio.
GoldilocksAnalogue_Seeed_20151120

The interim backer report is out, and now manufacturing quantities for procuring parts ready for the February production run have been finalised.

An updated version of the Goldilocks Analogue User Manual is available, and the Production Testing Document has reached Version 2 following inclusion of Windows 10 test procedures.

Revised Arduino IDE Variant files for Goldilocks Analogue using the Arduino core are available on Github.

Also additional optional libraries to provide support for each of the advanced features of the Goldilocks Analogue are available in the Arduino IDE Library Manager.

Arduino IDE compatibility testing revealed only a few remaining issues related to support of the ATmega1284p used in the Goldilocks Analogue. Two issues have been raised and resolved as 2 pull requests on the main Arduino IDE development path.

Both these issues have been committed into the Arduino main git tree, and they have landed in Arduino IDE Release 1.6.8.

The only remaining known issue is the limitation in the configuration of the Tones() code to use only Timer 2. We would like to use Timer 2 for the RTC. There is no other option but to use this Timer for the RTC support, so it would be good if Tones() could be configured to use a different timer.

Testing

Rather than going back over old ground, I’ll just be testing the pre-production version against the Version Prototype 4, to ensure that the things that should have improved, are improved, and that nothing has become broken.

Power Supplies

In the image below, Channel 1 (yellow) is 4.47mV of noise present at the output capacitor for the power supply, and Channel 2 (blue) is the 3.47mV supply noise present on a test Vcc pin closest to the MCU.

The significant improvement in noise level for the pre-production version at the MCU is similar to that achieved for the Prototype 4 (even slightly better), and this is probably due to reduced capacitive coupling into the ground plane by removing the ground copper from directly under the main supply inductor.

GA PP Power Supply Noise

GA PP: 5V Power Supply Noise

Remembering, for context, that 4mV is still the same order of the least significant bit for a 5V 10 bit ADC sampler, as found in the ATmega1284p, and a one bit change in the LSB of the MCP4822 input generates a 1mV change in output.

Checking the other power supplies on the board, Channel 1 (yellow) is the 3.3V positive supply, provided by a linear regulator. This supply is not used for analogue components, so the 4.0mV noise level is not critical, but never the less it is slightly less than on the Version P4.

Channel 2 (blue) below shows the -3V supply for the Operational Amplifier. This shows that the supply voltage noise of 5.9mV after filtering by a first order LC filter further smooth this supply. Compared to the Version P4 with no filtering (shown below) the noise is reduced substantially. The Version P4 shows a 10mV ramp, because it is a capacitive charge switching device. The addition of this LC filter was the one substantial change from the Prototype 4, so it is good to see the positive effect on the negative supply input to the Op Amp.

GA PP 3.3V and -3V Power Supply Noise

GA PP: 3.3V and -3V Power Supply Noise

Goldilocks Analogue Prototype 4 - 3.3V & -3V Supply Noise

GA P4: 3.3V & -3V Power Supply Noise

Analogue Output

The standard test that I’ve been using throughout the development is to feed in a 43.1Hz Sine wave generated from a 1024 value 16 bit LUT. The sampling rate is 44.1kHz, which is generated by Timer 1 to get the closest match.

The spectra and oscilloscope charts below can be directly compared to the testing done with prototype Version P4 and earlier versions of the Goldilocks Analogue.

The below chart shows the sine wave generated at the output of the Op Amp. This is exactly as we would like to see, with no compression of either the 4.096V peak, or the 0V trough.

Goldilocks Analogue – 43Hz Sine Wave – Two Channels – One Channel Inverted

GA PP: 43Hz Sine Wave – Two Channels – One Channel Inverted

Looking at the spectra generated up to 953Hz it is possible to see harmonics from the Sine Wave, and other low frequency noise.

The spectrum produced by the Goldilocks Analogue shows most distortion is below -70dB, and that the noise floor lies below -100dB. The pre-production sample shows slightly higher noise carriers than the Version P4, but the difference is not substantial.

GA PP 953Hz

GA PP: 43.1Hz Sine Wave – 953Hz Spectrum

In the spectrum out to 7.6kHz we are looking at the clearly audible range, which is the main use case for the device.

The Goldilocks Analogue has noise carriers out to around 4.5kHz, but they are all below -80dB. After 4.5kHz the only noise remains below -100dB.

GA PP 7k6Hz

GA PP: 43.1Hz Sine Wave – 7k6Hz Spectrum

The spectra out to 61kHz should show a noise carrier generated by the reconstruction frequency of 44.1kHz.

The Goldilocks Analogue shows the spectrum maintains is low noise level below -90dB right out to the end of the audible range, and further out to the reconstruction carrier at 44.1kHz.

GA PP 61kHz

GA PP: 43.1Hz Sine Wave –  61kHz Spectrum

The final spectrum shows the signal out to 976kHz. We’d normally expect to simply see the noise floor, beyond the 44.1kHz reconstruction carrier noise.

The Goldilocks Analogue has a noise carrier at around 210kHz, probably generated by the -3V supply. The noise carrier at 340kHz is generated through the 5V SMPS supply, and is absent when powered by USB socket. Aside from the two carriers mentioned, there is no further noise out to 976kHz.

GA PP 976kHz

GA PP: 43.1Hz Sine Wave – 976kHz Spectrum

The Pre-production analogue output works as specified, and is essentially identical to the analogue output on the Prototype 4. It can maintain the 72dB SNR required, of which it should theoretically be capable.

Goldilocks Analogue – Testing 4

Recap

I’ve been working on the Goldilocks Analogue now for so long that its been the centerpiece of my coding evenings for the past 18 months. This is the first time that I’ve designed a piece of hardware, and I’ve managed to make many mistakes (or learnings) along the way, so I think that every step of the process has been worthwhile.

Goldilocks Analogue Prototype 4 - Rotating

Goldilocks Analogue Prototype 4 – Rotating

In this project, I’ve learned about Digital to Analogue Converters, Operational Amplifiers, Voltage translation, Switching Power Supplies, and most importantly have gained a working knowledge of the Eagle PCB design tools.

Version History

The original Goldilocks Project was specifically about getting the ATmega1284p MCU onto a format equivalent to the Arduino Uno R3. The main goal was to get more SRAM and Flash memory into the same physical footprint used by traditional Arduino (pre-R3) and latest release Uno R3 shields.

The Goldilocks project achieved that goal, but the resulting ATmega platform still lacked one function that I believe is necessary; a high quality analogue capability. The world is analogue, but having an ADC capability, without having a corresponding digital-to-analogue capability, is like having a real world recorder (the ADC capability) with no means to playback and recover these real world recordings.

A major initiative of the Goldilocks is to bring an analogue capability to the Arduino platform via a DAC, so this project was called the Goldilocks Analogue. Following a Kickstarter campaign, the Goldilocks Analogue is now available on Tindie.

I sell on Tindie

Version 1 implemented the MCP4822 DAC buffered with an expensive, very musical, Burr Brown Operational Amplifier. Although the DAC performed exactly as designed, I neglected to provide the Op Amp with a negative supply rail, so it could not approach 0V required to reproduce the full range of output available (0v to 4.096V) from the DAC. That was a mistake.

Version 1 also reverted the USB to Serial interface to use a proper FTDI FT232R device from the ATmega32u2 MCU (the solution preferred now by Arduino). Using the ATmega32u2 (or ATmega16u2) to interface with anything not running at its own 16MHz CPU clock rate is broken and doesn’t work. This is because the ATmega devices don’t produce correct serial when their CPU clock is not a clean multiple of the USART required rate. I learned this the hard way with the Goldilocks Project.

Finally Version 1 implemented a buffered microSD card interface, using devices suitable to convert from 5V to 3.3V (SCK, MOSI, CS), and 3.3V to 5V (MISO) for the SPI interface. This was a correct implementation, but later I removed it because sometimes I’m really too smart for my own good.

Updated - Goldilocks Analogue

Goldilocks Analogue – Version P1

Some significant rework of the analogue section was required for Version 2. Firstly, I decided that audio was a significant use case for the Goldilocks Analogue so it was worth while reducing the cost of the Op Amp, and sharing that expenditure with a special purpose headphone amplifier, working in parallel with the Op Amp.

The TPA6132A2 headphone amplifier was found, and implemented into the prototype. This “DirectPath” device removes the need for large output capacitors in the signal path, as it provides a zero centred output from a single supply voltage.

Version 2 also had a lesser but fully adequate TS922A Op Amp for providing DC to full rate signals, buffering the MCP4822 DAC. I had learned that Op Amps need to be provided with a negative supply rail, if they are to achieve 0V output under load.

To create a negative supply rail for the Version 2 Op Amp, I used a pair of TPS60403 voltage inverters producing -5V from the main power supply, and then fed that signal into a TPS72301 regulator configured to produce -1.186V. This design worked very well, but needed 3 devices to produce the output and also quite a lot of board space.

Finally, for Version 2, I changed the microSD voltage translation to use the TXS0104 (TXB0104) level translation products. These special purpose devices enable the bidirectional transfer of signals over voltage transitions. Normally these two device types (TXB / TXS) work very well, but somehow I couldn’t ever get either version to work properly, which caused the microSD card to never work. This exercise was a failure, and was reverted to normal buffer logic for Version 3.

Goldilocks Analogue Version 2

Goldilocks Analogue – Version P2

Version 3 was supposed to be my final prototype prior to putting this into a crowd funding project. I made some fairly simple changes, and I was pretty happy with the outcome of Version 2 as it was.

Version 3 changed the MCP4822 DAC interface to use the USART1 interface present on the ATmega1284p MCU. Using the USART1 in MSPI Mode allows the DAC to operate independently of the normal SPI bus. This frees up timing constraints on the SPI interface so that slow microSD cards can be read and streamed to the DAC, or SPI graphics interfaces can be used as in the case of the synthesiser demonstration.

As part of the testing, I found that the ATmega1284p would operate successfully at 24.576MHz. This is a magic frequency because it allows an 8 bit timer to reproduce all of the major audio sampling rates related to 48kHz. I’ve tested using this frequency and everything is working fine.

Obviously there was some feature creep over the course of experimentation, so I decided to add two SPI memory device layouts to the PCB. This would allow up to 23LC1024 256kByte of SRAM to be addressed, together with AT25M01 256kByte of EEPROM, for example. Putting these layouts on the board meant that the JTAG interface had to be moved to the back of the PCB. This was actually not a bad thing, as it would have been impossible to use the JTAG interface with a Shield in place anyway.

I was asked to beef up the 3.3V supply capability of the Goldilocks Analogue, so I added a AP1117 device capable of up to 1A supply, upgrading from 150mA.

Goldilocks Analogue Version 3

Goldilocks Analogue – Version P3

Version 4 added the feature of amplified Microphone Input. Using the MAX9814 Mic Amp electret microphones present in the normal headsets used with smart-phones can act as an audio input to the Goldilocks Analogue. The amplified signal is presented on Pin A7, which falls outside of the normal Arduino Uno R3 footprint. For completeness, I also added a level shifted non-amplified signal (for line-in) to Pin A6.

Because of the extra microphone amplifier circuitry, I needed to reduce the footprint of the negative supply rail. So I used a regulated -3V device LTC1983 to produce the negative supply rail required for the Op Amp. There are now 3 regulators on the Goldilocks Analogue, 5V at up to 2.4A, 3.3V at up to 1A, and -3V at 100mA.

Version 4 also saw the last of the through hole components gone. Both the main crystal and the 32kHz clock crystal were reformatted into surface mount technology. This reduces the flexibility of the platform, but it makes manufacturing a bit easier.

Goldilocks Analogue - Version P4

Goldilocks Analogue – Version P4

Version 4 is the end. There will be some minor adjustments to the board for the production version. But, finally, I think this is it.

Unboxing

Prototype 4 back from manufacturing

Prototype 4 back from manufacturing

Errata

There is no power supply jumper provided. Add this to the BOM.

The EEPROM is built with SOIC8 Wide Body, but it should be Narrow Body. Fix the BOM.

Labels for the recommended power supply Voltage are partially obscured by power jack. Move the labels.

The main crystal doesn’t oscillate, because the Eagle library had an incorrect footprint.
Tested to work by rotating the crystal on its existing footprint.

Respecify the main MCU crystal to the Atmel recommended type.

Change C2 and C3 to 15pF in line with Atmel recommendation.

Change C1 and C11 to 12pF in line with Atmel recommendation.

Change R18 to 100kOhm, to bias the Analogue Line in correctly.

Add a simple LC filter to the -3.3V supply, using the known inductor and 10uF capacitor.

Tie Mic Gain to AVcc with a separable link, to produce 40dB default gain. This is the setting required for most normal microphones, so will improve the “out of box” experience for most users.

Testing

Front

Version P4 – Front

Back

Version P4 – Back

Power Supplies

First, looking at power supply noise, we’ve got a slightly better result for noise at the power supply over the Prototype 1. Prototype 1 used the EUP3476 Switched Mode supply device. Problems with obtaining ready supply of this device led to changing it to the AP6503, which is pin compatible but needs slightly different voltage selection resistors. In contrast to the Arduino Uno and other devices, this is very good.

We remember that 1mV represents the Voltage change in the least significant bit of the 12 bit MCP4822. The least significant bit of the ATmega1283p 10 bit ADC is about 4mV. Sampling voltages similar to the noise level inherent on the platform will not generate any further accuracy.

Power Supply GA 4

GA P4: 5V Power Supply Noise

Channel 1 (yellow) is 4.0mV of noise present at the output capacitor for the power supply, and represents the lowest noise inherent in the supply. Channel 2 (blue) is the 5.6mV supply noise present on a test pin closest to the MCU.

The significant improvement in noise level for the GA4 version at the MCU may be due to the currently decreased system clock rate, and therefore is subject to confirmation.

Power Supply GA1

GA P1: 5V Power Supply Noise

Checking the other power supplies on the board, Channel 1 (yellow) is the 3.3V positive supply, provided by a linear regulator. This supply is not used for analogue components, so the 4.3mV noise level is not critical.

Channel 2 (blue) below shows the -3V supply for the Operational Amplifier. This shows a 10mV supply voltage ramp, generated because it is a capacitive charge switching device. This is not particularly good, and it will be worth adding an LC filter to attempt to further smooth this supply, for the production Goldilocks Analogue.

Goldilocks Analogue Prototype 4 - 3.3V & -3V Supply Noise

GA P4: 3.3V & -3V Power Supply Noise

Microphone Input

Microphone amplifier works.

Interestingly, when the amplifier is set to 60dB gain (default setting, Channel 2, blue) it oscillates at 144kHz when not connected to anything, while it doesn’t do this with the normal gain setting of 40dB (Channel 1, yellow) based on the amplification required for typical smartphone headphones. In either case, as soon as the Mic input is grounded, the Mic output reduces to less than 4mVAC on the correct bias of 1.25VDC.

Suggest to make 40dB amplification the default setting for the production Goldilocks Analogue, but allowing the connection to be segmented for increased amplification if required.

GA4 - Mic Oscillation Comparison 60dB and 40dB

GA4 – Mic Oscillation Comparison at 40dB (CH1) and 60dB (CH2) amplification.

Line-In (PA6) seems to be biased at 1.07V rather than 1.25V. Check calculation again… Doh! R18 should be 100kOhm.

Analogue Output

The standard test that I’ve been using throughout the development is to feed in a 43.1Hz Sine wave generated from a 1024 value 16 bit LUT. The sampling rate is 44.1kHz, which is generated by Timer 1 to get the closest match.

The spectra and oscilloscope charts below can be directly compared to the testing done with previous prototype versions of the Goldilocks Analogue.

The below chart shows the sine wave generated at the output of the Op Amp. This is exactly as we would like to see, with no compression of either the 4.096V peak, or the 0V trough.

Goldilocks Analogue – 43Hz Sine Wave – Two Channels – One Channel Inverted

Goldilocks Analogue P4 – 43Hz Sine Wave – Two Channels – One Channel Inverted

Looking at the spectra generated up to 953Hz it is possible to see harmonics from the Sine Wave, and other low frequency noise.

The spectrum produced by the Goldilocks Analogue shows most distortion is below -70dB, and that the noise floor lies below -100dB.

Goldilocks Analogue – 43.1Hz Sine Wave – 953Hz Spectrum

Goldilocks Analogue P4 – 43.1Hz Sine Wave – 953Hz Spectrum

Comparing the same DAC output channel A from Op Amp (Channel 1 – Blue) with the same Headphone Amp Left (Channel 2 – Red), we see slightly more noise carriers.

Goldilocks Analogue P4 – 43.1Hz Sine Wave – 953Hz Spectrum

Goldilocks Analogue P4 – 43.1Hz Sine Wave – 953Hz Spectrum – Headphone in Red

In the spectrum out to 7.6kHz we are looking at the clearly audible range, which is the main use case for the device.

The Goldilocks Analogue has noise carriers out to around 4.5kHz, but they are all below -80dB. After 4.5kHz the only noise remains below -100dB.

Goldilocks Analogue – 43.1Hz Sine Wave – 7.6kHz Spectrum

Goldilocks Analogue P4 – 43.1Hz Sine Wave – 7.6kHz Spectrum

The spectra out to 61kHz should show a noise carrier generated by the reconstruction frequency of 44.1kHz.

The Goldilocks Analogue shows the spectrum maintains is low noise level below -90dB right out to the end of the audible range, and further out to the reconstruction carrier at 44.1kHz.

Goldilocks Analogue – 43.1Hz Sine Wave – 61kHz Spectrum

Goldilocks Analogue P4 – 43.1Hz Sine Wave – 61kHz Spectrum

The final spectrum shows the signal out to 976kHz. We’d normally expect to simply see the noise floor, beyond the 44.1kHz reconstruction carrier noise.

The Goldilocks Analogue has a noise carrier at around 210kHz. Aside from the single carrier mentioned, there is no further noise out to 976kHz.

Goldilocks Analogue – 43.1Hz Sine Wave – 976kHz Spectrum

Goldilocks Analogue P4 – 43.1Hz Sine Wave – 976kHz Spectrum

Analogue output works as specified. It can maintain the 72dB SNR required, of which it should theoretically be capable.

SPI Devices

MicroSD works.

SPI SRAM / EEPROM devices work.

Tests for REWORK

These tests are for the factory rework of the prototype boards.

Check 5V Power Supply

Plug 7-23VDC positive centre into J1.
CHECK: that there is 5VDC available on H15 DC Pin.

GAV4-5VDC

Check 3.3V and -3V Power Supply

Add a jumper to J1 from the centre pin to DC Pin.

CHECK: that 5VDC, 3.3VDC, and -3VDC is available at the test points indicated.

GA4V-Powered

Check for Internal RC Oscillator Function

Connect an AVRISP Mk2 in circuit programmer to the ICSP Socket.

Use the following command to enable the CLKO on PB1, and set other fuses correctly.

avrdude -pm1284p -cavrisp2 -Pusb -u -Ulfuse:w:0x82:m -Uhfuse:w:0xd8:m -Uefuse:w:0xfc:m

CHECK: Using an oscilloscope or counter, check that the CLKO signal on PB1 is between 7MHz and 8.1MHz (nominal value 8MHz).

GAV4-CLKO

Rework 24.576MHz Crystal & Capacitors

Identify Crystal Y2, and capacitors C2 & C3 and C1 & C11.

Screenshot from 2015-10-06 23:21:16

IMPORTANT: Y2 must be ROTATED 90 degrees (either direction) from standard footprint.

Rework Y2 to the 24.576MHz SMD Crystal Digikey 644-1053-1-ND device.

Rework C2 & C3 to 15pF.

Rework C1 & C11 to 12pF.

Check External Crystal Oscillator Function

Connect an AVRISP Mk2 in circuit programmer to the ICSP Socket.

Use the following command to enable the CLKO on PB1, and set other fuses to External Full Swing Oscillator with BOD.

avrdude	-pm1284p -cavrisp2 -Pusb -u -Ulfuse:w:0x97:m -Uhfuse:w:0xd8:m -Uefuse:w:0xfc:m

CHECK: Using an oscilloscope or counter, check that the CLKO signal on PB1 is 24.576MHz.

GAV4-CLKO

Use the following command to disable the CLKO on PB1, and set other fuses to External Full Swing Oscillator with BOD.

avrdude	-pm1284p -cavrisp2 -Pusb -u -Ulfuse:w:0xd7:m -Uhfuse:w:0xd8:m -Uefuse:w:0xfc:m

Rework R18

CHECK: DC voltage on Port A6 (ADC Input 6). Expected value 1.07VDC.

Identify resistor R18. R18 needs to be reworked to 100kOhm.

Screenshot from 2015-10-06 23:21:02

GAV4-Rework

CHECK: DC voltage on Port A6 (ADC Input 6). Confirm the expected value 1.25VDC.

Load Bootloader

Connect an AVRISP Mk2 in circuit programmer to the ICSP Socket.

Use the following command to program a bootloader using the provided HEX file.

avrdude -pm1284p -cavrisp2 -Pusb -Uflash:w:GABootMonitor.hex:a

Connect the Goldilocks Analogue to a USB port and open a serial terminal at 38400 baud to the attached FTDI USB /dev/ttyUSB0 interface.

CHECK: Enter !!! into the serial terminal, immediately following a board RESET (using the RESET button). This will enable the Boot Monitor. These are the boot monitor commands.

Bootloader>
Bootloader>H Help
0=Zero addr (for all commands)
?=CPU stats
@=EEPROM test
B=Blink LED
E=Dump EEPROM
F=Dump FLASH
H=Help
L=List I/O Ports
Q=Quit
R=Dump RAM
V=show interrupt Vectors
Y=Port blink

Bootloader>
Bootloader>? CPU stats
Goldilocks explorer stk500v2
Compiled on = Oct  7 2015
CPU Type    = ATmega1284P
__AVR_ARCH__= 51
GCC Version = 4.8.2
AVR LibC Ver= 1.8.0
CPU ID      = 1E9705
Low fuse    = D7
High fuse   = D8
Ext fuse    = FC
Lock fuse   = FF

Close the serial terminal program (disconnect from the /dev/ttyUSB0 device).

Load Test Program

With the USB interface connected to the Goldilocks Analogue, and the DTR switch in the right most position (closest to DTR text; opposite position to the pictured position).

GAV4-DTR

Use the following command to program a Test Suite using the provided HEX file.

avrdude	-pm1284p -cwiring -P/dev/ttyUSB0 -b38400 -D -v -Uflash:w:GATestSuite.hex:a

Plug a standard smartphone headset (with microphone) with a 3.5mm TRRS connector in LRGM (left, right, ground, mic) configuration into the socket.
CHECK: Sound (echo of input) should be heard from the headphones, when speaking into the microphone.

Connect the Goldilocks Analogue to a USB port and open a serial terminal at 38400 baud to the attached FTDI USB /dev/ttyUSB0 interface.
CHECK: Information on each Task’s stack “Highwater” mark should be seen on the serial terminal.

Goldilocks Analogue – Prototyping 4

Just over 6 months since the third iteration of the Goldilocks Analogue Prototyping was started, and now I’ve finished the design for a forth iteration. The Goldilocks Analogue Prototype 4 design is now finished, and I’m working out what the final bill of materials will cost to assemble into a final outcome. Testing for the Prototype 4 has begun, and everything is working as expected.

The third prototype was completely successful, and produced the improvements I was looking for. The use of the MSPI Mode on USART1 means that two SPI interfaces can be run in parallel, allowing the DAC to hold its tight timing requirements while slower SD card transactions take place (for example). This was proven through the implementation of a direct digital synthesiser, controlled by a SPI controlled touch screen.

Goldilocks Analogue - Prototype 3

Goldilocks Analogue – Prototype 3

Revision for Prototype 4

The Prototype 3 was supposed to be the final version, and it achieved everything that I set out in the original design specifications. But, then there was some feature creep.

Prototype 4 back from manufacturing

Prototype 4 back from manufacturing

In discussing the TRS 3.5mm audio socket, a better more robust TRRS version was found. The realisation that it would be possible to have a microphone input, without requiring additional board space, led me to experiment with the Adafruit breakout board for the MAX9814 Microphone amplifier, and then to build a very simple Walkie-Talkie demonstration to test the use of audio input (with the integrated ADC), simultaneously with audio output (via the DAC).

Once the use of the MAX9814 was proven, I could implement a reference circuit as an input option. The amplified microphone input is connected to Pin 7 of the Analogue Port A. Conveniently, the MAX9814 delivers the amplified signal at +1.25V with a 2V peak to peak signal. This allows the sample to fall into the range of 0V to 2.56V internal reference voltage for the ATmega ADC, providing the maximum sampling resolution with no further adjustments.

The MAX9814 also includes an integrated microphone biasing circuitry, which is designed to support normal electret microphones.

 

As an alternative input functionality, the Prototype 4 also allows for LINE level inputs. I have used a voltage divider to reference the input signal to 1.25V DC. Although a 2V peak to peak Line level input will overload the Microphone amplifier, rendering the output signal on PA7 unusable, the LINE input is routed to Pin 6 on Port A will have exactly the right range to sample using the internal ATmega ADC voltage reference.

Both Port A Pin 6 and Pin 7 are outside of the normal Arduino UNO R3 footprint, so the normal functionality of the UNO footprint is not affected by either the two input options. And if desired, the connection can be separated at a solder-jumper on the rear of the board.

The additional space required for the microphone and line level input circuitry has been created by simplifying the negative supply rail for the Op-Amp. The Op-Amp is provided to support DC to 50k sample per second analogue output. To achieve a linear output from 0v to 4.096V the Op-Amp requires a negative supply voltage. In this revision, I have used a single LTC1983 regulated supply device to provide the negative -3V supply rail. The outcome should be equivalent to the Prototype 3 solution, which used 3 devices.

Board Layout

The final board layout has been completed, and the board is now in discussion for manufacturing.

The GoldilocksAnalogueP4Schematic in PDF format.

Front of board (All Layers)

Front

This is the front of the board showing all of the layers, and the general layout of the devices. The board layout is pretty busy, but still there is sufficient prototyping capability to take all the port pins off-board, or provide on-board breakouts.

GoldilocksAnalogue_TopSilk

Top Layer

This is the Top Layer, which contains all of the devices. There are no devices on the Bottom Layer.

Route 2 (GND) Layer

The Ground Layer on Route 2 is unchanged from previous iterations, and provides a solid platform for low noise analogue circuits.

Route 15 (Vcc) Layer

The Route 15 power supply layer contains all of the supply lines, providing 5V regulated, 5V filtered for analogue AVcc, 3.3V regulated, and -3V regulated.

Bottom Layer

Back

All the pin outs are defined on the Bottom Layer. In addition to the items previously mentioned, there are two small locations where the Line and Microphone inputs can be cut, and allow the full functionality of PA6 and PA7 to be recovered.

Pin Mapping

This the map of the ATmega1284p pins to the Arduino physical platform, and their usage on the Goldilocks Analogue

Arduino
UNO R3
328p Feature 328p Pin 1284p Pin 1284p Feature Comment
Analog 0 PC0 PA0
Analog 1 PC1 PA1
Analog 2 PC2 PA2
Analog 3 PC3 PA3
Analog 4 SDA PC4 PA4 PC1 I2C -> Bridge Pads
Analog 5 SCL PC5 PA5 PC0 I2C -> Bridge Pads
Reset Reset PC6 RESET Separate Pin
Digital 0 RX PD0 PDO RX0
Digital 1 TX PD1 PD1 TX0
Digital 2 INT0 PD2 PD2 INT0 / RX1 USART1
Digital 3 INT1 / PWM2 PD3 PD3 INT1 / TX1 USART1
-> MCP4822 SPI MOSI
Digital 4 PD4 PD4 PWM1 / XCK1 16bit PWM
-> MCP4822 SPI SCK
Digital 5 PWM0 PD5 PD5 PWM1 16bit PWM
Digital 6 PWM0 PD6 PD6 PWM2
Digital 7 PD7 PD7 PWM2
Digital 8 PB0 PB2 INT2 <- _INT/SQW DS3231
Digital 9 PWM1 PB1 PB3 PWM0
Digital 10 _SS / PWM1 PB2 PB4 _SS / PWM0 SPI
Digital 11 MOSI / PWM2 PB3 PB5 MOSI SPI
Digital 12 MISO PB4 PB6 MISO SPI
Digital 13 SCK PB5 PB7 SCK SPI
 (Digital 14 PB0  T0 -> SDCard SPI _SS
 (Digital 15) PB1  T1 -> MCP4822 SPI _SS
SCL PC0 SCL I2C – Separate
SDA PC1 SDA I2C – Separate
PC2 TCK JTAG <- _CARD_DETECT
for uSD Card
PC3 TMS JTAG -> MCP4822 _LDAC
PC4 TDO JTAG -> SRAM SPI _SS
PC5 TDI JTAG -> EEPROM SPI _SS
PC6 TOSC1 <- 32768Hz Crystal
PC7 TOSC2 -> 32768Hz Crystal
XTAL1 PB6
XTAL2 PB7
 (Analog 6) PA6 -> LINE Input
 (Analog 7) PA7 -> MIC Input

Goldilocks Analogue Synthesizer

For the past year, I’ve been prototyping an Arduino clone, the Goldilocks Analogue, which incorporates advanced analogue output capabilities into the design of the original Goldilocks with ATmega1284p AVR MCU and uSD card cage. Recently the design scope crept up to include two SPI memory devices (EEPROM, SRAM, FRAM), and microphone audio input. But, before I go through another prototype cycle, I thought it would be a good idea to build some demonstration applications, showcasing the capabilities of an arduino R3 compatible platform with integrated analogue output and have some fun with audio.

Goldilocks Analogue Prototype 3

Some of the initial tests I’ve built include some 8 bit algorithmic music and, using two Goldilocks Analogue prototype devices, a digital walkie talkie using Xbee radios. They were fun, but don’t really demonstrate the full range of the audio capabilities of the platform.

It seemed appropriate to build a synthesizer using the Goldilocks Analogue as the platform, and a Gameduino 2 shield incorporating a FDTI FT800 EVE GPU, and see how close I could get to a musical outcome.

Research

Before randomly building something that made a bunch of squeaky sounds, I thought the best thing to do is to learn something about the field of analogue synthesizers and synthesizing audio.

I also obtained some simple analogue synthesizers from Korg to see exactly what they produce, so I could copy them. Some people write that this monotron analogue synthesizer family are good examples of a low cost musical instrument. I found it very interesting to examine the wave forms produced by the various settings.

Using the features of the two Korg devices, I was able to define the goal for the synthesizer that I wanted to build using the Goldilocks Analogue.

The Korg monotron DUO has two voltage controlled oscillators (VCO1 and VCO2), which produce square waves. The VCO1 has a pitch setting, which defines the basic frequency at which the ribbon keyboard operates. The ribbon keyboard can be set to have a major scale, a minor scale, a full chromatic scale, or be a ribbon with no set notes. For clarity, the pitch on the DUO is analogue, so there is no guarantee that the notes generated by the ribbon keyboard will be in tune.

The VCO2 pitch can be modified either below or above the pitch of the VCO1. In its middle section, with some care, it can be matched exactly to the VCO1 setting. The switch allows either just the VCO1 or both VCO1 and VCO2 to produce sound. A separate XMOD intensity knob allows the VCO2 to modulate the frequency of the VCO1 oscillator, producing cross-modulation.

The monotron DUO contains the famous Korg MS-20 resonant low pass filter, which can be adjusted for both cut-off frequency and intensity of the resonant frequency. Setting the filter values allows the square wave noise generated by the two oscillators to be shaped into very interesting tones.

The Korg monotron DELAY is a very different device from the DUO. It has two oscillators, but only one at audio frequencies. The audio oscillator produces a saw-tooth wave at a frequency controlled by the ribbon keyboard. On the monotron DELAY there is no capability for playing specific notes as the keyboard is only available in ribbon mode. The second oscillator of the monotron DELAY is a low frequency oscillator (LFO), which can be adjusted from 1Hz up to about 30Hz. This LFO can produce either a triangle wave or a square wave to modulate the main audio oscillator. This is used mainly to apply vibrato to musical tones, or to produce very unusual tone ramps. The intensity and pitch of the LFO are controlled by knobs.

The Korg low pass filter present in the monotron DELAY is only adjustable for its cutoff frequency, so it is less flexible and interesting than the monotron DUO implementation.

The monotron DELAY is really built to showcase the analogue space delay functionality, which can be adjusted in both length of delay, and in intensity of feedback. With about 1 second of delay and 100% or more feedback possible, very short sequences of notes can be played and then built upon.

I’m not particularly musical, but I spent some very pleasant hours playing with the two Korg synthesizers experimenting with the sounds available from their very simple platforms, and used their capabilities to guide me in what to build into my Goldilocks Analogue synthesizer.

The next piece of research was to understand how to generate analogue wave forms using direct digital synthesis, and then how to modify sound of the wave forms using convolution or modulation in the time domain.

Design Specification

Having the two Korg devices as an inspiration, and reading about the original Moog synthesizer capabilities from the 1970’s, made the specification pretty straight forward.

Goldilocks Analogue GUI

The Goldilocks Analogue synthesizer has three oscillators, two of which operate at audio frequencies, being VCO1 and VCO2, and one low frequency oscillator, being LFO. The VCO1 is tuned in octaves at correct concert pitch, so that notes played would be at the right frequency. The VCO2 is pitched relative to the VCO1 pitch, and would range minus one octave to plus one octave (or half the VCO1 frequency to double the VCO1 frequency). The LFO is adjustable over the range from 1 Hz to 40 Hz.

I had decided to let each oscillator take one of two wave forms. For VCO1 I initially chose square wave, and saw tooth wave, to be able to replicate the exact sound of the Korg devices. I’ve since decided to move the saw tooth wave to the VCO2, and replaced it with a sine wave on VCO1. It is good to have the pure tone at the correct frequency for tuning instruments. An A4 from the Goldilocks Analogue Synthesizer will, for example, always be 440Hz.

For VCO2 I selected a triangle wave and a saw tooth wave. And, for the LFO there is a sine wave and a triangle wave available. I should point out that changing the wave form available to each oscillator is no more complicated that replacing the look-up table associated with the setting, and there is space available in the ATmega1284p to store at least another 4 separate wave form tables in flash memory, even without extending to on-board SPI EEPROM, or uSD storage.

In the mixing section the intensity or volume of each of VCO1 and VCO2 can be set. It is possible to turn off either oscillator. The intensity of the LFO effect is controlled too. The LFO modulates both the VCO1 and the VCO2. The final input is the cross modulation of VCO1 by the VCO2. Very interesting tonality is created by modulating VCO1 by pitches very close to its own frequency.

Each note is put through an exponential Attack and Release envelope, to give the note some shape. The mixed signal is then be sent to the voltage controlled filter. Using the current set up, the sample rate is 16,000 samples/second, which is enough to produce 6 octaves. The upper two octaves remain implemented, but are not reconstructed accurately. I have implemented a Biquad IIR filter to enable the output to be high, low, or band pass filtered. The default set up is for low pass filtering. The filter -3dB frequency, and the ringing levels can be adjusted for different musical effect.

Following the filter stage, the signal enters the space delay stage. The space delay stage can have only about half a second of delay, because of the RAM limitations (16kByte) of the ATmega1284p. So up to 6700 16 bit samples are supported by the space delay function. Samples are recovered from the delay buffer, and mixed with the new signals, then injected back into the delay loop. This creates an infinite loop of samples, depending on the amount of feedback set by the FEEDBACK control.

The final signal output level is controlled by a MASTER volume control. Additionally, an EEPROM STO and RCL capability for the settings has been implemented. Only the most recent settings are stored, which can be recalled when power is restored.

As the keyboard notes are generated using a look up table, multiple keyboard tuning options are possible. I have implemented Concert Tuning (A4 = 440Hz) and Equal Temperament (commonly used for pianos), and Verdi or Stradivari tuning (C4 = 256Hz) with Just Intonation Equal Fifths as an alternative. There is a toggle to chose between either these two options. Any tuning can be generated, and then loaded as the note table.

GUI Implementation

The GUI of the solution depends on a Gameduino 2 screen, which is based on the FTDI Chip FT800 EVE GPU device. The FT800 was the first EVE GPU available from FTDI and it can only support single touch. This limitation makes it only partially useful as a product to support this application. The most interesting sounds are generated by bending the controls whilst playing the notes. Fortunately there are newer EVE GPU devices that support multi-touch and they would make a better platform if this synthesizer were to become more than just a demonstration.

The GUI makes extensive use of FT800 co-processor widget capabilities being dials, toggles, keys, and text. Some examples below.

// text
FT_GPU_CoCmd_Text_P(phost, 300,  8, 27, OPT_CENTER, PSTR(&amp;quot;VCF&amp;quot;));
FT_GPU_CoCmd_Text_P(phost, 300, 25, 26, OPT_CENTER, PSTR(&amp;quot;CUTOFF&amp;quot;));
FT_GPU_CoCmd_Text_P(phost, 300, 95, 26, OPT_CENTER, PSTR(&amp;quot;PEAK&amp;quot;));

// toggles
FT_API_Write_CoCmd(TAG(LFO_WAVE));
FT_GPU_CoCmd_Toggle_P(phost, 13,242,46,18, OPT_3D, synth.lfo.wave, PSTR(&amp;quot;SIN&amp;quot; &amp;quot;\xFF&amp;quot; &amp;quot;TRI&amp;quot;));

FT_API_Write_CoCmd(TAG(KBD_TOGGLE));
FT_GPU_CoCmd_Toggle_P(phost, 405,130,60,26, OPT_3D, synth.kbd_toggle, PSTR(&amp;quot;CONCRT&amp;quot; &amp;quot;\xFF&amp;quot; &amp;quot;VERDI&amp;quot;));

// dials
FT_API_Write_CoCmd(TAG(DELAY_FEEDBACK));
FT_GPU_CoCmd_Dial(phost, 365,125,20, OPT_3D, synth.delay_feedback); // DELAY FEEDBACK

FT_API_Write_CoCmd(TAG(MASTER));
FT_GPU_CoCmd_Dial(phost, 440,55,26, OPT_3D, synth.master); // MASTER

The integrated touch tracking capability makes it very easy to parse touch into specific commands.

readTag = FT_GPU_HAL_Rd8(phost, REG_TOUCH_TAG);

if (readTag &amp;gt; 0x80)// tag is greater than 0x80 and therefore is a dial.
{
	TrackRegisterVal.u32 = FT_GPU_HAL_Rd32(phost, REG_TRACKER);

	switch (TrackRegisterVal.touch.tag)
	{
	case (VCO1_PITCH):
		synth.vco1.pitch = TrackRegisterVal.touch.value &amp;amp; 0xe000;
		break;
	// continues...
	}

This integrated touch tracking capability can return which dial (slider / scroll bar) has been touched, and the relative position of the touch. This same position value can then be used in the display command to set the position of the dial (slider / scroll bar), providing direct feedback on the GUI.

The main GUI task simply calls the touch function, and if there is a touch recorded the GUI is updated, and the revised settings entered into the analogue audio control structure. Otherwise if there are no touches recorded there are no processor cycles wasted updating the display. The FT800 EVE GPU continues to display the same content until a new display list is loaded into the GPU memory.

When a keyboard touch is recorded, the tone generation information is updated, and this then directly impacts the output tone generated by the audio section.

//  setting the phase increment for VCO1 is frequency * LUT size / sample rate.
//  &amp;lt;&amp;lt; 1 in SAMPLE_RATE is residual scale to create 24.8 fixed point number.
// The LUT is already pre-scaled &amp;lt;&amp;lt; 7 in the calculation.
// The LUT can't be pre-scaled to &amp;lt;&amp;lt; 8 because this creates numbers too large for uint32_t to hold,
// and we want to allow the option to vary the SAMPLE_RATE at compilation time, so it has to stay in the calculation.
synth.vco1.phase_increment = (uint32_t)pgm_read_dword(synth.note_table_ptr + stop * NOTES + note) / (SAMPLE_RATE &amp;gt;&amp;gt; 1);

// set the VCO2 phase increment to be -1 octave to +1 octave from VCO1, with centre dial frequency identical.
if (synth.vco2.pitch &amp;amp; 0x8000) // upper half dial
	synth.vco2.phase_increment = ((synth.vco1.phase_increment &amp;gt;&amp;gt; 4) * synth.vco2.pitch ) &amp;gt;&amp;gt; 11;
else // lower half dial
	synth.vco2.phase_increment = (synth.vco1.phase_increment &amp;gt;&amp;gt; 1) + (((synth.vco1.phase_increment &amp;gt;&amp;gt; 4) * synth.vco2.pitch) &amp;gt;&amp;gt; 12);

// set the LFO phase increment to be from 0 Hz to 32 Hz.
synth.lfo.phase_increment = ((uint32_t)synth.lfo.pitch * LUT_SIZE / ((uint32_t)SAMPLE_RATE &amp;lt;&amp;lt; 4) );

The phase increment desired, respective to the relevant tone desired, is read from a look up table containing 8 octaves each of 12 notes for VCO1. VCO2 phase increment is then set as a proportion of VCO1. And LFO phase increment is set to range from 0 to around 30 Hz. With this information, and the selected wave form look up table, the audio implementation can do its thing.

Audio Implementation

The synthesizer audio section is implemented in one function, that is executed each time a new sample is generated. This means at 12,000 samples/ second sample generation frequency, we have 83 micro seconds to generate the final sample to be pushed to the Goldilocks Analogue MCP4822 12 bit dual channel DAC.

The current sample generation routine takes under 45 micro seconds to complete with 3 Oscillators running, so there is a little head room still available. With some further coding improvements it was possible to raise the sample frequency to 16,000 samples/sec as the sample generation frequency. The below logic trace shows the main SPI interface (SCK, MISO, MOSI, _SS) delivering commands to the EVE GPU, and the lower MSPI interface (MSPI SCK, MSPI MOSI, MSPI PING) providing the calculated samples, every 83 micro seconds, to the DAC.

Goldilocks Analogue Synthesizer, with 3 Oscillators operating.

Goldilocks Analogue Synthesizer, with 3 Oscillators operating.

It is clear to see that two EVE GPU transactions are being interrupted by the DAC output, but because the main SPI interface is not changing state the transaction is faultlessly resumed once the DAC interrupt is completed.

In contrast, when there are no oscillators running because no key is pressed, the sample generation routine takes just 28 micro seconds to complete. The logic trace below shows the change of state from 0 to 3 oscillators.

Goldilocks Analogue, with no Oscillators operating.

Goldilocks Analogue, with no Oscillators operating.

There is little time available to calculate sample values in real time, so all of the samples are pre-calculated and are stored in look-up tables (LUT). Each LUT contains 4096 16 bit samples, which gives 12 significant bits of accuracy for the values. I chose 4096 samples because the ATmega1284p has sufficient storage to support multiple tables of this size in its flash memory. Smaller LUTs would sacrifice accuracy, and larger LUTs would compromise on the number of available wave forms.

I have prepared LUTs for sine wave, square wave, triangle wave, and saw tooth wave options. Another advantage of the LUT approach is that better bandwidth optimised LUT values can be substituted without changing the code. Also, LUTs allow completely arbitrary waveforms could be used if desired to obtain specific timbre or nuances of sound.

The sample generation code starts with the LFO oscillator using a direct digital synthesis model. Each oscillator sample is calculated identically by stepping through the LUT with a phase increment based on the frequency of the note required, but VCO2 phase increment is modified by the LFO output and the VCO1 phase increment is modified by both VCO2 and LFO outputs.

Code shown here assumes that both LFO and VCO2 output wave forms have already been calculated.

///////////// Now do the VCO1 ////////////////////

// This will be modulated by the VCO2 value (depending on the XMOD intensity),
// and the LFO intensity.
if( synth.vco1.toggle )
{
	// Increment the phase (index into waveform LUT) by the calculated phase increment.
	// Both the phase and phase_increment are stored as 24.8 in uint32_t.
	// The fractional component of the phase and phase_increment is needed to ensure the wave
	// is tracked accurately.
	synth.vco1.phase += synth.vco1.phase_increment;

	// calculate how much the LFO affects the VCO1 phase increment
	if (synth.lfo.toggle)
	{
		// increment the phase (index into LUT) by the calculated phase increment including the LFO output.
		synth.vco1.phase += (uint32_t)outLFO; // increment on the fractional component 8.8, limiting the effect.
	}

	// calculate how much the VCO2 XMOD affects the VCO1 phase increment
	if (synth.vco2.toggle)
	{
		// increment the phase (index into LUT) by the calculated phase increment including the LFO output.
		synth.vco1.phase += (uint32_t)outXMOD; // increment on the fractional component 8.8, limiting the effect.
	}

	// if we've gone over the waveform LUT boundary -&amp;gt; loop back
	synth.vco1.phase &amp;amp;= 0x000fffff; // this is a faster way doing the table
						// wrap around, which is possible
						// because our table is a multiple of 2^n.
						// Remember the lowest byte (0xff) is fractions of LUT steps.
						// The table is 0xfff.ff bytes long.

	currentPhase = (uint16_t)(synth.vco1.phase &amp;gt;&amp;gt; 8); // remove the fractional phase component.

	// get first sample from the defined LUT for VCO1 and store it in temp1
	temp1 = pgm_read_word(synth.vco1.wave_table_ptr + currentPhase);
	++currentPhase; // go to next sample

	currentPhase &amp;amp;= 0x0fff;	// check if we've gone over the boundary.
				// we can do this because it is a multiple of 2^n.

	// get second sample from the LUT for VCO1 and put it in temp2
	temp2 = pgm_read_word(synth.vco1.wave_table_ptr + currentPhase);

	// interpolate between samples
	// multiply each sample by the fractional distance
	// to the actual location value
	frac = (uint8_t)(synth.vco1.phase &amp;amp; 0x000000ff); // fetch the lower 8bits

	// the optimised assembly code Multiply routines come from Open Music Labs.
	MultiSU16X8toH16Round(temp3, temp2, frac);

	// scaled sample 2 is now in temp3, and since we are done with
	// temp2, we can reuse it for the next result
	MultiSU16X8toH16Round(temp2, temp1, 0xff - frac);
	// temp2 now has the scaled sample 1
	temp2 += temp3; // add samples together to get an average
	// our resultant wave is now in temp2

	// set amplitude with volume
	// multiply our wave by the volume value
	MultiSU16X16toH16Round(outVCO1, temp2, synth.vco1.volume);
	// our VCO1 wave is now in outVCO1
}

The next piece of the audio process is to mix the two oscillators VCO1 and VCO2, and then calculate the space delay required. This is where the resonant low pass filter is implemented.

////////////// mix the two oscillators //////////////////
// irrespective of whether a note is playing or not.
// combine the outputs
temp1 = (outVCO1 &amp;gt;&amp;gt; 1) + (outVCO2 &amp;gt;&amp;gt; 1);

///////// Resonant Low Pass Filter here  ///////////////
IIRFilter( &amp;amp;filter, &amp;amp;temp1);

///////// Do the space delay function ///////////////////

// Get the number of buffer items we have, which is the delay.
MultiU16X16toH16Round( buffCount, (uint16_t)(sizeof(int16_t) * DELAY_BUFFER), synth.delay_time);

// Get a sample back from the delay buffer, some time later,
if( ringBuffer_GetCount(&amp;amp;delayBuffer) &amp;gt;= buffCount )
{
	temp0.u8[1] = ringBuffer_Pop(&amp;amp;delayBuffer);
	temp0.u8[0] = ringBuffer_Pop(&amp;amp;delayBuffer);
}
else // or else wait until we have samples available.
{
	temp0.i16 = 0;
}

if (synth.delay_time) // If the delay time is set to be non zero,
{
	// do the space delay function, irrespective of whether a note is playing or not,
	// and combine the output sample with the delayed sample.
	temp1 += temp0.i16;

	// multiply our sample by the feedback value
	MultiSU16X16toH16Round(temp0.i16, temp1, synth.delay_feedback);
}
else
	ringBuffer_Flush(&amp;amp;delayBuffer);	// otherwise flush the buffer if the delay is set to zero.

// and push it into the delay buffer if buffer space is available
if( ringBuffer_GetCount(&amp;amp;delayBuffer) &amp;lt;= buffCount )
{
	ringBuffer_Poke(&amp;amp;delayBuffer, temp0.u8[1]);
	ringBuffer_Poke(&amp;amp;delayBuffer, temp0.u8[0]);
}
// else drop the space delay sample (probably because the delay has been reduced).

////////////// Finally, set the output volume //////////////////
// multiply our wave by the volume value
MultiSU16X16toH16Round(temp2, temp1, synth.master);

// and output wave on both A &amp;amp; B channel, shifted to (+)ve values only because this is what the DAC needs.
*ch_A = *ch_B = temp2 + 0x8000;

This generates the required output waveforms that make the Goldilocks Analogue Synthesiser work.

The second order Biquad IIR filter code has been implemented in a general way, enabling multiple filters to be applied to the sample train. Set up for Low Pass, Band Pass, and for High Pass have been implemented. The coefficients and state variables for each filter are maintained in a structure.

//========================================================
// second order IIR -- &amp;quot;Direct Form I Transposed&amp;quot;
//  a(0)*y(n) = b(0)*x(n) + b(1)*x(n-1) +  b(2)*x(n-2)
//                   - a(1)*y(n-1) -  a(2)*y(n-2)
// assumes a(0) = IIRSCALEFACTOR = 32 (to increase calculation accuracy).

// http://en.wikipedia.org/wiki/Digital_biquad_filter
// https://www.hackster.io/bruceland/dsp-on-8-bit-microcontroller
// http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt

typedef struct {
	uint16_t sample_rate;	// sample rate in Hz
	uint16_t cutoff;	// normalised cutoff frequency, 0-65536. maximum is sample_rate/2
	uint16_t peak;		// normalised Q factor, 0-65536. maximum is Q_MAXIMUM
	int16_t b0,b1,b2,a1,a2;	// Coefficients in 8.8 format
	int16_t xn_1, xn_2;	//IIR state variables
	int16_t yn_1, yn_2;	//IIR state variables
} filter_t;

void setIIRFilterLPF( filter_t *filter ) // Low Pass Filter Setting
{
	if ( !(filter-&amp;gt;sample_rate) )
		filter-&amp;gt;sample_rate = SAMPLE_RATE;

	if ( !(filter-&amp;gt;cutoff) )
		filter-&amp;gt;cutoff = UINT16_MAX &amp;gt;&amp;gt; 1; // 1/4 of sample rate = filter-&amp;gt;sample_rate&amp;gt;&amp;gt;2

	if ( !(filter-&amp;gt;peak) )
		filter-&amp;gt;peak =  (uint16_t)(M_SQRT1_2 * UINT16_MAX / Q_MAXIMUM); // 1/sqrt(2) effectively

	double frequency = ((double)filter-&amp;gt;cutoff * (filter-&amp;gt;sample_rate&amp;gt;&amp;gt;)) / UINT16_MAX;
	double q = (double)filter-&amp;gt;peak * Q_MAXIMUM / UINT16_MAX;
	double w0 = (2.0 * M_PI * frequency) / filter-&amp;gt;sample_rate;
	double sinW0 = sin(w0);
	double cosW0 = cos(w0);
	double alpha = sinW0 / (q * 2.0f);
	double scale = IIRSCALEFACTOR / (1 + alpha); // a0 = 1 + alpha

	filter-&amp;gt;b0	= \
	filter-&amp;gt;b2	= float2int( ((1.0 - cosW0) / 2.0) * scale );
	filter-&amp;gt;b1	= float2int(  (1.0 - cosW0) * scale );

	filter-&amp;gt;a1	= float2int( (-2.0 * cosW0) * scale );
	filter-&amp;gt;a2	= float2int( (1.0 - alpha) * scale );
}

// interim values in 24.8 format
// returns y(n) in place of x(n)
void IIRFilter( filter_t *filter, int16_t * xn )
{
	int32_t yn;	// current output
	int32_t  accum;	// temporary accumulator

	// sum the 5 terms of the biquad IIR filter
	// and update the state variables
	// as soon as possible
	MultiS16X16to32(yn,filter-&amp;gt;xn_2,filter-&amp;gt;b2);
	filter-&amp;gt;xn_2 = filter-&amp;gt;xn_1;

	MultiS16X16to32(accum,filter-&amp;gt;xn_1,filter-&amp;gt;b1);
	yn += accum;
	filter-&amp;gt;xn_1 = *xn;

	MultiS16X16to32(accum,*xn,filter-&amp;gt;b0);
	yn += accum;

	MultiS16X16to32(accum,filter-&amp;gt;yn_2,filter-&amp;gt;a2);
	yn -= accum;
	filter-&amp;gt;yn_2 = filter-&amp;gt;yn_1;

	MultiS16X16to32(accum,filter-&amp;gt;yn_1,filter-&amp;gt;a1);
	yn -= accum;

	filter-&amp;gt;yn_1 = yn &amp;gt;&amp;gt; (IIRSCALEFACTORSHIFT + 8); // divide by a(0) = 32 &amp;amp; shift to 16.0 bit outcome from 24.8 interim steps

	*xn = filter-&amp;gt;yn_1; // being 16 bit yn, so that's what we return.
}

Hardware Implementation

I sell on Tindie

The Goldilocks Analogue Prototype 3 is working very well, and it has resolved some of the issues of the second prototype. Using the USART1 MSPIM mode to drive the MCP4822 DAC allows the GUI to use the SPI bus for the Gameduino 2 GUI without conflicts. This is the only way that the rigorous timing for audio output can be maintained, given the heavy SPI usage required to drive the GPU co-processor.

Goldilocks Analogue - Prototype 3

The Atmel AVR ATmega1284p in the Goldilocks Analogue Prototype 3 is running at 24.576MHz. This is significantly above the specification (20MHz at 5V), but remembering that the specification for AVR ATmega devices covers an extended temperature range (that would kill a human) and it is unlikely that the Goldilocks Analogue would be used in extreme temperature situations, I’ve had no problems with this processor frequency to date.

There are two reasons for over-clocking the ATmega1284p. The first is that it is simply not possible to make the required calculations within the time budget available at the maximum specification CPU frequency of 20MHz or even more extreme at the standard Arduino rate of 16MHz.

The second reason is related to the generation of exact audio sampling frequencies. With a CPU clock of 24.576MHz, the 8 bit timer with pre-scaling can generate EXACT audio sample timing at 8kHz, 12kHz, 16kHz, 32kHz, and 48kHz. Using a 16 bit timer, we can also generate very close approximations to 44.1kHz, if required.

The routine to transfer samples does not need to consume precious 16 bit timer resources, which are useful to produce PWM for motor control. Retaining the capability to manage two motors (using the two 16 bit timers) is fairly important outcome.

The interrupt for generating the wave forms does only two things; write the sample values to the DAC, and then calculate the new sample value for the next sample time. The samples are written to the DAC first to ensure that the output is not jittered by the possibility of variable processing time in the audio handler routine. This can happen if (for example) one of the VCO is turned off, removing the sample calculation code from the code execution path.

ISR(TIMER0_COMPA_vect) __attribute__ ((hot, flatten));
ISR(TIMER0_COMPA_vect)
{
	// MCP4822 data transfer routine
	// move data to the MCP4822 - done first for regularity (reduced jitter).
	// &amp;amp;'s are necessary on data_in variables
	DAC_out (ch_A_ptr, ch_B_ptr);

	// audio processing routine - do whatever processing on input is required - prepare output for next sample.
	// Fire the global audio handler, if set.
	if (audioHandler!=NULL)
		audioHandler(ch_A_ptr, ch_B_ptr);
}

XBee Walkie Talkie

I’m building an advanced Arduino clone based on the AVR ATmega1284p MCU with some special features including a 12 bit DAC MCP4822, headphone amplifier, 2x SPI Memory (SRAM, EEPROM), and a SD Card. There are many real world applications for analogue outputs, but because the Arduino platform doesn’t have integrated DAC capability there are very few published applications for analogue signals. A Walkie Talkie is one example of using digital and analogue together to make a simple but very useful project.

Two Goldilocks Analogue prototypes with XBee radios, and Microphone amplifiers.

Two Goldilocks Analogue prototypes with XBee radios, and Microphone amplifiers.

The actual Walkie Talkie functionality is really only a few lines of code, but it is built on a foundation of analogue input (sampling), analogue output on the SPI bus to the MCP4822 DAC, sample timing routines, and the XBee digital radio platform. Let’s start from the top and then dig down through the layers.

XBee Radio

I am using XBee Pro S2B radios, configured to communicate point to point. For the XBee Pro there needs to be one radio configured as the Coordinator, and the other as a Router. There are configuration guides on the Internet.

I have configured the radios to wait the maximum inter-character time before sending a packet, which implies that the packets will be set only when full (84 bytes). This maximises the radio throughput. Raw throughput is 250 kbit/s, but the actual user data rate is limited to about 32 kbit/s. This has an impact on the sampling rate and therefore quality of speech that can be transmitted.

Using 10 bit samples companded using A-Law to 8 bit code words, I have found that about 3 kHz sampling generates about as much data as can be transmitted without compression. I’m leaving G.726 compression for another project.

The XBee radios are configured in AT mode, which acts as a transparent serial pipe between the two endpoints. This is the simplest way to connect two devices via digital radio. And it allowed me to do simple testing, using wire, before worrying about whether the radio platform was working or not.

XBee Packet Reception in Purple

XBee Packet Reception in Purple

Looking at the tracing of a logic analyser, we can see the XBee data packets arriving on the (purple) Rx line of the serial port. The received packet data is stored into a ring buffer, and played out at a constant rate. I have allowed up to 255 bytes in the receive ring buffer, and this will be sufficient because the XBee packet size is 84 bytes.

The samples to be transmitted to the other device are transmitted on the (blue) Tx line, more or less in each sample period even though they are buffered before transmission. The XBee radio buffers these bytes for up to 0xFF inter-symbol periods (configuration), and only transmits a packet to the other endpoint when it has a full packet.

Sampling Rate

Looking at the bit budget for the transmission link, we need to calculate how much data can be transmitted without overloading the XBee radio platform, and causing sample loss. As we are not overtly compressing the voice samples, we have 8 bit samples times 3,000 Hz sampling or 24 kbit/s to transmit. This seems to work pretty well. I have tried 4 kHz sampling, but this is too close to the theoretical maximum, and doesn’t work too effectively.

Sample rate of 3,000 Hz seems to be the optimum.

Sample rate of 3,000 Hz seems to be the optimum.

Looking at the logic analyser, we can see the arrival of a packet of bytes commencing with 0x7E and 0x7C on the Rx line. Both the Microphone amplifier and the DAC output are biased around 0x7F(FF), so we can read that the signal levels captured and transmitted here are very low. The sample rate shown is 3,000 Hz.

Sample Processing

I have put a “ping” on one output to capture when the sampling interrupt is being processed (yellow). We can see that the amount of time spent in the interrupt processing is very small for this application, relative to the total time available. Possibly some kind of data compression could be implemented.

DAC and sample processing

DAC and sample processing

During the sampling interrupt, there are two major activities, generating an audio output, by placing a sample onto the DAC, and then reading the ADC to capture an audio sample and transmit it to the USART buffer.

This is done by the audioCodec_dsp function, which is called from the code in a timer interrupt.

void audioCodec_dsp( uint16_t * ch_A,  uint16_t * ch_B)
{
	int16_t xn;
	uint8_t cn;
	/*----- Audio Rx -----*/

	/* Get the next character from the ring buffer. */

	if( ringBuffer_IsEmpty( (ringBuffer_t*) &(xSerialPort.xRxedChars) ) )
	{
		cn = 0x80 ^ 0x55; // put A-Law nulled signal on the output.
	}
	else if (ringBuffer_GetCount( &(xSerialPort.xRxedChars) ) > (portSERIAL_BUFFER_RX>>1) ) // if the buffer is more than half full.
	{
		cn = ringBuffer_Pop( (ringBuffer_t*) &(xSerialPort.xRxedChars) ); // pop two samples to catch up, discard first one.
		cn = ringBuffer_Pop( (ringBuffer_t*) &(xSerialPort.xRxedChars) );
	}
	else
	{
		cn = ringBuffer_Pop( (ringBuffer_t*) &(xSerialPort.xRxedChars) ); // pop a sample
	}

	alaw_expand1(&cn, &xn);	// expand the A-Law compression

	*ch_A = *ch_B = (uint16_t)(xn + 0x7fff); // move the signal to positive values, put signal out on A & B channel.

	/*----- Audio Tx -----*/

	AudioCodec_ADC( &mod7_value.u16 );	// sample is 10bits left justified.

	xn = mod7_value.u16 - 0x7fe0;	// center the sample to 0 by subtracting 1/2 10bit range.

	IIRFilter( &tx_filter, &xn);	// filter transmitted sample train

	alaw_compress1(&xn, &cn);	// compress using A-Law

	xSerialPutChar( &xSerialPort, cn);	// transmit the sample
}

I am using the AVR 8 bit Timer0 to generate the regular sample intervals, by triggering an interrupt. By using a MCU FCPU frequency which is a binary multiple of the standard audio frequencies, we can generate accurate reproduction sampling rates by using only the 8 bit timer with a clock prescaler of 64. To generate odd audio frequencies, like 44,100 Hz, the 16 bit Timer1 can be used to get sufficient accuracy without requiring a clock prescaler.

The ATmega1284p ADC is set to free-run mode, and is scaled down to 192 kHz. While this is close to the maximum acquisition speed documented for the ATmega ADC, it is still within the specification for 8 bit samples.

ISR(TIMER0_COMPA_vect) __attribute__ ((hot, flatten));
ISR(TIMER0_COMPA_vect)
{
#if defined(DEBUG_PING)
  // start mark - check for start of interrupt - for debugging only (yellow trace)
  PORTD |= _BV(PORTD7); // Ping IO line.
#endif

  // MCP4822 data transfer routine
  // move data to the MCP4822 - done first for regularity (reduced jitter).
  DAC_out (ch_A_ptr, ch_B_ptr);

  // audio processing routine - do whatever processing on input is required - prepare output for next sample.
  // Fire the global audio handler which is a call-back function, if set.
  if (audioHandler!=NULL)
    audioHandler(ch_A_ptr, ch_B_ptr);

#if defined(DEBUG_PING)
  // end mark - check for end of interrupt - for debugging only (yellow trace)
  PORTD &= ~_BV(PORTD7);
#endif
}

This interrupt takes 14 us to complete, and is very short relative to the 333 us we have for each sample period. This gives us plenty of time to do other processing, such as running a user interface or further audio processing.

SPI Transaction

At the final level of detail, we can see the actual SPI transaction to output the incoming sample to the MCP4822 DAC.

SPI DAC Transaction

SPI MCP4822 DAC Transaction

As I have built this application on the Goldilocks Analogue Prototype 2 which uses the standard SPI bus, the transaction is normal. My later prototypes are using the Master SPI Mode on USART 1 of the ATmega1284p, which slightly accelerates the SPI transaction through double buffering, and frees the normal SPI bus for simultaneous reading or writing to the SD Card or SPI Memory, for audio streaming. In the Walkie Talkie application there is no need to capture the audio, so there’s no down side to using the older prototypes and the normal SPI bus.


void DAC_out(const uint16_t * ch_A, const uint16_t * ch_B)
{
  DAC_command_t write;

  if (ch_A != NULL)
  {
    write.value.u16 = (*ch_A) >> 4;
    write.value.u8[1] |= CH_A_OUT;
  }
  else // ch_A is NULL so we turn off the DAC
  {
    write.value.u8[1] = CH_A_OFF;
  }

  SPI_PORT_SS_DAC &= ~SPI_BIT_SS_DAC; // Pull SS low to select the Goldilocks Analogue DAC.
  SPDR = write.value.u8[1]; // Begin transmission ch_A.
  while ( !(SPSR & _BV(SPIF)) );
  SPDR = write.value.u8[0]; // Continue transmission ch_A.

  if (ch_B != NULL) // start processing ch_B while we're doing the ch_A transmission
  {
    write.value.u16 = (*ch_B) >> 4;
    write.value.u8[1] |= CH_B_OUT;
  }
  else // ch_B is NULL so we turn off the DAC
  {
    write.value.u8[1] = CH_B_OFF;
  }

  while ( !(SPSR & _BV(SPIF)) ); // check we've finished ch_A.
  SPI_PORT_SS_DAC |= SPI_BIT_SS_DAC; // Pull SS high to deselect the Goldilocks Analogue DAC, and latch value into DAC.

  SPI_PORT_SS_DAC &= ~SPI_BIT_SS_DAC; // Pull SS low to select the Goldilocks Analogue DAC.
  SPDR = write.value.u8[1]; // Begin transmission ch_B.
  while ( !(SPSR & _BV(SPIF)) );
  SPDR = write.value.u8[0]; // Continue transmission ch_B.
  while ( !(SPSR & _BV(SPIF)) ); // check we've finished ch_B.
  SPI_PORT_SS_DAC |= SPI_BIT_SS_DAC; // Pull SS high to deselect the Goldilocks Analogue DAC, and latch value into DAC.
}

Wrap Up

Using a few pre-existing tools and a few lines of code, it is possible to quickly build a digitally encrypted walkie talkie, capable of communicating (understandable, but certainly not high quality) voice. And, there ain’t no CB truckers going to be listening in on the family conversations going forward.

This was a test of adding microphone input based on the MAX9814 to the Goldilocks Analogue. I will be revising the Prototype 3 and will add in a microphone amplification circuit to support applications needing audio input, like this walkie talkie example, or voice changers, or vocal control music synthesizers.

I’m also running the ATmega1284p devices at the increased frequency of 24.576 MHz, over the standard rate of 20 MHz. This specific frequency allows very precise reproduction of audio samples from 48 kHz down right down to 4 kHz (or even down to 1,500 Hz). The extra MCU clock cycles per sample period are very welcome when it comes to generating synthesised music, or encoding audio samples.

Code as usual on Github AVRfreeRTOS repository. Also, a call out to Shuyang at SeeedStudio who’s OPL is awesome, and is the source of many components and PCBs.

Implementing NASA EEFS on AVR ATmega

I am building a variant of the Arduino platform which will have an analogue output capability in the form of a dual channel DAC, called Goldilocks Analogue. The DAC can be used to generate variable DC voltage levels that might be used as part of a PID control system, and it can also generate AC voltages up to about 50kHz if it can be fed with sufficient samples to produce the required signal. To generate a 44.1kHz audio signal the DAC has to receive a stream of data, with a new sample every 22us without fail.

44.1kHz samples using USART MSPI output.

44.1kHz samples using USART MSPI output.

Finding an answer to the question of how to reliably stream data to the DAC is the background to this post.

Looking for a way to structure and assemble a combination of many WAV files on a host PC for storage onto to the AVR ATmega MCU, I needed a system that would support:

  • Editing and assembly of files on a host PC (Linux, Windows, Mac), in to a package.
  • Transferring a package of files to the AVR ATmega (Arduino) device very simply.
  • Can read and write files to the storage medium very quickly, and without jitter.
  • Simple implementation in the avr-libc environment.

Initially I was looking at using the FAT File System on a SD Card to provide the required capability, but I found that SD Cards are quite slow when writing data to their FLASH medium. Often taking 100ms or more to complete a write cycle. A SD Card read cycle also takes quite a long time, when the FAT file system must be inspected prior to reading or writing a specific block of information. The SD Card is great for storing Mega Bytes of information, but is not optimal for jitter free read and write applications.

So I started looking at chip storage based on the SPI bus as a mechanism to store large numbers of samples for playback, or to store large amounts of acquired data samples. There are many alternatives using different technologies for SPI storage devices. These range from EEPROM storage, through to SRAM and also newer FRAM technologies. Storage capabilities with up to 1Mbit seem to be quite good value. For my application 1Mbit of storage would allow about 16 seconds of reasonable quality audio to be retrieved with minimal issues for complexity, jitter, and delay.

So I redesigned the Goldilocks Analogue to incorporate space to have two SPI memory (EEPROM, FRAM, SRAM) devices on the board.

Goldilocks Analogue - 2x SPI Memory Devices

Goldilocks Analogue – 2x SPI Memory Devices

Goldilocks Analogue

Goldilocks Analogue

Implementing a method to read and write bytes to these storage devices is very straightforward. There are many libraries available supporting the SPI storage devices of various types. But none of them supported assembling a package of files on a host PC, and then transferring this to the AVR device in a simple manner. So the hunt for a solution to this issue brought me to the NASA EEFS solution.

NASA EEFS

NASA has been releasing their Core Flight System with Open Source licencing over the past few years. The Core Flight System (CFS) is a recognition that many satellite and deep space missions have very common core requirements and that successive missions were simply cloning previous mission software and then owning changes going forward, with learning being improved in a serial manner. The CFS enabled missions that were developing in parallel to push improvements in the platform CFS code back into the general solution for peer and successive missions to benefit from.

The CFS is layered and each layer hides its implementation, enabling the internals of the layer to be changed without affecting other layers’ implementation. Within the CFS Platform Abstraction Layer there is a module designed to support the management of flight software packages on non-volatile storage, called the EEPROM File System (EEFS).

The EEFS is a very small (approximately 2% of the flight software) piece of code that implements the storage and retrieval of all flight system software from flash storage devices. It was designed by NASA GSFC to support similar outcomes as what I needed for my application:

  • Generate a flight software (or general embedded system) executable image on the development workstation. This feature allows the embedded file system to be generated with a known CRC and loaded on to the target processor as a single image. This is a big advantage over formatting a file system on the image, then transferring each file to the file system on the target.
  • Prove that the file system is correct and reliable. Because the EEPROM file system is simple, the code size is small, making it easy to review and find errors.
  • Patch the files in the file system. Due to the simple layout of the EEPROM file system, it is very easy to patch the files in the file system, if the need arises. This can be helpful in deeply embedded systems such as satellite data systems.
  • Dump and understand the file system format. Because the EEPROM file system is simple, it is easy to dump the contents of the EEPROM or PROM memory and determine the contents of each file.

The EEFS is basically a configurable slot-based file system. The file system can be pre-configured with a certain number empty files of known sizes, or known files with specific “spare bytes”, and written with a CRC into an image. The File Allocation Table is a fixed size and contains a fixed number of file slots, together with the location and maximum size for each slot. The File Headers for each slot contain all the information about each File. Changing a file does not impact the FAT, and therefore does not affect other files in the File System.

An EEFS image is created with a tool called geneepromfs, which is a command line tool compiled for the respective host upon which it is used. It reads an input file specifying the files that are to be assembled into the EEFS image, together with the number of empty file slots and their size, and it outputs a complete EEFS image ready to be burnt on the EEPROM, FRAM, or SRAM storage device.

So the EEFS looks like a perfect solution to my requirements. Let’s go to Github and clone the EEFS repository, and get started.

AVR Implementation of EEFS

The EEFS code is supplied for VxWorks or RTEMS platforms, along with a standalone implementation design for bare metal designs. To get the standalone design to work with the AVR ATmega, and my freeRTOS platform of choice, there were two major pieces of work.

Firstly, to develop a generalised SPI interface layer that would allow me to select the actual SPI device installed on the Goldilocks Analogue at compile time. This was necessary because each individual SPI storage device has slightly different command requirements (EEPROM ready check, different address byte numbers), and it made good sense to unify the interface into a single function with compile time options.

Secondly, I needed to revise the pointer calculations inherent in the EEFS code. The NASA GSFC code is based on the availability of 32 bit pointers, and does 32 bit calculations to locate information within the file system. But, on the AVR ATmega platform the inherent pointer size is 16 bits, and many of the advanced pointer arithmetic calculations used in the code would fail.

When I finished the major work, I reduced the return values of most functions to 1 byte error codes, which shaved almost 2,000 bytes of program code off the end result. On the AVR ATmega platform, it is well worth saving 2,000 bytes.

I have built a simple FRAM test program that can write files from a SD Card to the EEFS SPI device, and then edit (read, modify, write) files on the EEFS SPI device for test purposes. This shows how the resulting EEFS library can be best used.

As usual code on Sourceforge AVRfreeRTOS, and also forked on AVR EEFS Github.

ATmega Arduino USART in SPI Master Mode MSPIM

The AVR ATmega MCU used by the Arduino Uno and its clones and peers (Leonardo, Pro, Fio, LilyPad, etc) and the Arduino Mega have the capability to use their USART (Universal Serial Asynchronous Receiver Transmitter), also known as the Serial Port, as an additional SPI bus interface in SPI Master mode. This fact is noted in the datasheets of the ATmega328p, ATmega32u4, and the ATmega2560 devices at the core of the Arduino platforms, but until recently it hasn’t meant much to me.

Over the past 18 months I’ve been working on an advanced derivative of the Arduino platform, using an ATmega1284p MCU at its core. I consider the ATmega1284p device the “Goldilocks” of the ATmega family, and as such the devices I’ve built have carried that name. Recently I have been working on a platform which has some advanced analogue output capabilities incorporating the MCP4822 dual channel DAC, together with a quality headphone amplifier, and linear OpAmp for producing buffered AC and DC analogue signals. This is all great, but when it comes down to outputting continuous analogue samples to produce audio it is imperative that the sample train is not interrupted or the music simply stops!

The issue is that the standard configuration of the Arduino platform (over)loads the SPI interface with all of the SPI duties. In the case of the Goldilocks and other Arduino style devices I have ended up having the MicroSD card, some SPI EEPROM and SRAM, and the MCP4822 DAC all sharing same SPI bus. This means that the input stream of samples from the MicroSD card are interfering and time-sharing with the output sample stream to the DAC. The MicroSD card has a lot of latency, often taking hundreds of milliseconds to respond to a command, whereas the DAC needs a constant stream of samples with no jitter and no more than 22us between each sample. That is a conflict that is difficult to resolve. Even using large buffers is not a solution, as when streaming audio it is easy to consume MBytes of information; which is orders of magnitude more than can be buffered anywhere on the ATmega platform.

Other solutions using a DAC to generate music have used a “soft SPI” and bit-banging techniques to work around the issue. But this creates a performance limitation as the maximum sample output rate is strongly limited by the rate at which the soft SPI port can be bit-banged. There has to be a better way.

USART in SPI mode

The better way to attach SPI Slave devices to the ATmega platform is referenced in this overlooked datasheet heading: “USART in SPI mode”. Using the USART in “Master SPI Mode” (MSPIM) is may be limiting if you need to use the sole serial port to interact with the Arduino (ATmega328p), but once the program is loaded (in the case of using a bootloader) there is often no further need to use the serial port. But for debugging if there is only one USART then obviously it becomes uncomfortable to build a system based on the sole USART in SPI mode.

However in the case of the Goldilocks ATmega1284p MCU with two USARTs, the Arduino Leonardo with both USB serial and USART, and the Arduino Mega ATmega2560 MCU with four USARTs, there should be nothing to stop us converting their use to MSPIM buses according to need.

Excuse me for being effusive about this MSPIM capability in the AVR ATmega. It is not exactly a secret as it is well documented and ages old, but it is a great feature that I’ve simply not previously explored. But now I have explored it, I think it is worthwhile to write about my experience. Also, I think that many others have also overlooked this USART MSPIM capability, because of the dearth of objective review to be found on the ‘net.

Any ATmega datasheet goes into the detailed features and operation of the USART in SPI mode. I’ll go into some of the features in detail and what it means for use in real life.

  • Full Duplex, Three-wire Synchronous Data Transfer – The MSPIM does not rely on having a Slave Select line on a particular pin, and further it doesn’t rely on having both MOSI and MISO lines active at the same time. This means that it is possible to attach a SPI Slave device that doesn’t use the _SS to begin or end transactions with just two pins, being the XCK pin and the Tx pin. If a _SS is required (as in the MCP4822) then only three wires are required. The fact that the MISO (Rx) pin is optional saves precious pins too.
  • Master Operation – The MSPIM only works in SPI Master mode, which means that it is only really useful for connecting accessories. But in the Arduino world, that is what we are doing 99% of the time.
  • Supports all four SPI Modes of Operation (Mode 0, 1, 2, and 3) – yes, it does.
  • LSB First or MSB First Data Transfer (Configurable Data Order) – yes, it does.
  • Queued Operation (Double Buffered) – The MSPIM inherits the USART Tx double buffering capability. This is a function not available on the standard SPI interface and is a great thing. For example, to output a 16bit command two writes to the I/O register can follow each other immediately, and the resulting XCK has no delay between each Byte output. To output a stream of bytes the buffer empty flag can be used as a signal to load the next available byte, ensuring that if the next byte can be loaded with 16 instructions then we can generate a constant stream of bytes. In contrast with the standard SPI interface transmission is not buffered and therefore in Master Mode we’re invariably wait-looping before sending the next byte. This wastes cycles in between each byte in recognising completion, and then loading the next byte for transmission.
  • High Resolution Baud Rate Generator – yes, it is. The MSPIM baud rate can be set to any rate up to half the FCPU clock rate. Whilst there may be little need to run the MSPIM interface at less than the maximum for pure SPI transactions, it is possible to to use this feature, together with double buffered transmission, to generate continuous arbitrary binary bit-streams at almost any rate.
  • High Speed Operation (fXCKmax = FCPU/2) – The MSPIM runs at exactly the same maximum clock speed as the standard SPI interface, but through the double buffering capability mentioned above the actual byte transmission rate can be significantly greater.
  • Flexible Interrupt Generation – The MSPIM has all the same interrupts as the USART from which it inherits its capabilities. In particular the differentiation between buffer space available flag / interrupt and transmission complete flag / interrupt capabilities make it possible to develop useful arbitrary byte streaming solutions.

Implementation Notes

As the USART in normal mode and the USART in MSPIM mode are quite similar in operation there is little that needs to be written. The data sheet has a very simple initialisation code example, which in practice is sufficient for getting communications going. I would note that as there is no automatic Slave Select management, the _SS line needs to be manually configured as an output, and then set (high) appropriately until such time as the attached SPI device is to be addressed. Also note that the XCKn (USART synchronous clock output) needs to be set as an output before configuring the USART for MSPIM. And also to note that the transmission complete flag (TXCn) is not automatically cleared by reading (it is only automatically cleared if an Interrupt is processed), and needs to be manually cleared before commencing a transmission (by writing a 1 to the TXCn bit) it you are planning to use it to signal transaction completion in your code. The Transmit and Receive Data Register (UDRn) is also not automatically cleared, and needs to be flushed before use if a receive transaction is to synchronised to the transmitted bytes.

So the implementation of a simple initialisation code fragment looks like this:

SPI_PORT_DIR_SS |= SPI_BIT_SS;   // Set SS as output pin.
SPI_PORT_SS |= SPI_BIT_SS;       // Pull SS high to deselect the SPI device.
UBRR1 = 0x0000;
DDRD |= _BV(PD4);                // Setting the XCK1 port pin as output, enables USART SPI master mode (this pin for ATmega1284p)
UCSR1C = _BV(UMSEL11) | _BV(UMSEL10) | _BV(UCSZ10) | _BV(UCPOL1);
                                 // Set USART SPI mode of operation and SPI data mode 1,1. UCPHA1 = UCSZ10
UCSR1B = _BV(TXEN1);             // Enable transmitter. Enable the Tx (also disable the Rx, and the rest of the interrupt enable bits set to 0 too).
                                 // Set baud rate. IMPORTANT: The Baud Rate must be set after the Transmitter is enabled.
UBRR1 = 0x0000;                  // Where maximum speed of FCPU/2 = 0x0000

And a fragment of the code to transmit a 16 bit value looks like this. Note with this example there is no need to wait for the UDREn flag to be set between bytes, as we are only writing two bytes into the double transmit buffer. This means that the 16 clocks are generated on XCKn with no gap in delivery.

UCSR1A = _BV(TXC1);              // Clear the Transmit complete flag, all other bits should be written 0.
SPI_PORT_SS &= ~_BV(SPI_BIT_SS); // Pull SS low to select the SPI device.
UDR1 = write.value.u8[1];        // Begin transmission of first byte.
UDR1 = write.value.u8[0];        // Continue transmission with second byte.
while ( !(UCSR1A & _BV(TXC1)) ); // Check we've finished, by waiting for Transmit complete flag.
SPI_PORT_SS |= _BV(SPI_BIT_SS);  // Pull SS high to deselect the SPI device.

Results

Looking at the output generated by the two different SPI interfaces on the AVR ATmega, it is easy to see the features in action. In the first image we can see that the two bytes of the 16 bit information for the DAC are separated, as loading the next byte to be transmitted requires clock cycles AFTER the transmission completed SPIF flag has been raised.

DAC control using SPI bus.

DAC control using SPI bus.

In the case of the MSPIM output, we can’t recognise where the two bytes are separated, and the end of the transaction is triggered by the Transaction complete flag. This example shows that the MSPIM can be actually faster than the standard SPI interface, even though the maximum clock speed in both cases is FCPU/2.

DAC control using USART MSPIM bus.

DAC control using USART MSPIM bus.

The final image shows the Goldilocks DAC generating a 44.1kHz output signal, with dual 12 bit outputs. Whilst this is not fully CD quality, comparisons with other DAC solutions available on the Arduino platform have been favourable.

44.1kHz samples using USART MSPI output.

44.1kHz samples using USART MSPI output.

Conclusion

I am now convinced to use the USART MSPIM capability for the Goldilocks Analogue, and I think that it is time to write some generalised MSPIM interface routines to go into my AVRfreeRTOS Sourceforge repository to make it easy to use this extremely powerful capability.