Goldilocks Analogue Synthesizer

For the past year, I’ve been prototyping an Arduino clone, the Goldilocks Analogue, which incorporates advanced analogue output capabilities into the design of the original Goldilocks with ATmega1284p AVR MCU and uSD card cage. Recently the design scope crept up to include two SPI memory devices (EEPROM, SRAM, FRAM), and microphone audio input. But, before I go through another prototype cycle, I thought it would be a good idea to build some demonstration applications, showcasing the capabilities of an arduino R3 compatible platform with integrated analogue output and have some fun with audio.

Goldilocks Analogue Prototype 3

Some of the initial tests I’ve built include some 8 bit algorithmic music and, using two Goldilocks Analogue prototype devices, a digital walkie talkie using Xbee radios. They were fun, but don’t really demonstrate the full range of the audio capabilities of the platform.

It seemed appropriate to build a synthesizer using the Goldilocks Analogue as the platform, and a Gameduino 2 shield incorporating a FDTI FT800 EVE GPU, and see how close I could get to a musical outcome.


Before randomly building something that made a bunch of squeaky sounds, I thought the best thing to do is to learn something about the field of analogue synthesizers and synthesizing audio.

I also obtained some simple analogue synthesizers from Korg to see exactly what they produce, so I could copy them. Some people write that this monotron analogue synthesizer family are good examples of a low cost musical instrument. I found it very interesting to examine the wave forms produced by the various settings.

Using the features of the two Korg devices, I was able to define the goal for the synthesizer that I wanted to build using the Goldilocks Analogue.

The Korg monotron DUO has two voltage controlled oscillators (VCO1 and VCO2), which produce square waves. The VCO1 has a pitch setting, which defines the basic frequency at which the ribbon keyboard operates. The ribbon keyboard can be set to have a major scale, a minor scale, a full chromatic scale, or be a ribbon with no set notes. For clarity, the pitch on the DUO is analogue, so there is no guarantee that the notes generated by the ribbon keyboard will be in tune.

The VCO2 pitch can be modified either below or above the pitch of the VCO1. In its middle section, with some care, it can be matched exactly to the VCO1 setting. The switch allows either just the VCO1 or both VCO1 and VCO2 to produce sound. A separate XMOD intensity knob allows the VCO2 to modulate the frequency of the VCO1 oscillator, producing cross-modulation.

The monotron DUO contains the famous Korg MS-20 resonant low pass filter, which can be adjusted for both cut-off frequency and intensity of the resonant frequency. Setting the filter values allows the square wave noise generated by the two oscillators to be shaped into very interesting tones.

The Korg monotron DELAY is a very different device from the DUO. It has two oscillators, but only one at audio frequencies. The audio oscillator produces a saw-tooth wave at a frequency controlled by the ribbon keyboard. On the monotron DELAY there is no capability for playing specific notes as the keyboard is only available in ribbon mode. The second oscillator of the monotron DELAY is a low frequency oscillator (LFO), which can be adjusted from 1Hz up to about 30Hz. This LFO can produce either a triangle wave or a square wave to modulate the main audio oscillator. This is used mainly to apply vibrato to musical tones, or to produce very unusual tone ramps. The intensity and pitch of the LFO are controlled by knobs.

The Korg low pass filter present in the monotron DELAY is only adjustable for its cutoff frequency, so it is less flexible and interesting than the monotron DUO implementation.

The monotron DELAY is really built to showcase the analogue space delay functionality, which can be adjusted in both length of delay, and in intensity of feedback. With about 1 second of delay and 100% or more feedback possible, very short sequences of notes can be played and then built upon.

I’m not particularly musical, but I spent some very pleasant hours playing with the two Korg synthesizers experimenting with the sounds available from their very simple platforms, and used their capabilities to guide me in what to build into my Goldilocks Analogue synthesizer.

The next piece of research was to understand how to generate analogue wave forms using direct digital synthesis, and then how to modify sound of the wave forms using convolution or modulation in the time domain.

Design Specification

Having the two Korg devices as an inspiration, and reading about the original Moog synthesizer capabilities from the 1970’s, made the specification pretty straight forward.

Goldilocks Analogue GUI

The Goldilocks Analogue synthesizer has three oscillators, two of which operate at audio frequencies, being VCO1 and VCO2, and one low frequency oscillator, being LFO. The VCO1 is tuned in octaves at correct concert pitch, so that notes played would be at the right frequency. The VCO2 is pitched relative to the VCO1 pitch, and would range minus one octave to plus one octave (or half the VCO1 frequency to double the VCO1 frequency). The LFO is adjustable over the range from 1 Hz to 40 Hz.

I had decided to let each oscillator take one of two wave forms. For VCO1 I initially chose square wave, and saw tooth wave, to be able to replicate the exact sound of the Korg devices. I’ve since decided to move the saw tooth wave to the VCO2, and replaced it with a sine wave on VCO1. It is good to have the pure tone at the correct frequency for tuning instruments. An A4 from the Goldilocks Analogue Synthesizer will, for example, always be 440Hz.

For VCO2 I selected a triangle wave and a saw tooth wave. And, for the LFO there is a sine wave and a triangle wave available. I should point out that changing the wave form available to each oscillator is no more complicated that replacing the look-up table associated with the setting, and there is space available in the ATmega1284p to store at least another 4 separate wave form tables in flash memory, even without extending to on-board SPI EEPROM, or uSD storage.

In the mixing section the intensity or volume of each of VCO1 and VCO2 can be set. It is possible to turn off either oscillator. The intensity of the LFO effect is controlled too. The LFO modulates both the VCO1 and the VCO2. The final input is the cross modulation of VCO1 by the VCO2. Very interesting tonality is created by modulating VCO1 by pitches very close to its own frequency.

Each note is put through an exponential Attack and Release envelope, to give the note some shape. The mixed signal is then be sent to the voltage controlled filter. Using the current set up, the sample rate is 16,000 samples/second, which is enough to produce 6 octaves. The upper two octaves remain implemented, but are not reconstructed accurately. I have implemented a Biquad IIR filter to enable the output to be high, low, or band pass filtered. The default set up is for low pass filtering. The filter -3dB frequency, and the ringing levels can be adjusted for different musical effect.

Following the filter stage, the signal enters the space delay stage. The space delay stage can have only about half a second of delay, because of the RAM limitations (16kByte) of the ATmega1284p. So up to 6700 16 bit samples are supported by the space delay function. Samples are recovered from the delay buffer, and mixed with the new signals, then injected back into the delay loop. This creates an infinite loop of samples, depending on the amount of feedback set by the FEEDBACK control.

The final signal output level is controlled by a MASTER volume control. Additionally, an EEPROM STO and RCL capability for the settings has been implemented. Only the most recent settings are stored, which can be recalled when power is restored.

As the keyboard notes are generated using a look up table, multiple keyboard tuning options are possible. I have implemented Concert Tuning (A4 = 440Hz) and Equal Temperament (commonly used for pianos), and Verdi or Stradivari tuning (C4 = 256Hz) with Just Intonation Equal Fifths as an alternative. There is a toggle to chose between either these two options. Any tuning can be generated, and then loaded as the note table.

GUI Implementation

The GUI of the solution depends on a Gameduino 2 screen, which is based on the FTDI Chip FT800 EVE GPU device. The FT800 was the first EVE GPU available from FTDI and it can only support single touch. This limitation makes it only partially useful as a product to support this application. The most interesting sounds are generated by bending the controls whilst playing the notes. Fortunately there are newer EVE GPU devices that support multi-touch and they would make a better platform if this synthesizer were to become more than just a demonstration.

The GUI makes extensive use of FT800 co-processor widget capabilities being dials, toggles, keys, and text. Some examples below.

// text
FT_GPU_CoCmd_Text_P(phost, 300,  8, 27, OPT_CENTER, PSTR("VCF"));
FT_GPU_CoCmd_Text_P(phost, 300, 25, 26, OPT_CENTER, PSTR("CUTOFF"));
FT_GPU_CoCmd_Text_P(phost, 300, 95, 26, OPT_CENTER, PSTR("PEAK"));

// toggles
FT_GPU_CoCmd_Toggle_P(phost, 13,242,46,18, OPT_3D, synth.lfo.wave, PSTR("SIN" "\xFF" "TRI"));

FT_GPU_CoCmd_Toggle_P(phost, 405,130,60,26, OPT_3D, synth.kbd_toggle, PSTR("CONCRT" "\xFF" "VERDI"));

// dials
FT_GPU_CoCmd_Dial(phost, 365,125,20, OPT_3D, synth.delay_feedback); // DELAY FEEDBACK

FT_GPU_CoCmd_Dial(phost, 440,55,26, OPT_3D, synth.master); // MASTER

The integrated touch tracking capability makes it very easy to parse touch into specific commands.

readTag = FT_GPU_HAL_Rd8(phost, REG_TOUCH_TAG);

if (readTag > 0x80)// tag is greater than 0x80 and therefore is a dial.
	TrackRegisterVal.u32 = FT_GPU_HAL_Rd32(phost, REG_TRACKER);

	switch (TrackRegisterVal.touch.tag)
	case (VCO1_PITCH):
		synth.vco1.pitch = TrackRegisterVal.touch.value & 0xe000;
	// continues...

This integrated touch tracking capability can return which dial (slider / scroll bar) has been touched, and the relative position of the touch. This same position value can then be used in the display command to set the position of the dial (slider / scroll bar), providing direct feedback on the GUI.

The main GUI task simply calls the touch function, and if there is a touch recorded the GUI is updated, and the revised settings entered into the analogue audio control structure. Otherwise if there are no touches recorded there are no processor cycles wasted updating the display. The FT800 EVE GPU continues to display the same content until a new display list is loaded into the GPU memory.

When a keyboard touch is recorded, the tone generation information is updated, and this then directly impacts the output tone generated by the audio section.

//  setting the phase increment for VCO1 is frequency * LUT size / sample rate.
//  << 1 in SAMPLE_RATE is residual scale to create 24.8 fixed point number.
// The LUT is already pre-scaled << 7 in the calculation.
// The LUT can't be pre-scaled to << 8 because this creates numbers too large for uint32_t to hold,
// and we want to allow the option to vary the SAMPLE_RATE at compilation time, so it has to stay in the calculation.
synth.vco1.phase_increment = (uint32_t)pgm_read_dword(synth.note_table_ptr + stop * NOTES + note) / (SAMPLE_RATE >> 1);

// set the VCO2 phase increment to be -1 octave to +1 octave from VCO1, with centre dial frequency identical.
if (synth.vco2.pitch & 0x8000) // upper half dial
	synth.vco2.phase_increment = ((synth.vco1.phase_increment >> 4) * synth.vco2.pitch ) >> 11;
else // lower half dial
	synth.vco2.phase_increment = (synth.vco1.phase_increment >> 1) + (((synth.vco1.phase_increment >> 4) * synth.vco2.pitch) >> 12);

// set the LFO phase increment to be from 0 Hz to 32 Hz.
synth.lfo.phase_increment = ((uint32_t)synth.lfo.pitch * LUT_SIZE / ((uint32_t)SAMPLE_RATE << 4) );

The phase increment desired, respective to the relevant tone desired, is read from a look up table containing 8 octaves each of 12 notes for VCO1. VCO2 phase increment is then set as a proportion of VCO1. And LFO phase increment is set to range from 0 to around 30 Hz. With this information, and the selected wave form look up table, the audio implementation can do its thing.

Audio Implementation

The synthesizer audio section is implemented in one function, that is executed each time a new sample is generated. This means at 12,000 samples/ second sample generation frequency, we have 83 micro seconds to generate the final sample to be pushed to the Goldilocks Analogue MCP4822 12 bit dual channel DAC.

The current sample generation routine takes under 45 micro seconds to complete with 3 Oscillators running, so there is a little head room still available. With some further coding improvements it was possible to raise the sample frequency to 16,000 samples/sec as the sample generation frequency. The below logic trace shows the main SPI interface (SCK, MISO, MOSI, _SS) delivering commands to the EVE GPU, and the lower MSPI interface (MSPI SCK, MSPI MOSI, MSPI PING) providing the calculated samples, every 83 micro seconds, to the DAC.

Goldilocks Analogue Synthesizer, with 3 Oscillators operating.

Goldilocks Analogue Synthesizer, with 3 Oscillators operating.

It is clear to see that two EVE GPU transactions are being interrupted by the DAC output, but because the main SPI interface is not changing state the transaction is faultlessly resumed once the DAC interrupt is completed.

In contrast, when there are no oscillators running because no key is pressed, the sample generation routine takes just 28 micro seconds to complete. The logic trace below shows the change of state from 0 to 3 oscillators.

Goldilocks Analogue, with no Oscillators operating.

Goldilocks Analogue, with no Oscillators operating.

There is little time available to calculate sample values in real time, so all of the samples are pre-calculated and are stored in look-up tables (LUT). Each LUT contains 4096 16 bit samples, which gives 12 significant bits of accuracy for the values. I chose 4096 samples because the ATmega1284p has sufficient storage to support multiple tables of this size in its flash memory. Smaller LUTs would sacrifice accuracy, and larger LUTs would compromise on the number of available wave forms.

I have prepared LUTs for sine wave, square wave, triangle wave, and saw tooth wave options. Another advantage of the LUT approach is that better bandwidth optimised LUT values can be substituted without changing the code. Also, LUTs allow completely arbitrary waveforms could be used if desired to obtain specific timbre or nuances of sound.

The sample generation code starts with the LFO oscillator using a direct digital synthesis model. Each oscillator sample is calculated identically by stepping through the LUT with a phase increment based on the frequency of the note required, but VCO2 phase increment is modified by the LFO output and the VCO1 phase increment is modified by both VCO2 and LFO outputs.

Code shown here assumes that both LFO and VCO2 output wave forms have already been calculated.

///////////// Now do the VCO1 ////////////////////

// This will be modulated by the VCO2 value (depending on the XMOD intensity),
// and the LFO intensity.
if( synth.vco1.toggle )
	// Increment the phase (index into waveform LUT) by the calculated phase increment.
	// Both the phase and phase_increment are stored as 24.8 in uint32_t.
	// The fractional component of the phase and phase_increment is needed to ensure the wave
	// is tracked accurately.
	synth.vco1.phase += synth.vco1.phase_increment;

	// calculate how much the LFO affects the VCO1 phase increment
	if (synth.lfo.toggle)
		// increment the phase (index into LUT) by the calculated phase increment including the LFO output.
		synth.vco1.phase += (uint32_t)outLFO; // increment on the fractional component 8.8, limiting the effect.

	// calculate how much the VCO2 XMOD affects the VCO1 phase increment
	if (synth.vco2.toggle)
		// increment the phase (index into LUT) by the calculated phase increment including the LFO output.
		synth.vco1.phase += (uint32_t)outXMOD; // increment on the fractional component 8.8, limiting the effect.

	// if we've gone over the waveform LUT boundary -> loop back
	synth.vco1.phase &= 0x000fffff; // this is a faster way doing the table
						// wrap around, which is possible
						// because our table is a multiple of 2^n.
						// Remember the lowest byte (0xff) is fractions of LUT steps.
						// The table is 0xfff.ff bytes long.

	currentPhase = (uint16_t)(synth.vco1.phase >> 8); // remove the fractional phase component.

	// get first sample from the defined LUT for VCO1 and store it in temp1
	temp1 = pgm_read_word(synth.vco1.wave_table_ptr + currentPhase);
	++currentPhase; // go to next sample

	currentPhase &= 0x0fff;	// check if we've gone over the boundary.
				// we can do this because it is a multiple of 2^n.

	// get second sample from the LUT for VCO1 and put it in temp2
	temp2 = pgm_read_word(synth.vco1.wave_table_ptr + currentPhase);

	// interpolate between samples
	// multiply each sample by the fractional distance
	// to the actual location value
	frac = (uint8_t)(synth.vco1.phase & 0x000000ff); // fetch the lower 8bits

	// the optimised assembly code Multiply routines come from Open Music Labs.
	MultiSU16X8toH16Round(temp3, temp2, frac);

	// scaled sample 2 is now in temp3, and since we are done with
	// temp2, we can reuse it for the next result
	MultiSU16X8toH16Round(temp2, temp1, 0xff - frac);
	// temp2 now has the scaled sample 1
	temp2 += temp3; // add samples together to get an average
	// our resultant wave is now in temp2

	// set amplitude with volume
	// multiply our wave by the volume value
	MultiSU16X16toH16Round(outVCO1, temp2, synth.vco1.volume);
	// our VCO1 wave is now in outVCO1

The next piece of the audio process is to mix the two oscillators VCO1 and VCO2, and then calculate the space delay required. This is where the resonant low pass filter is implemented.

////////////// mix the two oscillators //////////////////
// irrespective of whether a note is playing or not.
// combine the outputs
temp1 = (outVCO1 >> 1) + (outVCO2 >> 1);

///////// Resonant Low Pass Filter here  ///////////////
IIRFilter( &filter, &temp1);

///////// Do the space delay function ///////////////////

// Get the number of buffer items we have, which is the delay.
MultiU16X16toH16Round( buffCount, (uint16_t)(sizeof(int16_t) * DELAY_BUFFER), synth.delay_time);

// Get a sample back from the delay buffer, some time later,
if( ringBuffer_GetCount(&delayBuffer) >= buffCount )
	temp0.u8[1] = ringBuffer_Pop(&delayBuffer);
	temp0.u8[0] = ringBuffer_Pop(&delayBuffer);
else // or else wait until we have samples available.
	temp0.i16 = 0;

if (synth.delay_time) // If the delay time is set to be non zero,
	// do the space delay function, irrespective of whether a note is playing or not,
	// and combine the output sample with the delayed sample.
	temp1 += temp0.i16;

	// multiply our sample by the feedback value
	MultiSU16X16toH16Round(temp0.i16, temp1, synth.delay_feedback);
	ringBuffer_Flush(&delayBuffer);	// otherwise flush the buffer if the delay is set to zero.

// and push it into the delay buffer if buffer space is available
if( ringBuffer_GetCount(&delayBuffer) <= buffCount )
	ringBuffer_Poke(&delayBuffer, temp0.u8[1]);
	ringBuffer_Poke(&delayBuffer, temp0.u8[0]);
// else drop the space delay sample (probably because the delay has been reduced).

////////////// Finally, set the output volume //////////////////
// multiply our wave by the volume value
MultiSU16X16toH16Round(temp2, temp1, synth.master);

// and output wave on both A & B channel, shifted to (+)ve values only because this is what the DAC needs.
*ch_A = *ch_B = temp2 + 0x8000;

This generates the required output waveforms that make the Goldilocks Analogue Synthesiser work.

The second order Biquad IIR filter code has been implemented in a general way, enabling multiple filters to be applied to the sample train. Set up for Low Pass, Band Pass, and for High Pass have been implemented. The coefficients and state variables for each filter are maintained in a structure.

// second order IIR -- "Direct Form I Transposed"
//  a(0)*y(n) = b(0)*x(n) + b(1)*x(n-1) +  b(2)*x(n-2)
//                   - a(1)*y(n-1) -  a(2)*y(n-2)
// assumes a(0) = IIRSCALEFACTOR = 32 (to increase calculation accuracy).


typedef struct {
	uint16_t sample_rate;	// sample rate in Hz
	uint16_t cutoff;	// normalised cutoff frequency, 0-65536. maximum is sample_rate/2
	uint16_t peak;		// normalised Q factor, 0-65536. maximum is Q_MAXIMUM
	int16_t b0,b1,b2,a1,a2;	// Coefficients in 8.8 format
	int16_t xn_1, xn_2;	//IIR state variables
	int16_t yn_1, yn_2;	//IIR state variables
} filter_t;

void setIIRFilterLPF( filter_t *filter ) // Low Pass Filter Setting
	if ( !(filter->sample_rate) )
		filter->sample_rate = SAMPLE_RATE;

	if ( !(filter->cutoff) )
		filter->cutoff = UINT16_MAX >> 1; // 1/4 of sample rate = filter->sample_rate>>2

	if ( !(filter->peak) )
		filter->peak =  (uint16_t)(M_SQRT1_2 * UINT16_MAX / Q_MAXIMUM); // 1/sqrt(2) effectively

	double frequency = ((double)filter->cutoff * (filter->sample_rate>>)) / UINT16_MAX;
	double q = (double)filter->peak * Q_MAXIMUM / UINT16_MAX;
	double w0 = (2.0 * M_PI * frequency) / filter->sample_rate;
	double sinW0 = sin(w0);
	double cosW0 = cos(w0);
	double alpha = sinW0 / (q * 2.0f);
	double scale = IIRSCALEFACTOR / (1 + alpha); // a0 = 1 + alpha

	filter->b0	= \
	filter->b2	= float2int( ((1.0 - cosW0) / 2.0) * scale );
	filter->b1	= float2int(  (1.0 - cosW0) * scale );

	filter->a1	= float2int( (-2.0 * cosW0) * scale );
	filter->a2	= float2int( (1.0 - alpha) * scale );

// interim values in 24.8 format
// returns y(n) in place of x(n)
void IIRFilter( filter_t *filter, int16_t * xn )
	int32_t yn;	// current output
	int32_t  accum;	// temporary accumulator

	// sum the 5 terms of the biquad IIR filter
	// and update the state variables
	// as soon as possible
	filter->xn_2 = filter->xn_1;

	yn += accum;
	filter->xn_1 = *xn;

	yn += accum;

	yn -= accum;
	filter->yn_2 = filter->yn_1;

	yn -= accum;

	filter->yn_1 = yn >> (IIRSCALEFACTORSHIFT + 8); // divide by a(0) = 32 & shift to 16.0 bit outcome from 24.8 interim steps

	*xn = filter->yn_1; // being 16 bit yn, so that's what we return.

Hardware Implementation

I sell on Tindie

The Goldilocks Analogue Prototype 3 is working very well, and it has resolved some of the issues of the second prototype. Using the USART1 MSPIM mode to drive the MCP4822 DAC allows the GUI to use the SPI bus for the Gameduino 2 GUI without conflicts. This is the only way that the rigorous timing for audio output can be maintained, given the heavy SPI usage required to drive the GPU co-processor.

Goldilocks Analogue - Prototype 3

The Atmel AVR ATmega1284p in the Goldilocks Analogue Prototype 3 is running at 24.576MHz. This is significantly above the specification (20MHz at 5V), but remembering that the specification for AVR ATmega devices covers an extended temperature range (that would kill a human) and it is unlikely that the Goldilocks Analogue would be used in extreme temperature situations, I’ve had no problems with this processor frequency to date.

There are two reasons for over-clocking the ATmega1284p. The first is that it is simply not possible to make the required calculations within the time budget available at the maximum specification CPU frequency of 20MHz or even more extreme at the standard Arduino rate of 16MHz.

The second reason is related to the generation of exact audio sampling frequencies. With a CPU clock of 24.576MHz, the 8 bit timer with pre-scaling can generate EXACT audio sample timing at 8kHz, 12kHz, 16kHz, 32kHz, and 48kHz. Using a 16 bit timer, we can also generate very close approximations to 44.1kHz, if required.

The routine to transfer samples does not need to consume precious 16 bit timer resources, which are useful to produce PWM for motor control. Retaining the capability to manage two motors (using the two 16 bit timers) is fairly important outcome.

The interrupt for generating the wave forms does only two things; write the sample values to the DAC, and then calculate the new sample value for the next sample time. The samples are written to the DAC first to ensure that the output is not jittered by the possibility of variable processing time in the audio handler routine. This can happen if (for example) one of the VCO is turned off, removing the sample calculation code from the code execution path.

ISR(TIMER0_COMPA_vect) __attribute__ ((hot, flatten));
	// MCP4822 data transfer routine
	// move data to the MCP4822 - done first for regularity (reduced jitter).
	// &'s are necessary on data_in variables
	DAC_out (ch_A_ptr, ch_B_ptr);

	// audio processing routine - do whatever processing on input is required - prepare output for next sample.
	// Fire the global audio handler, if set.
	if (audioHandler!=NULL)
		audioHandler(ch_A_ptr, ch_B_ptr);

Gameduino 2 with Goldilocks and EVE

My Gameduino 2 was delivered just a few weeks ago, and I’ve spent too much time with it already. It is the latest Kickstarter project by James Bowman. James has written a Gameduino 2 Book too.

Sourcecode for the below examples is located at GitHub AVRfreeRTOS lib_ft800 and the application is located at GA_Synth.

Recently, I’ve used the Gameduino 2 together with the Goldilocks Analogue to implement a multi-oscillator audio synthesizer GUI, using many FTDI EVE GPU co-processor widgets. The use of widgets linked with the integrated touch functionality really simplifies the programming of complex GUIs.

The ability to add a large touch screen, with integrated audio and accelerometer to any Arduino project is a great thing. Previously, you had to move to 32 bit processors with LVDS interfaces to work with LCD screens, but the new FT800 EVE Graphical Processing Unit (GPU) integrates all of the graphic issues and allow you to drive it with a very high level object orientated graphics language. For example it takes just one command to create an entire clock face with hour, minute, and second-hands.

The Gameduino 2, via the FT800 EVE chip, provides the following capabilities:

  • 32-bit internal color precision
  • OpenGL-style command set
  • 256 KBytes of video RAM
  • smooth sprite rotate and zoom with bilinear filtering
  • smooth circle and line drawing in hardware – 16x antialiased
  • JPEG loading in hardware
  • audio tones and WAV audio output
  • built-in rendering of gradients, text, dials, sliders, clocks and buttons
  • intelligent touch capabilities, where objects can be tagged and recognised.

The FT800 runs the 4.3 inch 480×272 TFT touch panel screen at 60 Hz and drives a mono headphone output.

EVE Block Diagram

First off, there’s a demo of some of the capabilities of the Gameduino 2. I’ll come to the drivers later, but the Arduino compatible platform used here is the Goldilocks ATmega1284P from Freetronics. The Goldilocks is in my opinion the best platform to use with the Gameduino 2. Firstly there is the extra RAM and Flash capabilities in line with the ATmega1284p MCU. But also importantly the Goldilocks holds the Pre-R3 Arduino Uno connector standard, with the SPI pins located correctly on Pins 11, 12, and 13. And the INT0 interrupt located on Pin 2. This means that it can be used with the Gameduino 2, out of the box. No hacking required.

The Goldilocks Analogue is an upgraded replacement for the original Goldilocks, with many additional features. Thank you for pledging on the Goldilocks Analogue Kickstarter Project page, which was successfully closed on November 19th 2015, with 124% funding. Now that the Kickstarter pledges have been shipped, the new Goldilocks Analogue is now available on Tindie.

I sell on Tindie

Must be addicted to these touch screens. I’ve just received an Australian designed 4D Systems FT843 Screen. It has possibly an identical screen to the Gameduino 2, but is based on a R3 Arduino shield format (SPI on ICSP) called the ADAM (Arduino Display Adapter Module), which means that it will work on any current Arduino hardware, without hacking. The FT843 ADAM supports a RESET line, which resolves the only problem I’ve noted with the Gameduino 2. Unfortunately, audio is not supported by a 3.5mm jack but rather by a pin-out option. The FT843 uses Swizzle 0, unlike the Gameduino 2 which uses Swizzle 3, and has the Display SPI Select on either D9 or D4 rather than on D8 like the Gameduino 2. Other than these simple configuration options, it similar.

4D Systems FT843 on Goldilocks 1284p

4D Systems FT843 on Goldilocks 1284p


The screen shows 5 sets of demonstrations. These demos are provided by FTDI, and typically in an Arduino Uno you would have to choose which of the 5 sets you want to see. With the extra capabilities of the Goldilocks, it is possible to load all of them simultaneously in 110kB of flash.

Set 0 focusses on individual commands that are loaded into the Display List. The Display List is essentially a list of commands that is executed or rendered for each frame of display. A Display List will be rendered indefinitely, until it is swapped by another Display List. Two Display Lists are maintained in a double buffering arrangement. One is written, whilst the other is displayed.

Set 1 exhibits some of the co-processor command capabilities, that allow complex objects to be created with only one command. A clock, slider, dial, or a rows of buttons can be created easily in this manner.

Set 2 shows the JPEG image rendering capabilities in RGB and in 8 bit mono.

Set 3 demonstrates custom font capabilities. There are 16 fonts available in the ROM of the FT800 EVE, but you can add your own as is desired.

Set 4 shows some advanced co-processor capabilities, such as touch tag recognition, no touch (zero MCU activity) screensaver, capturing screen sketches, and inbuilt audio options.

The main screen shows an analogue clock that is drawn with one co-processor command. Real time is generated by a 32,768Hz Crystal driving the Goldilocks Timer 2 for a system clock. The accuracy of the clock is limited only by the accuracy of the watch crystal, and I’ve built mine with a 5ppm version, which should be enough to keep within a few seconds per month.

Sample Application

The FTDI provided sample application covers most of the available commands and options for the FT800 EVE GPU.

The FT_SampleApp.h file contains definitions of functions implemented for the main application. These code snippets are not really useful beyond demonstrations of capability of the GPU, but never the less demonstrate how each specific feature of the FT800 EVE GPU can be utilised.


Because the FT800 EVE GPU has a very capable object orientated graphics language, the FTDI drivers present a very capable high level interface to the user. FTDI have prepared an excellent starting point from which I could easily make customisations suitable for the AVR ATmega Arduino hardware that I prefer to use.

The FTDI driver set is separated into a Command Layer, and into a Hardware Abstraction Layer (HAL). This separation makes it easy to customise for the AVR ATmega platform, but retains the standard FTDI command language for easy implementation of their example applications, and portability of code written for their command language.

To use the FT800 EVE drivers for the Gameduino 2 it is only necessary to include the FT_Platform.h file in the main program. This file contains references to all of the other files needed.

#include "../lib_ft800/FT_DataTypes.h"
#include "../lib_ft800/FT_X11_RGB.h"
#include "../lib_ft800/FT_Gpu.h"
#include "../lib_ft800/FT_Gpu_Hal.h"
#include "../lib_ft800/FT_Hal_Utils.h"
#include "../lib_ft800/FT_CoPro_Cmds.h"
#include "../lib_ft800/FT_API.h"

The FT_DataTypes.h file contains FTDI type definitions for the specific data types needed for the FT800 EVE GPU. This is mainly used to abstract the drivers for varying MCU. For the AVR it is not absolutely necessary, but it will help when the code is used on other platforms.

The FT_X11_RGB.h file contains the standard colour set used in X11 colours and on the Web, which are stored PROGMEM. I’ve written a small macro that will insert these into commands needing 24 bit colour settings. These colours will be stored and referenced from PROGMEM when they are called from either of the X11 specific macros defined in FT_Gpu.h If they are not called from the program, they will be discarded by the linker and not waste space in the final linked program.

X11 Colours

The FT_Gpu.h file contains all the definitions for command and register setting options. I have significantly rearranged the layout and comments in this file, compared to the FTDI version. Hopefully it is arranged in a way that allows options applying to specific commands and registers to be quickly located.

By writing DL commands to the Display List which are configured by the options in the FT_Gpu.h file it is possible to control most of the low level functions in the FT800 EVE GPU. The Display List is used by the FT800 GPU to render the screen, so it is only the contents of the active Display List that appear on the screen.

In the FT_Gpu_Hal.h file the commands specific to the SPI bus (or the I2C bus if this transfer mechanism is being used) are defined.

I have simplified out some HAL options provided by FTDI for high performance MCU, that might be constrained writing to the SPI bus at only 30MHz, the maximum FT800 SPI bus rate. The Goldilocks SPI bus only runs at 11MHz, and the standard Arduino Uno SPI bus only runs at 8 MHz, so those optimisations don’t help, and they also consume RAM for streaming buffers.

But, I have integrated a multi-byte SPI transfer into the HAL, which don’t use additional RAM buffer space, as they write via a pointer. This is probably the best way to work the SPI bus in the Arduino environment. I have also implemented multi byte SPI transfer directly from the PROGMEM for Strings, and for precomputed commands.

As a preferred option, I’ve implemented PROGMEM storage of Strings for all commands. The commands utilising RAM storage of Strings are retained for compatibility, and to allow computed Strings to be used.

All of the FTDI provided commands now have optional *_P variants which take PROGMEM strings, rather than RAM strings. This saves eleven hundred bytes of RAM used for strings, just in the demonstration programs provided by FTDI and shown in the Demo!

The FT_Hal_Util.h file contains some simple utility macros.

The FT_CoPro_Cmds.h file contains definitions for all of the available co-processor commands. These command are written to the co-processor command buffer, and are used to generate low level commands that appear in the Display List and be rendered for each frame.

Many of the co-processor commands replicate functionality of setting specific registers with options via the Display List GPU commands. This is useful because it is possible to programme the co-processor to implement a task and remain at the object orientated view of the screen, even though the a individual command may be a simple GPU setting that could have been done at Display List command level. Having all the commands available at co-processor level obviates the need to switch between the two “modes” of operation and thought.

I extracted a few of the standard functions that are needed irrespective of the specific application into an API. The FT_API.h file contains these simple command sequences, for booting up the Gameduino 2, and for managing the screen brightness. It also contains precalculated simplified sin, cos, and atan functions useful when drawing circles and clocks.

The API level also contains calls on the Hardware Abstraction Layer that are simply passed through. These calls are flattened by avr-gcc to save digging ourselves into a stack wasting function call hole.

And, of course, everything is integrated into the freeRTOS v8.0.0 port that I support on Sourceforge, AVRfreeRTOS, which gives non-blocking timing, tasks, semaphores, queues, and all aspects of freeRTOS that are so great.

As an example of the power of this combination of freeRTOS and the FT800 object orientated command language we can describe the method used to create an accurate well rendered clock on the Gameduino 2 screen. Using the 3 commands below, we obtain the clock face seen in my demo video main screen.

time(&currentTime); // get a time stamp in current seconds elapsed from Midnight, Jan 1 2000 UTC (the Y2K 'epoch'), as maintained by freeRTOS.
localtime_r(&currentTime, &calendar); // converts the time stamp pointed to by currentTime into broken-down time in a calendar structure, expressed as Local time.
FT_GPU_CoCmd_Clock(phost, FT_DispWidth - (FT_DispHeight/2), FT_DispHeight/2, FT_DispHeight/2 - 20, OPT_3D, calendar.tm_hour, calendar.tm_min, calendar.tm_sec, 0); // draw a clock in 3D rendering.

I’ve updated the clock function to include a touch screen time setting interface. Using the FT800 Touch Tags, and Button generation, this process is really incredibly easy.


I’ve taken the liberty of borrowing some of James’ pictures for this story. They can originally be found here.

Gameduino 2 Pinout

Note that because of the wrap around connector and cable for the LCD screen, it is not possible to use the Arduino R3 pin out. The SPI bus pins are located at the traditional location on Pin 11 though Pin 13. Unless you want to hack your board, you’re limited to using Arduino Uno style boards.

Gameduino 2 Shield

Unfortunately, the FTDI FT800 Reset pin has not been implemented by the Gameduino 2. Using an ISP to programme the Arduino usually “accidentally” puts the FT800 EVE GPU into an unsupported state. This means that the Gameduino 2 and Arduino usually have to be power-cycled or hard Reset following each programming iteration. It would have been good to tie the FT800 Reset pin to the Arduino Reset pin via a short (ms) delay chip, to obviate the need to remove power to generate the hard Reset for the FT800.

Hello World & other examples

I thought it might be interesting to compare the code required to achieve the demonstration outcomes that James Bowman provides on the Gameduino2 site, with the code required to achieve the same result using freeRTOS and the FTDI style driver. So I’ve implemented three simple examples, “Hello World”, “Sprites”, and “Blobs” from his library.

All of the examples have been built using an Arduino Uno ATmega328p as the MCU hardware platform.


The Hello World application simply initialises the Gameduino2, sets the colour to which the screen shall be cleared, and then writes text with the OPT_CENTER option to center it in the X and Y axis. As there is no delay, this is written as often and as fast as the MCU can repeat the loop.


void setup()

void loop()
  GD.cmd_text(240, 136, 31, OPT_CENTER, "Hello world");

The same result can be generated in C using freeRTOS and the FTDI Drivers. I have commented extensively within the code below.

/* freeRTOS Scheduler include files. */
/* these four header files encompass the full freeRTOS real-time OS features,
   of multiple prioritised tasks each with their own stack space, queues for moving data,
   and scheduling tasks, and semaphores for controlling execution flows */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"

/* Gameduino 2 include file. */
#include "FT_Platform.h"

/*------Global used for HAL context management---------*/
extern FT_GPU_HAL_Context_t * phost;           // optional, just to make it clear where this variable comes from.
                                               // It is automatically included, so this line is actually unnecessary.

/*--------------Function Definitions-------------------*/

int main(void) __attribute__((OS_main));       // optional, just good practice.
                                               // Saves a few bytes of stack because the return from main() is not implemented.

static void TaskWriteLCD(void *pvParameters);  // define a single task to write to Gameduino 2 LCD.
                                               // typically multiple concurrent tasks are defined,
                                               // but in this case to replicate the Arduino environment, just one is implemented.

/* Main program loop */
int main(void)
  xTaskCreate(            // create a task to write on the Gameduino 2 LCD
    ,  (const portCHAR *)"WriteLCD"
    ,  128                // number of bytes for this task stack
    ,  NULL
    ,  3		  // priority of this task (1 is highest priority, 4 lowest).
    ,  NULL );

  vTaskStartScheduler();  // now freeRTOS has taken over, and the pre-emptive scheduler is running.


/* Tasks                                                     */

static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
  (void) pvParameters;

  FT_API_Boot_Config();  // initialise the Gameduino 2.

  while(1)               // a freeRTOS task should never return
    FT_API_Write_CoCmd( CMD_DLSTART );                       // initialise and start a Display List
//  FT_API_Write_CoCmd( CLEAR_COLOR_RGB(0x10, 0x30, 0x00) ); // set the colour to which the screen is cleared (using RGB triplets) as in GD2 library OR
    FT_API_Write_CoCmd( CLEAR_COLOR_X11(FORESTGREEN) );      // set the colour to which the screen is cleared (using X11 colour definitions)
    FT_API_Write_CoCmd( CLEAR(1,1,1) );                      // clear the screen

    FT_GPU_CoCmd_Text_P(phost,FT_DispWidth/2, FT_DispHeight/2, 31, OPT_CENTER, PSTR("Hello world"));
      // write "Hello World" to X and Y centre of screen using OPT_CENTER  with the largest font 31
      // The string "Hello world" is stored in PROGMEM
      // Functions with *_P all use PROGMEM Strings (and don't consume RAM)
      // FT_DispWidth and FT_DispHeight are global variables set to orientate us in a flexible consistent way,
      // without hard coding the screen resolution.

    FT_API_Write_CoCmd( DISPLAY() );                         // close the Display List (DL) opened by CMD_DLSTART()
    FT_API_Write_CoCmd( CMD_SWAP );                          // swap the active Display List (double buffering), to display the new "Hello World" commands written to the Display List


The Sprites application is similar to the original one built for the Gameduino, but here each sprite is rotating around a random point. The 2001 random points are stored in a PROGMEM array sprites. This takes 8K of flash. A second PROGMEM array circle holds the 256 XY coordinates to make the sprite move in a circle. The only RAM used is a single byte t used to keep track of the current rotation position, by counting iterations.


#include "sprites_assets.h"

void setup()
  GD.copy(sprites_assets, sizeof(sprites_assets));

static byte t;

void loop()
  byte j = t;
  uint32_t v, r;

  int nspr = min(2001, max(256, 19 * t));

  PROGMEM prog_uint32_t *pv = sprites;
  for (int i = 0; i < nspr; i++) {
    v = pgm_read_dword(pv++);
    r = pgm_read_dword(circle + j++);
    GD.cmd32(v + r);

  GD.LineWidth(28 * 16);
  GD.Vertex2ii(240 - 110, 136, 0, 0);
  GD.Vertex2ii(240 + 110, 136, 0, 0);


  GD.cmd_number(215, 110, 31, OPT_RIGHTX, nspr);
  GD.cmd_text( 229, 110, 31, 0, "sprites");


The code in freeRTOS is similar. I have commented within the code.

/* freeRTOS Scheduler include files. */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"

/* Gameduino 2 include file. */
#include "FT_Platform.h"

// The include file containing the sprite graphics, and the special command sequence
#include "sprites_assets.h

/*------Global used for HAL context management---------*/
extern FT_GPU_HAL_Context_t * phost;           // optional, just to make it clear where this variable comes from

/*--------------Function Definitions-------------------*/

int main(void) __attribute__((OS_main));       // optional, just good practice

static void TaskWriteLCD(void *pvParameters);  // define a single task to write to Gameduino 2 LCD


/* Main program loop */
int main(void)
  xTaskCreate(             // create a task to write on the Gameduino 2 LCD
    ,  (const portCHAR *)"WriteLCD"
    ,  128                 // number of bytes for this task stack
    ,  NULL
    ,  3                   // priority of task (1 is highest priority, 4 lowest).
    ,  NULL );

  vTaskStartScheduler();   // now freeRTOS has taken over, and the pre-emptive scheduler is running


/* Tasks                                                     */

static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
  (void) pvParameters;

  uint8_t t = 0;         // iterate over the code for 255 times, before restarting with 256 sprites where t = 0

  FT_API_Boot_Config();  // initialise the Gameduino 2.
  FT_GPU_HAL_WrCmdBuf_P(phost, sprites_assets, sizeof(sprites_assets));
    // Copy James' magic list of commands into the command buffer.
    // These co-processor commands are "compiled" into their 4 byte equivalents, and I haven't decoded them in detail.
    // But, since the FT800 is reading the same double word codes, it doesn't really matter how they're generated.

  while(1)               // a freeRTOS task should never return
    FT_API_Write_CoCmd( CMD_DLSTART );       // initialise and start a Display List (DL)
    FT_API_Write_CoCmd( CLEAR(1,1,1) );      // clear the screen

    FT_API_Write_CoCmd( BEGIN(BITMAPS) );    // start to write BITMAPS into the DL
    uint8_t j = t;
    uint32_t v;
    uint32_t r;
    int16_t nspr = min(2001, max(256, 19 * t));
    ft_prog_uint32_t * pv = sprites;         //  pv is the sprite BITMAP pointer

    for (uint16_t i = 0; i < nspr; ++i) {
      v = pgm_read_dword(pv++);              // determine which sprite we're controlling
      r = pgm_read_dword(circle + j++);      // circle is the rotation control
      FT_GPU_HAL_WrCmd32(phost, v + r);      // the sprite address and the location are written here to the co-processor
    FT_API_Write_CoCmd( END());              // finish writing BITMAPS into the Display List

    FT_API_Write_CoCmd( BEGIN(LINES) );      // start to write LINES into the Display List
    FT_API_Write_CoCmd( COLOR_RGB(0x00, 0x00, 0x00) );  // set the line colour to black 0x000000
    FT_API_Write_CoCmd( COLOR_A(140) );                 // set alpha channel transparency
    FT_API_Write_CoCmd( LINE_WIDTH( 28 * 16) );
    FT_API_Write_CoCmd( VERTEX2II(240 - 110, 136, 0, 0) );  // start to draw an alpha transparency background line
    FT_API_Write_CoCmd( VERTEX2II(240 + 110, 136, 0, 0) );  // finish the line
    FT_API_Write_CoCmd( END() );             // finish writing LINES into the Display List

    FT_API_Write_CoCmd( RESTORE_CONTEXT() ); // With no prior SAVE_CONTEXT() command, this restores the default colours and values.

    FT_GPU_CoCmd_Number(phost, 215, 110, 31, OPT_RIGHTX, nspr);    // write a number.
    FT_GPU_CoCmd_Text_P(phost, 229, 110, 31, 0, PSTR("sprites"));  // write using a PROGMEM stored string function, to save RAM
      //  phost is a pointer to the context for the Gameduino2.
      //  Mainly used where there may be multiple screens present, but in this case several state and semaphore items are maintained.

    FT_API_Write_CoCmd( DISPLAY() );          // close the active Display List (DL) opened by CMD_DLSTART()
    FT_API_Write_CoCmd( CMD_SWAP );           // Do a DL swap to render the just written DL

    t++;    // t will roll over and will restart the number of sprites to the minimum of 256


blobs is a sketching demonstration, as you paint on the touch screen a trail of circles follows.
The code keeps a history of the last 128 touch positions, and draws the transparent, randomly coloured circles.


#define NBLOBS      128
#define OFFSCREEN   -16384

struct xy {
  int x, y;
} blobs[NBLOBS];

void setup()

  for (int i = 0; i < NBLOBS; i++) {
    blobs[i].x = OFFSCREEN;
    blobs[i].y = OFFSCREEN;

void loop()
  static byte blob_i;
  if (GD.inputs.x != -32768) {
    blobs[blob_i].x = GD.inputs.x << 4;
    blobs[blob_i].y = GD.inputs.y << 4;
  } else {
    blobs[blob_i].x = OFFSCREEN;
    blobs[blob_i].y = OFFSCREEN;
  blob_i = (blob_i + 1) & (NBLOBS - 1);


  for (int i = 0; i < NBLOBS; i++) {
    // Blobs fade away and swell as they age
    GD.ColorA(i << 1);
    GD.PointSize((1024 + 16) - (i << 3));

    // Random color for each blob, keyed from (blob_i + i)
    uint8_t j = (blob_i + i) & (NBLOBS - 1);
    byte r = j * 17;
    byte g = j * 23;
    byte b = j * 147;
    GD.ColorRGB(r, g, b);

    // Draw it!
    GD.Vertex2f(blobs[j].x, blobs[j].y);

The code in freeRTOS is similar, but the touch functionality is derived directly from the FT800 register containing the most recent screen touch location. I have commented within the code.

/* freeRTOS Scheduler include files. */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"

/* Gameduino 2 include file. */
#include "FT_Platform.h"

#define NBLOBS 128
#define OFFSCREEN -16384

/*———-Global used for HAL management————-*/
extern FT_GPU_HAL_Context_t * phost; // optional, just to make it clear where this comes from

struct xy { // somewhere to store all the blob locations
int16_t x, y;
} blobs[NBLOBS];

/*————–Function Definitions——————-*/

int main(void) __attribute__((OS_main)); // optional, just good practice

static void TaskWriteLCD(void *pvParameters); // define a single task to write to Gameduino 2 LCD


/* Main program loop */
int main(void)
xTaskCreate( // create a task to write on the Gameduino 2 LCD
, (const portCHAR *)"WriteLCD"
, 128 // number of bytes for the task stack
, 3 // priority of task (1 is highest priority, 4 lowest).
, NULL );

vTaskStartScheduler(); // now freeRTOS has taken over, and the scheduler is running


/* Tasks */

static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
(void) pvParameters;

FT_API_Boot_Config(); // initialise the Gameduino 2.
FT_API_Touch_Config(); // initialise the FT800 Touch capability.

for (uint8_t i = 0; i > 16) & 0xffff) << 4; // read where x axis touch occurred, and scale it
blobs[blob_i].y = (int16_t)(readTouch & 0xffff) << 4; // read where y axis touch occurred, and scale it
} else {
blobs[blob_i].x = OFFSCREEN; // if there was no touch, draw the blob OFFSCREEN
blobs[blob_i].y = OFFSCREEN;
blob_i = (blob_i + 1) & (NBLOBS – 1); // increment to the next blob for touch interaction

// this is the display interface stuff
FT_API_Write_CoCmd( CMD_DLSTART ); // initialise and start a display list (DL)

FT_API_Write_CoCmd( CLEAR_COLOR_RGB(0xe0, 0xe0, 0xe0) );// set the colour to which the screen will be cleared
FT_API_Write_CoCmd( CLEAR(1,1,1) ); // clear the screen

FT_API_Write_CoCmd( BEGIN(POINTS) ); // start to write POINTS into the Display List (DL)

for (uint8_t i = 0; i < NBLOBS; ++i)
// Blobs fade away and swell as they age
FT_API_Write_CoCmd( COLOR_A(i << 1) ); // set an alpha transparency
FT_API_Write_CoCmd( POINT_SIZE((1024 + 16) – (i << 3)) );

// Random colour for each blob, keyed from (blob_i + i)
uint8_t j = (blob_i + i) & (NBLOBS – 1);
uint8_t r = j * 17;
uint8_t g = j * 23;
uint8_t b = j * 147;
FT_API_Write_CoCmd( COLOR_RGB(r, g, b) );

// Draw it!
FT_API_Write_CoCmd( VERTEX2F(blobs[j].x, blobs[j].y) );

FT_API_Write_CoCmd( END() ); // finish writing POINTS into the active DL

FT_API_Write_CoCmd( DISPLAY() ); // close the active Display List (DL)
FT_API_Write_CoCmd( CMD_SWAP ); // Do a DL swap to render the just written DL

I intend to build a few more demonstrations of the code, and to copy some games that James has already implemented, because I'm not a game designer.