Goldilocks Analogue – Prototyping 4

It is just over 6 months since the third iteration of the Goldilocks Analogue prototyping was started, and now I’ve finished the design for a fourth iteration. I’m now working out what the final bill of materials will cost to assemble.

The third prototype was completely successful, and produced the improvements I was looking for. The use of the MSPI Mode on USART1 means that two SPI interfaces can be run in parallel, allowing the DAC to hold its tight timing requirements while slower SD card transactions take place (for example). This was proven through the implementation of a direct digital synthesiser, controlled by a SPI controlled touch screen.

Goldilocks Analogue – Prototype 3

Revision for Prototype 4

The Prototype 3 was supposed to be the final version, and it achieved everything that I set out in the original design specifications. But, then there was some feature creep.

In discussing the TRS 3.5mm audio socket, a better, more robust TRRS version was found. The realisation that it would be possible to have a microphone input, without requiring additional board space, led me to experiment with the Adafruit breakout board for the MAX9814 Microphone amplifier, and then to build a very simple Walkie-Talkie demonstration to test the use of audio input (with the integrated ADC) simultaneously with audio output (via the DAC).

Once the use of the MAX9814 was proven, I could implement a reference circuit as an input option. The amplified microphone input is connected to Pin 7 of the Analogue Port A. Conveniently, the MAX9814 delivers the amplified signal at +1.25V with a 2V peak to peak signal. This allows the sample to fall into the range of 0V to 2.56V internal reference voltage for the ATmega ADC, providing the maximum sampling resolution with no further adjustments.
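As a rough check of that range, the sketch below converts the extremes of the amplified microphone signal into ADC codes. It assumes an ideal 10 bit conversion (code = Vin * 1024 / Vref) and the nominal 2.56V reference; `adc_code` is a hypothetical helper written for this illustration, not part of the Goldilocks Analogue code.

```c
#include <stdint.h>

/* Ideal 10 bit ADC conversion: code = Vin * 1024 / Vref, clamped to 1023.
 * Voltages are given in millivolts so the arithmetic stays in integers.
 * Assumption: nominal reference voltage and no offset or gain error. */
uint16_t adc_code(uint32_t vin_mv, uint32_t vref_mv)
{
    uint32_t code = (vin_mv * 1024u) / vref_mv;
    return (uint16_t)(code > 1023u ? 1023u : code);
}
```

With a 1.25V bias and a 2V peak to peak swing the signal spans 0.25V to 2.25V, which lands on codes 100 to 900 out of 0 to 1023, so most of the ADC range is used without clipping.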

The MAX9814 also includes integrated microphone biasing circuitry, which is designed to support normal electret microphones.

GoldilocksAnalogueSchematicUpdate

As an alternative input, the Prototype 4 also allows for LINE level inputs. I have used a voltage divider to reference the input signal to 1.25V DC. Although a 2V peak to peak Line level input will overload the Microphone amplifier, rendering the output signal on PA7 unusable, the LINE input, which is routed to Pin 6 on Port A, will have exactly the right range to sample using the internal ATmega ADC voltage reference.

Both Pin 6 and Pin 7 on Port A are outside of the normal Arduino UNO R3 footprint, so the normal functionality of the UNO footprint is not affected by the two input options.

The additional space required for the microphone and line level input circuitry has been created by simplifying the negative supply rail for the Op-Amp. The Op-Amp is provided to support DC to 50k samples per second analogue output. To achieve a linear output from 0V to 4.096V the Op-Amp requires a negative supply voltage. In this revision, I have used a single LTC1983 regulated supply device to provide the -3V supply rail. The outcome should be equivalent to the Prototype 3 solution, which used 3 devices.

Board Layout

The final board layout has been completed, and the board is now in discussion for manufacturing.

The GoldilocksAnalogueP4Schematic in PDF format.

Front of board (All Layers)

This is the front of the board showing all of the layers, and the general layout of the devices. The board layout is pretty busy, but still there is sufficient prototyping capability to take all the port pins off-board, or provide on-board breakouts.

GoldilocksAnalogueScreen

Top Layer

This is the Top Layer, which contains all of the devices. There are no devices on the Bottom Layer.

GoldilocksAnalogueTopGoldilocksAnalogueTopFill

Route 2 (GND) Layer

The Ground Layer on Route 2 is unchanged from previous iterations, and provides a solid platform for low noise analogue circuits.

GoldilocksAnalogueRoute2GoldilocksAnalogueRoute2Fill

Route 15 (Vcc) Layer

The Route 15 power supply layer contains all of the supply lines, providing 5V regulated, 5V filtered for analogue AVcc, 3.3V regulated, and -3V regulated.

GoldilocksAnalogueRoute15GoldilocksAnalogueRoute15Fill

Bottom Layer

All the pin outs are defined on the Bottom Layer. In addition to the items previously mentioned, there are two small locations where the Line and Microphone inputs can be cut, and allow the full functionality of PA6 and PA7 to be recovered.

GoldilocksAnalogueBottomGoldilocksAnalogueBottomFill

Pin Mapping

This is the map of the ATmega1284p pins to the Arduino physical platform, and their usage on the Goldilocks Analogue.

| Arduino UNO R3 | 328p Feature | 328p Pin | 1284p Pin | 1284p Feature | Comment |
|---|---|---|---|---|---|
| Analog 0 | | PC0 | PA0 | | |
| Analog 1 | | PC1 | PA1 | | |
| Analog 2 | | PC2 | PA2 | | |
| Analog 3 | | PC3 | PA3 | | |
| Analog 4 | SDA | PC4 | PA4 | | PC1 I2C -> Bridge Pads |
| Analog 5 | SCL | PC5 | PA5 | | PC0 I2C -> Bridge Pads |
| Reset | Reset | PC6 | RESET | | Separate Pin |
| Digital 0 | RX | PD0 | PD0 | RX0 | |
| Digital 1 | TX | PD1 | PD1 | TX0 | |
| Digital 2 | INT0 | PD2 | PD2 | INT0 / RX1 | USART1 |
| Digital 3 | INT1 / PWM2 | PD3 | PD3 | INT1 / TX1 | USART1 -> MCP4822 SPI MOSI |
| Digital 4 | | PD4 | PD4 | PWM1 / XCK1 | 16bit PWM -> MCP4822 SPI SCK |
| Digital 5 | PWM0 | PD5 | PD5 | PWM1 | 16bit PWM |
| Digital 6 | PWM0 | PD6 | PD6 | PWM2 | |
| Digital 7 | | PD7 | PD7 | PWM2 | |
| Digital 8 | | PB0 | PB2 | INT2 | <- _INT/SQW DS3231 |
| Digital 9 | PWM1 | PB1 | PB3 | PWM0 | |
| Digital 10 | _SS / PWM1 | PB2 | PB4 | _SS / PWM0 | SPI |
| Digital 11 | MOSI / PWM2 | PB3 | PB5 | MOSI | SPI |
| Digital 12 | MISO | PB4 | PB6 | MISO | SPI |
| Digital 13 | SCK | PB5 | PB7 | SCK | SPI |
| (Digital 14) | | | PB0 | T0 | -> SDCard SPI _SS |
| (Digital 15) | | | PB1 | T1 | -> MCP4822 SPI _SS |
| SCL | | | PC0 | SCL | I2C – Separate |
| SDA | | | PC1 | SDA | I2C – Separate |
| | | | PC2 | TCK | JTAG <- _CARD_DETECT for uSD Card |
| | | | PC3 | TMS | JTAG -> MCP4822 _LDAC |
| | | | PC4 | TDO | JTAG -> SRAM SPI _SS |
| | | | PC5 | TDI | JTAG -> EEPROM SPI _SS |
| | | | PC6 | TOSC1 | <- 32768Hz Crystal |
| | | | PC7 | TOSC2 | -> 32768Hz Crystal |
| | XTAL1 | PB6 | | | |
| | XTAL2 | PB7 | | | |
| (Analog 6) | | | PA6 | | -> LINE Input |
| (Analog 7) | | | PA7 | | -> MIC Input |

Goldilocks Analogue Synthesizer

For the past year, I’ve been prototyping an Arduino clone, the Goldilocks Analogue, which incorporates advanced analogue output capabilities into the design of the original Goldilocks with ATmega1284p AVR MCU and uSD card cage. Recently the design scope crept up to include two SPI memory devices (EEPROM, SRAM, FRAM), and microphone audio input. But, before I go through another prototype cycle, I thought it would be a good idea to build some demonstration applications, showcasing the capabilities of an Arduino compatible platform with integrated analogue output, and have some fun with audio.

Goldilocks Analogue Prototype 3

Some of the initial tests I’ve built include some 8 bit algorithmic music and, using two Goldilocks Analogue prototype devices, a digital walkie talkie using Xbee radios. They were fun, but don’t really demonstrate the full range of the audio capabilities of the platform.

It seemed appropriate to build a synthesizer using the Goldilocks Analogue as the platform, together with a Gameduino 2 shield incorporating an FTDI FT800 EVE GPU, and see how close I could get to a musical outcome.

Research

Before randomly building something that made a bunch of squeaky sounds, I thought the best thing to do is to learn something about the field of analogue synthesizers and synthesizing audio.

I also obtained some simple analogue synthesizers from Korg to see exactly what they produce, so I could copy them. Some people write that this monotron analogue synthesizer family is a good example of a low cost musical instrument. I found it very interesting to examine the wave forms produced by the various settings.

Using the features of the two Korg devices, I was able to define the goal for the synthesizer that I wanted to build using the Goldilocks Analogue.

The Korg monotron DUO has two voltage controlled oscillators (VCO1 and VCO2), which produce square waves. The VCO1 has a pitch setting, which defines the basic frequency at which the ribbon keyboard operates. The ribbon keyboard can be set to have a major scale, a minor scale, a full chromatic scale, or be a ribbon with no set notes. For clarity, the pitch on the DUO is analogue, so there is no guarantee that the notes generated by the ribbon keyboard will be in tune.

The VCO2 pitch can be modified either below or above the pitch of the VCO1. In its middle section, with some care, it can be matched exactly to the VCO1 setting. The switch allows either just the VCO1 or both VCO1 and VCO2 to produce sound. A separate XMOD intensity knob allows the VCO2 to modulate the frequency of the VCO1 oscillator, producing cross-modulation.

The monotron DUO contains the famous Korg MS-20 resonant low pass filter, which can be adjusted for both cut-off frequency and intensity of the resonant frequency. Setting the filter values allows the square wave noise generated by the two oscillators to be shaped into very interesting tones.

The Korg monotron DELAY is a very different device from the DUO. It has two oscillators, but only one at audio frequencies. The audio oscillator produces a saw-tooth wave at a frequency controlled by the ribbon keyboard. On the monotron DELAY there is no capability for playing specific notes as the keyboard is only available in ribbon mode. The second oscillator of the monotron DELAY is a low frequency oscillator (LFO), which can be adjusted from 1Hz up to about 30Hz. This LFO can produce either a triangle wave or a square wave to modulate the main audio oscillator. This is used mainly to apply vibrato to musical tones, or to produce very unusual tone ramps. The intensity and pitch of the LFO are controlled by knobs.

The Korg low pass filter present in the monotron DELAY is only adjustable for its cutoff frequency, so it is less flexible and interesting than the monotron DUO implementation.

The monotron DELAY is really built to showcase the analogue space delay functionality, which can be adjusted in both length of delay, and in intensity of feedback. With about 1 second of delay and 100% or more feedback possible, very short sequences of notes can be played and then built upon.

I’m not particularly musical, but I spent some very pleasant hours playing with the two Korg synthesizers experimenting with the sounds available from their very simple platforms, and used their capabilities to guide me in what to build into my Goldilocks Analogue synthesizer.

The next piece of research was to understand how to generate analogue wave forms using direct digital synthesis, and then how to modify the sound of the wave forms using convolution or modulation in the time domain.

Design Specification

Having the two Korg devices as an inspiration, and reading about the original Moog synthesizer capabilities from the 1970’s, made the specification pretty straightforward.

Goldilocks Analogue GUI

The Goldilocks Analogue synthesizer has three oscillators: two that operate at audio frequencies (VCO1 and VCO2), and one low frequency oscillator (LFO). The VCO1 is tuned in octaves at correct concert pitch, so that notes are played at the right frequency. The VCO2 is pitched relative to the VCO1 pitch, and ranges from minus one octave to plus one octave (or half the VCO1 frequency to double the VCO1 frequency). The LFO is adjustable over the range from 1 Hz to 40 Hz.

I had decided to let each oscillator take one of two wave forms. For VCO1 I initially chose square wave, and saw tooth wave, to be able to replicate the exact sound of the Korg devices. I’ve since decided to move the saw tooth wave to the VCO2, and replaced it with a sine wave on VCO1. It is good to have the pure tone at the correct frequency for tuning instruments. An A4 from the Goldilocks Analogue Synthesizer will, for example, always be 440Hz.

For VCO2 I selected a triangle wave and a saw tooth wave. And, for the LFO there is a sine wave and a triangle wave available. I should point out that changing the wave form available to each oscillator is no more complicated than replacing the look-up table associated with the setting, and there is space available in the ATmega1284p to store at least another 4 separate wave form tables in flash memory, even without extending to on-board SPI EEPROM, or uSD storage.

In the mixing section the intensity or volume of each of VCO1 and VCO2 can be set. It is possible to turn off either oscillator. The intensity of the LFO effect is controlled too. The LFO modulates both the VCO1 and the VCO2. The final input is the cross modulation of VCO1 by the VCO2. Very interesting tonality is created by modulating VCO1 by pitches very close to its own frequency.

The mixed signal would then be sent to the voltage controlled filter. Using the current set up, the sample rate is 16,000 Hz, which is enough to produce 6 octaves. I have implemented a Biquad IIR filter.

Following the filter stage, the signal enters the space delay stage. The space delay stage can have only half a second of delay, because of the RAM limitations (16kByte) of the ATmega1284p. So up to 6144 16 bit samples are supported by the space delay function. Samples are recovered from the delay buffer, and mixed with the new signals, then injected back into the delay loop. This creates an infinite loop of samples, depending on the amount of feedback set by the FEEDBACK control.
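The buffer figures above can be checked with a little arithmetic. The sketch below uses the 6144 sample figure from the text; the helper names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define DELAY_SAMPLES 6144u   /* 16 bit samples held in the delay buffer */

/* RAM consumed by the delay buffer, in bytes. */
uint32_t delay_buffer_bytes(void)
{
    return DELAY_SAMPLES * (uint32_t)sizeof(int16_t);
}

/* Longest delay the buffer can hold at a given sample rate, in milliseconds. */
uint32_t max_delay_ms(uint32_t sample_rate_hz)
{
    return (DELAY_SAMPLES * 1000u) / sample_rate_hz;
}
```

The buffer takes 12288 bytes of the 16kByte of RAM. At a 12,000 Hz sample rate that is 512 ms of delay, and at 16,000 Hz it is 384 ms.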

The final signal output level is controlled by a MASTER volume control. Additionally, a STO and RCL capability for settings has been implemented. Only the most recent settings are stored, which can be recalled when power is restored.

As the keyboard notes are generated using a look up table, multiple keyboard tuning options are possible. I have implemented Concert Tuning (A4 = 440Hz) with Equal Temperament (commonly used for pianos), and Verdi or Stradivari tuning (C4 = 256Hz) with Just Intonation Equal Fifths as an alternative. There is a toggle to choose between these two options.

GUI Implementation

The GUI of the solution depends on a Gameduino 2 screen, which is based on the FTDI Chip FT800 EVE GPU device. The FT800 was the first EVE GPU available from FTDI and it can only support single touch. This limitation makes it only partially useful as a product to support this application. The most interesting sounds are generated by bending the controls whilst playing the notes. Fortunately there are newer EVE GPU devices that support multi-touch and they would make a better platform if this synthesizer were to become more than just a demonstration.

The GUI makes extensive use of the FT800 co-processor widget capabilities, namely dials, toggles, keys, and text. Some examples are shown below.

// text
FT_GPU_CoCmd_Text_P(phost, 300,  8, 27, OPT_CENTER, PSTR("VCF"));
FT_GPU_CoCmd_Text_P(phost, 300, 25, 26, OPT_CENTER, PSTR("CUTOFF"));
FT_GPU_CoCmd_Text_P(phost, 300, 95, 26, OPT_CENTER, PSTR("PEAK"));

// toggles
FT_API_Write_CoCmd(TAG(LFO_WAVE));
FT_GPU_CoCmd_Toggle_P(phost, 13,242,46,18, OPT_3D, synth.lfo.wave, PSTR("SIN" "\xFF" "TRI"));

FT_API_Write_CoCmd(TAG(KBD_TOGGLE));
FT_GPU_CoCmd_Toggle_P(phost, 405,130,60,26, OPT_3D, synth.kbd_toggle, PSTR("CONCRT" "\xFF" "VERDI"));

// dials
FT_API_Write_CoCmd(TAG(DELAY_FEEDBACK));
FT_GPU_CoCmd_Dial(phost, 365,125,20, OPT_3D, synth.delay_feedback); // DELAY FEEDBACK

FT_API_Write_CoCmd(TAG(MASTER));
FT_GPU_CoCmd_Dial(phost, 440,55,26, OPT_3D, synth.master); // MASTER

The integrated touch tracking capability makes it very easy to parse touch into specific commands.

readTag = FT_GPU_HAL_Rd8(phost, REG_TOUCH_TAG);

if (readTag > 0x80)// tag is greater than 0x80 and therefore is a dial.
{
	TrackRegisterVal.u32 = FT_GPU_HAL_Rd32(phost, REG_TRACKER);

	switch (TrackRegisterVal.touch.tag)
	{
	case (VCO1_PITCH):
		synth.vco1.pitch = TrackRegisterVal.touch.value & 0xe000;
		break;
	// continues...
	}
}

This integrated touch tracking capability can return which dial (slider / scroll bar) has been touched, and the relative position of the touch. This same position value can then be used in the display command to set the position of the dial (slider / scroll bar), providing direct feedback on the GUI.

The main GUI task simply calls the touch function, and if there is a touch recorded the GUI is updated, and the revised settings entered into the analogue audio control structure. Otherwise if there are no touches recorded there are no processor cycles wasted updating the display. The FT800 EVE GPU continues to display the same content until a new display list is loaded into the GPU memory.

When a keyboard touch is recorded, the tone generation information is updated, and this then directly impacts the output tone generated by the audio section.

//  setting the phase increment for VCO1 is frequency * LUT size / sample rate.
//  Dividing by (SAMPLE_RATE >> 1) leaves a residual scale of << 1, which creates the 24.8 fixed point number.
// The LUT is already pre-scaled << 7 in the calculation.
// The LUT can't be pre-scaled to << 8 because this creates numbers too large for uint32_t to hold,
// and we want to allow the option to vary the SAMPLE_RATE at compilation time, so it has to stay in the calculation.
synth.vco1.phase_increment = (uint32_t)pgm_read_dword(synth.note_table_ptr + stop * NOTES + note) / (SAMPLE_RATE >> 1);

// set the VCO2 phase increment to be -1 octave to +1 octave from VCO1, with centre dial frequency identical.
if (synth.vco2.pitch & 0x8000) // upper half dial
	synth.vco2.phase_increment = (synth.vco1.phase_increment * ((uint32_t)synth.vco2.pitch << 1)) >> 16 ;
else // lower half dial
	synth.vco2.phase_increment = (synth.vco1.phase_increment >> 1) + (( (synth.vco1.phase_increment >> 1) * ((uint32_t)synth.vco2.pitch << 1) ) >> 16 );

// set the LFO phase increment to be from 0 Hz to 32 Hz.
synth.lfo.phase_increment = ((uint32_t)synth.lfo.pitch * LUT_SIZE / ((uint32_t)SAMPLE_RATE << 4) );

The phase increment desired, respective to the relevant tone desired, is read from a look up table containing 8 octaves each of 12 notes for VCO1. VCO2 phase increment is then set as a proportion of VCO1. And LFO phase increment is set to range from 0 to around 30 Hz. With this information, and the selected wave form look up table, the audio implementation can do its thing.
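To make the fixed point scaling concrete, here is a host-side sketch of the same calculations. It assumes a 4096 entry LUT and whole-number note frequencies (the real note table may well hold finer values), and the function names are hypothetical. The VCO2 mapping uses a 64 bit intermediate product, which is a change from the excerpt above, made so that the illustration cannot overflow for large phase increments.

```c
#include <stdint.h>

#define LUT_SIZE 4096u

/* Note table entry: frequency * LUT size, pre-scaled << 7.
 * This fits in a uint32_t for frequencies below 8192 Hz, which is why
 * a << 8 pre-scale would be too large for the upper octaves. */
uint32_t note_table_entry(uint32_t freq_hz)
{
    return (freq_hz * LUT_SIZE) << 7;
}

/* 24.8 phase increment: dividing by half the sample rate supplies the
 * remaining << 1 of scaling. */
uint32_t vco1_phase_increment(uint32_t entry, uint32_t sample_rate_hz)
{
    return entry / (sample_rate_hz >> 1);
}

/* Map a 16 bit dial position onto 0.5x .. 2x of the VCO1 increment,
 * with the centre of the dial (0x8000) giving exactly 1x. */
uint32_t vco2_phase_increment(uint32_t inc1, uint16_t pitch)
{
    uint64_t scale = (uint64_t)pitch << 1;

    if (pitch & 0x8000u)                       /* upper half dial: 1x .. 2x */
        return (uint32_t)((inc1 * scale) >> 16);

    /* lower half dial: 0.5x .. 1x */
    return (inc1 >> 1) + (uint32_t)(((inc1 >> 1) * scale) >> 16);
}
```

For A4 (440 Hz) at a 16,000 Hz sample rate the increment is 28835, or about 112.6 LUT steps per sample once the 8 fractional bits are removed.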

Audio Implementation

The synthesizer audio section is implemented in one function, that is executed each time a new sample is generated. This means at 12,000Hz sample generation frequency, we have 83 micro seconds to generate the final sample to be pushed to the Goldilocks Analogue MCP4822 12 bit dual channel DAC.

The current sample generation routine takes under 45 micro seconds to complete with 3 Oscillators running, so there is a little head room still available. Alternatively, with some further coding improvements, it will be possible to use 16,000 Hz as the sample generation frequency. The logic trace below shows the main SPI interface (SCK, MISO, MOSI, _SS) delivering commands to the EVE GPU, and the lower MSPI interface (MSPI SCK, MSPI MOSI, MSPI PING) providing the calculated samples, every 83 micro seconds, to the DAC.

Goldilocks Analogue Synthesizer, with 3 Oscillators operating.

It is clear to see that two EVE GPU transactions are being interrupted by the DAC output, but because the main SPI interface is not changing state the transaction is faultlessly resumed once the DAC interrupt is completed.

In contrast, when there are no oscillators running because no key is pressed, the sample generation routine takes just 28 micro seconds to complete. The logic trace below shows the change of state from 0 to 3 oscillators.

Goldilocks Analogue, with no Oscillators operating.

There is little time available to calculate sample values in real time, so all of the samples are pre-calculated and are stored in look-up tables (LUT). Each LUT contains 4096 16 bit samples, which gives 12 significant bits of accuracy for the values. I chose 4096 samples because the ATmega1284p has sufficient storage to support multiple tables of this size in its flash memory. Smaller LUTs would sacrifice accuracy, and larger LUTs would compromise on the number of available wave forms.

I have prepared LUTs for sine wave, square wave, triangle wave, and saw tooth wave options. Another advantage of the LUT approach is that better bandwidth optimised LUT values can be substituted without changing the code. LUTs also allow completely arbitrary waveforms to be used, if desired, to obtain a specific timbre or nuance of sound.

The sample generation code starts with the LFO oscillator using a direct digital synthesis model. Each oscillator sample is calculated identically by stepping through the LUT with a phase increment based on the frequency of the note required, but VCO2 phase increment is modified by the LFO output and the VCO1 phase increment is modified by both VCO2 and LFO outputs.

Code shown here assumes that both LFO and VCO2 output wave forms have already been calculated.

///////////// Now do the VCO1 ////////////////////

// This will be modulated by the VCO2 value (depending on the XMOD intensity),
// and the LFO intensity.
if( synth.vco1.toggle )
{
	// Increment the phase (index into waveform LUT) by the calculated phase increment.
	// Both the phase and phase_increment are stored as 24.8 in uint32_t.
	// The fractional component of the phase and phase_increment is needed to ensure the wave
	// is tracked accurately.
	synth.vco1.phase += synth.vco1.phase_increment;

	// calculate how much the LFO affects the VCO1 phase increment
	if (synth.lfo.toggle)
	{
		// increment the phase (index into LUT) by the calculated phase increment including the LFO output.
		synth.vco1.phase += (uint32_t)outLFO; // increment on the fractional component 8.8, limiting the effect.
	}

	// calculate how much the VCO2 XMOD affects the VCO1 phase increment
	if (synth.vco2.toggle)
	{
		// increment the phase (index into LUT) by the VCO2 cross-modulation (XMOD) output.
		synth.vco1.phase += (uint32_t)outXMOD; // increment on the fractional component 8.8, limiting the effect.
	}

	// if we've gone over the waveform LUT boundary -> loop back
	synth.vco1.phase &= 0x000fffff; // this is a faster way doing the table
						// wrap around, which is possible
						// because our table is a multiple of 2^n.
						// Remember the lowest byte (0xff) is fractions of LUT steps.
						// The table is 0xfff.ff bytes long.

	currentPhase = (uint16_t)(synth.vco1.phase >> 8); // remove the fractional phase component.

	// get first sample from the defined LUT for VCO1 and store it in temp1
	temp1 = pgm_read_word(synth.vco1.wave_table_ptr + currentPhase);
	++currentPhase; // go to next sample

	currentPhase &= 0x0fff;	// check if we've gone over the boundary.
				// we can do this because it is a multiple of 2^n.

	// get second sample from the LUT for VCO1 and put it in temp2
	temp2 = pgm_read_word(synth.vco1.wave_table_ptr + currentPhase);

	// interpolate between samples
	// multiply each sample by the fractional distance
	// to the actual location value
	frac = (uint8_t)(synth.vco1.phase & 0x000000ff); // fetch the lower 8bits

	// the optimised assembly code Multiply routines come from Open Music Labs.
	MultiSU16X8toH16Round(temp3, temp2, frac);

	// scaled sample 2 is now in temp3, and since we are done with
	// temp2, we can reuse it for the next result
	MultiSU16X8toH16Round(temp2, temp1, 0xff - frac);
	// temp2 now has the scaled sample 1
	temp2 += temp3; // add the two weighted samples together to get the interpolated value
	// our resultant wave is now in temp2

	// set amplitude with volume
	// multiply our wave by the volume value
	MultiSU16X16toH16Round(outVCO1, temp2, synth.vco1.volume);
	// our VCO1 wave is now in outVCO1
}
else // if there is no note being played, then shift the output value towards mute.
{
	outVCO1 >>= 1;
}
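
The phase accumulator and interpolation steps above can also be tried out on a desktop compiler. The following is a simplified sketch, not the routine used on the Goldilocks Analogue: it uses plain C arithmetic instead of the Open Music Labs assembly multiplies, interpolates with a single difference term rather than two weighted samples, and fills its own saw tooth table. The names `osc_t`, `fill_saw` and `osc_next` are hypothetical.

```c
#include <stdint.h>

#define LUT_SIZE 4096u

typedef struct {
    uint32_t phase;            /* 24.8 fixed point index into the LUT */
    uint32_t phase_increment;  /* 24.8 fixed point step per sample */
} osc_t;

/* Fill a table with one cycle of a rising saw tooth, 16 counts per step. */
void fill_saw(int16_t *lut)
{
    for (uint32_t i = 0; i < LUT_SIZE; ++i)
        lut[i] = (int16_t)((int32_t)(i * 16u) - 32768);
}

/* Advance the oscillator by one sample and return the interpolated value. */
int16_t osc_next(osc_t *osc, const int16_t *lut)
{
    /* Step and wrap: 12 bits of index plus 8 bits of fraction. */
    osc->phase = (osc->phase + osc->phase_increment) & 0x000fffffu;

    uint32_t index = osc->phase >> 8;              /* whole LUT steps */
    int32_t  frac  = (int32_t)(osc->phase & 0xffu); /* fraction, 0..255 */

    int32_t a = lut[index];
    int32_t b = lut[(index + 1u) & (LUT_SIZE - 1u)]; /* wraps at table end */

    /* Linear interpolation between neighbouring table entries. */
    return (int16_t)(a + ((b - a) * frac) / 256);
}
```

With a phase increment of 0x80 (half a LUT step) the first output sits halfway between the first two table entries, and the second output lands exactly on the second entry.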

The next piece of the audio process is to mix the two oscillators VCO1 and VCO2, and then calculate the space delay required. This is where the resonant low pass filter is implemented.

////////////// mix the two oscillators //////////////////
// irrespective of whether a note is playing or not.
// combine the outputs
temp1 = (outVCO1 >> 1) + (outVCO2 >> 1);

///////// Resonant Low Pass Filter here  ///////////////
IIRFilter( &filter, &temp1);

///////// Do the space delay function ///////////////////

// Get the number of buffer items we have, which is the delay.
MultiU16X16toH16Round( buffCount, (uint16_t)(sizeof(int16_t) * DELAY_BUFFER), synth.delay_time);

// Get a sample back from the delay buffer, some time later,
if( ringBuffer_GetCount(&delayBuffer) >= buffCount )
{
	temp0.u8[1] = ringBuffer_Pop(&delayBuffer);
	temp0.u8[0] = ringBuffer_Pop(&delayBuffer);
}
else // or else wait until we have samples available.
{
	temp0.u16 = 0x0000;
}

if (synth.delay_time) // If the delay time is set to be non zero,
{
	// do the space delay function, irrespective of whether a note is playing or not,
	// and combine the output sample with the delayed sample.
	temp1 += temp0.u16;

	// multiply our sample by the feedback value
	MultiSU16X16toH16Round(temp0.u16, temp1, synth.delay_feedback);
}
else
	ringBuffer_Flush(&delayBuffer);	// otherwise flush the buffer if the delay is set to zero.

// and push it into the delay buffer if buffer space is available
if( ringBuffer_GetCount(&delayBuffer) <= buffCount )
{
	ringBuffer_Poke(&delayBuffer, temp0.u8[1]);
	ringBuffer_Poke(&delayBuffer, temp0.u8[0]);
}
// else drop the space delay sample (probably because the delay has been reduced).


////////////// Finally, set the output volume //////////////////
// multiply our wave by the volume value
MultiSU16X16toH16Round(temp2, temp1, synth.master);

// and output wave on both A & B channel, shifted to (+)ve values only because this is what the DAC needs.
*ch_A = *ch_B = temp2 + 0x8000;

This generates the required output waveforms that make the Goldilocks Analogue Synthesiser work.
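
The feedback behaviour of the space delay stage can be seen in isolation with a much smaller model. This is a sketch under stated assumptions rather than the code above: it uses a fixed 4 sample buffer of `int16_t` values instead of the byte-wide ring buffer, it clamps the mixed value (which the original excerpt does not do), and `delay_t` and `delay_step` are hypothetical names.

```c
#include <stdint.h>

#define DELAY_LEN 4u   /* tiny buffer, for illustration only */

typedef struct {
    int16_t  buf[DELAY_LEN];
    uint16_t pos;
} delay_t;

/* Mix the input with the sample written DELAY_LEN calls ago, then store
 * the mix, scaled by feedback / 65536, back into the loop. */
int16_t delay_step(delay_t *d, int16_t in, uint16_t feedback)
{
    int32_t mixed = (int32_t)in + d->buf[d->pos];

    /* Clamp so the sum cannot wrap around the 16 bit range. */
    if (mixed > INT16_MAX) mixed = INT16_MAX;
    if (mixed < INT16_MIN) mixed = INT16_MIN;

    d->buf[d->pos] = (int16_t)((mixed * (int32_t)feedback) / 65536);
    d->pos = (uint16_t)((d->pos + 1u) % DELAY_LEN);

    return (int16_t)mixed;
}
```

Feeding in a single impulse of 1000 with the feedback at half scale (32768) produces echoes of 500 and then 250, each one buffer length apart, which is the decaying loop that the FEEDBACK control adjusts.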

The second order Biquad IIR filter code has been implemented in a general way, enabling multiple filters to be applied to the sample train. Set up for Low Pass, Band Pass, and for High Pass have been implemented. The coefficients and state variables for each filter are maintained in a structure.

//========================================================
// second order IIR -- "Direct Form I"
//  a(0)*y(n) = b(0)*x(n) + b(1)*x(n-1) +  b(2)*x(n-2)
//                   - a(1)*y(n-1) -  a(2)*y(n-2)
// assumes a(0) = IIRSCALEFACTOR = 32 (to increase calculation accuracy).

// http://en.wikipedia.org/wiki/Digital_biquad_filter
// https://www.hackster.io/bruceland/dsp-on-8-bit-microcontroller
// http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt

typedef struct {
	uint16_t sample_rate;	// sample rate in Hz
	uint16_t cutoff;	// normalised cutoff frequency, 0-65535. maximum is sample_rate/2
	uint16_t peak;		// normalised Q factor, 0-65535. maximum is Q_MAXIMUM
	int16_t b0,b1,b2,a1,a2;	// Coefficients in 8.8 format
	int16_t xn_1, xn_2;	//IIR state variables
	int16_t yn_1, yn_2;	//IIR state variables
} filter_t;

void setIIRFilterLPF( filter_t *filter ) // Low Pass Filter Setting
{
	if ( !(filter->sample_rate) )
		filter->sample_rate = SAMPLE_RATE;

	if ( !(filter->cutoff) )
		filter->cutoff = UINT16_MAX >> 1; // 1/4 of sample rate = filter->sample_rate>>2

	if ( !(filter->peak) )
		filter->peak =  (uint16_t)(M_SQRT1_2 * UINT16_MAX / Q_MAXIMUM); // 1/sqrt(2) effectively

	double frequency = ((double)filter->cutoff * (filter->sample_rate>>1)) / UINT16_MAX;
	double q = (double)filter->peak * Q_MAXIMUM / UINT16_MAX;
	double w0 = (2.0 * M_PI * frequency) / filter->sample_rate;
	double sinW0 = sin(w0);
	double cosW0 = cos(w0);
	double alpha = sinW0 / (q * 2.0f);
	double scale = IIRSCALEFACTOR / (1 + alpha); // a0 = 1 + alpha

	filter->b0	= \
	filter->b2	= float2int( ((1.0 - cosW0) / 2.0) * scale );
	filter->b1	= float2int(  (1.0 - cosW0) * scale );

	filter->a1	= float2int( (-2.0 * cosW0) * scale );
	filter->a2	= float2int( (1.0 - alpha) * scale );
}

// interim values in 24.8 format
// returns y(n) in place of x(n)
void IIRFilter( filter_t *filter, int16_t * xn )
{
	int32_t yn;	// current output
	int32_t  accum;	// temporary accumulator

	// sum the 5 terms of the biquad IIR filter
	// and update the state variables
	// as soon as possible
	MultiS16X16to32(yn,filter->xn_2,filter->b2);
	filter->xn_2 = filter->xn_1;

	MultiS16X16to32(accum,filter->xn_1,filter->b1);
	yn += accum;
	filter->xn_1 = *xn;

	MultiS16X16to32(accum,*xn,filter->b0);
	yn += accum;

	MultiS16X16to32(accum,filter->yn_2,filter->a2);
	yn -= accum;
	filter->yn_2 = filter->yn_1;

	MultiS16X16to32(accum,filter->yn_1,filter->a1);
	yn -= accum;

	filter->yn_1 = yn >> (IIRSCALEFACTORSHIFT + 8); // divide by a(0) = 32 & shift to 16.0 bit outcome from 24.8 interim steps

	*xn = filter->yn_1; // being 16 bit yn, so that's what we return.
}
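
For experimenting with the filter arithmetic away from the AVR, the per-sample step can be written in portable C. This sketch assumes that the coefficients are scaled by 2^13 in total (5 bits for a(0) = 32 plus 8 fractional bits), which is my reading of the shift used above. It uses a 64 bit accumulator and a division rather than the assembly multiply macros and right shift, and `biquad_t` and `biquad_step` are hypothetical names.

```c
#include <stdint.h>

#define COEFF_SHIFT 13   /* assumed: 5 bits for a(0) = 32, plus 8 fractional bits */

typedef struct {
    int16_t b0, b1, b2, a1, a2;   /* coefficients scaled by 2^COEFF_SHIFT */
    int16_t xn_1, xn_2;           /* previous two inputs */
    int16_t yn_1, yn_2;           /* previous two outputs */
} biquad_t;

/* One Direct Form I step:
 * y(n) = b0*x(n) + b1*x(n-1) + b2*x(n-2) - a1*y(n-1) - a2*y(n-2) */
int16_t biquad_step(biquad_t *f, int16_t xn)
{
    int64_t acc = (int64_t)f->b0 * xn
                + (int64_t)f->b1 * f->xn_1
                + (int64_t)f->b2 * f->xn_2
                - (int64_t)f->a1 * f->yn_1
                - (int64_t)f->a2 * f->yn_2;

    /* Update the state variables for the next sample. */
    f->xn_2 = f->xn_1;
    f->xn_1 = xn;
    f->yn_2 = f->yn_1;

    /* Remove the coefficient scaling (truncates towards zero). */
    f->yn_1 = (int16_t)(acc / (1 << COEFF_SHIFT));
    return f->yn_1;
}
```

Setting b0 to 8192 with all other coefficients at zero passes the input through unchanged, which is a quick way to confirm the scaling. Setting b0 = 4096 and a1 = -4096 gives a simple smoother with unity gain at DC, so a constant input of 1000 settles within a count or two of 1000.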

Hardware Implementation

The Goldilocks Analogue Prototype 3 is working very well, and it has resolved some of the issues of the second prototype. Using the USART1 MSPIM mode to drive the MCP4822 DAC allows the GUI to use the SPI bus for the Gameduino 2 GUI without conflicts. This is the only way that the rigorous timing for audio output can be maintained, given the heavy SPI usage required to drive the GPU co-processor.

Goldilocks Analogue - Prototype 3

The Atmel AVR ATmega1284p in the Goldilocks Analogue Prototype 3 is running at 24.576MHz. This is significantly above the specification (20MHz at 5V), but remembering that the specification for AVR ATmega devices covers an extended temperature range (that would kill a human) and it is unlikely that the Goldilocks Analogue would be used in extreme temperature situations, I’ve had no problems with this processor frequency to date.

There are two reasons for over-clocking the ATmega1284p. The first is that it is simply not possible to make the required calculations within the time budget available at the maximum specified CPU frequency of 20MHz, let alone at the standard Arduino rate of 16MHz.

The second reason is related to the generation of exact audio sampling frequencies. With a CPU clock of 24.576MHz, the 8 bit timer with pre-scaling can generate EXACT audio sample timing at 8kHz, 12kHz, 16kHz, 32kHz, and 48kHz. Using a 16 bit timer, we can also generate very close approximations to 44.1kHz, if required.
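
The timer arithmetic behind those exact rates can be sketched as follows. It assumes the 8 bit timer runs in CTC mode, where the interrupt rate is F_CPU / (prescale * (compare + 1)), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Compare value for an 8 bit timer in CTC mode that gives exactly rate_hz,
 * or -1 if no exact value fits. Assumes rate = f_cpu / (prescale * (compare + 1)). */
int16_t timer8_compare(uint32_t f_cpu, uint32_t prescale, uint32_t rate_hz)
{
    uint32_t ticks = prescale * rate_hz;

    if (f_cpu % ticks != 0u)
        return -1;                    /* not an exact division */

    uint32_t count = f_cpu / ticks;
    if (count == 0u || count > 256u)
        return -1;                    /* does not fit in 8 bits */

    return (int16_t)(count - 1u);
}

/* Nearest whole number of CPU clocks per sample, for a 16 bit timer
 * running without pre-scaling. */
uint32_t nearest_count(uint32_t f_cpu, uint32_t rate_hz)
{
    return (f_cpu + rate_hz / 2u) / rate_hz;
}
```

At 24.576MHz a prescaler of 8 gives compare values of 255, 191, 95 and 63 for 12kHz, 16kHz, 32kHz and 48kHz, and a prescaler of 64 gives 47 for 8kHz. At 16MHz there is no exact value for 12kHz with a prescaler of 8. For 44.1kHz the nearest count is 557 clocks, which works out to roughly 44.12kHz.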

The routine to transfer samples does not need to consume precious 16 bit timer resources, which are useful to produce PWM for motor control. Retaining the capability to manage two motors (using the two 16 bit timers) is a fairly important outcome.

The interrupt for generating the wave forms does only two things: it writes the sample values to the DAC, and then calculates the new sample value for the next sample time. The samples are written to the DAC first to ensure that the output is not jittered by the possibility of variable processing time in the audio handler routine. This can happen if (for example) one of the VCOs is turned off, removing the sample calculation code from the code execution path.

ISR(TIMER0_COMPA_vect) __attribute__ ((hot, flatten));
ISR(TIMER0_COMPA_vect)
{
	// MCP4822 data transfer routine
	// move data to the MCP4822 - done first for regularity (reduced jitter).
	// &'s are necessary on data_in variables
	DAC_out (ch_A_ptr, ch_B_ptr);

	// audio processing routine - do whatever processing on input is required - prepare output for next sample.
	// Fire the global audio handler, if set.
	if (audioHandler!=NULL)
		audioHandler(ch_A_ptr, ch_B_ptr);
}

XBee Walkie Talkie

I’m building an advanced Arduino clone based on the AVR ATmega1284p MCU with some special features including a 12 bit DAC MCP4822, headphone amplifier, 2x SPI Memory (SRAM, EEPROM), and a SD Card. There are many real world applications for analogue outputs, but because the Arduino platform doesn’t have integrated DAC capability there are very few published applications for analogue signals. A Walkie Talkie is one example of using digital and analogue together to make a simple but very useful project.

Two Goldilocks Analogue prototypes with XBee radios, and Microphone amplifiers.

The actual Walkie Talkie functionality is really only a few lines of code, but it is built on a foundation of analogue input (sampling), analogue output on the SPI bus to the MCP4822 DAC, sample timing routines, and the XBee digital radio platform. Let’s start from the top and then dig down through the layers.

XBee Radio

I am using XBee Pro S2B radios, configured to communicate point to point. For the XBee Pro there needs to be one radio configured as the Coordinator, and the other as a Router. There are configuration guides on the Internet.

I have configured the radios to wait the maximum inter-character time before sending a packet, which implies that the packets will be sent only when full (84 bytes). This maximises the radio throughput. Raw throughput is 250 kbit/s, but the actual user data rate is limited to about 32 kbit/s. This has an impact on the sampling rate and therefore quality of speech that can be transmitted.

Using 8 bit samples, I have found that about 3 kHz sampling generates about as much data as can be transmitted without compression. I’m leaving compression for another project.

The XBee radios are configured in AT mode, which acts as a transparent serial pipe between the two endpoints. This is the simplest way to connect two devices via digital radio. And it allowed me to do simple testing, using wire, before worrying about whether the radio platform was working or not.

XBee Packet Reception in Purple

Looking at the tracing of a logic analyser, we can see the XBee data packets arriving on the (purple) Rx line of the serial port. The received packet data is stored into a ring buffer, and played out at a constant rate. I have allowed up to 255 bytes in the receive ring buffer, and this will be sufficient because the XBee packet size is 84 bytes.
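
The receive path described above can be pictured with a small byte ring buffer. This is only a minimal sketch under my own assumptions: the type and function names are hypothetical, and it is not the xSerial implementation used in the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256U /* one slot stays empty, so 255 bytes are usable */

typedef struct {
    uint8_t data[RING_SIZE];
    uint8_t head; /* next write position, wraps naturally at 256 */
    uint8_t tail; /* next read position */
} ring_t;

/* Store one received byte. Returns 0 if the buffer is full. */
static int ring_put(ring_t *r, uint8_t byte)
{
    assert(r != NULL);
    uint8_t next = (uint8_t)(r->head + 1U);
    if (next == r->tail)
        return 0;
    r->data[r->head] = byte;
    r->head = next;
    return 1;
}

/* Fetch the oldest byte. Returns 0 if the buffer is empty. */
static int ring_get(ring_t *r, uint8_t *byte)
{
    assert(r != NULL && byte != NULL);
    if (r->head == r->tail)
        return 0;
    *byte = r->data[r->tail];
    r->tail = (uint8_t)(r->tail + 1U);
    return 1;
}
```

A full 84 byte packet fits three times over in 255 bytes, so a new packet can arrive while the previous one is still being played out one sample at a time.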

The samples to be transmitted to the other device are transmitted on the (blue) Tx line, more or less in each sample period even though they are buffered before transmission. The XBee radio buffers these bytes for up to 0xFF inter-symbol periods (configuration), and only transmits a packet to the other endpoint when it has a full packet.

Sampling Rate

Looking at the bit budget for the transmission link, we need to calculate how much data can be transmitted without overloading the XBee radio platform, and causing sample loss. As we are not overtly compressing the voice samples, we have 8 bit samples times 3,000 Hz sampling or 24 kbit/s to transmit. This seems to work pretty well. I have tried 4 kHz sampling, but this is too close to the theoretical maximum, and doesn’t work too effectively.
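
The bit budget can be checked with a few lines of arithmetic. This sketch assumes the approximate 32 kbit/s user data rate quoted earlier and ignores any serial framing overhead; the names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define XBEE_USER_RATE_BPS 32000UL /* approximate user data rate from the text */

/* Raw bit rate of an uncompressed sample stream. */
static uint32_t stream_bps(uint32_t sample_rate_hz, uint32_t bits_per_sample)
{
    return sample_rate_hz * bits_per_sample;
}

/* Share of the assumed link capacity used by the stream, in percent. */
static uint32_t link_load_percent(uint32_t sample_rate_hz, uint32_t bits_per_sample)
{
    assert(bits_per_sample > 0);
    return (uint32_t)((stream_bps(sample_rate_hz, bits_per_sample) * 100UL)
                      / XBEE_USER_RATE_BPS);
}
```

At 3,000 Hz the stream uses about 75% of the link, while 4,000 Hz would use all of it and leave no margin, which matches the behaviour observed.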

Sample rate of 3,000 Hz seems to be the optimum.

Looking at the logic analyser, we can see the arrival of a packet of bytes commencing with 0x7E and 0x7C on the Rx line. Both the Microphone amplifier and the DAC output are biased around 0x7F(FF), so we can read that the signal levels captured and transmitted here are very low. The sample rate shown is 3,000 Hz.

Sample Processing

I have put a “ping” on one output to capture when the sampling interrupt is being processed (yellow). We can see that the amount of time spent in the interrupt processing is very small for this application, relative to the total time available. Possibly some kind of data compression could be implemented.

DAC and sample processing

During the sampling interrupt, there are two major activities: generating an audio output by placing a sample onto the DAC, and then reading the ADC to capture an audio sample and transmit it to the USART buffer.

This is done by the audioCodec_dsp function, which is called from the code in a timer interrupt.

void audioCodec_dsp( uint16_t * ch_A, uint16_t * ch_B)
{
  /*----- Audio Rx -----*/
  if ( xSerialGetChar( &xSerialPort, &mod7_value.u8[1] ) ) // receive the most significant 8 bits of the sample from the Rx ring buffer.
  {
    mod7_value.u8[0] = 0x00; // and pad out the least significant 8 bits with null.
  }
  else
  {
    mod7_value.u16 = 0x7FFF; // or put nulled signal on the output
  }
  *ch_A = *ch_B = mod7_value.u16; // write the sample out on A & B channel.

  /*----- Audio Tx -----*/
  AudioCodec_ADC( &mod7_value.u16 ); // get 10 bits of sample from the ADC with reference 2.56V Maximum.
  xSerialPutChar( &xSerialPort, mod7_value.u8[1]); // transmit just the most significant 8 bits of the sample to the Tx buffer.
}

I am using the AVR 8 bit Timer0 to generate the regular sample intervals, by triggering an interrupt. By using a MCU FCPU frequency which is a binary multiple of the standard audio frequencies, we can generate accurate reproduction sampling rates by using only the 8 bit timer with a clock prescaler of 64. To generate odd audio frequencies, like 44,100 Hz, the 16 bit Timer1 can be used to get sufficient accuracy without requiring a clock prescaler.
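
To see how close a 16 bit timer without prescaler can get to 44,100 Hz, this sketch picks the nearest whole number of CPU cycles per sample and reports the resulting rate error. The function names are hypothetical and the calculation is only an illustration of the idea.

```c
#include <assert.h>
#include <stdint.h>

#define F_CPU_HZ 24576000UL /* CPU clock named in the text */

/* Nearest whole number of CPU cycles per sample (no prescaler). */
static uint32_t nearest_divider(uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (uint32_t)((F_CPU_HZ + sample_rate_hz / 2UL) / sample_rate_hz);
}

/* Error of the achieved rate against the wanted rate, in parts per million. */
static int32_t rate_error_ppm(uint32_t sample_rate_hz)
{
    uint64_t divider = nearest_divider(sample_rate_hz);
    uint64_t scaled = ((uint64_t)F_CPU_HZ * 1000000ULL)
                      / (divider * (uint64_t)sample_rate_hz);
    return (int32_t)((int64_t)scaled - 1000000LL);
}
```

For 44,100 Hz the nearest divider is 557 cycles, which gives roughly 44,122 Hz, or about 500 ppm (0.05%) fast. For 48,000 Hz the divider is 512 and the error is zero.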

The ATmega1284p ADC is set to free-run mode, and its clock is scaled down to 192 kHz. While this is close to the maximum acquisition speed documented for the ATmega ADC, it is still within the specification for 8 bit samples.

ISR(TIMER0_COMPA_vect) __attribute__ ((hot, flatten));
ISR(TIMER0_COMPA_vect)
{
#if defined(DEBUG_PING)
  // start mark - check for start of interrupt - for debugging only (yellow trace)
  PORTD |= _BV(PORTD7); // Ping IO line.
#endif

  // MCP4822 data transfer routine
  // move data to the MCP4822 - done first for regularity (reduced jitter).
  DAC_out (ch_A_ptr, ch_B_ptr);

  // audio processing routine - do whatever processing on input is required - prepare output for next sample.
  // Fire the global audio handler which is a call-back function, if set.
  if (audioHandler!=NULL)
    audioHandler(ch_A_ptr, ch_B_ptr);

#if defined(DEBUG_PING)
  // end mark - check for end of interrupt - for debugging only (yellow trace)
  PORTD &= ~_BV(PORTD7);
#endif
}

This interrupt takes 14 us to complete, and is very short relative to the 333 us we have for each sample period. This gives us plenty of time to do other processing, such as running a user interface or further audio processing.

SPI Transaction

At the final level of detail, we can see the actual SPI transaction to output the incoming sample to the MCP4822 DAC.

SPI MCP4822 DAC Transaction

As I have built this application on the Goldilocks Analogue Prototype 2 which uses the standard SPI bus, the transaction is normal. My later prototypes are using the Master SPI Mode on USART 1 of the ATmega1284p, which slightly accelerates the SPI transaction through double buffering, and frees the normal SPI bus for simultaneous reading or writing to the SD Card or SPI Memory, for audio streaming. In the Walkie Talkie application there is no need to capture the audio, so there’s no down side to using the older prototypes and the normal SPI bus.


void DAC_out(const uint16_t * ch_A, const uint16_t * ch_B)
{
  DAC_command_t write;

  if (ch_A != NULL)
  {
    write.value.u16 = (*ch_A) >> 4;
    write.value.u8[1] |= CH_A_OUT;
  }
  else // ch_A is NULL so we turn off the DAC
  {
    write.value.u8[1] = CH_A_OFF;
  }

  SPI_PORT_SS_DAC &= ~SPI_BIT_SS_DAC; // Pull SS low to select the Goldilocks Analogue DAC.
  SPDR = write.value.u8[1]; // Begin transmission ch_A.
  while ( !(SPSR & _BV(SPIF)) );
  SPDR = write.value.u8[0]; // Continue transmission ch_A.

  if (ch_B != NULL) // start processing ch_B while we're doing the ch_A transmission
  {
    write.value.u16 = (*ch_B) >> 4;
    write.value.u8[1] |= CH_B_OUT;
  }
  else // ch_B is NULL so we turn off the DAC
  {
    write.value.u8[1] = CH_B_OFF;
  }

  while ( !(SPSR & _BV(SPIF)) ); // check we've finished ch_A.
  SPI_PORT_SS_DAC |= SPI_BIT_SS_DAC; // Pull SS high to deselect the Goldilocks Analogue DAC, and latch value into DAC.

  SPI_PORT_SS_DAC &= ~SPI_BIT_SS_DAC; // Pull SS low to select the Goldilocks Analogue DAC.
  SPDR = write.value.u8[1]; // Begin transmission ch_B.
  while ( !(SPSR & _BV(SPIF)) );
  SPDR = write.value.u8[0]; // Continue transmission ch_B.
  while ( !(SPSR & _BV(SPIF)) ); // check we've finished ch_B.
  SPI_PORT_SS_DAC |= SPI_BIT_SS_DAC; // Pull SS high to deselect the Goldilocks Analogue DAC, and latch value into DAC.
}
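
For readers without the header file to hand, the 16 bit command word can be sketched as below. The bit positions follow my reading of the MCP4822 datasheet; the macro and function names are hypothetical and are not the CH_A_OUT and CH_B_OUT definitions used above, whose gain setting is not shown in this post.

```c
#include <assert.h>
#include <stdint.h>

#define MCP4822_CH_B    0x8000U /* bit 15: 0 selects DAC A, 1 selects DAC B */
#define MCP4822_GAIN_1X 0x2000U /* bit 13: 1 = 1x gain, 0 = 2x gain */
#define MCP4822_ACTIVE  0x1000U /* bit 12: 1 = output active, 0 = shut down */

/* Build an active-output command word from a 16 bit sample. */
static uint16_t mcp4822_command(uint16_t sample16, int channel_b, int gain_1x)
{
    uint16_t word = (uint16_t)(sample16 >> 4); /* keep the top 12 bits */
    assert((word & 0xF000U) == 0U);            /* control bits start clear */
    word |= MCP4822_ACTIVE;
    if (channel_b)
        word |= MCP4822_CH_B;
    if (gain_1x)
        word |= MCP4822_GAIN_1X;
    return word;
}
```

The high byte of the word is sent first, which is why the routine above writes u8[1] to the SPI data register before u8[0] on the little-endian AVR.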

Wrap Up

Using a few pre-existing tools and a few lines of code, it is possible to quickly build a digitally encrypted walkie talkie, capable of communicating (understandable, but not high quality) voice. And, there ain’t no CB truckers going to be listening in on the family conversations going forward.

This was a test of adding microphone input based on the MAX9814 to the Goldilocks Analogue. I will be revising the Prototype 3 and will add in a microphone amplification circuit to support applications needing audio input, like this walkie talkie example, or voice changers, or vocal control music synthesizers.

I’m also running the ATmega1284p devices at the increased frequency of 24.576 MHz, over the standard rate of 20 MHz. This specific frequency allows very precise reproduction of audio samples from 48 kHz right down to 4 kHz (or even down to 1,500 Hz). The extra MCU clock cycles per sample period are very welcome when it comes to generating synthesised music.

Code as usual on Sourceforge AVR freeRTOS. Also, a call out to Shuyang at SeeedStudio, whose OPL is awesome, and is the source of many components and PCBs.

Implementing NASA EEFS on AVR ATmega

I am building a variant of the Arduino platform which will have an analogue output capability in the form of a dual channel DAC, called Goldilocks Analogue. The DAC can be used to generate variable DC voltage levels that might be used as part of a PID control system, and it can also generate AC voltages up to about 50kHz if it can be fed with sufficient samples to produce the required signal. To generate a 44.1kHz audio signal the DAC has to receive a stream of data, with a new sample about every 22.7us without fail.

44.1kHz samples using USART MSPI output.

Finding an answer to the question of how to reliably stream data to the DAC is the background to this post.

Looking for a way to structure and assemble a combination of many WAV files on a host PC for storage onto the AVR ATmega MCU, I needed a system that would support:

  • Editing and assembly of files on a host PC (Linux, Windows, Mac), into a package.
  • Transferring a package of files to the AVR ATmega (Arduino) device very simply.
  • Reading and writing files on the storage medium very quickly, and without jitter.
  • Simple implementation in the avr-libc environment.

Initially I was looking at using the FAT File System on an SD Card to provide the required capability, but I found that SD Cards are quite slow when writing data to their FLASH medium, often taking 100ms or more to complete a write cycle. An SD Card read cycle also takes quite a long time when the FAT file system must be inspected prior to reading or writing a specific block of information. The SD Card is great for storing megabytes of information, but is not optimal for jitter free read and write applications.

So I started looking at chip storage based on the SPI bus as a mechanism to store large numbers of samples for playback, or to store large amounts of acquired data samples. There are many alternatives using different technologies for SPI storage devices. These range from EEPROM storage, through to SRAM and also newer FRAM technologies. Storage capabilities with up to 1Mbit seem to be quite good value. For my application 1Mbit of storage would allow about 16 seconds of reasonable quality audio to be retrieved with minimal issues for complexity, jitter, and delay.
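
The 16 second figure can be reproduced with a short calculation. This sketch assumes 8 bit samples at 8kHz, which is my guess at what "reasonable quality" means here, and takes 1Mbit as 1,048,576 bits.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_MBIT 1048576UL /* 1Mbit device, i.e. 131,072 bytes */

/* Whole seconds of uncompressed audio that fit in the given storage. */
static uint32_t playback_seconds(uint32_t storage_bits,
                                 uint32_t sample_rate_hz,
                                 uint32_t bits_per_sample)
{
    assert(sample_rate_hz > 0 && bits_per_sample > 0);
    return storage_bits / (sample_rate_hz * bits_per_sample);
}
```

Under these assumptions one device holds about 16 seconds, and doubling the sample rate to 16kHz halves that to about 8 seconds.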

So I redesigned the Goldilocks Analogue to incorporate space to have two SPI memory (EEPROM, FRAM, SRAM) devices on the board.

Goldilocks Analogue - 2x SPI Memory Devices

Goldilocks Analogue

Implementing a method to read and write bytes to these storage devices is very straightforward. There are many libraries available supporting the SPI storage devices of various types. But none of them supported assembling a package of files on a host PC, and then transferring this to the AVR device in a simple manner. So the hunt for a solution to this issue brought me to the NASA EEFS solution.

NASA EEFS

NASA has been releasing their Core Flight System with Open Source licencing over the past few years. The Core Flight System (CFS) is a recognition that many satellite and deep space missions have very common core requirements and that successive missions were simply cloning previous mission software and then owning changes going forward, with learning being improved in a serial manner. The CFS enabled missions that were developing in parallel to push improvements in the platform CFS code back into the general solution for peer and successive missions to benefit from.

The CFS is layered and each layer hides its implementation, enabling the internals of the layer to be changed without affecting other layers’ implementation. Within the CFS Platform Abstraction Layer there is a module designed to support the management of flight software packages on non-volatile storage, called the EEPROM File System (EEFS).

The EEFS is a very small (approximately 2% of the flight software) piece of code that implements the storage and retrieval of all flight system software from flash storage devices. It was designed by NASA GSFC to support outcomes similar to those I needed for my application:

  • Generate a flight software (or general embedded system) executable image on the development workstation. This feature allows the embedded file system to be generated with a known CRC and loaded on to the target processor as a single image. This is a big advantage over formatting a file system on the image, then transferring each file to the file system on the target.
  • Prove that the file system is correct and reliable. Because the EEPROM file system is simple, the code size is small, making it easy to review and find errors.
  • Patch the files in the file system. Due to the simple layout of the EEPROM file system, it is very easy to patch the files in the file system, if the need arises. This can be helpful in deeply embedded systems such as satellite data systems.
  • Dump and understand the file system format. Because the EEPROM file system is simple, it is easy to dump the contents of the EEPROM or PROM memory and determine the contents of each file.

The EEFS is basically a configurable slot-based file system. The file system can be pre-configured with a certain number of empty files of known sizes, or known files with specific “spare bytes”, and written with a CRC into an image. The File Allocation Table is a fixed size and contains a fixed number of file slots, together with the location and maximum size for each slot. The File Headers for each slot contain all the information about each File. Changing a file does not impact the FAT, and therefore does not affect other files in the File System.
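
The slot idea can be illustrated with a deliberately simplified table. The structures and names below are hypothetical and do not reproduce the real EEFS on-device layout; they only show why a fixed table of offsets and maximum sizes lets one file change without disturbing the others.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SLOTS 4U /* hypothetical; not an EEFS constant */

typedef struct {
    uint32_t offset;   /* byte offset of the slot within the device */
    uint32_t max_size; /* bytes reserved for this slot */
} sketch_slot_t;

typedef struct {
    uint32_t      used_slots;
    sketch_slot_t slot[SKETCH_MAX_SLOTS];
} sketch_fat_t;

/* Lay the slots out back to back after the table.
 * Returns the first unused offset. */
static uint32_t sketch_layout(sketch_fat_t *fat, const uint32_t *sizes,
                              uint32_t count)
{
    assert(fat != NULL && sizes != NULL);
    uint32_t next = (uint32_t)sizeof(sketch_fat_t);
    fat->used_slots = 0;
    for (uint32_t i = 0; i < count && i < SKETCH_MAX_SLOTS; i++) {
        fat->slot[i].offset = next;
        fat->slot[i].max_size = sizes[i];
        next += sizes[i];
        fat->used_slots++;
    }
    return next;
}

/* A rewritten file stays in place as long as it fits its reserved size,
 * so the table and every other slot remain untouched. */
static int sketch_fits(const sketch_fat_t *fat, uint32_t index,
                       uint32_t new_size)
{
    assert(fat != NULL);
    return index < fat->used_slots && new_size <= fat->slot[index].max_size;
}
```

The cost of this simplicity is that slot sizes must be chosen up front, which is why spare bytes are reserved when the image is generated.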

An EEFS image is created with a tool called geneepromfs, which is a command line tool compiled for the respective host upon which it is used. It reads an input file specifying the files that are to be assembled into the EEFS image, together with the number of empty file slots and their size, and it outputs a complete EEFS image ready to be burnt on the EEPROM, FRAM, or SRAM storage device.

So the EEFS looks like a perfect solution to my requirements. Let’s go to Github and clone the EEFS repository, and get started.

AVR Implementation of EEFS

The EEFS code is supplied for VxWorks or RTEMS platforms, along with a standalone implementation design for bare metal designs. To get the standalone design to work with the AVR ATmega, and my freeRTOS platform of choice, there were two major pieces of work.

Firstly, I had to develop a generalised SPI interface layer that would allow me to select the actual SPI device installed on the Goldilocks Analogue at compile time. This was necessary because each individual SPI storage device has slightly different command requirements (EEPROM ready check, different address byte numbers), and it made good sense to unify the interface into a single function with compile time options.
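
One way to picture the compile time selection is a helper that builds the command and address bytes for a transfer. This is a sketch under assumptions: the macro and function names are hypothetical, and the 0x03 read and 0x02 write opcodes are the values commonly used by serial SRAM, EEPROM, and FRAM parts rather than something taken from the project code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef SPI_MEM_ADDR_BYTES
#define SPI_MEM_ADDR_BYTES 3 /* hypothetical option: 2 or 3 address bytes */
#endif

#define SPI_MEM_READ  0x03U /* common read opcode (assumed for the device) */
#define SPI_MEM_WRITE 0x02U /* common write opcode (assumed for the device) */

/* Fill 'out' with the opcode followed by the address, most significant
 * byte first. Returns the number of header bytes to clock out. */
static size_t spi_mem_header(uint8_t opcode, uint32_t address, uint8_t *out)
{
    assert(out != NULL);
    size_t n = 0;
    out[n++] = opcode;
#if SPI_MEM_ADDR_BYTES == 3
    out[n++] = (uint8_t)(address >> 16);
#endif
    out[n++] = (uint8_t)(address >> 8);
    out[n++] = (uint8_t)address;
    return n;
}
```

An EEPROM build would add a busy check before the header is sent, which is the kind of per-device difference the unified interface hides.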

Secondly, I needed to revise the pointer calculations inherent in the EEFS code. The NASA GSFC code is based on the availability of 32 bit pointers, and does 32 bit calculations to locate information within the file system. But, on the AVR ATmega platform the inherent pointer size is 16 bits, and many of the advanced pointer arithmetic calculations used in the code would fail.
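
The failure mode is easy to reproduce on any machine by forcing the arithmetic into 16 bits. The two helper names below are hypothetical; the point is only that device addresses should be carried as 32 bit integers rather than as native AVR pointers.

```c
#include <assert.h>
#include <stdint.h>

/* What happens when a device address is squeezed into a 16 bit pointer:
 * anything above 0xFFFF silently wraps around. */
static uint16_t addr16_add(uint16_t base, uint32_t offset)
{
    return (uint16_t)(base + offset);
}

/* Carrying the address as a plain 32 bit integer keeps the full range. */
static uint32_t addr32_add(uint32_t base, uint32_t offset)
{
    assert(base <= UINT32_MAX - offset); /* no 32 bit overflow expected */
    return base + offset;
}
```

Adding 0x2000 to 0xF000 gives 0x1000 in the 16 bit version but the intended 0x11000 in the 32 bit version, and a 1Mbit device spans addresses up to 0x1FFFF.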

When I finished the major work, I reduced the return values of most functions to 1 byte error codes, which shaved almost 2,000 bytes of program code off the end result. On the AVR ATmega platform, it is well worth saving 2,000 bytes.

I have built a simple FRAM test program that can write files from a SD Card to the EEFS SPI device, and then edit (read, modify, write) files on the EEFS SPI device for test purposes. This shows how the resulting EEFS library can be best used.

As usual code on Sourceforge AVRfreeRTOS, and also forked on AVR EEFS Github.

Wiznet W5500 and ioShield-A What’s old is new again!

It seems that the Wiznet W5100 Ethernet Shield has been around since the very beginning of the Arduino movement. Its integrated TCP and UDP IP stack has enabled solid standardised networking from the start.

The hardware implementation of the BSD sockets interface abstracted the complex process of generating compliant IP and made sure that it was done correctly. Buffering network packets in integrated packet RAM, rather than on the host AVR micro-controller, was a great thing when you only had 1kB of RAM available, as the original ATmega168 Arduino devices provided. For the current generation of Arduino devices, nothing has really changed.

Recently, I wrote about the new W5200 iteration of the Wiznet integrated IP controller, and how it is significantly better in performance and features than the older W5100 version.

Now, I have my hands on the latest version. The W5500 on an ioShield-A from Wiznet.

W5500 on ioShield-A from Wiznet

TL;DR. The W5500 is the latest and best iteration of hardware IP socket Ethernet devices from Wiznet, and also the easiest for hobbyists to implement. As usual, my code is here at AVRfreeRTOS.

So what are the key differences between the models, and how do they perform? I’ll try to look at three important aspects to using these devices; cost, implementation or how are they to use, and performance.

Cost

As Wiznet has iterated through the W5x00 series it has significantly reduced the manufacturing cost. The W5100 was produced in a 0.18um process, as was the W5200. In comparison, the new W5500 is produced in a 0.13um process, with a 1.2V core. Between the W5100 and W5200 Wiznet doubled the size of the internal packet RAM to 16kByte, but significantly reduced the number of IO pins and drivers, to make the W5200 (and W5500) SPI bus specialists.

The result of these cost reduction processes can be seen in the pricing information from Digikey. The price per 1,000 for W5100 is $4.32 each, whereas the W5500 is $2.64 each. In a commercial project, or even a significant crowd funded project, this can have a significant impact on the bill of materials.

Digikey W5100 Pricing

Digikey W5200 Pricing

Digikey W5500 Pricing

Implementation

The W5500 is available in 48LQFP which is aimed squarely at low tech solutions. The W5200 was only available in 48QFN which made it more difficult to use the chip in low volume applications. While most people will purchase the W5500 on an Arduino Shield or similar platform, having the LQFP package does make it easier for the companies producing the Shields and modules for the hobbyist.

The three Wiznet W5x00 Generations

In terms of implementation differences between the W5100 and the W5200, I’ve already written on the extensive improvements to the SPI bus interface, both in terms of outright speed, and in the protocol improvements, doubling the packet RAM to 16kBytes, and doubling the number of sockets available to 8. The W5500 takes these improvements and finesses them to get an even better result.

Wiznet have prepared a summary of the differences between W5500 and W5200. The SPI protocol for the W5500 has been simplified, omitting the frame length field. The end of transmission is simply indicated by deselecting the chip with the SPI Chip Select line. This is an obvious and simple improvement.
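
To make the simplified frame concrete, this sketch builds the three header bytes that precede the data. The layout (16 bit offset address, then a control byte holding a 5 bit block select, a read/write bit, and a 2 bit operation mode) follows my reading of the W5500 datasheet; the function name and constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define W5500_RWB_WRITE 0x04U /* control bit 2: 1 = write, 0 = read */
#define W5500_OM_VDM    0x00U /* variable length mode, ended by chip select */

/* Build the address and control bytes sent before the data phase. */
static void w5500_header(uint16_t offset, uint8_t block, int is_write,
                         uint8_t out[3])
{
    assert(block <= 0x1FU); /* block select is a 5 bit field */
    out[0] = (uint8_t)(offset >> 8);
    out[1] = (uint8_t)(offset & 0xFFU);
    out[2] = (uint8_t)((uint8_t)(block << 3)
                       | (is_write ? W5500_RWB_WRITE : 0U)
                       | W5500_OM_VDM);
}
```

There is no length field anywhere in the header. The host simply keeps clocking data bytes and raises the Chip Select line when it is done.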

The packet RAM on the W5500 has been made available as general storage for the host MCU. Both Tx and Rx RAM are available for use as required. This means that it is possible to augment the RAM on an Arduino Uno by 16kBytes (8kB Tx and 8kB Rx) which is 8x more than the ATmega328p has in total, and still maintain the same sized buffers available in the W5100, for example.

The Tx and Rx RAM is arranged in blocks associated with the socket, and the entire 16 bit address space is rolled out onto the configured RAM for each socket. This means that when writing or reading the W5500 Tx and Rx RAM the user doesn’t need to be concerned with masking the maximum physical RAM, and addressing roll-over is gracefully handled. This is unlike the W5100 and W5200, where RAM addressing would have to be masked against the configured physical RAM. If this sounds complicated, just check the datasheet where it is explained in a nice diagram.
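
The masking that the older devices require looks roughly like this. It is a sketch with hypothetical function names, and the base address and buffer size in the usage note are example values only.

```c
#include <assert.h>
#include <stdint.h>

/* Physical address for a socket pointer on a device that needs masking.
 * The buffer size must be a power of two. */
static uint16_t masked_address(uint16_t base, uint16_t buffer_size,
                               uint16_t pointer)
{
    assert(buffer_size != 0U && (buffer_size & (buffer_size - 1U)) == 0U);
    return (uint16_t)(base + (pointer & (uint16_t)(buffer_size - 1U)));
}

/* Bytes that can be transferred before the buffer wraps, so a longer
 * transfer has to be split in two. */
static uint16_t bytes_before_wrap(uint16_t buffer_size, uint16_t pointer)
{
    assert(buffer_size != 0U && (buffer_size & (buffer_size - 1U)) == 0U);
    return (uint16_t)(buffer_size - (pointer & (uint16_t)(buffer_size - 1U)));
}
```

With a 2kB buffer at an example base of 0x4000, a pointer of 0x0805 maps to 0x4005, and a pointer of 0x07F0 leaves only 16 bytes before the wrap. On the W5500 the socket pointer can be used as the offset address directly, so neither step is needed.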

For use in the Arduino IDE environment Wiznet has prepared W5500 drivers which can simply be copied into the IDE directory structure and used as needed. For general implementations, Wiznet have prepared a new generation BSD Sockets based Socket driver which is much more flexible and better written than the previous iteration.

I’ve implemented my code based on the Wiznet transition driver, which maintains the legacy BSD Socket style interface used in W5100 and in W5200. That way I can maintain one socket.h and socket.c code base as an interface, and simply use the relevant hardware driver W5x00.h and W5x00.c as required. I was pleased that in taking this path, Internet code that I’ve written previously “just worked”. This included the hardware sockets dhcp (using IPRAW), ntp, http interfaces which work with the W5500 protocol engine, and the uIP implementation that uses the MACRAW mode inherent in all three devices.

Of note is the resolution of the errata in the ARP engine, which required off device storage of the subnet mask in some situations, which affected both W5100 and W5200. With the W5500 Wiznet have put that issue behind them. I imagine that many other issues and inefficiencies in the hardware socket engine have been redesigned and resolved in the W5500 too.

Performance

The performance improvements of the W5200 over the W5100 have been documented, and the enormous throughput improvement obtained by using the streaming SPI Interface shown.

While the W5500 does implement an improvement in the SPI interface, by removing the data length selection field, there is no noticeable improvement in throughput over the W5200 using an AVR ATmega1284p Goldilocks as the platform.

One design goal for the W5500 seems to have been to make the SPI interface much more friendly for 32 bit processors, particularly Cortex M0+ MCU with limited RAM, by packing the addressing, and control information into one 32 bit (4 x 8bits) register. It is possible to imagine that there are additional performance improvements in the SPI interface if driven close to its design maximum SCK of 80MHz, rather than at the lowly SCK rate of 11.05MHz off the Goldilocks platform.

Testing W5500 SPI throughput with Saleae Logic on the Goldilocks ATmega1284p

I compared the W5500 running uIP in MACRAW mode to the W5200 running identical (except for the driver) code and using the ping function to test how quickly the SPI interface can transfer a received packet to the host MCU, and then transfer the processed packet back to the W5x00 buffer for transmission.

The ping results were slightly slower than previously seen on the W5200. But I believe that is an external issue, possibly resulting from a change in my network. I have repeated the test with the W5200, and now get similar performance too. I believe I may have some network issues to resolve.

1300 Byte ping packet transmitted from a host to the W5500 interface running uIP in MACRAW mode.

Looking at the output of the Saleae Logic and comparing the time taken to transfer the Ethernet frame into the host MCU, we can see that the time required to transfer the 1300 Byte frame is almost identical at 1.52ms.

W5500 Rx Ethernet Frame transfer to the ATmega1284p

W5200 Rx Ethernet Frame transfer to the ATmega1284p

Not surprisingly, the times to process the frame and produce a response frame are also identical.

Ethernet Frame processing on the AVR1284p (W5500)

Ethernet Frame processing on the AVR1284p (W5200)

Conclusion

The W5500 chip is an improved version of the W5200, which was a greatly improved version of the W5100 device. It is a welcome new addition to a long heritage of IP protocol engines from Wiznet.

I think that the improved implementation in 48LQFP packaging and reduced supporting device count will make it easier for hobbyists and low volume manufacturers to generate great Internet tools off the Arduino and small ARM MCU platforms. We’re starting to see some implementations already.

Three generations of Wiznet Internet Protocol Devices. Goldilocks 1284p for scale.

As usual, my code is here at AVRfreeRTOS in the lib_iinchip folder.

Wiznet have made this post Treasure #14.

Gameduino 2 with Goldilocks and EVE

My Gameduino 2 was delivered just a few weeks ago, and I’ve spent too much time with it already. It is the latest Kickstarter project by James Bowman. James has written a Gameduino 2 Book too.

Recently, I’ve used the Gameduino 2 to implement a multi-oscillator audio synthesizer GUI, using many FTDI EVE GPU co-processor widgets. The use of widgets linked with the integrated touch functionality really simplifies the programming of complex GUIs.

The ability to add a large touch screen, with integrated audio and accelerometer to any Arduino project is a great thing. Previously, you had to move to 32 bit processors with LVDS interfaces to work with LCD screens, but the new FT800 EVE Graphical Processing Unit (GPU) integrates all of the graphic issues and allows you to drive it with a very high level object orientated graphics language. For example it takes just one command to create an entire clock face with hour, minute, and second-hands.

The Gameduino 2, via the FT800 EVE chip, provides the following capabilities:

  • 32-bit internal color precision
  • OpenGL-style command set
  • 256 KBytes of video RAM
  • smooth sprite rotate and zoom with bilinear filtering
  • smooth circle and line drawing in hardware – 16x antialiased
  • JPEG loading in hardware
  • audio tones and WAV audio output
  • built-in rendering of gradients, text, dials, sliders, clocks and buttons
  • intelligent touch capabilities, where objects can be tagged and recognised.

The FT800 runs the 4.3 inch 480×272 TFT touch panel screen at 60 Hz and drives a mono headphone output.

EVE Block Diagram

First off, there’s a demo of some of the capabilities of the Gameduino 2. I’ll come to the drivers later, but the Arduino compatible platform used here is the Goldilocks ATmega1284P from Freetronics. The Goldilocks is in my opinion the best platform to use with the Gameduino 2. Firstly there is the extra RAM and Flash capabilities in line with the ATmega1284p MCU. But also importantly the Goldilocks holds the Pre-R3 Arduino Uno connector standard, with the SPI pins located correctly on Pins 11, 12, and 13. And the INT0 interrupt located on Pin 2. This means that it can be used with the Gameduino 2, out of the box. No hacking required.

Must be addicted to these touch screens. I’ve just received an Australian designed 4D Systems FT843 Screen. It has possibly an identical screen to the Gameduino 2, but is based on a R3 Arduino shield format (SPI on ICSP) called the ADAM (Arduino Display Adapter Module), which means that it will work on any current Arduino hardware, without hacking. The FT843 ADAM supports a RESET line, which resolves the only problem I’ve noted with the Gameduino 2. Unfortunately, audio is not supported by a 3.5mm jack but rather by a pin-out option. The FT843 uses Swizzle 0, unlike the Gameduino 2 which uses Swizzle 3, and has the Display SPI Select on either D9 or D4 rather than on D8 like the Gameduino 2. Other than these simple configuration options, it is similar.

4D Systems FT843 on Goldilocks 1284p

Demo

The screen shows 5 sets of demonstrations. These demos are provided by FTDI, and typically in an Arduino Uno you would have to choose which of the 5 sets you want to see. With the extra capabilities of the Goldilocks, it is possible to load all of them simultaneously in 110kB of flash.

Set 0 focusses on individual commands that are loaded into the Display List. The Display List is essentially a list of commands that is executed or rendered for each frame of display. A Display List will be rendered indefinitely, until it is swapped by another Display List. Two Display Lists are maintained in a double buffering arrangement. One is written, whilst the other is displayed.

Set 1 exhibits some of the co-processor command capabilities that allow complex objects to be created with only one command. A clock, slider, dial, or rows of buttons can be created easily in this manner.

Set 2 shows the JPEG image rendering capabilities in RGB and in 8 bit mono.

Set 3 demonstrates custom font capabilities. There are 16 fonts available in the ROM of the FT800 EVE, but you can add your own as desired.

Set 4 shows some advanced co-processor capabilities, such as touch tag recognition, no touch (zero MCU activity) screensaver, capturing screen sketches, and inbuilt audio options.

The main screen shows an analogue clock that is drawn with one co-processor command. Real time is generated by a 32,768Hz Crystal driving the Goldilocks Timer 2 for a system clock. The accuracy of the clock is limited only by the accuracy of the watch crystal, and I’ve built mine with a 5ppm version, which should keep the drift to about 13 seconds per month at worst.
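
The timing behind the system clock can be sketched as follows. The prescaler of 128 is an assumption (it is the usual choice for a one second tick from a watch crystal), the month is taken as 30 days, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Overflows per second of an 8 bit timer clocked from the crystal. */
static uint32_t timer2_overflows_per_second(uint32_t crystal_hz,
                                            uint32_t prescaler)
{
    assert(prescaler > 0);
    return (uint32_t)(crystal_hz / (prescaler * 256UL));
}

/* Worst case drift in milliseconds for a given tolerance and duration.
 * ppm multiplied by seconds gives microseconds of drift. */
static uint32_t worst_case_drift_ms(uint32_t ppm, uint32_t seconds)
{
    return (uint32_t)(((uint64_t)ppm * seconds) / 1000ULL);
}
```

A 32,768Hz crystal with a prescaler of 128 overflows the 8 bit timer exactly once per second, and a 5ppm tolerance over 2,592,000 seconds works out to 12,960 ms, or roughly 13 seconds per month at worst.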

Sample Application

The FTDI provided sample application covers most of the available commands and options for the FT800 EVE GPU.

The FT_SampleApp.h file contains definitions of functions implemented for the main application. These code snippets are not really useful beyond demonstrations of capability of the GPU, but nevertheless demonstrate how each specific feature of the FT800 EVE GPU can be utilised.

Driver

Because the FT800 EVE GPU has a very capable object orientated graphics language, the FTDI drivers present a very capable high level interface to the user. FTDI have prepared an excellent starting point from which I could easily make customisations suitable for the AVR ATmega Arduino hardware that I prefer to use.

The FTDI driver set is separated into a Command Layer, and into a Hardware Abstraction Layer (HAL). This separation makes it easy to customise for the AVR ATmega platform, but retains the standard FTDI command language for easy implementation of their example applications, and portability of code written for their command language.

To use the FT800 EVE drivers for the Gameduino 2 it is only necessary to include the FT_Platform.h file in the main program. This file contains references to all of the other files needed.

#include "../lib_ft800/FT_DataTypes.h"
#include "../lib_ft800/FT_X11_RGB.h"
#include "../lib_ft800/FT_Gpu.h"
#include "../lib_ft800/FT_Gpu_Hal.h"
#include "../lib_ft800/FT_Hal_Utils.h"
#include "../lib_ft800/FT_CoPro_Cmds.h"
#include "../lib_ft800/FT_API.h"

The FT_DataTypes.h file contains FTDI type definitions for the specific data types needed for the FT800 EVE GPU. This is mainly used to abstract the drivers for varying MCU. For the AVR it is not absolutely necessary, but it will help when the code is used on other platforms.

The FT_X11_RGB.h file contains the standard colour set used in X11 colours and on the Web, which are stored in PROGMEM. I’ve written a small macro that will insert these into commands needing 24 bit colour settings. These colours will be stored and referenced from PROGMEM when they are called from either of the X11 specific macros defined in FT_Gpu.h. If they are not called from the program, they will be discarded by the linker and not waste space in the final linked program.

X11 Colours
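As a rough illustration of how a 24 bit X11 colour ends up inside a Display List command, the sketch below packs a colour into the FT800 CLEAR_COLOR_RGB word (opcode 0x02 in the top byte, followed by red, green and blue). It is a host-side sketch that leaves out the PROGMEM storage; the names X11_FORESTGREEN, clear_color_rgb and clear_color_x11 are hypothetical and are not the macros in FT_Gpu.h.

```c
#include <stdint.h>

/* 24 bit colour values from the standard X11 / web colour set. */
#define X11_FORESTGREEN 0x228B22UL
#define X11_STEELBLUE   0x4682B4UL

/* Split a 24 bit colour into its 8 bit components. */
#define RGB_R(c) ((uint8_t)(((c) >> 16) & 0xFFu))
#define RGB_G(c) ((uint8_t)(((c) >> 8) & 0xFFu))
#define RGB_B(c) ((uint8_t)((c) & 0xFFu))

/* FT800 CLEAR_COLOR_RGB display list word: opcode 0x02, then R, G, B. */
static uint32_t clear_color_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint32_t)((0x02UL << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b);
}

/* Same command, but taking a single 24 bit X11 colour value. */
static uint32_t clear_color_x11(uint32_t colour24)
{
    return clear_color_rgb(RGB_R(colour24), RGB_G(colour24), RGB_B(colour24));
}
```

With these definitions, clear_color_x11(X11_FORESTGREEN) produces the word 0x02228B22, which is the same shape of command as the RGB triplet form used in the Hello World example.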

The FT_Gpu.h file contains all the definitions for command and register setting options. I have significantly rearranged the layout and comments in this file, compared to the FTDI version. Hopefully it is arranged in a way that allows options applying to specific commands and registers to be quickly located.

By writing DL commands to the Display List which are configured by the options in the FT_Gpu.h file it is possible to control most of the low level functions in the FT800 EVE GPU. The Display List is used by the FT800 GPU to render the screen, so it is only the contents of the active Display List that appear on the screen.

In the FT_Gpu_Hal.h file the commands specific to the SPI bus (or the I2C bus if this transfer mechanism is being used) are defined.

I have simplified out some HAL options provided by FTDI for high performance MCU, that might be constrained writing to the SPI bus at only 30MHz, the maximum FT800 SPI bus rate. The Goldilocks SPI bus only runs at 11MHz, and the standard Arduino Uno SPI bus only runs at 8 MHz, so those optimisations don’t help, and they also consume RAM for streaming buffers.

But, I have integrated multi-byte SPI transfers into the HAL, which don’t use additional RAM buffer space, as they write via a pointer. This is probably the best way to work the SPI bus in the Arduino environment. I have also implemented multi-byte SPI transfer directly from PROGMEM for Strings, and for precomputed commands.
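The idea of a multi-byte transfer that writes via a pointer, with no intermediate buffer, can be sketched as follows. The byte level function here is a stand-in that only records what was sent so the sketch runs on a PC; on an AVR it would load the SPI data register and wait for the transfer to complete. The names spi_write_byte and spi_write_block are hypothetical, not the HAL function names.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for the SPI hardware: records the bytes "sent" on the bus. */
static uint8_t spi_log[32];
static size_t  spi_log_len;

static void spi_write_byte(uint8_t b)
{
    if (spi_log_len < sizeof spi_log) spi_log[spi_log_len++] = b;
}

/* Multi-byte write straight from the caller's memory.
   Only the pointer and a counter are used, so no extra RAM buffer is needed. */
static void spi_write_block(const uint8_t *data, size_t len)
{
    while (len--) spi_write_byte(*data++);
}
```

A PROGMEM variant would look the same, except that each byte would be fetched with pgm_read_byte() instead of a plain pointer dereference.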

As a preferred option, I’ve implemented PROGMEM storage of Strings for all commands. The commands utilising RAM storage of Strings are retained for compatibility, and to allow computed Strings to be used.

All of the FTDI provided commands now have optional *_P variants which take PROGMEM strings, rather than RAM strings. This saves eleven hundred bytes of RAM used for strings, just in the demonstration programs provided by FTDI and shown in the Demo!

The FT_Hal_Utils.h file contains some simple utility macros.

The FT_CoPro_Cmds.h file contains definitions for all of the available co-processor commands. These commands are written to the co-processor command buffer, and are used to generate low level commands that appear in the Display List and are rendered for each frame.

Many of the co-processor commands replicate functionality of setting specific registers with options via the Display List GPU commands. This is useful because it is possible to programme the co-processor to implement a task and remain at the object orientated view of the screen, even though an individual command may be a simple GPU setting that could have been done at Display List command level. Having all the commands available at co-processor level obviates the need to switch between the two “modes” of operation and thought.

I extracted a few of the standard functions that are needed irrespective of the specific application into an API. The FT_API.h file contains these simple command sequences, for booting up the Gameduino 2, and for managing the screen brightness. It also contains precalculated simplified sin, cos, and atan functions useful when drawing circles and clocks.

The API level also contains calls on the Hardware Abstraction Layer that are simply passed through. These calls are flattened by avr-gcc to save digging ourselves into a stack wasting function call hole.

And, of course, everything is integrated into the freeRTOS v8.0.0 port that I support on Sourceforge, AVRfreeRTOS, which gives non-blocking timing, tasks, semaphores, queues, and all aspects of freeRTOS that are so great.

As an example of the power of this combination of freeRTOS and the FT800 object orientated command language we can describe the method used to create an accurate well rendered clock on the Gameduino 2 screen. Using the 3 commands below, we obtain the clock face seen in my demo video main screen.

time(&currentTime); // get a time stamp in current seconds elapsed from Midnight, Jan 1 2000 UTC (the Y2K 'epoch'), as maintained by freeRTOS.
localtime_r(&currentTime, &calendar); // converts the time stamp pointed to by currentTime into broken-down time in a calendar structure, expressed as Local time.
FT_GPU_CoCmd_Clock(phost, FT_DispWidth - (FT_DispHeight/2), FT_DispHeight/2, FT_DispHeight/2 - 20, OPT_3D, calendar.tm_hour, calendar.tm_min, calendar.tm_sec, 0); // draw a clock in 3D rendering.

I’ve updated the clock function to include a touch screen time setting interface. Using the FT800 Touch Tags, and Button generation, this process is really incredibly easy.

Hardware

I’ve taken the liberty of borrowing some of James’ pictures for this story. They can originally be found here.

Gameduino 2 Pinout

Note that because of the wrap around connector and cable for the LCD screen, it is not possible to use the Arduino R3 pin out. The SPI bus pins are located at the traditional location on Pin 11 through Pin 13. Unless you want to hack your board, you’re limited to using Arduino Uno style boards.

Gameduino 2 Shield

Unfortunately, the FTDI FT800 Reset pin has not been implemented by the Gameduino 2. Using an ISP to programme the Arduino usually “accidentally” puts the FT800 EVE GPU into an unsupported state. This means that the Gameduino 2 and Arduino usually have to be power-cycled or hard Reset following each programming iteration. It would have been good to tie the FT800 Reset pin to the Arduino Reset pin via a short (ms) delay chip, to obviate the need to remove power to generate the hard Reset for the FT800.

Hello World & other examples

I thought it might be interesting to compare the code required to achieve the demonstration outcomes that James Bowman provides on the Gameduino2 site, with the code required to achieve the same result using freeRTOS and the FTDI style driver. So I’ve implemented three simple examples, “Hello World”, “Sprites”, and “Blobs” from his library.

All of the examples have been built using an Arduino Uno ATmega328p as the MCU hardware platform.

helloworld

The Hello World application simply initialises the Gameduino2, sets the colour to which the screen shall be cleared, and then writes text with the OPT_CENTER option to center it in the X and Y axis. As there is no delay, this is written as often and as fast as the MCU can repeat the loop.

#include <SPI.h>
#include <GD2.h>

void setup()
{
  GD.begin();
}

void loop()
{
  GD.ClearColorRGB(0x103000);
  GD.Clear();
  GD.cmd_text(240, 136, 31, OPT_CENTER, "Hello world");
  GD.swap();
}

The same result can be generated in C using freeRTOS and the FTDI Drivers. I have commented extensively within the code below.

/* freeRTOS Scheduler include files. */
/* these four header files encompass the full freeRTOS real-time OS features,
   of multiple prioritised tasks each with their own stack space, queues for moving data,
   and scheduling tasks, and semaphores for controlling execution flows */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"

/* Gameduino 2 include file. */
#include "FT_Platform.h"

/*------Global used for HAL context management---------*/
extern FT_GPU_HAL_Context_t * phost;           // optional, just to make it clear where this variable comes from.
                                               // It is automatically included, so this line is actually unnecessary.

/*--------------Function Definitions-------------------*/

int main(void) __attribute__((OS_main));       // optional, just good practice.
                                               // Saves a few bytes of stack because the return from main() is not implemented.

static void TaskWriteLCD(void *pvParameters);  // define a single task to write to Gameduino 2 LCD.
                                               // typically multiple concurrent tasks are defined,
                                               // but in this case to replicate the Arduino environment, just one is implemented.

/*-----------------Functions---------------------------*/
/* Main program loop */
int main(void)
{
  xTaskCreate(            // create a task to write on the Gameduino 2 LCD
       TaskWriteLCD
    ,  (const portCHAR *)"WriteLCD"
    ,  128                // number of bytes for this task stack
    ,  NULL
    ,  3                  // priority of this task (in freeRTOS, a higher number is a higher priority).
    ,  NULL );

  vTaskStartScheduler();  // now freeRTOS has taken over, and the pre-emptive scheduler is running.
}
/*-----------------------------------------------------------*/
/* Tasks                                                     */
/*-----------------------------------------------------------*/

static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
{
  (void) pvParameters;

  FT_API_Boot_Config();  // initialise the Gameduino 2.

  while(1)               // a freeRTOS task should never return
  {
    FT_API_Write_CoCmd( CMD_DLSTART );                       // initialise and start a Display List
//  FT_API_Write_CoCmd( CLEAR_COLOR_RGB(0x10, 0x30, 0x00) ); // set the colour to which the screen is cleared (using RGB triplets) as in GD2 library OR
    FT_API_Write_CoCmd( CLEAR_COLOR_X11(FORESTGREEN) );      // set the colour to which the screen is cleared (using X11 colour definitions)
    FT_API_Write_CoCmd( CLEAR(1,1,1) );                      // clear the screen

    FT_GPU_CoCmd_Text_P(phost,FT_DispWidth/2, FT_DispHeight/2, 31, OPT_CENTER, PSTR("Hello world"));
      // write "Hello World" to X and Y centre of screen using OPT_CENTER  with the largest font 31
      // The string "Hello world" is stored in PROGMEM
      // Functions with *_P all use PROGMEM Strings (and don't consume RAM)
      // FT_DispWidth and FT_DispHeight are global variables set to orientate us in a flexible consistent way,
      // without hard coding the screen resolution.

    FT_API_Write_CoCmd( DISPLAY() );                         // close the Display List (DL) opened by CMD_DLSTART()
    FT_API_Write_CoCmd( CMD_SWAP );                          // swap the active Display List (double buffering), to display the new "Hello World" commands written to the Display List
  }
}

sprites

The Sprites application is similar to the original one built for the Gameduino, but here each sprite is rotating around a random point. The 2001 random points are stored in a PROGMEM array sprites. This takes 8K of flash. A second PROGMEM array circle holds the 256 XY coordinates to make the sprite move in a circle. The only RAM used is a single byte t used to keep track of the current rotation position, by counting iterations.

#include <EEPROM.h>
#include <SPI.h>
#include <GD2.h>

#include "sprites_assets.h"

void setup()
{
  GD.begin();
  GD.copy(sprites_assets, sizeof(sprites_assets));
}

static byte t;

void loop()
{
  GD.Clear();
  GD.Begin(BITMAPS);
  byte j = t;
  uint32_t v, r;

  int nspr = min(2001, max(256, 19 * t));

  PROGMEM prog_uint32_t *pv = sprites;
  for (int i = 0; i < nspr; i++) {
    v = pgm_read_dword(pv++);
    r = pgm_read_dword(circle + j++);
    GD.cmd32(v + r);
  }

  GD.ColorRGB(0x000000);
  GD.ColorA(140);
  GD.LineWidth(28 * 16);
  GD.Begin(LINES);
  GD.Vertex2ii(240 - 110, 136, 0, 0);
  GD.Vertex2ii(240 + 110, 136, 0, 0);

  GD.RestoreContext();

  GD.cmd_number(215, 110, 31, OPT_RIGHTX, nspr);
  GD.cmd_text( 229, 110, 31, 0, "sprites");

  GD.swap();
  t++;
}

The code in freeRTOS is similar. I have commented within the code.

/* freeRTOS Scheduler include files. */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"

/* Gameduino 2 include file. */
#include "FT_Platform.h"

// The include file containing the sprite graphics, and the special command sequence
#include "sprites_assets.h"

/*------Global used for HAL context management---------*/
extern FT_GPU_HAL_Context_t * phost;           // optional, just to make it clear where this variable comes from

/*--------------Function Definitions-------------------*/

int main(void) __attribute__((OS_main));       // optional, just good practice

static void TaskWriteLCD(void *pvParameters);  // define a single task to write to Gameduino 2 LCD

/*-----------------Functions---------------------------*/

/* Main program loop */
int main(void)
{
  xTaskCreate(             // create a task to write on the Gameduino 2 LCD
    TaskWriteLCD
    ,  (const portCHAR *)"WriteLCD"
    ,  128                 // number of bytes for this task stack
    ,  NULL
    ,  3                   // priority of task (in freeRTOS, a higher number is a higher priority).
    ,  NULL );

  vTaskStartScheduler();   // now freeRTOS has taken over, and the pre-emptive scheduler is running
}
/*-----------------------------------------------------------*/
/* Tasks                                                     */
/*-----------------------------------------------------------*/

static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
{
  (void) pvParameters;

  uint8_t t = 0;         // iterate over the code for 255 times, before restarting with 256 sprites where t = 0

  FT_API_Boot_Config();  // initialise the Gameduino 2.
  FT_GPU_HAL_WrCmdBuf_P(phost, sprites_assets, sizeof(sprites_assets));
    // Copy James' magic list of commands into the command buffer.
    // These co-processor commands are "compiled" into their 4 byte equivalents, and I haven't decoded them in detail.
    // But, since the FT800 is reading the same double word codes, it doesn't really matter how they're generated.

  while(1)               // a freeRTOS task should never return
  {
    FT_API_Write_CoCmd( CMD_DLSTART );       // initialise and start a Display List (DL)
    FT_API_Write_CoCmd( CLEAR(1,1,1) );      // clear the screen

    FT_API_Write_CoCmd( BEGIN(BITMAPS) );    // start to write BITMAPS into the DL
    uint8_t j = t;
    uint32_t v;
    uint32_t r;
    int16_t nspr = min(2001, max(256, 19 * t));
    ft_prog_uint32_t * pv = sprites;         //  pv is the sprite BITMAP pointer

    for (uint16_t i = 0; i < nspr; ++i) {
      v = pgm_read_dword(pv++);              // determine which sprite we're controlling
      r = pgm_read_dword(circle + j++);      // circle is the rotation control
      FT_GPU_HAL_WrCmd32(phost, v + r);      // the sprite address and the location are written here to the co-processor
    }
    FT_API_Write_CoCmd( END());              // finish writing BITMAPS into the Display List

    FT_API_Write_CoCmd( BEGIN(LINES) );      // start to write LINES into the Display List
    FT_API_Write_CoCmd( COLOR_RGB(0x00, 0x00, 0x00) );  // set the line colour to black 0x000000
    FT_API_Write_CoCmd( COLOR_A(140) );                 // set alpha channel transparency
    FT_API_Write_CoCmd( LINE_WIDTH( 28 * 16) );
    FT_API_Write_CoCmd( VERTEX2II(240 - 110, 136, 0, 0) );  // start to draw an alpha transparency background line
    FT_API_Write_CoCmd( VERTEX2II(240 + 110, 136, 0, 0) );  // finish the line
    FT_API_Write_CoCmd( END() );             // finish writing LINES into the Display List

    FT_API_Write_CoCmd( RESTORE_CONTEXT() ); // With no prior SAVE_CONTEXT() command, this restores the default colours and values.

    FT_GPU_CoCmd_Number(phost, 215, 110, 31, OPT_RIGHTX, nspr);    // write a number.
    FT_GPU_CoCmd_Text_P(phost, 229, 110, 31, 0, PSTR("sprites"));  // write using a PROGMEM stored string function, to save RAM
      //  phost is a pointer to the context for the Gameduino2.
      //  Mainly used where there may be multiple screens present, but in this case several state and semaphore items are maintained.

    FT_API_Write_CoCmd( DISPLAY() );          // close the active Display List (DL) opened by CMD_DLSTART()
    FT_API_Write_CoCmd( CMD_SWAP );           // Do a DL swap to render the just written DL

    t++;    // t will roll over and will restart the number of sprites to the minimum of 256
  }
}
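The sprite count in both versions comes from the expression min(2001, max(256, 19 * t)), with t being an 8 bit counter that rolls over. The short sketch below isolates that calculation so the ramp can be checked on a PC; the function name sprite_count is hypothetical.

```c
#include <stdint.h>

/* Number of sprites drawn for iteration t: 19 per step, clamped to 256..2001. */
static int16_t sprite_count(uint8_t t)
{
    int16_t n = (int16_t)(19 * t);   /* at most 19 * 255 = 4845, so this fits in 16 bits */
    if (n < 256)  n = 256;           /* never fewer than 256 sprites */
    if (n > 2001) n = 2001;          /* never more than the 2001 stored points */
    return n;
}
```

The count stays at 256 until t reaches 14, climbs to the 2001 limit at t = 106, and drops back to 256 when the 8 bit counter wraps from 255 to 0.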

main2

Blobs is a sketching demonstration: as you paint on the touch screen, a trail of circles follows.
The code keeps a history of the last 128 touch positions, and draws them as transparent, randomly coloured circles.

#include <EEPROM.h>
#include <SPI.h>
#include <GD2.h>

#define NBLOBS      128
#define OFFSCREEN   -16384

struct xy {
  int x, y;
} blobs[NBLOBS];

void setup()
{
  GD.begin();

  for (int i = 0; i < NBLOBS; i++) {
    blobs[i].x = OFFSCREEN;
    blobs[i].y = OFFSCREEN;
  }
}

void loop()
{
  static byte blob_i;
  GD.get_inputs();
  if (GD.inputs.x != -32768) {
    blobs[blob_i].x = GD.inputs.x << 4;
    blobs[blob_i].y = GD.inputs.y << 4;
  } else {
    blobs[blob_i].x = OFFSCREEN;
    blobs[blob_i].y = OFFSCREEN;
  }
  blob_i = (blob_i + 1) & (NBLOBS - 1);

  GD.ClearColorRGB(0xe0e0e0);
  GD.Clear();

  GD.Begin(POINTS);
  for (int i = 0; i < NBLOBS; i++) {
    // Blobs fade away and swell as they age
    GD.ColorA(i << 1);
    GD.PointSize((1024 + 16) - (i << 3));

    // Random color for each blob, keyed from (blob_i + i)
    uint8_t j = (blob_i + i) & (NBLOBS - 1);
    byte r = j * 17;
    byte g = j * 23;
    byte b = j * 147;
    GD.ColorRGB(r, g, b);

    // Draw it!
    GD.Vertex2f(blobs[j].x, blobs[j].y);
  }
  GD.swap();
}

The code in freeRTOS is similar, but the touch functionality is derived directly from the FT800 register containing the most recent screen touch location. I have commented within the code.

/* freeRTOS Scheduler include files. */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"

/* Gameduino 2 include file. */
#include "FT_Platform.h"

#define NBLOBS       128
#define OFFSCREEN   -16384

/*----------Global used for HAL management-------------*/
extern FT_GPU_HAL_Context_t * phost;           // optional, just to make it clear where this comes from

struct xy {                                    // somewhere to store all the blob locations
  int16_t x, y;
} blobs[NBLOBS];

/*--------------Function Definitions-------------------*/

int main(void) __attribute__((OS_main));       // optional, just good practice

static void TaskWriteLCD(void *pvParameters);  // define a single task to write to Gameduino 2 LCD

/*-----------------Functions---------------------------*/

/* Main program loop */
int main(void)
{
  xTaskCreate(            // create a task to write on the Gameduino 2 LCD
    TaskWriteLCD
    ,  (const portCHAR *)"WriteLCD"
    ,  128                // number of bytes for the task stack
    ,  NULL
    ,  3                  // priority of task (in freeRTOS, a higher number is a higher priority).
    ,  NULL );

  vTaskStartScheduler();  // now freeRTOS has taken over, and the scheduler is running
}

/*-----------------------------------------------------------*/
/* Tasks                                                     */
/*-----------------------------------------------------------*/

static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
{
  (void) pvParameters;

  FT_API_Boot_Config();     // initialise the Gameduino 2.
  FT_API_Touch_Config();    // initialise the FT800 Touch capability.

  for (uint8_t i = 0; i < NBLOBS; ++i)
  {
    blobs[i].x = OFFSCREEN;
    blobs[i].y = OFFSCREEN;
  }

  while(1)                  // a freeRTOS task should never return
  {
    static uint8_t blob_i;  // the blob we're currently processing
    uint32_t readTouch;     // xy coordinates of a touch are stored in uint32_t

    // this is the touch interface stuff
    readTouch = FT_GPU_HAL_Rd32(phost, REG_TOUCH_SCREEN_XY);// the screen location of the last touch is stored in REG_TOUCH_SCREEN_XY
    if (readTouch != NIL_TOUCH_XY)                          // if there was a touch
    {
      blobs[blob_i].x  = (int16_t)((readTouch >> 16) & 0xffff) << 4; // read where x axis touch occurred, and scale it
      blobs[blob_i].y = (int16_t)(readTouch & 0xffff) << 4; // read where y axis touch occurred, and scale it
    } else {
      blobs[blob_i].x = OFFSCREEN;         // if there was no touch, draw the blob OFFSCREEN
      blobs[blob_i].y = OFFSCREEN;
    }
    blob_i = (blob_i + 1) & (NBLOBS - 1);  // increment to the next blob for touch interaction

    // this is the display interface stuff
    FT_API_Write_CoCmd( CMD_DLSTART );      // initialise and start a display list (DL)

    FT_API_Write_CoCmd( CLEAR_COLOR_RGB(0xe0, 0xe0, 0xe0) );// set the colour to which the screen will be cleared
    FT_API_Write_CoCmd( CLEAR(1,1,1) );     // clear the screen

    FT_API_Write_CoCmd( BEGIN(POINTS) );    // start to write POINTS into the Display List (DL)

    for (uint8_t i = 0; i < NBLOBS; ++i)
    {
      // Blobs fade away and swell as they age
      FT_API_Write_CoCmd( COLOR_A(i << 1) ); // set an alpha transparency
      FT_API_Write_CoCmd( POINT_SIZE((1024 + 16) - (i << 3)) );

      // Random colour for each blob, keyed from (blob_i + i)
      uint8_t j = (blob_i + i) & (NBLOBS - 1);
      uint8_t r = j * 17;
      uint8_t g = j * 23;
      uint8_t b = j * 147;
      FT_API_Write_CoCmd( COLOR_RGB(r, g, b) );

      // Draw it!
      FT_API_Write_CoCmd( VERTEX2F(blobs[j].x, blobs[j].y) );
    }

    FT_API_Write_CoCmd( END() );            // finish writing POINTS into the active DL

    FT_API_Write_CoCmd( DISPLAY() );        // close the active Display List (DL)
    FT_API_Write_CoCmd( CMD_SWAP );         // Do a DL swap to render the just written DL
  }
}
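The touch handling in the task above boils down to splitting one 32 bit register value into two 16 bit coordinates. The sketch below shows that step in isolation. It assumes the "no touch" value is 0x80008000 (both halves 0x8000, which matches the -32768 test in the GD2 version); the names NO_TOUCH_XY, touch_decode and to_sixteenths are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed register value when the screen is not being touched. */
#define NO_TOUCH_XY 0x80008000UL

typedef struct { int16_t x, y; } touch_xy_t;

/* Split the register value: X is in the upper 16 bits, Y in the lower 16 bits.
   Returns false, leaving *out untouched, when no touch is reported. */
static bool touch_decode(uint32_t reg, touch_xy_t *out)
{
    if (reg == NO_TOUCH_XY) return false;
    out->x = (int16_t)(reg >> 16);
    out->y = (int16_t)(reg & 0xFFFFu);
    return true;
}

/* Convert whole pixels to the 1/16 pixel units used by VERTEX2F. */
static int16_t to_sixteenths(int16_t pixels)
{
    return (int16_t)(pixels * 16);
}
```

A touch in the centre of the screen at (240, 136) reads back as 0x00F00088, and scales to (3840, 2176) in 1/16 pixel units, which is what the shift by 4 does in the listings.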

I intend to build a few more demonstrations of the code, and to copy some games that James has already implemented, because I’m not a game designer.

Goldilocks Analogue – Prototyping

Last time I designed a Goldilocks board, it was because I was unhappy about the availability of a development platform that was within my reach; a tool to enable me to continue to learn about coding for micro-controllers.

This Goldilocks, let us call it Goldilocks Analogue, is not about what I think is necessary, but more about what I’d like to have. The focus is not so much on the basics of SRAM and Flash, but much more on what functions I would like to have, and using my own means to get there.

Also, as the original Goldilocks is sold out, Freetronics are considering making their own version. Please add your wishes here.

Test results are in. Check out the detailed post on Goldilocks Analogue – Testing. Following the testing, I’ve redesigned the analogue output section to make it much more capable. It now supports simultaneous AC and DC outputs, with an application specific headphone amplifier device to provide AC output, and a high current OpAmp to provide DC output.

Background

The Goldilocks Project was specifically about getting the ATmega1284p MCU onto a format equivalent to the Arduino Uno R3. The main goal was to get more SRAM and Flash memory into the same physical footprint used by traditional Arduino (pre-R3) and latest release Uno R3 shields.

Goldilocks Arduino 1284p

Original – Goldilocks Version 1.1

I also tried to optimally use the co-processor ATmega32U2, (mis)utilised by Arduino purely for the USB-Serial functionality, by breaking out its pins, and creating a cross-connect between the two MCU to enable them to communicate via the SPI bus.

Whilst the Goldilocks achieved what it set out to do, there were some problems it created for itself.

Firstly, the ATmega family of devices is really very bad at generating correct USART baud rates when their main frequency doesn’t match a multiple of the standard USART rates. Engineers in the know select one of these primary clock rates (for example 14.7456MHz, 18.432MHz, or 22.1184MHz) when they’re planning on doing any real Serial communications. Unfortunately, the 16MHz clock rate chosen by the Arduino team generates about the worst USART timing errors possible.

This means that the Arduino devices can only work at 16MHz while programming them with the Serial Bootloader, otherwise programming is bound to fail, due to losing a bit or two due to the clock rate error.
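The size of the baud rate error can be worked out from the standard AVR relationship between the clock, the UBRR divider, and the resulting baud rate. The sketch below covers only the normal speed asynchronous mode (16 clocks per bit), not the double speed mode, and the function names are hypothetical.

```c
#include <stdint.h>

/* UBRR divider for normal asynchronous mode, rounded to the nearest integer. */
static uint16_t ubrr_for(uint32_t f_cpu, uint32_t baud)
{
    return (uint16_t)((f_cpu + 8UL * baud) / (16UL * baud) - 1UL);
}

/* Baud rate that this divider really produces. */
static uint32_t actual_baud(uint32_t f_cpu, uint16_t ubrr)
{
    return f_cpu / (16UL * ((uint32_t)ubrr + 1UL));
}

/* Error in tenths of a percent; a negative value means the USART runs slow. */
static int32_t baud_error_permille(uint32_t f_cpu, uint32_t baud)
{
    int32_t actual = (int32_t)actual_baud(f_cpu, ubrr_for(f_cpu, baud));
    return ((actual - (int32_t)baud) * 1000) / (int32_t)baud;
}
```

At 16MHz and 115200 baud the divider is 8 and the real rate is about 111,111 baud, an error of roughly -3.5%. At 22.1184MHz the divider is 11 and the rate is exactly 115200 baud.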

Arduino had serial programming completely solved in the old days by using a real USB-USART chip, the FTDI FT232R, but for some reason they stopped doing the right thing. This might have been the perfect solution, but they abandoned it. Who knows why…

Secondly, although having an integrated uSD card cage on the platform is a great thing, using a resistor chain to do the voltage conversion is a bit problematic. The output pins (SCK, MOSI, CS) are permanently loaded by 3k2 Ohm, and a high signal on the input pin (MISO) reaches only 0.66 of Vcc, which only just clears the minimum ATmega signal high level of 0.6 Vcc. Neither of these issues prevents the uSD card from working, and the voltage divider resistor chain takes almost no space on the board. But still it is not perfect.

Thirdly, there are some minor oversights in the V1 build that I would like to correct if possible.

New Directions

I’ve been toying with the idea of building an Xmega board, in Arduino Uno R3 format, because of the significantly enhanced I/O capabilities of this MCU including true DAC capabilities, but I’ve not followed up for two reasons; the Xmega has no history of use by hobbyists as there is with the ATmega devices, and it doesn’t bring any advantage that an ARM MCU wouldn’t otherwise do better and faster.

Nevertheless, the ATmega platform still lacks one thing that I believe is necessary; a high quality analogue capability. The world is analogue, and having an ADC capability, without having a corresponding DAC capability, is like having a real world recorder with no means to playback these real world recordings.

A major initiative of the Goldilocks is to bring an analogue capability to the Arduino platform. So this device will be called the Goldilocks Analogue.

Updated – Goldilocks Analogue

There have been music shields and audio shields built before, and the design used is closely aligned to the original Adafruit Wave Shield, but I’ve not seen dual high quality DACs with both AC and DC capability, integrated onto the main board of an Arduino previously. So that’s where I’m going.

The goal is to be able to produce a DC referenced signal, from 0Hz up to around 100kHz, that can provide a binary-linear representative voltage (with sufficient current) to enable a control system, as well as to produce the highest quality audio, with very low noise and THD buffer amplifiers, that the basic AVR platform is capable of producing.

Using Eagle

I used to look at Eagle (Kicad, etc) with healthy scepticism. Yeah, not something that I’d be able to learn, but in the process of realising the Goldilocks Analogue, I have learned that it is far easier to learn a new skill than it is to guide someone in India or Malaysia, who doesn’t even get the start of what I want. The old idiom, if you want something done right, you’ve got to do it yourself.

There is a “Freemium” version of Eagle available, which is enough to get started. I’m going to try to get a “Hobbyist” version as soon as the paperwork is through.

So all this below is my first Eagle project.

The Schematic

I’ll talk through each item in the schematic, particularly those things which are novel in the Goldilocks Analogue. The schematics for the Goldilocks V1 can be found in the User Manual.

FT232R

The FT232R is the same device used in countless earlier Arduinos, such as the Duemilanove, and in USB-Serial adapters everywhere. The drivers for all major operating systems are widespread and there is no magic required. Importantly, the FT232R chip generates a real USART baud rate, at any speed from 300 baud to 3 Mbaud.

Unlike in the Duemilanove I’m using the FT232RQ chip, which is in the QFN package. There is too much going on to take up the board space with the larger package.

FT232RQ - Goldilocks Analogue

I’ve added a switch to disable the DTR Reset functionality of the Arduino and Wiring Bootloaders. Often, I would like a running device NOT to be reset by plugging in the USB cable, but then I’ll be using the Goldilocks in another thing where I do want this to happen. Having a switch, like Seeed often do, is the best answer.

Also, I’ve added a 6 pin connector replicating the standard FTDI pin-out, to enable the FT232RQ to communicate with other devices, should this be necessary. It would be a shame to lock it into the board, with no option for extension.

uSD Buffer

In designing the buffer for the uSD, I was trying to achieve two things. Firstly, isolate the uSD card entirely from the SPI bus when it was not in use. By isolate, I mean over 1MOhm resistance. This isolation ensures that the uSD card doesn’t load up the SPI pins at all, when the uSD is not being used.

Secondly, I was trying to ensure that each end of the SPI bus receives the correct voltages and currents to ensure maximum throughput.

uSD Buffer - Goldilocks Analogue

The two devices selected achieve both goals as desired.

For the MCU to uSD direction (SCK, MOSI, and CS) I’m using a 74LVC125 in quad package. This package is tolerant of inputs at 5V rising above its Vcc of 3V3. The output enable on low, is connected to the Chip Select line, which means that the uSD card will not be driven unless the CS line is low. It always presents a high impedance to the MCU.

As a quad package the 74LVC125 has one spare gate, which can be used to drive the Arduino LED. This is a neat, no-cost result that entirely removes any loading on Arduino Pin13.

For the uSD to MCU direction the buffer has to effectively produce a 5V CMOS high when receiving a 3V3 CMOS high. The best way to do this is to use a device that is TTL signal compatible. The TTL minimum high signal is only 2V, much lower than the CMOS minimum high signal of 2/3 of Vcc, and importantly below the worst case of 2/3 of 3V3 CMOS.

The only device I could find with the required characteristic of accepting TTL inputs with a low output enable, is the MC74VHC1GT125. I’m sure there are other options though.
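The threshold argument for choosing a TTL compatible buffer can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (a TTL minimum high of 2V and a CMOS minimum high of 2/3 of Vcc), works in millivolts, and uses hypothetical names throughout.

```c
#include <stdint.h>
#include <stdbool.h>

#define TTL_VIH_MV 2000u   /* TTL minimum input high level, 2V */

/* CMOS minimum input high level, taken as 2/3 of the supply voltage. */
static uint16_t cmos_vih_mv(uint16_t vcc_mv)
{
    return (uint16_t)((uint32_t)vcc_mv * 2u / 3u);
}

/* A signal is a guaranteed logic high if it reaches the receiver's minimum high level. */
static bool high_is_valid(uint16_t signal_mv, uint16_t vih_mv)
{
    return signal_mv >= vih_mv;
}
```

Taking the worst case 3V3 high as 2/3 of 3.3V, or 2200mV, a 5V CMOS input needs 3333mV and so is not guaranteed to see a high, while a TTL compatible input needs only 2000mV and is satisfied.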

DAC and Buffer

This is the fun stuff. Analogue… the real world. As noted above, the goal is to produce two binary-linear signals with enough buffering that they can drive a reasonable load (such as small headphones or an audio amplifier) and produce a constant voltage under a number of power supply options.

The inspiration for the circuit came from the Adafruit Wave Shield, but there are a number of significant improvements that are worth noting, not least the use of a dual DAC, for two channels of output.

DAC and Buffer - Goldilocks Analogue

Firstly, if you want to get a very low noise output, whilst using a high current Switch Mode Power Supply, it is necessary to filter the supply voltage. I’ve utilised the dual steps of an L-C primary filter, followed by a ferrite core bead secondary filter. I’m not sure whether this is all necessary, and I’ll be testing the circuit later with various components removed to check their efficacy in the role, but if they’re not designed in now they never will be added later.

I’m using the Microchip MCP4822 DAC to produce the raw output voltage. This is an SPI device which will be selected using the other “spare” Goldilocks digital pin PB1. Using PB1 to signal the DAC means that none of the Arduino R3 pins are used for on-board Goldilocks functions, and as both CS lines (PB0 and PB1) are tied high they will ensure that all these on-board devices stay off the SPI bus during system reboot.

The MCP4822 takes 16 bits to set a signal level, which is two 8-bit SPI bus transfers. The maximum SPI clock rate is half the system clock (fosc/2). Therefore, if my Goldilocks is doing nothing else, it can send 691,200 of these 16-bit words per second. If both DACs are being driven, we can generate a square wave of 172,800Hz. This is an unreachable figure. More likely, the best case will be around 50kHz for both channels, or 100kHz if only one DAC is being used.
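For anyone wanting to reproduce those numbers, here is the arithmetic as a small sketch. It assumes a 22.1184MHz system clock, which is the value the 691,200 figure implies, and it ignores all software overhead, so it gives the theoretical ceiling only. The names are illustrative.

```c
#include <stdint.h>

/* Assumption: 22.1184 MHz system clock, implied by the 691,200 figure. */
#define F_CPU_HZ 22118400UL

/* The AVR SPI master clock tops out at half the system clock. */
static uint32_t spi_clock_max_hz(uint32_t f_cpu_hz)
{
    return f_cpu_hz / 2UL;
}

/* Each DAC update is one 16-bit word, i.e. 16 SPI clock cycles. */
static uint32_t dac_words_per_second(uint32_t f_cpu_hz)
{
    return spi_clock_max_hz(f_cpu_hz) / 16UL;
}

/* A square wave needs two updates (high, low) per cycle on each channel. */
static uint32_t square_wave_max_hz(uint32_t f_cpu_hz, uint32_t channels)
{
    return dac_words_per_second(f_cpu_hz) / channels / 2UL;
}
```

With two channels this gives 172,800Hz, and 345,600Hz with a single channel, before any time is spent toggling chip select or fetching samples.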

Optionally, the LDAC pin-out can be used to synchronise the transfer of digital inputs to the analogue output buffers across the two DACs or to a specific clock with low jitter.

Unlike the Adafruit solution, the MCP4822 generates its own internal 2.048V reference voltage Vref, which with the 2x gain setting gives a 4.096V full-scale output. This means that irrespective of whether the Goldilocks Analogue is being powered by a battery, by USB, or by the barrel connector and the SMPS, the output voltage for a particular digital input will be constant.
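As a sketch of what goes over the SPI bus, the 16-bit MCP4822 command word can be built as below. The bit positions follow the MCP4822 datasheet, where the internal reference is 2.048V and the 4.096V span comes from selecting 2x gain. The macro and function names are my own for illustration, not from any Goldilocks library, and the actual SPI transfer and chip select handling are left out.

```c
#include <stdint.h>

#define MCP4822_CH_B    0x8000u /* bit 15: 0 = DAC A, 1 = DAC B */
#define MCP4822_GAIN_1X 0x2000u /* bit 13: 1 = 1x (2.048V span), 0 = 2x (4.096V span) */
#define MCP4822_ACTIVE  0x1000u /* bit 12: 1 = output active, 0 = shut down */

/* Build the command word for a 12-bit value at 2x gain (bit 13 left clear). */
static uint16_t mcp4822_word(int channel_b, uint16_t value12)
{
    uint16_t word = (uint16_t)(MCP4822_ACTIVE | (value12 & 0x0FFFu));
    if (channel_b) {
        word = (uint16_t)(word | MCP4822_CH_B);
    }
    return word;
}

/* At 2x gain one count is 1 mV: Vout = 4.096V * value / 4096. */
static uint16_t mcp4822_millivolts(uint16_t value12)
{
    return (uint16_t)(value12 & 0x0FFFu);
}
```

So `mcp4822_word(0, 0x0FFF)` yields `0x1FFF` for a full-scale output of 4.095V on DAC A, sent high byte first.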

The op-amp configuration with dual op-amps, in a quad package, designed to double the current capability of the output, has raised concern from all who see it. Concern was my initial thought too. However after some research, I found it to be a recommended configuration for current doubling. The only difference to the Adafruit example circuit is to add low value output resistors which allow each op-amp to find its own offset level without consuming excess current.

I have added the option to bridge the output capacitors to provide a DC output. The output capacitors are necessary for audio use, as headphones or audio amplifier inputs require an AC connection, with no DC offset.

The Layout

It takes many hours to lay out even a small board the size of an Arduino Uno. Luckily, I had a completed and fully functioning example to use as a platform, thanks to Jon’s prior work on Goldilocks V1.

The final prototype board layout is now done, and the board design sent off for manufacturing.

Goldilocks Analogue Board

In this layout, I’ve been able to retain most of what makes a Goldilocks: the ATmega1284p, the complete dual rows of header pins arranged in pin-logical order 0-7, bridging of the I2C pins to A4/A5, JTAG, and a high current power supply. Added to this now are the three items described above: the FT232RQ and Reset switch, buffers for the uSD card, and the analogue platform.

Starting in the bottom left, the SMPS has been re-laid to significantly shorten the high current paths around pins 2, 3, and 4. This will reduce the circuit noise and, taken together with the effort to create solid ground planes and specific AVcc filtering, will help to minimise power supply noise in the analogue platform.

On the right we can see the uSD buffers, which have eaten into the prototyping space significantly. Although the signals will be much nicer than with a resistor bridge, the cost is clearly in board space. If the Goldilocks Analogue ever goes into production, the SOIC package buffer chip will be replaced by a QFN package, and some space should be recoverable.

Finally, the analogue platform is implemented in the top left of the board, to the left of the pin-outs for the analogue platform and the FTDI interface. Below the pin-outs the analogue supply voltage filtering is implemented, with the exception of the chip decoupling capacitors, which are tied directly to their supply pins.

Keeping the analogue lines as short, as balanced, as fat, and as well shielded as possible was a key focus of my design. There are a few USART lines running under the chips, but they are unlikely to produce noise as they are under the first ground plane.

Goldilocks Analogue Top

The top layer of the board is pretty crowded. Some tricks, such as bridging my lines to get a solid ground plane under the crystal, were passed on to me.

Goldilocks Analogue Route2

The Route2 or second layer is the ground plane of the board. As such it needs to provide a stable and solid path for currents to return to the origin. I have been able to provide almost solid copper under the entire area from MCU to power supply, and also from the analogue platform back to the central ground point.

Goldilocks Analogue Route15

In the Goldilocks Analogue (as in Goldilocks V1) the Route15 layer is wholly at 5V and is a massive supply line. I’ve used this layer to transport the 3V3 supply around the lower edge of the board, to provide power to the uSD card and its input buffer. The other thick tracks are the USB input line and the analogue AVcc supply line.

Goldilocks Analogue Bottom

On the back of the board, mirrored here, things look as we expect. The previously noted bridge capability for the I2C bus to A4/A5 is there, as is the capability to bridge the DAC A and DAC B output capacitors to enable DC output.

Next Steps

The Goldilocks Analogue prototype board design has been sent to Seeed Studio for conversion into a PCB. While this is happening I’ll be sourcing components to solder to the PCB. I think the next post will be on this stage of the process.

Well, I have everything finished and, in the interim until I write a new post, here are the photos of the final assembly of the prototype at Jon’s SuperHouse.

Goldilocks Analogue - 3

Here Jon is assembling the first prototype, using several faulty Goldilocks v1.1 devices as donor boards. Only two components didn’t fit correctly, and we didn’t have a uSD card cage so that was left off.

Goldilocks Analogue - 1

Out of the toaster oven, and final assembly finished. Just checking that the voltages are as expected across the board.

Well, I’ve had it on the desk now for two nights, and I’m very impressed that it seems to generally meet the specification that was intended. The code for setting the DAC levels is currently only optimised for setting two values at a time. Specifically, it is not a streaming function. Nevertheless, it is possible to achieve the stated goal for both DAC channels. The actual number achieved is 108 kSamples/second, shown below, or 18.8us to transmit 2 samples on 2 channels.

The trace below shows the signals for both DACs at 0x0000, then both DACs set to 0x0FFF.

Goldilocks Analogue Max DAC Rate

Therefore, we’ll be able to achieve the 44.1kHz sample rate for CD audio, but with only 12 bit resolution, with some time to spare. If there is a need to read a uSD card, or do some other processing, then it is likely that this rate will be more than halved, as the data would then need to be read over the SPI bus (the same bus the DAC is using), for example. Also, there is a single pole filter between the DACs and the OpAmp buffer, with a 3dB cut-off frequency of 23kHz, which will limit the maximum output frequency but will help to reduce sampling alias issues.
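To put a number on "some time to spare", the measured 18.8us figure can be turned into a per-sample budget. This sketch uses only the measurement quoted above and the 44.1kHz target; the names are illustrative, and the result is a rough budget rather than a guarantee, since it says nothing about interrupt latency or other work.

```c
/* Measured in the post: 18.8 us to write 2 samples on each of 2 channels. */
#define MEASURED_US       18.8
#define SAMPLES_PER_BURST 2.0

/* Time to update both channels once. */
static double update_time_us(void)
{
    return MEASURED_US / SAMPLES_PER_BURST;
}

/* Highest per-channel sample rate the measurement supports, in kSamples/s. */
static double max_rate_ksps(void)
{
    return 1000.0 / update_time_us();
}

/* Time between samples at a given sample rate. */
static double sample_period_us(double rate_hz)
{
    return 1.0e6 / rate_hz;
}

/* Time left in each sample period after both DAC channels are written. */
static double spare_time_us(double rate_hz)
{
    return sample_period_us(rate_hz) - update_time_us();
}
```

The 9.4us update works out at roughly 106 kSamples/second, close to the quoted figure. At 44.1kHz the sample period is about 22.7us, which leaves around 13.3us per sample for everything else, including any uSD reads sharing the bus.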

Looking at the board from the top left, the MCP4822 can be seen in the SOIC8 package, with the Burr Brown OPA4132 quad op-amp in a SOIC14 package just near the POWER selection jumper. The FTDI FT232RQ USART in QFN package takes up much less space than its FT232RL peer.

Goldilocks Analogue – Top Left

Now the prototype is finished, it is easy to see what needs to be improved. Actually there’s not too much wrong. The inductor and ferrite bead for the Analogue Vcc have the wrong footprints, so they will need to be fixed: the inductor is too large for its footprint and is snuggled up to the POWER jumper, and the ferrite bead is somewhat too small. I didn’t source the very small 15 turn potentiometers, so they are just shorted out, as is the DTR (RESET) disable switch located near the USB connector. As a final issue, the footprint for the 1/8″ jack was wrong for the supplied connectors, so I’ve just added a short set of jumpers to achieve the same outcome.

Goldilocks Analogue – Bottom Right

Here is a short video demonstrating a Voltage Controlled Oscillator running at 44.1kHz sampling into dual channels. It sounds a little odd, because one of the channels is inverted, generating an out of phase effect.

Results

Well, things are good, and bad.

I’ve been testing the DAC stage and found (what I should have known) that I needed an output buffer op-amp able to reach the negative rail (0V) on both input and output, to support the MCP4822 DAC, which ranges from 0V to 4.095V. The OPA4132 exhibits noise and instability issues around 0.3V output.

Unfortunately the OPA4350 (rail to rail high current), which looks like it will be the right pin compatible device, costs over $10 each, which is nearly as expensive as the audiophile OPA4132 I specified previously.

There seems to be a pin compatible alternative, the TS924A, which is about $2 each, but it is roughly an order of magnitude worse in performance.

For Example: OPA4350 vs TS924A
Gain Bandwidth Product: 38MHz vs 4MHz
Slew Rate: 22V/μs vs 1.3V/μs
Total Harmonic Distortion: 0.0006% vs 0.005%

Is it worth the difference, when working with a 12 bit DAC in the presence of mV of power supply noise?
Personally, I doubt it.
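One way to frame that question is to work out what the DAC actually demands of the buffer. The sketch below uses the standard peak slew rate of a sine wave, 2 x pi x f x Vpeak, and the size of one DAC step. It assumes a full-scale 20kHz sine with a 2.048V peak as the worst case for audio, which is my assumption rather than a measured requirement, and it says nothing about distortion or output drive.

```c
#define PI 3.14159265358979323846

/* Peak slew rate of a sine wave, SR = 2 * pi * f * Vpeak, in V/us. */
static double sine_slew_v_per_us(double freq_hz, double v_peak)
{
    return 2.0 * PI * freq_hz * v_peak / 1.0e6;
}

/* Size of one DAC step in millivolts for a given full-scale span. */
static double lsb_mv(double full_scale_v, unsigned bits)
{
    return full_scale_v * 1000.0 / (double)(1UL << bits);
}
```

Under those assumptions the buffer needs about 0.26V/us, well inside the 1.3V/us of the TS924A, and one 12 bit step is 1mV, the same order as the power supply noise mentioned above.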

Using my new Red Pitaya to analyse the output, with a 43.066Hz Sine wave (1024 samples at 44.1kHz) the noise floor is 70dB down from the signal ex DAC. It seems the DAC performs as advertised.

43.066Hz 12bit Sine wave, 1024 samples output at 44.1kHz.
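The odd-looking 43.066Hz comes straight from playing one cycle of a 1024 entry table at 44.1kHz. The sketch below shows that, together with the textbook signal to noise ratio of an ideal N bit converter driven by a full-scale sine, 6.02N + 1.76dB. Treat the second figure as a rough yardstick only: the noise floor read off a spectrum plot depends on the FFT length, so it is not a like-for-like comparison. The names are illustrative.

```c
/* Frequency produced by stepping through a one-cycle table once per sample. */
static double table_tone_hz(double sample_rate_hz, unsigned table_len)
{
    return sample_rate_hz / (double)table_len;
}

/* Ideal quantisation SNR for a full-scale sine on an N bit converter, in dB. */
static double ideal_snr_db(unsigned bits)
{
    return 6.02 * (double)bits + 1.76;
}
```

For 12 bits the ideal figure is 74dB, so a result around 70dB suggests the DAC is at least in the right neighbourhood.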

More in part two of Goldilocks Analogue – Testing.

Ends.