Now that the new prototype for the Goldilocks Analogue is completed, it is time to test it to see how successful the design was. And interestingly, in the time that I’ve been designing the Goldilocks Analogue, Stanford University in collaboration with Texas Instruments have produced their own Analog Shield.
So this test will compare the Goldilocks Analogue with its dual channel 12bit MCP4822 DAC with the Stanford Analog Shield quad channel 16bit DAC8564 DAC. In a later test sequence, using the same test tone, I compare the Goldilocks Analogue with the Open Music Labs Audio Codec Shield using a Wolfson Micro WM8731 24 bit Codec with ADC, DAC, and signal processing capabilities.
Summary (TL;DR)
The test platform is essentially the same ATmega1284p device, clocked at 22.1184MHz. For the Goldilocks Analogue it is integrated on to the main board. For the Analog Shield I used a Goldilocks device as provided in the Pozible project.
You don’t need a lot of space to have great tools. I’m using a Red Pitaya device, configured as an oscilloscope and as a spectrum analyser, together with a Saleae Logic to capture SPI transactions.
Micro Test bench – Red Pitaya and Saleae Logic
Both Goldilocks Analogue and Analog Shield are comfortably capable of producing reasonable quality stereo signals at 44.1kHz sampling rate. Both devices output beautiful looking 43.1Hz sine waves, at 44.1kHz reconstruction rate, from a 16 bit 1024 sample Sine Wave.
The Goldilocks Analogue produces 0V to 4.096V 1:1 buffered signals from its DC outputs, and an AC amplified headphone output in parallel. The Analog Shield produces -5V to +5V balanced amplified signals from the 0 to 2.5V DAC.
Full Swing 43.1Hz Sine Wave Goldilocks Analogue – Blue 0V to +4.096V Analog Shield – Red -5v to +5V
The Analog Shield shows a lot of harmonics at high frequencies as shown. Comparing the two solutions, the Analog Shield doesn’t do justice to the extra 4 bits (theoretically 98dB SNR 16 bit DAC), over the Goldilocks Analogue (theoretically 74dB SNR 12 bit DAC). The BOM price difference between MCP4822 ($4.60) and DAC8564 ($20) is hard to justify given the performance demonstrated.
I think that using the platform of the AVR ATmega (Arduino) there is little point using a 16 bit DAC. There is too much noise (many mV) in the power supply and around Goldilocks or Arduino Uno or Mega boards to make more than 10 to 12 bits of DAC resolution (or ADC resolution) in any way relevant.
Overlaid 976kHz Spectrum – Analog Shield Red – Shows significant harmonics
Stanford – TI Analog Shield
As part of its microcontroller course, Stanford University required a platform to sample and generate analogue signals. The Standford – TI Analog Shield arose from this need.
The Analog Shield contains a Texas Instrument quad channel ADC and a quad channel DAC, together with a variable voltage supply. I have not tested the ADC.
The DAC capability is based on a Texas Instrument DAC8564 device. This device has many interesting features, including the ability to synchronise loading of updated digital outputs, and to maintain multiple power-down states. The DAC8564 has many great features, and it comes with a price tag to match its capabilities.
Analog Shield – Quad 16bit DAC & Quad 16bit ADC – Stanford University & Texas Instruments
Signals generated by the DAC8564 (from 0v to 2.5V) are biased around 0V and amplified to produce a 10V full swing output. The output exhibits some “cramping” around 0x0000 (-5V) outputs.
Analog Shield – Quad DAC Schematic – -5V to +5V full swing
Head to Head
Testing was done using a 16 bit 1024 sample Sine Wave file. Outputs were generated by a timer triggered to interrupt every 22.7us (44.1kHz), and produce a new output level. Testing should show only a main signal at 43.1Hz, and the reproduction frequency of 44.1kHz. The Goldilocks Analogue discards the lower 4 bits of the samples and only outputs the 12 most significant bits.
Theoretically, the Goldilocks Analogue MCP4822 DAC should be able to achieve 74dB SNR, with its 12 bits of resolution, based on the rule of thumb SINAD = (6.02 x BITS) + 1.76. For the Analog Shield DAC8564 the number is 98db SNR.
Head to Head Testing – Using Red Pitaya and Saleae Logic
All outputs generated by a 1024 sample 16bit Sine wave, generated with a 44.1kHz reconstruction sample rate, triggered by an interrupt timer.
Goldilocks Analogue – 43Hz Sine Wave – Two Channels – One Channel Inverted
Analog Shield – 43Hz Sine Wave – Two Channels – One Channel Inverted
In previous testing on the Goldilocks Analogue prototype I had found that my OpAmp devices were unable to achieve 0V properly. In this new prototype I have produced a stable -1.186V Vss supply for the OpAmp. The signals at 0x000 show that I’ve achieved the required result, with the output being smooth down to the 0x000 level, and up to 0xFFF as well.
Goldilocks Analogue – 0xFFF Output
Goldilocks Analogue – 0x000 Output
The Analog Shield also produces smooth signals, but it does display some compression around 0x0000 levels. Possibly because of some issues with generating the Vss rail for the OpAmps.
Analog Shield – 0xFFFF Output
Analog Shield – 0x0000 Output – Slight Clipping
Looking at the spectra generated by both implementations up to 953Hz it is possible to see harmonics from the Sine Wave, and other low frequency noise.
The spectra are not directly comparable, because the Goldilocks Analogue is producing a 4V full swing, or -4dBm, whilst the Analog Shield is producing a 10V full swing, or 3.7dBm. Distortions in the Analog Shield need to be reduced by 7.7dB to be equivalent to distortion in the Goldilocks Analogue.
The spectrum produced by the Goldilocks Analogue shows most distortion is below -70dB, and that the noise floor lies below -100dB.
Goldilocks Analogue – 43.1Hz Sine Wave – 953Hz Spectrum
The Analog Shield has a significant noise present at 50Hz, which may be caused by noise leakage through the PC USB power supply not being completely filtered before the supply is provided to the DAC8564. Other noise rises above -80dB, and is present right across the spectrum.
Analog Shield – 43.1Hz Sine Wave – 953Hz Spectrum
In the spectra out to 7.6kHz we are looking at the clearly audible range, which is the main use case for the devices.
The Goldilocks Analogue has noise carriers out to around 4.5kHz, but they are all below -80dB. After 4.5kHz the only noise remains below -100dB.
Goldilocks Analogue – 43.1Hz Sine Wave – 7.6kHz Spectrum
The Analog Shield shows noise carriers out to only 2.5kHz, but on one channel these are above -80dB. Otherwise the test shows mainly background noise below -100dB beyond 2.5kHz.
Analog Shield – 43.1Hz Sine Wave – 7.6kHz Spectrum
The spectra out to 61kHz should show a noise carrier generated by the reconstruction frequency of 44.1kHz.
The Goldilocks Analogue shows the spectrum maintains is low noise level below -90dB right out to the end of the audible range, and further out to the reconstruction carrier at 44.1kHz.
Goldilocks Analogue – 43.1Hz Sine Wave – 61kHz Spectrum
Similarly, the Analog Shield is quiet out beyond the audible range. It exhibits a strong noise carrier at the reconstruction frequency. Also, it shows some beat frequencies generated by a small noise carrier at 10kHz, and the reconstruction carrier. These noise carriers might be caused by the TPS61093 boost power supply used to generate the +ve and -ve supplies for the output buffer OpAmps, although it has a characteristic frequency at 1.2MHz, or it might be leakage from some other device.
Analog Shield – 43.1Hz Sine Wave – 61kHz Spectrum – Harmonics around 44.1kHz reconstruction frequency
The final two spectra show the signal out to 976kHz. We’d normally expect to simply see the noise floor, beyond the 44.1kHz reconstruction carrier noise.
The Goldilocks Analogue has a noise carrier at around 210kHz. This could be generated by one or both of the TPS60403 devices used to generate the negative AVss supply. These devices have a typical switching frequency of 250kHz, specified between 150kHz and 300kHz, so this is possible. Aside from the single carrier mentioned, there is no further noise out to 976kHz.
Goldilocks Analogue – 43.1Hz Sine Wave – 976kHz Spectrum
The Analog Shield shows the reconstruction carrier noise at -50dB, and then harmonics of this carrier all the way out to 976kHz. Not sure why these artifacts are appearing. There is a chance that noise derived from these signals is impacting the overall outcome for the DAC8564.
Analog Shield – 43.1Hz Sine Wave – 976kHz Spectrum
Using the Saleae Logic we can capture the SPI transactions generating the analogue result. To maintain the 44.1kHz reconstruction rate a set of samples needs to be transferred every 22.7us.
In the code I’ve used to generate the signal an interrupt timer triggers every 22.7us indicated by the rising edge of “Channel 6”. Once the interrupt has finished processing it lowers the Channel 6 line, indicating that control has returned to the main program. If required the main program has to use the remaining time to generate the required signal. Clearly the faster the SPI transaction to set up the transaction can be completed the more time available for other purposes.
The MCP4822 found in the Goldilocks Analogue has 4 control bits and 12 data bits, which are transferred in 2 8 bit transactions. To set two channels only 4 SPI transactions are required, taking 7.25us.
Goldilocks Analog SPI transaction – Interrupt duration 7.25us
The DAC8564 used in the Analog Shield has 8 control bits and 16 data bits, which are transferred in 3 8 bit transactions, or 24 SPI clock cycles. For two channels this takes 6 SPI transactions and 9.08us.
Both devices leave sufficient time for calculation of simple VCO, or other multiply based, effects in real time with 44.1kHz dual channel. Halving the sample rate to 22.05kHz would be necessary to provide more opportunity to retrieve data from uSD cards or other more complex data sources.
Analog Shield SPI Transaction – Interrupt duration 9.083us
The final reality check in this Head-to-Head comparison is provided by Digikey. I must say that some of the features of the DAC8564 had convinced me to look at migrating the production Goldilocks Analogue to use this new TI DAC. However, seeing that the BOM cost for the TI DAC is approximately 4 times greater than the MCP4822 device soon cooled those thoughts.
Given the cost sensitive nature of the Arduino environment it is not practical to use a device costing nearly US$20 on the Goldilocks Analogue platform, irrespective of its performance.
And, given that the Analog Shield provides no justification in terms of signal quality over the existing Goldilocks Analogue solution there seems to be no technical merit in changing the DAC specification, either.
The Microchip MCP4822 is available from US$4.60 at Digikey.
Digikey MCP4822 Pricing
The Texas Instrument DAC8564 is available from US$18.60 at Digikey.
Digikey DAC8464 Pricing
Design Review
The remaining features and functions of the Goldilocks Analogue have been tested, and resulting from these tests I’m going to make the following redesigns and changes.
Revert to tri-state buffers for uSD I/O logic conversion CMOS 5v to CMOS 3v3 – Bi-directional translators pure fail! I tried both TI TXB and TXS devices and they simply didn’t work as specified.
Remove USART pin-out – unnecessary feature and just takes board space.
Move DAC pin-out to right hand edge of the prototyping space – easier to use if not covered by UNO format shield. As the Goldilocks Analogue is slightly longer than standard Arduino UNO shields, putting the DAC DC output pins outside the shield outline on the right hand end of the board will allow easier access for connections.
Increase prototyping space – fill in space freed by pin-out removals.
Use smaller packages where possible – heading for production.
Increase bypass capacitors on uSD 3V supply – uSD cards consume significant current, potential for instability because of long 3V3 supply
Use a JTAG pin for MCP4822 LDAC – to enable synchronization of the DAC channels.
Well I’ve had my Goldilocks Analogue on my desk now for a few nights, and I’m very impressed that it seems to generally meet the specification that was intended.
The code for setting the DAC levels is currently only optimised for setting two values at a time. Specifically, it is not a streaming function. Never-the-less, it is possible to achieve the stated goal for both DAC channels. The actual number achieved is 108 kSamples/second, shown below, or 18.8us to transmit 2 samples on 2 channels.
I’ve been testing the Goldilocks Analogue DAC stage and found (what I should have known) that I needed an output buffer op-amp able to do negative rail (0V) input and output to support the MCP4822 0v to 4.095V ranging DAC, so I’m going to order some soon.
Unfortunately the OPA4350 (rail to rail high current), at over $10 each, is nearly as expensive as the audiophile OPA4132 I specified previously.
There seems to be a pin compatible alternative, the TS924A, which is about $2 each, but several orders of magnitude worse in performance.
For Example: OPA4350 vs TS924A
Gain Bandwidth Product: 38MHz vs 4MHz
Slew Rate: 22V/μs vs 1.3V/μs
Total Harmonic Distortion: 0.0006% vs 0.005%
Is it worth the difference, when working with a 12 bit DAC in the presence of mV of power supply noise? I’ll revisit this with the next design notes.
How about 5V?
To build an effective DAC I’ve learned that we need to have an effective quiet 5V power supply, with as little noise as possible. To put this into context, the 12 bit MCP4822 generates a 1mV per level difference. I noticed there is quite a few mV of noise on the 5V supply of the Goldilocks Analogue. Not what I was hoping to see, so I’ve gone back to see how that compares with predecessors.
All measurements are taken with a 100MHz Rigol DS1102E, with 150MHz probes grounded onto the USB socket of each board respectively. The probes are directly on the regulator output capacitor, which should be the quietest point on the board, and then directly off Vcc on the MCU for the DIL boards, or a nearby test point for the SMD boards.
First stop is a reference design Arduino Uno R2 board. As a starting point the Uno shows 16mV on the power supply, and 35mV on the MCU. Let’s see how some other boards compare.
Arduino Uno
The Freetronics Eleven board also uses a linear regulator, similar (if not identical to) the Arduino Uno. However the capacitor arrangement must be somewhat more effective because the noise present on the output capacitor is less than 6mV. This is significantly better than on the Arduino Uno, and even measured against the Vcc pin the supply noise remains significantly lower than the Uno.
This Freetronics Eleven board was running at 22.1184MHz rather than the standard Arduino speed, but that doesn’t seem to affect the result.
Freetronics Eleven
Next up is the Seeed Studio ADK boards is an ATmega2560 board running at 16MHz, which also uses a linear regulator. The ADK board exhibits the same quiet supply found in the Freetronics Eleven, and has very similar characteristics on the Vcc test point as well.
Seeed Studio ADK
The Goldilocks v1.1 resulting from the Pozible project is my true point of reference. I hope to be able to do slightly better than this board, by better layout of the same SMPS circuit that is used across all of the Freetronics range of products.
In measurement, the Goldilocks v1.1 sets a very high standard to surpass. Even though there is a 500kHz SMPS running in the supply, the Vcc supply capacitor shows very low noise of only 5.29mV, and the Vcc noise on the test point is only 12.0mV.
Goldilocks V1.1
The Goldilocks Analogue Prototype 1 uses the same SMPS circuit present across the entire range of Freetronics boards, but using an optimised (for this application) layout. Shorter internal tracks should result in slightly faster smoothing, and better regulation, although the effect at normal currents would be minimal.
In comparison with the Goldilocks v1.1, there seems to be slightly less noise at the supply capacitor with less than 5mV showing, although this effect is marginal. The noise on the Vcc line seems to be similar.
Goldilocks Analogue Prototype 1
SMPS
Having looked at the power supplies at rest, and built a baseline of what to expect, here are some screenshots of the Goldilocks SMPS under load. Port D is being switched from 0x00 to 0xFF at 30kHz. The first screenshot shows one of the Port D pins on channel 1, and the Vcc (5V) line at the supply capacitor on channel 2. Note that there is 24mV of noise on the Vcc supply capacitor, even without drawing much current.
This second screenshot shows the same set up, but with 8 LEDs being driven off the Port D. Channel 2 shows the variation in supply capacitor Vcc under the increased load.
So the big question is whether the new layout of the SMPS has achieved anything. Well, based on this simple test, it would seem to have generated a fairly significant improvement.
The below two screenshots shows the trailing edge of the 8 LED power load being switched off. Note that on the supply Vcc there is some oscillation and instability as the SMPS recovers from the load reduction.
This first screenshot shows the Goldilocks V1.1 situation, with a ripple of 832mV peak to peak.
With the Goldilocks Analogue, the peak to peak ripple is reduced to under 600mV, which is a 25% improvement in this situation.
Sine Waves
I’m using my new Red Pitaya to analyse the output of the Goldilocks Analogue output state, with a 43.066Hz Sine wave (1024 samples played out at 44.1kHz) as the input to the DAC.
MCP4822 DAC
This shows the sine wave generated at the output of the low pass filter (23kHz -3dB single pole design). This is almost exactly as we would like to see.
Looking at the lower segment of the sine wave below and the image above, it is almost perfect to 0V and to 4.096V peak to peak.
Looking at this sine wave on the spectrum analyser across the band up to 7.6kHz, we can see the noise floor is 70dB down from the signal exiting the DAC. It seems the DAC performs as advertised.
Looking across the band to 61kHz, we can see the peak at -70dbm from the 44.1kHz sampling noise. Unfortunately, the single pole filter doesn’t do much to suppress this noise source.
To get an idea of whether the 23kHz filter is working as expected, we need to look out to the 976kHz band. The next two images contract the signal before and after the Filter.
The lower image shows that noise peaks out to 200kHz have been properly suppressed, so the filter is working but it is too slow to hit the 41kHz noise.
Op Amps
The Op Amp selection was a mistake. Rather than thinking about the application correctly, I simply sought out the best “musical” Op Amp. The audiophile OPA4132 is absolutely not the right tool for this job. And these images show why.
Driving into a 1kOhm resistance, the output of the OPA4132 is shaved off as the op amp output amplifier saturates as it approaches the 0V rail (or Ground rail). The top of the waveform is OK, as we are still 1V from the Vcc rail. Under higher loading, such as headphones at 32Ohm, the signal is worse, much worse.
Looking at this (initially as an open circuit) view and then with the 1kOhm loading imposed, we can see the difficulty the op amp has with the signal.
The noise peaks (created by the flattening of the sine wave) are at least 10dB greater under a small load. As the load increases the situation gets worse.
To fix this issue, I will need to use a rail to rail high current op amp, and also make sure that I create a small negative rail (-o.23V using a LM7705) to allow the op amp to reach 0V effectively.
Amongst the Burr Brown range the best alternative seems to be the OPA2353, which can support almost 40mA of current, and can effectively drive 75Ohm loads. Unfortunately the OPA2353 (rail to rail high current) costs about $6 each,
There seems to be a pin compatible alternative, the TS922A, which is about $2 each, but several orders of magnitude worse in performance. However, it can supply 80mA and effectively drive 32Ohm headphones.
For Example: OPA2353 vs TS922A
Gain Bandwidth Product: 44MHz vs 4MHz
Slew Rate: 22V/μs vs 1.3V/μs
Total Harmonic Distortion: 0.0006% vs 0.005%
Errata
Reset button can be moved closer to the edge.
AVcc supply capacitors are too close together, and need to have bigger pads.
Use the DAC out header spare pin to pin-out the AVcc, for testing or use on prototyping area.
AVcc inductor is not correct footprint (device much larger).
AVcc ferrite bead is not correct footprint (device much smaller).
Remove analogue potentiometers, unnecessary as DAC is extraordinarily accurate.
Put pin-out on DC output, and headphone socket on AC output 100uF capacitors.
Replace Analogue Op Am with LM7705 and TS922A. This generates 0V to 4.096V required to properly buffer the MCP4822 DAC. LM7705 can’t sustain 80mA, but for small DC signals (up to 20mA) we will have accurate buffering of the DAC, and for headphones, it is an analogue signal anyway.
Replace uSD Op Amps with TXB0104, more suitable for the purpose.
Recently, I’ve used the Gameduino 2 together with the Goldilocks Analogue to implement a multi-oscillator audio synthesizer GUI, using many FTDI EVE GPU co-processor widgets. The use of widgets linked with the integrated touch functionality really simplifies the programming of complex GUIs.
The ability to add a large touch screen, with integrated audio and accelerometer to any Arduino project is a great thing. Previously, you had to move to 32 bit processors with LVDS interfaces to work with LCD screens, but the new FT800 EVE Graphical Processing Unit (GPU) integrates all of the graphic issues and allow you to drive it with a very high level object orientated graphics language. For example it takes just one command to create an entire clock face with hour, minute, and second-hands.
The Gameduino 2, via the FT800 EVE chip, provides the following capabilities:
32-bit internal color precision
OpenGL-style command set
256 KBytes of video RAM
smooth sprite rotate and zoom with bilinear filtering
smooth circle and line drawing in hardware – 16x antialiased
JPEG loading in hardware
audio tones and WAV audio output
built-in rendering of gradients, text, dials, sliders, clocks and buttons
intelligent touch capabilities, where objects can be tagged and recognised.
The FT800 runs the 4.3 inch 480×272 TFT touch panel screen at 60 Hz and drives a mono headphone output.
First off, there’s a demo of some of the capabilities of the Gameduino 2. I’ll come to the drivers later, but the Arduino compatible platform used here is the Goldilocks ATmega1284P from Freetronics. The Goldilocks is in my opinion the best platform to use with the Gameduino 2. Firstly there is the extra RAM and Flash capabilities in line with the ATmega1284p MCU. But also importantly the Goldilocks holds the Pre-R3 Arduino Uno connector standard, with the SPI pins located correctly on Pins 11, 12, and 13. And the INT0 interrupt located on Pin 2. This means that it can be used with the Gameduino 2, out of the box. No hacking required.
The Goldilocks Analogue is an upgraded replacement for the original Goldilocks, with many additional features. Thank you for pledging on the Goldilocks Analogue Kickstarter Project page, which was successfully closed on November 19th 2015, with 124% funding. Now that the Kickstarter pledges have been shipped, the new Goldilocks Analogue is now available on Tindie.
Must be addicted to these touch screens. I’ve just received an Australian designed 4D Systems FT843 Screen. It has possibly an identical screen to the Gameduino 2, but is based on a R3 Arduino shield format (SPI on ICSP) called the ADAM (Arduino Display Adapter Module), which means that it will work on any current Arduino hardware, without hacking. The FT843 ADAM supports a RESET line, which resolves the only problem I’ve noted with the Gameduino 2. Unfortunately, audio is not supported by a 3.5mm jack but rather by a pin-out option. The FT843 uses Swizzle 0, unlike the Gameduino 2 which uses Swizzle 3, and has the Display SPI Select on either D9 or D4 rather than on D8 like the Gameduino 2. Other than these simple configuration options, it similar.
4D Systems FT843 on Goldilocks 1284p
Demo
The screen shows 5 sets of demonstrations. These demos are provided by FTDI, and typically in an Arduino Uno you would have to choose which of the 5 sets you want to see. With the extra capabilities of the Goldilocks, it is possible to load all of them simultaneously in 110kB of flash.
Set 0 focusses on individual commands that are loaded into the Display List. The Display List is essentially a list of commands that is executed or rendered for each frame of display. A Display List will be rendered indefinitely, until it is swapped by another Display List. Two Display Lists are maintained in a double buffering arrangement. One is written, whilst the other is displayed.
Set 1 exhibits some of the co-processor command capabilities, that allow complex objects to be created with only one command. A clock, slider, dial, or a rows of buttons can be created easily in this manner.
Set 2 shows the JPEG image rendering capabilities in RGB and in 8 bit mono.
Set 3 demonstrates custom font capabilities. There are 16 fonts available in the ROM of the FT800 EVE, but you can add your own as is desired.
Set 4 shows some advanced co-processor capabilities, such as touch tag recognition, no touch (zero MCU activity) screensaver, capturing screen sketches, and inbuilt audio options.
The main screen shows an analogue clock that is drawn with one co-processor command. Real time is generated by a 32,768Hz Crystal driving the Goldilocks Timer 2 for a system clock. The accuracy of the clock is limited only by the accuracy of the watch crystal, and I’ve built mine with a 5ppm version, which should be enough to keep within a few seconds per month.
Sample Application
The FTDI provided sample application covers most of the available commands and options for the FT800 EVE GPU.
The FT_SampleApp.h file contains definitions of functions implemented for the main application. These code snippets are not really useful beyond demonstrations of capability of the GPU, but never the less demonstrate how each specific feature of the FT800 EVE GPU can be utilised.
Driver
Because the FT800 EVE GPU has a very capable object orientated graphics language, the FTDI drivers present a very capable high level interface to the user. FTDI have prepared an excellent starting point from which I could easily make customisations suitable for the AVR ATmega Arduino hardware that I prefer to use.
The FTDI driver set is separated into a Command Layer, and into a Hardware Abstraction Layer (HAL). This separation makes it easy to customise for the AVR ATmega platform, but retains the standard FTDI command language for easy implementation of their example applications, and portability of code written for their command language.
To use the FT800 EVE drivers for the Gameduino 2 it is only necessary to include the FT_Platform.h file in the main program. This file contains references to all of the other files needed.
The FT_DataTypes.h file contains FTDI type definitions for the specific data types needed for the FT800 EVE GPU. This is mainly used to abstract the drivers for varying MCU. For the AVR it is not absolutely necessary, but it will help when the code is used on other platforms.
The FT_X11_RGB.h file contains the standard colour set used in X11 colours and on the Web, which are stored PROGMEM. I’ve written a small macro that will insert these into commands needing 24 bit colour settings. These colours will be stored and referenced from PROGMEM when they are called from either of the X11 specific macros defined in FT_Gpu.h If they are not called from the program, they will be discarded by the linker and not waste space in the final linked program.
The FT_Gpu.h file contains all the definitions for command and register setting options. I have significantly rearranged the layout and comments in this file, compared to the FTDI version. Hopefully it is arranged in a way that allows options applying to specific commands and registers to be quickly located.
By writing DL commands to the Display List which are configured by the options in the FT_Gpu.h file it is possible to control most of the low level functions in the FT800 EVE GPU. The Display List is used by the FT800 GPU to render the screen, so it is only the contents of the active Display List that appear on the screen.
In the FT_Gpu_Hal.h file the commands specific to the SPI bus (or the I2C bus if this transfer mechanism is being used) are defined.
I have simplified out some HAL options provided by FTDI for high performance MCU, that might be constrained writing to the SPI bus at only 30MHz, the maximum FT800 SPI bus rate. The Goldilocks SPI bus only runs at 11MHz, and the standard Arduino Uno SPI bus only runs at 8 MHz, so those optimisations don’t help, and they also consume RAM for streaming buffers.
But, I have integrated a multi-byte SPI transfer into the HAL, which don’t use additional RAM buffer space, as they write via a pointer. This is probably the best way to work the SPI bus in the Arduino environment. I have also implemented multi byte SPI transfer directly from the PROGMEM for Strings, and for precomputed commands.
As a preferred option, I’ve implemented PROGMEM storage of Strings for all commands. The commands utilising RAM storage of Strings are retained for compatibility, and to allow computed Strings to be used.
All of the FTDI provided commands now have optional *_P variants which take PROGMEM strings, rather than RAM strings. This saves eleven hundred bytes of RAM used for strings, just in the demonstration programs provided by FTDI and shown in the Demo!
The FT_Hal_Util.h file contains some simple utility macros.
The FT_CoPro_Cmds.h file contains definitions for all of the available co-processor commands. These command are written to the co-processor command buffer, and are used to generate low level commands that appear in the Display List and be rendered for each frame.
Many of the co-processor commands replicate functionality of setting specific registers with options via the Display List GPU commands. This is useful because it is possible to programme the co-processor to implement a task and remain at the object orientated view of the screen, even though the a individual command may be a simple GPU setting that could have been done at Display List command level. Having all the commands available at co-processor level obviates the need to switch between the two “modes” of operation and thought.
I extracted a few of the standard functions that are needed irrespective of the specific application into an API. The FT_API.h file contains these simple command sequences, for booting up the Gameduino 2, and for managing the screen brightness. It also contains precalculated simplified sin, cos, and atan functions useful when drawing circles and clocks.
The API level also contains calls on the Hardware Abstraction Layer that are simply passed through. These calls are flattened by avr-gcc to save digging ourselves into a stack wasting function call hole.
And, of course, everything is integrated into the freeRTOS v8.0.0 port that I support on Sourceforge, AVRfreeRTOS, which gives non-blocking timing, tasks, semaphores, queues, and all aspects of freeRTOS that are so great.
As an example of the power of this combination of freeRTOS and the FT800 object orientated command language we can describe the method used to create an accurate well rendered clock on the Gameduino 2 screen. Using the 3 commands below, we obtain the clock face seen in my demo video main screen.
time(¤tTime); // get a time stamp in current seconds elapsed from Midnight, Jan 1 2000 UTC (the Y2K 'epoch'), as maintained by freeRTOS.
localtime_r(¤tTime, &calendar); // converts the time stamp pointed to by currentTime into broken-down time in a calendar structure, expressed as Local time.
FT_GPU_CoCmd_Clock(phost, FT_DispWidth - (FT_DispHeight/2), FT_DispHeight/2, FT_DispHeight/2 - 20, OPT_3D, calendar.tm_hour, calendar.tm_min, calendar.tm_sec, 0); // draw a clock in 3D rendering.
I’ve updated the clock function to include a touch screen time setting interface. Using the FT800 Touch Tags, and Button generation, this process is really incredibly easy.
Hardware
I’ve taken the liberty of borrowing some of James’ pictures for this story. They can originally be found here.
Note that because of the wrap around connector and cable for the LCD screen, it is not possible to use the Arduino R3 pin out. The SPI bus pins are located at the traditional location on Pin 11 though Pin 13. Unless you want to hack your board, you’re limited to using Arduino Uno style boards.
Unfortunately, the FTDI FT800 Reset pin has not been implemented by the Gameduino 2. Using an ISP to programme the Arduino usually “accidentally” puts the FT800 EVE GPU into an unsupported state. This means that the Gameduino 2 and Arduino usually have to be power-cycled or hard Reset following each programming iteration. It would have been good to tie the FT800 Reset pin to the Arduino Reset pin via a short (ms) delay chip, to obviate the need to remove power to generate the hard Reset for the FT800.
Hello World & other examples
I thought it might be interesting to compare the code required to achieve the demonstration outcomes that James Bowman provides on the Gameduino2 site, with the code required to achieve the same result using freeRTOS and the FTDI style driver. So I’ve implemented three simple examples, “Hello World”, “Sprites”, and “Blobs” from his library.
All of the examples have been built using an Arduino Uno ATmega328p as the MCU hardware platform.
The Hello World application simply initialises the Gameduino2, sets the colour to which the screen shall be cleared, and then writes text with the OPT_CENTER option to center it in the X and Y axis. As there is no delay, this is written as often and as fast as the MCU can repeat the loop.
The same result can be generated in C using freeRTOS and the FTDI Drivers. I have commented extensively within the code below.
/* freeRTOS Scheduler include files. */
/* these four header files encompass the full freeRTOS real-time OS features,
of multiple prioritised tasks each with their own stack space, queues for moving data,
and scheduling tasks, and semaphores for controlling execution flows */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"
/* Gameduino 2 include file. */
#include "FT_Platform.h"
/*------Global used for HAL context management---------*/
extern FT_GPU_HAL_Context_t * phost; // optional, just to make it clear where this variable comes from.
// It is automatically included, so this line is actually unnecessary.
/*--------------Function Definitions-------------------*/
int main(void) __attribute__((OS_main)); // optional, just good practice.
// Saves a few bytes of stack because the return from main() is not implemented.
static void TaskWriteLCD(void *pvParameters); // define a single task to write to Gameduino 2 LCD.
// typically multiple concurrent tasks are defined,
// but in this case to replicate the Arduino environment, just one is implemented.
/*-----------------Functions---------------------------*/
/* Main program loop */
int main(void)
{
xTaskCreate( // create a task to write on the Gameduino 2 LCD
TaskWriteLCD
, (const portCHAR *)"WriteLCD"
, 128 // number of bytes for this task stack
, NULL
, 3 // priority of this task (1 is highest priority, 4 lowest).
, NULL );
vTaskStartScheduler(); // now freeRTOS has taken over, and the pre-emptive scheduler is running.
}
/*-----------------------------------------------------------*/
/* Tasks */
/*-----------------------------------------------------------*/
static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
{
(void) pvParameters;
FT_API_Boot_Config(); // initialise the Gameduino 2.
while(1) // a freeRTOS task should never return
{
FT_API_Write_CoCmd( CMD_DLSTART ); // initialise and start a Display List
// FT_API_Write_CoCmd( CLEAR_COLOR_RGB(0x10, 0x30, 0x00) ); // set the colour to which the screen is cleared (using RGB triplets) as in GD2 library OR
FT_API_Write_CoCmd( CLEAR_COLOR_X11(FORESTGREEN) ); // set the colour to which the screen is cleared (using X11 colour definitions)
FT_API_Write_CoCmd( CLEAR(1,1,1) ); // clear the screen
FT_GPU_CoCmd_Text_P(phost,FT_DispWidth/2, FT_DispHeight/2, 31, OPT_CENTER, PSTR("Hello world"));
// write "Hello World" to X and Y centre of screen using OPT_CENTER with the largest font 31
// The string "Hello world" is stored in PROGMEM
// Functions with *_P all use PROGMEM Strings (and don't consume RAM)
// FT_DispWidth and FT_DispHeight are global variables set to orientate us in a flexible consistent way,
// without hard coding the screen resolution.
FT_API_Write_CoCmd( DISPLAY() ); // close the Display List (DL) opened by CMD_DLSTART()
FT_API_Write_CoCmd( CMD_SWAP ); // swap the active Display List (double buffering), to display the new "Hello World" commands written to the Display List
}
}
The Sprites application is similar to the original one built for the Gameduino, but here each sprite is rotating around a random point. The 2001 random points are stored in a PROGMEM array sprites. This takes 8K of flash. A second PROGMEM array circle holds the 256 XY coordinates to make the sprite move in a circle. The only RAM used is a single byte t used to keep track of the current rotation position, by counting iterations.
The code in freeRTOS is similar. I have commented within the code.
/* freeRTOS Scheduler include files. */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"
/* Gameduino 2 include file. */
#include "FT_Platform.h"
// The include file containing the sprite graphics, and the special command sequence
#include "sprites_assets.h
/*------Global used for HAL context management---------*/
extern FT_GPU_HAL_Context_t * phost; // optional, just to make it clear where this variable comes from
/*--------------Function Definitions-------------------*/
int main(void) __attribute__((OS_main)); // optional, just good practice
static void TaskWriteLCD(void *pvParameters); // define a single task to write to Gameduino 2 LCD
/*-----------------Functions---------------------------*/
/* Main program loop */
int main(void)
{
xTaskCreate( // create a task to write on the Gameduino 2 LCD
TaskWriteLCD
, (const portCHAR *)"WriteLCD"
, 128 // number of bytes for this task stack
, NULL
, 3 // priority of task (1 is highest priority, 4 lowest).
, NULL );
vTaskStartScheduler(); // now freeRTOS has taken over, and the pre-emptive scheduler is running
}
/*-----------------------------------------------------------*/
/* Tasks */
/*-----------------------------------------------------------*/
static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
{
(void) pvParameters;
uint8_t t = 0; // iterate over the code for 255 times, before restarting with 256 sprites where t = 0
FT_API_Boot_Config(); // initialise the Gameduino 2.
FT_GPU_HAL_WrCmdBuf_P(phost, sprites_assets, sizeof(sprites_assets));
// Copy James' magic list of commands into the command buffer.
// These co-processor commands are "compiled" into their 4 byte equivalents, and I haven't decoded them in detail.
// But, since the FT800 is reading the same double word codes, it doesn't really matter how they're generated.
while(1) // a freeRTOS task should never return
{
FT_API_Write_CoCmd( CMD_DLSTART ); // initialise and start a Display List (DL)
FT_API_Write_CoCmd( CLEAR(1,1,1) ); // clear the screen
FT_API_Write_CoCmd( BEGIN(BITMAPS) ); // start to write BITMAPS into the DL
uint8_t j = t;
uint32_t v;
uint32_t r;
int16_t nspr = min(2001, max(256, 19 * t));
ft_prog_uint32_t * pv = sprites; // pv is the sprite BITMAP pointer
for (uint16_t i = 0; i < nspr; ++i) {
v = pgm_read_dword(pv++); // determine which sprite we're controlling
r = pgm_read_dword(circle + j++); // circle is the rotation control
FT_GPU_HAL_WrCmd32(phost, v + r); // the sprite address and the location are written here to the co-processor
}
FT_API_Write_CoCmd( END()); // finish writing BITMAPS into the Display List
FT_API_Write_CoCmd( BEGIN(LINES) ); // start to write LINES into the Display List
FT_API_Write_CoCmd( COLOR_RGB(0x00, 0x00, 0x00) ); // set the line colour to black 0x000000
FT_API_Write_CoCmd( COLOR_A(140) ); // set alpha channel transparency
FT_API_Write_CoCmd( LINE_WIDTH( 28 * 16) );
FT_API_Write_CoCmd( VERTEX2II(240 - 110, 136, 0, 0) ); // start to draw an alpha transparency background line
FT_API_Write_CoCmd( VERTEX2II(240 + 110, 136, 0, 0) ); // finish the line
FT_API_Write_CoCmd( END() ); // finish writing LINES into the Display List
FT_API_Write_CoCmd( RESTORE_CONTEXT() ); // With no prior SAVE_CONTEXT() command, this restores the default colours and values.
FT_GPU_CoCmd_Number(phost, 215, 110, 31, OPT_RIGHTX, nspr); // write a number.
FT_GPU_CoCmd_Text_P(phost, 229, 110, 31, 0, PSTR("sprites")); // write using a PROGMEM stored string function, to save RAM
// phost is a pointer to the context for the Gameduino2.
// Mainly used where there may be multiple screens present, but in this case several state and semaphore items are maintained.
FT_API_Write_CoCmd( DISPLAY() ); // close the active Display List (DL) opened by CMD_DLSTART()
FT_API_Write_CoCmd( CMD_SWAP ); // Do a DL swap to render the just written DL
t++; // t will roll over and will restart the number of sprites to the minimum of 256
}
}
blobs is a sketching demonstration, as you paint on the touch screen a trail of circles follows.
The code keeps a history of the last 128 touch positions, and draws the transparent, randomly coloured circles.
#include
#include
#include
#define NBLOBS 128
#define OFFSCREEN -16384
struct xy {
int x, y;
} blobs[NBLOBS];
void setup()
{
GD.begin();
for (int i = 0; i < NBLOBS; i++) {
blobs[i].x = OFFSCREEN;
blobs[i].y = OFFSCREEN;
}
}
void loop()
{
static byte blob_i;
GD.get_inputs();
if (GD.inputs.x != -32768) {
blobs[blob_i].x = GD.inputs.x << 4;
blobs[blob_i].y = GD.inputs.y << 4;
} else {
blobs[blob_i].x = OFFSCREEN;
blobs[blob_i].y = OFFSCREEN;
}
blob_i = (blob_i + 1) & (NBLOBS - 1);
GD.ClearColorRGB(0xe0e0e0);
GD.Clear();
GD.Begin(POINTS);
for (int i = 0; i < NBLOBS; i++) {
// Blobs fade away and swell as they age
GD.ColorA(i << 1);
GD.PointSize((1024 + 16) - (i << 3));
// Random color for each blob, keyed from (blob_i + i)
uint8_t j = (blob_i + i) & (NBLOBS - 1);
byte r = j * 17;
byte g = j * 23;
byte b = j * 147;
GD.ColorRGB(r, g, b);
// Draw it!
GD.Vertex2f(blobs[j].x, blobs[j].y);
}
GD.swap();
}
The code in freeRTOS is similar, but the touch functionality is derived directly from the FT800 register containing the most recent screen touch location. I have commented within the code.
/* Gameduino 2 include file. */
#include "FT_Platform.h"
#define NBLOBS 128
#define OFFSCREEN -16384
/*———-Global used for HAL management————-*/
extern FT_GPU_HAL_Context_t * phost; // optional, just to make it clear where this comes from
struct xy { // somewhere to store all the blob locations
int16_t x, y;
} blobs[NBLOBS];
/*————–Function Definitions——————-*/
int main(void) __attribute__((OS_main)); // optional, just good practice
static void TaskWriteLCD(void *pvParameters); // define a single task to write to Gameduino 2 LCD
/*—————–Functions—————————*/
/* Main program loop */
int main(void)
{
xTaskCreate( // create a task to write on the Gameduino 2 LCD
TaskWriteLCD
, (const portCHAR *)"WriteLCD"
, 128 // number of bytes for the task stack
, NULL
, 3 // priority of task (1 is highest priority, 4 lowest).
, NULL );
vTaskStartScheduler(); // now freeRTOS has taken over, and the scheduler is running
}
static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
{
(void) pvParameters;
FT_API_Boot_Config(); // initialise the Gameduino 2.
FT_API_Touch_Config(); // initialise the FT800 Touch capability.
for (uint8_t i = 0; i > 16) & 0xffff) << 4; // read where x axis touch occurred, and scale it
blobs[blob_i].y = (int16_t)(readTouch & 0xffff) << 4; // read where y axis touch occurred, and scale it
} else {
blobs[blob_i].x = OFFSCREEN; // if there was no touch, draw the blob OFFSCREEN
blobs[blob_i].y = OFFSCREEN;
}
blob_i = (blob_i + 1) & (NBLOBS – 1); // increment to the next blob for touch interaction
// this is the display interface stuff
FT_API_Write_CoCmd( CMD_DLSTART ); // initialise and start a display list (DL)
FT_API_Write_CoCmd( CLEAR_COLOR_RGB(0xe0, 0xe0, 0xe0) );// set the colour to which the screen will be cleared
FT_API_Write_CoCmd( CLEAR(1,1,1) ); // clear the screen
FT_API_Write_CoCmd( BEGIN(POINTS) ); // start to write POINTS into the Display List (DL)
for (uint8_t i = 0; i < NBLOBS; ++i)
{
// Blobs fade away and swell as they age
FT_API_Write_CoCmd( COLOR_A(i << 1) ); // set an alpha transparency
FT_API_Write_CoCmd( POINT_SIZE((1024 + 16) – (i << 3)) );
// Random colour for each blob, keyed from (blob_i + i)
uint8_t j = (blob_i + i) & (NBLOBS – 1);
uint8_t r = j * 17;
uint8_t g = j * 23;
uint8_t b = j * 147;
FT_API_Write_CoCmd( COLOR_RGB(r, g, b) );
FT_API_Write_CoCmd( END() ); // finish writing POINTS into the active DL
FT_API_Write_CoCmd( DISPLAY() ); // close the active Display List (DL)
FT_API_Write_CoCmd( CMD_SWAP ); // Do a DL swap to render the just written DL
}
}
I intend to build a few more demonstrations of the code, and to copy some games that James has already implemented, because I'm not a game designer.
Last time I designed a Goldilocks board, it was because I was unhappy about the availability of a development platform that was within my reach; a tool to enable me to continue to learn about coding for micro-controllers.
This Goldilocks, let us call it Goldilocks Analogue, it is not about what I think is necessary, but more about what I’d like to have. The focus is not so much about the basics of SRAM and Flash, but much more on what functions I would like to have, and using my own means to get there.
Also, as the original Goldilocks is sold out, Freetronics are considering making their own version. Please add your wishes here.
Test results are in. Check out the detailed post on Goldilocks Analogue – Testing. Following the testing, I’ve redesigned the analogue output section to make it much more capable. It now support simultaneous AC and DC outputs, with an application specific headphone amplifier device to provide AC output, and high current OpAmp to provide DC output.
The Goldilocks Project was specifically about getting the ATmega1284p MCU onto a format equivalent to the Arduino Uno R3. The main goal was to get more SRAM and Flash memory into the same physical footprint used by traditional Arduino (pre-R3) and latest release Uno R3 shields.
Original – Goldilocks Version 1.1
I also tried to optimally use the co-processor ATmega32U2, (mis)utilised by Arduino purely for the USB-Serial functionality, by breaking out its pins, and creating a cross-connect between the two MCU to enable them to communicate via the SPI bus.
Whilst the Goldilocks achieved what it set out to do, there were some problems it created for itself.
Firstly, the ATmega family of devices is really very bad a generating correct USART baud rates when their main frequency doesn’t match a multiple of the standard USART rates. Engineers in the know select one of these primary clock rates (for example 14.7456MHz, 18.432MHz, or 22.1184MHz) when they’re planning on doing any real Serial communications. Unfortunately, the 16MHz clock rate chosen by the Arduino team generates about the worst USART timing errors possible.
This means that the Arduino devices can only work at 16MHz while programming them with the Serial Bootloader, otherwise programming is bound to fail, due to losing a bit or two due to the clock rate error.
Arduino had serial programming completely solved in the old days by using a real USB-USART chip, the FTDI FT232R, but for some reason they stopped doing the right thing. This might have been the perfect solution, but they abandoned it. Who knows why…
Secondly, although having an integrated uSD card cage on the platform is a great thing, using a resistor chain to do the voltage conversion is nominally a bit problematic. The output pins (SCK, MOSI, CS) are permanently loaded by 3k2 Ohm and an input pin (MISO) high signal generates only 0.66 of Vcc, which only just clears the minimum ATmega signal high level of 0.6 Vcc. Neither of these issues prevent the uSD card from working, and the voltage divider resistor chain takes almost no space on the board. But still it is not perfect.
Thirdly, there are some minor oversights in the V1 build that I would like to correct if possible.
New Directions
I’ve been toying with the idea of building an Xmega board, in Arduino Uno R3 format, because of the significantly enhanced I/O capabilities of this MCU including true DAC capabilities, but I’ve not followed up for two reasons; the Xmega has no history of use by hobbyists as there is with the ATmega devices, and it doesn’t bring any advantage that an ARM MCU wouldn’t otherwise do better and faster.
Never the less, the ATmega platform still lacks one thing that I believe is necessary; a high quality analogue capability. The world is analogue, and having an ADC capability, without having a corresponding DAC capability, is like having a real world recorder with no means to playback these real world recordings.
A major initiative of the Goldilocks is to bring an analogue capability to the Arduino platform. So this device will be called theGoldilocks Analogue.
Updated – Goldilocks Analogue
There have been music shields and audio shields built before, and the design used is closely aligned to the original Adafruit Wave Shield, but I’ve not seen dual high quality DACs with both AC and DC capability, integrated onto the main board of an Arduino previously. So that’s where I’m going.
The goal is to be able to produce a DC referenced signal, from 0Hz up to around 100kHz, that can provide a binary-linear representative voltage (with sufficient current) to enable a control system, as well as to produce the highest quality audio, with very low noise and THD buffer amplifiers, that the basic AVR platform is capable of producing.
Using Eagle
I used to look at Eagle (Kicad, etc) with healthy scepticism. Yeah, not something that I’d be able to learn, but in the process of realising the Goldilocks Analogue, I have learned that it is far easier to learn a new skill than it is to guide someone in India or Malaysia, who doesn’t even get the start of what I want. The old idiom, if you want something done right, you’ve got to do it yourself.
There is a “Fremium” version of Eagle available, which is enough to get started. I’m going to try to get a “Hobbyist” version as soon as the paperwork is through.
So all this below is my first Eagle project.
The Schematic
I’ll talk through each item in the schematic, particularly those things which are novel in the Goldilocks Analogue. The schematics for the Goldilocks V1 can be found in the User Manual.
FT232R
The FT232R is the same device used in countless earlier Arduinos, such as the Duemilanove, and in USB-Serial adapters everywhere. The drivers for all major operating systems are widespread and there is no magic required. Importantly, the FT232R chip generates a real USART baud rate, at any speed from 300 baud to 3 Mbaud.
Unlike in the Duemilanove I’m using the FT232RQ chip, which is in the QFN package. There is too much going on to take up the board space with the larger package.
I’ve added a switch to disable the DTR Reset functionality of the Arduino and Wiring Bootloaders. Often, I would like a running device NOT to be reset by plugging the USB cable, but then I’ll be using the Goldilocks in another thing where I do want this to happen. Having a switch, like Seeed often do, is the best answer.
Also, I’ve added a 6 pin connector replicating the standard FTDI pin-out, to enable the FT232RQ to communicate with other devices, should this be necessary. It would be a shame to lock it into the board, with no option for extension.
uSD Buffer
In designing the buffer for the uSD, I was trying to achieve two things. Firstly, isolate the uSD card entirely from the SPI bus when it was not in use. By isolate, I mean over 1MOhm resistance. This isolation ensures that the uSD card doesn’t load up the SPI pins at all, when the uSD is not being used.
Secondly, I was trying to ensure that each end of the SPI bus receives the correct voltages and currents to ensure maximum throughput.
The two devices selected achieve both goals as desired.
For the MCU to uSD direction (SCK, MOSI, and CS) I’m using a 74LVC125 in quad package. This package is tolerant of inputs at 5V rising above its Vcc of 3V3. The output enable on low, is connected to the Chip Select line, which means that the uSD card will not be driven unless the CS line is low. It always presents a high impedance to the MCU.
As a quad package the 74LVC125 has one spare gate, which can be used to drive the Arduino LED. This is neat no cost result that entirely removes any loading on Arduino Pin13.
For the uSD to MCU direction the buffer has to effectively produce a 5v CMOS high when receiving a 3V3 CMOS high. The best way to do this is to use a device that is TTL signal compatible. The TTL minimum high signal is only 2V, much lower than the CMOS minimum high signal of 2/3 of Vcc, and importantly below the worst case of 2/3 of 3V3 CMOS.
The only device I could find with the required characteristic of accepting TTL inputs with a low output enable, is the MC74VHC1GT125. I’m sure there are other options though.
DAC and Buffer
This is the fun stuff. Analogue… the real world. As noted above, the goal is to produce two binary-linear signals with enough buffering that they can drive a reasonable load (such as small headphones or an audio amplifier) and produce a constant voltage under a number of power supply options.
The inspiration for the circuit came from the Adafruit Wave Shield, but there are a number of significant improvements that are worth noting, not least the use of a dual DAC, for two channels of output.
Firstly, if you want to get a very low noise output, whilst using a high current Switch Mode Power Supply, it is necessary to filter the supply voltage. I’ve utilised the dual steps of an L-C primary filter, followed by a ferrite core bead secondary filter. I’m not sure whether this is all necessary, and I’ll be testing the circuit later with various components removed to check their efficacy in the role, but if they’re not designed in now they never will be added later.
I’m using the Microchip MCP4822 DAC to produce the raw output voltage. This is an SPI device which will be selected using the other “spare” Goldilocks digital pin PB1. Using PB1 to signal the DAC means that none of the Arduino R3 pins are used for on-board Goldilocks functions, and as both CS lines (PB0 and PB1) are tied high they will ensure that all these on-board devices stay off the SPI bus during system reboot.
The MCP4822 takes 16 bits to set a signal level, this is two SPI bus transactions. The maximum SPI rate is SCK/2. Therefore, if my Goldilocks is doing nothing else, it can generate 691,200 SPI transactions per second. If both DACs are being driven we can generate a square wave of 172,800Hz. This is an unreachable figure. More likely, the best case will be around 50kHz for both channels, or 100kHz if only one DAC is being used.
Optionally, the LDAC pin-out can be used to synchronise the transfer of digital inputs to the analogue output buffers across the two DACs or to a specific clock with low jitter.
Unlike the Adafruit solution, the MCP4822 generates its own internal 4.096V reference voltage Vref. This means that irrespective of whether the Goldilocks Analogue is being powered by a battery, by USB, or by the barrel connector and the SMPS, the output voltage for a particular digital input will be constant.
The op-amp configuration with dual op-amps, in a quad package, designed to double the current capability of the output, has raised concern from all who see it. Concern was my initial thought too. However after some research, I found it to be a recommended configuration for current doubling. The only difference to the Adafruit example circuit is to add low value output resistors which allow each op-amp to find its own offset level without consuming excess current.
I have added the option to bridge the output capacitors to provide a DC output. The output capacitors are necessary for audio use, as headphones or audio amplifier inputs require an AC connection, with no DC offset.
The Layout
It takes many hours to layout even a small board the size of an Arduino Uno. Luckily, I had a completed and fully functioning example to use as a platform, thanks to Jon’s prior work on Goldilocks V1.
The final prototype board layout is now done, and the board design sent off for manufacturing.
In this layout, I’ve been able to retain most of what makes a Goldilocks; the ATmega1284p, the complete dual rows of header pins arranged in pin-logical order 0-7, bridging of the I2C pins to A4/A5, JTAG, and a high current power supply. Added to this now are the three items described above; the FT232RQ and Reset switch, buffers for the uSD card, and the analogue platform.
Starting in the bottom left, the SMPS has been relaid to significantly shorten the high current paths around pins 2, 3, and 4. This will reduce the circuit noise, and taken togther with the effort to create solid ground planes, and specific AVcc filtering, will help to ensure the minimum of power supply noise in the analogue platform.
On the right we can see the uSD buffers, which have eaten into the prototyping space significantly. Although the signals will be much nicer than with a resistor bridge, the cost is clearly on space. If the Goldilocks Analogue ever goes into production the SOIC package buffer chip will be replaced by a QFN package, and some space should be recoverable.
Finally, the analogue platform is implemented in the top left of the board, to the left of the pin-outs for the analogue platform and the FTDI interface. Below the pin-outs the analogue supply voltage filtering is implemented, with exception to the chip decoupling capacitors which are tied directly to their supply pins.
Keeping the analogue lines as short, as balanced, as fat, and as well shielded as possible was a key focus of my design. There are a few USART lines running under the chips, but they are unlikely to produce noise as they are under the first ground plane.
The top layer of the board is pretty crowded. Some tricks such as bridging my lines to get a solid ground plan under the crystal, were passed to me.
The Route2 or second layer is the ground plane of the board. As such it needs to provide a stable and solid path for currents to return to the origin. I have been able to provide almost solid copper under the entire area from MCU to power supply, and also from the analogue platform back to the central ground point.
In the Goldilocks Analogue (as in Goldilocks V1) the Route15 layer is wholely at 5v and is a massive supply line. I’ve used this layer to transport the 3v3 supply around the lower edge of the board, to provide power to the uSD card, and its input buffer. The other thick tracks are the USB input line and the analogue AVcc supply line.
On the back of the board, mirrored here, things look as we expect. The previously noted bridge capability for the I2C bus to A4/A5 is there, as is the capability to bridge the DAC A and DAC B output capacitors to enable DC output.
Next Steps
The Goldilocks Analogue prototype board design has been sent to Seeed Studio for conversion into a PCB. While this is happening I’ll be sourcing components to solder to the PCB. I think the next post will be on this stage of the process.
Well I have everything finished and in the interim, until I write a new post, here’s the photos of the final assembly of the prototype at Jon’s SuperHouse.
Here Jon is assembling the first prototype, using several faulty Goldilocks v1.1 devices as donor boards. Only two components didn’t fit correctly, and we didn’t have a uSD card cage so that was left off.
Out of the toaster oven, and final assembly finished. Just checking that the voltages are as expected across the board.
Well I’ve had it on the desk now for two nights, and I’m very impressed that it seems to generally meet the specification that was intended. The code for setting the DAC levels is currently only optimised for setting two values at a time. Specifically, it is not a streaming function. Never-the-less, it is possible to achieve the stated goal for both DAC channels. The actual number achieved is 108 kSamples/second, shown below, or 18.8us to transmit 2 samples on 2 channels.
The trace below shows the signals for both DACs at 0x0000, then both DACs set to 0x0FFF.
Therefore, we’ll be able to achieve the 44.1kHz sample rate for CD audio, but only 12 bit resolution, with some time time to spare. If there is a need to read a uSD card, or do some other processing then it is likely that this rate will be more than halved, as the data would then need to to be read over the SPI bus (the same bus the DAC is using) for example. Also, there is a single pole filter between the DACs and the OpAmp buffer, with a 3dB cut-off frequency of 23kHz, which will limit the maximum output frequency but will help to reduce sampling alias issues.
Looking at the board from the top left the MCP4822 can be seen in the SIOC8 package, with the Burr Brown OPA4132 quad op-amp in a SOIC14 package just near the POWER selection jumper. The FTDI FT232RQ USART in QFN package takes up much less space than its FT232RL peer.
Goldilocks Analogue – Top Left
Now the prototype is finished, it is easy to see what needs to be improved. Actually there’s not too much wrong. The inductors for the Analogue Vcc have the wrong footprint, so they will need to be fixed. The inductor is too large for the footprint and is snuggled up to the POWER jumper, and the ferrite bead is somewhat too small. I didn’t source the very small 15 turn potentiometers, so they are just shorted out. As is the DTR (RESET) disable switch located near the USB connector. As a final issue, the footprint for the 1/8″ jack was wrong for the supplied connectors, so I’ve just added a short set of jumpers to achieve the same outcome.
Goldilocks Analogue – Bottom Right
Here is a short video demonstrating a Voltage Controlled Oscillator running at 44.1kHz sampling into dual channels. It sounds a little odd, because one of the channels is inverted, generating an out of phase effect.
Results
Well, things are good, and bad.
I’ve been testing the DAC stage and found (what I should have known) that I needed an output buffer op-amp able to reach the negative rail (0V) on input and output to support the MCP4822 0v to 4.095V ranging DAC. The OPA4132 exhibits noise and instability issues around 0.3V output.
Unfortunately the OPA4350 (rail to rail high current), which looks like it will be the right pin compatible device, costs over $10 each, which is nearly as expensive as the audiophile OPA4132 I specified previously.
There seems to be a pin compatible alternative, the TS924A, which is about $2 each, but it is several orders of magnitude worse in performance.
For Example: OPA4350 vs TS924A
Gain Bandwidth Product: 38MHz vs 4MHz
Slew Rate: 22V/μs vs 1.3V/μs
Total Harmonic Distortion: 0.0006% vs 0.005%
Is it worth the difference, when working with a 12 bit DAC in the presence of mV of power supply noise?
Personally, I doubt it.
Using my new Red Pitaya to analyse the output, with a 43.066Hz Sine wave (1024 samples at 44.1kHz) the noise floor is 70dB down from the signal ex DAC. It seems the DAC performs as advertised.
43.066Hz 12bit Sine wave, 1024 samples output at 44.1kHz.
Always looking for the latest and greatest code for AVR, I scan the debian Sid repositories every few months for updated packages to use. Recently debian Sid included the latest gcc-avr 4.8 package, binutils-avr and the new avrdude 6.0.1 package. So I had to install them all to test.
Unfortunately, there is a new device format used in the avrdude.conf file which breaks the standard Eclipse AVR Plugin, rendering the MCU choice ineffective.
Asking a question on avrfreaks.net found the answer. keestux has fixed the problem and has released an interim AVR Plugin 2.4.1 which incorporates a fix for the change in avrdude.conf file format.
Since my last post on the ArduSat and the idea I had to use the Supervisor node, an ATmega2561, as the core of a centralised eXtended RAM system for the Client nodes, ATmega328p “Arduino” devices, I’ve been thinking and working on a solution for building a centralised non-volatile SD Card based storage solution.
With design, sometimes it is necessary to let an idea stew for a while before the right answer just sort of distils out of the soup. For the solution for this problem, this was the case. There was some thinking space required…
The Question
There are 16 Client nodes in the ArduSat platform. Each and any of them may wish to use the central SD Card to store information at the same, or at different times. How would it be possible to allow more than 16 files to be open on the one SD Card (connected to the Supervisor node) whilst maintaining consistency in the file system? How would access to the file system be scheduled?
The Tools
I have been using the ChaN FatFs file system libraries now for some time. They are fully featured and have a very clean design, fully separating the file system layer from the underlying physical media access layer (the drivers). This means that the file system tools can be implemented on many different architectures, with only changes to the driver layer (DiskIO) needed for each platform.
The Thought Process
My initial thought was that the Supervisor node should maintain the file system, and that I should write packaging for the FatFs file system commands to allow them to be remotely implemented across the SPI bus, in a similar manner as described in the XRAMFS post.
The idea of writing these “remote controls” for the file system commands was scary, as I recognised that there are 33 commands in the interface, and each of them has their own characteristics. Also, maintaining these interfaces would likely be problematic, as I would have to test each command extensively to ensure that there were no “thick thumb” errors introduced into the stable and proven FatFs library.
Some weeks passed…
Then at about 3am, I realised that the right answer was to write a “shim” between the standard FatF file system commands and the standard physical media drivers, and to have this shim operate across the SPI bus in exactly the same manner as the XRAMFS solution.
So, I wrote it.
The Solution
The solution separates the ChaN libraries into two parts. The file system part is resident on the Client node. Each Client node maintains its own view of the file system on the Supervisor SD Card. As the ChaN FatFs library is written for low memory devices, the file directory tree is refreshed each time a change in the working file is done. The Supervisor node only does the DiskIO under the command of each of the Clients.
There are only 5 relevant driver layer DiskIO commands. These commands are used in the Supervisor node to execute requests sent over the SPI bus from the individual Clients. Since there are only a small number of commands, and they are static and dependent on the architecture of the machine they’re running on, their functionality is quite constant. The Supervisor has no knowledge of the file system at all. It simply implements DiskIO commands on sectors of the SD Card as requested, one a time, as requested by Clients.
The Supervisor implementation simply expands on the existing Task loop established for the XRAMFS system, by adding in the 5 additional DiskIO commands. The added complexity, that the SD Card is accessed over the SAME SPI bus as the communications between Client and Supervisor, means that I had to introduce an interim “Pending” state for commands to allow the Client to wait for confirmation that a task has been completed or, in the case of disk_read or disk_ioctl, to recover the waiting data from the Supervisor.
The Client implementation inserts different shim DiskIO commands for the FatF system to call. These commands use the SPI bus to call the Supervisor, and enter a request. Some commands return immediately, allowing the Supervisor to continue with the command, once the command and any required data has been transferred. Other commands wait until they can retrieve information from the Supervisor, before returning to the FatF file system layer of the library.
In this solution, the XRAMFS was instrumental in simplifying the transfer of information. The exclusive availability of 16kB of RAM for each Client meant that disk_write or disk_read commands could cache their data in XRAMFS whilst it was actually written to or read from the SDCard. Because the RAM is available exclusively, there is no consideration that another Client may overwrite the results of a command, or that memory exhaustion may corrupt data.
When a Client program calls one of the FatFs library commands, it in turn calls one of the special ArduSat SPI DiskIO shim routines. These routines signal the Supervisor in the normal manner, and transfer any data associated with the command into the Page of XRAMFS assigned to the Client.
The Supervisor will then undertake the standard DiskIO command, retaining the result of the command and any data resulting from the command in XRAMFS.
Both Client DiskIO routines, and the Task running in the Supervisor are aware of the “Pending” state, which is where a DiskIO command has been completed on the Supervisor and there is data waiting in the XRAMFS for the Client to recover.
Once the Client DiskIO command completes, it returns the normal interface information to the calling FatFs command.
Here a monitor program on a Client is initialising the SD Card. If the Supervisor notices that the SD Card is not initialised, it will return Error, and then undertake to initialise the card. The second call for initialisation will then be successful. This decoupling method ensures that Clients cannot reinitialise the card, whilst other Clients may be using the Card.
The file system (on the Client) is then initialised Then, the SD Card status is read. Finally, the current working directory is read and printed.
In this screenshot, a file is opened for reading, and the file pointer set to the start of the file. A dump of the first 64 Bytes of the file is read and printed. Then the file is closed.
Here, the same file as above is opened for writing, and 45 bytes of 0x10 (16) are being written. The result is checked by opening the file for reading, and dumping the relevant bytes to the screen. Success!
Issues
The Client (Arduino) ATmega328p has so little Flash and RAM that implementing the FatFs consumes a significant proportion of the available resources. From the ChaN FatFs web site, at least 13 kByte of Flash (of 32 kByte on the Arduino), and 600 Bytes of RAM (of 2048 Bytes on the Arduino) are consumed by the library alone. This is excluding the working buffers necessary to prepare or process data for storage.
I was unable to fully test the FatFs solution, because of RAM and Flash limitations. I simply couldn’t turn on all the features. However, I have some confidence that the solution fully works, because the actual FatFs library is unchanged from the working solution that I’ve tested on the Arduino Mega platform. It is only the DiskIO routines that have been tampered with, and since they produce reliable results for some of the FatFs functions, there is every reason to believe they would work for all of the functions.
ArduSat and NanoSatisfi for running a great project, which inspired this thought process. Possibly, this work might be useful for one of the launches over the coming years.
This proposal is to implement an Arduino clone using the ATmega1284p MCU, as replacement for the normal ATmega328p MCU, bringing significant improvements and longevity into the existing Arduino platform.
The current Arduino Uno and Leonardo devices cater for many applications, but they are becoming limited for some modern applications, such as Ethernet networking, SD Card storage, and USB based systems. The limitation in SRAM in the 328p and 32u4 is the most apparent issue, and this is the most difficult to supplement with external components. The Uno R3 platform is too small for demanding applications, and therefore not the right solution.
Arduino Mega devices are available which provide more RAM (but still less than Goldilocks) and many more interface pins, but unfortunately many of the standard Arduino Shields will not work with the Mega, unless you are prepared to hack them. The Mega platform is too big for the standard Shields, and therefore not the right solution.
Arduino and others are moving towards 32 bit MCU devices, including ARM Cortex based platforms such as the Arduino Due, which brings significantly more resource into play, but these platforms will require a major re-education of users, and may actually fragment the Arduino user group. Also, these ARM processor based devices must be operated at 3.3V and can only supply 4mA per I/O, which makes them incompatible with many of the existing Arduino Shields.
I believe the Goldilocks solution is to use the Arduino Uno / Leonardo R3 physical format, for 100% Arduino Shield compatibility at 5V and with the standard pin layout. But using the ATmega1284p processor to provide significant improvements in RAM, FLASH, EEPROM, interfaces, and other factors.
I’ve been working with Arduino devices now for some years, and have found that my interest remains in fully understanding the way the “bare metal” processor is working. The Arduino platform gives me that opportunity. https://sourceforge.net/projects/avrfreertos/
Whilst many argue that the days of 8 bit processors are numbered, and that devices such as the Raspberry Pi are the future, I would say that there remains a need for very simple, but very capable platforms, such as the “Goldilocks” platform proposed here. Raspberry Pi and others are essentially Linux machines, and are addressing different needs to this platform.
The 1284p MCU has already been used by the RepRap project as a platform in their minimalist Arduino platform, as a result of them experiencing similar resource limitation issues. The RepRap 1284p platform maintains Arduino code compatibility (boot-loader, board descriptions, avrdude) but abandons physical compatibility, as theirs is a special purpose application and has no need to support Arduino Shields.
Similarly Pololu use the 1284p in their Orangutan SVP platform, but again theirs is a special application, which incorporates many robotics interfaces, and deviates from both the Arduino physical and software platform. Pololu also implement an on-board ISP, which removes the need for the Arduino serial bootloader freeing more space for program code.
This is the final Goldilocks v1.1 board, that was prepared for Pozible Supporters.
Here are some screenshots of the prototype board design.
Differences between 328p and 1284p
The ATmega1284p has a number of significant differences from the 328p that make it a great MCU for the Arduino platform. Some are listed below, in no particular order.
16kByte SRAM = 8x Uno SRAM
The 1284p has 8x more SRAM than the 328p, and also has double the SRAM of the 2560. There is no other AVR ATmega MCU with this much SRAM.
For Ethernet, video, and USB applications where large frame buffers need to be maintained or manipulated, the flexibility of having 16kByte of SRAM will change the kind of applications that can be implemented.
4x Uno Flash & 2x Uno EEPROM
The 1284p has 4x more Flash and 2x more EEPROM than the 328p and therefore can store larger programs and non-volatile data.
2x Programmable USART
The 1284p has 2 programmable USARTs. These appear on Digital pin 0,1 and 2,3. This allows users to maintain the serial monitor connection with the Arduino IDE, whilst addressing another application, such as a GPS device. For new users and experienced alike having two serial interfaces will be a big improvement.
Independent Analogue Platform (separate I2C bus pins)
Using the SMD package for the 1284p allows the board layout to implement a fully independent analogue platform. This is because the I2C bus pins are on a separate port to the ADC pins, and the ADC pins have no “alternate function” except for the PCINT function. Also separate AVCC and GND pins allow the analogue PA Port to be powered and grounded separately from the digital section of the MCU.
Timer 3 (Extra 16bit timer)
The 1284p has an extra 16bit timer, Timer 3, that is not present on any other ATmega MCU. Timer 3 does not have PWM outputs (unlike Timer 0, Timer 1, and Timer 2), and therefore is free to use as a powerful internal Tick counter, for example in a RTOS. freeRTOS has already been modified to utilise this Timer 3.Timer 3 (Extra 16bit timer)
Timer 2 (Real Time Clock Oscillator)
The 1284p has a 32.768 kHz capable timer, that can be fitted with an accurate watch crystal to enable real time keeping. Use of the avr-libc time.h functionality (present only in upstream release currently) allows an efficient SystemTick to match with advance time and date functions.
JTAG Interface
The 1284p implements a JTAG functionality which will allow advanced developers the option to debug their code.
Better PWM access
The 1284p brings additional 8bit Timer 2 PWM outputs onto PD, which creates the option for 2 additional PWM options on this port. It also removes the sharing of the important 16bit PWM pins with the SPI interface, by moving them to PD4 & PD5, thus simplifying interface assignments.
Extra I/O pins (e.g. for internal SS pins)
The 1284p has additional digital I/O pins on the PB port. These pins could be utilised for on-board Slave Select pins (for example), without stealing on-header digital pins and freeing the Arduino Pin 10 for Shield SPI SS use exclusively.
Design Improvements on “Uno”
Add through-holes for all I/O
The existing Freetronics 2011 has space for prototyping, but doesn’t make any allowance for connecting pins to the prototyping space. Trying to solder jumpers between the I/O pins on the board backside is not very pretty, and also not robust for permanent prototyping. The Arduino Uno or Leonardo doesn’t have any prototyping space at all.
Suggest to include a row of through-holes inside the each of the pin headers to allow a header or jumper to be soldered to the I/O lead effectively.
Align the rows of through-holes to the 1/10” pitch, and to prototyping area pin pitch, to allow “Goldilocks” to have header pins soldered on the bottom, and be inserted into a standard breadboard.
Replicate SPI and I2C to through-holes (with additional 1284p Alternate SS I/O).
The SPI and I2C interfaces are used for many daughter card options (from Sparkfun for example). Some examples include RTC, acceleration & magnetic sensors.
These mini-cards need access to the SPI or I2C interfaces which are shared with I/O pins. Bringing these SPI and I2C pins with Vcc and GND onto through-holes in the standard order (of Sparkfun cards, for example) at the left and right ends (respectively) of the prototyping area would simplify prototyping with these interfaces.
This is now implemented on the Arduino Uno Rev 3. as additional pins for I2C.
But, the additional through-holes remain valuable for the prototyping area.
Add JTAG Interface
Adding a standard JTAG interface at the edge of the card would allow in circuit debugging to be implemented. Whilst there may not be sufficient space to implement a standard JTAG connector, there would be space to bring the JTAG pins onto through-holes for headers.
Add a Micro SD Cage
There is space to add the long term storage capability brought by an integral Micro SD Card cage. Many projects require logging of sensor data, or capturing or playback of information, and the Micro SD card format is the easiest way to get data onto and off of any format of PC or Smartphone. The ATmega1284p also has plenty of SRAM to allow large buffers for reading and writing to the Micro SD card, so it makes sense to include it as a most needed option.
Link ATmega32u2 and ATmega1284p SPI interfaces
The USB-serial interface on the Uno is implemented by a ATmega16u2 device, but its SPI bus is only connected to the SPI programming header, and the SS pin is not even brought out. The Goldilocks will allow the ATmega32u2 MOSI, MISO, CLK, pins to to be easily bridged (solder pads on rear of board), and bring the SS pins of both ATmega devices to a patch pad. This will allow the two devices to work in concert for demanding multi-processing applications, involving USB and other peripherals.
For some applications digital noise and voltage droop (when using servo PWM), can have a significant impact on the accuracy of ADC conversions. Using the SMD 1284p it is possible to completely electrically isolate the digital Vcc and analogue AVcc and GND planes, as well as isolating the ADC converter within the MCU. A separate rectifier, or low pass filter could be used to provide AVcc.
The option to bridge the 1284p I2C pins on SDA and SCL with A4 and A5, where needed for compatibility with Arduino Shields, should be maintained through the use of solder bridges.
Move Reset to edge
It is more common to need to use the Reset button with a shield in place, and if the Reset button is placed close to the edge (even vertically mounted, like the Seeed ADK main board), it can still be reached with a fingernail. Arduino Uno R3 implements this by moving Reset to the upper edge, near the USB connector.
Clock at 20MHz (or 22.1184MHz)
There is little reason to continue to run the MCU at 16MHZ, and given the MCU is specified to 20MHZ, being able to do 5 things, where previously we could only do 4, seems like a worthwhile improvement. Also, the use of a through hole precision crystal (not a SMD resonator) allows the use of after-market timing choices, eg 22.1184MHz for more accurate UART timings.
Design
This is the Goldilocks v1.0 prototype.
This is a proposal to map the ATmega1284p pins to the Arduino physical platform.
Arduino
UNO R3
328p Feature
328p Pin
1284p Pin
1284p Feature
Comment
Analog 0
PC0
PA0
Analog 1
PC1
PA1
Analog 2
PC2
PA2
Analog 3
PC3
PA3
Analog 4
SDA
PC4
PA4
PC1 I2C -> Bridged
Analog 5
SCL
PC5
PA5
PC0 I2C -> Bridged
Reset
Reset
PC6
RESET
Separate Pin
Digital 0
RX
PD0
PDO
RX0
Digital 1
TX
PD1
PD1
TX0
Digital 2
INT0
PD2
PD2
INT0 / RX1
Xtra USART1
Digital 3
INT1 / PWM2
PD3
PD3
INT1 / TX1
Xtra USART1
Digital 4
PD4
PD4
PWM1
16bit PWM
Digital 5
PWM0
PD5
PD5
PWM1
16bit PWM
Digital 6
PWM0
PD6
PD6
PWM2
Digital 7
PD7
PD7
PWM2
Digital 8
PB0
PB2
INT2
Xtra External Interrupt -> ATmega32u2 x-pad
Digital 9
PWM1
PB1
PB3
PWM0
Digital 10
SS / PWM1
PB2
PB4
SS / PWM0
SPI -> ATmega32u2 x-pad
Digital 11
MOSI / PWM2
PB3
PB5
MOSI
SPI
Digital 12
MISO
PB4
PB6
MISO
SPI
Digital 13
SCK
PB5
PB7
SCK
SPI
(Digital 14)
PB0
-> SDCard SPI SS
(Digital 15)
PB1
-> SDCard Card Sense
SCL
PC0
SCL
I2C – Separate
SDA
PC1
SDA
I2C – Separate
PC2
TCK
JTAG
PC3
TMS
JTAG
PC4
TDO
JTAG
PC5
TDI
JTAG
XTAL1
PB6
PC6
TOSC1
Unused
XTAL2
PB7
PC7
TOSC2
Unused
(Analog 6)
PA6
Unused -> Pad / Hole
(Analog 7)
PA7
Unused -> Pad / Hole
Here’s a picture of one of the two prototype boards, where I have added some additional items. I have changed the 1284p crystal to 22.1184MHz, and added a 5ppm 32kHz watch crystal for testing the avr-libc provided time.h functions.
Also I added header sockets for the 32u2 so that I could test its ability to program the 1284p using the LUFA AVRISP code, and determine how much decoupling should be added to the SPI bus link option. This is to ensure that even if the SPI bus is linked between the two MCU, that the 32u2 can’t lock up the SPI bus for devices trying to talk to the 1284p.
Also, I’ve added bridges for the SCL/SDA pins to the A4/A5 pins for old format (pre R3) shields.
I’ve prepared a preliminary distribution of the entire code set for goldilocks, including the lufa 130313 code used in the 32u2 and the stk500v2 bootloader used in the 1284p. This code is laid out in the arduino manner, with the directories matching the usual layout of Arduino boards.
The production board design was finalised on June 28th and sent for manufacturing. The v1.1 production boards are shown below. Pick and place pictures soon.
Goldilocks v1.1 PCB Front
Goldilocks v1.1 PCB Back
Firmware
Moved support to the Arduino IDE Boards Manager.
Recompiled bootloader binaries with avr-gcc 4.9.2.
Returned to the original 2013 pins_arduino.h pin numbering, now that analogRead() issues are corrected.
It is not every day that I get to tell the family I’m doing “rocket science”, but I hope over the past few days, it can be an exception. Space, the final frontier. In this case, it was a lack of space and the frontier it creates, that got me thinking.
I immediately thought that it would be great to build a distributed cache RAM system to support each of the ATmega328p “Arduino” Client nodes, using the XRAM capabilities of the ATmega2561 Supervisor node. So, I did.
Using this prototype system, each Arduino Client node now has sole access to 32kByte of XRAMFS in addition to their 2kByte of internal RAM.
Initial performance measured is 422kByte/s throughput for the swap function. In other words, half of the entire Arduino RAM can be swapped with the contents of XRAMFS in just 4.74ms.
In working with the Arduino hardware I’ve found that the severe limitation in RAM space causes constraints on what can be done. For example, Ethernet, USB and other modern applications need kBytes of buffer to be used effectively, and the ATmega328p used as the Arduino Uno platform supports a total of only 2kB RAM.
Using the Arduino Mega (or Android ADK hardware) has been the saviour of that situation for me, offering an identical environment, but 8kByte of RAM as a playground. And, most importantly, the ability to directly connect 0 wait-state external SRAM.
This XRAM capability of the ATmega2560 and ATmega2561 has been exploited by Rugged Circuits in their QuadRam module, which offers 512kByte of SRAM in one small package.
Therefore, using common off the shelf technology, I had the materials available to test the theory that building a XRAMFS system, to support the ArduSat platform, would work.
This allows each ArduSat Client to store 16 TIMES more data than it can currently access, and have access to that data at a relatively high speed from a medium not subject to wear (such as for example an SD card).
Ingredients & Build
This section looks at the ingredients and how to construct the prototype.
The ArduSat Supervisor node is based on the ATmega2561 MCU, because it is significantly smaller than the ATmega2560 MCU used in the Arduino Mega platform. The only difference between the two chips is that the ATmega2561 doesn’t provide as many Ports, and has only 64 Pins versus 100 Pins on the ATmega2560.
For this prototyping, the ATmega2560 is necessary, because I elected to use pin change interrupts as part of the bus protocol. Also, the Arduino Mega platform is readily available. I don’t even know where I’d go to get a ATmega2561 board…
The use of rainbow hook-up wire was essential for the success of the prototype.
Client Node – Arduino Uno / Freetronics Eleven
The ArduSat Client node is designed to be identical to the Arduino Uno platform, to ensure that it is absolutely easy for people to test code they intend to run in space. Therefore a variety of Arduino Uno devices are being used (basically, whatever I had around).
XRAM Module – Rugged Circuits QuadRAM
I’ve implemented using the Rugged Circuits QuadRAM and the MegaRAMpreviously. These modules slip over the end of the Arduino Mega platform, instantly enabling either 512kByte or 128kByte of zero wait state SRAM, mapped to the system address space. They also conveniently bring out the SPI interface onto through-hole for pins.
Something about the ability to create 16x 32kByte XRAM pages, linked with 16x Client nodes, seemed like synchronicity.
Layout
The prototype platform is designed to be the classic multi-slave SPI bus layout. This design is demonstrated in the AVR151 document and, in excerpt, is produced below.
Because of my decision to use the Pin Change Interrupts as part of the bus protocol, The Supervisor node (SPI Master) would use the Port K and Port J pins to fill the role of individual Slave Select (SS) pins. The Client nodes would each use their normal SS pin (PB2) to connect to the Supervisor.
In designing for 16x Client nodes, there is a limitation on Port J in that the good folks at Arduino determined not to break out all of the pins which, together with sharing PCINT8 with the Rx0 pin, significantly limits the number of Clients feasible on the prototype platform.
In practice, 8 Client nodes attached to all the pins on Pork K is the simple alternative. As luck (or good planning) would have it, those pins are all brought out onto one connector on the Arduino Mega platform, as evidenced by these pictures.
Amongst friends, a direct connection of the SPI SCK, MISO, and MOSI lines to all Clients is optimal. But in a shared environment, it would make sense to use FET bus isolation to keep Clients from physically attaching to the SPI bus until their SS line is held low by the Supervisor. A gram of hardware prevention can cure a tonne of software ill, as a “rogue” Client could otherwise potentially lock up the SPI bus for all, and the guys in the ISS won’t be happy if asked to hit the reset button.
Bus Protocol
Hey! – Yeah What? – This! – OK
That’s the protocol. Works in the home. Works in the office. Works the world over.
Information to this Saleae Logic chart below in Client Implementation section.
Hey!
The Supervisor node holds all the PCINT pins high. If a Client wants to initiate a Read/Write/Swap transaction, it will pull its SS line low for 30µs. This needs to be long enough for the Supervisor to register an interrupt and process it. If multiple Clients call out simultaneously, no problem, the Supervisor will grab all of the requests and push them onto a queue of requests to serve.
Yeah What?
At the next opportunity, the Supervisor serving task will pop a request off the queue, and identify which Client made the request. It will also check if there were other simultaneous requests, and push them back to the front of the queue.
The Supervisor then pulls the relevant Client SS line low. The Client has been listening for this, and at this point it enables its Slave interface to the SPI bus, and the two swap acknowledgements. When the Supervisor receives the ACK code, it knows the Client is ready, so it requests a command.
This!
When the Client (SPI Slave) has received the Supervisor ACK code, it prepares a command, and is prepared to either read, write or swap XRAMFS data under the command of the Supervisor (SPI Master).
The command set implemented by this protocol can be easily extended to include accessing other shared resources connected to the Supervisor node. This could include analogue sensors, SDCARD mass storage (though using the SPI bus would offer a degree of complexity), or serial interfaced devices.
OK
At the end of one command, with the data transaction complete, a final byte is exchanged to ensure that the Client has remained in sync with the Supervisor, and the SPI bus is released by the Client. It is important the Client stays off the SPI bus. The Supervisor then processes the next Yeah What? request.
Supervisor Implementation – freeRTOS
The Supervisor is implemented as a freeRTOS task, using standard SPI bus libraries contained in my code base. These libraries (now that this project has worked them over) are about as optimised as is possible to write in C, and achieve a good throughput over the SPI bus.
There are two (or one) PCINT based Interrupt that reads the PCINT pins and pushes the raw pin state onto a queue. This process traps multiple simultaneous requests, overcoming any interrupt masking or race conditions. Currently 30µs are allowed for the interrupts to execute. 10µs has been tested, but depending on how long the Supervisor stays in “Critical” state (interrupts off) processing other (non XRAMFS) tasks this time can be adjusted.
From idle, the Supervisor takes only 90µs to 0.1ms to pop a request from the queue and action it. Under load, it could take as long as 64ms to action a request. As soon as the pin state is collected it is processed to identify which SS line triggered the call, and therefore which bank of XRAM should be enabled. Also, at this time I check that no additional requests are pending from the same pin state. If so, the remaining pin state is pushed back on the queue to get next time round.
The exchange of acknowledgements ensures that both sides are speaking SPI, and are set to proceed.
The command contains the action (read / write / swap / test), the address of the XRAMFS block, the size of the XRAMFS block, and a CRC byte.
The bus transaction speed is dependent on the SPI Master SCK clock divisor. Optimally, a SPI Slave can receive data at 1/4th of its system clock. Currently, it is set to one 1/8th, therefore theoretical performance is double that of the logic capture above.
Initially, I determined to calculate a CRC byte to store along with the data, but the calculation time is large compared to the transaction time, and therefore too costly to implement at the protocol level. The application should utilise the CRC when it recovers data to confirm that the data is intact, and not irradiated.
Also, error checking following the transfer could be implemented. But at this stage I think it is better to have the Client do all sanity and error checking of its own data.
Client Implementation – freeRTOS or Arduino IDE
The Client is implemented in freeRTOS as a simple library function, that is passed a command structure, and a pointer to local RAM to be Read/Write/Swap. Some details below.
typedef enum { Huh = 0, // Client didn't issue us a command, so just break.
Read = 1, // read from XRAMFS
Write = 2, // write to XRAMFS
Swap = 3, // read from both XRAMFS & local RAM, and swap
Test = 4 // do something else, to be determined
} RAMFSCommand; // from point of view of the client (Arduino 328p)
typedef struct /* structure to hold the RAMFS info */
{ RAMFSCommand ram_cmd; // Read / Write / Swap / Test
size_t ram_addr; // Address of first byte of RAM in a RAMFS (greater than RAM_START_ADDR)
uint16_t ram_size; // Size of RAM block in RAMFS (less than RAM_COUNT or 32kByte)
uint8_t ram_crc8; // Calculated CRC of stored data
} xRAMFSarray, * pRAMFSarray;
uint8_t ramfs_transfer_block( pRAMFSarray pRAMFS_block, uint8_t *data );
I used C and the freeRTOS platform because it is easiest for my environment, and I know it best. But, I’ll re-write it as a library in the Arduino IDE environment as needed. It won’t be too hard.
The client can use the XRAMFS malloc function to manage RAM allocation. A very simple malloc has been built, which can’t free XRAMFS. But, it can be simply ignored if desired and the command structure can be filled manually.
Initially, I implemented an interrupt driven semaphore system to manage the Yeah What? part of the bus protocol, but typically the Supervisor responds so quickly that the time to do several context swaps generated by the interrupt exceeded the time the Supervisor was prepared to wait. A simple wait loop keeps the Client on ready standby for 90µs so it can complete the transaction in the shortest time.
The Client code has no knowledge of where its XRAM is located on the Supervisor. Therefore the code is orthogonal and constant, irrespective which Client being used. This is a very useful feature where the author may not know in advance which ArduSat Client his code will be running upon.
Client application code should be written to make use of the Swap XRAMFS <-> RAM capability. This makes best use of the SPI bus features to combine Read and Write into one transaction, effectively doubling throughput over the Write plus Read combination.
The user interface (monitor) is just for initial testing. I’ll have to write a load generation rig to find out what this baby can do, but that can wait for the next post. The logic analyser has captured the result of the > r (read) command in the below command line sequence. We can see the 20µs (now 30µs) Hey! on the Slave Select, 90µs pass before the acknowledgement bytes are swapped (only one cycle needed), 6 bytes of command structure are passed (Read command is 0x01), and then the data is read out of XRAMFS to the Client.
Design Notes
The basis of every design: detailed functional specifications, hardware design, and user interface documentation. Oh, and scribbles much.
Updates
I’ve updated the code on 22 February to remove some oversights in the Client main program, and added the OK check byte to the protocol. Code as usual on AVRfreeRTOS on Sourceforge.
Updated on 23 February to include some error checking on Supervisor side (preventing malicious Client requests), and on Client side preventing hang if the Supervisor is AWOL. Also removed the aggressive SPI timing utilising receive double buffering, as it often caused errors, and had no performance effect.
Initial performance measured is about 422kByte/s throughput for the swap function. Specifically 4.73825ms is needed for a complete 2048Byte data payload transaction (including sync, command, & OK timing). This also includes freeRTOS task swapping, as the Supervisor task is run with interrupts enabled in normal mode.
Have fixed some code issues on 4 March, mainly around a few µs delays required to let things run their course.
Now the platform is running stable with 4x Clients. A video is here
This post is a bit of a mixed bag, describing some software and hardware integration, together with some raving about a great tool I’ve been using. So, let’s get started.
Platform
Some time ago I got a Freetronics EtherMega, which is essentially an Arduino Mega2560 with an integrated Wiznet W5100 Ethernet interface, and a MicroSD card cage. I’ve introduced this product and the use of freeRTOS here.
One great thing about the ATmega2560 used in the EtherMega, and the Arduino Mega2560, is the availability of an external memory bus. I’ve been using a Rugged Circuits QuadRAM, and now have ordered three more of their MegaRAM devices, and intend to make the ATmega2560 my standard platform. Why three? Well everyone knows that good things come in threes.
I’m actually preferring the Rugged Circuits MegaRAM which has only 128kBytes of RAM, so it won’t be as flexible for bank switching as its big brother. Also its chip select line is reversed (note to self to fix this in the driver). But, simply having 64kBytes of normal extended RAM plus another 56kBytes of special purpose (bank switched) RAM seems like it wil be sufficient for the duration. I’ve bought a couple to go on some Android ADK devices, that I’ll write about soon.
Recently, I also acquired a Freetronics 16×2 LCD-keypad-shield to use as a drop-on display for debugging and status, and anything really. It works really nicely and with its single pin switch analogue interface (which will be useful for navigation). Unfortunately there is a conflict between the SD Card device select pin on the EtherMega (Arduino pin D4) and one of the data pins on the 16×2 LCD.
My rectification of this pin usage conflict can be seen on the pictures below, where the yellow wire joins Pin D4 to Pin D2. What can’t be seen is that the leg of Pin D4 has been cut off, so it doesn’t insert into the EtherMega, so there is no elecrical connection between the D4 pin on the 16×2 LCD, and the D4 Pin used on the EtherMega.
Tools
Recently I purchased a Saleae Logic to use in developing. I have got to say that this is probably the best $149 that I have spent on any tool, ever. Having the ability to capture long periods (minutes) of data, with 24MHz resolution, and zoom, shrink, drag, flick around in it, and also compare many windows of alternative samples is just so great. It saves so much time being able to simply “see” what the device is actually doing on the SPI, I2C and serial ports, simultaneously, is well great. But I already said that.
My main work was to put the existing ChaN’s FatFs Generic FAT File System Module v0.9 into my existing ATmega2560 freeRTOS system, using my existing libraries, and generally being fully integrated into the system, as a plaform for some further work.
This was fairly time consuming (until I got my Saleae Logic), because the SPI bus transfers required to initiate and drive a SD card are complex, and depend on which version of SD card (MMC, SD, HDSD) is being used.
Now that everything is working, I’ve also done some SPI optimisation, to speed up multi-byte SPI bus transfers used for reading and writing to the SD card.
In testing with a Freetronics EtherMega driving an 4GByte HDSD card the system achieved the following results.
Byte transfer cycle time: MOSI 3.750us, MISO 3.6250us
Multibyte transfer cycle time: MOSI 1.3333us, MISO 1.3750us
Gross Performance increase: MOSI 2.8x, MISO 2.64x
Measured performance for a multi-MegaByte file copy is about 140kBytes/s which includes both read and write operations to the same SD card.
Also, as I had purchased a 16×2 LCD display and I wanted to implement a flexible solution for display, I also ported the Control Module for HD44780 Character LCD into my system.
This was pretty straightforward, once I’d recognised the pin confict issue between the two Freetronics devices, and perhaps the most interesting things to say are:
Using a macro to control the pin assignment means that it is very easy to change the pins used for any display type. Simply renumber them in the macro and it is done.
Also, using the standard avr-libc stdio utility vsprintf formatting allows me to choose how much library I want to bring in. The standard library doesn’t support float formatting, but with a simple link switch either a simpler (smaller) or more fully featured (larger) standard library can be included. I also use the standard avr-libc tools for the serial port, so there is no additional overhead specifically for the LCD.
Wiznet W5100
Now I’ve finished the W5100 drivers from Wiznet, incorporating their new v1.6 changes (because they screwed up the silicon ARP state machine). And also, a fix for a subtle bug caused
by writing to the W5100 Tx buffer before it was finished with a previous transmission. This was fixed by checking the Tx read pointer and comparing it with the Tx write pointer. When the chip is idle, they are the same. That took me 3 weeks to isolate.
Now the fun part starts, which will be to re-learn the IP protocol suites, through re-implementing some of the standard network tools, like HTTP, FTP, NTP, DHCP, DNS, that we just take for granted. DHCP is done. Ping is done. HTTP is done.
I’ve been spending some time integrating the Rugged Circuits QuadRAM extension for the Arduino Mega (or Freetronics EtherMega) into the freeRTOS environment. And, it is now my standard environment. Actually, the MegaRAM, a slightly cheaper 128kByte version is my standard, as I’ve not found an application yet that needs more than 64kBytes total RAM. But, that will happen.
Without adding any complexity into the environment, I can address up to 56kBytes of heap variables, effectively leaving the entire 8kbyte of internal RAM for the stack. With no complexity or overhead.
In addition, with some simple commands within a task can implement bank switching to access up to 7 additional banks of RAM, each up to 56kBytes.
A further degree of integration into freeRTOS is to completely automate memory bank switching, to give each of either 7 or 15 tasks a bank of RAM for its exclusive use. But this is a goal for the next few months.
I’m very happy with my Saleae Logic too. Must write a review on this, one day. Great tool, more useful every day.
The described code is all available on Sourceforge, if you’re intending to try this at home.
The only finished example using the memory routines is the MegaSD project. Take a look at it on Sourceforge to see how to use the extra RAM.
HOW TO
What do we have to do to get this build working? Well it is pretty simple really, once everything is figured out. It is really only three steps.
Initialise the RAM, and tell the AVR that it should enable its extended RAM bus.
Tell the compiler that you’re moving the heap into the new extended RAM address space.
Tell freeRTOS to move its heap to the new extended RAM address space.
Initialise the RAM
The RAM should be initialised prior to building the standard C environment that comes along for the ride. It can be done in the .init1 (using assembler) or in .init3 in C. I built both methods, but elected to just use the C method, as it is more maintainable (legible).
There are a number of references for this code. Some of the older ones refer incorrectly to a MCUCR register. That is not correct for the ATmega2560.
This example covers what Rugged Circuits suggest for their testing of the QuadRAM, but it doesn’t put the initialisation into .init3, which is needed to make the initialisation before heap is assigned. It makes the initialisation carefree.
// put this C code into .init3 (assembler could go into .init1)
void extRAMinit (void) __attribute__ ((used, naked, section (".init3")));
void extRAMinit (void)
{
// Bits PL7, PL6, PL5 select the bank
// We're assuming with this initialisation we want to have 8 banks of 56kByte (+ 8kByte inbuilt unbanked)
DDRL |= 0xE0;
PORTL &= ~0xE0; // Select bank 0
// PD7 is RAM chip enable, active low. Enable it now.
DDRD |= _BV(PD7);
PORTD &= ~_BV(PD7);
// Enable XMEM interface
XMCRA = _BV(SRE); // Set the SRE bit, to enable the interface
XMCRB = 0x00;
To ensure that this .init3 function, that I’ve put into lib_ext_ram, is included in your linked code, we need to call something from the lib_ext_ram library. If you’re planning to use the banks of RAM, then this is easy as you’ll naturally be calling the bank switching functions.
However, if you only want to use the extra 56kByte of RAM for simplicity (it is after all 7x more than you have available with just the internal RAM), then just call this function once from somewhere, possibly main(). I have added it to the freeRTOS stack initialisation function in port.c, so I don’t need to see it ever again.
extRAMcheck();
It returns the XMCRA value, that can be tested if you desire. But there’s no need as things will anyway have gone badly pear shaped if the RAM is not properly enabled. Calling this once is all that is needed to ensure that the .init3 code is properly inserted into the linked code.
Note: that the above code is specific to the QuadRAM device. The MegaRAM device has different IO in use, and the differences are noted in my code on Sourceforge.
Move the heap
The standard C heap has to be moved to the new location above the stack. There are other memory allocation options, but in my opinion this is the most sensible one and the only one I’m planning to implement.
The __heap_start and __heap_end symbols describe the addresses occupied by the extended RAM, and inform malloc() of the location of the heap. This is described in more detail here http://www.nongnu.org/avr-libc/user-manual/malloc.html. This is a great diagram showing the situation.
Now freeRTOS has to be made aware of these changes to the heap location. There are three heap management options available for the AVR port. The two most memory economical options use a fixed array of memory defined in the .data section on initialisation. Clearly, this is not going to be useful. For the third option heap_3.c, which uses malloc(), we have nothing more to do.
However, getting heap_1.c and/or heap_2.c to work is not that complicated either. There are three parts to this. Firstly, creating a new section name, and locating it at the start of the desired heap space. We’ve already done that, above, with the –section-start command. The forth option heap_4.c has also been implemented.
Secondly, we have to make a small modification to both heap_1.c heap_2.c and heap_4.c to inform the compiler that the freeRTOS heap will be located at this .ext_ram_heap location. That is done in this manner (heap_2.c shown).
static union xRTOS_HEAP
{
//... edited ...
unsigned char ucHeap[ configTOTAL_HEAP_SIZE ];
//... edited ...
#if defined(portEXT_RAM)
} xHeap __attribute__((section(".ext_ram_heap"))); // Added this section to get heap to go to the ext memory.
#else
} xHeap;
#endif
And finally, now we’ve (probably, just because we can) allocated a large (up to 32kByte maximum) freeRTOS heap, we need to ensure that the loader omits this section from its preparations for writing the .hex file to flash (in a similar manner to the way the .eeprom section is removed).
Something to watch for is that none of your other code is calling malloc(), because if it does its memory allocations will collide with the freeRTOS heap. Either check that malloc() is not being linked in or, for the paranoid, just assign the heap_1.c heap_2.c or heap_4.c heap to a region separate to your new malloc() heap addresses.
And that’s all there is to getting an easy 512kByte of fast no-wait-state RAM on your Freetronics EtherMega or Arduino Mega2560. Enjoy!