Gameduino 2 with Goldilocks and EVE

My Gameduino 2 was delivered just a few weeks ago, and I’ve spent too much time with it already. It is the latest Kickstarter project by James Bowman. James has written a Gameduino 2 Book too.

The ability to add a large touch screen, with integrated audio and accelerometer to any Arduino project is a great thing. Previously, you had to move to 32 bit processors with LVDS interfaces to work with LCD screens, but the new FT800 EVE Graphical Processing Unit (GPU) integrates all of the graphic issues and allow you to drive it with a very high level object orientated graphics language. For example it takes just one command to create an entire clock face with hour, minute, and second-hands.

The Gameduino 2, via the FT800 EVE chip, provides the following capabilities:

  • 32-bit internal color precision
  • OpenGL-style command set
  • 256 KBytes of video RAM
  • smooth sprite rotate and zoom with bilinear filtering
  • smooth circle and line drawing in hardware – 16x antialiased
  • JPEG loading in hardware
  • audio tones and WAV audio output
  • built-in rendering of gradients, text, dials, sliders, clocks and buttons
  • intelligent touch capabilities, where objects can be tagged and recognised.

The FT800 runs the 4.3 inch 480×272 TFT touch panel screen at 60 Hz and drives a mono headphone output.

EVE Block Diagram

First off, there’s a demo of some of the capabilities of the Gameduino 2. I’ll come to the drivers later, but the Arduino compatible platform used here is the Goldilocks ATmega1284P from Freetronics. The Goldilocks is in my opinion the best platform to use with the Gameduino 2. Firstly there is the extra RAM and Flash capabilities in line with the ATmega1284p MCU. But also importantly the Goldilocks holds the Pre-R3 Arduino Uno connector standard, with the SPI pins located correctly on Pins 11, 12, and 13. And the INT0 interrupt located on Pin 2. This means that it can be used with the Gameduino 2, out of the box. No hacking required.

Must be addicted to these touch screens. I’ve just received an Australian designed 4D Systems FT843 Screen. It has possibly an identical screen to the Gameduino 2, but is based on a R3 Arduino shield format (SPI on ICSP) called the ADAM (Arduino Display Adapter Module), which means that it will work on any current Arduino hardware, without hacking. The FT843 ADAM supports a RESET line, which resolves the only problem I’ve noted with the Gameduino 2. Unfortunately, audio is not supported by a 3.5mm jack but rather by a pin-out option. The FT843 uses Swizzle 0, unlike the Gameduino 2 which uses Swizzle 3, and has the Display SPI Select on either D9 or D4 rather than on D8 like the Gameduino 2. Other than these simple configuration options, it similar.

4D Systems FT843 on Goldilocks 1284p

4D Systems FT843 on Goldilocks 1284p

Demo

The screen shows 5 sets of demonstrations. These demos are provided by FTDI, and typically in an Arduino Uno you would have to choose which of the 5 sets you want to see. With the extra capabilities of the Goldilocks, it is possible to load all of them simultaneously in 110kB of flash.

Set 0 focusses on individual commands that are loaded into the Display List. The Display List is essentially a list of commands that is executed or rendered for each frame of display. A Display List will be rendered indefinitely, until it is swapped by another Display List. Two Display Lists are maintained in a double buffering arrangement. One is written, whilst the other is displayed.

Set 1 exhibits some of the co-processor command capabilities, that allow complex objects to be created with only one command. A clock, slider, dial, or a rows of buttons can be created easily in this manner.

Set 2 shows the JPEG image rendering capabilities in RGB and in 8 bit mono.

Set 3 demonstrates custom font capabilities. There are 16 fonts available in the ROM of the FT800 EVE, but you can add your own as is desired.

Set 4 shows some advanced co-processor capabilities, such as touch tag recognition, no touch (zero MCU activity) screensaver, capturing screen sketches, and inbuilt audio options.

The main screen shows an analogue clock that is drawn with one co-processor command. Real time is generated by a 32,768Hz Crystal driving the Goldilocks Timer 2 for a system clock. The accuracy of the clock is limited only by the accuracy of the watch crystal, and I’ve built mine with a 5ppm version, which should be enough to keep within a few seconds per month.

Sample Application

The FTDI provided sample application covers most of the available commands and options for the FT800 EVE GPU.

The FT_SampleApp.h file contains definitions of functions implemented for the main application. These code snippets are not really useful beyond demonstrations of capability of the GPU, but never the less demonstrate how each specific feature of the FT800 EVE GPU can be utilised.

Driver

Because the FT800 EVE GPU has a very capable object orientated graphics language, the FTDI drivers present a very capable high level interface to the user. FTDI have prepared an excellent starting point from which I could easily make customisations suitable for the AVR ATmega Arduino hardware that I prefer to use.

The FTDI driver set is separated into a Command Layer, and into a Hardware Abstraction Layer (HAL). This separation makes it easy to customise for the AVR ATmega platform, but retains the standard FTDI command language for easy implementation of their example applications, and portability of code written for their command language.

To use the FT800 EVE drivers for the Gameduino 2 it is only necessary to include the FT_Platform.h file in the main program. This file contains references to all of the other files needed.

#include "../lib_ft800/FT_DataTypes.h"
#include "../lib_ft800/FT_X11_RGB.h"
#include "../lib_ft800/FT_Gpu.h"
#include "../lib_ft800/FT_Gpu_Hal.h"
#include "../lib_ft800/FT_Hal_Utils.h"
#include "../lib_ft800/FT_CoPro_Cmds.h"
#include "../lib_ft800/FT_API.h"

The FT_DataTypes.h file contains FTDI type definitions for the specific data types needed for the FT800 EVE GPU. This is mainly used to abstract the drivers for varying MCU. For the AVR it is not absolutely necessary, but it will help when the code is used on other platforms.

The FT_X11_RGB.h file contains the standard colour set used in X11 colours and on the Web, which are stored PROGMEM. I’ve written a small macro that will insert these into commands needing 24 bit colour settings. These colours will be stored and referenced from PROGMEM when they are called from either of the X11 specific macros defined in FT_Gpu.h If they are not called from the program, they will be discarded by the linker and not waste space in the final linked program.

X11 Colours

The FT_Gpu.h file contains all the definitions for command and register setting options. I have significantly rearranged the layout and comments in this file, compared to the FTDI version. Hopefully it is arranged in a way that allows options applying to specific commands and registers to be quickly located.

By writing DL commands to the Display List which are configured by the options in the FT_Gpu.h file it is possible to control most of the low level functions in the FT800 EVE GPU. The Display List is used by the FT800 GPU to render the screen, so it is only the contents of the active Display List that appear on the screen.

In the FT_Gpu_Hal.h file the commands specific to the SPI bus (or the I2C bus if this transfer mechanism is being used) are defined.

I have simplified out some HAL options provided by FTDI for high performance MCU, that might be constrained writing to the SPI bus at only 30MHz, the maximum FT800 SPI bus rate. The Goldilocks SPI bus only runs at 11MHz, and the standard Arduino Uno SPI bus only runs at 8 MHz, so those optimisations don’t help, and they also consume RAM for streaming buffers.

But, I have integrated a multi-byte SPI transfer into the HAL, which don’t use additional RAM buffer space, as they write via a pointer. This is probably the best way to work the SPI bus in the Arduino environment. I have also implemented multi byte SPI transfer directly from the PROGMEM for Strings, and for precomputed commands.

As a preferred option, I’ve implemented PROGMEM storage of Strings for all commands. The commands utilising RAM storage of Strings are retained for compatibility, and to allow computed Strings to be used.

All of the FTDI provided commands now have optional *_P variants which take PROGMEM strings, rather than RAM strings. This saves eleven hundred bytes of RAM used for strings, just in the demonstration programs provided by FTDI and shown in the Demo!

The FT_Hal_Util.h file contains some simple utility macros.

The FT_CoPro_Cmds.h file contains definitions for all of the available co-processor commands. These command are written to the co-processor command buffer, and are used to generate low level commands that appear in the Display List and be rendered for each frame.

Many of the co-processor commands replicate functionality of setting specific registers with options via the Display List GPU commands. This is useful because it is possible to programme the co-processor to implement a task and remain at the object orientated view of the screen, even though the a individual command may be a simple GPU setting that could have been done at Display List command level. Having all the commands available at co-processor level obviates the need to switch between the two “modes” of operation and thought.

I extracted a few of the standard functions that are needed irrespective of the specific application into an API. The FT_API.h file contains these simple command sequences, for booting up the Gameduino 2, and for managing the screen brightness. It also contains precalculated simplified sin, cos, and atan functions useful when drawing circles and clocks.

The API level also contains calls on the Hardware Abstraction Layer that are simply passed through. These calls are flattened by avr-gcc to save digging ourselves into a stack wasting function call hole.

And, of course, everything is integrated into the freeRTOS v8.0.0 port that I support on Sourceforge, AVRfreeRTOS, which gives non-blocking timing, tasks, semaphores, queues, and all aspects of freeRTOS that are so great.

As an example of the power of this combination of freeRTOS and the FT800 object orientated command language we can describe the method used to create an accurate well rendered clock on the Gameduino 2 screen. Using the 3 commands below, we obtain the clock face seen in my demo video main screen.

time(&currentTime); // get a time stamp in current seconds elapsed from Midnight, Jan 1 2000 UTC (the Y2K 'epoch'), as maintained by freeRTOS.
localtime_r(&currentTime, &calendar); // converts the time stamp pointed to by currentTime into broken-down time in a calendar structure, expressed as Local time.
FT_GPU_CoCmd_Clock(phost, FT_DispWidth - (FT_DispHeight/2), FT_DispHeight/2, FT_DispHeight/2 - 20, OPT_3D, calendar.tm_hour, calendar.tm_min, calendar.tm_sec, 0); // draw a clock in 3D rendering.

I’ve updated the clock function to include a touch screen time setting interface. Using the FT800 Touch Tags, and Button generation, this process is really incredibly easy.

Hardware

I’ve taken the liberty of borrowing some of James’ pictures for this story. They can originally be found here.

Gameduino 2 Pinout

Note that because of the wrap around connector and cable for the LCD screen, it is not possible to use the Arduino R3 pin out. The SPI bus pins are located at the traditional location on Pin 11 though Pin 13. Unless you want to hack your board, you’re limited to using Arduino Uno style boards.

Gameduino 2 Shield

Unfortunately, the FTDI FT800 Reset pin has not been implemented by the Gameduino 2. Using an ISP to programme the Arduino usually “accidentally” puts the FT800 EVE GPU into an unsupported state. This means that the Gameduino 2 and Arduino usually have to be power-cycled or hard Reset following each programming iteration. It would have been good to tie the FT800 Reset pin to the Arduino Reset pin via a short (ms) delay chip, to obviate the need to remove power to generate the hard Reset for the FT800.

Hello World & other examples

I thought it might be interesting to compare the code required to achieve the demonstration outcomes that James Bowman provides on the Gameduino2 site, with the code required to achieve the same result using freeRTOS and the FTDI style driver. So I’ve implemented three simple examples, “Hello World”, “Sprites”, and “Blobs” from his library.

All of the examples have been built using an Arduino Uno ATmega328p as the MCU hardware platform.

helloworld

The Hello World application simply initialises the Gameduino2, sets the colour to which the screen shall be cleared, and then writes text with the OPT_CENTER option to center it in the X and Y axis. As there is no delay, this is written as often and as fast as the MCU can repeat the loop.

#include <SPI.h>
#include <GD2.h>

void setup()
{
  GD.begin();
}

void loop()
{
  GD.ClearColorRGB(0x103000);
  GD.Clear();
  GD.cmd_text(240, 136, 31, OPT_CENTER, "Hello world");
  GD.swap();
}

The same result can be generated in C using freeRTOS and the FTDI Drivers. I have commented extensively within the code below.

/* freeRTOS Scheduler include files. */
/* these four header files encompass the full freeRTOS real-time OS features,
   of multiple prioritised tasks each with their own stack space, queues for moving data,
   and scheduling tasks, and semaphores for controlling execution flows */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"

/* Gameduino 2 include file. */
#include "FT_Platform.h"

/*------Global used for HAL context management---------*/
extern FT_GPU_HAL_Context_t * phost;           // optional, just to make it clear where this variable comes from.
                                               // It is automatically included, so this line is actually unnecessary.

/*--------------Function Definitions-------------------*/

int main(void) __attribute__((OS_main));       // optional, just good practice.
                                               // Saves a few bytes of stack because the return from main() is not implemented.

static void TaskWriteLCD(void *pvParameters);  // define a single task to write to Gameduino 2 LCD.
                                               // typically multiple concurrent tasks are defined,
                                               // but in this case to replicate the Arduino environment, just one is implemented.

/*-----------------Functions---------------------------*/
/* Main program loop */
int main(void)
{
  xTaskCreate(            // create a task to write on the Gameduino 2 LCD
       TaskWriteLCD
    ,  (const portCHAR *)"WriteLCD"
    ,  128                // number of bytes for this task stack
    ,  NULL
    ,  3		  // priority of this task (1 is highest priority, 4 lowest).
    ,  NULL );

  vTaskStartScheduler();  // now freeRTOS has taken over, and the pre-emptive scheduler is running.
}
/*-----------------------------------------------------------*/
/* Tasks                                                     */
/*-----------------------------------------------------------*/

static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
{
  (void) pvParameters;

  FT_API_Boot_Config();  // initialise the Gameduino 2.

  while(1)               // a freeRTOS task should never return
  {
    FT_API_Write_CoCmd( CMD_DLSTART );                       // initialise and start a Display List
//  FT_API_Write_CoCmd( CLEAR_COLOR_RGB(0x10, 0x30, 0x00) ); // set the colour to which the screen is cleared (using RGB triplets) as in GD2 library OR
    FT_API_Write_CoCmd( CLEAR_COLOR_X11(FORESTGREEN) );      // set the colour to which the screen is cleared (using X11 colour definitions)
    FT_API_Write_CoCmd( CLEAR(1,1,1) );                      // clear the screen

    FT_GPU_CoCmd_Text_P(phost,FT_DispWidth/2, FT_DispHeight/2, 31, OPT_CENTER, PSTR("Hello world"));
      // write "Hello World" to X and Y centre of screen using OPT_CENTER  with the largest font 31
      // The string "Hello world" is stored in PROGMEM
      // Functions with *_P all use PROGMEM Strings (and don't consume RAM)
      // FT_DispWidth and FT_DispHeight are global variables set to orientate us in a flexible consistent way,
      // without hard coding the screen resolution.

    FT_API_Write_CoCmd( DISPLAY() );                         // close the Display List (DL) opened by CMD_DLSTART()
    FT_API_Write_CoCmd( CMD_SWAP );                          // swap the active Display List (double buffering), to display the new "Hello World" commands written to the Display List
  }
}

sprites

The Sprites application is similar to the original one built for the Gameduino, but here each sprite is rotating around a random point. The 2001 random points are stored in a PROGMEM array sprites. This takes 8K of flash. A second PROGMEM array circle holds the 256 XY coordinates to make the sprite move in a circle. The only RAM used is a single byte t used to keep track of the current rotation position, by counting iterations.

#include <EEPROM.h>
#include <SPI.h>
#include <GD2.h>

#include "sprites_assets.h"

void setup()
{
  GD.begin();
  GD.copy(sprites_assets, sizeof(sprites_assets));
}

static byte t;

void loop()
{
  GD.Clear();
  GD.Begin(BITMAPS);
  byte j = t;
  uint32_t v, r;

  int nspr = min(2001, max(256, 19 * t));

  PROGMEM prog_uint32_t *pv = sprites;
  for (int i = 0; i &lt; nspr; i++) {
    v = pgm_read_dword(pv++);
    r = pgm_read_dword(circle + j++);
    GD.cmd32(v + r);
  }

  GD.ColorRGB(0x000000);
  GD.ColorA(140);
  GD.LineWidth(28 * 16);
  GD.Begin(LINES);
  GD.Vertex2ii(240 - 110, 136, 0, 0);
  GD.Vertex2ii(240 + 110, 136, 0, 0);

  GD.RestoreContext();

  GD.cmd_number(215, 110, 31, OPT_RIGHTX, nspr);
  GD.cmd_text( 229, 110, 31, 0, "sprites");

  GD.swap();
  t++;
}

The code in freeRTOS is similar. I have commented within the code.

/* freeRTOS Scheduler include files. */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"

/* Gameduino 2 include file. */
#include "FT_Platform.h"

// The include file containing the sprite graphics, and the special command sequence
#include "sprites_assets.h

/*------Global used for HAL context management---------*/
extern FT_GPU_HAL_Context_t * phost;           // optional, just to make it clear where this variable comes from

/*--------------Function Definitions-------------------*/

int main(void) __attribute__((OS_main));       // optional, just good practice

static void TaskWriteLCD(void *pvParameters);  // define a single task to write to Gameduino 2 LCD

/*-----------------Functions---------------------------*/

/* Main program loop */
int main(void)
{
  xTaskCreate(             // create a task to write on the Gameduino 2 LCD
    TaskWriteLCD
    ,  (const portCHAR *)"WriteLCD"
    ,  128                 // number of bytes for this task stack
    ,  NULL
    ,  3                   // priority of task (1 is highest priority, 4 lowest).
    ,  NULL );

  vTaskStartScheduler();   // now freeRTOS has taken over, and the pre-emptive scheduler is running
}
/*-----------------------------------------------------------*/
/* Tasks                                                     */
/*-----------------------------------------------------------*/

static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
{
  (void) pvParameters;

  uint8_t t = 0;         // iterate over the code for 255 times, before restarting with 256 sprites where t = 0

  FT_API_Boot_Config();  // initialise the Gameduino 2.
  FT_GPU_HAL_WrCmdBuf_P(phost, sprites_assets, sizeof(sprites_assets));
    // Copy James' magic list of commands into the command buffer.
    // These co-processor commands are "compiled" into their 4 byte equivalents, and I haven't decoded them in detail.
    // But, since the FT800 is reading the same double word codes, it doesn't really matter how they're generated.

  while(1)               // a freeRTOS task should never return
  {
    FT_API_Write_CoCmd( CMD_DLSTART );       // initialise and start a Display List (DL)
    FT_API_Write_CoCmd( CLEAR(1,1,1) );      // clear the screen

    FT_API_Write_CoCmd( BEGIN(BITMAPS) );    // start to write BITMAPS into the DL
    uint8_t j = t;
    uint32_t v;
    uint32_t r;
    int16_t nspr = min(2001, max(256, 19 * t));
    ft_prog_uint32_t * pv = sprites;         //  pv is the sprite BITMAP pointer

    for (uint16_t i = 0; i < nspr; ++i) {
      v = pgm_read_dword(pv++);              // determine which sprite we're controlling
      r = pgm_read_dword(circle + j++);      // circle is the rotation control
      FT_GPU_HAL_WrCmd32(phost, v + r);      // the sprite address and the location are written here to the co-processor
    }
    FT_API_Write_CoCmd( END());              // finish writing BITMAPS into the Display List

    FT_API_Write_CoCmd( BEGIN(LINES) );      // start to write LINES into the Display List
    FT_API_Write_CoCmd( COLOR_RGB(0x00, 0x00, 0x00) );  // set the line colour to black 0x000000
    FT_API_Write_CoCmd( COLOR_A(140) );                 // set alpha channel transparency
    FT_API_Write_CoCmd( LINE_WIDTH( 28 * 16) );
    FT_API_Write_CoCmd( VERTEX2II(240 - 110, 136, 0, 0) );  // start to draw an alpha transparency background line
    FT_API_Write_CoCmd( VERTEX2II(240 + 110, 136, 0, 0) );  // finish the line
    FT_API_Write_CoCmd( END() );             // finish writing LINES into the Display List

    FT_API_Write_CoCmd( RESTORE_CONTEXT() ); // With no prior SAVE_CONTEXT() command, this restores the default colours and values.

    FT_GPU_CoCmd_Number(phost, 215, 110, 31, OPT_RIGHTX, nspr);    // write a number.
    FT_GPU_CoCmd_Text_P(phost, 229, 110, 31, 0, PSTR("sprites"));  // write using a PROGMEM stored string function, to save RAM
      //  phost is a pointer to the context for the Gameduino2.
      //  Mainly used where there may be multiple screens present, but in this case several state and semaphore items are maintained.

    FT_API_Write_CoCmd( DISPLAY() );          // close the active Display List (DL) opened by CMD_DLSTART()
    FT_API_Write_CoCmd( CMD_SWAP );           // Do a DL swap to render the just written DL

    t++;    // t will roll over and will restart the number of sprites to the minimum of 256
  }
}

main2

blobs is a sketching demonstration, as you paint on the touch screen a trail of circles follows.
The code keeps a history of the last 128 touch positions, and draws the transparent, randomly coloured circles.

#include <EEPROM.h>
#include <SPI.h>
#include <GD2.h>

#define NBLOBS      128
#define OFFSCREEN   -16384

struct xy {
  int x, y;
} blobs[NBLOBS];

void setup()
{
  GD.begin();

  for (int i = 0; i < NBLOBS; i++) {
    blobs[i].x = OFFSCREEN;
    blobs[i].y = OFFSCREEN;
  }
}

void loop()
{
  static byte blob_i;
  GD.get_inputs();
  if (GD.inputs.x != -32768) {
    blobs[blob_i].x = GD.inputs.x << 4;
    blobs[blob_i].y = GD.inputs.y << 4;
  } else {
    blobs[blob_i].x = OFFSCREEN;
    blobs[blob_i].y = OFFSCREEN;
  }
  blob_i = (blob_i + 1) & (NBLOBS - 1);

  GD.ClearColorRGB(0xe0e0e0);
  GD.Clear();

  GD.Begin(POINTS);
  for (int i = 0; i < NBLOBS; i++) {
    // Blobs fade away and swell as they age
    GD.ColorA(i << 1);
    GD.PointSize((1024 + 16) - (i << 3));

    // Random color for each blob, keyed from (blob_i + i)
    uint8_t j = (blob_i + i) & (NBLOBS - 1);
    byte r = j * 17;
    byte g = j * 23;
    byte b = j * 147;
    GD.ColorRGB(r, g, b);

    // Draw it!
    GD.Vertex2f(blobs[j].x, blobs[j].y);
  }
  GD.swap();
}

The code in freeRTOS is similar, but the touch functionality is derived directly from the FT800 register containing the most recent screen touch location. I have commented within the code.

/* freeRTOS Scheduler include files. */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "semphr.h"

/* Gameduino 2 include file. */
#include "FT_Platform.h"

#define NBLOBS       128
#define OFFSCREEN   -16384

/*----------Global used for HAL management-------------*/
extern FT_GPU_HAL_Context_t * phost;           // optional, just to make it clear where this comes from

struct xy {										// somewhere to store all the blob locations
  int16_t x, y;
} blobs[NBLOBS];

/*--------------Function Definitions-------------------*/

int main(void) __attribute__((OS_main));       // optional, just good practice

static void TaskWriteLCD(void *pvParameters);  // define a single task to write to Gameduino 2 LCD

/*-----------------Functions---------------------------*/

/* Main program loop */
int main(void)
{
  xTaskCreate(            // create a task to write on the Gameduino 2 LCD
    TaskWriteLCD
    ,  (const portCHAR *)"WriteLCD"
    ,  128                // number of bytes for the task stack
    ,  NULL
    ,  3                  // priority of task (1 is highest priority, 4 lowest).
    ,  NULL );

  vTaskStartScheduler();  // now freeRTOS has taken over, and the scheduler is running
}

/*-----------------------------------------------------------*/
/* Tasks                                                     */
/*-----------------------------------------------------------*/

static void TaskWriteLCD(void *pvParameters) // A Task to write to Gameduino 2 LCD
{
 (void) pvParameters;

  FT_API_Boot_Config();     // initialise the Gameduino 2.
  FT_API_Touch_Config();    // initialise the FT800 Touch capability.

  for (uint8_t i = 0; i < NBLOBS; ++i)
  {
    blobs[i].x = OFFSCREEN;
    blobs[i].y = OFFSCREEN;
  }

  while(1)                  // a freeRTOS task should never return
  {
    static uint8_t blob_i;  // the blob we're currently processing
    uint32_t readTouch;     // xy coordinates of a touch are stored in uint32_t

    // this is the touch interface stuff
    readTouch = FT_GPU_HAL_Rd32(phost, REG_TOUCH_SCREEN_XY);// the screen location of the last touch is stored in REG_TOUCH_SCREEN_XY
    if (readTouch != NIL_TOUCH_XY)                          // if there was a touch
    {
      blobs[blob_i].x  = (int16_t)((readTouch >> 16) & 0xffff) << 4; // read where x axis touch occurred, and scale it
      blobs[blob_i].y = (int16_t)(readTouch & 0xffff) << 4; // read where y axis touch occurred, and scale it
    } else {
      blobs[blob_i].x = OFFSCREEN;         // if there was no touch, draw the blob OFFSCREEN
      blobs[blob_i].y = OFFSCREEN;
    }
    blob_i = (blob_i + 1) & (NBLOBS - 1);  // increment to the next blob for touch interaction

    // this is the display interface stuff
    FT_API_Write_CoCmd( CMD_DLSTART );      // initialise and start a display list (DL)

    FT_API_Write_CoCmd( CLEAR_COLOR_RGB(0xe0, 0xe0, 0xe0) );// set the colour to which the screen will be cleared
    FT_API_Write_CoCmd( CLEAR(1,1,1) );     // clear the screen

    FT_API_Write_CoCmd( BEGIN(POINTS) );    // start to write POINTS into the Display List (DL)

    for (uint8_t i = 0; i < NBLOBS; ++i)
    {
      // Blobs fade away and swell as they age
      FT_API_Write_CoCmd( COLOR_A(i << 1) ); // set an alpha transparency
      FT_API_Write_CoCmd( POINT_SIZE((1024 + 16) - (i << 3)) );

      // Random colour for each blob, keyed from (blob_i + i)
      uint8_t j = (blob_i + i) & (NBLOBS - 1);
      uint8_t r = j * 17;
      uint8_t g = j * 23;
      uint8_t b = j * 147;
      FT_API_Write_CoCmd( COLOR_RGB(r, g, b) );

      // Draw it!
      FT_API_Write_CoCmd( VERTEX2F(blobs[j].x, blobs[j].y) );
    }

    FT_API_Write_CoCmd( END() );            // finish writing POINTS into the active DL

    FT_API_Write_CoCmd( DISPLAY() );        // close the active Display List (DL)
    FT_API_Write_CoCmd( CMD_SWAP );         // Do a DL swap to render the just written DL
    }
}

I intend to build a few more demonstrations of the code, and to copy some games that James has already implemented, because I’m not a game designer.

Goldilocks Analogue – Prototyping

Last time I designed a Goldilocks board, it was because I was unhappy about the availability of a development platform that was within my reach; a tool to enable me to continue to learn about coding for micro-controllers.

This Goldilocks, let us call it Goldilocks Analogue, it is not about what I think is necessary, but more about what I’d like to have. The focus is not so much about the basics of SRAM and Flash, but much more on what functions I would like to have, and using my own means to get there.

Also, as the original Goldilocks is sold out, Freetronics are considering making their own version. Please add your wishes here.

Test results are in. Check out the detailed post on Goldilocks Analogue – Testing. Following the testing, I’ve redesigned the analogue output section to make it much more capable. It now support simultaneous AC and DC outputs, with an application specific headphone amplifier device to provide AC output, and high current OpAmp to provide DC output.

Background

The Goldilocks Project was specifically about getting the ATmega1284p MCU onto a format equivalent to the Arduino Uno R3. The main goal was to get more SRAM and Flash memory into the same physical footprint used by traditional Arduino (pre-R3) and latest release Uno R3 shields.

Goldilocks Arduino 1284p

Original – Goldilocks Version 1.1

I also tried to optimally use the co-processor ATmega32U2, (mis)utilised by Arduino purely for the USB-Serial functionality, by breaking out its pins, and creating a cross-connect between the two MCU to enable them to communicate via the SPI bus.

Whilst the Goldilocks achieved what it set out to do, there were some problems it created for itself.

Firstly, the ATmega family of devices is really very bad a generating correct USART baud rates when their main frequency doesn’t match a multiple of the standard USART rates. Engineers in the know select one of these primary clock rates (for example 14.7456MHz, 18.432MHz, or 22.1184MHz) when they’re planning on doing any real Serial communications. Unfortunately, the 16MHz clock rate chosen by the Arduino team generates about the worst USART timing errors possible.

This means that the Arduino devices can only work at 16MHz while programming them with the Serial Bootloader, otherwise programming is bound to fail, due to losing a bit or two due to the clock rate error.

Arduino had serial programming completely solved in the old days by using a real USB-USART chip, the FTDI FT232R, but for some reason they stopped doing the right thing. This might have been the perfect solution, but they abandoned it. Who knows why…

Secondly, although having an integrated uSD card cage on the platform is a great thing, using a resistor chain to do the voltage conversion is nominally a bit problematic. The output pins (SCK, MOSI, CS) are permanently loaded by 3k2 Ohm and an input pin (MISO) high signal generates only 0.66 of Vcc, which only just clears the minimum ATmega signal high level of 0.6 Vcc. Neither of these issues prevent the uSD card from working, and the voltage divider resistor chain takes almost no space on the board. But still it is not perfect.

Thirdly, there are some minor oversights in the V1 build that I would like to correct if possible.

New Directions

I’ve been toying with the idea of building an Xmega board, in Arduino Uno R3 format, because of the significantly enhanced I/O capabilities of this MCU including true DAC capabilities, but I’ve not followed up for two reasons; the Xmega has no history of use by hobbyists as there is with the ATmega devices, and it doesn’t bring any advantage that an ARM MCU wouldn’t otherwise do better and faster.

Never the less, the ATmega platform still lacks one thing that I believe is necessary; a high quality analogue capability. The world is analogue, and having an ADC capability, without having a corresponding DAC capability, is like having a real world recorder with no means to playback these real world recordings.

A major initiative of the Goldilocks is to bring an analogue capability to the Arduino platform. So this device will be called the Goldilocks Analogue.

Updated - Goldilocks Analogue

Updated – Goldilocks Analogue

There have been music shields and audio shields built before, and the design used is closely aligned to the original Adafruit Wave Shield, but I’ve not seen dual high quality DACs with both AC and DC capability, integrated onto the main board of an Arduino previously. So that’s where I’m going.

The goal is to be able to produce a DC referenced signal, from 0Hz up to around 100kHz, that can provide a binary-linear representative voltage (with sufficient current) to enable a control system, as well as to produce the highest quality audio, with very low noise and THD buffer amplifiers, that the basic AVR platform is capable of producing.

Using Eagle

I used to look at Eagle (Kicad, etc) with healthy scepticism. Yeah, not something that I’d be able to learn, but in the process of realising the Goldilocks Analogue, I have learned that it is far easier to learn a new skill than it is to guide someone in India or Malaysia, who doesn’t even get the start of what I want. The old idiom, if you want something done right, you’ve got to do it yourself.

There is a “Fremium” version of Eagle available, which is enough to get started. I’m going to try to get a “Hobbyist” version as soon as the paperwork is through.

So all this below is my first Eagle project.

The Schematic

I’ll talk through each item in the schematic, particularly those things which are novel in the Goldilocks Analogue. The schematics for the Goldilocks V1 can be found in the User Manual.

FT232R

The FT232R is the same device used in countless earlier Arduinos, such as the Duemilanove, and in USB-Serial adapters everywhere. The drivers for all major operating systems are widespread and there is no magic required. Importantly, the FT232R chip generates a real USART baud rate, at any speed from 300 baud to 3 Mbaud.

Unlike in the Duemilanove I’m using the FT232RQ chip, which is in the QFN package. There is too much going on to take up the board space with the larger package.

FT232RQ - Goldilocks Analogue

I’ve added a switch to disable the DTR Reset functionality of the Arduino and Wiring Bootloaders. Often, I would like a running device NOT to be reset by plugging the USB cable, but then I’ll be using the Goldilocks in another thing where I do want this to happen. Having a switch, like Seeed often do, is the best answer.

Also, I’ve added a 6 pin connector replicating the standard FTDI pin-out, to enable the FT232RQ to communicate with other devices, should this be necessary. It would be a shame to lock it into the board, with no option for extension.

uSD Buffer

In designing the buffer for the uSD, I was trying to achieve two things. Firstly, isolate the uSD card entirely from the SPI bus when it was not in use. By isolate, I mean over 1MOhm resistance. This isolation ensures that the uSD card doesn’t load up the SPI pins at all, when the uSD is not being used.

Secondly, I was trying to ensure that each end of the SPI bus receives the correct voltages and currents to ensure maximum throughput.

uSD Buffer - Goldilocks Analogue

The two devices selected achieve both goals as desired.

For the MCU to uSD direction (SCK, MOSI, and CS) I’m using a 74LVC125 in quad package. This package is tolerant of inputs at 5V rising above its Vcc of 3V3. The output enable on low, is connected to the Chip Select line, which means that the uSD card will not be driven unless the CS line is low. It always presents a high impedance to the MCU.

As a quad package the 74LVC125 has one spare gate, which can be used to drive the Arduino LED. This is neat no cost result that entirely removes any loading on Arduino Pin13.

For the uSD to MCU direction the buffer has to effectively produce a 5v CMOS high when receiving a 3V3 CMOS high. The best way to do this is to use a device that is TTL signal compatible. The TTL minimum high signal is only 2V, much lower than the CMOS minimum high signal of 2/3 of Vcc, and importantly below the worst case of 2/3 of 3V3 CMOS.

The only device I could find with the required characteristic of accepting TTL inputs with a low output enable, is the MC74VHC1GT125. I’m sure there are other options though.

DAC and Buffer

This is the fun stuff. Analogue… the real world. As noted above, the goal is to produce two binary-linear signals with enough buffering that they can drive a reasonable load (such as small headphones or an audio amplifier) and produce a constant voltage under a number of power supply options.

The inspiration for the circuit came from the Adafruit Wave Shield, but there are a number of significant improvements that are worth noting, not least the use of a dual DAC, for two channels of output.

DAC and Buffer - Goldilocks Analogue

Firstly, if you want to get a very low noise output, whilst using a high current Switch Mode Power Supply, it is necessary to filter the supply voltage. I’ve utilised the dual steps of an L-C primary filter, followed by a ferrite core bead secondary filter. I’m not sure whether this is all necessary, and I’ll be testing the circuit later with various components removed to check their efficacy in the role, but if they’re not designed in now they never will be added later.

I’m using the Microchip MCP4822 DAC to produce the raw output voltage. This is an SPI device which will be selected using the other “spare” Goldilocks digital pin PB1. Using PB1 to signal the DAC means that none of the Arduino R3 pins are used for on-board Goldilocks functions, and as both CS lines (PB0 and PB1) are tied high they will ensure that all these on-board devices stay off the SPI bus during system reboot.

The MCP4822 takes 16 bits to set a signal level, this is two SPI bus transactions. The maximum SPI rate is SCK/2. Therefore, if my Goldilocks is doing nothing else, it can generate 691,200 SPI transactions per second. If both DACs are being driven we can generate a square wave of 172,800Hz. This is an unreachable figure. More likely, the best case will be around 50kHz for both channels, or 100kHz if only one DAC is being used.

Optionally, the LDAC pin-out can be used to synchronise the transfer of digital inputs to the analogue output buffers across the two DACs or to a specific clock with low jitter.

Unlike the Adafruit solution, the MCP4822 generates its own internal 4.096V reference voltage Vref. This means that irrespective of whether the Goldilocks Analogue is being powered by a battery, by USB, or by the barrel connector and the SMPS, the output voltage for a particular digital input will be constant.

The op-amp configuration with dual op-amps, in a quad package, designed to double the current capability of the output, has raised concern from all who see it. Concern was my initial thought too. However after some research, I found it to be a recommended configuration for current doubling. The only difference to the Adafruit example circuit is to add low value output resistors which allow each op-amp to find its own offset level without consuming excess current.

I have added the option to bridge the output capacitors to provide a DC output. The output capacitors are necessary for audio use, as headphones or audio amplifier inputs require an AC connection, with no DC offset.

The Layout

It takes many hours to layout even a small board the size of an Arduino Uno. Luckily, I had a completed and fully functioning example to use as a platform, thanks to Jon’s prior work on Goldilocks V1.

The final prototype board layout is now done, and the board design sent off for manufacturing.

Goldilocks Analogue BoardIn this layout, I’ve been able to retain most of what makes a Goldilocks; the ATmega1284p, the complete dual rows of header pins arranged in pin-logical order 0-7, bridging of the I2C pins to A4/A5, JTAG, and a high current power supply. Added to this now are the three items described above; the FT232RQ and Reset switch, buffers for the uSD card, and the analogue platform.

Starting in the bottom left, the SMPS has been relaid to significantly shorten the high current paths around pins 2, 3, and 4. This will reduce the circuit noise, and taken togther with the effort to create solid ground planes, and specific AVcc filtering, will help to ensure the minimum of power supply noise in the analogue platform.

On the right we can see the uSD buffers, which have eaten into the prototyping space significantly. Although the signals will be much nicer than with a resistor bridge, the cost is clearly on space. If the Goldilocks Analogue ever goes into production the SOIC package buffer chip will be replaced by a QFN package, and some space should be recoverable.

Finally, the analogue platform is implemented in the top left of the board, to the left of the pin-outs for the analogue platform and the FTDI interface. Below the pin-outs the analogue supply voltage filtering is implemented, with exception to the chip decoupling capacitors which are tied directly to their supply pins.

Keeping the analogue lines as short, as balanced, as fat, and as well shielded as possible was a key focus of my design. There are a few USART lines running under the chips, but they are unlikely to produce noise as they are under the first ground plane.

Goldilocks Analogue TopThe top layer of the board is pretty crowded. Some tricks such as bridging my lines to get a solid ground plan under the crystal, were passed to me.

Goldilocks Analogue Route2The Route2 or second layer is the ground plane of the board. As such it needs to provide a stable and solid path for currents to return to the origin. I have been able to provide almost solid copper under the entire area from MCU to power supply, and also from the analogue platform back to the central ground point.

Goldilocks Analogue Route15In the Goldilocks Analogue (as in Goldilocks V1) the Route15 layer is wholely at 5v and is a massive supply line. I’ve used this layer to transport the 3v3 supply around the lower edge of the board, to provide power to the uSD card, and its input buffer. The other thick tracks are the USB input line and the analogue AVcc supply line.

Goldilocks Analogue BottomOn the back of the board, mirrored here, things look as we expect. The previously noted bridge capability for the I2C bus to A4/A5 is there, as is the capability to bridge the DAC A and DAC B output capacitors to enable DC output.

Next Steps

The Goldilocks Analogue prototype board design has been sent to Seeed Studio for conversion into a PCB. While this is happening I’ll be sourcing components to solder to the PCB. I think the next post will be on this stage of the process.

Well I have everything finished and in the interim, until I write a new post, here’s the photos of the final assembly of the prototype at Jon’s SuperHouse.

Goldilocks Analogue - 3Here Jon is assembling the first prototype, using several faulty Goldilocks v1.1 devices as donor boards. Only two components didn’t fit correctly, and we didn’t have a uSD card cage so that was left off.

Goldilocks Analogue - 1Out of the toaster oven, and final assembly finished. Just checking that the voltages are as expected across the board.

Well I’ve had it on the desk now for two nights, and I’m very impressed that it seems to generally meet the specification that was intended. The code for setting the DAC levels is currently only optimised for setting two values at a time. Specifically, it is not a streaming function. Never-the-less, it is possible to achieve the stated goal for both DAC channels. The actual number achieved is 108 kSamples/second, shown below, or 18.8us to transmit 2 samples on 2 channels.

The trace below shows the signals for both DACs at 0x0000, then both DACs set to 0x0FFF.

Goldilocks Analogue Max DAC Rate

Therefore, we’ll be able to achieve the 44.1kHz sample rate for CD audio, but only 12 bit resolution, with some time time to spare. If there is a need to read a uSD card, or do some other processing then it is likely that this rate will be more than halved, as the data would then need to to be read over the SPI bus (the same bus the DAC is using) for example. Also, there is a single pole filter between the DACs and the OpAmp buffer, with a 3dB cut-off frequency of 23kHz, which will limit the maximum output frequency but will help to reduce sampling alias issues.

Looking at the board from the top left the MCP4822 can be seen in the SIOC8 package, with the Burr Brown OPA4132 quad op-amp in a SOIC14 package just near the POWER selection jumper. The FTDI FT232RQ USART in QFN package takes up much less space than its FT232RL peer.

Goldilocks Analogue - Top Left

Goldilocks Analogue – Top Left

Now the prototype is finished, it is easy to see what needs to be improved. Actually there’s not too much wrong. The inductors for the Analogue Vcc have the wrong footprint, so they will need to be fixed. The inductor is too large for the footprint and is snuggled up to the POWER jumper, and the ferrite bead is somewhat too small. I didn’t source the very small 15 turn potentiometers, so they are just shorted out. As is the DTR (RESET) disable switch located near the USB connector. As a final issue, the footprint for the 1/8″ jack was wrong for the supplied connectors, so I’ve just added a short set of jumpers to achieve the same outcome.

Goldilocks Analogue - Bottom Right

Goldilocks Analogue – Bottom Right

Here is a short video demonstrating a Voltage Controlled Oscillator running at 44.1kHz sampling into dual channels. It sounds a little odd, because one of the channels is inverted, generating an out of phase effect.

Results

Well, things are good, and bad.

I’ve been testing the DAC stage and found (what I should have known) that I needed an output buffer op-amp able to reach the negative rail (0V) on input and output to support the MCP4822 0v to 4.095V ranging DAC. The OPA4132 exhibits noise and instability issues around 0.3V output.

Unfortunately the OPA4350 (rail to rail high current), which looks like it will be the right pin compatible device, costs over $10 each, which is nearly as expensive as the audiophile OPA4132 I specified previously.

There seems to be a pin compatible alternative, the TS924A, which is about $2 each, but it is several orders of magnitude worse in performance.

For Example: OPA4350 vs TS924A
Gain Bandwidth Product: 38MHz vs 4MHz
Slew Rate: 22V/μs vs 1.3V/μs
Total Harmonic Distortion: 0.0006% vs 0.005%

Is it worth the difference, when working with a 12 bit DAC in the presence of mV of power supply noise?
Personally, I doubt it.

Using my new Red Pitaya to analyse the output, with a 43.066Hz Sine wave (1024 samples at 44.1kHz) the noise floor is 70dB down from the signal ex DAC. It seems the DAC performs as advertised.

GoldilocksAnalogue43HzSineZoom

43.066Hz 12bit Sine wave, 1024 samples output at 44.1kHz.

More in part two of Goldilocks Analogue – Testing.

Ends.

avrdude 6.0.1, avr-gcc 4.8, and keestux Eclipse AVR Plugin 2.4.1

Always looking for the latest and greatest code for AVR, I scan the debian Sid repositories every few months for updated packages to use. Recently debian Sid included the latest gcc-avr 4.8 package, binutils-avr and the new avrdude 6.0.1 package. So I had to install them all to test.

Updated and Local

Updated and Local

The new avrdude 6.0.1 is the first release in two years, so it has a lot of good stuff, including a fix for the long standing chip delay bug affecting the AVRmega1284p used in the Goldilocks and Pololu SVP platforms.

Unfortunately, there is a new device format used in the avrdude.conf file which breaks the standard Eclipse AVR Plugin, rendering the MCU choice ineffective.

Asking a question on avrfreaks.net found the answer. keestux has fixed the problem and has released an interim AVR Plugin 2.4.1 which incorporates a fix for the change in avrdude.conf file format.

keestux Eclipse AVR Plugin 2.4.1 is at: http://www.ijzerbout.nl/avr-eclipse/updatesite

Add his update site to your software sources, and life is good again.

AVR Plugin 2.4.1

uIP on Wiznet W5200 versus W5100 on Goldilocks 1284p

I guess it is no secret, the reason why I’ve put so much effort into getting the Goldilocks 1284p board built. I was looking for a platform that would allow me to experiment with the uIP TCP/IP and UDP/IP stack with the most performance and flexibility possible while still being compatible with the huge range of sensors and actuator Shields that form the Arduino legacy. From the microprocessor view, the ATmega1284p used in the Goldilocks certainly achieves that goal.GOLDILOCKS-oblique_large

I’ve written in a previous post about the theoretical performance difference between the common Wiznet (or IINChip) W5100 used in almost all Arduino Ethernet shields and the component I have selected that uses the W5200 to provide the Ethernet interface. This post demonstrates the real world performance differential with a simple example.

Recently, I’ve been working with the W5500 on a ioShield-A.

But first, I am happy with the result of the uIP port to the Wiznet platform within freeRTOS. I’ve taken some of the old uIP v0.9 and v1.0 files from many sources, and updated them with the latest snapshot status from Contiki 2.7, to try to bring the last 5 years of experience into the result. Whilst the resulting codebase has not as yet been extensively tested, it seems to work as expected.

Results

This is a simple test, sending 1300 byte PING packets to the MACRAW interface on the IINChip to be handled by uIP. After 100 PINGs the W5200 takes on average 3.804 ms, whilst the W5200 takes on average 22.109 ms for each round trip.

This means the W5200 is nearly 6x faster than the W5100 in real world performance.

uIP_on_W5200 uIP_on_W5100

Of note, this real world result is achieved whilst over-clocking the W5100 SPI bus out of specification at 5.5MHz (being SCK/4), rather than at 4MHz which is the specification. The W5200 SPI bus can, of course, run up to 30MHz or faster, so its limits are not even being tested by the Goldilocks ATmega1284p MCU.

W5200 SPI bus

The key differential which provides the W5200 its performance advantage is the use of multi-byte burst transfer mode for moving payload data into and out-of its controlling MCU. In theory the entire 32 kByte Address space of the W5200 could be transferred in one transaction. In practice, a full Ethernet frame can be transferred in just over 1 ms.

These shots show how the W5200 SPI multi-byte transfer works in practice.

uIPW5200_ping_transfer

The W5200 supports multi-byte burst mode transfers on the SPI bus. This is a 1300 Byte PING frame transfer out of the W5200, and returned by the AVR.

This screenshot shows an entire received 1300 Byte payload PING frame being transferred in 1.34ms.

uIPW5200_ping_generation

The AVRmega1284p generates a PING response frame, and transfers it back to the W5200 in one burst mode transfer.

The Goldilocks AVR1284p takes 0.29ms to generate the response PING, and then it is transferred back to the W5200 for transmitting on the wire.

Detail of the burst mode multi-byte SPI transfer capability of the W5200.

Detail of the burst mode multi-byte SPI transfer capability of the W5200.

This screenshot shows the detail of the transmission of the PING frame to the AVRmega1284p. Note that each Byte takes less than 1 us to transfer.

W5100 SPI bus

The W5100 SPI bus uses a 4 byte transaction to transfer a single payload byte, and it is not capable of a multi-byte burst mode.

uIPW5100_ping_transfer_detail

The Wiznet 5100 uses a 4 byte protocol to transfer a single data payload byte.

This screenshot shows the detail of the transmission between the AVRmega1284p and the W5100. It shows that to transfer 1 payload byte it takes about 0.036 ms (which is 36 us or 36x longer than the equivalent transfer on the W5200).

Conclusion

P1040382If you’re planning on building anything that relies on wired Ethernet, then go out of your way to find a Elecrow W5200 Shield or Seeed W5200 Shield. It is about six times faster than the common W5100 in the real world testing, and has many other great features.

uIP works well on the Goldilocks and provides a great platform for developing TCP/IP and UDP/IP stack applications.

Next steps are to implement CoAP and MQTT clients on this platform, to increase my understanding of both of these important IoT protocols.

Code, as usual, on Sourceforge.

Wiznet W5200 Arduino Shield by Elecrow

Oh W5100, why you so slow?

For a long time the standard Arduino Ethernet Shield has been driven by the Wiznet W5100 Internet Processor. This shield and the chip upon which it is based forms the basis of just about every IP enabled networking project in the Arduino world.

The Wiznet W5100 chip has some interesting features, such as direct and indirect memory access, but it has some severe limitations in its SPI bus capabilities . Also, the W5100 can support only 4 ports within its hardware IPv4 engine. Unlimited software ports can be added, by providing your own IP stack in MACRAW mode using Port 0, but that is not the road well travelled.

There are two major issues with interfacing with the W5100. First, the SPI interface is only specified to run at 4MHz. And second, the SPI interface supports only a byte mode transmission.

The limitation in SPI rate to 4MHz means that the standard 16MHz Arduino board SPI bus cannot be driven at any speed greater than SCK/4, if it is to remain within specification for driving the W5100. 20MHz boards, such as the Goldilocks, it must drop to SCK/8 if they are to remain within specification.

Also, the W5100 byte mode transmission requires a 4 byte SPI bus transaction for each byte of data to be transferred into and out of the network interface.

Counting the (unachievable) theoretical best case rate for the W5100, it means that 4 * 8 * 4 = 128 system clocks elapse to transfer a single byte of data. Ugh! Slow.

What to do?

I guess Wiznet must have realised this performance issue (which is more apparent with more capable 32 bit MCUs which run at higher system clocks than the slow old 8 bit AVR ATmega range) and they’ve recently released the W5200 as a replacement (specific to SPI bus interfacing) for the W5100 chip.

Wiznet 5200

Wiznet 5200

The W5200 brings a number of new performance features to the game, based on the well known and understood IPv4 network engine of the W5100. The table below contrasts the two chips.

Comparison Table showing Wiznet W5100 vs W5200

Key features comparison W5200 vs W5100

The W5200 is a much smaller and simpler chip to locate on the board, and it is easier to solder for those interested in private SMD constructions. Importantly for networking performance, the W5200 has twice as much Tx/Rx buffer memory for IP packets, and supports 8 simultaneous hardware IP sockets. These features make the W5200 a great performance increment on the W5100, and already sufficient to make a switch. An example of the size of the two chips compared can be found below, with the Elecrow W5200 on the left and an old DF Robot W5100 v1.0 on the right.

Image

Elecrow W5200 and DF Robot W5100 v1

However, the greatest improvement in the W5200 lies in the area of the SPI bus interface. Wiznet has ditched the Direct addressing mechanisms (that took all the pins) on the W5100, and made the W5200 a SPI specialist, capable of running at up to 80MHz clock. That is a 20x increment.

Additionally, the W5200 supports SPI burst mode transmission. This means that up to the full Tx/Rx buffer (32kByte) could be read or written written in one transaction.

In the Arduino situation the W5200 can be driven at SCK/2, the maximum SPI speed achievable on an AVR ATmega MCU, and each byte takes one SPI byte to transfer. This means we can achieve a rate of 2 * 8 * 1 = 16 system clocks to transfer a byte of data.

This means the W5200 is 8x faster for the Arduino, and for Goldilocks 20MHz boards it will be 16x faster than the W5100 – fast as a leopard!

A practical analysis of the speed difference between the two Wiznet chips is here.

Easy to use.

The W5200 is easy to use, and easy to get.

Wiznet have provided some ready made W5200 driver files to include into the Arduino IDE. These replacement drivers for the existing W5100 driver files provided within the IDE just have to be substituted (or overwritten) to enable the slightly different SPI interfacing requirements of the W5200. They also provide C code drivers, which I used as a basis for my AVR freeRTOS code.

The Socket API provided by the W5100, and utilised by the Arduino IDE remains unchanged in the W5200. This means that it is only the performance enhanced SPI bus interface that needs to be rewritten to take advantage of the burst mode transmission, and the slightly different register locations associated with the increased Tx/Rx buffer and number of sockets available.

W5200 functional blocks

I was waiting for a long time for the W5200 to be put onto an Arduino compatible shield, so that I could use it easily. Suddenly, there are two on the market. One from W5200 Shield from Elecrow in China, and the other W5200 Shield from Wiznet.

I decided to purchase some of the Elecrow W5200 Shields. They looked to have a much better design than the Wiznet version, because Elecrow have utilised proper 5V to 3.3V buffers to ensure the safety of the on board uSD card, and have designed using the Arduino R3 standard.

The key and unique (afaik) feature of the Elecrow W5200 boards is the use of the lowered RJ45 jack, that allows the Ethernet shield to between other boards with no clearance problems. I have taken some pictures to show the difference between the standard RJ45 jack and the Elecrow W5200 board version, mounted on a Goldilocks board, and a standard Arduino Uno, with a LCD Touch Shield (even with under-slung SD Card cage) mounted over the top.

Image Image

Some small improvements.

I spent some time working with the Elecrow W5200, and have been in discussion with Richard and David (Tech Support) at Elecrow about the implementation. They have been very helpful in resolving some issues I have found in using their design.

Firstly, they have used quite a high resistance on the PWDN pin (which is intended to allow the W5200 to be powered down to reduce energy consumption). There is insufficient current on this resistor to hold ground, and sometimes the W5200 slips into PWDN mode and can’t be addressed. This can be solved by pulling jumper J2-2 to ground, or (permanently) by bridging Pin 1 and Pin 2 on U6 which is the buffer chip controlling the PWDN line. Check the schematics to see why this is so.

Secondly, the buffer chips used are driven from from 3.3V for Vcc. They are a little slow (100ns/V skew) at this supply voltage, and for the return data path on the MISO line they should properly be driven from 5V Vcc. At 5V Vcc the buffer chips are also much faster (20ns/V skew). The slower buffer chips, the LPF characteristic generated by the sensibly included output resistors, and the lower logic level compared to the AVR 5V TTL levels all combine to reduce the speed at which the SPI bus can work. Whilst the correct resolution is to drive the buffer chip at 5V Vcc for the inbound (AVR point of view) signal lines, I have found it is sufficient to remove and bridge the R24 resistor to achieve the SCK/2 SPI rate we desire.

This view of the Elecrow W5200 board shows the modifications in detail. I believe that later versions of the board will resolve these issues. And, with the Elecrow W5200 Shield’s unique recessed RF45 connector’s advantages and the speed of the W5200 MCU, all other sins are forgiven.

Elecrow W5200 showing R24 delete and U6 Pin 1-2 bridge.

Elecrow W5200 showing R24 delete and U6 Pin 1-2 bridge.

TL;DR

The Elecrow W5200 is a very speedy and easy to use alternative to the standard Arduino W5100 based solution. It is a great addition to my collection of IPv4 networking shields.

A practical analysis of the speed difference between the two Wiznet chips is here.

My code, as usual, on Sourceforge.

Goldilocks 1284p plus Elecrow W5200 Ethernet Shield

Goldilocks 1284p plus Elecrow W5200 Ethernet Shield

Making waves – Open Music Labs’ DSP Shield – Arduino – freeRTOS

There’s a great new Arduino Uno (pre-R3) Shield available from Open Music Labs. Their Audio Codec Shield is an Arduino shield that uses the Wolfson WM8731 codec. It is capable of sampling and reproducing audio up to 88kHz, 24bit stereo, but for use with the Arduino it is practically limited to 44kHz, 16bit stereo. The Audio Codec Shield has 1/8″ stereo input and headphone output jacks, a single pole analogue input aliasing filter, and 2 potentiometer for varying parameters in the program on the fly.

Open Music Labs WM8731

The Open Music Labs provides a some libraries and code examples for use with the Arduino IDE, and also with the Maple IDE. But, rather than just use the existing code, I thought it would be fun to develop some freeRTOS libraries from their basis code.

I spent quite some time understanding exactly how the WM8731 worked, and what was needed to make it perform, in a RTOS environment. It is clear, that to work at the audio rate of 44.1kHz, that the Arduino needs to be clocked by a hard interrupt, rather than by a soft timer. So, I spent some time designing and playing with different methods of driving the board.

Initially, I thought it would be good to limit the interrupt processing to constant clock in and clock out of data, that MUST happen every sample (at 44.1kHz) or the sound sampling or playback is simply broken, and allow the interrupt to semaphore a further processing task to wake it up. However, once I understood just how limited the time available is for processing, it became apparent that (at least for the 16MHz Arduino) there is no time left to muck about with a RTOS, and everything has to be kept as simple and regular as possible.

Never the less, the freeRTOS code is useful to provide serial and I2C libraries to set up the board, and possibly to do some other tasks where possible.

The resulting code consists of just one freeRTOS Task, that initialises the Shield, and then suspends itself indefinitely. The freeRTOS Scheduler keeps on running, but finding no available task will just pend itself until its next timer tick.

  AudioCodec_ADC_init();    // initialise the potentiometer sampling.
  AudioCodec_SPI_init();    // initialise the SPI bus for special purpose Audio Codec use.
  AudioCodec_init();        // initialise the Audio Codec using I2C bus.
  AudioCodec_Timer1_init(); // set up the sampling Timer1, runs at audio sampling rate.
  vTaskSuspend(NULL);       // well, we're pretty much done here...

First the Arduino ADC is initialised into free running mode, to provide inputs from the two potentiometers on the Shield. The Open Music Labs have provided an analysis of the Arduino ADC, and they show that the free running mode provides the lowest noise floor. Not that it is important to have a low noise floor for this purpose, as it is just potentiometer sampling. However, they missed the trick of using decimation to improve the sampling resolution, choosing instead to use a dead-band for the sampling. I’ve changed the ADC initialisation to do variable sample decimation, depending on the bit depth desired.

This is the example code for a single potentiometer.

static inline void AudioCodec_ADC(uint16_t* _mod0value)
{
  if (ADCSRA & (1 << ADIF))     // check if sample ready
  {
    _mod0temp += ADCW;  // fetch ADCL first to freeze sample is done by the compiler
    ADCSRA = 0xf7;      // reset the interrupt flag
    if (--_i == 0)      // check if enough samples have been collected
    {
      _mod0temp >>= DECIMATE;   // Decimate the summed samples
                                // (to get better accuracy), see AVR8003.doc
      *_mod0value = _mod0temp;  // move temp value to the output
      _mod0temp = 0x0000;       // reset temp value
      _i = _BV(2 * DECIMATE);   // reset loop counter
    }
  }
}

Then the SPI bus is configured to sample the data from the ADC on the WM8731, and to write data back to the DAC. Since we’re using the DSP interface, which is very similar to the SPI bus interface, with 16 bit transfers, the SPI mechanics can be used effectively, removing the need to bit-bang the interface. I found that although SPI Mode 0 nominally looks to be correct, it would lose the most significant bit of most transactions, being the left channel input values. I needed to use Mode 3 to get effective transactions.

The I2C bus is used on pins A4 and A5, which is pre-R3 format. I would digress to say that decision not to continue to support the SDA/SCL pins being available on A4 and A5 is a very bad one, in my opinion. There are many old, and this quite new, Shields that will simply be broken by this decision. Simply, bad for the Arduino legacy.

I have completed the register and pin definitions in the header file, to allow simple selection of the configuration, by adding the appropriate bit values into the register settings.

The initial I2C command transaction looks like this.

I2C command preamble

Here is a bit more detail on the DIGITAL_PATH_CONTROL command.

Vol I2C preamble detail

The true heart of the project lies within the use of Timer 1 to signal the 44.1kHz timing required to produce the sound samples. An interrupt driven by the Timer 1 counter signals the transfer of data, performing any audio processing required on the incoming data, and writing it to the output ready for the next transfer, and sampling the analogue potentiometers to use them as as mod inputs on the signal. The Timer 1 counter is incremented by counting the CLKOUT line coming from the Shield.

ISR(TIMER1_COMPA_vect)
{
  // WM8731 data transfer routine
  // move data from and to the WM8731 - done first for regularity (reduced jitter).
  AudioCodec_data(&left_in, &right_in, left_out, right_out);

  // audio processing routine - do processing on input - prepare output
  AudioCodec_dsp();

  // adc sampling routine
  // sampling the potentiometers (no sound here)
  AudioCodec_ADC(&mod0_value, &mod1_value);

  // end mark - check for end of interrupt - for debugging only
  PORTD |= _BV(PORTD6);   // Ping Audio Shield buffer line.
  PORTD &= ~_BV(PORTD6);
}

As I noted above, timing is everything. Based on the plots below, it takes exactly 6us for the AudioCodec_data() function to transfer the data from and to the WM8731. This doesn’t seem like very long, but to maintain a sample rate of 44.1kHz, each transaction must be completed in less than 22.7us, as shown below.

Vol 44kHz mid

The logic trace below shows the situation with the simplest AudioCodec_dsp() function available. Here the DSP processing is completed with over 15.7us to spare. The actual AudioCodec_data() function takes exactly 6us to complete (T1-T2), and can be used as a scale for other logic traces below.

inline void AudioCodec_dsp(void) // straight through connection I-O
{
  left_out  =  left_in; // put in to out on left channel
  right_out =  right_in; // put in to out on right channel
}

Simple IO Snapshot

Other more complicated routines, such as a sine-wave Voltage Controlled Oscillator (digital of course) take a little more time from our limited budget, needing 9.6us to complete.

VCO snapshot

I have used the same code on a Freetronics Eleven, an Arduino Uno clone, overclocked to 22.1184MHz, and as can be seen below, it results in the AudioCodec_data() function taking 4.33us (vs 6us standard) and the VCO code taking 6.125us (vs 9.6us standard). Whilst these savings are relatively small, by comparing the two logic traces, I think they do change the result enough to make it worthwhile for this application.

VCO Overclocked Arduino

Code is as usual on Sourceforge in avrfreeRTOS.

In further work, I will build some useful DSP programs from the examples provided, such as a reverb or flanger filter.

UPDATE

After reading this article on the MicroMonsterModular I’m going to play with adding some new sequences.

This WurstCaptures web site can help to build them quickly, and CounterComplex has a few ideas too.

output = t * (t >> (pot1>>4) | t >> (pot2>>4) )&((pot3>>3) + 16)
output = (t * (( t>>9 | t>>13 ) & 15)) & 129
output = (t * (t>>8 + t>>9)*100) + sin(t)
output = t * (((t>>12)|(t>>8))&(63&(t>>4)))

ArduSat SD Card Prototyping

Since my last post on the ArduSat and the idea I had to use the Supervisor node, an ATmega2561, as the core of a centralised eXtended RAM system for the Client nodes, ATmega328p “Arduino” devices, I’ve been thinking and working on a solution for building a centralised non-volatile SD Card based storage solution.

With design, sometimes it is necessary to let an idea stew for a while before the right answer just sort of distils out of the soup. For the solution for this problem, this was the case. There was some thinking space required…

thinking space

The Question

There are 16 Client nodes in the ArduSat platform. Each and any of them may wish to use the central SD Card to store information at the same, or at different times. How would it be possible to allow more than 16 files to be open on the one SD Card (connected to the Supervisor node) whilst maintaining consistency in the file system? How would access to the file system be scheduled?

The Tools

I have been using the ChaN FatFs file system libraries now for some time. They are fully featured and have a very clean design, fully separating the file system layer from the underlying physical media access layer (the drivers). This means that the file system tools can be implemented on many different architectures, with only changes to the driver layer (DiskIO) needed for each platform.

The Thought Process

My initial thought was that the Supervisor node should maintain the file system, and that I should write packaging for the FatFs file system commands to allow them to be remotely implemented across the SPI bus, in a similar manner as described in the XRAMFS post.

The idea of writing these “remote controls” for the file system commands was scary, as I recognised that there are 33 commands in the interface, and each of them has their own characteristics. Also, maintaining these interfaces would likely be problematic, as I would have to test each command extensively to ensure that there were no “thick thumb” errors introduced into the stable and proven FatFs library.

Some weeks passed…

Then at about 3am, I realised that the right answer was to write a “shim” between the standard FatF file system commands and the standard physical media drivers, and to have this shim operate across the SPI bus in exactly the same manner as the XRAMFS solution.

So, I wrote it.

The Solution

The solution separates the ChaN libraries into two parts. The file system part is resident on the Client node. Each Client node maintains its own view of the file system on the Supervisor SD Card. As the ChaN FatFs library is written for low memory devices, the file directory tree is refreshed each time a change in the working file is done. The Supervisor node only does the DiskIO under the command of each of the Clients.

There are only 5 relevant driver layer DiskIO commands. These commands are used in the Supervisor node to execute requests sent over the SPI bus from the individual Clients. Since there are only a small number of commands, and they are static and dependent on the architecture of the machine they’re running on, their functionality is quite constant. The Supervisor has no knowledge of the file system at all. It simply implements DiskIO commands on sectors of the SD Card as requested, one a time, as requested by Clients.

The Supervisor implementation simply expands on the existing Task loop established for the XRAMFS system, by adding in the 5 additional DiskIO commands. The added complexity, that the SD Card is accessed over the SAME SPI bus as the communications between Client and Supervisor, means that I had to introduce an interim “Pending” state for commands to allow the Client to wait for confirmation that a task has been completed or, in the case of disk_read or disk_ioctl, to recover the waiting data from the Supervisor.

The Client implementation inserts different shim DiskIO commands for the FatF system to call. These commands use the SPI bus to call the Supervisor, and enter a request. Some commands return immediately, allowing the Supervisor to continue with the command, once the command and any required data has been transferred. Other commands wait until they can retrieve information from the Supervisor, before returning to the FatF file system layer of the library.

In this solution, the XRAMFS was instrumental in simplifying the transfer of information. The exclusive availability of 16kB of RAM for each Client meant that disk_write or disk_read commands could cache their data in XRAMFS whilst it was actually written to or read from the SDCard. Because the RAM is available exclusively, there is no consideration that another Client may overwrite the results of a command, or that memory exhaustion may corrupt data.

The code is available at Sourceforge in the usual location.

How does it work?

When a Client program calls one of the FatFs library commands, it in turn calls one of the special ArduSat SPI DiskIO shim routines. These routines signal the Supervisor in the normal manner, and transfer any data associated with the command into the Page of XRAMFS assigned to the Client.

The Supervisor will then undertake the standard DiskIO command, retaining the result of the command and any data resulting from the command in XRAMFS.

Both Client DiskIO routines, and the Task running in the Supervisor are aware of the “Pending” state, which is where a DiskIO command has been completed on the Supervisor and there is data waiting in the XRAMFS for the Client to recover.

Once the Client DiskIO command completes, it returns the normal interface information to the calling FatFs command.

Here a monitor program on a Client is initialising the SD Card. If the Supervisor notices that the SD Card is not initialised, it will return Error, and then undertake to initialise the card. The second call for initialisation will then be successful. This decoupling method ensures that Clients cannot reinitialise the card, whilst other Clients may be using the Card.

The file system (on the Client) is then initialised Then, the SD Card status is read. Finally, the current working directory is read and printed.

Initialisation

In this screenshot, a file is opened for reading, and the file pointer set to the start of the file. A dump of the first 64 Bytes of the file is read and printed. Then the file is closed.

readfile

Here, the same file as above is opened for writing, and 45 bytes of 0x10 (16) are being written. The result is checked by opening the file for reading, and dumping the relevant bytes to the screen. Success!

writefile

Issues

The Client (Arduino) ATmega328p has so little Flash and RAM that implementing the FatFs consumes a significant proportion of the available resources. From the ChaN FatFs web site, at least 13 kByte of Flash (of 32 kByte on the Arduino), and 600 Bytes of RAM (of 2048 Bytes on the Arduino) are consumed by the library alone. This is excluding the working buffers necessary to prepare or process data for storage.

I was unable to fully test the FatFs solution, because of RAM and Flash limitations. I simply couldn’t turn on all the features. However, I have some confidence that the solution fully works, because the actual FatFs library is unchanged from the working solution that I’ve tested on the Arduino Mega platform. It is only the DiskIO routines that have been tampered with, and since they produce reliable results for some of the FatFs functions, there is every reason to believe they would work for all of the functions.

Thank you

Jon for providing a new Freetronics EtherMega, so that I could complete the prototyping work.

ArduSat and NanoSatisfi for running a great project, which inspired this thought process. Possibly, this work might be useful for one of the launches over the coming years.